# Architecture Guide
Deep dive into Pomai Search internals.
## System Overview
```mermaid
flowchart TB
    Client --> Engine[SearchEngine]
    Engine --> Impl[SearchEngine::Impl]
    Impl --> Shard1[Shard 1]
    Impl --> Shard2[Shard 2]
    Impl --> ShardN[Shard N]
    Impl --> QPool[Query Thread Pool]
    Impl --> IPool[Ingest Thread Pool]
```
## Shard Structure
Each shard is an independent search unit containing:
- `VectorStore`: Contiguous float buffer aligned for AVX2 SIMD.
- `Index`: The core search structure (Flat, HNSW, IVF).
- `KeywordIndex`: Inverted index for BM25/TF-IDF scoring.
- `Metadata`: Key → ID mapping and TTL tracking.
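The layout above can be sketched as a plain struct. This is an illustrative shape only, assuming the components listed; the field names and types are not the actual Pomai Search definitions.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sketch of a shard's layout; names are illustrative,
// not the real Pomai Search types.
struct Shard {
    // VectorStore: contiguous float buffer, row-major, dim floats per
    // vector (AVX2 alignment would come from a custom allocator).
    std::vector<float> vectors;
    std::size_t dim = 0;

    // KeywordIndex: inverted index, term -> posting list of internal IDs.
    std::unordered_map<std::string, std::vector<std::uint32_t>> keyword_index;

    // Metadata: user key -> internal ID, plus TTL expiry timestamps.
    std::unordered_map<std::string, std::uint32_t> key_to_id;
    std::unordered_map<std::uint32_t, std::int64_t> expires_at;  // unix seconds

    std::size_t size() const { return dim ? vectors.size() / dim : 0; }
};
```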
## Indexing Implementations
### 1. FlatIndex

Exact search (O(N)). Brute-force scan with SIMD optimization. Best for small datasets or workloads that require 100% recall.
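The O(N) structure of a flat scan looks like the following. This is a scalar sketch over a hypothetical row-major buffer; the real engine would vectorize the inner loop with AVX2.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Illustrative brute-force nearest-neighbor scan (scalar version of a
// FlatIndex-style search). Shows only the O(N) shape; the production
// inner loop would use SIMD intrinsics.
std::size_t flat_search(const std::vector<float>& vectors, std::size_t dim,
                        const std::vector<float>& query) {
    std::size_t best = 0;
    float best_dist = std::numeric_limits<float>::max();
    const std::size_t n = vectors.size() / dim;
    for (std::size_t i = 0; i < n; ++i) {
        float d = 0.0f;
        for (std::size_t j = 0; j < dim; ++j) {
            const float diff = vectors[i * dim + j] - query[j];
            d += diff * diff;  // squared L2 distance
        }
        if (d < best_dist) { best_dist = d; best = i; }
    }
    return best;  // index of the closest vector
}
```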
### 2. HnswIndex

Graph-based (O(log N)). Builds a multi-layer graph where upper layers serve as expressways to the target neighborhood. Supports `M` (connectivity) and `ef` (beam width) tuning.
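The per-layer step of that descent can be sketched as a greedy walk. This toy version uses beam width 1 and an adjacency-list graph supplied by the caller; real HNSW repeats this layer by layer with a larger `ef` beam.

```cpp
#include <cstddef>
#include <vector>

// Squared L2 distance between two equal-length vectors.
static float sq_dist(const std::vector<float>& a, const std::vector<float>& b) {
    float d = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float t = a[i] - b[i];
        d += t * t;
    }
    return d;
}

// Illustrative greedy search on one graph layer (beam width ef = 1):
// hop to the closest neighbor until no neighbor improves the distance.
std::size_t greedy_search(const std::vector<std::vector<float>>& points,
                          const std::vector<std::vector<std::size_t>>& neighbors,
                          const std::vector<float>& query, std::size_t entry) {
    std::size_t cur = entry;
    float cur_d = sq_dist(points[cur], query);
    bool improved = true;
    while (improved) {
        improved = false;
        for (std::size_t nb : neighbors[cur]) {
            const float d = sq_dist(points[nb], query);
            if (d < cur_d) { cur = nb; cur_d = d; improved = true; }
        }
    }
    return cur;  // local minimum: nearest reachable point
}
```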
### 3. IvfSq8Index

Quantized Inverted File. Compresses vectors from float32 to uint8 (4× memory savings). Uses k-means clustering to route queries to relevant buckets.
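The SQ8 half of that design is scalar quantization: each float is mapped to a uint8 over a min/max range. The codec below is a minimal sketch of that idea; the engine's actual quantization statistics (e.g. per-dimension vs per-vector ranges) may differ.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Illustrative scalar quantization (SQ8): 4 bytes -> 1 byte per value.
struct Sq8Code {
    float lo = 0.0f, step = 0.0f;      // dequantization parameters
    std::vector<std::uint8_t> codes;   // one byte per float
};

Sq8Code sq8_encode(const std::vector<float>& v) {
    Sq8Code out;
    const auto [mn, mx] = std::minmax_element(v.begin(), v.end());
    out.lo = *mn;
    out.step = (*mx - *mn) / 255.0f;
    out.codes.reserve(v.size());
    for (float x : v) {
        const float q = out.step > 0.0f ? (x - out.lo) / out.step : 0.0f;
        out.codes.push_back(static_cast<std::uint8_t>(std::lround(q)));
    }
    return out;
}

// Reconstruct an approximation of the i-th original value.
float sq8_decode(const Sq8Code& c, std::size_t i) {
    return c.lo + c.step * static_cast<float>(c.codes[i]);
}
```

The reconstruction error is bounded by half a quantization step, which is why SQ8 indexes trade a small recall loss for the 4× memory reduction.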
## Concurrency Model
- Query Parallelism: Queries are scattered to shards via `std::future` and the partial results are gathered and merged.
- Locking: Uses `std::shared_mutex`. Readers proceed concurrently under a shared lock; writers acquire an exclusive lock per shard.
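The scatter/gather and locking described above can be sketched together. The shard contents and the merge step here are simplified placeholders, not the engine's real types.

```cpp
#include <algorithm>
#include <future>
#include <shared_mutex>
#include <vector>

// Toy shard: a shared_mutex guards a stand-in result list.
struct DemoShard {
    mutable std::shared_mutex mtx;
    std::vector<int> scores;  // placeholder for per-shard results

    std::vector<int> query() const {
        std::shared_lock lock(mtx);  // readers share the lock
        return scores;
    }
};

// Scatter the query to every shard via std::async/std::future,
// then gather the partial results and merge them best-first.
std::vector<int> scatter_gather(const std::vector<DemoShard>& shards) {
    std::vector<std::future<std::vector<int>>> futs;
    for (const auto& s : shards)
        futs.push_back(std::async(std::launch::async, [&s] { return s.query(); }));

    std::vector<int> merged;
    for (auto& f : futs) {
        auto part = f.get();  // gather
        merged.insert(merged.end(), part.begin(), part.end());
    }
    std::sort(merged.rbegin(), merged.rend());  // merge: highest score first
    return merged;
}
```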
## Design Principles
- Why Sharding? Parallelism and scalability across cores.
- Why Consistent Hashing? Deterministic routing without coordination.
- Why Native Serialization? O(N) startup time (exact memory dump) vs O(N log N) rebuilds.
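The consistent-hashing principle can be illustrated with a minimal hash ring: every node computes the same key → shard mapping independently, so no coordinator is needed. The hash function and virtual-node count below are illustrative choices, not Pomai Search's actual scheme.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>

// Minimal consistent-hash ring: shards are placed on the ring at several
// virtual-node positions; a key routes to the first shard clockwise from
// its own hash position. Deterministic, so no coordination is required.
class HashRing {
public:
    void add_shard(const std::string& name, int vnodes = 16) {
        for (int i = 0; i < vnodes; ++i)
            ring_[hash_(name + "#" + std::to_string(i))] = name;
    }

    const std::string& route(const std::string& key) const {
        auto it = ring_.lower_bound(hash_(key));
        if (it == ring_.end()) it = ring_.begin();  // wrap around the ring
        return it->second;
    }

private:
    std::hash<std::string> hash_;
    std::map<std::size_t, std::string> ring_;  // position -> shard name
};
```

Adding or removing a shard moves only the keys adjacent to its ring positions, which is the property that plain `hash(key) % N` routing lacks.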