# Architecture Guide
Deep dive into Pomai Search internals.
## System Overview
```mermaid
flowchart TB
    Client --> Engine[SearchEngine]
    Engine --> Impl[SearchEngine::Impl]
    Impl --> Shard1[Shard 1]
    Impl --> Shard2[Shard 2]
    Impl --> ShardN[Shard N]
    Impl --> QPool[Query Thread Pool]
    Impl --> IPool[Ingest Thread Pool]
```
## Shard Structure
Each shard is an independent search unit containing:
- `VectorStore`: Contiguous float buffer aligned for AVX2 SIMD.
- `Index`: The core search structure (Flat, HNSW, IVF).
- `KeywordIndex`: Inverted index for BM25/TF-IDF scoring.
- `Metadata`: Key → ID mapping and TTL tracking.
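The layout above can be sketched as a plain struct. This is an illustrative shape only, assuming the components listed; the field names and types are not the actual Pomai Search definitions.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sketch of a shard's layout; names are illustrative,
// not the real Pomai Search types.
struct Shard {
    // VectorStore: contiguous float buffer, row-major, dim floats per
    // vector (AVX2 alignment would come from a custom allocator).
    std::vector<float> vectors;
    std::size_t dim = 0;

    // KeywordIndex: inverted index, term -> posting list of internal IDs.
    std::unordered_map<std::string, std::vector<std::uint32_t>> keyword_index;

    // Metadata: user key -> internal ID, plus TTL expiry timestamps.
    std::unordered_map<std::string, std::uint32_t> key_to_id;
    std::unordered_map<std::uint32_t, std::int64_t> expires_at;  // unix seconds

    std::size_t size() const { return dim ? vectors.size() / dim : 0; }
};
```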
## Indexing Implementations
### 1. FlatIndex

Exact search (O(N)). Brute-force scan with SIMD optimization. Best for small datasets or workloads that require 100% recall.
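The O(N) structure of a flat scan looks like the following. This is a scalar sketch over a hypothetical row-major buffer; the real engine would vectorize the inner loop with AVX2.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Illustrative brute-force nearest-neighbor scan (scalar version of a
// FlatIndex-style search). Shows only the O(N) shape; the production
// inner loop would use SIMD intrinsics.
std::size_t flat_search(const std::vector<float>& vectors, std::size_t dim,
                        const std::vector<float>& query) {
    std::size_t best = 0;
    float best_dist = std::numeric_limits<float>::max();
    const std::size_t n = vectors.size() / dim;
    for (std::size_t i = 0; i < n; ++i) {
        float d = 0.0f;
        for (std::size_t j = 0; j < dim; ++j) {
            const float diff = vectors[i * dim + j] - query[j];
            d += diff * diff;  // squared L2 distance
        }
        if (d < best_dist) { best_dist = d; best = i; }
    }
    return best;  // index of the closest vector
}
```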
### 2. HnswIndex

Graph-based (O(log N)). Builds a multi-layer graph where upper layers serve as expressways to the target neighborhood. Supports `M` (connectivity) and `ef` (beam width) tuning.
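The per-layer step of that descent can be sketched as a greedy walk. This toy version uses beam width 1 and an adjacency-list graph supplied by the caller; real HNSW repeats this layer by layer with a larger `ef` beam.

```cpp
#include <cstddef>
#include <vector>

// Squared L2 distance between two equal-length vectors.
static float sq_dist(const std::vector<float>& a, const std::vector<float>& b) {
    float d = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float t = a[i] - b[i];
        d += t * t;
    }
    return d;
}

// Illustrative greedy search on one graph layer (beam width ef = 1):
// hop to the closest neighbor until no neighbor improves the distance.
std::size_t greedy_search(const std::vector<std::vector<float>>& points,
                          const std::vector<std::vector<std::size_t>>& neighbors,
                          const std::vector<float>& query, std::size_t entry) {
    std::size_t cur = entry;
    float cur_d = sq_dist(points[cur], query);
    bool improved = true;
    while (improved) {
        improved = false;
        for (std::size_t nb : neighbors[cur]) {
            const float d = sq_dist(points[nb], query);
            if (d < cur_d) { cur = nb; cur_d = d; improved = true; }
        }
    }
    return cur;  // local minimum: nearest reachable point
}
```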
### 3. IvfSq8Index

Quantized Inverted File. Compresses vectors from float32 to uint8 (4× memory savings). Uses k-means clustering to route queries to relevant buckets.
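The SQ8 half of that design is scalar quantization: each float is mapped to a uint8 over a min/max range. The codec below is a minimal sketch of that idea; the engine's actual quantization statistics (e.g. per-dimension vs per-vector ranges) may differ.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Illustrative scalar quantization (SQ8): 4 bytes -> 1 byte per value.
struct Sq8Code {
    float lo = 0.0f, step = 0.0f;      // dequantization parameters
    std::vector<std::uint8_t> codes;   // one byte per float
};

Sq8Code sq8_encode(const std::vector<float>& v) {
    Sq8Code out;
    const auto [mn, mx] = std::minmax_element(v.begin(), v.end());
    out.lo = *mn;
    out.step = (*mx - *mn) / 255.0f;
    out.codes.reserve(v.size());
    for (float x : v) {
        const float q = out.step > 0.0f ? (x - out.lo) / out.step : 0.0f;
        out.codes.push_back(static_cast<std::uint8_t>(std::lround(q)));
    }
    return out;
}

// Reconstruct an approximation of the i-th original value.
float sq8_decode(const Sq8Code& c, std::size_t i) {
    return c.lo + c.step * static_cast<float>(c.codes[i]);
}
```

The reconstruction error is bounded by half a quantization step, which is why SQ8 indexes trade a small recall loss for the 4× memory reduction.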
## Concurrency Model
- Query Parallelism: Queries are scattered to shards via `std::future` and the partial results are gathered and merged.
- Locking: Uses `std::shared_mutex`. Readers proceed concurrently under a shared lock; writers acquire an exclusive lock per shard.
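The scatter/gather and locking described above can be sketched together. The shard contents and the merge step here are simplified placeholders, not the engine's real types.

```cpp
#include <algorithm>
#include <future>
#include <shared_mutex>
#include <vector>

// Toy shard: a shared_mutex guards a stand-in result list.
struct DemoShard {
    mutable std::shared_mutex mtx;
    std::vector<int> scores;  // placeholder for per-shard results

    std::vector<int> query() const {
        std::shared_lock lock(mtx);  // readers share the lock
        return scores;
    }
};

// Scatter the query to every shard via std::async/std::future,
// then gather the partial results and merge them best-first.
std::vector<int> scatter_gather(const std::vector<DemoShard>& shards) {
    std::vector<std::future<std::vector<int>>> futs;
    for (const auto& s : shards)
        futs.push_back(std::async(std::launch::async, [&s] { return s.query(); }));

    std::vector<int> merged;
    for (auto& f : futs) {
        auto part = f.get();  // gather
        merged.insert(merged.end(), part.begin(), part.end());
    }
    std::sort(merged.rbegin(), merged.rend());  // merge: highest score first
    return merged;
}
```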
## Design Principles
- Why Sharding? Parallelism and scalability across cores.
- Why Consistent Hashing? Deterministic routing without coordination.
- Why Native Serialization? O(N) startup time (exact memory dump) vs O(N log N) rebuilds.
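The consistent-hashing principle can be illustrated with a minimal hash ring: every node computes the same key → shard mapping independently, so no coordinator is needed. The hash function and virtual-node count below are illustrative choices, not Pomai Search's actual scheme.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>

// Minimal consistent-hash ring: shards are placed on the ring at several
// virtual-node positions; a key routes to the first shard clockwise from
// its own hash position. Deterministic, so no coordination is required.
class HashRing {
public:
    void add_shard(const std::string& name, int vnodes = 16) {
        for (int i = 0; i < vnodes; ++i)
            ring_[hash_(name + "#" + std::to_string(i))] = name;
    }

    const std::string& route(const std::string& key) const {
        auto it = ring_.lower_bound(hash_(key));
        if (it == ring_.end()) it = ring_.begin();  // wrap around the ring
        return it->second;
    }

private:
    std::hash<std::string> hash_;
    std::map<std::size_t, std::string> ring_;  // position -> shard name
};
```

Adding or removing a shard moves only the keys adjacent to its ring positions, which is the property that plain `hash(key) % N` routing lacks.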