# Benchmarking & Quality

PomaiDB benchmarks exist to earn trust, not to market speed. This document covers our trust model, recall methodology, and the latest live results.
## 1. Benchmark Trust Model

We strictly define what PomaiDB guarantees versus what it does not.

### Guarantees vs Benchmarks

| Guarantee | What it means | Benchmark | Enforcement |
| --- | --- | --- | --- |
| Correctness | Approximate search results match a brute-force oracle. | Recall Correctness | Recall@1/10/100 must be ≥ 0.94. |
| Tail latency | Latency under mixed load is visible and does not hide p999. | Mixed Load Tail | p50/p95/p99/p999 reported. |
| Crash safety | Data is not lost after SIGKILL during ingest. | Crash Recovery | Recovered count validated. |
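The Mixed Load Tail row can be made concrete with a percentile pass over raw latency samples. Below is a minimal sketch using nearest-rank percentiles; it is not PomaiDB's actual reporting code, and the real suite may use a different interpolation:

```python
# Minimal nearest-rank percentile reporter for latency samples (microseconds).
# Illustrates the p50/p95/p99/p999 report named in the trust table above.
import math

def percentile(samples, q):
    """Nearest-rank percentile: q in (0, 100]."""
    xs = sorted(samples)
    rank = math.ceil(q / 100.0 * len(xs))  # 1-based rank into the sorted list
    return xs[rank - 1]

def tail_report(latencies_us):
    # Keys become "p50", "p95", "p99", "p999".
    return {f"p{str(q).replace('.', '')}": percentile(latencies_us, q)
            for q in (50, 95, 99, 99.9)}

# Example: 1000 samples where only the last 1% is slow.
samples = [100] * 990 + [5000] * 10
print(tail_report(samples))
```

Note how p999 surfaces the 5000 µs tail that p50/p95/p99 hide, which is exactly why the suite reports it.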
## 2. Recall & Methodology

PomaiDB targets Recall@10 ≥ 0.95 for production search workloads.
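Recall@k here means the overlap between the approximate top-k and the brute-force oracle's top-k, averaged over queries. A minimal sketch of the computation (not the actual harness):

```python
# Recall@k against a brute-force oracle:
#   recall@k = |approx_top_k ∩ exact_top_k| / k, averaged over queries.

def recall_at_k(approx_ids, exact_ids, k):
    """approx_ids / exact_ids: per-query lists of result ids, length >= k."""
    hits = sum(len(set(a[:k]) & set(e[:k]))
               for a, e in zip(approx_ids, exact_ids))
    return hits / (k * len(exact_ids))

# Two queries, k=2: the first query matches both ids, the second one of two.
approx = [[1, 2], [3, 9]]
exact  = [[1, 2], [3, 4]]
assert recall_at_k(approx, exact, 2) == 0.75
```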
### Search Implementation

- **Indexing:** IvfCoarse with KMeans++ initialization. Vectors are buffered until sufficient samples are collected for robust training.
- **Search:** Query the top `nprobe` centroids, then perform an exact scan (SIMD dot product) on the candidate buckets.
- **Tie-Breaking:** Uses `VectorId` (ascending) to guarantee determinism.
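The search steps above can be sketched in plain Python. The data layout (`centroids`, `buckets`) and the function name are hypothetical illustrations, not PomaiDB's API; the real engine uses SIMD kernels and trained IVF structures:

```python
# Sketch of an IVF coarse search: probe the top-nprobe centroids, exact-scan
# their buckets, and break score ties by ascending vector id for determinism.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def ivf_search(query, centroids, buckets, nprobe, k):
    """centroids: list of vectors; buckets[i]: list of (vector_id, vector)."""
    # 1. Rank centroids by similarity to the query, keep the top nprobe.
    probe = sorted(range(len(centroids)),
                   key=lambda i: -dot(query, centroids[i]))[:nprobe]
    # 2. Exact scan over the candidate buckets (a SIMD dot product in the
    #    real engine; plain Python here).
    candidates = [(dot(query, v), vid) for i in probe for vid, v in buckets[i]]
    # 3. Sort by score descending, then vector id ascending for ties.
    candidates.sort(key=lambda sv: (-sv[0], sv[1]))
    return [vid for _, vid in candidates[:k]]

# Two identical-score vectors in the probed bucket: id 2 wins the tie.
centroids = [[1, 0], [0, 1]]
buckets = [[(5, [1, 0]), (2, [1, 0])], [(3, [0, 1])]]
assert ivf_search([1, 0], centroids, buckets, nprobe=1, k=2) == [2, 5]
```

Raising `nprobe` in this sketch grows the candidate list (more buckets scanned), which is precisely the latency-for-recall trade described in the tuning advice.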
### Tuning Advice

- **Low recall?** Increase `nprobe` to search more buckets. This trades latency for accuracy.
- **High latency?** Reduce `nprobe` or increase `nlist` (more clusters, smaller buckets).
## 3. Benchmarking Guide

### Quick Start

```sh
# Trust benchmarks (standard scripts)
./scripts/pomai-bench recall
./scripts/pomai-bench mixed-load
./scripts/pomai-bench crash-recovery

# Comprehensive benchmark (build from source)
cmake --build build --target comprehensive_bench
./build/comprehensive_bench --dataset small
```
### Dataset Sizes

| Size | Vectors | Dimensions | Queries |
| --- | --- | --- | --- |
| small | 10,000 | 128 | 1,000 |
| medium | 100,000 | 256 | 5,000 |
| large | 1,000,000 | 768 | 10,000 |
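For capacity planning, the raw vector footprint of each dataset is easy to estimate. The sketch below assumes float32 (4 bytes per dimension) and counts raw vectors only, ignoring index structures and metadata:

```python
# Back-of-envelope raw vector footprint, assuming float32 storage.
def raw_footprint_gib(vectors, dims, bytes_per_dim=4):
    return vectors * dims * bytes_per_dim / 2**30

for name, n, d in [("small", 10_000, 128),
                   ("medium", 100_000, 256),
                   ("large", 1_000_000, 768)]:
    print(f"{name}: {raw_footprint_gib(n, d):.2f} GiB")
# The large dataset works out to roughly 2.86 GiB of raw vectors alone,
# a useful floor when reading the RSS numbers in the scaling results.
```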
### Interpreting Results

Good performance baseline (medium dataset):

- ✅ P99 latency < 1 ms
- ✅ Throughput > 5K QPS
- ✅ Recall@10 > 0.90
## 4. Live Results (CBR-S Suite)

Performance metrics from the latest CI run, comparing Fanout (baseline) vs CBR-S (smart routing). The suite tracks three axes:

- **Tail latency:** P99 / P999, in microseconds, not vibes.
- **Quality:** Recall@k vs a brute-force oracle.
- **Scaling:** QPS and RSS (throughput + memory).
| Mode | Recall@10 | P99 (µs) | Query QPS | Shards Visited | Verdict |
| --- | --- | --- | --- | --- | --- |
| fanout | 1.0000 | 3,740.5 | 103.8 | 4.00 | PASS |
| cbrs | 1.0000 | 8,390.8 | 71.1 | 1.06 | PASS |
| cbrs_no_dual | 1.0000 | 8,488.5 | 70.9 | 1.06 | PASS |
| Mode | Recall@10 | P99 (µs) | Query QPS | Shards Visited | Verdict |
| --- | --- | --- | --- | --- | --- |
| fanout | 1.0000 | 13,579.1 | 30.2 | 4.00 | PASS |
| cbrs | 1.0000 | 22,433.8 | 25.3 | 1.70 | PASS |
| cbrs_no_dual | 1.0000 | 22,626.6 | 25.2 | 1.70 | PASS |
| Mode | Recall@10 | P99 (µs) | Query QPS | Shards Visited | Verdict |
| --- | --- | --- | --- | --- | --- |
| fanout | 1.0000 | 29,113.1 | 11.1 | 8.00 | PASS |
| cbrs | 1.0000 | 58,733.7 | 10.3 | 2.00 | PASS |
| cbrs_no_dual | 1.0000 | 63,280.6 | 10.6 | 2.00 | PASS |
| Mode | Recall@1 | P99 (µs) | Query QPS | Shards Visited | Verdict |
| --- | --- | --- | --- | --- | --- |
| fanout | 0.1000 | 30,522.0 | 14.5 | 4.00 | PASS |
| cbrs | 0.1000 | 73,674.3 | 9.4 | 1.87 | PASS |
| cbrs_no_dual | 0.1000 | 68,675.5 | 9.6 | 1.87 | PASS |
| Mode | Recall@10 | P99 (µs) | Query QPS | Shards Visited | Verdict |
| --- | --- | --- | --- | --- | --- |
| fanout | 1.0000 | 11,061.3 | 37.4 | 4.00 | PASS |
| cbrs | 1.0000 | 25,408.4 | 25.1 | 1.74 | PASS |
| cbrs_no_dual | 1.0000 | 25,199.8 | 25.1 | 1.74 | PASS |
| Mode | Recall@10 | P99 (µs) | Query QPS | Shards Visited | Verdict |
| --- | --- | --- | --- | --- | --- |
| fanout | 1.0000 | 17,004.5 | 31.5 | 4.00 | PASS |
| cbrs | 0.9703 | 25,772.1 | 28.1 | 1.00 | PASS |
| cbrs_no_dual | 0.5000 | 24,737.5 | 28.5 | 1.00 | WARN |
### Interpretation Guide

Good signs:

- P99 stays close to P50 (no tail explosions).
- QPS increases with threads until expected saturation.
- Recall@10 is stable across dataset modes and routing epochs.
- `routed_shards_avg` drops significantly below the shard count, without recall loss.

Red flags:

- P99 ≫ P50 (10x or more): tail variance, often contention or IO.
- QPS decreases as threads are added: thrash or lock contention.
- Recall drops under overlap or epoch drift: routing or snapshot-semantics issues.
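The first red flag is easy to turn into a mechanical check. A trivial sketch, where the 10x threshold mirrors the guide above and should be tuned to the workload:

```python
# Flag a run whose tail latency blows out relative to the median.
def tail_explosion(p50_us, p99_us, max_ratio=10.0):
    """True when P99 exceeds max_ratio * P50, i.e. a tail explosion."""
    return p99_us / p50_us > max_ratio

assert not tail_explosion(p50_us=100, p99_us=500)   # healthy tail
assert tail_explosion(p50_us=100, p99_us=1500)      # red flag: 15x
```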