
Benchmarking & Quality

PomaiDB benchmarks exist to earn trust, not to market speed. This document covers our trust model, recall methodology, how to run the benchmarks, and the latest live results.


1. Benchmark Trust Model

We strictly define what PomaiDB guarantees versus what it does not.

Guarantees vs Benchmarks

Guarantee | What it means | Benchmark | Enforcement
Correctness | Approximate search results match a brute-force oracle. | Recall Correctness | Recall@1/10/100 must be ≥ 0.94.
Tail latency | Latency under mixed load is fully reported, including p999. | Mixed Load Tail | p50/p95/p99/p999 reported.
Crash safety | Data is not lost after SIGKILL during ingest. | Crash Recovery | Recovered record count validated.
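The Correctness row judges approximate results against a brute-force oracle. A minimal sketch of that check follows, assuming squared L2 distance and Recall@k defined as the overlap between the approximate and exact top-k ids divided by k; the toy data, `half_scan_search`, and every function name here are hypothetical illustrations, not PomaiDB's harness.

```python
import random

RECALL_GATE = 0.94  # Recall@1/10/100 enforcement threshold from the trust table

def l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def brute_force_topk(base, query, k):
    """Exact oracle: ids of the k nearest base vectors (ties broken by id)."""
    order = sorted(range(len(base)), key=lambda i: (l2(base[i], query), i))
    return order[:k]

def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the oracle's top-k that the approximate search also returned."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k

def mean_recall(base, queries, search_fn, k):
    """Average Recall@k of search_fn over all queries, judged against the oracle."""
    scores = [recall_at_k(search_fn(q, k), brute_force_topk(base, q, k), k) for q in queries]
    return sum(scores) / len(scores)

# Toy data standing in for a real benchmark dataset (hypothetical).
random.seed(0)
base = [[random.random() for _ in range(8)] for _ in range(200)]
queries = [[random.random() for _ in range(8)] for _ in range(20)]

def exact_search(q, k):
    return brute_force_topk(base, q, k)

def half_scan_search(q, k):
    # Hypothetical "approximate" index that only scans the first 100 vectors.
    return brute_force_topk(base[:100], q, k)

for name, fn in (("exact", exact_search), ("half_scan", half_scan_search)):
    r = mean_recall(base, queries, fn, k=10)
    print(f"{name}: Recall@10 = {r:.4f} -> {'PASS' if r >= RECALL_GATE else 'FAIL'}")
```

The exact search always scores 1.0 by construction; the half-scan stand-in shows how a search that skips part of the data loses recall and can fall below the gate.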

2. Recall & Methodology

PomaiDB targets Recall@10 ≥ 0.95 for production search workloads.

Search Implementation

Tuning Advice


3. Benchmarking Guide

Quick Start

# Trust benchmarks (Standard Scripts)
./scripts/pomai-bench recall
./scripts/pomai-bench mixed-load
./scripts/pomai-bench crash-recovery

# Comprehensive benchmark (build from source; configure once, then build)
cmake -S . -B build
cmake --build build --target comprehensive_bench
./build/comprehensive_bench --dataset small

Dataset Sizes

Size Vectors Dimensions Queries
small 10,000 128 1,000
medium 100,000 256 5,000
large 1,000,000 768 10,000
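The dataset sizes translate directly into memory and oracle cost. The back-of-the-envelope sketch below assumes vectors are stored as raw float32 (4 bytes per component) and that the brute-force oracle compares every query against every vector; both are assumptions for illustration, not statements about PomaiDB's storage format or the benchmark's oracle implementation.

```python
# Hypothetical sizing sketch; assumes float32 (4-byte) vector components.
DATASETS = {
    "small": (10_000, 128, 1_000),      # (vectors, dimensions, queries)
    "medium": (100_000, 256, 5_000),
    "large": (1_000_000, 768, 10_000),
}
BYTES_PER_FLOAT32 = 4

def raw_vector_bytes(n, dim):
    """Bytes needed for the vectors alone, excluding any index structure."""
    return n * dim * BYTES_PER_FLOAT32

def oracle_distance_terms(n, dim, queries):
    """Per-component distance terms for a brute-force oracle: every query vs every vector."""
    return queries * n * dim

for name, (n, dim, q) in DATASETS.items():
    mib = raw_vector_bytes(n, dim) / 2**20
    print(f"{name:>6}: {mib:9.1f} MiB raw vectors, {oracle_distance_terms(n, dim, q):.2e} oracle terms")
```

Under these assumptions the raw vectors take about 4.9 MiB (small), 97.7 MiB (medium), and 2,929.7 MiB, roughly 2.86 GiB (large), while the oracle work grows from 1.28e9 to 7.68e12 distance terms, which is why exact ground truth gets expensive at the large size.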

Interpreting Results

Good Performance Baseline (Medium Dataset):


4. Live Results (CBR-S Suite)

Performance metrics from the latest CI run, comparing fanout (the baseline, which queries every shard) against CBR-S (smart routing).
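To make the "Shards Visited" column concrete, here is a minimal sketch contrasting fanout with a routed search. Fanout asks every shard and merges the partial top-k lists; the router shown is a generic nearest-centroid stand-in, explicitly NOT the CBR-S algorithm, which this page does not describe. All names (`search_shard`, `centroid_routed_search`, `probe`) are hypothetical.

```python
import heapq

def l2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def search_shard(shard, query, k):
    """Exact top-k inside one shard (a dict of id -> vector), as (distance, id) pairs."""
    return heapq.nsmallest(k, ((l2(vec, query), vid) for vid, vec in shard.items()))

def merge_topk(partials, k):
    """Combine per-shard top-k lists into a global top-k."""
    return heapq.nsmallest(k, (hit for part in partials for hit in part))

def fanout_search(shards, query, k):
    """Baseline: ask every shard, then merge. Returns (hits, shards_visited)."""
    return merge_topk([search_shard(s, query, k) for s in shards], k), len(shards)

def centroid_routed_search(shards, centroids, query, k, probe=1):
    """Generic nearest-centroid routing (a stand-in, NOT CBR-S): ask only `probe` shards."""
    ranked = sorted(range(len(shards)), key=lambda i: l2(centroids[i], query))
    chosen = ranked[:probe]
    return merge_topk([search_shard(shards[i], query, k) for i in chosen], k), len(chosen)

shards = [{0: [0.0, 0.0], 1: [0.1, 0.0]}, {2: [5.0, 5.0], 3: [5.1, 5.0]}]
centroids = [[0.05, 0.0], [5.05, 5.0]]
query = [0.0, 0.1]

hits, visited = fanout_search(shards, query, k=2)
print("fanout:", [vid for _, vid in hits], "shards visited:", visited)  # ids [0, 1], 2 shards
hits, visited = centroid_routed_search(shards, centroids, query, k=2)
print("routed:", [vid for _, vid in hits], "shards visited:", visited)  # ids [0, 1], 1 shard
```

With well-separated clusters both strategies return the same ids while routing touches fewer shards. When a query's true neighbors sit in a shard the router skips, routing silently returns worse results, which is presumably the failure mode the overlap and epoch-drift suites are designed to expose.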

Metrics reported: tail latency (P99 and P999, in microseconds), quality (Recall@k against a brute-force oracle), and scaling (QPS and RSS, i.e. throughput and memory).
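The tail-latency numbers depend on how percentiles are computed, and this page does not say which definition the harness uses. The sketch below assumes the nearest-rank definition and expresses percentiles in basis points so that p99.9 is computed with exact integer arithmetic rather than floating-point rounding; the function names and the 1..1000 µs sample set are hypothetical.

```python
# Percentiles in basis points (1/100 of a percent) so p99.9 stays exact.
PERCENTILES_BP = {"p50": 5_000, "p95": 9_500, "p99": 9_900, "p999": 9_990}

def percentile(samples_us, basis_points):
    """Nearest-rank percentile: smallest sample with at least basis_points/100 % of samples at or below it."""
    if not samples_us:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_us)
    rank = max(1, -(-basis_points * len(ordered) // 10_000))  # integer ceil avoids float rounding
    return ordered[rank - 1]

def latency_report(samples_us, wall_seconds):
    """p50/p95/p99/p999 in microseconds plus throughput (queries per wall-clock second)."""
    report = {name: percentile(samples_us, bp) for name, bp in PERCENTILES_BP.items()}
    report["qps"] = len(samples_us) / wall_seconds
    return report

samples = list(range(1, 1001))  # 1000 fake latencies: 1..1000 µs
print(latency_report(samples, wall_seconds=10.0))
# → {'p50': 500, 'p95': 950, 'p99': 990, 'p999': 999, 'qps': 100.0}
```

Computing the rank as a float (for example `0.999 * n`) can round up past the intended sample; the integer ceiling sidesteps that.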
Small Uniform dim=128 n=60k q=800 topk=10 shards=4
Mode Recall@10 P99 (µs) Query QPS Shards Visited Verdict
fanout 1.0000 3,740.5 103.8 4.00 PASS
cbrs 1.0000 8,390.8 71.1 1.06 PASS
cbrs_no_dual 1.0000 8,488.5 70.9 1.06 PASS
Medium Clustered dim=256 n=150k q=800 topk=10 shards=4
Mode Recall@10 P99 (µs) Query QPS Shards Visited Verdict
fanout 1.0000 13,579.1 30.2 4.00 PASS
cbrs 1.0000 22,433.8 25.3 1.70 PASS
cbrs_no_dual 1.0000 22,626.6 25.2 1.70 PASS
Large Clustered dim=256 n=400k q=400 topk=10 shards=8
Mode Recall@10 P99 (µs) Query QPS Shards Visited Verdict
fanout 1.0000 29,113.1 11.1 8.00 PASS
cbrs 1.0000 58,733.7 10.3 2.00 PASS
cbrs_no_dual 1.0000 63,280.6 10.6 2.00 PASS
High Dim (Top 1) dim=512 n=200k q=400 topk=1 shards=4
Mode Recall@1 P99 (µs) Query QPS Shards Visited Verdict
fanout 0.1000 30,522.0 14.5 4.00 PASS
cbrs 0.1000 73,674.3 9.4 1.87 PASS
cbrs_no_dual 0.1000 68,675.5 9.6 1.87 PASS
Overlap Hard dim=256 n=120k q=700 topk=10 shards=4
Mode Recall@10 P99 (µs) Query QPS Shards Visited Verdict
fanout 1.0000 11,061.3 37.4 4.00 PASS
cbrs 1.0000 25,408.4 25.1 1.74 PASS
cbrs_no_dual 1.0000 25,199.8 25.1 1.74 PASS
Epoch Drift Hard dim=256 n=120k q=800 topk=10 shards=4
Mode Recall@10 P99 (µs) Query QPS Shards Visited Verdict
fanout 1.0000 17,004.5 31.5 4.00 PASS
cbrs 0.9703 25,772.1 28.1 1.00 PASS
cbrs_no_dual 0.5000 24,737.5 28.5 1.00 WARN
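The tables can be reduced to a few ratios that make the routing trade-off explicit. The sketch below copies the fanout and cbrs rows from the tables above (cbrs_no_dual omitted for brevity) and computes the share of shards CBR-S visited, its P99 relative to fanout, and its QPS relative to fanout; the `RESULTS` layout and `compare` helper are hypothetical.

```python
# (P99 µs, QPS, shards visited) copied from the tables above; fanout vs cbrs only.
RESULTS = {
    "Small Uniform":    (4, (3740.5, 103.8, 4.00), (8390.8, 71.1, 1.06)),
    "Medium Clustered": (4, (13579.1, 30.2, 4.00), (22433.8, 25.3, 1.70)),
    "Large Clustered":  (8, (29113.1, 11.1, 8.00), (58733.7, 10.3, 2.00)),
    "High Dim (Top 1)": (4, (30522.0, 14.5, 4.00), (73674.3, 9.4, 1.87)),
    "Overlap Hard":     (4, (11061.3, 37.4, 4.00), (25408.4, 25.1, 1.74)),
    "Epoch Drift Hard": (4, (17004.5, 31.5, 4.00), (25772.1, 28.1, 1.00)),
}

def compare(shards, fanout, cbrs):
    """Ratios of CBR-S to the fanout baseline for one suite."""
    (f_p99, f_qps, _), (c_p99, c_qps, c_visited) = fanout, cbrs
    return {
        "shard_fraction": c_visited / shards,  # share of shards CBR-S touched
        "p99_ratio": c_p99 / f_p99,            # > 1 means a slower tail than fanout
        "qps_ratio": c_qps / f_qps,            # < 1 means lower throughput than fanout
    }

for suite, (shards, fanout, cbrs) in RESULTS.items():
    c = compare(shards, fanout, cbrs)
    print(f"{suite:<17} shards {c['shard_fraction']:.0%}  P99 x{c['p99_ratio']:.2f}  QPS x{c['qps_ratio']:.2f}")
```

For this run, the arithmetic shows CBR-S visiting about 25% to 47% of shards, with P99 about 1.5x to 2.4x that of fanout and lower QPS in every suite.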

Interpretation Guide

Good signs

  • P99 stays close to P50 (no tail explosions).
  • QPS increases with thread count until the expected saturation point.
  • Recall@10 remains stable across dataset modes and routing epochs.
  • routed_shards_avg (reported as Shards Visited above) falls well below the shard count without recall loss.

Red flags

  • P99 ≫ P50 (10x or more): tail variance, often caused by contention or IO.
  • QPS decreases as threads are added: thrashing or lock contention.
  • Recall drops under overlap or epoch drift: routing or snapshot-semantics issues.
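The red flags above can be turned into simple automated checks. This is a minimal sketch: the 10x rule comes from the guide, while the function names and the 0.05 recall tolerance are hypothetical choices for illustration, not thresholds PomaiDB defines.

```python
def tail_flag(p50_us, p99_us, limit=10.0):
    """Red flag: P99 is `limit` times P50 or worse (the 10x+ rule)."""
    return p99_us >= limit * p50_us

def thread_scaling_flag(qps_by_threads):
    """Red flag: QPS falls at any step where the thread count rises."""
    points = [qps for _, qps in sorted(qps_by_threads.items())]
    return any(later < earlier for earlier, later in zip(points, points[1:]))

def recall_drop_flag(baseline_recall, stressed_recall, tolerance=0.05):
    """Red flag: recall under overlap/epoch drift falls more than `tolerance` below baseline.

    The 0.05 default is a hypothetical choice, not a PomaiDB threshold.
    """
    return stressed_recall < baseline_recall - tolerance

# Epoch Drift Hard Recall@10 from the live results: fanout 1.0000, cbrs 0.9703, cbrs_no_dual 0.5000.
print(recall_drop_flag(1.0, 0.9703))  # False
print(recall_drop_flag(1.0, 0.5))     # True
```

With this tolerance, the checks agree with the table's verdicts for the epoch-drift suite: cbrs passes and cbrs_no_dual is flagged.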