CBR-S: Content-Based Routing Strategy in PomaiDB
This note formalizes Global-Centroid Shard Assignment, Routed Query Execution, and the Dual-Epoch Probe mechanism used to preserve recall during routing-table transitions.
0) Introduction
The Content-Based Routing - Sharded (CBR-S) algorithm, as conceptualized for systems like PomaiDB, is a systems-level strategy for managing and querying large-scale vector datasets in distributed environments. It targets three persistent challenges in modern vector search infrastructure: scalability (controlling shard fanout as the corpus grows), efficiency (minimizing end-to-end query latency and tail behavior), and dynamic handling (remaining robust under routing-table and topology transitions).
CBR-S builds on the general idea of content-based routing, where data characteristics directly influence placement and retrieval decisions in a distributed architecture. This perspective is widely explored in content-centric / information-centric networking and related routing designs, where naming and content locality can be leveraged to optimize retrieval paths and reduce unnecessary network-wide fanout. In particular, probe-based routing mechanisms for content-centric networks motivate the use of targeted probing to improve hit probability under uncertainty. [1]
A cornerstone of CBR-S is its global-centroid shard assignment. Using clustering methods analogous to \(k\)-means, the system maintains a global set of centroids that partition the vector space by similarity. Each incoming vector is assigned to a shard based on proximity to these centroids, promoting co-location of similar vectors and enabling query-time routing to a small subset of shards. This follows the classic objective of minimizing within-cluster squared error, making centroids optimal representatives (in the least-squares sense) of their partitions. Update-efficient content-based indexing designs in cloud environments further motivate incremental maintenance strategies for such partitions. [2]
The second component, routed query execution, avoids full fanout by probing only shards associated with the most relevant centroids. This is conceptually aligned with coarse-to-fine candidate selection patterns common in approximate nearest neighbor (ANN) systems (e.g., partition probing as in IVF). A practical refinement is probe expansion via a margin heuristic: when a query lies near a Voronoi boundary between top centroids, the algorithm expands the probe set to mitigate routing ambiguity and improve recall.
Finally, CBR-S introduces a safety-critical mechanism, the dual-epoch probe, to preserve high recall during routing-table transitions (e.g., re-sharding, centroid rebuilds). The system retains both the current routing table and an optional previous one, and unions their routed shard sets for query execution. This prevents recall collapse when data placed under an older epoch remains physically resident on shards that are not selected by the new routing configuration, aligning with fault-tolerant design principles for mutable distributed databases.
For point operations such as Get, Exists, and Delete, PomaiDB prioritizes absolute correctness over routing efficiency by performing shard fanout (unless a separate directory mapping is maintained). This acknowledges a fundamental trade-off: content-based placement can shift over time as centroids evolve, so correctness requires either (i) a durable mapping from key \( \to \) shard, or (ii) exhaustive search for point lookups.
Note on novelty: while individual ingredients (content-based routing, clustering, KMeans++ seeding, incremental updates, probe-based querying) are established in the literature, CBR-S emphasizes their integration as an operational contract for a sharded, mutable vector database, with dual-epoch probing as a systems-level mechanism to protect recall during topology change.
0.5) Related works
The design space behind a system like CBR-S spans multiple research threads: distributed approximate nearest neighbor (ANN) search, content-centric networking and routing, dynamic indexing and update management, and hardware-aware acceleration. This section synthesizes representative directions that inform CBR-S-style architectures.
Distributed Approximate Nearest Neighbor (ANN) search
The core problem CBR-S addresses is efficient ANN search over massive, distributed datasets. In-memory ANN structures (e.g., HNSW-family designs) can achieve excellent latency/recall on a single host, but they face practical challenges once datasets exceed single-node memory limits. One line of recent work studies how to push vector search onto disaggregated memory, leveraging remote memory (e.g., RDMA) while keeping index traversal efficient and communication overhead low. [6]
A second line focuses on partitioning and sharding as a first-class primitive: decomposing large ANN problems into smaller sub-problems by partitioning input points into neighborhood-preserving shards, and routing queries to only a subset of shards to reduce fanout and latency. Recent work on graph partitioning for large-scale nearest neighbor search explores how partition structure affects recall/efficiency trade-offs. [4] CBR-S fits in this family by using global centroids as a routing surface: vectors are assigned to shards by proximity to global centroids, and queries probe the most relevant centroid-owned shards.
Beyond graph-based and centroid-based methods, factorization-based ANN approaches (e.g., low-rank approximations for nearest neighbor search) demonstrate alternative approximation regimes that can be complementary to routing: they reduce computation and/or memory footprint by exploiting structure in the embedding space. [5]
When the corpus cannot fit in memory, disk-based and log-structured vector search systems become important. LSM-style designs aim to maintain high ingest/update throughput while supporting fast approximate search over multi-level structures, trading extra read amplification for update efficiency and operational robustness. [10]
Content-centric networking and routing
CBR-S’s content-based routing perspective connects naturally to Information-Centric Networking (ICN) / Content-Centric Networking (CCN), where retrieval is driven by content identity and locality rather than host addressing. Probe-based routing in content-centric networks motivates targeted probing policies that improve hit probability under uncertainty and partial knowledge—an idea that resonates with centroid-probing and probe-expansion heuristics in routed ANN. [1]
Dynamic updates and index management
A major systems challenge at scale is sustaining low-latency ANN search under mutable data: frequent inserts, updates, and deletions. Incremental in-place update techniques aim to reduce or avoid full index rebuilds, enabling billion-scale indices to evolve with production workloads. [9] Adaptive indexing systems further study how to maintain recall and latency under skewed, shifting query and data distributions, often by adjusting partitions and memory placement as the workload changes. [7] For centroid-based routing, the operational analogue is online centroid maintenance (e.g., incremental means) so routing surfaces track drift without expensive recomputation. Update-efficient content-based indexing in cloud settings provides additional motivation for incremental strategies that are parallel-friendly and minimize global coordination. [2]
CBR-S’s dual-epoch probe can be viewed as a topology-transition safety mechanism: during routing-table transitions (re-sharding or rebuilds), it queries the union of shard sets derived from both the current and previous epochs to reduce recall loss when physical placement lags logical routing. This is a systems-level response to the “mutable routing surface” problem in distributed vector databases.
Hardware constraints and acceleration
At billion-scale, performance is frequently bounded by memory bandwidth and data movement rather than raw compute. Hardware-aware work explores multiple paths: GPU-oriented vector data management for sophisticated filtered queries, [8] and emerging processing-in-memory (PIM) approaches that attempt to mitigate bandwidth bottlenecks by pushing parts of ANN computation closer to memory. [3] These directions inform practical engineering choices in systems like PomaiDB, where routing and candidate selection are often designed to reduce memory traffic and preserve tail latency under realistic hardware limits.
Takeaway: CBR-S is best understood as a synthesis—routing surfaces (global centroids) to control fanout, adaptive probing to stabilize recall, epoch-aware safety to survive topology changes, and hardware-aware implementation to reduce data movement.
1) Problem setting and notation
Let the system have \(S \in \mathbb{N}\) shards, indexed by \( \{0,\dots,S-1\} \). Vectors lie in \( \mathbb{R}^d \) and the database stores items \(x \in \mathbb{R}^d\). A routing table (an epoch) consists of:
- \(k\) global centroids \( \mu_0,\dots,\mu_{k-1} \in \mathbb{R}^d\)
- an ownership map \( \pi: \{0,\dots,k-1\} \rightarrow \{0,\dots,S-1\}\)
- per-centroid counts \( n_g \in \mathbb{N}\)
- an epoch identifier \( e \in \mathbb{N}\)
Routing uses squared Euclidean distance: \[ D(x,\mu_g) = \|x-\mu_g\|_2^2 = \sum_{i=1}^{d}(x_i-\mu_{g,i})^2. \]
2) Global-Centroid shard assignment
2.1 Centroid ownership
CBR-S assigns each centroid to a shard by a deterministic modulo rule: \[ \pi(g) = g \bmod S. \] This gives stable ownership and roughly uniform distribution of centroids across shards.
2.2 Content-based ingest routing
For an incoming vector \(x\), select its nearest centroid: \[ g^*(x)=\arg\min_{g \in \{0,\dots,k-1\}} D(x,\mu_g), \] and route to shard: \[ \mathrm{shard}(x) = \pi(g^*(x)). \] Intuition: semantically similar vectors tend to fall into nearby Voronoi cells, hence tend to co-locate (modulo centroid ownership).
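The ingest rule above can be made concrete in a few lines. This is a minimal NumPy sketch under the modulo-ownership rule \(\pi(g)=g\bmod S\); the function name and signature are illustrative, not PomaiDB's actual API:

```python
import numpy as np

def route_vector(x, centroids, num_shards):
    """Route a vector to the shard owning its nearest centroid.

    `centroids` is a (k, d) array of global centroids; ownership follows
    the deterministic modulo rule pi(g) = g mod S from the text.
    """
    dists = np.sum((centroids - x) ** 2, axis=1)  # squared Euclidean D(x, mu_g)
    g_star = int(np.argmin(dists))                # nearest centroid g*(x)
    return g_star % num_shards                    # shard(x) = pi(g*(x))
```

Because routing depends only on the centroid set and \(S\), any node holding the routing table can compute the same placement without coordination.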
3) Warmup → READY lifecycle (centroid construction)
The router proceeds through modes: DISABLED (fallback), WARMUP (collect samples), then READY (routing table available).
3.1 Warmup reservoir size
Let \(k_r\) be the routing centroid count (often \(k_r = 2S\)) and let \(m\) be a warmup multiplier. The warmup sample reservoir size is: \[ M = k_r \cdot m. \] The system collects (up to) the first \(M\) vectors before building centroids.
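A minimal sketch of the lifecycle and reservoir sizing; the names `RouterMode` and `reservoir_size` are hypothetical, chosen here for illustration rather than taken from PomaiDB:

```python
from enum import Enum

class RouterMode(Enum):
    """Router lifecycle: DISABLED (fallback, full fanout),
    WARMUP (collect samples), READY (routing table available)."""
    DISABLED = 0
    WARMUP = 1
    READY = 2

def reservoir_size(num_shards, warmup_multiplier, centroids_per_shard=2):
    """Warmup reservoir size M = k_r * m, with k_r = 2S as the common default."""
    k_r = centroids_per_shard * num_shards
    return k_r * warmup_multiplier
```

For example, with \(S=8\) shards and \(m=100\), the reservoir holds \(M = 16 \cdot 100 = 1600\) vectors before the centroid build runs.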
3.2 Deterministic kmeans-lite (KMeans++ seeding + Lloyd)
Given warmup samples \( \{x_1,\dots,x_M\} \), centroids are constructed by:
- KMeans++-style seeding (deterministic seed)
- A fixed number of Lloyd iterations
Assignment step:
\[ a(i)=\arg\min_{g} D(x_i,\mu_g) \]
Update step:
\[ \mu_g \leftarrow \frac{1}{|\{i: a(i)=g\}|}\sum_{i: a(i)=g} x_i \]
Counts are set by \(n_g \leftarrow |\{i: a(i)=g\}|\), and ownership by \(\pi(g)=g\bmod S\). The resulting routing table becomes epoch \(e\) and is persisted.
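The warmup build can be sketched as deterministic KMeans++-style seeding followed by a fixed number of Lloyd iterations. This is an illustrative NumPy sketch, not the production implementation:

```python
import numpy as np

def kmeans_lite(samples, k, iters=10, seed=0):
    """Deterministic KMeans++ seeding + fixed Lloyd iterations.
    Returns (centroids, counts) for the routing table."""
    rng = np.random.default_rng(seed)            # deterministic seed
    # KMeans++-style seeding: sample proportional to squared distance
    centroids = [samples[rng.integers(len(samples))]]
    for _ in range(1, k):
        d2 = np.min([np.sum((samples - c) ** 2, axis=1) for c in centroids], axis=0)
        total = d2.sum()
        probs = d2 / total if total > 0 else np.full(len(samples), 1.0 / len(samples))
        centroids.append(samples[rng.choice(len(samples), p=probs)])
    centroids = np.array(centroids, dtype=float)
    # Fixed number of Lloyd iterations: assign, then recompute means
    for _ in range(iters):
        assign = np.argmin(
            ((samples[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2), axis=1)
        for g in range(k):
            members = samples[assign == g]
            if len(members) > 0:                 # leave empty clusters unchanged
                centroids[g] = members.mean(axis=0)
    # Final assignment gives the per-centroid counts n_g
    assign = np.argmin(
        ((samples[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2), axis=1)
    counts = np.bincount(assign, minlength=k)
    return centroids, counts
```

The fixed seed and iteration count make the build reproducible, which matters when multiple nodes must agree on the same epoch.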
4) Online centroid maintenance during ingest
After READY, centroids are updated online on each routed insert. For a new vector \(x\), let \(g^*=\arg\min_g D(x,\mu_g)\). Then update:
- Count increment: \(\; n_{g^*} \leftarrow n_{g^*} + 1\)
- Incremental mean: \[ \mu_{g^*} \leftarrow \mu_{g^*} + \frac{1}{n_{g^*}}(x - \mu_{g^*}) \]
This maintains \(\mu_{g^*}\) as the empirical mean of all assigned vectors over time.
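The incremental mean update is a two-line operation per insert. A minimal sketch (function name illustrative), mutating the routing table's arrays in place:

```python
import numpy as np

def online_update(x, centroids, counts):
    """Route x to its nearest centroid and fold it into the running mean:
    n_g <- n_g + 1;  mu_g <- mu_g + (x - mu_g) / n_g."""
    g = int(np.argmin(np.sum((centroids - x) ** 2, axis=1)))
    counts[g] += 1
    centroids[g] += (x - centroids[g]) / counts[g]
    return g
```

This is the standard streaming-mean recurrence, so \(\mu_g\) stays exactly the empirical mean of all vectors ever assigned to \(g\) without storing them.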
5) Routed query execution (fanout reduction)
Instead of searching all shards, a query \(q\) computes the top-\(p\) nearest centroids and searches only the shards owning those centroids.
5.1 Probe count
Let \(p\) be the configured probe count (clamped): \[ p \leftarrow \max(1,\min(p,k)). \]
5.2 Boundary (margin) heuristic
Let \(g_1,g_2\) be the closest two centroids with distances \(d_1=D(q,\mu_{g_1})\) and \(d_2=D(q,\mu_{g_2})\). If the margin is small: \[ (d_2 - d_1) < \varepsilon, \] with \(\varepsilon\) a small constant (e.g., \(0.05\)), then expand probes: \[ p \leftarrow \max(p,3)\quad \text{(if } k \ge 3 \text{)}. \] Intuition: near Voronoi boundaries, routing uncertainty increases.
5.3 Shard set from top-\(p\) centroids
Let \(G_p(q)\) be the set of the \(p\) nearest centroids: \[ G_p(q) = \mathrm{TopP}_g(D(q,\mu_g)). \] The routed shard set is: \[ \mathcal{S}_{cur}(q) = \{\pi(g): g \in G_p(q)\}. \] Duplicate shard IDs are removed before executing shard-local searches and merging top-\(K\).
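Steps 5.1-5.3 compose into one routine: clamp the probe count, apply the margin heuristic, and map the top-\(p\) centroids through \(\pi\) with duplicates removed. A minimal NumPy sketch with an illustrative name:

```python
import numpy as np

def routed_shards(q, centroids, num_shards, probe=1, eps=0.05):
    """Select the routed shard set for query q."""
    k = len(centroids)
    d = np.sum((centroids - q) ** 2, axis=1)     # D(q, mu_g) for all g
    p = max(1, min(probe, k))                    # clamp probe count
    order = np.argsort(d)                        # centroids by increasing distance
    if k >= 3 and d[order[1]] - d[order[0]] < eps:
        p = max(p, 3)                            # expand near a Voronoi boundary
    return {int(g) % num_shards for g in order[:p]}  # pi(g), deduped via set
```

Returning a set makes the dedup explicit: two nearby centroids owned by the same shard cost only one shard-local search.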
6) Dual-Epoch probe (topology transition safety)
Routing tables can change (e.g., rebuilds, topology transitions). During these windows, vectors may still be physically located according to the previous epoch’s routing. If routing uses only the current epoch, recall can drop.
6.1 Two routing tables in memory
Maintain current table \(T^{(e)}\) and (optionally) previous table \(T^{(e-1)}\). For a query \(q\), compute routed shard sets in both epochs:
Current epoch:
\[ \mathcal{S}_{cur}(q) = \{\pi^{(e)}(g): g \in G^{(e)}_p(q)\}. \]
Previous epoch:
\[ \mathcal{S}_{prev}(q) = \{\pi^{(e-1)}(g): g \in G^{(e-1)}_p(q)\}. \]
Union for execution:
\[ \mathcal{S}(q) = \mathcal{S}_{cur}(q)\ \cup\ \mathcal{S}_{prev}(q). \]
This increases fanout only during transitions while guarding recall against stale physical placement.
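The union is mechanically simple. A sketch assuming a hypothetical routing-table interface (each table exposes a `routed_shards(q) -> set` method, as in the query-routing sketch above):

```python
def dual_epoch_shards(q, current_table, previous_table=None):
    """Union routed shard sets across the current and (optional)
    previous epoch. Outside transition windows, previous_table is
    None and the cost is identical to current-only routing."""
    shards = set(current_table.routed_shards(q))
    if previous_table is not None:
        shards |= previous_table.routed_shards(q)
    return shards
```

Once migration completes and the previous table is dropped, fanout automatically reverts to \(\mathcal{S}_{cur}(q)\) with no further coordination.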
6.2 Recall intuition
Let \(X\) denote the true nearest neighbors of \(q\). A miss occurs if the shard containing some \(x\in X\) is not in \(\mathcal{S}(q)\). Dual-epoch routing is never worse than current-only routing because: \[ \mathcal{S}_{cur}(q) \subseteq \mathcal{S}(q). \] It is strictly better when relevant items are still reachable only under \(T^{(e-1)}\).
7) Point operations correctness note
If ingest is routed by content, shard location is not derivable from ID alone. To preserve correctness for point lookups (Get/Exists/Delete), the system may fanout across shards unless it maintains a separate directory mapping (vector → shard).
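The two correctness options can be sketched together. Here `directory` is a hypothetical key-to-shard map and each shard is modeled as a dict-like store; the real system's interfaces will differ:

```python
def point_lookup(key, directory, shards):
    """Point Get: use a durable key -> shard directory when present,
    otherwise fan out across all shards (correctness over efficiency)."""
    if key in directory:
        return shards[directory[key]].get(key)   # (i) routed O(1) lookup
    for shard in shards:                          # (ii) exhaustive fanout
        if key in shard:
            return shard[key]
    return None
```

The directory trades memory and write-path bookkeeping for point-op latency; the fanout path needs no extra state but touches every shard.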
8) Future work and extensions
While CBR-S already combines content-based routing, centroid-driven sharding, and epoch-aware safety, several directions offer promising opportunities to further improve scalability, adaptability, and robustness. These extensions draw on recent advances in vector search, distributed systems, and machine learning.
8.1 Adaptive global-centroid shard assignment
The current design initializes centroids using KMeans++-style seeding followed by Lloyd iterations, and maintains them online via incremental mean updates. Although this approach is simple and stable, it assumes relatively smooth distributional drift. Future work could explore continuously adaptive clustering techniques that respond more aggressively to non-stationary data streams.
One direction is to incorporate online clustering methods that explicitly model concept drift, allowing centroid positions (and potentially the number of centroids) to adapt without explicit reinitialization. Adaptive indexing systems such as Quake demonstrate how hierarchical partitioning can be reshaped under skewed, evolving workloads. Similar ideas could be applied to routing surfaces, enabling centroid hierarchies that expand, contract, or split as the data distribution evolves.
Beyond k-means, density-aware or adaptive variants (e.g., high-dimensional adaptations of DBSCAN or adaptive k-means) could allow CBR-S to represent non-spherical or unevenly populated regions of the vector space more faithfully, reducing routing error for complex embeddings.
8.2 Learning-assisted routed query execution
Routed query execution currently relies on a margin heuristic to expand probes when a query lies near a Voronoi boundary. This heuristic could be generalized using data-driven models that predict routing uncertainty directly from query features and historical outcomes.
For example, a lightweight reinforcement learning agent could dynamically tune the probe count (routing_probe) to optimize the recall–latency trade-off under different workload conditions. Query-conditioned routing policies could also prioritize shards based on learned relevance scores rather than purely geometric proximity.
Another extension is tighter integration with filtered or attribute-aware search. GPU-oriented systems such as VecFlow show that combining vector similarity with predicate filtering can unlock richer query semantics. Incorporating such filters into the routing layer would allow CBR-S to prune shards that are unlikely to satisfy both vector and attribute constraints.
8.3 Beyond dual-epoch probing
The dual-epoch probe ensures recall safety during routing-table transitions by querying both current and previous epochs. While effective, this approach doubles routing state in the worst case. Future designs could explore delta-based epoch management, where only the differences between successive routing tables are retained.
More generally, predictive or proactive transition mechanisms could precompute shard assignments for upcoming epochs, reducing query overhead during re-sharding. Lightweight consistency protocols might also offer recall guarantees without explicitly maintaining two full routing tables.
8.4 Hardware-aware and heterogeneous deployments
At billion-scale, vector search is often limited by memory bandwidth rather than compute. Emerging hardware trends—such as processing-in-memory (PIM), heterogeneous memory hierarchies, and disaggregated memory accessed via RDMA—open new opportunities for routing-aware optimization.
Systems like MemANNS and d-HNSW demonstrate that pushing parts of ANN computation closer to memory can substantially reduce data movement. For CBR-S, tighter coupling between routing decisions and physical data placement across heterogeneous tiers (DRAM, persistent memory, SSD, remote memory) could further reduce tail latency and improve energy efficiency.
8.5 Robustness to out-of-distribution queries
Modern AI workloads frequently encounter out-of-distribution queries or abrupt shifts in embedding geometry. Future work could integrate anomaly detection into the routing layer, identifying queries that fall far from known centroid regions and triggering broader probing or fallback strategies.
Such mechanisms would allow CBR-S to degrade gracefully under distributional shock, prioritizing correctness and recall when assumptions about locality temporarily break down.
Overall, these directions highlight how CBR-S can evolve from a static routing strategy into a more adaptive, learning-informed, and hardware-aware substrate for large-scale vector databases.
9) Summary
CBR-S uses global centroids and deterministic ownership \(\pi(g)=g\bmod S\). Ingest routes each vector to the shard owning its nearest centroid and updates the centroid online. Queries route to shards owning the top-\(p\) closest centroids, with a boundary heuristic to expand probes. During topology transitions, Dual-Epoch Probe unions shard sets across current and previous epochs to protect recall.
References
[1] Tsai, P.-H., Zhang, J., & Tsai, M.-H. (2021). An Efficient Probe-based Routing for Content-Centric Networking. arXiv. DOI: 10.48550/ARXIV.2109.15088
[2] Zhu, N., Lu, Y., He, W., Yu, H., & Ge, J. (2018). Towards Update-Efficient and Parallel-Friendly Content-Based Indexing Scheme in Cloud Computing. International Journal of Semantic Computing, 12(02), 191–213. DOI: 10.1142/S1793351X1840010X
[3] Chen, S., Zhou, A. C., Shi, Y., Li, Y., & Yao, X. (2024). UpANNS: Enhancing Billion-Scale ANNS Efficiency with Real-World PIM Architecture (v2). arXiv. DOI: 10.48550/ARXIV.2410.23805
[4] Gottesbüren, L., Dhulipala, L., Jayaram, R., & Lacki, J. (2024). Unleashing Graph Partitioning for Large-Scale Nearest Neighbor Search (v1). arXiv. DOI: 10.48550/ARXIV.2403.01797
[5] Jääsaari, E., Hyvönen, V., & Roos, T. (2024). LoRANN: Low-Rank Matrix Factorization for Approximate Nearest Neighbor Search (v1). arXiv. DOI: 10.48550/ARXIV.2410.18926
[6] Liu, Y., Fang, F., & Qian, C. (2025). Efficient Vector Search on Disaggregated Memory with d-HNSW (v1). arXiv. DOI: 10.48550/ARXIV.2505.11783
[7] Mohoney, J., Sarda, D., Tang, M., Chowdhury, S. R., Pacaci, A., Ilyas, I. F., Rekatsinas, T., & Venkataraman, S. (2025). Quake: Adaptive Indexing for Vector Search (v2). arXiv. DOI: 10.48550/ARXIV.2506.03437
[8] Xi, J., Mo, C., Karsin, B., Chirkin, A., Li, M., & Zhang, M. (2025). VecFlow: A High-Performance Vector Data Management System for Filtered-Search on GPUs (v1). arXiv. DOI: 10.48550/ARXIV.2506.00812
[9] Xu, Y., Liang, H., Li, J., Xu, S., Chen, Q., Zhang, Q., Li, C., Yang, Z., Yang, F., Yang, Y., Cheng, P., & Yang, M. (2024). SPFresh: Incremental In-Place Update for Billion-Scale Vector Search. arXiv. DOI: 10.48550/ARXIV.2410.14452
[10] Zhong, S., Mo, D., & Luo, S. (2025). LSM-VEC: A Large-Scale Disk-Based System for Dynamic Vector Search (v1). arXiv. DOI: 10.48550/ARXIV.2505.17152
Citation
If you found this survey useful, please cite as:
@article{Van2026CBRS,
title = {CBR-S: Content-Based Routing Strategy in PomaiDB for Scalable Vector Search},
author = {Quan Van},
journal = {Pomai Research},
year = {2026},
month = {Feb},
url = {https://pomaidb-web.vercel.app/research/cbr-s.paper-style.html}
}
APA Format:
Van, Q. (2026). CBR-S: Content-Based Routing Strategy in PomaiDB for Scalable Vector Search. Pomai Research.