Scope, Definitions, and Survey Method
Goals and Scope (Part A — survey report)
This report systematically surveys the field of vector databases and vector systems, including similarity search, ANN/ANNS (approximate nearest neighbor search), and embedding stores. The scope covers: (i) theoretical foundations of k-nearest neighbor (kNN) and ANNS, (ii) index structures and search algorithms (exact vs approximate; hashing, trees, quantization, graphs, disk-based indexes), (iii) scalability and distributed/cloud-managed architectures, (iv) durability, consistency, and dynamic/real-time updates, (v) integration with ML/LLM pipelines (hybrid search, RAG, multimodal retrieval), and (vi) safety–privacy–fairness and legal/ethical risks. [47] [48] [35]
Working Definitions and Concept Boundaries
A vector system is any software system that supports: (1) storing/reading vectors (embeddings), and (2) executing kNN/ANNS queries under a distance metric or similarity (L2, cosine, inner product/dot product, etc.), typically with index optimizations to avoid full linear scans. [37] [30]
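As a concrete baseline for this definition, the query side of a vector system can be sketched as brute-force kNN under the three common measures (a minimal illustration in numpy; any real system replaces this linear scan with an index):

```python
import numpy as np

def knn(query: np.ndarray, vectors: np.ndarray, k: int = 3, metric: str = "l2") -> np.ndarray:
    """Exact (brute-force) kNN over a matrix of vectors: the linear scan
    every index structure tries to avoid."""
    if metric == "l2":
        scores = np.linalg.norm(vectors - query, axis=1)   # smaller is better
        return np.argsort(scores)[:k]
    if metric == "ip":
        scores = vectors @ query                           # larger is better
        return np.argsort(-scores)[:k]
    if metric == "cosine":
        q = query / np.linalg.norm(query)
        v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
        return np.argsort(-(v @ q))[:k]
    raise ValueError(f"unknown metric: {metric}")

rng = np.random.default_rng(0)
base = rng.standard_normal((1000, 64)).astype(np.float32)
q = rng.standard_normal(64).astype(np.float32)
print(knn(q, base, k=5, metric="cosine"))
```

The `knn` helper and its signature are illustrative, not taken from any of the surveyed systems.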
A vector database (vector DB / VDBMS) is broader than a vector index library: beyond ANN indexes, it provides DBMS-grade data management such as CRUD/upsert/delete, metadata filtering, partitions/namespaces, access control, horizontal scaling, durability (persistence), and distributed operations (replication, fault tolerance). This distinction aligns with recent VDBMS surveys and vendor technical documentation. [47] [12]
Survey Methodology and Selection Criteria
We use a “systematic narrative survey” approach with a default time window of 2000–2025, covering the rapid evolution of ANNS from algorithms (LSH, PQ, HNSW, DiskANN) to systems integration (VDBMS, vector-enabled databases, serverless operation, filtered vector search). Earlier foundations (k-d tree, metric trees/vp-tree) are included as historical anchors. [1] [3] [4] [48]
- Primary sources: arXiv, ACL Anthology, NeurIPS proceedings, PVLDB, official project documentation (GitHub/Docs), and benchmark sites (ANN-Benchmarks, BigANN). [10] [11] [12]
- Query keywords (examples): “vector database management system”, “approximate nearest neighbor search”, “HNSW”, “product quantization”, “filtered vector search”, “disk-based ANN”, “hybrid retrieval”, “serverless vector database”, “metadata filtering”, “RAG retrieval augmented generation”, “multimodal retrieval CLIP”. [35] [37] [48]
- Selection criteria: prioritize primary/seminal papers and official system docs; for commercial systems, prefer official whitepapers, vendor research papers, or peer-reviewed publications by the providers when available. [25]
- Limitations and transparency: some digital libraries may restrict access; when necessary, we use preprints/mirrors. If a trustworthy, primary benchmark number is unavailable, we mark it as “unknown” and explain why. [11]
Theoretical Foundations and Index Taxonomy
Problem Definition and Similarity Metrics
For a query vector q, kNN/ANNS typically returns the top-k points x that minimize a distance d(q, x) (e.g., L2) or maximize a similarity (dot/inner product). Cosine similarity is commonly reduced to dot product after L2-normalization. [30] [7]
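The normalization identity mentioned above can be checked numerically: after L2-normalizing both vectors, cosine similarity and dot product coincide, which is why inner-product indexes can serve cosine queries.

```python
import numpy as np

# Cosine similarity equals dot product on the unit sphere.
rng = np.random.default_rng(1)
x, q = rng.standard_normal(128), rng.standard_normal(128)

cos = (x @ q) / (np.linalg.norm(x) * np.linalg.norm(q))
xn, qn = x / np.linalg.norm(x), q / np.linalg.norm(q)
assert np.isclose(cos, xn @ qn)
```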
From Exact NN to Approximate NN (ANN/ANNS)
In high dimensions and at large scale, exact NN often becomes impractical, motivating ANNS that trades accuracy for speed and memory. A major theoretical branch uses locality-sensitive hashing (LSH) with approximation guarantees under certain assumptions. [4] [5]
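To make the hashing idea concrete, here is a toy random-hyperplane (angular) LSH signature — a simpler relative of the p-stable schemes cited, not their implementation. Nearby vectors (small angle) are likely to agree on most sign bits, so hash buckets concentrate candidates:

```python
import numpy as np

def simhash_signature(v, planes):
    """Sign pattern of projections onto random hyperplanes: vectors at a
    small angle are likely to share most sign bits."""
    return tuple((planes @ v > 0).astype(int))

rng = np.random.default_rng(2)
planes = rng.standard_normal((16, 64))       # 16 hash bits for 64-d vectors

v = rng.standard_normal(64)
near = v + 0.05 * rng.standard_normal(64)    # small perturbation of v
far = rng.standard_normal(64)                # unrelated vector

sig = lambda u: np.array(simhash_signature(u, planes))
print("bits shared with near:", (sig(v) == sig(near)).sum())
print("bits shared with far: ", (sig(v) == sig(far)).sum())
```

In a real LSH index many such tables are maintained and only colliding buckets are scanned, which is exactly the memory cost noted in the taxonomy below.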
Historical Milestones (Selected)
- Trees / partitioning: k-d tree (axis-aligned splits), ball tree (hyperspheres), vp-tree/metric trees for general metric spaces. [1] [2] [3]
- LSH: foundational theory for approximate NN, including p-stable LSH for l_p norms (notably l_2). [4] [5]
- Quantization: Product Quantization (PQ) as a key milestone for compressing vectors and accelerating approximate distance computation. [6]
- Graph-based: HNSW as a widely deployed multi-layer proximity graph index due to strong recall–latency behavior and incremental inserts. [7]
- Disk-based ANN: DiskANN demonstrates high accuracy at low latency and high throughput at billion scale on SSDs. [14]
Algorithm / Index Taxonomy by Core Mechanism
- Exact scan / exact indexes (baseline): linear scan; exact trees in low dimensions. [1]
- Hashing-based: LSH (including p-stable). Pros: theoretical guarantees in some models; cons: many tables, memory-heavy, can underperform on modern embedding distributions. [4][5]
- Tree-based / space partitioning: k-d tree, ball tree, vp-tree; often strong in low-to-mid dimensions or structured data. [1][2][3]
- Quantization-based (PQ / IVF-PQ / scalar quantization): short codes and lookup tables, good for memory compression and speed. [6][8]
- Graph-based (HNSW and variants): excellent empirical performance with tunable parameters (M, efConstruction, efSearch). [7]
- Disk / out-of-core: DiskANN/SSD-optimized graphs for billion-scale retrieval with modest RAM footprints. [14]
- Hybrid & filtered vector search: combine metadata filtering and/or lexical retrieval with vector search; a major current research bottleneck. [48]
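The quantization entry above can be illustrated with a toy PQ: split dimensions into m subspaces, run a small k-means per subspace, store one byte per subspace, and answer queries with asymmetric distance computation (ADC) via lookup tables. This is a sketch of the core mechanism only, not the FAISS IVF-PQ pipeline; all function names are ours.

```python
import numpy as np

def train_pq(X, m=4, ks=16, iters=10, seed=0):
    """Toy PQ training: a few Lloyd (k-means) steps per subspace."""
    rng = np.random.default_rng(seed)
    ds = X.shape[1] // m
    codebooks = []
    for j in range(m):
        sub = X[:, j*ds:(j+1)*ds]
        C = sub[rng.choice(len(sub), ks, replace=False)].copy()
        for _ in range(iters):
            assign = np.argmin(((sub[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
            for c in range(ks):
                if (assign == c).any():
                    C[c] = sub[assign == c].mean(0)
        codebooks.append(C)
    return codebooks

def encode(X, codebooks):
    """One centroid id (here: one byte) per subspace per vector."""
    ds = X.shape[1] // len(codebooks)
    return np.stack(
        [np.argmin(((X[:, j*ds:(j+1)*ds][:, None, :] - cb[None]) ** 2).sum(-1), axis=1)
         for j, cb in enumerate(codebooks)], axis=1).astype(np.uint8)

def adc_distances(q, codes, codebooks):
    """ADC: precompute query-to-centroid tables per subspace, then distances
    to all codes are just table sums (no decompression)."""
    ds = len(q) // len(codebooks)
    tables = [((cb - q[j*ds:(j+1)*ds]) ** 2).sum(-1) for j, cb in enumerate(codebooks)]
    return sum(tables[j][codes[:, j]] for j in range(len(codebooks)))

rng = np.random.default_rng(3)
X = rng.standard_normal((200, 16)).astype(np.float32)
cbs = train_pq(X)
codes = encode(X, cbs)           # 4 bytes per vector instead of 64
d = adc_distances(X[0], codes, cbs)
```

Note the compression ratio: 16 float32 dims (64 bytes) become m = 4 code bytes, at the cost of quantization error in the distances.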
System Architecture and Vector Data Management
From Index Libraries to Vector DBMS (VDBMS)
The ecosystem commonly falls into three layers: (i) libraries (algorithms + index APIs), (ii) engines/services (vector search service runtime), (iii) VDBMS or vector-enabled databases (durability, distribution, governance, security). Recent surveys emphasize a shift from “algorithm-centric” to “system-centric” driven by LLM/RAG demand. [47] [35]
Typical Cloud-Native VDBMS Architecture
- Storage/compute separation and componentization (ingest nodes, query nodes, index builders, metadata store, object storage/WAL) are common patterns for scaling and fault tolerance. [15] [17]
- Dynamic updates and reindexing: many systems use segment-based designs (growing vs sealed), asynchronous index building, and sometimes temporary indexes for “query immediately after ingest”. [15]
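The growing/sealed pattern can be sketched as follows (an assumed simplification, not any specific system's design): inserts land in a growing segment that is scanned directly, so data is queryable immediately, and full segments are sealed, where a real system would build an ANN index asynchronously (brute force stands in for the index here).

```python
import numpy as np

class SegmentedStore:
    """Toy segment-based vector store: growing segment = scan, sealed
    segments = where an ANN index would be built asynchronously."""

    def __init__(self, seal_threshold=4):
        self.threshold = seal_threshold
        self.growing = []          # list of (id, vector)
        self.sealed = []           # list of (ids array, vectors matrix)
        self.next_id = 0

    def insert(self, v):
        self.growing.append((self.next_id, np.asarray(v, np.float32)))
        self.next_id += 1
        if len(self.growing) >= self.threshold:   # seal; real systems index here
            ids, vecs = zip(*self.growing)
            self.sealed.append((np.array(ids), np.stack(vecs)))
            self.growing = []

    def search(self, q, k=3):
        segments = list(self.sealed)
        if self.growing:                          # fresh data is still searchable
            ids, vecs = zip(*self.growing)
            segments.append((np.array(ids), np.stack(vecs)))
        hits = []
        for ids, vecs in segments:                # per-segment candidates, merged
            d = np.linalg.norm(vecs - np.asarray(q, np.float32), axis=1)
            hits += list(zip(d, ids))
        return sorted(hits)[:k]
```

The query path merging per-segment top-k candidates is the part that survives into real designs; the synchronous sealing is the simplification.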
Examples (Academic–Industrial)
- Purpose-built VDBMS papers report dynamic updates and real-time queries with snapshot-like consistency, separating query execution, GPU acceleration, and storage, with distributed scaling. [15]
- Serverless commercial research highlights metadata filtering as a central algorithm+systems challenge in serverless vector databases. [25] [48]
- Vector search inside traditional search engines commonly “co-resides” with inverted indexes (BM25), enabling practical hybrid retrieval. [21] [22] [23]
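The metadata-filtering challenge mentioned above comes down to where the predicate is applied. A minimal pre-filtering sketch (ours, with brute-force search standing in for the index):

```python
import numpy as np

def prefilter_search(X, metadata, q, predicate, k):
    """Pre-filtering: apply the metadata predicate first, then search only
    surviving rows (a real system pushes the filter into the ANN index).
    Post-filtering instead searches first and discards non-matching hits,
    which can return fewer than k results when the filter is selective."""
    keep = np.fromiter((predicate(m) for m in metadata), dtype=bool, count=len(metadata))
    idx = np.flatnonzero(keep)
    d = np.linalg.norm(X[idx] - q, axis=1)
    return idx[np.argsort(d)[:k]]

rng = np.random.default_rng(4)
X = rng.standard_normal((100, 8)).astype(np.float32)
meta = [{"lang": "en" if i % 2 == 0 else "de"} for i in range(100)]
hits = prefilter_search(X, meta, X[1], lambda m: m["lang"] == "de", k=5)
```

Making this efficient inside a graph or quantized index — rather than on a brute-force scan — is precisely the open problem the cited work addresses. [48]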
Evaluation, Datasets, and Comparative Analysis
Core Metrics
- Recall@k / Recall@r: fraction of queries where true NN appears in top-k/top-r. [10]
- Latency (p50/p95/p99) and Throughput/QPS: online vs batch behavior. [12]
- Index build time and ingest/update cost: critical under streaming updates. [15]
- Memory footprint / index size / disk usage: operational cost and scaling. [14]
- Cost per query (managed/serverless): compute + storage + network; BigANN emphasizes cost/power normalization. [12]
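The recall metric above is easy to compute once ground-truth neighbors are available (e.g., from an exact scan); a minimal sketch:

```python
import numpy as np

def recall_at_k(retrieved, ground_truth, k):
    """Recall@k as defined above: fraction of queries whose true nearest
    neighbor appears among the top-k returned ids. (A common variant instead
    measures overlap between the returned top-k and the true top-k sets.)"""
    return float(np.mean([gt[0] in list(r[:k]) for r, gt in zip(retrieved, ground_truth)]))

retrieved = [[3, 1, 2], [7, 8, 9]]   # ANN results for two queries
truth = [[1], [4]]                   # true nearest neighbor per query
print(recall_at_k(retrieved, truth, k=3))
```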
Benchmark Methodology and Reproducibility
- ANN-Benchmarks standardizes quality–performance evaluation across many ANN libraries. [10]
- A reproducibility protocol recommends publishing full configs/hardware/versions to reduce bias from parameter/hardware differences. [11]
- BigANN targets billion-scale workloads with defined datasets, reference hardware, and QPS–recall measurement. [12]
Recommended Datasets (by common practice)
- BIGANN/SIFT1B: billion-scale SIFT descriptors used by BigANN. [12]
- SIFT1M: smaller-scale SIFT commonly used in ANN-Benchmarks. [10]
- Deep1B: billion-scale deep descriptors (CVPR 2016). [13]
- GloVe embeddings: standard word embeddings (ANN-Benchmarks includes glove-100-angular variants). [31][10]
- MS MARCO: QA/ranking dataset used in dense retrieval and RAG pipelines. [32][36]
- LAION-400M: large multimodal dataset (CLIP embeddings widely used for multimodal retrieval). [33][37]
- ImageNet features: a common vision benchmark foundation. [34]
Integration Patterns and Safety/Privacy
Hybrid Search (BM25 + Vectors)
BM25 remains a lexical retrieval baseline; dense retrieval can outperform BM25 on some tasks and is commonly used as the retriever in RAG. Sparse neural retrieval methods (e.g., SPLADE) offer another route toward hybrid retrieval (dense + sparse). [38] [36] [39]
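One common, score-free way to combine a BM25 result list with a dense result list is Reciprocal Rank Fusion (RRF); the sketch below is illustrative and not tied to any particular engine's hybrid mode:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over result lists of
    1 / (k + rank(d)). Rank-based, so BM25 and vector scores need no
    calibration against each other; k=60 is a conventional default."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["d3", "d1", "d7"]    # lexical top-3
dense_hits = ["d1", "d9", "d3"]   # vector top-3
print(rrf_fuse([bm25_hits, dense_hits]))
```

Documents appearing in both lists (d1, d3) are rewarded, which is the behavior hybrid retrieval is after.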
RAG (Retrieval-Augmented Generation)
RAG combines parametric memory (a generative model) with non-parametric memory (a dense vector index) to improve factuality and updatability. This makes vector databases a core infrastructure component for many GenAI applications, especially when provenance (evidence) matters. [35]
Multimodal Retrieval
CLIP provides a shared embedding space for text and images; LAION-400M releases data that supports large-scale multimodal similarity search. [37] [33]
Streaming Ingestion and Real-Time Updates
Real systems often separate index building from serving queries and use log/WAL-based ingestion to support recovery and shard-level consistency. [15] [17]
Security, Privacy, and Fairness
- Encryption and access control: production deployments typically require transport encryption (TLS), at-rest encryption (e.g., AES-256), and RBAC/API keys—important because embeddings may encode sensitive information. [26] [27] [29]
- Privacy inference risks: membership inference and model inversion show that models/outputs can leak information about training data. In vector DB contexts, embeddings can also leak membership or sensitive attributes; access control and leakage mitigation should be considered. [42] [43]
- Differential Privacy (DP): a foundational framework for limiting individual data leakage; DP-SGD adapts DP to deep learning training. [40] [41]
- Bias/fairness in embeddings: work on word embedding bias shows embeddings can reflect or amplify social bias, creating risks in high-stakes use cases. [44]
- Legal frameworks: GDPR imposes personal data protections; the EU AI Act introduces risk-tiered AI rules that can affect storage, retrieval, and auditability in AI systems using vector DBs. [45] [46]
Open Challenges and Research Directions
- Benchmark vs real workload gap: dynamic data, metadata filtering, and mixed structured+vector workloads can drastically change trade-offs compared to “pure vector” benchmarks. [48] [47]
- Theory guarantees vs engineering knobs: LSH offers theoretical approximation factors, but many production systems pick HNSW/graph/quantization indexes for their empirical performance; bridging formal guarantees to parameters such as efSearch, M, or PQ code length remains an open area. [4] [7] [6]
- Index maintenance under high update rates: deletes/updates in graph indexes cause fragmentation and quality drift, motivating compaction/rebuild strategies and better dynamic algorithms. [7] [15]
- Energy/cost-aware ANNS: move from recall–latency to quality–latency–cost–energy optimization. [12]
- Explainability/provenance for embedding retrieval: harder to explain “why retrieved” than lexical methods; RAG adds pressure for provenance and auditable retrieval. [35] [38]
- Benchmark standardization: lacks unified, widely adopted protocols for filtered search, streaming updates, multi-vector queries, multimodal, and cloud cost. [10] [11] [12]
Proposed Title
Vector Databases and Similarity Search Systems: A Systematic Survey, Taxonomy, Comparative Analysis, and Open Challenges for Modern AI Infrastructure [47][35]
Abstract
Vector databases and similarity search systems have become core infrastructure for modern AI applications such as semantic search, recommendation, and especially Retrieval-Augmented Generation (RAG). Rapid progress in this area is driven by algorithmic advances (LSH, product quantization, HNSW, SSD-based indexes like DiskANN) and systems advances (cloud-native VDBMS, vector-in-RDBMS/search engines, efficient metadata filtering, serverless operation). This paper presents: (i) a systematic narrative survey for 2000–2025, (ii) a unified taxonomy across algorithms, system architectures, and application patterns, (iii) qualitative and semi-quantitative comparative analysis grounded in primary literature and public benchmarks, and (iv) open challenges in theoretical guarantees, streaming updates, filtered vector search, benchmarking standardization, and security–privacy–fairness. [47] [10] [12] [48]
Keywords
vector database; approximate nearest neighbor search; HNSW; product quantization; filtered vector search; hybrid retrieval; RAG; multimodal retrieval; benchmarking; privacy. [7] [6] [48] [35] [10]
Proposed Figures (Embed-Ready Blocks)
Figure A — Timeline (Mermaid)
timeline
title Milestones in vector indexing and vector systems (selected)
1970s : k-d tree (Bentley)
1980s : Ball tree (Omohundro)
1990s : VP-tree / metric trees (Yianilos)
1998 : LSH foundations (Indyk–Motwani)
2004 : p-stable LSH (Datar et al.)
2011 : Product Quantization (PQ)
2016 : HNSW (preprint); ANN-Benchmarks emerges
2017 : FAISS GPU similarity search (preprint)
2019 : DiskANN (NeurIPS)
2020 : RAG; DPR; vector search inside search engines
2021 : Milvus (SIGMOD) — purpose-built VDBMS
2024 : VDBMS survey; filtered vector search accelerates
2025 : Filtering/metadata research + industrial scaling
Figure B — Taxonomy (Mermaid)
Figure C — Distributed Vector System Architecture (Mermaid)
Figure D — Hybrid Search + RAG Pipeline (Mermaid)
Figure E — Recall vs Latency (Python snippet)
import matplotlib.pyplot as plt
# NOTE:
# - DiskANN numbers should come from the DiskANN paper or your reproduced runs.
# - Others below are placeholders: replace with your measured results under a fixed protocol.
labels = ["DiskANN (paper)", "HNSW (placeholder)", "IVF-PQ (placeholder)", "ScaNN/AVQ (placeholder)"]
latency_ms = [2.8, 5.0, 8.0, 4.0] # replace with your measurements
recall = [0.95, 0.92, 0.90, 0.93] # replace with your measurements
plt.figure()
plt.scatter(latency_ms, recall)
for x, y, s in zip(latency_ms, recall, labels):
plt.text(x, y, s)
plt.xlabel("Latency (ms) — lower is better")
plt.ylabel("Recall@1 (or Recall@k) — higher is better")
plt.title("Illustrative Recall-Latency trade-off (replace placeholders with reproducible results)")
plt.ylim(0.7, 1.0)
plt.xlim(0, max(latency_ms) + 2)
plt.show()
Figure F — Reproducible Experiment Workflow (Mermaid)
Table 1 — Algorithm Comparison (Minimum)
| Algorithm / Index | Group | Core Idea | Query Complexity (qualitative) | Memory | Dynamic Updates | Best Fit | Primary Source |
|---|---|---|---|---|---|---|---|
| k-d tree | Tree-based | Axis-aligned partitioning | Strong at low dimension; degrades as dimension increases | Medium | Yes | Exact/approx NN in low dimensions | [1] |
| Ball tree | Tree-based | Nodes are hyperspheres | Often better than k-d tree for some distributions | Medium | Yes | Metric L2, mid dimension | [2] |
| VP-tree / Metric tree | Tree-based | Vantage points for general metric spaces | Data-dependent; good for general metric spaces | Medium | Yes | Metric spaces beyond Euclidean | [3] |
| LSH (foundations) | Hashing-based | Probabilistic hashing preserves “near” | Expected sublinear under assumptions | High | Yes | When theory guarantees are required | [4] |
| p-stable LSH | Hashing-based | p-stable distributions for lp (esp. l2) | Good for l2 with proper tuning | High | Yes | Approx NN under lp norms | [5] |
| Product Quantization (PQ) | Quantization | Subspace quantization + lookup tables | Fast table lookup; depends on code settings | Low | Sometimes | Large compression, fast serving | [6] |
| HNSW | Graph-based | Multi-layer navigable small world graph | Very fast in practice; tunable via ef/M | Higher than PQ | Yes (incremental) | Diverse workloads; high recall at low latency | [7] |
| DiskANN | Disk-based graph | SSD-optimized graph layout + search | Low latency, high QPS at billion scale on SSD | Low RAM + SSD | Depends | Billion-scale out-of-core | [14] |
Table 2 — System Comparison (Minimum 12)
Performance numbers are often not directly comparable across systems because datasets, hardware, and tuning differ; publication-grade comparisons should follow a fixed protocol (ANN-Benchmarks / BigANN). [10][11][12]
| System | Group | License | Core Methods | Metrics | Dynamic Updates | Distributed | GPU | Notes | Primary Source |
|---|---|---|---|---|---|---|---|---|---|
| FAISS | Library | MIT | IVF/PQ, HNSW, many index types | L2, IP, cosine | Depends on index | No (library) | Yes | Widely used baseline library | [8] |
| Annoy | Library | Apache-2.0 | Random projection trees; mmap index | Angular/L2 (practice) | Mostly offline | No | No | Memory-mapped read-optimized | (project docs) |
| hnswlib | Library | Apache-2.0 | HNSW | Configurable | Yes | No | No | Popular lightweight HNSW impl | [7] |
| NMSLIB | Library/toolkit | Apache-2.0 | Multiple methods incl. HNSW | Multiple spaces | Yes | No | No | Toolkit for many ANN methods | (project docs) |
| ScaNN | Library | Apache-2.0 | AVQ + pruning + reorder | MIPS/L2 | Yes | No | CPU-optimized | Strong quality–speed when tuned | [9] |
| Milvus | VDBMS | Apache-2.0 | IVF/PQ, HNSW; segment-based | Multiple | Yes | Yes | Yes | Purpose-built VDBMS | [15] |
| Weaviate | VDBMS | BSD-3-Clause | HNSW, flat; dynamic switching | Configurable | Yes | Yes (cloud-native) | Varies | Popular open-source VDBMS | [47] |
| Qdrant | VDBMS | (repo) | HNSW + payload filtering; quantization | Configurable | Yes | Deployment options | Varies | Filtering-first design | (project docs) |
| pgvector | DB extension | (repo) | HNSW, IVFFlat | L2, IP, cosine, L1, etc. | Yes | Via Postgres | No | Vector search inside Postgres | [30] |
| Vespa | Search + Vector | Apache-2.0 | HNSW + ranking pipeline | Configurable | Yes (real-time) | Yes | Varies | Hybrid ranking at scale | (project docs) |
| Elasticsearch kNN (Lucene) | Search + Vector | Multiple | HNSW per segment (Lucene) | kNN metrics | Yes | Yes (cluster) | No | Strong hybrid story with BM25 | [21][23] |
| Pinecone | Managed VDB | Commercial | Not fully disclosed | Dot/L2/cosine (docs) | Yes | Yes | Varies | Notable research on serverless filtering | [25][26] |
Recommended Reproducible Benchmark Protocol (Summary)
To compare fairly, report at least one “medium” dataset (SIFT1M or GloVe) and one “large” dataset (Deep1B or SIFT1B); fix dtype/normalization/metric; tune key parameters (HNSW: M, efConstruction, efSearch; IVF: lists/probes; PQ: code size; ScaNN: tree + quantization + reorder); and report latency p50/p95/p99, QPS under defined concurrency, recall@k against ground truth, and resource usage (index size, RSS, disk footprint, build time). Publish configs/logs/checksums. [10][11][12]
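The latency-reporting step of this protocol is a one-liner with numpy; the synthetic samples below are placeholders for measured per-query latencies:

```python
import numpy as np

def latency_summary(samples_ms):
    """p50/p95/p99 as recommended above; tail percentiles need enough
    samples to be meaningful (e.g., thousands of queries for p99)."""
    p50, p95, p99 = np.percentile(samples_ms, [50, 95, 99])
    return {"p50_ms": float(p50), "p95_ms": float(p95), "p99_ms": float(p99),
            "qps_single_thread": 1000.0 / float(np.mean(samples_ms))}

rng = np.random.default_rng(0)
lat = rng.gamma(shape=2.0, scale=2.0, size=10_000)   # synthetic latencies (ms)
print(latency_summary(lat))
```

Note that single-thread QPS derived from mean latency is not the same as QPS under defined concurrency; the latter must be measured with a load generator.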
References
[1] Bentley, J. L. Multidimensional Binary Search Trees in Database Applications. IEEE TSE, 1979.
[2] Omohundro, S. M. Five Balltree Construction Algorithms. 1989.
[3] Yianilos, P. N. Data Structures and Algorithms for Nearest Neighbor Search in General Metric Spaces. 1993. DOI: 10.1145/313559.313789.
[4] Indyk, P., Motwani, R. Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality. STOC 1998.
[5] Datar, M., Immorlica, N., Indyk, P., Mirrokni, V. Locality-Sensitive Hashing Scheme Based on p-Stable Distributions. SoCG 2004. DOI: 10.1145/997817.997857.
[6] Jégou, H., Douze, M., Schmid, C. Product Quantization for Nearest Neighbor Search. IEEE TPAMI 33(1), 2011. DOI: 10.1109/TPAMI.2010.57.
[7] Malkov, Y. A., Yashunin, D. A. Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs. arXiv:1603.09320.
[8] Johnson, J., Douze, M., Jégou, H. Billion-Scale Similarity Search with GPUs. arXiv:1702.08734 (FAISS).
[9] Guo, R., Sun, P., Lindgren, E., et al. Accelerating Large-Scale Inference with Anisotropic Vector Quantization. arXiv:1908.10396 (ScaNN/AVQ).
[10] Aumüller, M., Bernhardsson, E., Faithfull, A. A Benchmarking Tool for Approximate Nearest Neighbor Search. arXiv:1807.05614 (ANN-Benchmarks).
[11] Aumüller, M., Bernhardsson, E., Faithfull, A. Reproducibility Protocol for ANN-Benchmarks. 2022.
[12] BigANN / Big-ANN Benchmarks (NeurIPS competition tracks): https://big-ann-benchmarks.com/
[13] Babenko, A., Lempitsky, V. Efficient Indexing of Billion-Scale Datasets of Deep Descriptors. CVPR 2016 (Deep1B).
[14] Subramanya, S. J., et al. DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node. NeurIPS 2019.
[15] Wang, J., Yi, X., Guo, R., et al. Milvus: A Purpose-Built Vector Data Management System. SIGMOD 2021. DOI: 10.1145/3448016.3457550.
[16] Milvus Documentation — Index types overview.
[17] Milvus Documentation — Architecture overview (streaming/query/data nodes and storage).
[18] Weaviate Documentation — Vector indexing (HNSW/flat, dynamic switching).
[19] Weaviate GitHub — license and system description.
[20] Vespa Documentation — Approximate nearest neighbor search using HNSW.
[21] Elastic Documentation — kNN search in Elasticsearch.
[22] Elastic Documentation — Optimize approximate kNN search (HNSW per segment).
[23] Apache Lucene — Lucene99HnswVectorsFormat (HNSW vectors format).
[24] Pinecone Documentation — Hybrid search.
[25] Ingber, A., Liberty, E. Accurate and Efficient Metadata Filtering in Pinecone’s Serverless Vector Database. ICML 2025.
[26] Pinecone Documentation — Security overview (TLS/at-rest encryption).
[27] Pinecone — Technical & Organizational Security Measures (PDF).
[28] Qdrant GitHub — features (payload filtering, quantization).
[29] Qdrant Documentation — Security (API key, TLS).
[30] pgvector GitHub — HNSW/IVFFlat and parameters.
[31] Pennington, J., Socher, R., Manning, C. GloVe: Global Vectors for Word Representation. EMNLP 2014. DOI: 10.3115/v1/D14-1162.
[32] Bajaj, P., Campos, D., Craswell, N., et al. MS MARCO. arXiv:1611.09268.
[33] Schuhmann, C., Vencu, R., Beaumont, R., et al. LAION-400M. arXiv:2111.02114. DOI: 10.48550/arXiv.2111.02114.
[34] Russakovsky, O., Deng, J., Su, H., et al. ImageNet Large Scale Visual Recognition Challenge. IJCV 2015. DOI: 10.1007/s11263-015-0816-y.
[35] Lewis, P., Perez, E., Piktus, A., et al. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. NeurIPS 2020.
[36] Karpukhin, V., Oğuz, B., Min, S., et al. Dense Passage Retrieval for Open-Domain QA. EMNLP 2020.
[37] Radford, A., Kim, J. W., Hallacy, C., et al. CLIP. ICML 2021.
[38] Robertson, S., Zaragoza, H. BM25 and Beyond. FnT IR 2009. DOI: 10.1561/1500000019.
[39] Formal, T., Piwowarski, B., Clinchant, S. SPLADE. SIGIR 2021. DOI: 10.1145/3404835.3463098.
[40] Dwork, C., Roth, A. The Algorithmic Foundations of Differential Privacy. 2014.
[41] Abadi, M., Chu, A., Goodfellow, I., et al. Deep Learning with Differential Privacy. CCS 2016.
[42] Shokri, R., Stronati, M., Song, C., Shmatikov, V. Membership Inference Attacks Against ML Models. IEEE S&P 2017.
[43] Fredrikson, M., Jha, S., Ristenpart, T. Model Inversion Attacks. CCS 2015. DOI: 10.1145/2810103.2813677.
[44] Bolukbasi, T., Chang, K.-W., Zou, J., et al. Debiasing Word Embeddings. NeurIPS 2016.
[45] Regulation (EU) 2016/679 (GDPR).
[46] Regulation (EU) 2024/1689 (EU AI Act).
[47] Pan, J. J., et al. Survey of Vector Database Management Systems. VLDB Journal 2024.
[48] Chronis, Y., Caminal, P., et al. Filtered Vector Search: State-of-the-Art and Research Directions. PVLDB 2025.
[49] Elastic — Licensing FAQ (AGPL option; licensing changes).
[50] Elastic Blog — “Elasticsearch Is Open Source. Again!” (AGPL option).
End of document.