Overview

A Vector Database is a specialized storage system designed to efficiently store, index, and retrieve high-dimensional vectors (embeddings). In the context of RAG pipelines, vector databases serve as the core retrieval engine, enabling semantic search by finding vectors that are geometrically close to a query vector in embedding space.

Unlike traditional databases that search by exact matches (SQL WHERE clauses) or keyword matching, vector databases perform similarity search: given a query vector q, find the k vectors in the database most similar to q.

Why dedicated vector databases? The naive approach of computing distances between the query vector and every stored vector in the database does not scale. For a database with N vectors of dimensionality d, a brute-force search requires O(N · d) distance computations per query. At 10 million vectors and 1536 dimensions, this is approximately 15 billion operations per query, making it impractical for real-time applications.

Vector databases solve this through Approximate Nearest Neighbor (ANN) algorithms, which trade a small amount of recall for orders-of-magnitude speedups.
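To make the brute-force baseline concrete, here is a minimal NumPy sketch of exact cosine-similarity search (all names are illustrative); this is the O(N · d) scan that ANN indexes avoid:

```python
import numpy as np

def brute_force_search(query: np.ndarray, db: np.ndarray, k: int = 5):
    """Exact k-NN by cosine similarity: O(N*d) work per query."""
    # Normalize so a dot product equals cosine similarity
    db_n = db / np.linalg.norm(db, axis=1, keepdims=True)
    q_n = query / np.linalg.norm(query)
    sims = db_n @ q_n                  # N dot products of length d
    top_k = np.argsort(-sims)[:k]      # indices of the k most similar vectors
    return top_k, sims[top_k]

rng = np.random.default_rng(0)
db = rng.standard_normal((10_000, 128))   # N=10K vectors, d=128
ids, scores = brute_force_search(db[42], db, k=3)
# The query is itself a database vector, so it is its own nearest neighbor
assert ids[0] == 42
```

At 10K vectors this runs in milliseconds; the cost grows linearly with both N and d, which is exactly why it breaks down at production scale.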

Key Concepts

Distance Metrics

Vector databases compute similarity using distance (or similarity) metrics. The choice of metric affects both the results and the indexing algorithm performance.

| Metric | Formula | Range | Use Case |
| --- | --- | --- | --- |
| Cosine Similarity | (a · b) / (‖a‖ ‖b‖) | [-1, 1] | Normalized embeddings (most common for LLM embeddings) |
| Euclidean (L2) | ‖a − b‖₂ | [0, ∞) | When magnitude matters |
| Dot Product | a · b | (-∞, ∞) | Pre-normalized vectors (equivalent to cosine) |
| Manhattan (L1) | Σᵢ \|aᵢ − bᵢ\| | [0, ∞) | Sparse vectors, grid-like distances |

For unit-normalized vectors (norm = 1), cosine similarity equals dot product. Most embedding models (OpenAI, Cohere, BGE) output normalized embeddings, so these are interchangeable in practice.
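This equivalence is easy to verify numerically (a quick sketch with random vectors standing in for real embeddings):

```python
import numpy as np

rng = np.random.default_rng(1)
a, b = rng.standard_normal(768), rng.standard_normal(768)

# Unit-normalize, as most embedding APIs already do
a /= np.linalg.norm(a)
b /= np.linalg.norm(b)

cosine = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
dot = a @ b
assert abs(cosine - dot) < 1e-9   # identical for unit vectors
```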

Exact vs Approximate Nearest Neighbors

| Approach | Description | Time Complexity | Recall | Use Case |
| --- | --- | --- | --- | --- |
| Exact (Brute-force) | Compare query to all N vectors | O(N · d) | 100% | Small datasets (<10K vectors) |
| Approximate (ANN) | Use index structures to prune the search space | ~O(log N) | 95-99% | Production scale |

ANN algorithms sacrifice some recall (the percentage of true nearest neighbors found) for dramatic speed improvements. A well-tuned ANN index achieves 95-99% recall with 100-1000x speedup over brute-force.

The Recall vs Speed Trade-off

Production systems typically target 95-99% recall, meaning 1-5% of queries may miss the "true" nearest neighbor and return the second- or third-closest instead. For RAG applications this is usually acceptable, since we retrieve the top-K (typically 5-20) results anyway.
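Recall@k is simple to measure when you can afford a brute-force pass over a sample of queries; a minimal helper (illustrative name):

```python
def recall_at_k(true_ids, approx_ids):
    """Fraction of the true top-k neighbors that the ANN index returned."""
    return len(set(true_ids) & set(approx_ids)) / len(true_ids)

# The ANN index missed one of the five true neighbors -> 80% recall
assert recall_at_k([1, 2, 3, 4, 5], [1, 2, 3, 4, 99]) == 0.8
```

In practice you compute exact neighbors for a few hundred held-out queries, average recall@k over them, and tune index parameters against that number.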

ANN Index Algorithms

Different indexing strategies trade off index build time, query speed, memory usage, and recall. The choice depends on your scale and constraints.

Must Read: https://www.pinecone.io/learn/series/faiss/vector-indexes/

HNSW (Hierarchical Navigable Small World)

Most popular algorithm for production vector databases.

Must Read: https://www.pinecone.io/learn/series/faiss/hnsw/

Characteristics

| Aspect | Details |
| --- | --- |
| Query Speed | Very fast (O(log N)) |
| Memory Usage | High (graph structure + original vectors) |
| Index Build Time | Medium-slow |
| Update Support | Good (can add/remove without full rebuild) |
| Best For | Read-heavy workloads, high recall requirements |
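The core idea behind HNSW is greedy navigation over a proximity graph. A single-layer toy version (a plain NSW-style graph, not the real hierarchical algorithm, with the graph built by brute force purely for illustration) shows the navigation step:

```python
import numpy as np

rng = np.random.default_rng(2)
N, d, M = 200, 32, 8
data = rng.standard_normal((N, d))

# Toy graph: connect every node to its M nearest neighbors (real HNSW builds
# this incrementally and adds a hierarchy of sparser layers on top)
dists = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=2)
graph = {i: list(np.argsort(dists[i])[1:M + 1]) for i in range(N)}

def greedy_search(query, entry=0):
    """Walk the graph, always moving to a neighbor closer to the query."""
    current = entry
    current_dist = np.linalg.norm(data[current] - query)
    while True:
        neighbor = min(graph[current], key=lambda n: np.linalg.norm(data[n] - query))
        neighbor_dist = np.linalg.norm(data[neighbor] - query)
        if neighbor_dist >= current_dist:
            return current, current_dist   # local minimum: stop here
        current, current_dist = neighbor, neighbor_dist

query = rng.standard_normal(d)
node, dist = greedy_search(query)
# Greedy descent never moves away from the query
assert dist <= np.linalg.norm(data[0] - query)
```

Real HNSW adds upper layers for long-range hops and a beam of candidates (the ef_search parameter) to escape local minima, which is what pushes recall into the 98-99% range.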

IVF (Inverted File Index)

IVF clusters vectors into partitions (cells), then searches only the relevant partitions at query time, like checking only the relevant sections of a library rather than every shelf.

Must Read: https://blog.dailydoseofds.com/p/approximate-nearest-neighbor-search

Key Parameters

| Parameter | Description | Trade-off |
| --- | --- | --- |
| nlist | Number of clusters | More = finer partitions, slower build |
| nprobe | Clusters to search at query time | More = better recall, slower query |

Heuristics:

  • nlist ≈ sqrt(N) for balanced partitioning
  • nprobe ≈ nlist/10 as a starting point (tune for recall)
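These parameters can be seen in action in a minimal NumPy sketch (random rows stand in for k-means-trained centroids, so this is illustrative rather than a real IVF implementation):

```python
import numpy as np

rng = np.random.default_rng(3)
N, d = 2_000, 32
data = rng.standard_normal((N, d))
nlist = int(np.sqrt(N))            # ≈ 44 clusters, per the sqrt(N) heuristic
nprobe = max(1, nlist // 10)       # ≈ nlist/10 starting point

# "Training": pick centroids (real IVF runs k-means; random rows keep this short)
centroids = data[rng.choice(N, nlist, replace=False)]
assignments = np.argmin(np.linalg.norm(data[:, None] - centroids[None], axis=2), axis=1)

def ivf_search(query, k=5):
    # 1. Find the nprobe cells whose centroids are closest to the query
    cell_dists = np.linalg.norm(centroids - query, axis=1)
    probe_cells = np.argsort(cell_dists)[:nprobe]
    # 2. Brute-force only over the vectors assigned to those cells
    candidates = np.where(np.isin(assignments, probe_cells))[0]
    cand_dists = np.linalg.norm(data[candidates] - query, axis=1)
    return candidates[np.argsort(cand_dists)[:k]]

ids = ivf_search(data[10])
assert ids[0] == 10   # the query vector itself lives in a probed cell
```

Note the failure mode this exposes: if the true neighbor sits in a cell that is not probed, it is simply never seen, which is where IVF loses recall.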

Characteristics

| Aspect | Details |
| --- | --- |
| Query Speed | Fast, but depends on nprobe |
| Memory Usage | Lower than HNSW (no graph overhead) |
| Index Build Time | Fast (just K-means + assignment) |
| Update Support | Poor (rebalancing needed for new clusters) |
| Best For | Large datasets, memory-constrained environments |

PQ (Product Quantization)

PQ compresses vectors by dividing them into subvectors and quantizing each independently, dramatically reducing memory usage. It is a lossy compression technique.

Must Read: https://www.pinecone.io/learn/series/faiss/product-quantization/
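The split-and-quantize idea fits in a short sketch (random sampled rows stand in for the k-means codebook training a real implementation would do):

```python
import numpy as np

rng = np.random.default_rng(4)
N, d, m, ksub = 1_000, 64, 8, 256      # m subvectors, 256 centroids -> 1 byte each
dsub = d // m
data = rng.standard_normal((N, d)).astype(np.float32)

# "Train" one codebook per subspace (sampled rows stand in for k-means here)
codebooks = np.stack([data[rng.choice(N, ksub, replace=False), i*dsub:(i+1)*dsub]
                      for i in range(m)])            # shape (m, ksub, dsub)

def pq_encode(x):
    """Compress a d-dim float32 vector to m uint8 codes."""
    codes = np.empty(m, dtype=np.uint8)
    for i in range(m):
        sub = x[i*dsub:(i+1)*dsub]
        codes[i] = np.argmin(np.linalg.norm(codebooks[i] - sub, axis=1))
    return codes

def pq_decode(codes):
    """Reconstruct an approximation from the codes (lossy)."""
    return np.concatenate([codebooks[i][codes[i]] for i in range(m)])

codes = pq_encode(data[0])
approx = pq_decode(codes)
print(f"compression: {d * 4} bytes -> {m} bytes")   # 256 bytes -> 8 bytes (32x)
```

Real PQ indexes never decode at query time; they precompute per-subspace distance lookup tables for the query, which is the "quantization lookup overhead" noted below.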

Characteristics

| Aspect | Details |
| --- | --- |
| Query Speed | Medium (quantization lookup overhead) |
| Memory Usage | Very low (10-100x compression) |
| Accuracy | Lower (lossy compression) |
| Index Build Time | Medium (codebook training) |
| Best For | Memory-constrained, very large datasets |

Compound Indices (IVF-PQ, HNSW+PQ)

Production systems can combine algorithms for best results.

IVF-PQ (FAISS Default for Large Scale)

https://towardsdatascience.com/similarity-search-with-ivfpq-9c6348fd4db3/

Combines IVF clustering with PQ compression:

  1. Cluster vectors (IVF) to reduce search space
  2. PQ-compress vectors within each cluster

Characteristics: Fast, low memory, moderate recall. Good for billion-scale datasets.

HNSW+PQ

https://weaviate.io/blog/ann-algorithms-hnsw-pq

HNSW graph navigation with PQ-compressed vectors:

  1. Use HNSW graph structure for navigation
  2. Store PQ codes instead of full vectors
  3. Optional: re-rank with original vectors for top candidates

Characteristics: Faster than pure HNSW, retains good recall, much lower memory.

Index Algorithm Comparison

| Algorithm | Query Speed | Memory | Recall | Build Time | Updates | Best For |
| --- | --- | --- | --- | --- | --- | --- |
| Brute-Force | Slow | Low | 100% | None | Easy | Tiny datasets |
| HNSW | Very Fast | High | 98-99% | Medium | Good | Production, quality-critical |
| IVF | Fast | Medium | 90-95% | Fast | Poor | Large scale, moderate quality |
| PQ | Medium | Very Low | 85-95% | Medium | Medium | Memory-constrained |
| IVF-PQ | Fast | Low | 90-97% | Medium | Poor | Billion-scale |
| HNSW+PQ | Fast | Medium | 95-98% | Medium | Medium | Balanced scale + quality |

Rule of Thumb:

  • < 100K vectors: HNSW or even brute-force
  • 100K - 10M vectors: HNSW (if memory allows), else IVF
  • 10M - 1B vectors: IVF-PQ or HNSW+PQ
  • > 1B vectors: Specialized solutions (ScaNN, DiskANN)

Vector Database Landscape

Categories

1. Purpose-Built Vector Databases Designed from the ground up for vector search, with enterprise features.

2. Vector Search Libraries Algorithms you run yourself (no managed infrastructure).

3. Traditional Databases with Vector Extensions Add vector search to existing database infrastructure.

| Database | Type | Best For | Key Features |
| --- | --- | --- | --- |
| Pinecone | Managed SaaS | Production, zero-ops | Serverless, automatic scaling, hybrid search |
| Weaviate | Open-source/Cloud | Hybrid search, semantic features | Built-in hybrid search, modules for ML |
| Qdrant | Open-source/Cloud | Filtering, payload storage | Advanced filtering, payload indexing |
| Milvus | Open-source/Cloud | Scale, GPU acceleration | Billion-scale, GPU support |
| Chroma | Open-source | Prototyping, simplicity | Simple API, LangChain integration |
| FAISS | Library | Performance, research | Facebook's library, state-of-the-art algorithms |
| pgvector | Postgres extension | Existing Postgres users | SQL integration, ACID transactions |
| Elasticsearch | Search platform | Full-text + vector hybrid | Mature ecosystem, hybrid search |

Detailed Comparison

Pinecone (Managed SaaS)

Strengths:

  • Fully managed, serverless architecture
  • Automatic scaling and replication
  • Built-in hybrid search (sparse + dense)
  • Metadata filtering with vector search
  • Strong consistency guarantees

Weaknesses:

  • Vendor lock-in
  • Higher cost at scale
  • Less control over index parameters

Best For: Production applications where operational simplicity is valued over cost optimization. Teams without dedicated infrastructure expertise.

Weaviate (Open-source)

Strengths:

  • Built-in hybrid search (BM25 + vectors)
  • Modular architecture (plug in different vectorizers)
  • GraphQL API
  • Active open-source community
  • Self-hosted or managed cloud options

Weaknesses:

  • Higher memory footprint than some alternatives
  • Learning curve for module system

Best For: Hybrid Search applications requiring both keyword and semantic search in one query.

Qdrant (Open-source)

Strengths:

  • Advanced filtering (payload indexing)
  • Efficient on-disk storage
  • Rust implementation (performance + safety)
  • Straightforward REST/gRPC API
  • Strong consistency options

Weaknesses:

  • Smaller ecosystem than Weaviate/Milvus
  • Fewer built-in integrations

Best For: Applications requiring complex metadata filtering alongside vector search. When you need to filter by .metadata.author == "John" AND .metadata.year > 2020 efficiently.

Milvus (Open-source)

Strengths:

  • Designed for billion-scale
  • GPU acceleration support
  • Multiple index types (HNSW, IVF, DiskANN)
  • Separation of storage and compute
  • Strong enterprise features

Weaknesses:

  • Operational complexity
  • Heavier resource requirements
  • Steeper learning curve

Best For: Very large scale deployments (100M+ vectors). Enterprise environments with dedicated infrastructure teams.

Chroma (Open-source)

Strengths:

  • Extremely simple API
  • Embedded mode (no server needed)
  • First-class LangChain/LlamaIndex integration
  • Fast iteration for prototypes
  • Lightweight

Weaknesses:

  • Not designed for production scale
  • Limited query features
  • No native hybrid search

Best For: Prototyping, local development, small projects. Getting started with RAG quickly.

FAISS (Library)

Strengths:

  • State-of-the-art algorithms (fastest implementations)
  • Multiple index types (brute-force to billion-scale)
  • GPU support
  • Well-documented, well-researched
  • Zero dependencies beyond NumPy

Weaknesses:

  • No managed infrastructure
  • No persistence (save/load manually)
  • No metadata filtering (vectors only)
  • No replication/sharding built-in

Best For: Research, performance-critical applications, embedding into larger systems. When you need maximum control.

pgvector (PostgreSQL Extension)

Strengths:

  • SQL interface (familiar)
  • ACID transactions
  • Combine with relational data
  • Use existing Postgres infrastructure
  • No separate vector database needed

Weaknesses:

  • Performance ceiling at scale
  • Limited to Postgres ecosystem
  • Fewer index options than specialized DBs

Best For: Applications already using PostgreSQL that need to add vector search without adding infrastructure. When you need transactions across vector and relational data.

Choosing a Vector Database

| Requirement | Recommended Options |
| --- | --- |
| Quick prototype | Chroma, FAISS |
| Production with minimal ops | Pinecone |
| Hybrid search (keyword + semantic) | Weaviate, Pinecone |
| Complex metadata filtering | Qdrant, Milvus |
| Existing Postgres | pgvector |
| Billion-scale | Milvus, ScaNN |
| On-premise / air-gapped | Milvus, Qdrant, Weaviate (self-hosted) |
| Full control over algorithms | FAISS |
| GPU acceleration | Milvus, FAISS |

Integration with RAG Pipelines

Indexing Flow

Documents → chunking → embedding model → upsert vectors (with metadata) into the index.

Query Flow

User question → embedding model → ANN search (optionally with a metadata filter) → top-K chunks → LLM context.
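Both flows can be sketched end to end with toy stand-ins (the embedding function and index class here are illustrative placeholders, not a real embedding model or database client):

```python
import numpy as np

def embed(text):
    """Toy embedding: normalized character-frequency vector (stand-in only)."""
    v = np.zeros(26)
    for c in text.lower():
        if c.isalpha():
            v[ord(c) - 97] += 1
    n = np.linalg.norm(v)
    return v / n if n else v

class ToyIndex:
    """Minimal in-memory index mimicking an upsert/query interface."""
    def __init__(self):
        self.items = []                      # (id, vector, metadata)
    def upsert(self, id, values, metadata):
        self.items.append((id, values, metadata))
    def query(self, vector, top_k):
        scored = sorted(self.items, key=lambda it: -float(it[1] @ vector))
        return scored[:top_k]

# Indexing flow: documents -> embeddings -> index (chunking omitted for brevity)
index = ToyIndex()
docs = {"doc1": "vector databases store embeddings", "doc2": "bananas are yellow"}
for doc_id, text in docs.items():
    index.upsert(doc_id, embed(text), {"chunk_text": text})

# Query flow: question -> embedding -> similarity search -> context for the LLM
matches = index.query(embed("how do embeddings get stored"), top_k=1)
context = matches[0][2]["chunk_text"]
```

In a real pipeline the embedding function is the same model at index and query time (see the pitfalls below), and the index is a database client rather than a list.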

Metadata Strategies

Storing metadata alongside vectors enables powerful filtering:

```python
index.upsert(vectors=[
    {
        "id": "chunk_001",
        "values": embedding_vector,  # [0.1, 0.2, ..., 0.8]
        "metadata": {
            "source": "annual_report_2024.pdf",
            "page": 12,
            "section": "Financial Overview",
            "date": "2024-03-15",
            "author": "CFO",
            "chunk_text": "Revenue increased by 15%..."
        }
    }
])

# Query with filter: semantic search + metadata constraint
results = index.query(
    vector=query_embedding,
    top_k=10,
    filter={
        "source": {"$eq": "annual_report_2024.pdf"},
        "date": {"$gte": "2024-01-01"}
    }
)
```

Filtering Strategies:

  • Pre-filtering: Apply metadata filter before vector search (reduces search space)
  • Post-filtering: Vector search first, then filter results (simpler but slower)

Most production databases use pre-filtering for efficiency.
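The difference between the two strategies can be sketched in NumPy (a toy "year" field stands in for arbitrary metadata; the overfetch value is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(5)
N, d = 1_000, 16
vectors = rng.standard_normal((N, d))
years = rng.integers(2015, 2025, size=N)      # one metadata field per vector
query = rng.standard_normal(d)

def pre_filter_search(k=5):
    """Filter first, then search only the surviving vectors."""
    keep = np.where(years > 2020)[0]
    dists = np.linalg.norm(vectors[keep] - query, axis=1)
    return keep[np.argsort(dists)[:k]]

def post_filter_search(k=5, overfetch=50):
    """Search first (fetching extra results), then drop those failing the filter."""
    dists = np.linalg.norm(vectors - query, axis=1)
    candidates = np.argsort(dists)[:overfetch]
    passing = candidates[years[candidates] > 2020]
    return passing[:k]

pre, post = pre_filter_search(), post_filter_search()
assert all(years[i] > 2020 for i in pre)
```

Note the post-filtering hazard: with a selective filter, even a large overfetch can leave fewer than k passing results, which is why pre-filtering (or filter-aware index traversal) is the production default.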

Practical Application

Scaling Considerations

| Scale | Vectors | Memory | Recommended Approach |
| --- | --- | --- | --- |
| Tiny | < 10K | < 1 GB | Chroma embedded, brute-force |
| Small | 10K - 100K | 1-10 GB | Single-node HNSW (Chroma, Qdrant) |
| Medium | 100K - 10M | 10-100 GB | Managed (Pinecone) or self-hosted cluster |
| Large | 10M - 1B | 100 GB - 10 TB | Distributed (Milvus cluster), IVF-PQ |
| Massive | > 1B | > 10 TB | Specialized (ScaNN, DiskANN, custom sharding) |

Memory Estimation

Rough estimation for HNSW memory usage:

memory ≈ N × (d × 4 + M × 2 × 8) bytes

Where:

  • N = number of vectors
  • d = embedding dimension
  • M = HNSW edge count parameter
  • 4 bytes per float32 dimension
  • 8 bytes per neighbor pointer, 2 directions (bidirectional edges)

For example: 10M vectors, 1536 dimensions, M=16:

memory ≈ 10,000,000 × (1536 × 4 + 16 × 2 × 8) = 10,000,000 × 6,400 bytes ≈ 64 GB
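As a quick sanity check, the arithmetic is easy to script (helper name is illustrative; the per-pointer cost follows the rough estimate in this section):

```python
def hnsw_memory_bytes(n, d, M):
    """memory ≈ N * (d * 4 + M * 2 * 8): float32 vectors plus bidirectional links."""
    return n * (d * 4 + M * 2 * 8)

gb = hnsw_memory_bytes(10_000_000, 1536, 16) / 1e9
print(f"{gb:.0f} GB")   # 64 GB
```

The vector payload (6144 bytes each) dwarfs the graph overhead (256 bytes each) at this dimensionality, which is why PQ compression of the vectors themselves is the main memory lever.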

Common Pitfalls

1. Mismatched Embedding Models

Problem: Indexing with one embedding model, querying with another.

Why It Fails: Different models produce incompatible vector spaces. A vector from OpenAI text-embedding-3-small is meaningless when compared to a vector from sentence-transformers/all-MiniLM-L6-v2.

Solution: Always use the same embedding model for indexing and querying. Store the model name as metadata.

2. Ignoring Index Tuning

Problem: Using default parameters for all workloads.

Why It Matters: Defaults optimize for general cases. Your recall/speed requirements may differ significantly.

Solution: Benchmark with your actual data. Tune ef_search (HNSW) or nprobe (IVF) based on recall targets.

3. Forgetting Metadata Indexing

Problem: Storing metadata but not indexing it for filtering.

Why It Matters: Unindexed metadata requires post-filtering (scan all results), negating speed benefits.

Solution: Explicitly create indexes on frequently-filtered fields. Most databases require this configuration.

4. Stale Indices

Problem: Documents update but vector indices do not.

Why It Matters: RAG returns outdated information, potentially causing incorrect answers.

Solution: Implement index refresh pipelines. Track document versions. Consider incremental updates vs. full rebuilds based on update frequency.

Advanced Topics

Hybrid Search in Vector Databases

Several vector databases support Hybrid Search natively, combining vector similarity with keyword (BM25) search:

| Database | Hybrid Support | Implementation |
| --- | --- | --- |
| Pinecone | Yes | Sparse-dense vectors in same query |
| Weaviate | Yes | BM25 + vector fusion (configurable) |
| Qdrant | Partial | Requires separate text index |
| Milvus | Yes | Scalar + vector composite index |
| pgvector | Via Postgres | pg_trgm + pgvector separately |

Multi-Tenancy

For SaaS applications serving multiple customers:

Approach 1: Metadata-Based Isolation

  • Single index, filter by tenant_id metadata
  • Simple, but potential data leakage risk

Approach 2: Namespace/Collection Separation

  • Separate namespace per tenant
  • Better isolation, more overhead

Approach 3: Database-per-Tenant

  • Full isolation
  • Highest overhead, best security

Quantization for Cost Reduction

Use quantized (lower precision) vectors to reduce costs:

| Precision | Bytes per Dimension | Memory Savings |
| --- | --- | --- |
| float32 | 4 | Baseline |
| float16 | 2 | 50% |
| int8 | 1 | 75% |
| binary | 1/8 | 96% |

Trade-off: Lower precision = lower recall. For many RAG use cases, int8 quantization can retain 95%+ recall at 75% memory savings.
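A simple symmetric int8 scheme illustrates the trade-off (this is one common approach; databases vary in how they pick the scale factor):

```python
import numpy as np

rng = np.random.default_rng(6)
vec = rng.standard_normal(1536).astype(np.float32)

# Symmetric int8 quantization: map [-max_abs, max_abs] onto [-127, 127]
scale = np.abs(vec).max() / 127.0
q = np.round(vec / scale).astype(np.int8)
deq = q.astype(np.float32) * scale            # dequantize for comparison

print(f"{vec.nbytes} -> {q.nbytes} bytes")    # 6144 -> 1536 bytes (75% savings)
err = np.linalg.norm(vec - deq) / np.linalg.norm(vec)
```

The relative reconstruction error here is well under 2%, small enough that nearest-neighbor rankings are mostly preserved; many databases also re-rank top candidates with full-precision vectors to claw back the last bit of recall.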

Comparisons

Vector Database vs Traditional Database

| Aspect | Vector Database | Traditional RDBMS |
| --- | --- | --- |
| Query Type | Similarity (nearest neighbor) | Exact match (WHERE clauses) |
| Index Structure | ANN graphs/clusters | B-trees, hash indexes |
| Data Model | Vectors + metadata | Tables, rows, columns |
| Typical Operation | "Find similar to X" | "Find where X = Y" |
| Consistency | Often eventual | ACID transactions |
| Scale Pattern | Specialized sharding | Mature horizontal scaling |

Vector Database vs Search Engine (Elasticsearch)

| Aspect | Vector Database | Elasticsearch |
| --- | --- | --- |
| Primary Strength | Semantic similarity | Full-text search (BM25) |
| Vector Support | Native, optimized | Added feature (k-NN plugin) |
| Hybrid Search | Varies | Excellent (text + vectors) |
| Operational Maturity | Growing | Very mature |
| Ecosystem | RAG/ML focused | Enterprise search focused |

Resources

Documentation

Papers

Others

Benchmarks


Back to: 01 - RAG Index