RAG (Retrieval-Augmented Generation) is a technique that connects a Large Language Model (LLM) to external or private data. Instead of relying solely on the model's internal training data, RAG lets the model "browse" a curated database for relevant information before generating an answer.
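The core loop (embed → retrieve → stuff into prompt) can be sketched in a few lines. This is a toy illustration: the bag-of-words "embedding" stands in for a real neural embedding model, and all names (`embed`, `retrieve`, `build_prompt`) are illustrative, not any specific library's API.

```python
import math
import re
from collections import Counter

# Toy bag-of-words "embedding"; a real pipeline would use a neural
# embedding model (e.g. a sentence-transformer) instead.
def embed(text: str) -> Counter:
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    q = embed(query)
    return sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    context = "\n".join(f"- {d}" for d in docs)
    return f"Answer using ONLY this context:\n{context}\n\nQuestion: {query}"

corpus = [
    "Acme Corp was founded in 1999 in Berlin.",
    "The Acme support portal resets passwords via email.",
    "Bananas are rich in potassium.",
]
query = "When was Acme founded?"
prompt = build_prompt(query, retrieve(query, corpus))
```

The LLM call at the end is omitted; the point is that the model only sees the retrieved chunks, not the whole corpus.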
Advanced Retrieval Techniques
HyDE (Hypothetical Document Embeddings): Hallucinating a fake answer to find real documents.
Multi-Query: Expanding a complex question into several rephrasings or sub-questions and merging the retrieved results.
Re-ranking: Using a Cross-Encoder to re-score the top-K results for higher precision.
Contextual Retrieval: Prepending document/section context to chunks before embedding. Significantly improves retrieval accuracy. Related: Late Chunking.
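HyDE in miniature: embed a hypothetical LLM-generated answer instead of the raw query, since a plausible answer shares more vocabulary with the target document than the question does. `fake_llm` and the lexical `overlap` scorer are stand-ins for a real LLM call and real embedding similarity.

```python
import re
from collections import Counter

def overlap(a: str, b: str) -> int:
    # Toy lexical similarity as a stand-in for embedding similarity.
    ta = Counter(re.findall(r"\w+", a.lower()))
    tb = Counter(re.findall(r"\w+", b.lower()))
    return sum((ta & tb).values())

def fake_llm(prompt: str) -> str:
    # Stand-in for a real LLM: returns a plausible (possibly wrong) answer.
    # Its details may be hallucinated; only its vocabulary matters.
    return "The company was founded in the 1990s and is based in Berlin."

def hyde_retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    hypothetical = fake_llm(f"Write a short passage answering: {query}")
    return sorted(corpus, key=lambda d: overlap(hypothetical, d), reverse=True)[:k]

corpus = [
    "Acme Corp was founded in 1999 in Berlin.",
    "Our return policy allows refunds within 30 days.",
]
```

Note the fake answer gets the year wrong — that's fine, because it is only used to *find* the real document, never shown to the user.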
Generation & Synthesis
Context Window Management: Handling token limits.
Multi-hop Reasoning: When a single retrieval isn’t enough, chaining multiple retrievals to answer complex questions (e.g., “Who founded the company that built X?”).
Citation & Attribution: Techniques to force the LLM to reference specific chunk IDs and their sources.
Compression Techniques: Using summaries or extractive compression to fit more relevant context into token limits.
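Two of the ideas above (context window management and citation prompting) can be sketched together: greedily pack ranked chunks into a token budget, then number them so the model can cite `[n]`. The whitespace token count is a crude stand-in for a real tokenizer, and the function names are illustrative.

```python
# Greedy token-budget packing: add the highest-ranked chunks first,
# skipping any that would overflow the budget. Real systems count
# tokens with the model's own tokenizer, not str.split().
def pack_context(ranked_chunks: list[str], budget: int) -> list[str]:
    packed, used = [], 0
    for chunk in ranked_chunks:
        cost = len(chunk.split())
        if used + cost > budget:
            continue  # or `break` to strictly preserve rank order
        packed.append(chunk)
        used += cost
    return packed

def prompt_with_citations(query: str, chunks: list[str]) -> str:
    # Number each chunk so the model can be instructed to cite [n].
    ctx = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (f"Context:\n{ctx}\n\n"
            f"Answer the question and cite sources as [n].\nQuestion: {query}")

chunks = ["alpha beta gamma", "one two three four five", "x y"]
packed = pack_context(chunks, budget=5)
```

With a budget of 5 "tokens", the second chunk (5 tokens, but arriving when 3 are already used) is skipped and the cheaper third chunk still fits.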
Evaluation
Context Relevance: Is the retrieved text actually useful?
Groundedness/Faithfulness: Is the answer derived only from the context?
Answer Relevance: Did we answer the user’s question?
Tools: RAGAS, TruLens, Phoenix (Arize).
Synthetic Test Set Generation: Using LLMs to generate question-answer pairs from your corpus for evaluation. Often the bottleneck in proper RAG evaluation.
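A toy version of the groundedness metric makes the idea concrete: what fraction of the answer is actually supported by the retrieved context? Real evaluators (RAGAS, TruLens) use an LLM judge per claim; the token-overlap proxy below is an assumption for illustration only.

```python
import re

def _tokens(s: str) -> set[str]:
    return set(re.findall(r"\w+", s.lower()))

# Toy groundedness: fraction of answer tokens that appear in the context.
# Production metrics decompose the answer into claims and verify each
# with an LLM judge; lexical overlap is just a cheap stand-in.
def groundedness(answer: str, context: str) -> float:
    a, c = _tokens(answer), _tokens(context)
    return len(a & c) / len(a) if a else 0.0

context = "Acme Corp was founded in 1999 in Berlin."
good_answer = "Acme was founded in 1999."
bad_answer = "Acme was founded in 2005 by three students."
```

A low score flags answers that drifted outside the retrieved evidence — the "Hallucination Outside Context" failure mode discussed below.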
The Frontier
GraphRAG: Using Knowledge Graphs to capture structural relationships between entities.
Agentic RAG: Giving an agent tools to decide when to search, what to search, and if it needs to search again (e.g., Self-RAG).
Multimodal RAG: Embedding and retrieving images, diagrams, and tables. Uses CLIP-style embeddings or vision-language models for document understanding.
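An agentic retrieval loop in the spirit of Self-RAG can be sketched as: search, check whether the evidence suffices, and if not, issue a refined follow-up query. Here `search`, `is_sufficient`, and `refine` are illustrative stand-ins (in practice the latter two are LLM self-reflection calls); the two-hop demo answers "Who founded the company that built X?".

```python
# Agent loop: keep retrieving until the evidence answers the question,
# refining the query between hops (bounded by max_hops).
def agentic_answer(question, search, is_sufficient, refine, max_hops=3):
    query, evidence = question, []
    for _ in range(max_hops):
        evidence += search(query)
        if is_sufficient(question, evidence):
            break
        query = refine(question, evidence)
    return evidence

# Toy knowledge base keyed by exact query (a vector DB in practice).
kb = {
    "who built X": ["X was built by Acme Corp."],
    "who founded Acme Corp": ["Acme Corp was founded by Ada."],
}

def search(q):
    return kb.get(q, [])

def is_sufficient(question, evidence):
    # Stand-in for an LLM judging whether the question is answerable.
    return any("founded by" in e for e in evidence)

def refine(question, evidence):
    # Stand-in for an LLM decomposing the question / extracting entities.
    return "who built X" if not evidence else "who founded Acme Corp"

evidence = agentic_answer(
    "Who founded the company that built X?", search, is_sufficient, refine
)
```

The first hop finds the builder (Acme Corp); only then can the second hop ask who founded it — exactly the multi-hop pattern a single retrieval cannot handle.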
Comparisons
| Feature | Finetuning | RAG | Long Context Window |
| --- | --- | --- | --- |
| Knowledge Source | Internal Weights (Parametric) | External DB (Non-Parametric) | Prompt Context |
| Data Freshness | Static (Training snapshot) | Dynamic (Real-time updates) | Dynamic (Per query) |
| Hallucinations | Hard to control | Reduced (Grounded) | Reduced |
| Privacy | Data baked into model | Data stays in DB | Data sent to API |
| Cost | High (Training) | Low (Inference) | Linear with length |
| Best For | Domain Vocabulary, Style, Formats | Knowledge retrieval, Facts | Summarizing one large doc |
Common RAG Pitfalls & Challenges
Retrieval Problems
Poor Chunk Quality: Fragmented or badly split chunks (e.g. sentences cut mid-thought) can confuse the retriever. Solution: Carefully tune the chunking strategy (size, overlap, semantic boundaries).
Context Mismatch: Retrieved context doesn’t contain the actual answer because of poor chunking or embedding quality. Measure with Context Relevance metrics.
Noise Injection: Including irrelevant context can mislead the LLM, and relevant passages buried among distractors tend to be ignored (the "Lost in the Middle" effect). Solution: Use Re-ranking or stricter filtering.
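The re-ranking fix mentioned above is a two-stage pattern: a cheap scorer produces a shortlist, then a precise scorer reorders it. Below, bag-of-words term frequency stands in for a bi-encoder, and a phrase-aware scorer stands in for a cross-encoder (such as the `CrossEncoder` class in sentence-transformers, which reads query and document jointly); both scorers are toy assumptions.

```python
import re
from collections import Counter

def lexical_score(query: str, doc: str) -> int:
    # Stage 1 stand-in: cheap term-frequency overlap. Easily fooled by
    # keyword stuffing, which is exactly why stage 2 exists.
    d = Counter(re.findall(r"\w+", doc.lower()))
    return sum(d[t] for t in re.findall(r"\w+", query.lower()))

def phrase_aware_score(query: str, doc: str) -> int:
    # Stage 2 stand-in ("cross-encoder"): rewards an exact phrase match,
    # which pure bag-of-words overlap cannot see.
    return lexical_score(query, doc) + (10 if query.lower() in doc.lower() else 0)

def rerank(query: str, candidates: list[str], scorer) -> list[str]:
    return sorted(candidates, key=lambda d: scorer(query, d), reverse=True)

docs = [
    "password hints: email password, admin password, password vault",
    "How to reset password: open Settings and choose Security.",
]
query = "reset password"
shortlist = rerank(query, docs, lexical_score)        # keyword-stuffed doc wins
best = rerank(query, shortlist, phrase_aware_score)[0]  # re-ranker corrects it
```

Stage 1 ranks the keyword-stuffed document first; the re-ranker demotes it because only the second document actually discusses resetting a password.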
Generation Problems
Hallucination Outside Context: LLM generates information not in retrieved documents. Mitigation: Better prompting (“answer only from context”) + faithfulness evaluation.
Conflicting Information: Multiple retrieved documents contradict each other. Solution: Use LLM to detect conflicts or re-rank for consensus.
Token Overflow: Too much context → truncation → relevant info lost. Solution: Use compression, hierarchical retrieval.
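A minimal sketch of conflict detection: flag document pairs that give different numeric values (here, years) for the same query. This heuristic is purely illustrative — a production system would ask an LLM to compare claims or re-rank toward the consensus answer.

```python
import re

# Flags pairs of documents whose year mentions disagree. Real conflict
# detection compares extracted claims, not raw regex matches.
def find_conflicts(docs: list[str]) -> set[frozenset]:
    years = {d: set(re.findall(r"\b(?:19|20)\d{2}\b", d)) for d in docs}
    conflicts = set()
    for a in docs:
        for b in docs:
            if a < b and years[a] and years[b] and years[a] != years[b]:
                conflicts.add(frozenset((a, b)))
    return conflicts

docs = [
    "Acme was founded in 1999.",
    "Acme was founded in 2001.",
    "Acme makes widgets.",
]
```

Flagged pairs can be surfaced to the user ("sources disagree") rather than silently picking one.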
Scalability & Performance
Latency: Retrieval adds overhead to inference. Solution: Caching, pre-computation, or Approximate Nearest Neighbor (ANN) optimization.
Index Staleness: If data updates frequently, retrieval might miss recent changes. Solution: Hybrid with live data + caching strategies.
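The caching fix for latency can be as simple as memoizing the retrieval call, so repeated identical queries skip the expensive ANN lookup entirely. The counter below only exists to demonstrate that the second call never hits the "database".

```python
from functools import lru_cache

CALLS = {"n": 0}  # instrumentation only: counts real lookups

@lru_cache(maxsize=1024)
def cached_search(query: str) -> tuple[str, ...]:
    CALLS["n"] += 1
    # ... the expensive ANN / vector-DB lookup would happen here ...
    return (f"doc for {query}",)

cached_search("reset password")
cached_search("reset password")  # served from cache, no second lookup
```

Caveat: the cache key is the exact query string, so normalization (lowercasing, whitespace) matters, and caches must be invalidated when the index updates — which ties directly into the staleness point above.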
Security Concerns
Prompt Injection via Retrieved Content: Malicious content in your corpus can inject instructions into the LLM context. Mitigation: Sanitize indexed content, use content delimiters, apply output validation.
Data Access Control: Ensuring users only retrieve documents they’re authorized to see. Solution: Metadata filtering with user permissions, document-level ACLs.
PII Leakage: Sensitive information in retrieved chunks may be exposed unintentionally. Solution: PII detection/redaction before indexing.
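Document-level access control is simplest to reason about as a filter applied *before* ranking, so unauthorized chunks never reach the LLM context at all. The chunk schema (`allowed_groups` field) is an illustrative assumption; real vector DBs express this as a metadata filter on the query.

```python
# Filter chunks by the requesting user's groups BEFORE ranking, so an
# unauthorized chunk can never leak into the prompt. Field names are
# illustrative, not a specific vector DB's schema.
def authorized(chunk: dict, user_groups: set[str]) -> bool:
    return bool(chunk["allowed_groups"] & user_groups)

def secure_retrieve(query: str, chunks: list[dict],
                    user_groups: set[str], k: int = 3) -> list[dict]:
    visible = [c for c in chunks if authorized(c, user_groups)]
    # ranking of `visible` omitted; naive: return the first k
    return visible[:k]

chunks = [
    {"text": "Q3 revenue was $5M.", "allowed_groups": {"finance"}},
    {"text": "Password reset steps: open Settings.", "allowed_groups": {"everyone"}},
]
```

Filtering post-retrieval (after ranking) also works, but pre-filtering is safer: a bug in the ranking code can't expose a chunk the user was never allowed to see.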
Where RAG Excels
Customer Support: Searching knowledge base / FAQs for relevant answers.
Document Q&A: Querying long documents without losing details.
Fact-Based Retrieval
Real-time Information: Combining current web search results with reasoning.