Overview

GraphRAG is an advanced Retrieval-Augmented Generation approach that augments traditional RAG with knowledge graph structures and hierarchical community detection. Instead of retrieving isolated text chunks based on vector similarity, GraphRAG extracts entities and their relationships from documents to build a structured knowledge graph, then uses community detection algorithms to create hierarchical summaries of the entire corpus.

The key innovation is that GraphRAG can answer global queries (questions about the entire corpus, like “What are the main themes in this dataset?”) that traditional vector-based RAG completely fails at, while also enhancing local queries with relationship context that would be lost in chunked text.

Problem with Standard RAG

Standard (naive) RAG retrieves chunks based purely on their semantic similarity to the query. This approach has a fundamental blind spot.

For a query like “What are the main themes across all documents?”, there is no single chunk that contains this answer. The query itself is not semantically similar to any specific chunk, so standard RAG returns low-relevance results.

How GraphRAG Solves This

GraphRAG introduces a two-phase approach:

Phase 1: Indexing (Offline Pre-computation)

  1. Extract entities (people, places, concepts) and relationships from all documents using an LLM
  2. Build a knowledge graph where nodes = entities, edges = relationships
  3. Apply community detection (Leiden algorithm) to cluster related entities
  4. Pre-generate summaries for each community at multiple hierarchy levels

Phase 2: Query Time

  • For local queries: Traverse entity neighborhoods in the graph
  • For global queries: Map-reduce over community summaries

Community Detection

The Leiden algorithm partitions the knowledge graph into “communities” of densely connected entities. The communities are hierarchical, and each entity belongs to exactly one leaf-level community.
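To make the clustering step concrete, here is a minimal sketch on a toy entity graph. networkx ships Louvain, the close predecessor of Leiden, so it is used as a stand-in here; a production GraphRAG pipeline would run Leiden itself (e.g. via the `graspologic` library).

```python
# Sketch: clustering a toy knowledge graph into communities.
# Louvain is used as a stand-in for Leiden (same family of
# modularity-based algorithms; Leiden fixes some Louvain defects).
import networkx as nx
from networkx.algorithms.community import louvain_communities

G = nx.Graph()
# Two densely connected entity clusters joined by one weak bridge edge.
G.add_edges_from([
    ("Alice", "Acme Corp"), ("Alice", "Bob"), ("Bob", "Acme Corp"),
    ("Paris", "Louvre"), ("Paris", "Seine"), ("Louvre", "Seine"),
    ("Bob", "Paris"),  # weak bridge between the clusters
])

# Returns a list of sets of nodes, one set per detected community.
communities = louvain_communities(G, seed=42)
```

On this graph the algorithm recovers the two dense clusters and assigns the bridge endpoints to their home cluster; in real GraphRAG the same procedure is applied recursively to build the community hierarchy.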

Architecture: The Full Pipeline

1. Indexing Phase (Expensive, Upfront)

Entity Extraction

GraphRAG uses multipart prompts to guide the LLM:

Prompt Structure:
1. "Identify all entities in this text (people, organizations, places, concepts)"
2. "For each entity, provide: name, type, and a brief description"
3. "Identify all relationships between entities"
4. "For each relationship, describe the connection"

Gleaning (Multi-pass extraction):
After initial extraction, follow-up prompt:
"Review the text again. Did you miss any entities or relationships?"
→ Significantly reduces information loss
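The extraction-plus-gleaning loop can be sketched as follows. `call_llm` is a hypothetical stand-in for a real model call, stubbed here so the control flow is runnable; the stopping rule (quit once a gleaning pass recovers nothing new) is an assumed policy, not the exact Microsoft implementation.

```python
# Sketch of multi-pass "gleaning" extraction with a stubbed LLM.
def call_llm(prompt: str) -> str:
    # Stub: pretend the first pass misses one entity and the
    # follow-up pass recovers it. A real implementation calls an LLM API.
    if "Did you miss" in prompt:
        return "ENTITIES: Seine"
    return "ENTITIES: Paris, Louvre"

def extract_with_gleaning(text: str, max_passes: int = 3) -> list[str]:
    entities: list[str] = []
    prompt = f"Identify all entities in this text:\n{text}"
    for _ in range(max_passes):
        response = call_llm(prompt)
        new = [e.strip() for e in response.split(":", 1)[1].split(",")]
        added = [e for e in new if e not in entities]
        if not added:  # a pass recovered nothing new -> stop early
            break
        entities.extend(added)
        prompt = f"Review the text again. Did you miss any entities?\n{text}"
    return entities

entities = extract_with_gleaning("Paris is home to the Louvre, on the Seine.")
```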

2. Query Phase

GraphRAG supports two distinct query modes:

Local Search (Entity-Centric)

Process:

  1. Use the LLM to extract entities from the query
  2. Find those entities in the knowledge graph
  3. Expand to neighboring nodes (1-2 hops)
  4. Retrieve associated text chunks and relationship descriptions
  5. Generate answer from retrieved context

Characteristics: Fast, targeted, lower cost per query.
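The steps above can be sketched with networkx: find the query entities in the graph, expand their k-hop neighborhoods, and collect relationship descriptions as retrieval context. The graph contents and the `desc` edge attribute are illustrative assumptions.

```python
# Sketch of local (entity-centric) search: locate query entities,
# expand a 1-2 hop neighborhood, collect relationship descriptions.
import networkx as nx

G = nx.Graph()
G.add_edge("Alice", "Acme Corp", desc="Alice founded Acme Corp")
G.add_edge("Acme Corp", "Bob", desc="Bob is Acme Corp's CTO")
G.add_edge("Bob", "Paris", desc="Bob relocated to Paris")

def local_search_context(query_entities: list[str], hops: int = 1) -> list[str]:
    nodes: set[str] = set()
    for entity in query_entities:
        if entity in G:
            # ego_graph returns the subgraph within `hops` of the entity
            nodes |= set(nx.ego_graph(G, entity, radius=hops).nodes)
    sub = G.subgraph(nodes)
    return sorted(d["desc"] for _, _, d in sub.edges(data=True))

# Entities would normally be extracted from the query by an LLM.
context = local_search_context(["Alice"])
```

Raising `hops` widens the retrieved neighborhood, trading precision for recall.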

Global Search (Corpus-Wide)

For questions like: “What are the main themes in this dataset?”

Query → Identify Relevant Community Level → Map-Reduce Over Summaries

Level Selection:
  Level 0 (1 community)   → Coarsest (entire corpus)
  Level 1 (~10 communities) → Broad themes
  Level 2 (~50 communities) → Sub-topics
  ...

Map-Reduce:
  Map: Send query + each community summary to LLM → Partial answers
  Reduce: Synthesize partial answers into final response

Process:

  1. Select appropriate hierarchy level based on query specificity
  2. For each community at that level, ask LLM: “Based on this summary, what’s relevant to the query?”
  3. Collect all partial answers
  4. Final LLM call: Synthesize partial answers into coherent response

Characteristics: Higher latency (10+ LLM calls), expensive (200K+ tokens), but answers previously impossible queries.
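The map-reduce flow can be sketched like this. The `llm` stub and the relevance heuristic are assumptions for runnability; a real system issues one model call per community summary (the map phase can run in parallel) and one final synthesis call.

```python
# Sketch of global search as map-reduce over community summaries.
community_summaries = {
    "c1": "This community centers on AI startups and their founders.",
    "c2": "This community covers European museums and landmarks.",
}

def llm(prompt: str) -> str:
    # Stub: report a summary as relevant if it mentions a query keyword.
    summary = prompt.split("SUMMARY:", 1)[1]
    if "AI" in summary:
        return "Relevant: discusses AI startups."
    return "Not relevant."

def global_search(query: str) -> str:
    # Map: score each community summary against the query independently.
    partials = [
        llm(f"QUERY: {query}\nSUMMARY: {s}")
        for s in community_summaries.values()
    ]
    # Reduce: keep useful partial answers and synthesize
    # (stubbed here as a join; really one more LLM call).
    useful = [p for p in partials if p != "Not relevant."]
    return " ".join(useful)

answer = global_search("What are the main themes around AI?")
```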

DRIFT Search (Dynamic Reasoning and Inference with Flexible Traversal)

DRIFT is a hybrid query mode introduced by Microsoft that bridges Local and Global search. It addresses queries that are too specific for pure Global search but require broader context than Local search provides.

Intuition: Think of DRIFT as “starting with a map, then zooming in.” It uses community summaries to get oriented, then dynamically drills down into specific graph neighborhoods based on what it discovers.

Three-Phase Process:

Phase 1: Primer
├─ Compare query against community reports at intermediate hierarchy levels
├─ Generate initial answer + follow-up questions
└─ Identify which communities/entities are most relevant

Phase 2: Follow-Up (Iterative)
├─ Take generated follow-up questions
├─ Execute local search for each question
├─ Gather specific facts from graph neighborhoods
└─ May generate additional follow-up questions (multi-hop)

Phase 3: Output Synthesis
├─ Combine primer insights + local search results
└─ Generate final comprehensive answer

When DRIFT Excels:

  • Queries that seem local but need broader context (“How does X’s work relate to the industry trends?”)
  • Questions where the optimal starting entities aren’t obvious from the query text
  • Multi-faceted queries that span multiple communities

Key Advantage: By incorporating community information early, DRIFT casts a wider net for relevant entities, leading to higher fact variety in final answers. Standard Local search might miss relevant entities if they don’t appear directly in the query.

Cost: Higher than Local (multiple LLM calls for follow-ups), but typically lower than Global (doesn’t require map-reduce over all communities).
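The three phases compose naturally as plain functions. Everything below (`primer`, `local_search`, `synthesize` and their outputs) is a hypothetical stub showing only the orchestration shape, not real GraphRAG components.

```python
# Sketch of DRIFT's three phases as function composition (all stubs).
def primer(query: str) -> tuple[str, list[str]]:
    # Phase 1: compare the query against community reports; return an
    # initial answer plus follow-up questions (stubbed).
    return "Initial: X relates to industry trend Y.", ["Who works on Y?"]

def local_search(question: str) -> str:
    # Phase 2: answer each follow-up from a graph neighborhood (stubbed).
    return f"Fact for '{question}'"

def synthesize(initial: str, facts: list[str]) -> str:
    # Phase 3: combine the primer insight with the gathered facts.
    return initial + " | " + "; ".join(facts)

def drift_search(query: str, max_follow_ups: int = 2) -> str:
    initial, follow_ups = primer(query)
    facts = [local_search(q) for q in follow_ups[:max_follow_ups]]
    return synthesize(initial, facts)

result = drift_search("How does X's work relate to industry trends?")
```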

Mathematically

Graph Representation

Given a corpus $D = \{d_1, d_2, \dots, d_n\}$, GraphRAG constructs:

  • Entity set: $E = \{e_1, e_2, \dots, e_m\}$, where each $e_i = (\text{name}, \text{type}, \text{description})$
  • Relationship set: $R \subseteq E \times E$, where each relationship carries a textual description
  • Knowledge graph: $G = (E, R)$

Community Hierarchy

After Leiden community detection:

$$\mathcal{C} = \{C^0, C^1, \dots, C^L\}$$

where $C^l$ represents communities at level $l$, and:

  • $C^0$ = single community (entire graph)
  • $C^L$ = finest communities (most granular)
  • $|C^l| \le |C^{l+1}|$ (more communities at finer levels)

For each community $c \in C^l$, a summary is pre-generated:

$$s_c = \text{LLM}\big(\text{entities}(c),\ \text{relationships}(c)\big)$$

Query Processing

Local Search retrieves a subgraph around the query entities $E_q$:

$$G_q = \bigcup_{e \in E_q} N_k(e)$$

where $N_k(e)$ is the $k$-hop neighborhood of entity $e$.

Global Search uses map-reduce over the summaries at the chosen level $l$:

$$\text{answer} = \text{Reduce}\big(\{\text{Map}(q, s_c) : c \in C^l\}\big)$$

Practical Application

When to Use GraphRAG

| Use Case | Why GraphRAG Helps |
| --- | --- |
| Thematic analysis | “What are the main topics in these 1000 documents?” |
| Relationship reasoning | “How does Company A connect to Person B?” |
| Multi-hop questions | Chains of relationships are naturally captured |
| Entity-centric QA | Graph structure provides rich entity context |
| Summarization of large corpora | Community summaries enable corpus-level understanding |
| Private enterprise data | Build a knowledge graph over internal documents |

When NOT to Use GraphRAG

| Scenario | Why It’s Overkill |
| --- | --- |
| Simple factual QA | Standard RAG suffices for direct fact retrieval |
| Single-document QA | No graph structure needed for one document |
| Latency-critical apps | Global search is too slow (400ms-2s+) |
| Frequently changing data | Re-indexing the knowledge graph is expensive |
| Budget constraints | Indexing could cost 10-50x more than standard RAG |
| Low entity density | A corpus with few extractable entities yields a sparse graph |

Cost Analysis

GraphRAG has significantly higher costs than standard RAG:

| Phase | Cost Driver | Typical Scale |
| --- | --- | --- |
| Indexing | LLM calls for entity extraction | ~1 call per chunk |
| Indexing | Gleaning (multi-pass extraction) | 2-3x extraction cost |
| Indexing | Community summarization | ~1 call per community |
| Global Query | Map phase (parallel) | 10-100 LLM calls |
| Global Query | Reduce phase | 1-3 LLM calls |
| Local Query | Subgraph retrieval + generation | 1-2 LLM calls |

Rule of Thumb: Indexing 1MB of text could cost roughly $10-15 (depending on LLM pricing). Global queries consume 200K+ tokens.

Indexing Cost Breakdown (Example)

For a corpus of 10,000 documents (~10MB of text):

Chunking: ~50,000 chunks (@ 600 tokens/chunk, 300 overlap)

Entity Extraction:
  - 50,000 LLM calls (1 per chunk)
  - With gleaning: +25,000 calls (50% additional)
  - ~75,000 calls total
  
Entity Resolution: CPU-only (minimal cost)

Community Detection: CPU-only (Leiden is efficient)

Community Summarization:
  - Depends on hierarchy depth
  - ~500-5,000 LLM calls for summaries

Embedding:
  - Standard embedding costs (~$0.0001/1K tokens)

Total: Heavy LLM dependency → $100-500+ for 10MB corpus
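The arithmetic behind this breakdown can be checked with a short back-of-envelope calculation. Every constant below (tokens per call, blended price, number of summary calls) is an illustrative assumption, not a measured price.

```python
# Back-of-envelope indexing cost estimate for the 10MB example above.
# All constants are illustrative assumptions.
CHUNKS = 50_000
GLEANING_FACTOR = 0.5          # 50% extra calls for gleaning passes
TOKENS_PER_CALL = 1_200        # assumed prompt + completion per call
PRICE_PER_1K_TOKENS = 0.002    # assumed blended LLM price (USD)
SUMMARY_CALLS = 2_000          # assumed communities across all levels

extraction_calls = int(CHUNKS * (1 + GLEANING_FACTOR))  # 75,000 calls
total_calls = extraction_calls + SUMMARY_CALLS
total_tokens = total_calls * TOKENS_PER_CALL
cost_usd = total_tokens / 1_000 * PRICE_PER_1K_TOKENS
```

With these assumptions the estimate lands inside the $100-500 range quoted above; doubling the token budget or price moves it proportionally.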

Comparisons

GraphRAG vs Standard RAG

| Aspect | Standard RAG | GraphRAG |
| --- | --- | --- |
| Index Structure | Vector store (flat) | Knowledge graph + communities (hierarchical) |
| Retrieval Method | Cosine similarity | Graph traversal + community lookup |
| Global Queries | Fails | Map-reduce over summaries |
| Multi-hop Reasoning | Requires explicit chaining | Implicit in graph structure |
| Indexing Cost | Low (embeddings only) | High (LLM extraction + summarization) |
| Query Latency | ~100-200ms | Local: ~200-400ms; Global: 1-5s |
| Data Freshness | Easy to update | Expensive to re-index |
| Explainability | Chunk citations | Entity + relationship provenance |

GraphRAG vs Multi-hop Reasoning

| Aspect | Multi-hop Reasoning | GraphRAG |
| --- | --- | --- |
| Relationship Discovery | Implicit (LLM infers) | Explicit (pre-extracted graph) |
| Query-time Cost | Multiple retrieval + LLM calls | Single subgraph retrieval |
| Error Propagation | Hop-1 errors cascade | Relationships are fixed at index time |
| Setup Cost | Low (standard RAG index) | High (knowledge graph construction) |
| Best For | Ad-hoc complex questions | Entity-relationship domains |

Technical Deep Dive: Community Summarization

The community summarization step is critical for global search to work. At each level of the hierarchy:

# Pseudo-code for community summary generation
def generate_community_summary(community):
    entities = get_entities_in_community(community)
    relationships = get_relationships_in_community(community)
    
    prompt = f"""
    You are analyzing a community of related entities.
    
    Entities:
    {format_entities(entities)}
    
    Relationships:
    {format_relationships(relationships)}
    
    Generate a comprehensive summary that captures:
    1. The main theme of this community
    2. Key entities and their roles
    3. Important relationships
    4. Any notable patterns or findings
    """
    
    return llm(prompt)

The resulting summaries serve as a “compressed reasoning context” that allows the LLM to reason about large portions of the corpus without processing individual chunks.

Common Pitfalls

1. Entity Extraction Quality

Problem: Poor LLM prompts lead to missing or incorrect entities, corrupting the knowledge graph.

Solution:

  • Use domain-specific few-shot examples in extraction prompts
  • Implement gleaning (multi-pass extraction)

2. Over-Granular Communities

Problem: Too many small communities make global search expensive and fragmented.

Solution:

  • Tune Leiden resolution parameter
  • Merge communities below a minimum size threshold
  • Use appropriate hierarchy level for queries
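One of these mitigations, merging undersized communities, can be sketched directly. The merge policy here (fold small communities into the largest one) is a simplification chosen to keep the sketch short; a real pipeline would merge each small community into its most-connected neighbor instead.

```python
# Sketch: fold communities below a minimum size into the largest
# community (simplified policy; see lead-in for the caveat).
def merge_small_communities(
    communities: list[set[str]], min_size: int = 3
) -> list[set[str]]:
    kept = [c for c in communities if len(c) >= min_size]
    small = [c for c in communities if len(c) < min_size]
    if not kept:  # everything is undersized: collapse into one community
        return [set().union(*small)] if small else []
    largest = max(kept, key=len)
    for c in small:
        largest |= c  # absorb each undersized community in place
    return kept

merged = merge_small_communities(
    [{"a", "b", "c", "d"}, {"e", "f", "g"}, {"h"}, {"i", "j"}]
)
```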

3. Stale Knowledge Graphs

Problem: When source documents update, the knowledge graph becomes outdated.

Solution:

  • Implement incremental indexing (update affected subgraphs only)
  • Track document versions and their graph contributions
  • Schedule periodic full re-indexing for data freshness

4. Cost Explosion at Scale

Problem: For very large corpora, indexing costs become prohibitive.

Solution:

  • Use smaller/cheaper LLMs for extraction (with quality tradeoff)
  • Sample documents for entity extraction, not 100% coverage
  • Consider hybrid approaches: GraphRAG for core entities, standard RAG for supporting content

Prompt Engineering for Entity Extraction

Example extraction prompt pattern:

Given the following text:
---
{text_chunk}
---

Extract all entities and relationships following this format:

ENTITIES:
- Name: [entity name]
  Type: [PERSON | ORGANIZATION | LOCATION | CONCEPT | EVENT]
  Description: [brief description based on the text]

RELATIONSHIPS:
- Source: [entity 1]
  Target: [entity 2]
  Relationship: [description of how they are related]

Be exhaustive. Include all entities mentioned, even if briefly.
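Output in this format must be parsed back into structured records before graph construction. A minimal parser for the entity block might look like the following; it assumes well-formed output and would need error handling for malformed model responses.

```python
# Sketch: parse the ENTITIES block of the extraction format above
# into dicts suitable for graph construction. Assumes well-formed output.
import re

def parse_entities(output: str) -> list[dict[str, str]]:
    entities = []
    # Grab the ENTITIES block (up to RELATIONSHIPS: or end of text).
    block = output.split("ENTITIES:", 1)[1].split("RELATIONSHIPS:", 1)[0]
    for m in re.finditer(
        r"- Name: (.+)\n\s*Type: (.+)\n\s*Description: (.+)", block
    ):
        entities.append({
            "name": m.group(1).strip(),
            "type": m.group(2).strip(),
            "description": m.group(3).strip(),
        })
    return entities

sample = """ENTITIES:
- Name: Alice
  Type: PERSON
  Description: Founder of Acme Corp
RELATIONSHIPS:
- Source: Alice
  Target: Acme Corp
  Relationship: founder of
"""
parsed = parse_entities(sample)
```

A production pipeline would typically ask the model for JSON instead, precisely to avoid this kind of fragile text parsing.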

Resources

Papers

Others


Back to: 01 - RAG Index