Overview
The context window is the maximum amount of text, measured in tokens, that an LLM can “see” at once, including both input and output.
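As a rough illustration, token counts can be measured with a tokenizer. The sketch below assumes OpenAI’s tiktoken library; other model families ship their own tokenizers, so exact counts vary by model.

```python
# Minimal sketch of measuring token usage with tiktoken (an assumed
# example tokenizer; counts differ across model families).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

prompt = "Explain the context window in one paragraph."
response = "The context window is the model's working memory..."

# Both the input and the generated output count toward the window.
total_tokens = len(enc.encode(prompt)) + len(enc.encode(response))
print(f"Tokens used: {total_tokens}")
```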
What Consumes the Context Window?
Everything that goes into the model counts toward the limit (a budget sketch follows this list):
- System Instructions: often invisible to users; typically 1,000-3,000 tokens
- Conversation History: both user messages and assistant responses
- Current Message
- Uploaded Documents
- Model’s Response: counts against the window as it is generated
- Tool Use / Function Calls
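To make the accounting concrete, here is a hypothetical budget check that tallies each component against a model’s limit. Every component string, the 200K limit, and the count_tokens helper are illustrative assumptions, not measurements of any particular model.

```python
import tiktoken

# Hypothetical budget check: all component contents below are
# placeholders, not real measurements.
CONTEXT_LIMIT = 200_000
enc = tiktoken.get_encoding("cl100k_base")

def count_tokens(text: str) -> int:
    return len(enc.encode(text))

components = {
    "system_instructions": "You are a helpful assistant...",
    "conversation_history": "...earlier user and assistant turns...",
    "current_message": "Summarize the attached report.",
    "uploaded_documents": "...full text of the report...",
    "tool_definitions": "...JSON schemas for available tools...",
}

used = sum(count_tokens(text) for text in components.values())
remaining = CONTEXT_LIMIT - used
print(f"Used {used} tokens; {remaining} left for the model's response.")
```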
Key Ideas
- Just because a model CAN handle 200K tokens doesn’t mean it processes all of them equally well. Performance often degrades with very long contexts.
“Lost in the Middle” Problem
Models don’t pay equal attention to all parts of the context window. They are best at using information that appears:
- At the very beginning (primacy effect)
- At the very end (recency effect)
Information in the middle of a very long context is often overlooked or forgotten.
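One common mitigation is to reorder passages so the most relevant ones sit at the start and end of the prompt, leaving the middle for the least important material. The function below is a sketch of that idea; the name and alternating scheme are assumptions for illustration, not a standard API.

```python
# Hypothetical mitigation sketch: place the most relevant passages at the
# edges of the prompt, where models attend best, and bury the least
# relevant ones in the middle.
def reorder_for_attention(passages: list[str]) -> list[str]:
    """Assumes `passages` is already sorted from most to least relevant."""
    front, back = [], []
    for i, passage in enumerate(passages):
        # Alternate: ranks 0, 2, 4... go to the front; 1, 3, 5... to the back.
        (front if i % 2 == 0 else back).append(passage)
    return front + back[::-1]  # reverse so relevance rises again at the end

ranked = ["most relevant", "second", "third", "fourth", "least relevant"]
print(reorder_for_attention(ranked))
# -> ['most relevant', 'third', 'least relevant', 'fourth', 'second']
```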
Context Window Management Strategies
Summarization and Compression
- When conversations get long, summarize earlier exchanges rather than keeping them verbatim.
- Start summarizing once the conversation exceeds roughly 30-40% of the context window (a sketch follows below).
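A minimal sketch of that policy, assuming a hypothetical summarize() call backed by an LLM, the tiktoken counter from above, and an arbitrary choice to keep the last four turns verbatim:

```python
import tiktoken

CONTEXT_LIMIT = 200_000
SUMMARIZE_AT = 0.35  # start compressing in the 30-40% band noted above
enc = tiktoken.get_encoding("cl100k_base")

def count_tokens(messages: list[dict]) -> int:
    return sum(len(enc.encode(m["content"])) for m in messages)

def summarize(messages: list[dict]) -> str:
    # Hypothetical: in practice this would be an LLM call that condenses
    # the given turns into a short summary.
    return "Summary of earlier conversation: ..."

def compress_history(messages: list[dict]) -> list[dict]:
    if count_tokens(messages) < CONTEXT_LIMIT * SUMMARIZE_AT:
        return messages  # still within budget; leave history untouched
    keep = messages[-4:]                # keep the most recent turns verbatim
    summary = summarize(messages[:-4])  # compress everything older
    return [{"role": "system", "content": summary}] + keep
```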
Chunking Large Documents - RAG (TODO)
Back to: ML & AI Index