Everyone's excited about 1M-token context windows. And they should be: they're genuinely useful. But I keep seeing teams reach for long context as the solution to every knowledge problem, and I think that's a mistake.
The case for retrieval
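When you stuff an entire codebase or document corpus into a context window, you pay for every token on every request, even though most of them are irrelevant to any given question. Retrieval lets you fetch only what's relevant, and done well, it surfaces the right 500 tokens instead of drowning the model in 200K noisy ones.
There's also a latency argument. A well-indexed retrieval system can return its results in milliseconds, which leaves the model a short prompt to process. Long-context inference scales poorly with token count: every extra token adds work before the first output token appears, and in standard self-attention that work grows quadratically with sequence length.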
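To make the "right 500 tokens" idea concrete, here is a minimal sketch of budgeted retrieval. It is an illustration under stated assumptions, not a recommended implementation: the scoring is plain bag-of-words cosine similarity (a real system would more likely use embeddings or BM25), token counts are approximated by word counts, and the names `tokenize`, `cosine`, `retrieve`, and `token_budget` are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Crude lowercase word tokenizer; a stand-in for a real model tokenizer.
    return re.findall(r"[a-z0-9_]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two term-count vectors (Counters).
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, token_budget=500):
    # Rank chunks by similarity to the query, then greedily keep the
    # best-scoring ones that still fit inside the token budget.
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(c))), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    picked, used = [], 0
    for score, chunk in scored:
        cost = len(tokenize(chunk))  # assumption: one word is roughly one token
        if score > 0 and used + cost <= token_budget:
            picked.append(chunk)
            used += cost
    return picked


chunks = [
    "The billing service retries failed charges three times.",
    "Our office is closed on public holidays.",
    "Failed charges are logged to the billing audit table.",
]
for chunk in retrieve("what happens to failed charges in billing", chunks, token_budget=20):
    print(chunk)
```

The budget is the point of the exercise: the prompt size stays fixed no matter how large the corpus grows, so cost and latency track the question rather than the knowledge base.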
Where long context actually wins
That said, there are cases where retrieval falls short:
- Tasks requiring global reasoning — summarizing across an entire document, detecting contradictions spread across sections.
- Short, well-scoped corpora — if your knowledge base fits in 50K tokens and changes rarely, just put it in.
- Prototype speed — retrieval adds infrastructure. For a quick demo, long context is fine.
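The three cases above can be written down as a small routing rule. This is only a sketch of the heuristic as stated: the function name and parameters are hypothetical, the 50K threshold is the figure from the list, and it assumes the corpus actually fits in the model's context window whenever long context is chosen.

```python
def choose_strategy(corpus_tokens, changes_rarely, needs_global_reasoning,
                    is_prototype, small_corpus_limit=50_000):
    # Returns "long_context" or "retrieval", following the three cases above.
    if needs_global_reasoning:
        # Summaries and cross-section contradiction checks need the whole text.
        return "long_context"
    if corpus_tokens <= small_corpus_limit and changes_rarely:
        # Small, stable knowledge base: just put it in the prompt.
        return "long_context"
    if is_prototype:
        # Skip the retrieval infrastructure for a quick demo.
        return "long_context"
    return "retrieval"


print(choose_strategy(30_000, changes_rarely=True,
                      needs_global_reasoning=False, is_prototype=False))
print(choose_strategy(2_000_000, changes_rarely=False,
                      needs_global_reasoning=False, is_prototype=False))
```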
The honest answer
Most production systems need both. Use retrieval to narrow the search space, then use context to reason over the retrieved set. The tricky part is making the retrieval layer good enough that it doesn't become the bottleneck.
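A rough sketch of that two-stage shape, with both stages passed in as plain functions. Everything here is an assumption for illustration: `answer`, `retrieve`, `generate`, and `max_chunks` are hypothetical names, and the stubs stand in for a real retriever and a real model call.

```python
from typing import Callable, List


def answer(question: str,
           retrieve: Callable[[str], List[str]],
           generate: Callable[[str], str],
           max_chunks: int = 8) -> str:
    # Stage 1: retrieval narrows the corpus to a small candidate set.
    candidates = retrieve(question)[:max_chunks]
    # Stage 2: the model reasons over all candidates together in one prompt.
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(candidates))
    prompt = (
        "Answer using only the sources below.\n\n"
        f"{context}\n\n"
        f"Question: {question}"
    )
    return generate(prompt)


def fake_retrieve(question: str) -> List[str]:
    # Stub retriever: a real one would query an index.
    return [
        "Failed charges are retried three times.",
        "Retries are logged to the audit table.",
    ]


def fake_generate(prompt: str) -> str:
    # Stub model: echoes the prompt so the assembled context is visible.
    return prompt


print(answer("What happens to failed charges?", fake_retrieve, fake_generate))
```

Keeping the retriever behind a function boundary makes it easy to measure on its own, which matters because anything it misses never reaches the model.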
That's what I've been working on — and I'll write more about the specifics soon.