
When RAG fails in production — and what to fix first

Common retrieval failure modes in enterprise settings: stale corpora, citation theater, chunking mismatches, and permission leaks — plus practical fixes.

April 12, 2026 · 9 min read

Retrieval-Augmented Generation is easy to demo and hard to operate. The failure isn’t always “bad embeddings.” Often it’s workflow: documents update constantly, ownership is messy, and users ask questions that don’t map cleanly to chunks.

Below are patterns we see repeatedly — generalized, not tied to any single client.

Stale corpora beat bad models

If your source documents change weekly and your index updates monthly, you’ll get confidently wrong answers. The model will sound authoritative because the citation exists, but the cited content is out of date.

Fix retrieval freshness first: define what “authoritative” means per source, automate ingest, and surface “last updated” context to users when helpful.
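One way to make the freshness gap visible is a small check that compares when each document last changed in its source system against when it was last indexed. The sketch below is illustrative only: `SourceDoc`, its fields, the per-source `max_lag` budget, and `find_stale` are hypothetical names, not part of any particular ingestion tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class SourceDoc:
    doc_id: str
    source_modified_at: datetime  # last change in the system of record
    indexed_at: datetime          # last successful (re)index of this document
    max_lag: timedelta            # assumed per-source freshness budget


def find_stale(docs: list[SourceDoc], now: datetime) -> list[str]:
    """Return IDs of documents that changed after they were indexed and
    have stayed unindexed for longer than their freshness budget."""
    stale = []
    for doc in docs:
        changed_since_index = doc.source_modified_at > doc.indexed_at
        lag = now - doc.source_modified_at
        if changed_since_index and lag > doc.max_lag:
            stale.append(doc.doc_id)
    return stale


now = datetime(2026, 4, 12, tzinfo=timezone.utc)
docs = [
    # Changed 10 days ago, indexed 30 days ago, budget of 1 day.
    SourceDoc("hr-policy", now - timedelta(days=10), now - timedelta(days=30), timedelta(days=1)),
    # Has not changed since it was indexed.
    SourceDoc("archive", now - timedelta(days=400), now - timedelta(days=30), timedelta(days=7)),
]
print(find_stale(docs, now))  # ['hr-policy']
```

The same two timestamps can also feed the “last updated” context shown to users, so the freshness budget and the user-facing signal stay consistent.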

Citation without verification is theater

Citations reduce anxiety, but they’re not proof. Users need confidence that the cited passage actually supports the claim — especially in regulated contexts.

Invest in evaluation: sample production questions, verify answer-to-source alignment, and track citation accuracy as a metric, not as a vibe.
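Tracking citation accuracy as a metric can start very simply: sample production answers, have a reviewer mark whether each cited passage supports the claim it is attached to, and compute the share that do. The sketch below assumes those human judgments already exist; `ReviewedCitation` and `citation_accuracy` are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReviewedCitation:
    question_id: str
    supported: bool  # reviewer's judgment: does the cited passage back the claim?


def citation_accuracy(reviews: list[ReviewedCitation]) -> Optional[float]:
    """Fraction of reviewed citations judged to support their claim.

    Returns None when nothing has been reviewed, so an empty sample
    is not mistaken for a perfect or a failing score.
    """
    if not reviews:
        return None
    return sum(r.supported for r in reviews) / len(reviews)


sample = [
    ReviewedCitation("q1", True),
    ReviewedCitation("q1", True),
    ReviewedCitation("q2", False),  # citation exists but does not support the answer
    ReviewedCitation("q3", True),
]
print(citation_accuracy(sample))  # 0.75
```

Reporting this number per source or per question type, rather than as a single global figure, tends to show where alignment breaks down.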

Chunking mismatches dominate “vector quality” debates

If chunk boundaries cut across semantic units, retrieval returns the wrong context. Tables, policies, and versioned PDFs are notorious for this.

Sometimes the right fix is structural: section-aware chunking, metadata filters, hybrid search, or retrieving at the document level first and the chunk level second.
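As a minimal sketch of section-aware chunking, the function below splits a document at heading lines instead of at a fixed character count, and keeps the section title as metadata that a filter could use later. It assumes the text has already been extracted with Markdown-style `#` headings; real PDFs and tables need a proper parser first. `chunk_by_section` is a hypothetical helper, not a library function.

```python
import re

# Assumption: headings look like "# Title" through "###### Title".
HEADING = re.compile(r"^#{1,6}\s+(.*)$")


def chunk_by_section(text: str, doc_id: str) -> list[dict]:
    """Split text into one chunk per section, keeping the heading as metadata."""
    # Start with an untitled section for any text before the first heading.
    sections: list[tuple[str, list[str]]] = [("", [])]
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            sections.append((match.group(1).strip(), []))
        else:
            sections[-1][1].append(line)

    chunks = []
    for title, body_lines in sections:
        body = "\n".join(body_lines).strip()
        if body:  # skip headings with no content under them
            chunks.append({"doc_id": doc_id, "section": title, "text": body})
    return chunks


policy = "Intro text.\n# Leave\nTwenty days per year.\n# Travel\nBook early.\n"
for chunk in chunk_by_section(policy, "handbook-v3"):
    print(chunk["section"], "->", chunk["text"])
```

Long sections would still need a secondary split, but starting from section boundaries keeps a policy clause or table from being cut in half.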

Permissions are a retrieval problem

The risk isn’t only “wrong answer” — it’s “right-looking answer sourced from something the user shouldn’t see.” Enforcement must happen in the retrieval path, not as an afterthought in the prompt.
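Enforcing permissions in the retrieval path means filtering candidates by access rights before anything is ranked or placed in the prompt. The sketch below does this in application code over an in-memory list; `Chunk`, `allowed_groups`, and `retrieve_for_user` are hypothetical names, and in practice the same filter would ideally be pushed into the search query itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    text: str
    allowed_groups: frozenset[str]  # groups permitted to read the source document
    score: float                    # relevance score from the retriever


def retrieve_for_user(candidates: list[Chunk], user_groups: set[str], k: int = 3) -> list[Chunk]:
    """Drop chunks the user cannot read, then return the top-k by score."""
    visible = [c for c in candidates if c.allowed_groups & user_groups]
    return sorted(visible, key=lambda c: c.score, reverse=True)[:k]


candidates = [
    Chunk("c1", "Q3 acquisition plan", frozenset({"finance"}), 0.95),
    Chunk("c2", "Public pricing sheet", frozenset({"sales", "finance"}), 0.80),
    Chunk("c3", "Sales playbook", frozenset({"sales"}), 0.60),
]
results = retrieve_for_user(candidates, {"sales"})
print([c.chunk_id for c in results])  # ['c2', 'c3']
```

The highest-scoring chunk never reaches the model for this user, so no prompt instruction is needed to keep it out of the answer.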

What to fix first

If you’re triaging a struggling RAG system, work in this order: freshness, permissioning, chunking and search architecture, evaluation, and only then model swaps.

Swapping embeddings or models without fixing retrieval mechanics is expensive churn with capped upside.
