RAG done right: building knowledge bases that actually work

ThinkFutura

Every company with documents wants a “chat with your data” system. The pitch is simple: upload your files, ask questions, get answers. The reality is messier.

We’ve built RAG systems across insurance, research, logistics, and professional services. Here’s what we’ve learned about making them actually useful.

The naive approach (and why it disappoints)

Most RAG tutorials show you the same thing: chunk your documents, embed them, retrieve the top-k results, and pass them to an LLM. It works in demos. It falls apart on real data.

Why? Because real enterprise data is:

  • Messy: PDFs with tables, scanned documents, spreadsheets, Slack threads
  • Contradictory: the 2023 policy says one thing and the 2024 update says another
  • Contextual: “revenue” means something different in finance vs. sales vs. HR

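Naive chunking destroys all of this context. A fixed-size split can cut a table in half, separate a clause from the date that says which policy version it belongs to, and strip a term like “revenue” of the department that defines it.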

What actually matters

Chunking strategy

Don’t chunk blindly by token count. Respect document structure: a section heading and its content should stay together, and a table should never be split across chunks.
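As a rough illustration of structure-aware chunking, the sketch below splits a document at headings and, when a section is too long, only at blank lines, so a table (which has no blank lines inside it) always stays whole. It assumes Markdown-style "#" headings and pipe tables; the function name chunk_document and the max_chars parameter are hypothetical, and real PDFs or scans would need a proper parser first.

```python
import re

# Assumption: headings look like Markdown ("# Title", "## Subtitle").
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def chunk_document(text: str, max_chars: int = 1200) -> list[dict]:
    """Split text into chunks that never cross a heading or cut a block."""
    # Pass 1: group lines under the heading they belong to.
    sections = []
    heading, lines = None, []
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            sections.append((heading, lines))
            heading, lines = match.group(2), []
        else:
            lines.append(line)
    sections.append((heading, lines))

    # Pass 2: pack blank-line-separated blocks into chunks per section.
    chunks = []
    for heading, lines in sections:
        body = "\n".join(lines)
        blocks = [b.strip() for b in re.split(r"\n\s*\n", body) if b.strip()]
        current = ""
        for block in blocks:
            candidate = f"{current}\n\n{block}" if current else block
            if current and len(candidate) > max_chars:
                # Close the chunk at a block boundary, never inside a block.
                chunks.append({"heading": heading, "text": current})
                current = block
            else:
                current = candidate
        if current:
            chunks.append({"heading": heading, "text": current})
    return chunks


doc = (
    "# Claims\n\nIntro paragraph.\n\n"
    "| Type | Limit |\n|---|---|\n| Fire | 10k |\n\n"
    "# Exclusions\n\nFlood is excluded."
)
for chunk in chunk_document(doc, max_chars=30):
    print(chunk["heading"], "->", repr(chunk["text"]))
```

Every chunk keeps its heading alongside the text. A block larger than max_chars, such as a long table, is emitted as one oversized chunk rather than being cut.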

For complex documents, we use hierarchical chunking: summaries at the document level, detailed chunks at the section level. The retriever can pull from both.
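One possible shape for a two-level index is sketched below. The Chunk fields, the level names, and the word-overlap score are all assumptions made for illustration; a real system would score with embeddings and generate the document summaries with a model.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str
    level: str  # "document" for a summary, "section" for a detailed chunk
    text: str


def overlap_score(query: str, text: str) -> float:
    """Toy relevance score: fraction of query words that appear in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


def retrieve(query: str, chunks: list[Chunk], k_per_level: int = 1) -> list[Chunk]:
    """Return the best summaries and the best section chunks for a query."""
    results = []
    for level in ("document", "section"):
        pool = [c for c in chunks if c.level == level]
        pool.sort(key=lambda c: overlap_score(query, c.text), reverse=True)
        results.extend(pool[:k_per_level])
    return results


index = [
    Chunk("policy-2024", "document", "Summary: home policy covering fire and theft with flood exclusions"),
    Chunk("policy-2024", "section", "Fire damage is covered up to the insured limit"),
    Chunk("policy-2024", "section", "Flood damage is excluded unless the flood rider is purchased"),
    Chunk("handbook", "document", "Summary: employee handbook covering leave and expenses"),
]
for hit in retrieve("flood rider", index):
    print(hit.level, "|", hit.text)
```

Pulling from each level separately keeps the summaries from being crowded out by the more numerous section chunks.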

Metadata is everything

Every chunk should carry metadata: source document, date, author, section, document type. This lets you filter before you retrieve — which is faster and more accurate than hoping the embedding space separates everything cleanly.
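To make “filter before you retrieve” concrete, here is a minimal sketch that narrows the candidate set by metadata and only then ranks by cosine similarity. The field names, the search signature, and the two-dimensional embeddings are illustrative assumptions; most vector stores expose a comparable metadata filter, but their APIs differ.

```python
import math
from dataclasses import dataclass
from datetime import date


@dataclass
class Chunk:
    text: str
    embedding: list[float]
    source: str
    doc_type: str
    published: date


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query_embedding, chunks, k=3, doc_type=None, published_since=None):
    """Apply metadata filters first, then rank what is left by similarity."""
    candidates = [
        c for c in chunks
        if (doc_type is None or c.doc_type == doc_type)
        and (published_since is None or c.published >= published_since)
    ]
    candidates.sort(key=lambda c: cosine(query_embedding, c.embedding), reverse=True)
    return candidates[:k]


chunks = [
    Chunk("Claims must be filed within 30 days.", [1.0, 0.0], "policy-2023.pdf", "policy", date(2023, 3, 1)),
    Chunk("Claims must be filed within 60 days.", [1.0, 0.0], "policy-2024.pdf", "policy", date(2024, 2, 1)),
]
# Both chunks embed identically; only the date filter can tell them apart.
top = search([1.0, 0.0], chunks, k=1, doc_type="policy", published_since=date(2024, 1, 1))
print(top[0].source)
```

The two policy chunks are deliberately given the same embedding: similarity alone cannot choose between the 2023 and 2024 versions, while the date filter can.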

Hybrid retrieval

Pure vector search misses exact matches. Pure keyword search misses semantic similarity. Use both. Combining BM25 and vector search with reciprocal rank fusion, which merges the two ranked lists by summing each result’s reciprocal rank, consistently outperforms either alone.
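Reciprocal rank fusion itself is only a few lines. The sketch below assumes each retriever has already returned a ranked list of document IDs; the constant k=60 is the value commonly used in the literature, not something tuned for any particular dataset.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: each list adds 1 / (k + rank) to a document's score."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


bm25_hits = ["a", "b", "c"]    # hypothetical keyword ranking
vector_hits = ["b", "d", "a"]  # hypothetical embedding ranking
print(reciprocal_rank_fusion([bm25_hits, vector_hits]))
# -> ['b', 'a', 'd', 'c']
```

Documents that both retrievers rank well ("a" and "b") rise to the top, and because only ranks are used, the BM25 and cosine scores never have to be put on a common scale.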

Evaluation before scaling

Before you add more documents or more features, build an evaluation set: 50 to 100 question-answer pairs that represent real user queries. Run them after every change, so that if accuracy drops, you catch it immediately.
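A regression harness for such an evaluation set can start very small. In the sketch below, answer_fn stands in for the whole RAG pipeline and the substring check is a deliberately crude stand-in for a real grading step; both are assumptions, as are the example questions.

```python
def evaluate(answer_fn, eval_set):
    """Return (accuracy, failed questions) for a list of question/expected pairs."""
    failures = []
    for case in eval_set:
        answer = answer_fn(case["question"])
        # Crude check: the expected phrase must appear in the answer.
        if case["expected"].lower() not in answer.lower():
            failures.append(case["question"])
    accuracy = 1 - len(failures) / len(eval_set) if eval_set else 0.0
    return accuracy, failures


eval_set = [
    {"question": "What is the filing deadline?", "expected": "60 days"},
    {"question": "Is flood damage covered?", "expected": "excluded"},
]


def fake_pipeline(question: str) -> str:
    """Hypothetical stand-in for the real system under test."""
    canned = {
        "What is the filing deadline?": "Claims must be filed within 60 days.",
        "Is flood damage covered?": "Flood damage is covered.",
    }
    return canned.get(question, "I don't know.")


accuracy, failures = evaluate(fake_pipeline, eval_set)
print(f"accuracy={accuracy:.2f}", failures)
```

Returning the failed questions alongside the score matters as much as the number: they show which queries a change broke.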

The access control problem

Enterprise knowledge bases need permissions. Not everyone should see everything. This is the part most prototypes skip and most production systems struggle with.

Our approach: inherit permissions from the source system. If a document lives in a restricted SharePoint folder, the chunks from that document carry the same restrictions. Filter at retrieval time, not after generation, so restricted text never reaches the model’s prompt.
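The pattern can be sketched as follows: each chunk copies the access list of its source document at ingestion time, and the retriever drops anything the user cannot see before ranking. The group names, the allowed_groups field, and the word-overlap ranking are assumptions for illustration; how permissions are actually read from SharePoint or any other source system is outside this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    source_path: str
    allowed_groups: frozenset


def chunks_from_document(path, sections, source_acl):
    """Create chunks that inherit the source document's access list."""
    acl = frozenset(source_acl)
    return [Chunk(text=s, source_path=path, allowed_groups=acl) for s in sections]


def retrieve(query, chunks, user_groups, k=3):
    """Filter by permission first, then rank the visible chunks."""
    groups = set(user_groups)
    visible = [c for c in chunks if c.allowed_groups & groups]
    words = set(query.lower().split())
    visible.sort(key=lambda c: len(words & set(c.text.lower().split())), reverse=True)
    return visible[:k]


index = (
    chunks_from_document("hr/salaries.docx", ["salary bands for 2024"], {"hr"})
    + chunks_from_document("wiki/leave.md", ["annual leave and salary payment dates"], {"hr", "staff"})
)
for hit in retrieve("salary bands", index, user_groups={"staff"}):
    print(hit.source_path)
```

A user in the "staff" group only ever ranks the wiki chunk, even though the restricted HR chunk is the better match for the query.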

When RAG isn’t enough

Sometimes the answer isn’t in a single document. It requires synthesizing information across multiple sources, applying business logic, or doing calculations.

For these cases, we layer agentic workflows on top of RAG: the system can retrieve, reason, and call tools (like a calculator or an API) to construct the answer.
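As a loose sketch of the tool-calling layer, the example below registers two tools and dispatches calls to them. The call format ({"tool": ..., "input": ...}), the lookup_monthly_premium helper, and its data are all hypothetical; in a real system the model would decide which calls to make, and the lookup would be a retrieval step.

```python
import ast
import operator

_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expression: str) -> float:
    """Evaluate +, -, *, / arithmetic without calling eval()."""
    def walk(node):
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"unsupported expression: {expression!r}")
    return walk(ast.parse(expression, mode="eval").body)


def lookup_monthly_premium(policy_id: str) -> float:
    """Hypothetical stand-in for a retrieval step over policy documents."""
    return {"P-100": 100.0}[policy_id]


TOOLS = {"calculator": calculator, "lookup_monthly_premium": lookup_monthly_premium}


def run_tool_call(call: dict):
    """Dispatch one tool call of the assumed form {"tool": name, "input": text}."""
    if call["tool"] not in TOOLS:
        raise KeyError(f"unknown tool: {call['tool']}")
    return TOOLS[call["tool"]](call["input"])


# Retrieve a fact, then compute with it: the annual cost of policy P-100.
monthly = run_tool_call({"tool": "lookup_monthly_premium", "input": "P-100"})
annual = run_tool_call({"tool": "calculator", "input": f"{monthly} * 12"})
print(annual)
```

The calculator walks the parsed expression instead of calling eval(), so a model-generated input can only ever do arithmetic.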

Start small, stay honest

The best RAG systems we’ve built started with a single document type and a single user group. They grew from there, guided by real usage data and real feedback.

If your users aren’t finding answers, the solution is rarely “add more documents.” It’s usually “understand why retrieval is failing for these specific queries, then fix that.”