[PERDITION//SEC]Contact
back to writing
// ai-security   2026-02-02  ·  9 min

Securing RAG pipelines: the threat model nobody draws

RAG turns every document in your knowledge base into a potential instruction set for your agent. Most teams don't draw the threat model that follows from that. Here's the one I use.

Retrieval-Augmented Generation is the dominant architecture pattern for production LLM applications right now. It's also the one with the messiest threat model, because RAG fundamentally turns every document in your knowledge base into a potential instruction set for whatever LLM is doing the answering. Most teams I review have not drawn that threat model out, and the ones who have have usually drawn the wrong one.

The threat model I use for RAG has four layers, and you should think about controls at each one. The first is the ingestion layer: who can put a document into the index, and what verification happens before they can. If anyone with a sales-team account can upload a PDF that becomes part of the corpus that answers customer queries, you have an ingestion problem. The second is the storage layer: are tenant boundaries actually enforced at retrieval time, or are you relying on the LLM to honor instructions about whose data it can return? The third is the retrieval layer: are documents from one tenant ever scored against queries from another? The fourth is the generation layer: what does the LLM do with retrieved content, and what is it allowed to do as a result of reading it?

The most common failure I see is at layer four. The application retrieves a document, passes it to the LLM as context, and the LLM treats the document content as instructions. Indirect prompt injection through corpus poisoning is the dominant exploit path I find in RAG systems, and it's usually achievable by anyone with write access to the corpus — which, in a sales-engineering or knowledge-base context, can be hundreds of people. The fix is the same as it is for direct prompt injection: structurally prevent retrieved content from influencing the agent's plan, by only allowing it to influence the answer.

Layer two — tenant isolation — is where I find the second-most-common class of bugs. A surprising number of multi-tenant RAG systems use a single shared vector index with a tenant filter applied at query time. That filter is exactly as reliable as the application code that applies it, which is to say, not reliable at all under adversarial conditions. The right answer for any system handling customer or regulated data is per-tenant indexes (or per-tenant namespaces in a system that supports them), with the isolation enforced at the storage layer, not at the application layer.

If you're building RAG and you can't answer four questions — who can write to the corpus, how is tenant isolation enforced, can retrieved content modify the agent's plan, what is the agent permitted to do as a result of reading any document — your threat model has gaps. Draw the picture. Stress-test it adversarially. The interesting failures are not at the model layer.