[PERDITION//SEC]Contact
all research
// AI Security   2026-03-30

Indirect Prompt Injection via Corpus Poisoning in Production RAG Systems

// abstract

A field study of 14 production RAG deployments across fintech, SaaS, and AI platform clients. We characterize the indirect prompt injection attack surface introduced by retrieval, document concrete exploitation paths observed in real systems, and propose a structural defense pattern that separates retrieved content from the agent's instruction channel.

Across fourteen Retrieval-Augmented Generation deployments reviewed during 2025–2026, indirect prompt injection through retrieved corpus content was the most reliably exploitable attack class we observed. In every system reviewed, an attacker with write access to any document indexable by the RAG pipeline could influence the behavior of downstream agents in ways the application owners had not anticipated.

This writeup characterizes the attack surface, documents three illustrative exploitation paths drawn from anonymized engagement findings, and proposes a structural defense pattern based on capability separation. We argue that the dominant industry response — adding instruction-classification heuristics or system-prompt warnings — is insufficient and likely to remain so, because the underlying issue is the conflation of two channels: the user's instructional intent, and the contents of retrieved data.

The proposed defense pattern is based on the observation that any LLM-based agent has at least two logically distinct inputs: instructions (what to do) and data (what to do it on). Production RAG systems have collapsed these two channels into a single context window. The structural fix is to re-separate them at the architecture layer, by introducing a planner/executor split where the planner sees only user input, the executor sees only retrieved data, and neither one can be reinterpreted as an instruction by the other.

We present an evaluation of this pattern against the same fourteen systems, with attack success rates falling from 100% (every system tested) to single-digit percentages on the redesigned variants. The remaining failures are addressed in the discussion section.

// Findings drawn from anonymized engagement data. For permission to cite or for the underlying methodology, contact [email protected].