The Context Layer Problem
An Attack With No Exploit
The following scenario is a composite based on multiple documented incidents reported since 2024.
A company’s AI assistant sent a confidential pricing spreadsheet to an external email address. The security team found no malware, no compromised credentials, no insider threat. The model itself worked exactly as designed.
What happened? An employee asked the assistant to summarize a vendor proposal. Buried deep in the PDF was a short instruction telling the assistant to forward internal financial data to an external address.
The assistant followed the instruction. It had the permissions. It did what it was told.
Variations of this attack have been documented across enterprise deployments since 2024. The base model was never the vulnerability. The context layer was.
Why This Matters Now
Between 2024 and early 2026, a pattern emerged across enterprise AI incidents. Prompt injection, RAG data leakage, automated jailbreaks, and Shadow AI stopped being theoretical concerns. They showed up in production copilots, IDE agents, CRMs, office suites, and internal chatbots.
The common thread: none of these failures required breaking the model. They exploited how enterprises connected models to data and tools.
The Trust Problem No One Designed For
Traditional software has clear boundaries. Input validation. Access controls. Execution sandboxes. Code is code. Data is data.
Large language models collapse this distinction. Everything entering the context window is processed as natural language. The model cannot reliably tell the difference between “summarize this document” from a user and “ignore previous instructions” embedded in that document.
This creates a fundamental architectural tension. The more useful an AI system becomes (connecting it to email, documents, APIs, and tools), the larger the attack surface becomes.
Five Failure Modes In The Wild
Direct prompt injection is the simplest form. Attacker-controlled text tells the model to ignore prior instructions or perform unauthorized actions. In enterprise systems, this happens when untrusted content like emails, tickets, or CRM notes gets concatenated directly into prompts. One documented case involved a support ticket containing hidden instructions that caused an AI agent to export customer records.
Indirect prompt injection is subtler and harder to defend. Malicious instructions hide in documents the system retrieves during normal operation: PDFs, web pages, wiki entries, email attachments. The orchestration layer treats retrieved content as trusted, so these injected instructions can override system prompts. Researchers demonstrated this by planting instructions in public web pages that corporate AI assistants later retrieved and followed.
RAG data leakage often happens without any jailbreak at all. The problem is upstream: overly broad document embedding, weak vector store access controls, and retrieval logic that ignores user permissions. In several documented cases, users retrieved and summarized internal emails, HR records, strategy documents, and API keys simply by crafting semantic queries. The model did exactly what it was supposed to do. The retrieval pipeline was the gap.
Agentic tool abuse raises the stakes. When models can call APIs, modify workflows, or interact with cloud services, injected instructions translate into real actions. Security researchers demonstrated attacks where a planted instruction in a GitHub issue caused an AI coding agent to exfiltrate repository secrets. The agent had the permissions. It followed plausible-looking instructions. No human approved the action.
Shadow AI sidesteps enterprise controls entirely. Employees frustrated by slow IT approvals or restrictive policies copy sensitive data into personal ChatGPT accounts, unmanaged tools, or browser extensions. Reports from 2024 and 2025 link Shadow AI to a significant portion of data breaches, higher remediation costs, and repeated exposure of customer PII. The data leaves the building through the front door.
Threat Scenario
Consider a company that deploys an AI assistant with access to Confluence, Jira, Slack, and the ability to create calendar events and send emails on behalf of users. An attacker gets a job posting shared in a public Slack channel. They apply, and their resume (a PDF) contains invisible text: instructions telling the AI to forward any messages containing “offer letter” or “compensation” to an external address, then delete the forwarding rule from the user’s settings.
A recruiter asks the AI to summarize the candidate’s resume. The AI ingests the hidden instructions. Weeks later, offer letters start leaking. The forwarding rule is gone. Logs show the AI took the actions, but the AI has no memory of why.
The individual behaviors described here have already been observed in production systems. What remains unresolved is how often they intersect inside a single workflow. These are not edge cases. They are ordinary features interacting in ways most enterprises have not threat-modeled.
What The Incidents Reveal
Across documented failures, the base model is rarely the point of failure. Defenses break at three layers:
Context assembly. Systems concatenate untrusted content without sanitization, origin tagging, or priority controls. The model cannot distinguish between instructions from the system prompt and instructions from a retrieved email.
Trust assumptions. Orchestration layers assume retrieved content is safe, that model intent aligns with user authorization, and that probabilistic guardrails will catch adversarial inputs. As context windows grow and agents gain autonomy, these assumptions fail.
Tool invocation. Agentic systems map model output directly to API calls without validating that the action matches user intent, checking privilege boundaries, or requiring human approval for sensitive operations.
This is why prompt injection now holds the top position in the OWASP GenAI Top 10. Security researchers increasingly frame AI systems not as enhanced interfaces but as new remote code execution surfaces.
What This Means For Enterprise Teams
Security teams now face AI risk that spans application security, identity management, and data governance simultaneously. Controls must track where instructions originate, how context gets assembled, and when tools are invoked. Traditional perimeter defenses do not cover these attack vectors.
Platform and engineering teams need to revisit RAG and agent architectures. Permission-aware retrieval, origin tagging, instruction prioritization, and policy enforcement at the orchestration layer are becoming baseline requirements. Tool calls based solely on model output represent a high-blast-radius design choice that warrants scrutiny.
Governance and compliance teams must address Shadow AI as a structural problem, not a policy problem. Employees route around controls when sanctioned pathways are slow, restrictive, or unavailable. Without alternatives that offer logging, masking, and monitoring, the data continues to leak through unofficial channels.
Across all functions, AI systems increasingly require the same rigor applied to other production infrastructure: least-privilege execution, access control, auditability, and incident response playbooks.
Unresolved Questions
Several problems remain open. Multi-agent systems complicate attribution when failures occur: which agent made the decision? Long-term memory features create new persistence risks: how do you detect and remove instructions planted months ago? Researchers are exploring whether sensitive source documents can be reconstructed through repeated querying of RAG systems, even when individual outputs appear sanitized.
AI security is still evolving from reactive, guardrail-based controls toward structural, system-level defenses. The vendors with the most sophisticated models are not necessarily the ones with the most secure integration patterns. That gap will define the next wave of incidents.
The Bottom Line
The most consequential AI security failures are not happening at the model layer. They are happening in the plumbing: the context assembly, the retrieval pipelines, the tool orchestration, the trust boundaries that enterprises did not know they needed to draw.
Attackers have noticed. Enterprises should too.
Further Reading
OWASP GenAI Top 10
Microsoft Security Response Center blogs on indirect prompt injection
IBM Data Breach Report 2025
WE45 research on RAG leakage
Obsidian Security prompt injection analysis
Aikido GitHub Actions agent research
Menlo Security Shadow AI reports