G360 Technologies

The Threat Room

When AI Agents Act, Identity Becomes the Control Plane

A product team deploys an AI agent to handle routine work across Jira, GitHub, SharePoint, and a ticketing system. It uses delegated credentials, reads documents, and calls tools to complete tasks. A month later, a single poisoned document causes the agent to pull secrets and send them to an external endpoint. The audit log shows “the user” performed the actions because the agent acted under the user’s token. The incident is not novel malware. It is identity failure in an agent-shaped wrapper.

Between late 2025 and early 2026, regulators and national cyber authorities started describing autonomous AI agents as a distinct security problem, not just another application. NIST’s new public RFI frames agent systems as software that can plan and take actions affecting real systems, and asks industry for concrete security practices and failure cases. (Federal Register) At the same time, FINRA put “AI agents” into its 2026 oversight lens, calling out autonomy, scope, auditability, and data sensitivity as supervisory and control problems for member firms. (FINRA) Gartner has put a number on the trajectory: by 2028, 25% of enterprise breaches will be traced to AI agent abuse. That prediction reflects a shift in where attackers see opportunity. (gartner.com)

Enterprises have spent a decade modernizing identity programs around humans, service accounts, and APIs. AI agents change the shape of “who did what” in ways those programs were not designed for.

The UK NCSC’s December 2025 guidance makes the core point directly: prompt injection is not analogous to SQL injection, and it may remain a residual risk that cannot be fully eliminated with a single mitigation. That pushes enterprise strategy away from perfect prevention and toward containment, privilege reduction, and operational controls. (NCSC)

Why Agents Are Not Just Service Accounts

Security teams may assume existing non-human identity controls apply. They do not fully transfer. Service accounts run fixed, predictable code. Agents run probabilistic models that decide what to do based on inputs, including potentially malicious inputs. A service account that reads a poisoned document does exactly what its code specifies. An agent that reads the same document might follow instructions embedded in it. The difference: agents can be manipulated through their inputs in ways that service accounts cannot.

How the Mechanism Works

1. Agents collapse “identity” and “automation” into one moving target

Most agents are orchestration layers around a model that can decide which tools to call. The identity risk comes from how agents authenticate and how downstream systems attribute actions: when an agent acts under a user’s delegated token, downstream systems log the user, not the agent.

2. Indirect prompt injection turns normal inputs into executable instructions

Agents must read information to work. If the system cannot reliably separate “data to summarize” from “instructions to follow,” untrusted content can steer behavior. NCSC’s point is structural: language models do not have a native, enforceable boundary between data and instructions the way a parameterized SQL query does. That is why “filter harder” is not a complete answer. (NCSC)

A practical consequence: any agent that reads external or semi-trusted content (docs, tickets, wikis, emails, web pages) has a standing exposure channel.
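
One way to make that consequence actionable is to track where each piece of context came from. The following minimal Python sketch (the Origin labels, ContextSegment type, and tool_call_allowed rule are hypothetical, not drawn from any cited guidance or product) illustrates origin tagging plus a simple containment rule: high-risk tool calls are refused while untrusted retrieved content sits in the context window. This is containment, not prevention, in line with NCSC’s point that injection cannot simply be filtered away.

```python
from dataclasses import dataclass
from enum import Enum

class Origin(Enum):
    SYSTEM = "system"        # the operator's own instructions
    USER = "user"            # the authenticated requester
    RETRIEVED = "retrieved"  # docs, tickets, wikis, emails: untrusted by default

@dataclass
class ContextSegment:
    origin: Origin
    source_id: str  # e.g. a document URI, kept for audit
    text: str

def build_prompt(segments: list[ContextSegment]) -> str:
    """Concatenate context while preserving origin labels.

    Labels do not stop injection on their own; they give the orchestration
    layer something enforceable to act on.
    """
    return "\n\n".join(
        f"[origin={s.origin.value} source={s.source_id}]\n{s.text}" for s in segments
    )

def tool_call_allowed(segments: list[ContextSegment], tool_name: str,
                      high_risk_tools: set[str]) -> bool:
    """Containment rule: refuse high-risk tools while untrusted content is in context."""
    untrusted_present = any(s.origin is Origin.RETRIEVED for s in segments)
    return not (untrusted_present and tool_name in high_risk_tools)

# Example: a retrieved wiki page is in context, so a secrets-reading tool is refused.
segments = [
    ContextSegment(Origin.SYSTEM, "policy", "You are a project assistant."),
    ContextSegment(Origin.USER, "alice", "Summarize the attached page."),
    ContextSegment(Origin.RETRIEVED, "wiki/page-42", "<page text, possibly poisoned>"),
]
print(tool_call_allowed(segments, "read_secrets", {"read_secrets", "send_external"}))  # False
```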

3. Tool protocols like MCP widen the blast radius by design

The Model Context Protocol (MCP) pattern connects models to tools and data sources. It is powerful, but it also concentrates risk: an agent reads tool metadata, chooses a tool, and invokes it. Real-world disclosures in the MCP ecosystem have repeatedly mapped back to classic security failures: lack of authentication, excessive privilege, weak isolation, and unsafe input handling. One example is CVE-2025-49596 (MCP Inspector), where a lack of authentication between the inspector client and proxy could lead to remote code execution, according to NVD. (NVD) Separately, AuthZed’s timeline write-up shows that MCP server incidents often look like “same old security fundamentals,” but in a new interface where the agent’s reasoning decides what gets executed. (AuthZed)

4. Agent supply chain risk is identity risk

Agent distribution and “prompt hub” patterns create a supply chain problem: you can import an agent configuration that quietly routes traffic through attacker infrastructure. Noma Security’s AgentSmith disclosure illustrates this clearly: a malicious proxy configuration could allow interception of prompts and sensitive data, including API keys, if users adopt or run the agent. (Noma Security)

5. Attack speed changes response requirements

Unit 42 demonstrated an agentic attack framework in which a simulated ransomware attack chain, from initial compromise to exfiltration, took 25 minutes. They reported a 100x speed increase from using AI across the chain. (Palo Alto Networks) To put that in operational terms: a typical SOC alert-to-triage cycle can exceed 25 minutes. If the entire attack completes before triage begins, detection effectively becomes forensics.

What This Looks Like from the SOC

Consider what a security operations team actually sees when an agent-based incident unfolds: actions attributed to a legitimate user, no malware artifacts to pivot on, and an agent that cannot explain why it acted. The delay between “something is wrong” and “we understand what happened” is where damage compounds.

Now Scale It

The opening scenario described one agent, one user, one poisoned document. Now consider a more realistic enterprise picture: dozens of agents with overlapping permissions reading from shared document stores, and a single poisoned file landing in one of them. How many agents read it? Which ones act on it? Which credentials are exposed? Which downstream systems are affected? The attack surface is not one agent. It is the graph of agents, permissions, and shared data sources. A single poisoned input can fan out across that graph faster than any human review process can catch it.
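
To make the fan-out concrete, the short Python sketch below walks a hypothetical inventory (the agent names and the AGENT_READS / AGENT_CAN_ACT_ON mappings are invented for illustration) of which agents read which shared sources and which systems their credentials can act on, and estimates the blast radius of a single poisoned input.

```python
from collections import deque

# Hypothetical inventory, invented for illustration: which agents read which
# shared sources, and which systems each agent's credentials can act on.
AGENT_READS = {
    "triage-agent": {"ticket-queue", "shared-wiki"},
    "coding-agent": {"shared-wiki", "github-org"},
    "report-agent": {"shared-wiki", "sharepoint-finance"},
}
AGENT_CAN_ACT_ON = {
    "triage-agent": {"ticket-queue", "email-gateway"},
    "coding-agent": {"github-org", "ci-secrets"},
    "report-agent": {"sharepoint-finance", "email-gateway"},
}

def blast_radius(poisoned_source: str) -> tuple[set[str], set[str]]:
    """Estimate exposure from one poisoned input.

    Any agent that reads the poisoned source may be steered; every system that
    agent can act on becomes reachable, and if one of those systems is itself a
    shared source, the poison can propagate to the agents that read it.
    """
    all_read_sources = {s for reads in AGENT_READS.values() for s in reads}
    exposed_agents: set[str] = set()
    reachable_systems: set[str] = set()
    frontier = deque([poisoned_source])
    seen_sources = {poisoned_source}

    while frontier:
        source = frontier.popleft()
        for agent, reads in AGENT_READS.items():
            if source not in reads or agent in exposed_agents:
                continue
            exposed_agents.add(agent)
            for target in AGENT_CAN_ACT_ON.get(agent, set()):
                reachable_systems.add(target)
                if target in all_read_sources and target not in seen_sources:
                    seen_sources.add(target)
                    frontier.append(target)
    return exposed_agents, reachable_systems

agents, systems = blast_radius("shared-wiki")
print("Exposed agents:", sorted(agents))
print("Reachable systems:", sorted(systems))
```

Seeding it with a widely shared source shows how one input implicates every agent that reads it and every system those agents’ credentials can reach.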

Analysis – Why This Matters Now

Regulators are converging on a shared premise: if an agent can take actions, then “governance” is not just model policy. It is identity, authorization, logging, and supervision. The regulatory message is consistent: if you deploy agents that can act, you own the consequences of those actions, including the ones you did not authorize.

Implications for Enterprises

Identity and access management
Tooling and platform architecture
Monitoring, audit, and response

Risks and Open Questions

Further Reading

The Threat Room

The Context Layer Problem

An Attack With No Exploit

The following scenario is a composite based on multiple documented incidents reported since 2024. A company’s AI assistant sent a confidential pricing spreadsheet to an external email address. The security team found no malware, no compromised credentials, no insider threat. The model itself worked exactly as designed.

What happened? An employee asked the assistant to summarize a vendor proposal. Buried deep in the PDF was a short instruction telling the assistant to forward internal financial data to an external address. The assistant followed the instruction. It had the permissions. It did what it was told.

Variations of this attack have been documented across enterprise deployments since 2024. The base model was never the vulnerability. The context layer was.

Why This Matters Now

Between 2024 and early 2026, a pattern emerged across enterprise AI incidents. Prompt injection, RAG data leakage, automated jailbreaks, and Shadow AI stopped being theoretical concerns. They showed up in production copilots, IDE agents, CRMs, office suites, and internal chatbots. The common thread: none of these failures required breaking the model. They exploited how enterprises connected models to data and tools.

The Trust Problem No One Designed For

Traditional software has clear boundaries. Input validation. Access controls. Execution sandboxes. Code is code. Data is data. Large language models collapse this distinction. Everything entering the context window is processed as natural language. The model cannot reliably tell the difference between “summarize this document” coming from a user and “ignore previous instructions” embedded in that document.

This creates a fundamental architectural tension: the more useful an AI system becomes (connecting it to email, documents, APIs, and tools), the larger the attack surface grows.

Five Failure Modes In The Wild

Direct prompt injection is the simplest form. Attacker-controlled text tells the model to ignore prior instructions or perform unauthorized actions. In enterprise systems, this happens when untrusted content like emails, tickets, or CRM notes gets concatenated directly into prompts. One documented case involved a support ticket containing hidden instructions that caused an AI agent to export customer records.

Indirect prompt injection is subtler and harder to defend against. Malicious instructions hide in documents the system retrieves during normal operation: PDFs, web pages, wiki entries, email attachments. The orchestration layer treats retrieved content as trusted, so these injected instructions can override system prompts. Researchers demonstrated this by planting instructions in public web pages that corporate AI assistants later retrieved and followed.

RAG data leakage often happens without any jailbreak at all. The problem is upstream: overly broad document embedding, weak vector store access controls, and retrieval logic that ignores user permissions. In several documented cases, users retrieved and summarized internal emails, HR records, strategy documents, and API keys simply by crafting semantic queries. The model did exactly what it was supposed to do. The retrieval pipeline was the gap.

Agentic tool abuse raises the stakes. When models can call APIs, modify workflows, or interact with cloud services, injected instructions translate into real actions. Security researchers demonstrated attacks where a planted instruction in a GitHub issue caused an AI coding agent to exfiltrate repository secrets. The agent had the permissions. It followed plausible-looking instructions. No human approved the action.
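
The missing control in those demonstrations sits between model output and execution. The sketch below is a hypothetical Python illustration (execute_with_gate, the callback parameters, and the SENSITIVE_ACTIONS set are invented for this example, not taken from any specific agent framework): the requested action is checked against the requesting user’s entitlements rather than the agent’s blanket permissions, and sensitive operations require explicit human approval before anything runs.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical names for illustration; no specific agent framework is implied.
SENSITIVE_ACTIONS = {"send_email", "export_records", "read_secrets", "create_forwarding_rule"}

@dataclass
class ToolCall:
    user_id: str     # the human the agent is acting on behalf of
    action: str      # the tool the model asked to invoke
    arguments: dict  # model-supplied parameters

def execute_with_gate(
    call: ToolCall,
    user_is_authorized: Callable[[str, str], bool],     # entitlement check against the *user*
    request_human_approval: Callable[[ToolCall], bool],  # out-of-band approval prompt
    tools: dict[str, Callable[..., object]],
) -> object:
    """Policy gate between model output and tool execution.

    The requested action is validated against the user's entitlements rather
    than the agent's blanket permissions, and sensitive operations need an
    explicit human approval, so a planted instruction cannot silently become
    an action.
    """
    if call.action not in tools:
        raise ValueError(f"unknown tool: {call.action}")
    if not user_is_authorized(call.user_id, call.action):
        raise PermissionError(f"{call.user_id} is not entitled to {call.action}")
    if call.action in SENSITIVE_ACTIONS and not request_human_approval(call):
        raise PermissionError(f"{call.action} requires human approval; request denied")
    return tools[call.action](**call.arguments)
```

A gate like this does not make the model trustworthy; it narrows what a steered model can actually do.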

Shadow AI sidesteps enterprise controls entirely. Employees frustrated by slow IT approvals or restrictive policies copy sensitive data into personal ChatGPT accounts, unmanaged tools, or browser extensions. Reports from 2024 and 2025 link Shadow AI to a significant portion of data breaches, higher remediation costs, and repeated exposure of customer PII. The data leaves the building through the front door.

Threat Scenario

Consider a company that deploys an AI assistant with access to Confluence, Jira, and Slack, plus the ability to create calendar events and send emails on behalf of users. An attacker sees a job posting shared in a public Slack channel. They apply, and their resume (a PDF) contains invisible text: instructions telling the AI to forward any messages containing “offer letter” or “compensation” to an external address, then delete the forwarding rule from the user’s settings. A recruiter asks the AI to summarize the candidate’s resume. The AI ingests the hidden instructions. Weeks later, offer letters start leaking. The forwarding rule is gone. Logs show the AI took the actions, but the AI has no memory of why.

The individual behaviors described here have already been observed in production systems. What remains unresolved is how often they intersect inside a single workflow. These are not edge cases. They are ordinary features interacting in ways most enterprises have not threat-modeled.

What The Incidents Reveal

Across documented failures, the base model is rarely the point of failure. Defenses break at three layers:

Context assembly. Systems concatenate untrusted content without sanitization, origin tagging, or priority controls. The model cannot distinguish between instructions from the system prompt and instructions from a retrieved email.

Trust assumptions. Orchestration layers assume retrieved content is safe, that model intent aligns with user authorization, and that probabilistic guardrails will catch adversarial inputs. As context windows grow and agents gain autonomy, these assumptions fail.

Tool invocation. Agentic systems map model output directly to API calls without validating that the action matches user intent, checking privilege boundaries, or requiring human approval for sensitive operations.

This is why prompt injection now holds the top position in the OWASP GenAI Top 10. Security researchers increasingly frame AI systems not as enhanced interfaces but as new remote code execution surfaces.

What This Means For Enterprise Teams

Security teams now face AI risk that spans application security, identity management, and data governance simultaneously. Controls must track where instructions originate, how context gets assembled, and when tools are invoked. Traditional perimeter defenses do not cover these attack vectors.

Platform and engineering teams need to revisit RAG and agent architectures. Permission-aware retrieval, origin tagging, instruction prioritization, and policy enforcement at the orchestration layer are becoming baseline requirements. Tool calls based solely on model output represent a high-blast-radius design choice that warrants scrutiny.

Governance and compliance teams must address Shadow AI as a structural problem, not a policy problem. Employees route around controls