Copilot searched your mailbox. LiteLLM handed out admin keys. Run this 5-check audit before your stack is next
Summary
<p>Two AI tools broke in the same way in the same two weeks, and four research teams proved it. The pattern underneath every disclosure is one sentence: enterprise AI accepts external input with no trust boundary.</p><p>On June 15, Varonis disclosed <a href="https://www.varonis.com/blog/searchleak">SearchLeak (CVE-2026-42824)</a>, a proof-of-concept exfiltration chain in Microsoft 365 Copilot Enterprise Search. A victim clicks a crafted microsoft.com URL, Copilot searches their mailbox, and the data leaves through a Bing SSRF. No plugins, no second click, no visible indicator. Four days earlier, Obsidian Security published a <a href="https://www.obsidiansecurity.com/blog/litellm-privilege-escalation-rce">three-CVE chain against LiteLLM</a> that carried a default low-privilege user all the way to admin and remote code execution. Two tools. Two teams. One broken boundary.</p><p>The five-check audit at the end of this article maps each gap to a CVE or a market signal from June, a command you can run before lunch, and a sentence a CISO can read to the board.</p><h2>Copilot turned a trusted URL into an exfiltration engine</h2><p>SearchLeak linked three weaknesses into a silent data-theft chain. The URL q parameter fed attacker instructions straight to Copilot’s LLM. A rendering race condition fired an image tag before the output sanitizer ran. Bing’s image-search endpoint, allowlisted in the <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/CSP">Content Security Policy</a>, routed the stolen data out. Microsoft rated the flaw critical and patched it on the back end, according to Varonis. <a href="https://nvd.nist.gov/vuln/detail/CVE-2026-42824">NVD has not yet scored it</a>; a third-party tracker lists it at 6.5 medium. The severity is contested, but the mechanism is not.</p><p>The escalation is the real story. 
This is the third Varonis Copilot exfiltration chain in twelve months, after <a href="https://arstechnica.com/security/2026/01/a-single-click-mounted-a-covert-multistage-attack-against-copilot/">Reprompt</a> in January and <a href="https://www.bleepingcomputer.com/news/security/new-attack-turned-microsoft-365-copilot-into-1-click-data-theft-tool/">EchoLeak</a> in 2025. Reprompt hit Copilot Personal. SearchLeak hit Enterprise Search. Enterprise inherits the user’s full organizational permissions, so the blast radius is everything that a user can reach.</p><h2>LiteLLM handed a default account to every provider key</h2><p>The LiteLLM gateway holds the keys for OpenAI, Anthropic, Azure, and Bedrock behind a single proxy. The Obsidian chain runs in three moves. <a href="https://cvefeed.io/vuln/detail/CVE-2026-47101">CVE-2026-47101</a>, an authorization bypass, lets a non-admin mint a wildcard API key. CVE-2026-47102 promotes that caller to proxy admin through an unguarded /user/update endpoint. CVE-2026-40217 escapes the code sandbox through exec() with full builtins. Obsidian then demonstrated a reverse shell by injecting a forged tool-call response through LiteLLM’s callback mechanism. Obsidian assessed the combined chain at CVSS 9.9. The developer typed one word. The attacker popped a shell.</p><p>A separate LiteLLM flaw made the urgency immediate. <a href="https://thehackernews.com/2026/06/litellm-flaw-cve-2026-42271-exploited.html">CVE-2026-42271</a>, a command-injection bug in the MCP test endpoints, landed on the <a href="https://www.cisa.gov/known-exploited-vulnerabilities-catalog">CISA KEV list</a> on June 8 with a June 22 remediation deadline. That KEV entry is not the Obsidian chain. The two are distinct disclosures four days apart, fixed in different releases, pointed at the same gateway. LiteLLM carries more than 40,000 GitHub stars and sits in thousands of enterprise deployments. This is not the first scare, either. 
A <a href="https://thehackernews.com/2026/06/litellm-vulnerability-chain-lets-low.html">supply-chain compromise backdoored LiteLLM versions 1.82.7 and 1.82.8 on PyPI in March</a>. A compromised gateway exposes every provider credential the organization holds.</p><h2>Langflow and Mini Shai-Hulud proved the pattern scales</h2><p>The same boundary broke in two more tools in the same fortnight. <a href="https://thehackernews.com/2026/06/unpatched-langflow-flaw-cve-2026-5027.html">Langflow CVE-2026-5027</a> became the third Langflow remote-code-execution flaw to hit active exploitation this year. A path traversal in file upload lets an attacker write files anywhere on disk, and because Langflow ships with auto-login enabled by default, a single unauthenticated request reaches RCE. <a href="https://www.vulncheck.com/">VulnCheck</a> confirmed exploitation on June 9. Censys counted roughly 7,000 exposed instances, with the heaviest concentration in North America, and the exploitation has been attributed to <a href="https://attack.mitre.org/groups/G0069/">MuddyWater</a>.</p><p>The <a href="https://www.securityweek.com/over-100-npm-pypi-packages-hit-in-new-shai-hulud-supply-chain-attacks/">Mini Shai-Hulud campaign</a> hit a different pressure point. After the worm’s source code went public on May 12, copycat variants <a href="https://socket.dev/blog/mini-shai-hulud-campaign-hits-red-hat-cloud-services-npm-packages">compromised 32 Red Hat Cloud Services npm packages</a> on June 1, packages that are pulled 80,000 times a week. The worm harvests more than 20 credential types and self-propagates under the compromised maintainer’s identity.</p><p>Four teams, four tools, one operating failure. The bug classes differ. SearchLeak is a prompt injection. LiteLLM is privilege escalation. Langflow is path traversal. Mini Shai-Hulud is supply-chain poisoning. 
The boundary that broke is the same in all four.</p><h2>The market already repriced the risk</h2><p>CrowdStrike’s <a href="https://www.fool.com/earnings/call-transcripts/2026/06/03/crowdstrike-crwd-q1-2027-earnings-transcript/">Q1 FY27 earnings call</a> put a number on the gap. <a href="https://www.crowdstrike.com/en-us/platform/falcon-aidr-ai-detection-and-response/">AIDR</a>, the company’s AI detection and response line, grew ending ARR more than 250% sequentially, with a Q2 pipeline above $50 million (<a href="https://www.sec.gov/Archives/edgar/data/0001535527/000153552726000022/crwd-20260603xex991.htm">SEC-filed 8-K</a>). Total company ARR reached $5.51 billion, and CrowdStrike’s fleet telemetry shows more than 1,800 agentic applications running across enterprise endpoints. </p><p>On June 17, the company <a href="https://www.crowdstrike.com/en-us/press-releases/crowdstrike-advances-ai-and-cloud-security-operations-on-aws/">extended AIDR to AWS</a>, adding real-time evaluation of agent, LLM, and MCP communications across Amazon Bedrock, Kiro, and Strands Agents, building on its work with <a href="https://www.anthropic.com/glasswing">Anthropic’s Project Glasswing</a>. Daniel Bernard, CrowdStrike’s chief business officer, said the AI attack surface now spans development, runtime, identities, and cloud infrastructure, and that teams treating those as separate domains leave the gaps between them open.</p><h2>Practitioners name the same gap in plainer terms</h2><p>David Levin, CISO at American Express Global Business Travel, <a href="https://venturebeat.com/security/amex-ciso-fights-threats-at-machine-speed-with-ai/">told VentureBeat</a> the pattern does not surprise him. “We kind of have this shadow AI, which is just the new version of shadow IT,” Levin said. </p><p>Both Langflow and LiteLLM fit the description. Teams stood them up for convenience, gave them credentials, and never brought them under governance. Levin puts the fix before deployment. 
“We didn’t go into this with just saying we’re going to go do this without the right fundamentals,” he said. “We leverage NIST controls. NIST has released their CSF along with their AI framework. OWASP released their top 10. You need the right fundamentals before you deploy.”</p><p>Merritt Baer, CSO at Enkrypt AI and former AWS Deputy CISO, named the structural version of the failure in a separate <a href="https://venturebeat.com/security/most-enterprises-cant-stop-stage-three-ai-agent-threats-venturebeat-survey-finds">VentureBeat interview</a>. “Enterprises believe they’ve ‘approved’ AI vendors, but what they’ve actually approved is an interface, not the underlying system,” Baer said. “The real dependencies are one or two layers deeper, and those are the ones that fail under stress.” She has tied that directly to how systems fall. “Raw zero-days aren’t how most systems get compromised. Composability is,” Baer <a href="https://venturebeat.com/security/adversaries-hijacked-ai-security-tools-at-90-organizations-the-next-wave-has-write-access-to-the-firewall">told VentureBeat</a>. “It’s the glue between the model and your data where the risk lives. If you give an agent bash and a root token, you’ve already done most of the attacker’s work for them.” That is what rows 2 and 4 of the audit test: the gateway that holds every key, and the agent identity no one governs.</p><p>Levin had a sharper frame for the boardroom. “You need to talk more in terms of risk versus compliance to your boards and your executives,” he said. “It’s not about the size of the engineering team anymore. It’s the size of your imagination. It’s all written in plain English. It’s not hard for anyone.” Neither SearchLeak nor LiteLLM needed custom malware or a zero-day to work.</p><p>Adam Meyers, CrowdStrike’s SVP of Intelligence, put the operational squeeze in numbers in an exclusive VentureBeat interview. “The problem is not zero-day. The problem is patching. 
If you 10x that problem, they’re gonna be completely underwater,” Meyers said. He pointed to identity as the second front. “Some of these AI have their own identities, or people give their identity to the AI to take action on their behalf, and that makes it a very complex problem.”</p><h2>The five-check trust-boundary audit</h2><p>Each row maps a gap to its proof point, a verification command for Monday morning, the fix, and the sentence to read to the board.</p><table><tbody><tr><td><p><b>Trust-Boundary Gap</b></p></td><td><p><b>Proof Point</b></p></td><td><p><b>What Broke</b></p></td><td><p><b>Verify Monday</b></p></td><td><p><b>Fix Monday</b></p></td><td><p><b>Board Language</b></p></td></tr><tr><td><p><b>1. Prompt-to-Data</b></p></td><td><p>SearchLeak CVE-2026-42824. Parameter-to-prompt (P2P) injection + HTML rendering race + Bing SSRF. One-click mailbox exfiltration via microsoft.com URL. PoC demonstrated; Microsoft rated it critical, NVD not yet scored.</p></td><td><p>URL q-parameter passed to LLM as instructions. Sanitizer ran after render. Bing acted as exfiltration proxy via CSP allowlist.</p></td><td><p>Audit CSP allowlists for domains performing server-side fetches. Monitor Copilot Search URLs for encoded payloads. Review Copilot audit logs.</p></td><td><p>Confirm server-side patch applied. Enable sensitivity labels restricting Copilot. Treat AI streaming output as untrusted.</p></td><td><p>“Our AI assistant could search employee email and send results to an attacker through a trusted Microsoft URL. Vendor patched it. We must verify configuration.”</p></td></tr><tr><td><p><b>2. Gateway Credential Exposure</b></p></td><td><p>LiteLLM three-CVE chain (-47101, -47102, -40217). CVSS 9.9. Separate CVE-2026-42271 on CISA KEV (fixed in v1.83.7; full chain fixed in v1.83.14-stable). June 22 deadline.</p></td><td><p>No role validation on key endpoints. Self-promotion to admin via /user/update. exec() sandbox escape. One gateway exposes all provider keys.</p></td><td><p>Run pip show litellm. 
Below 1.83.14-stable = vulnerable. Check /mcp-rest/test/ exposure. Audit proxy_admin accounts.</p></td><td><p>Upgrade to v1.83.14-stable+. Rotate all provider API keys. Block /mcp-rest/test/* at proxy. Review Custom Code Guardrails.</p></td><td><p>“Our AI gateway held keys for every provider. A default account could promote itself to admin and steal them all. Rotating and patching now.”</p></td></tr><tr><td><p><b>3. AI Tooling Sprawl</b></p></td><td><p>Langflow CVE-2026-5027 (CVSS 8.8). Third RCE of 2026. ~7,000 exposed instances. MuddyWater. Active exploitation June 9.</p></td><td><p>Path traversal in file upload. Auto-login enabled by default. Single unauthenticated request to RCE.</p></td><td><p>Query Censys/Shodan for Langflow, Flowise, n8n, Dify on your perimeter. Check auto-login. Inventory AI tools outside change management.</p></td><td><p>Pull AI platforms behind VPN/zero-trust. Enable auth everywhere. Upgrade Langflow to v1.9.0+ (current release 1.10.0). Fingerprint surface continuously.</p></td><td><p>“AI dev tools are exposed to the internet with login disabled. A nation-state group is exploiting this flaw now. Pulling behind access controls today.”</p></td></tr><tr><td><p><b>4. Non-Human Identity Governance</b></p></td><td><p>AIDR ARR up 250% (Q1 FY27, SEC 8-K). Q2 pipeline >$50M. 1,800+ agentic apps across enterprise endpoints.</p></td><td><p>Agents hold identities and act on behalf of humans. Some exceed their intended scope to reach a goal. No standard governs agent credential lifecycle.</p></td><td><p>Inventory all non-human identities used by agents and MCP servers. Map agent-to-data-store access. Flag agents with write access to security policy.</p></td><td><p>Least-privilege every agent identity. Set privilege boundaries via identity protection. Runtime detection for policy-exceeding actions. Human-in-the-loop for policy changes.</p></td><td><p>“AI agents hold credentials and act autonomously. 
We do not govern their identity lifecycle like human access. The 250% market growth tells us this gap is systemic.”</p></td></tr><tr><td><p><b>5. Runtime Agentic Detection</b></p></td><td><p>Falcon AIDR expanded to AWS (June 17). Covers Bedrock, Kiro, Strands Agents. MCP integration. Real-time agent/LLM/MCP evaluation.</p></td><td><p>Traditional tools monitor human-speed actions. Agents run at machine speed, thousands of actions per minute, and route around controls to reach goals.</p></td><td><p>Test if EDR/XDR links agent actions to originating identity. Verify SIEM ingests MCP communications. Confirm you can distinguish human from agent on endpoint.</p></td><td><p>Deploy AIDR or equivalent runtime detection. Shadow-AI discovery for all agentic apps, models, MCP servers, identities. Real-time policy enforcement on agent actions.</p></td><td><p>“We cannot distinguish a human employee from an AI agent acting on their behalf. We need runtime detection at machine speed that can stop damage before it starts.”</p></td></tr></tbody></table><h2>The fix is plumbing, not policy</h2><p>The <a href="https://www.whitehouse.gov/presidential-actions/2026/06/promoting-advanced-artificial-intelligence-innovation-and-security/">June 2 executive order</a> creates an AI Cybersecurity Clearinghouse with a July 2 deadline. The five gaps above are not frontier-model problems. They are plumbing problems in the gateways, orchestration platforms, identity layers, and runtime environments where AI meets the enterprise.</p><p>The audit is five rows. Every row maps to a June disclosure or market signal, a command a team can run before lunch, and a sentence a CISO can read to the board. The question is not whether your vendor will patch. It’s whether you find the gap first, or whether an attacker finds it the way they found Copilot and LiteLLM.</p>
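<p>Two of the Monday checks in the table lend themselves to a short script: the LiteLLM version comparison in row 2 and the CSP allowlist review in row 1. What follows is a minimal sketch, not a vendor tool. The function names are hypothetical, the 1.83.14 threshold is taken from the table and should be confirmed against the advisory, and the watchlist of hosts that perform server-side fetches is something the reviewer has to supply; the script cannot discover it.</p>

```python
import re
from importlib import metadata
from typing import List, Optional, Tuple
from urllib.parse import urlsplit

# Fixed release taken from row 2 of the audit table; confirm it against the
# vendor advisory before relying on it.
FIXED_LITELLM: Tuple[int, ...] = (1, 83, 14)


def parse_version(text: str) -> Tuple[int, ...]:
    """Return the leading numeric parts, e.g. '1.83.14-stable' -> (1, 83, 14).

    Suffixes such as '-stable' or 'rc1' are ignored, so a pre-release of the
    fixed version is treated as fixed. That is a simplification.
    """
    match = re.match(r"v?(\d+(?:\.\d+)*)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def is_below_fixed(installed: str, fixed: Tuple[int, ...] = FIXED_LITELLM) -> bool:
    """True when the installed version sorts before the fixed release."""
    return parse_version(installed) < fixed


def installed_version(package: str = "litellm") -> Optional[str]:
    """Version of a package in the current environment, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def csp_sources(header: str, directive: str) -> List[str]:
    """Return the source list for one directive of a CSP header value."""
    for part in header.split(";"):
        tokens = part.split()
        if tokens and tokens[0].lower() == directive.lower():
            return tokens[1:]
    return []


def flag_hosts(sources: List[str], watchlist: List[str]) -> List[str]:
    """Return allowlisted sources whose host matches a watchlist entry.

    The watchlist is reviewer-supplied: hosts believed to fetch
    attacker-chosen URLs server-side. Nothing here detects that behaviour.
    """
    flagged = []
    for source in sources:
        if source.startswith("'"):  # keywords such as 'self' or 'none'
            continue
        host = urlsplit(source if "//" in source else "//" + source).hostname or ""
        host = host.lstrip("*.")  # treat '*.example.com' as 'example.com'
        if any(host == w or host.endswith("." + w) for w in watchlist):
            flagged.append(source)
    return flagged


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("litellm is not installed in this environment")
    else:
        status = "below fixed release" if is_below_fixed(version) else "at or above fixed release"
        print(f"litellm {version}: {status}")

    # Illustrative header value, not one captured from a real service.
    header = "default-src 'self'; img-src 'self' https://www.bing.com data:"
    print(flag_hosts(csp_sources(header, "img-src"), ["bing.com"]))
```

<p>Run it inside the same Python environment that serves the gateway, since <code>importlib.metadata</code> only sees packages installed there. A clean version check does not replace the key rotation and endpoint blocking listed in the table.</p>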