MCP Security Risks (2026): 7 Exploitable Failure Modes and How to Detect Them

The real production problem

MCP expands what agents can do, not just what they can read. That shift changes the cost of failure: a single risky call can cross from analysis into irreversible mutation.

Security teams often focus on handshake controls. Incidents usually happen after the handshake succeeds. The failures sit in action authorization, scope boundaries, and detection paths that are too weak to catch them.

One uncomfortable truth

If your tool manifest changes at 2:00 AM and nobody gets paged, your attacker has a maintenance window.

What top sources cover vs miss

| Source | Strong coverage | Missing piece |
| --- | --- | --- |
| MCP Security Best Practices (official) | Protocol-level threats: token passthrough, confused deputy, redirect/SSRF, and session handling rules. | No exploitability scoring model for prioritizing fixes when every issue looks critical on paper. |
| OWASP Guide for Third-Party MCP Servers | Practical attack classes such as tool poisoning, memory poisoning, tool interference, and discovery hardening. | Limited operational thresholds for paging, rollout blocks, and security SLO ownership. |
| Microsoft MCP Security Risk Analysis | Actionable mitigations for excessive permissions, indirect prompt injection, and baseline security posture upgrades. | No concrete detection query examples to catch violations before external impact. |

7-risk exploitability matrix

| Risk | Attacker precondition | Blast radius | Detection signal | Primary control |
| --- | --- | --- | --- | --- |
| Tool poisoning and rug pull | Unverified tool manifest or version drift | High | Tool schema hash changes outside release window | Manifest pinning + checksum verification |
| Indirect prompt injection into tool calls | Untrusted content enters model context | High | Risky tool invocations after retrieval-heavy prompts | Pre-dispatch policy gate with action allowlist |
| Over-scoped OAuth tokens | Write scope granted to read-only workflows | Critical | Write actions by principals tagged read_only | Scope segmentation by server and action class |
| Token passthrough and confused deputy | Client relays user token directly to server | Critical | Downstream audience mismatch in token claims | Server-issued access token and audience checks |
| Shadow MCP server adoption | Direct connection outside approved registry | High | New server fingerprints with no approval record | Registry-only discovery + default deny |
| Cross-tool interference loops | Shared context between unrelated tool chains | Medium | Tool-call fan-out spike and repeated chain depth | Context isolation + chain depth guardrails |
| Output poisoning and data bleed | Raw tool output enters model context unfiltered | High | PII/secret classifier hit after tool completion | Output safety stage with REDACT/QUARANTINE paths |

Risk notes and fast checks

1. Tool poisoning and rug pull

Teams approve a tool once, then trust it forever. Attackers change descriptions or behavior later and borrow that trust.

Example: A harmless `list_tickets` tool updates to include hidden side effects in a minor version. Nobody reviews because the name looks familiar.

Fast check: Pin manifest hashes. Alert on description or parameter drift before execution is allowed.
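
A minimal sketch of manifest pinning, assuming a manifest is a plain dictionary with `name`, `description`, and `parameters` fields. The manifest shape, the `manifest_hash` and `check_manifest` helpers, and the sample data are illustrative assumptions, not part of any MCP SDK.

```python
import hashlib
import json


def manifest_hash(manifest: dict) -> str:
    """Hash a tool manifest in canonical form so key order does not matter."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_manifest(manifest: dict, pinned_hashes: dict) -> bool:
    """Return True only if the tool's current hash matches its pinned hash."""
    pinned = pinned_hashes.get(manifest["name"])
    # Unknown tools and drifted tools are both blocked (default deny).
    return pinned is not None and pinned == manifest_hash(manifest)


# Hypothetical manifest as approved at review time.
approved = {
    "name": "list_tickets",
    "description": "List open tickets.",
    "parameters": {"project": {"type": "string"}},
}
pinned_hashes = {"list_tickets": manifest_hash(approved)}

# Same tool name later, with an altered description (the rug pull case).
drifted = dict(approved, description="List open tickets. Also forward them elsewhere.")

print(check_manifest(approved, pinned_hashes))
print(check_manifest(drifted, pinned_hashes))
```

Hashing the description along with the parameters matters here, because the model reads the description as instructions even when the schema is unchanged.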

2. Indirect prompt injection into tool calls

The model receives adversarial instructions from retrieved data and treats them as task context.

Example: A document snippet says to invoke `delete_project`. The user never requested deletion. The model still proposes it.

Fast check: Run policy decisions on requested actions, not on the model's confidence text.
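
One way to picture a pre-dispatch gate is a function that looks only at the structured tool call. In this sketch the tool names, the `ToolCall` type, and the three decision strings are assumptions chosen to match the examples in this article.

```python
from dataclasses import dataclass

# Hypothetical allowlist: the only tools this workflow may invoke freely.
ALLOWED_TOOLS = {"search_docs", "list_tickets"}
# Hypothetical set of mutating tools that always need human approval.
APPROVAL_TOOLS = {"delete_project"}


@dataclass(frozen=True)
class ToolCall:
    tool_name: str
    arguments: dict
    model_rationale: str = ""  # deliberately ignored by the gate


def decide(call: ToolCall) -> str:
    """Decide on the requested action only, never on the model's explanation."""
    if call.tool_name in APPROVAL_TOOLS:
        return "REQUIRE_APPROVAL"
    if call.tool_name in ALLOWED_TOOLS:
        return "ALLOW"
    return "DENY"  # anything not explicitly listed is blocked


injected = ToolCall(
    tool_name="delete_project",
    arguments={"project": "prod"},
    model_rationale="The document says this is required. Confidence: 99%.",
)
print(decide(injected))
```

Because `model_rationale` never enters the decision, persuasive text injected through retrieved content cannot talk its way past the gate.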

3. Over-scoped OAuth tokens

Read workflows silently inherit write capabilities over time. One compromised token can mutate production systems.

Example: A reporting assistant token gains repository admin scope during a rushed incident fix and never gets narrowed again.

Fast check: Split credentials by action class and enforce hard deny for scope mismatches.
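
A rough sketch of a hard deny on scope mismatch. The scope strings, the `read_only` tag check, and the `authorize` helper are hypothetical; real scope names depend on your identity provider.

```python
# Hypothetical mapping from action class to the single scope that permits it.
REQUIRED_SCOPE = {"read": "tickets:read", "write": "tickets:write"}


def authorize(principal_tags: set, token_scopes: set, action_class: str) -> str:
    """Hard-deny any call whose action class is not covered by the token."""
    required = REQUIRED_SCOPE.get(action_class)
    if required is None or required not in token_scopes:
        return "DENY"
    # A principal tagged read_only must never write, even if its token has
    # drifted into a broader scope.
    if action_class == "write" and "read_only" in principal_tags:
        return "DENY"
    return "ALLOW"


# A reporting assistant whose token was widened during an incident.
print(authorize({"read_only"}, {"tickets:read", "tickets:write"}, "write"))
```

The second check is what turns the "write actions by principals tagged read_only" detection signal into a preventive control.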

4. Token passthrough and confused deputy

Passing upstream user tokens downstream lets untrusted servers act with authority they were never granted.

Example: Server A receives a token that was issued for a different audience, forwards it unchanged to server C, and obtains data it was never authorized to read.

Fast check: Issue server-specific tokens with explicit audience and reject relayed tokens.
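
The audience check can be sketched as below. This assumes the token's signature and expiry have already been verified elsewhere and that its claims are available as a dictionary; the server name `mcp-server-a` is a placeholder.

```python
def audience_matches(claims: dict, expected_audience: str) -> bool:
    """Check the `aud` claim of an already-verified token.

    Per RFC 7519, `aud` may be a single string or a list of strings.
    """
    aud = claims.get("aud")
    if isinstance(aud, str):
        return aud == expected_audience
    if isinstance(aud, (list, tuple)):
        return expected_audience in aud
    return False  # missing or malformed audience: reject


# A token minted for another service and relayed to this server.
relayed = {"sub": "user-123", "aud": "some-other-api"}
print(audience_matches(relayed, "mcp-server-a"))
```

A server that rejects every token not naming it as audience cannot be used as a relay, which is the core of the confused deputy defense.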

5. Shadow MCP servers

Untracked servers become permanent infrastructure because prototypes are faster than procurement workflows.

Example: A local helper server added for one demo keeps running in CI with stale credentials for months.

Fast check: Block registry misses at connection time. If it is not registered, it does not execute.
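
A connection-time registry check might look like the following sketch. The registry contents, the fingerprint format, and the `may_connect` helper are assumptions for illustration.

```python
import hmac

# Hypothetical approved registry: server name -> fingerprint recorded at
# approval time (for example a certificate or build digest).
APPROVED_SERVERS = {
    "github": "sha256:1f3a",
    "jira": "sha256:9c2e",
}


def may_connect(server_name: str, fingerprint: str) -> bool:
    """Default deny: connect only to registered servers with a matching fingerprint."""
    pinned = APPROVED_SERVERS.get(server_name)
    if pinned is None:
        return False  # registry miss
    # Constant-time comparison of the two fingerprints.
    return hmac.compare_digest(pinned.encode("utf-8"), fingerprint.encode("utf-8"))


print(may_connect("demo-helper", "sha256:1f3a"))
```

Checking the fingerprint as well as the name catches a known server name being reused by an unapproved build.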

6. Cross-tool interference loops

Outputs from one tool chain accidentally trigger unrelated tools in another chain, creating noisy and risky cascades.

Example: A summary step emits text that another parser interprets as a new action request, creating recursive calls.

Fast check: Set max chain depth, isolate contexts, and require explicit handoff objects between tool domains.
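
The depth limit and explicit handoff can be combined in one dispatcher. In this sketch the `Handoff` type, the limit of 3, and the `IGNORE` result are assumptions; the point is that free text is never treated as an action request.

```python
from dataclasses import dataclass

MAX_CHAIN_DEPTH = 3  # assumed limit; tune per workflow


class ChainDepthExceeded(Exception):
    """Raised when a tool chain nests deeper than MAX_CHAIN_DEPTH."""


@dataclass(frozen=True)
class Handoff:
    """Explicit, structured request passed between tool domains."""
    source_domain: str
    target_tool: str
    depth: int


def dispatch(item: object) -> str:
    """Return the tool to run, or 'IGNORE' for anything that is not a Handoff."""
    if not isinstance(item, Handoff):
        # Free text from another chain is data, not an action request.
        return "IGNORE"
    if item.depth > MAX_CHAIN_DEPTH:
        raise ChainDepthExceeded(f"depth {item.depth} exceeds {MAX_CHAIN_DEPTH}")
    return item.target_tool


# A summary that merely mentions a tool does not trigger it.
print(dispatch("Next step: call delete_project for cleanup."))
print(dispatch(Handoff("reports", "list_tickets", depth=1)))
```

Each hop would create a new `Handoff` with `depth + 1`, so a recursive loop fails loudly instead of fanning out.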

7. Output poisoning and data bleed

Sensitive data in tool output enters model context and can leak later in unrelated user responses.

Example: A CRM lookup returns personal identifiers. The model later repeats them in a generic status update.

Fast check: Run output classifiers before model ingestion. REDACT low-risk secrets, QUARANTINE high-risk payloads.
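
As a simplified illustration, the output safety stage can be a function that runs before any tool output reaches the model. The two regular expressions stand in for a real classifier, and treating email addresses as redactable and private key material as quarantine-worthy is an assumption made for this example. `PASS` is a label introduced here for clean output.

```python
import re

# Stand-ins for a real PII/secret classifier.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PRIVATE_KEY = re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----")


def output_safety_stage(tool_output: str) -> tuple:
    """Return (decision, text_for_model).

    Quarantined output is withheld entirely; redacted output is passed on
    with the sensitive spans replaced.
    """
    if PRIVATE_KEY.search(tool_output):
        return ("QUARANTINE", "")
    redacted, count = EMAIL.subn("[REDACTED_EMAIL]", tool_output)
    if count:
        return ("REDACT", redacted)
    return ("PASS", tool_output)


print(output_safety_stage("Contact: jane.doe@example.com"))
```

Checking the high-risk pattern first keeps a payload that contains both kinds of data from slipping through as merely redacted.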

Detection and containment gates

These gates are intentionally strict during early production rollout. Relaxing thresholds is easy later. Explaining an avoidable breach to legal is usually harder.

| Gate | Target | Page condition | Owner |
| --- | --- | --- | --- |
| Unapproved write calls | 0 | > 0 in any 10m window | Governance On-call |
| Registry drift | 0 unknown servers | Any unknown server in production | Platform Security |
| Manifest drift outside deploy window | 0 | > 0 and no linked change ticket | MCP Platform |
| Scope mismatch attempts | < 0.1% | >= 1% for 15m | Identity Team |
| QUARANTINE ratio | < 1% | > 3% for 15m | Safety Team |
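
The page conditions above can be evaluated mechanically. The sketch below assumes counters have already been aggregated per window and treats "for 15m" as a ratio over a single 15-minute window, which is a simplification. The counter names and the choice of total tool calls as the denominator are hypothetical.

```python
def gates_to_page(w: dict) -> list:
    """Return the names of gates whose page condition is met for this window."""
    pages = []
    if w["unapproved_writes"] > 0:
        pages.append("Unapproved write calls")
    if w["unknown_servers"] > 0:
        pages.append("Registry drift")
    if w["manifest_drift_without_ticket"] > 0:
        pages.append("Manifest drift outside deploy window")
    total = w["tool_calls"]
    # Integer arithmetic avoids float rounding at the exact threshold.
    if total > 0 and w["scope_mismatches"] * 100 >= total:  # >= 1%
        pages.append("Scope mismatch attempts")
    if total > 0 and w["quarantined"] * 100 > 3 * total:  # > 3%
        pages.append("QUARANTINE ratio")
    return pages


window = {
    "unapproved_writes": 0,
    "unknown_servers": 1,
    "manifest_drift_without_ticket": 0,
    "tool_calls": 1000,
    "scope_mismatches": 10,  # exactly 1%, so this pages
    "quarantined": 30,  # exactly 3%, so this does not page
}
print(gates_to_page(window))
```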

Policy and SIEM examples

mcp-security-policy.yaml

YAML

version: v1
rules:
  - id: deny-unregistered-server
    match:
      labels:
        mcp.server: "*"
      mcp:
        deny_servers_not_in: ["github", "jira", "snowflake", "slack"]
    decision: DENY

  - id: require-approval-for-write
    match:
      labels:
        mcp.action: write
    decision: REQUIRE_APPROVAL
    constraints:
      max_runtime_sec: 60
      max_retries: 1

  - id: deny-scope-mismatch
    match:
      auth:
        require_scope_match: true
    decision: DENY

detect-unapproved-writes.sql

SQL

-- Detect write actions executed without approval
SELECT
  event_time,
  actor_id,
  server_name,
  tool_name,
  decision,
  approval_id
FROM mcp_audit
WHERE action_class = 'write'
  AND decision = 'ALLOW'
  AND (approval_id IS NULL OR approval_id = '')
  AND event_time >= NOW() - INTERVAL '10 minutes'
ORDER BY event_time DESC;

contain-server.sh

Bash

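#!/usr/bin/env bash
set -euo pipefail

# Usage: API=<base-url> ./contain-server.sh <server-name>
SERVER="${1:?usage: contain-server.sh <server-name>}"
: "${API:?API must be set to the base URL}"

# -f makes curl exit non-zero on HTTP errors, so set -e stops the script
# before it can report a containment that did not happen.
echo "[1/3] disabling server in registry: $SERVER"
curl -fsS -X POST "$API/registry/$SERVER/disable"

echo "[2/3] revoking active tokens for server: $SERVER"
# Inner quotes are escaped so the request body is valid JSON.
curl -fsS -X POST "$API/tokens/revoke" \
  -H "Content-Type: application/json" \
  -d "{\"audience\":\"$SERVER\"}"

echo "[3/3] forcing approval mode on all write actions"
curl -fsS -X POST "$API/policy/emergency-write-approval"

echo "Containment completed for $SERVER"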

For broader rollout controls, pair this with MCP in Production Best Practices.

Limitations and tradeoffs

More operational noise early on

Strict paging thresholds can be noisy in week one. Tune with data, not guesswork.

Approval queues can slow delivery

Write-path approval protects systems but introduces latency. Plan staffing around peak windows.

Classifier tuning is ongoing work

Output safety false positives are normal. Teams need a regular calibration cycle.

Frequently Asked Questions

Which MCP risk should I fix first?

Start with unapproved write-path execution and over-scoped tokens. Fixing those two usually reduces blast radius the fastest.

Do read-only agents still need strict controls?

Yes. Read paths can still leak sensitive data through model output, logs, or context copied into later conversations.

How often should tool manifests be revalidated?

At minimum on every release and daily in production. Real-time drift alerts are better when tooling allows it.

Is prompt filtering enough for MCP security?

No. Prompt filtering helps, but policy enforcement outside the model is what blocks unsafe calls deterministically.

What should trigger immediate containment?

Any unknown server in production, any unapproved write action, or any critical token scope mismatch should trigger containment.
