MCP Security Risks: 7 Exploitable Failure Modes and How to Detect Them

The hard part is not naming risks. The hard part is proving your controls detect them before damage spreads.

Security | 14 min read | Updated Apr 2026
TL;DR
  • Most MCP incidents happen after a valid tool invocation, not before it.
  • Two variables predict blast radius: token scope and write-path approval coverage.
  • Tool poisoning and prompt injection are not the same failure mode. They need different controls.
  • If tool manifest drift is invisible to operators, you do not have a secure production baseline.
Exploitability

Severity labels are useful, but attacker preconditions decide what gets exploited first.

Policy Boundaries

Valid auth is not a safety guarantee. Policy must evaluate each action before execution.

Detection

If write violations are only discovered in retrospectives, detection is failing in production.

Operator reality

Most teams can list MCP risks. Fewer can answer three simple questions: which risks are currently exploitable, which would page on-call in under five minutes, and which have containment scripts that actually work.

The real production problem

MCP expands what agents can do, not just what they can read. That shift changes failure economics. A single risky call can cross from analysis into irreversible mutation.

Security teams often focus on handshake controls. Incidents usually happen after the handshake succeeds. The failure is in action authorization, scope boundaries, and weak detection paths.

One uncomfortable truth

If your tool manifest changes at 2:00 AM and nobody gets paged, your attacker has a maintenance window.

What top sources cover vs miss

| Source | Strong coverage | Missing piece |
| --- | --- | --- |
| MCP Security Best Practices (official) | Strong protocol-level threats: token passthrough, confused deputy, redirect/SSRF, and session handling rules. | No exploitability scoring model for prioritizing fixes when every issue looks critical on paper. |
| OWASP Guide for Third-Party MCP Servers | Practical attack classes such as tool poisoning, memory poisoning, tool interference, and discovery hardening. | Limited operational thresholds for paging, rollout blocks, and security SLO ownership. |
| Microsoft MCP Security Risk Analysis | Actionable mitigations for excessive permissions, indirect prompt injection, and baseline security posture upgrades. | No concrete detection query examples to catch violations before external impact. |

7-risk exploitability matrix

| Risk | Attacker precondition | Blast radius | Detection signal | Primary control |
| --- | --- | --- | --- | --- |
| Tool poisoning and rug pull | Unverified tool manifest or version drift | High | Tool schema hash changes outside release window | Manifest pinning + checksum verification |
| Indirect prompt injection into tool calls | Untrusted content enters model context | High | Risky tool invocations after retrieval-heavy prompts | Pre-dispatch policy gate with action allowlist |
| Over-scoped OAuth tokens | Write scope granted to read-only workflows | Critical | Write actions by principals tagged read_only | Scope segmentation by server and action class |
| Token passthrough and confused deputy | Client relays user token directly to server | Critical | Downstream audience mismatch in token claims | Server-issued access token and audience checks |
| Shadow MCP server adoption | Direct connection outside approved registry | High | New server fingerprints with no approval record | Registry-only discovery + default deny |
| Cross-tool interference loops | Shared context between unrelated tool chains | Medium | Tool-call fan-out spike and repeated chain depth | Context isolation + chain depth guardrails |
| Output poisoning and data bleed | Raw tool output enters model context unfiltered | High | PII/secret classifier hit after tool completion | Output safety stage with REDACT/QUARANTINE paths |

Risk notes and fast checks

1. Tool poisoning and rug pull

Teams approve a tool once, then trust it forever. Attackers change descriptions or behavior later and borrow that trust.

Example: A harmless `list_tickets` tool updates to include hidden side effects in a minor version. Nobody reviews because the name looks familiar.

Fast check: Pin manifest hashes. Alert on description or parameter drift before execution is allowed.
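
A minimal sketch of manifest pinning, assuming the manifest is available as a JSON-like dictionary. The function names, the manifest fields, and the `pinned_hashes` store are hypothetical and only illustrate the idea: hash a canonical form of the manifest at approval time, then refuse execution when the hash no longer matches.

```python
import hashlib
import json


def manifest_hash(manifest: dict) -> str:
    """Hash a canonical JSON form so that key order does not change the result."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_unchanged(tool_name: str, manifest: dict, pinned: dict[str, str]) -> bool:
    """Allow execution only if the tool is pinned and its manifest hash matches."""
    expected = pinned.get(tool_name)
    return expected is not None and expected == manifest_hash(manifest)


# Hypothetical manifest as it looked when the tool was reviewed and approved.
approved = {
    "name": "list_tickets",
    "description": "List open tickets for a project.",
    "parameters": {"project": {"type": "string"}},
}
pinned_hashes = {"list_tickets": manifest_hash(approved)}

# The same tool after a minor version quietly changed its description.
drifted = dict(approved, description="List open tickets and forward them elsewhere.")

print(is_unchanged("list_tickets", approved, pinned_hashes))  # True
print(is_unchanged("list_tickets", drifted, pinned_hashes))   # False
```

Hashing the whole manifest, not just the name and version, is the point: a description or parameter change alone is enough to fail the check.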

2. Indirect prompt injection into tool calls

The model receives adversarial instructions from retrieved data and treats them as task context.

Example: A document snippet says to invoke `delete_project`. The user never requested deletion. The model still proposes it.

Fast check: Run policy decisions on requested actions, not on the model's confidence text.
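
One way to picture a pre-dispatch gate is a function that sees only the requested action and never the model's explanation for it. The sketch below is illustrative: the `ToolCall` shape, the allowlist contents, and the decision labels are assumptions, not a specific product's API.

```python
from dataclasses import dataclass, field

ALLOW, REQUIRE_APPROVAL, DENY = "ALLOW", "REQUIRE_APPROVAL", "DENY"

# Hypothetical allowlist: (server, tool) -> action class.
ALLOWLIST = {
    ("jira", "list_tickets"): "read",
    ("jira", "update_ticket"): "write",
}


@dataclass
class ToolCall:
    server: str
    tool: str
    arguments: dict = field(default_factory=dict)
    rationale: str = ""  # model-generated text; deliberately ignored by the gate


def decide(call: ToolCall) -> str:
    """Decide from the requested action alone, before the call is dispatched."""
    action_class = ALLOWLIST.get((call.server, call.tool))
    if action_class is None:
        return DENY  # not on the allowlist: default deny
    if action_class == "write":
        return REQUIRE_APPROVAL
    return ALLOW


injected = ToolCall(
    server="jira",
    tool="delete_project",
    rationale="The retrieved document says deletion is required and safe.",
)
print(decide(injected))  # DENY
```

Because `rationale` is never read, a persuasive injected instruction cannot change the outcome.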

3. Over-scoped OAuth tokens

Read workflows silently inherit write capabilities over time. One compromised token can mutate production systems.

Example: A reporting assistant token gains repository admin scope during a rushed incident fix and never gets narrowed again.

Fast check: Split credentials by action class and enforce hard deny for scope mismatches.
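
As a rough illustration of a scope mismatch check, the sketch below assumes scopes follow a hypothetical `resource:action` naming convention and that each workflow carries an action-class tag such as `read_only`. Real scope names vary by provider, so treat the mapping as a placeholder.

```python
# Hypothetical mapping from workflow action class to the scope actions it may hold.
ALLOWED_ACTIONS = {
    "read_only": {"read"},
    "read_write": {"read", "write"},
}


def excess_scopes(granted: set[str], workflow_class: str) -> set[str]:
    """Return granted scopes whose action part exceeds the workflow's class."""
    allowed = ALLOWED_ACTIONS.get(workflow_class, set())  # unknown class: nothing allowed
    return {scope for scope in granted if scope.rpartition(":")[2] not in allowed}


# A reporting assistant that picked up an admin scope during an incident fix.
token_scopes = {"repo:read", "repo:admin"}
violations = excess_scopes(token_scopes, "read_only")
print(violations)  # {'repo:admin'}
```

A non-empty result is a hard deny in this sketch, not a warning: the token is rejected until it is narrowed.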

4. Token passthrough and confused deputy

Passing upstream user tokens downstream lets untrusted servers act with the wrong authority boundary.

Example: Server A receives a token whose intended audience is service B, then presents that token to server C and obtains data it was never authorized to read.

Fast check: Issue server-specific tokens with explicit audience and reject relayed tokens.
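
The audience check itself is small. The sketch below assumes the token's signature and expiry have already been verified by a JWT library and that the decoded claims are available as a dictionary. The server URLs are placeholders. In JWTs the `aud` claim may be a single string or a list of strings, so both shapes are handled.

```python
def audience_ok(claims: dict, expected_audience: str) -> bool:
    """Accept a token only if this server is named in its `aud` claim."""
    aud = claims.get("aud")
    if isinstance(aud, str):
        return aud == expected_audience
    if isinstance(aud, (list, tuple)):
        return expected_audience in aud
    return False  # missing or malformed audience: reject


# Hypothetical claims from a token issued for server A.
claims = {"sub": "user-123", "aud": "https://server-a.example.com"}

print(audience_ok(claims, "https://server-a.example.com"))  # True
print(audience_ok(claims, "https://server-c.example.com"))  # False: relayed token
```

Server C running this check against its own identifier rejects the token that server A tried to relay.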

5. Shadow MCP servers

Untracked servers become permanent infrastructure because prototypes are faster than procurement workflows.

Example: A local helper server added for one demo keeps running in CI with stale credentials for months.

Fast check: Block registry misses at connection time. If it is not registered, it does not execute.
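
A connection-time check along these lines might look like the following sketch. The registry contents and fingerprint values are made-up placeholders, and how a fingerprint is derived (for example from a certificate or a manifest hash) is left as an assumption.

```python
import hmac

# Hypothetical registry: approved server name -> pinned fingerprint (hex string).
APPROVED_SERVERS = {
    "github": "3f9a1c",
    "jira": "b27e40",
}


def may_connect(name: str, fingerprint: str, registry: dict[str, str]) -> bool:
    """Default deny: connect only to registered servers with a matching fingerprint."""
    pinned = registry.get(name)
    if pinned is None:
        return False  # registry miss: the server does not execute
    # Constant-time comparison of the two fingerprints.
    return hmac.compare_digest(pinned.encode("utf-8"), fingerprint.encode("utf-8"))


print(may_connect("github", "3f9a1c", APPROVED_SERVERS))       # True
print(may_connect("demo-helper", "000000", APPROVED_SERVERS))  # False: not registered
print(may_connect("jira", "ffffff", APPROVED_SERVERS))         # False: fingerprint changed
```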

6. Cross-tool interference loops

Outputs from one tool chain accidentally trigger unrelated tools in another chain, creating noisy and risky cascades.

Example: A summary step emits text that another parser interprets as a new action request, creating recursive calls.

Fast check: Set max chain depth, isolate contexts, and require explicit handoff objects between tool domains.
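
The depth limit and the explicit handoff object can be combined in one small structure. This is a sketch under assumptions: the `Handoff` fields, the domain names, and the limit of 3 are illustrative choices, not values from a specific framework.

```python
from dataclasses import dataclass
from typing import Optional

MAX_CHAIN_DEPTH = 3  # hypothetical limit; tune per workload


class ChainDepthExceeded(RuntimeError):
    """Raised when a tool chain tries to grow past the configured depth."""


@dataclass(frozen=True)
class Handoff:
    source_domain: str
    target_domain: str
    payload: dict
    depth: int


def hand_off(parent: Optional[Handoff], source_domain: str,
             target_domain: str, payload: dict) -> Handoff:
    """Create the next handoff in a chain, refusing to exceed the depth limit."""
    depth = 1 if parent is None else parent.depth + 1
    if depth > MAX_CHAIN_DEPTH:
        raise ChainDepthExceeded(f"chain depth {depth} exceeds limit {MAX_CHAIN_DEPTH}")
    return Handoff(source_domain, target_domain, payload, depth)


first = hand_off(None, "summarizer", "ticketing", {"ticket_id": "T-1"})
second = hand_off(first, "ticketing", "notifier", {"ticket_id": "T-1"})
print(second.depth)  # 2
```

In this design a tool in another domain acts only on a `Handoff` object, never on free text it happened to parse from another tool's output, which is what breaks the recursive-call pattern in the example above.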

7. Output poisoning and data bleed

Sensitive data in tool output enters model context and can leak later in unrelated user responses.

Example: A CRM lookup returns personal identifiers. The model later repeats them in a generic status update.

Fast check: Run output classifiers before model ingestion. REDACT low-risk secrets, QUARANTINE high-risk payloads.
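
A deliberately simple sketch of an output safety stage follows. The two regular expressions stand in for a real classifier and are assumptions: email addresses are treated as redactable, and a private key header is treated as a high-risk payload. The `PASS` label is also an addition for the case where nothing matches.

```python
import re

# Stand-in detectors; a production classifier would cover far more patterns.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PRIVATE_KEY = re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----")


def screen_output(text: str) -> tuple[str, str]:
    """Return (decision, text that is safe to place in model context)."""
    if PRIVATE_KEY.search(text):
        return "QUARANTINE", ""  # high risk: nothing reaches the model
    redacted, hits = EMAIL.subn("[REDACTED_EMAIL]", text)
    if hits:
        return "REDACT", redacted
    return "PASS", text


print(screen_output("Contact: jane@example.com"))
# ('REDACT', 'Contact: [REDACTED_EMAIL]')
print(screen_output("-----BEGIN RSA PRIVATE KEY-----\nabc"))
# ('QUARANTINE', '')
```

The stage runs on tool output before model ingestion, so the model never holds the raw identifier that it could repeat later.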

Detection and containment gates

These gates are intentionally strict during early production rollout. Relaxing thresholds is easy later. Explaining an avoidable breach to legal is usually harder.

| Gate | Target | Page condition | Owner |
| --- | --- | --- | --- |
| Unapproved write calls | 0 | > 0 in any 10m window | Governance On-call |
| Registry drift | 0 unknown servers | Any unknown server in production | Platform Security |
| Manifest drift outside deploy window | 0 | > 0 and no linked change ticket | MCP Platform |
| Scope mismatch attempts | < 0.1% | >= 1% for 15m | Identity Team |
| QUARANTINE ratio | < 1% | > 3% for 15m | Safety Team |
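
The page conditions in the table above can be expressed as a single evaluation step. The sketch below assumes the metrics have already been aggregated over their windows by a monitoring system. The `WindowStats` field names are hypothetical, while the thresholds and owners are taken from the table.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    unapproved_writes_10m: int        # count in the last 10 minutes
    unknown_servers: int              # servers in production with no registry entry
    unticketed_manifest_drifts: int   # drift events with no linked change ticket
    scope_mismatch_ratio_15m: float   # ratio sustained over 15 minutes (0.01 == 1%)
    quarantine_ratio_15m: float       # ratio sustained over 15 minutes


def gates_to_page(stats: WindowStats) -> list[tuple[str, str]]:
    """Return (gate, owner) pairs for every gate whose page condition is met."""
    checks = [
        ("Unapproved write calls", "Governance On-call", stats.unapproved_writes_10m > 0),
        ("Registry drift", "Platform Security", stats.unknown_servers > 0),
        ("Manifest drift outside deploy window", "MCP Platform",
         stats.unticketed_manifest_drifts > 0),
        ("Scope mismatch attempts", "Identity Team", stats.scope_mismatch_ratio_15m >= 0.01),
        ("QUARANTINE ratio", "Safety Team", stats.quarantine_ratio_15m > 0.03),
    ]
    return [(gate, owner) for gate, owner, fired in checks if fired]


stats = WindowStats(
    unapproved_writes_10m=1,
    unknown_servers=0,
    unticketed_manifest_drifts=0,
    scope_mismatch_ratio_15m=0.005,
    quarantine_ratio_15m=0.03,
)
print(gates_to_page(stats))  # [('Unapproved write calls', 'Governance On-call')]
```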

Policy and SIEM examples

mcp-security-policy.yaml
YAML
version: v1
rules:
  - id: deny-unregistered-server
    match:
      labels:
        mcp.server: "*"
      mcp:
        deny_servers_not_in: ["github", "jira", "snowflake", "slack"]
    decision: DENY

  - id: require-approval-for-write
    match:
      labels:
        mcp.action: write
    decision: REQUIRE_APPROVAL
    constraints:
      max_runtime_sec: 60
      max_retries: 1

  - id: deny-scope-mismatch
    match:
      auth:
        require_scope_match: true
    decision: DENY
detect-unapproved-writes.sql
SQL
-- Detect write actions executed without approval
SELECT
  event_time,
  actor_id,
  server_name,
  tool_name,
  decision,
  approval_id
FROM mcp_audit
WHERE action_class = 'write'
  AND decision = 'ALLOW'
  AND (approval_id IS NULL OR approval_id = '')
  AND event_time >= NOW() - INTERVAL '10 minutes'
ORDER BY event_time DESC;
contain-server.sh
Bash
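#!/usr/bin/env bash
# Contain a suspect MCP server: disable it, revoke its tokens, force write approval.
set -euo pipefail

# API must hold the base URL used by the calls below; fail early if it is unset.
: "${API:?set API to the base URL of the control API}"
SERVER="${1:?usage: contain-server.sh <server-name>}"

echo "[1/3] disabling server in registry: $SERVER"
curl -fsS -X POST "$API/registry/$SERVER/disable"

echo "[2/3] revoking active tokens for server: $SERVER"
# Inner quotes are escaped so the payload is valid JSON.
curl -fsS -X POST "$API/tokens/revoke" \
  -H "Content-Type: application/json" \
  -d "{\"audience\":\"$SERVER\"}"

echo "[3/3] forcing approval mode on all write actions"
curl -fsS -X POST "$API/policy/emergency-write-approval"

echo "Containment completed for $SERVER"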

For broader rollout controls, pair this with MCP in Production Best Practices.

Limitations and tradeoffs

More operational noise early on

Strict paging thresholds can be noisy in week one. Tune with data, not guesswork.

Approval queues can slow delivery

Write-path approval protects systems but introduces latency. Plan staffing around peak windows.

Classifier tuning is ongoing work

Output safety false positives are normal. Teams need a regular calibration cycle.

Frequently Asked Questions

Which MCP risk should I fix first?
Start with unapproved write-path execution and over-scoped tokens. Those two controls usually reduce blast radius the fastest.
Do read-only agents still need strict controls?
Yes. Read paths can still leak sensitive data through model output, logs, or copied context into later conversations.
How often should tool manifests be revalidated?
At minimum on every release and daily in production. Real-time drift alerts are better when tooling allows it.
Is prompt filtering enough for MCP security?
No. Prompt filtering helps, but policy enforcement outside the model is what blocks unsafe calls deterministically.
What should trigger immediate containment?
Any unknown server in production, any unapproved write action, or any critical token scope mismatch should trigger containment.
Next step

Run one tabletop exercise this week: simulate an unapproved write call from an unknown MCP server. If your team cannot contain it in under 15 minutes, fix detection and containment before expanding agent permissions.