# A Claude Code hook that warns you before calling a low-trust MCP server
Last week researchers at Ox published findings showing that the MCP STDIO transport lets arbitrary command execution slip through unchecked, and that 9 of the 11 MCP marketplaces they tested were poisonable. Anthropic's response: STDIO is out of scope for protocol-level fixes; the ecosystem is responsible for operational trust. Fair — Anthropic donated MCP to the Linux Foundation's Agentic AI Foundation in December 2025 specifically so that independent infrastructure could grow around it. But that leaves a real gap for anyone running Claude Code today: how do you know whether an MCP server you're about to invoke is trustworthy?

Anthropic's official registry is pure metadata (license, commit count, popularity). mcp-scorecard.ai scores repos, not behavior. BlueRock runs OWASP-style static scans. None of these ask the one question that actually matters: does this MCP server, in real call-time use, work?

So I built a small thing to answer it: a zero-config Claude Code hook that does two things on every MCP tool call.

**Before the call** — it queries a public trust API for that server. If the score is low, Claude shows an inline warning:

```
⚠ XAIP: "some-server" trust=0.32 (caution, 87 receipts)
  Risk: high_error_rate
```

**After the call** — it emits an Ed25519-signed receipt (success, latency, hashed input/output) to a public aggregator that updates the score.

Install:

```shell
npm install -g xaip-claude-hook
xaip-claude-hook install
```

The next MCP call fires the hook. That's the whole UX. No raw content leaves your machine — only hashes:

```jsonc
{
  "agentDid": "did:web:context7",
  "callerDid": "did:key:a1c6cd34…",
  "toolName": "resolve-library-id",
  "taskHash": "9f3e…",        // sha256(input).slice(0,16)
  "resultHash": "1b78…",      // sha256(response).slice(0,16)
  "success": true,
  "latencyMs": 668,
  "failureType": "",
  "timestamp": "2026-04-17T04:24:59.925Z",
  "signature": "...",         // Ed25519 over canonical JSON (agent key)
  "callerSignature": "..."    // Ed25519 over canonical JSON (caller key)
}
```

The aggregator rejects anything that fails signature verification. The trust API computes a Bayesian score across all verified receipts per server, weighted by caller diversity — so one enthusiastic installer can't fake a reputation.

To be transparent: the dataset is small. Here is what a curl against the live trust API returns today:

| Server | Trust | Verdict | Receipts | Flag |
| --- | --- | --- | --- | --- |
| memory | 0.800 | trusted | 112 | — |
| git | 0.775 | trusted | 35 | — |
| sqlite | 0.753 | trusted | 42 | — |
| puppeteer | 0.671 | caution | 32 | high_error_rate |
| context7 | 0.618 | caution | 560 | low_caller_diversity |
| filesystem | 0.579 | caution | 610 | low_caller_diversity |
| playwright | 0.394 | low_trust | 37 | high_error_rate |
| fetch | 0.365 | low_trust | 36 | high_error_rate |

Verify any of these yourself:

```shell
curl https://xaip-trust-api.kuma-github.workers.dev/v1/trust/context7
```

The low_caller_diversity flag on the high-volume servers is the single most honest number in that table. It means: I'm the biggest caller right now, and that's exactly the problem this tool is supposed to solve. The flag only clears when independent installers start generating receipts — which is what the npm package is for.

Every other "MCP trust" project I've seen scores the repository:

- Commit frequency, license, stars, contributor count (mcp-scorecard.ai)
- Static source-code vulnerability scans (BlueRock)
- Registry inclusion as implicit trust (the official MCP registry)

These are useful proxies, but none of them tell you whether a server works in practice. A well-maintained repo can ship a buggy release; a single-author repo can be rock solid; a newly forked malicious repo looks identical to the original under a static scan.

XAIP scores observed behavior. Every call is a signed attestation.
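To make the receipt format concrete, here is a sketch of how a receipt like the one above could be built and signed, assuming "canonical JSON" means key-sorted serialization with the two signature fields left out — the real hook's canonicalization may differ, and key handling here is illustrative (the actual hook persists keys under ~/.xaip/):

```typescript
import { createHash, generateKeyPairSync, sign, verify } from "node:crypto";

// Truncated content hash, as in the taskHash / resultHash comments above.
function shortHash(data: string): string {
  return createHash("sha256").update(data).digest("hex").slice(0, 16);
}

// Deterministic serialization: keys sorted, signature fields excluded,
// so signing and verification operate on identical bytes.
function canonicalize(obj: Record<string, unknown>): Buffer {
  const keys = Object.keys(obj)
    .filter((k) => k !== "signature" && k !== "callerSignature")
    .sort();
  return Buffer.from(JSON.stringify(obj, keys));
}

// Illustrative throwaway key pair standing in for the caller's key.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

const receipt: Record<string, unknown> = {
  agentDid: "did:web:context7",
  toolName: "resolve-library-id",
  taskHash: shortHash('{"libraryName":"react"}'),
  resultHash: shortHash("response body goes here"),
  success: true,
  latencyMs: 668,
  timestamp: new Date().toISOString(),
};

// Ed25519 takes a null algorithm in Node's one-shot sign/verify API.
receipt.callerSignature = sign(null, canonicalize(receipt), privateKey).toString("base64");

// Aggregator-side check: recompute the canonical bytes and verify.
const ok = verify(
  null,
  canonicalize(receipt),
  publicKey,
  Buffer.from(receipt.callerSignature as string, "base64"),
);
```

Because the signature fields are excluded from canonicalization, attaching `callerSignature` doesn't change the signed bytes, and any tampering with the remaining fields makes verification fail.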
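The pre-call check is simple enough to sketch too. The endpoint shape is the one shown above; the response field names (`trust`, `verdict`, `receipts`, `flag`) are my assumption based on the score table, not a documented schema:

```typescript
// Assumed response shape of GET /v1/trust/{server}.
interface TrustRecord {
  trust: number;      // Bayesian score in [0, 1]
  verdict: "trusted" | "caution" | "low_trust" | "insufficient_data";
  receipts: number;   // verified receipt count
  flag?: string;      // e.g. "high_error_rate", "low_caller_diversity"
}

const TRUST_API = "https://xaip-trust-api.kuma-github.workers.dev/v1/trust";

async function lookupTrust(server: string): Promise<TrustRecord> {
  const res = await fetch(`${TRUST_API}/${encodeURIComponent(server)}`);
  if (!res.ok) throw new Error(`trust API returned ${res.status}`);
  return (await res.json()) as TrustRecord;
}

// Pure formatter, so the warning logic is testable without the network.
// Returns null when there is nothing to warn about.
function formatWarning(server: string, t: TrustRecord): string | null {
  if (t.verdict === "trusted" || t.verdict === "insufficient_data") return null;
  let msg = `⚠ XAIP: "${server}" trust=${t.trust.toFixed(2)} (${t.verdict}, ${t.receipts} receipts)`;
  if (t.flag) msg += `\n  Risk: ${t.flag}`;
  return msg;
}
```

Keeping the formatter separate from the fetch means a network failure can degrade to "no warning" rather than blocking the tool call.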
The scoring is Bayesian, so:

- Servers with few receipts get insufficient_data — no verdict, no warning
- High-variance patterns (mixed success/failure) get lower confidence
- The high_error_rate flag is computed from real response content, classifying quota exceeded, rate limit, unauthorized, and `"isError": true` as failures

This is the same philosophy as OpenSSF Scorecard vs. runtime attestation in supply-chain security: you want both, but only one of them catches regressions in production.

I want to be specific about limitations, because "AI trust protocol" posts tend to overpromise:

- **~10 servers, ~1,500 receipts total.** Small. This post is partly an ask for installers to fix that.
- **One aggregator node.** Byzantine fault tolerance requires a quorum; right now there's one Cloudflare Worker. A quorum needs multiple operators, which is the next milestone.
- **Client-side inferSuccess is heuristic.** We look at response text for error patterns. False positives and negatives are possible — fetch's 36% error rate might be over-counted (legitimate 404s shouldn't hurt the server's score) or real.
- **The privacy model relies on hashes, not ZK.** Inputs and outputs are hashed before transmission, but statistical correlation across taskHashes is possible in principle. Migration to ZK receipt aggregation is a future idea, not a current feature.
- **I personally generated most of the high-volume receipts.** The low_caller_diversity flag you see on context7 and filesystem is me.

To try it:

```shell
npm install -g xaip-claude-hook
xaip-claude-hook install
xaip-claude-hook status
```

Open a new Claude Code session, call any MCP tool, and check the log:

```shell
cat ~/.xaip/hook.log
```

You'll see lines like:

```
2026-04-17T04:24:59Z POST context7/resolve-library-id ok=true lat=668ms → 200
```

And the next time you (or Claude) invoke a low-trust server, the warning shows up inline. Uninstall is a single command. Keys under ~/.xaip/ persist — delete them manually to wipe.
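For intuition about the scoring properties described above, here is one illustrative way a Bayesian score with a caller-diversity weight could behave. The live aggregator's actual formula isn't published in this post; the Beta(2, 2) prior, the receipt threshold, and the diversity discount below are all invented for the sketch:

```typescript
interface ReceiptSummary {
  successes: number;
  failures: number;
  uniqueCallers: number;
}

// Hypothetical threshold: below this, no verdict is issued at all.
const MIN_RECEIPTS = 10;

function trustScore(s: ReceiptSummary): { trust: number; verdict: string } {
  const n = s.successes + s.failures;
  if (n < MIN_RECEIPTS) return { trust: 0.5, verdict: "insufficient_data" };

  // Beta(2, 2) prior: the posterior mean shrinks toward 0.5 for small n,
  // so a handful of lucky successes can't produce a confident score.
  const posteriorMean = (s.successes + 2) / (n + 4);

  // Discount the evidence by caller diversity: many receipts from a
  // single caller move the score much less than the same receipts
  // spread across independent callers.
  const diversity = Math.min(1, s.uniqueCallers / 5);
  const trust = 0.5 + (posteriorMean - 0.5) * (0.5 + 0.5 * diversity);

  const verdict = trust >= 0.7 ? "trusted" : trust >= 0.5 ? "caution" : "low_trust";
  return { trust, verdict };
}
```

Under this toy formula, a server with 100 successes from one caller scores noticeably lower than the same record spread across ten callers — the mechanism behind the low_caller_diversity flag.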
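The inferSuccess heuristic mentioned in the limitations can be sketched as a pattern match over the response text. This is a simplified stand-in, not the hook's actual pattern list — the four patterns are the ones this post names:

```typescript
// Substrings that mark a response as a failure. A match anywhere in the
// response body counts, which is exactly the false-positive risk noted
// above: a fetched page that merely *mentions* "rate limit" gets flagged.
const ERROR_PATTERNS: RegExp[] = [
  /quota exceeded/i,
  /rate limit/i,
  /unauthorized/i,
  /"isError"\s*:\s*true/,
];

function inferSuccess(responseText: string): boolean {
  return !ERROR_PATTERNS.some((p) => p.test(responseText));
}
```

A fairer classifier would parse the MCP result and only trust a structured `isError` field, falling back to text patterns when no structure is available — which is roughly why the fetch server's error rate may be over-counted today.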
- npm: https://www.npmjs.com/package/xaip-claude-hook — `npm install -g xaip-claude-hook`
- Repo: https://github.com/xkumakichi/xaip-protocol
- Hook source: https://github.com/xkumakichi/xaip-protocol/tree/main/clients/claude-code-hook
- Live trust API: https://xaip-trust-api.kuma-github.workers.dev/v1/trust/context7
- Aggregator: https://xaip-aggregator.kuma-github.workers.dev

Issues, scoring bugs, angry takes — all welcome on GitHub. If you maintain an MCP server and your score looks wrong, I want to hear about it first.
