# Building an MCP server — lessons from thunderbit-mcp

Ethan Cole

When we started building thunderbit-mcp, the plan sounded straightforward: expose Thunderbit's web extraction API to AI coding agents through the Model Context Protocol. In practice, the hard parts were not the SDK calls. The hard parts were product-shaped:

- How many tools should the server expose?
- What should a tool return when a page is blocked, slow, or only partially extracted?
- Should the server run locally over stdio, remotely over HTTP, or both?
- How much should the LLM decide, and how much should the tool force into a schema?
- What makes an MCP server feel dependable instead of magical?

This post is a field guide from shipping an MCP server for web data extraction. The examples use Thunderbit because that is the system we were working on, but the lessons apply to most MCP servers that wrap an existing API.

## The protocol is small; the product decisions are not

MCP gives you a clean frame: a host application talks to an MCP server; the server exposes tools, resources, prompts, and capabilities; messages move over JSON-RPC; the connection goes through initialization, operation, and shutdown. That sounds tiny, which is part of the appeal.

But the minute you ship a server to real users, you are no longer only designing a protocol adapter. You are designing an interface for an AI agent. That changes the questions. A REST API can assume the caller is a developer who read the docs. An MCP tool is often called by a model that inferred intent from one sentence: "Grab the pricing tables from these competitor pages and give me a CSV." The model may not know whether to fetch raw HTML, render JavaScript, extract structured fields, paginate, or retry with a different region. A good MCP server turns that ambiguity into a small number of safe, predictable decisions.

## Map tools to intent, not to endpoints

For thunderbit-mcp, we treated the MCP layer as a product API, not a thin wrapper around every internal endpoint. The first temptation is to expose everything:

```
distill
extract
batchDistill
batchExtract
getJob
cancelJob
listJobs
render
screenshot
proxyDebug
credits
schemaInfer
```

That looks complete, but it creates decision fatigue for the model. Tool descriptions start overlapping. The agent has to decide between five similar verbs before it has even helped the user.

We had better results when tools mapped to user intent instead of internal API shape:

```
fetch_page_content(url, options)
extract_structured_data(url, schema, options)
extract_many_pages(urls, schema_or_mode, webhook?)
check_extraction_job(job_id)
```

The important detail is not the exact names. It is that each tool answers a distinct question:

- "I need readable page content."
- "I need fields that match a schema."
- "I need to run this across many URLs."
- "I need to check async progress."

If two tools are hard for you to explain in one sentence without using implementation words, merge them or make one an option.

## Descriptions are model steering

In normal API design, descriptions are documentation. In MCP, descriptions are also model steering. This means vague descriptions are expensive.

Bad:

> Extract data from a URL.

Better:

> Extract structured data from a public web page using a JSON Schema. Use this when the user asks for specific fields such as prices, names, emails, dates, reviews, listings, tables, or product attributes. Returns JSON that conforms to the provided schema when possible.

That description teaches the agent when to call the tool. It also prevents the common mistake of using extraction when the user only needs a readable summary.
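To make that concrete, here is a sketch of how such a tool might be registered with the official TypeScript MCP SDK. The `callThunderbit` helper and its endpoint URL are placeholders, not Thunderbit's real API surface.

```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";

const server = new McpServer({ name: "thunderbit-mcp", version: "0.1.0" });

// Hypothetical API client; the endpoint and auth scheme are placeholders.
async function callThunderbit(path: string, body: unknown): Promise<unknown> {
  const res = await fetch(`https://api.thunderbit.example${path}`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.THUNDERBIT_API_KEY}`,
    },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`Upstream request failed: ${res.status}`);
  return res.json();
}

// The description doubles as model steering: when to call the tool,
// and what it returns.
server.tool(
  "extract_structured_data",
  "Extract structured data from a public web page using a JSON Schema. " +
    "Use this when the user asks for specific fields such as prices, names, " +
    "emails, dates, reviews, listings, tables, or product attributes. " +
    "Returns JSON that conforms to the provided schema when possible.",
  {
    url: z.string().url().describe("Public page to extract from"),
    schema: z.record(z.unknown()).describe("JSON Schema for the expected fields"),
  },
  async ({ url, schema }) => {
    const result = await callThunderbit("/extract", { url, schema });
    return { content: [{ type: "text", text: JSON.stringify(result, null, 2) }] };
  }
);
```

The exact SDK call matters less than the shape: one intent, one plain-language sentence about when to use it, structured arguments.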
We also learned to include negative guidance:

> Do not use this tool for private pages that require the user's logged-in browser session. Do not use it for actions such as clicking buttons, submitting forms, or making purchases.

Negative guidance matters because web automation is a broad mental category. If your server only reads pages, say so. If it can act on pages, be even more explicit.

## Make the schema the contract

For web extraction, a natural first version is:

```json
{
  "url": "https://example.com/product",
  "prompt": "Get the product name, price, rating, and availability."
}
```

That works for demos. It is less fun in production. We moved toward JSON Schema as the primary contract:

```json
{
  "type": "object",
  "properties": {
    "name": { "type": "string", "description": "The product name as shown on the page" },
    "price": { "type": "number", "description": "Current listed price in USD, excluding shipping" },
    "inStock": { "type": "boolean", "description": "Whether the product appears available to buy" }
  },
  "required": ["name", "price"]
}
```

This did three useful things:

- It made the user's expected output machine-checkable.
- It let the model create or refine the schema before calling the tool.
- It reduced downstream cleanup because the result already had shape.

The funny thing about agents is that they are often better at writing a schema than at remembering all the implicit constraints in a prose prompt. Use that.

## Errors an agent can act on

AI agents do not need poetic error messages. They need errors they can act on. For thunderbit-mcp, we tried to keep tool failures in a small set of categories:

- `INVALID_INPUT`
- `AUTH_REQUIRED`
- `RATE_LIMITED`
- `FETCH_FAILED`
- `EXTRACTION_FAILED`
- `PARTIAL_RESULT`
- `JOB_PENDING`

Each error includes:

- a short human-readable message
- whether retrying makes sense
- any safe next action
- the request or job ID for debugging

Example:

```json
{
  "code": "RATE_LIMITED",
  "message": "The request hit the current account rate limit.",
  "retryable": true,
  "retryAfterSeconds": 60
}
```

The goal is not to hide complexity. It is to keep the agent from improvising. A model that sees `retryAfterSeconds` is much more likely to wait or explain the limit than to spam the same tool call five times.

## Start with stdio, and respect its rules

The MCP spec currently defines two standard transports: stdio and Streamable HTTP. For a first server, stdio is usually the calmest path:

- The client launches your server as a subprocess.
- You read JSON-RPC messages from stdin.
- You write protocol messages to stdout.
- You write logs to stderr.

That last point is worth underlining. Do not log to stdout. In stdio MCP, stdout is protocol space. A single stray `console.log("debug")` can break the client connection.

stdio is a good fit when:

- users run the server locally
- configuration is mostly environment variables
- the server is a wrapper around an API
- you want broad compatibility with desktop agents and coding tools

Streamable HTTP becomes attractive when:

- you want a hosted server
- auth is browser-based or OAuth-based
- multiple clients need to connect
- you need resumability, session management, or server-to-client notifications

For Thunderbit, stdio made the initial developer workflow simple: install, add config to the MCP client, pass an API key, and start using tools. A remote HTTP server is a better second step once the auth and tenancy story is mature.
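Backing up to errors for a moment: it helps to define that envelope once in code and have every tool return it, instead of letting each handler improvise. A minimal TypeScript sketch; the field names mirror the examples above and are our convention, not anything the MCP spec mandates.

```typescript
// The small, closed set of failure categories every tool shares.
type ErrorCode =
  | "INVALID_INPUT"
  | "AUTH_REQUIRED"
  | "RATE_LIMITED"
  | "FETCH_FAILED"
  | "EXTRACTION_FAILED"
  | "PARTIAL_RESULT"
  | "JOB_PENDING";

interface ToolError {
  code: ErrorCode;
  message: string;            // short and human-readable
  retryable: boolean;         // whether retrying makes sense
  retryAfterSeconds?: number; // only when the server actually knows
  nextAction?: string;        // a safe next step the agent can take
  requestId?: string;         // for debugging and support
}

// Surface failures as structured tool results (isError is part of the
// MCP tool-result shape), so the agent reads the payload instead of
// guessing from an exception message.
function toolError(error: ToolError) {
  return {
    isError: true,
    content: [{ type: "text" as const, text: JSON.stringify(error) }],
  };
}
```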
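The stdio bootstrap is also worth sketching, because it is where two habits get locked in: logs on stderr, and a loud, helpful failure when credentials are missing. A minimal sketch with the official TypeScript SDK:

```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";

// stdout is protocol space in stdio MCP; anything human-facing goes to stderr.
const log = (msg: string) => process.stderr.write(`[thunderbit-mcp] ${msg}\n`);

if (!process.env.THUNDERBIT_API_KEY) {
  // Fail early with a fix, instead of dying cryptically on the first tool call.
  log("Missing THUNDERBIT_API_KEY. Add it to the env block of your MCP client config.");
  process.exit(1);
}

const server = new McpServer({ name: "thunderbit-mcp", version: "0.1.0" });
// ...tool registrations go here...

await server.connect(new StdioServerTransport());
log("ready on stdio");
```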
## Auth is the first moment of truth

Auth is not just a security feature. In MCP, auth is often the first moment of truth. If setup requires five steps, three dashboards, and a mystery config file, many users will assume the server is broken.

The local stdio version should make the happy path obvious:

```bash
npx thunderbit-mcp
```

And the MCP client config should be boring:

```json
{
  "mcpServers": {
    "thunderbit": {
      "command": "npx",
      "args": ["thunderbit-mcp"],
      "env": {
        "THUNDERBIT_API_KEY": "your_api_key"
      }
    }
  }
}
```

For hosted transports, use real auth. The MCP transport docs call out important security protections for HTTP servers, including origin validation, localhost binding for local servers, and proper authentication. Do not treat "it is just an agent tool" as a reason to relax security. Agent tools are exactly where you want clean boundaries.

## Defaults that shorten the loop

A good MCP server should reduce clarification loops. For web extraction, the model often needs to know:

- Should JavaScript be rendered?
- Should the server follow pagination?
- Should it return Markdown or JSON?
- Should it run one URL or many?
- Is partial data acceptable?

You can force the model to ask the user every time, but that makes the workflow feel brittle. Instead, set defaults that match the common case and expose options for the edge cases. For example:

```json
{
  "url": "https://example.com",
  "renderMode": "auto",
  "countryCode": "US",
  "maxPages": 1,
  "includeLinks": false
}
```

The model can still override these when the user says "include all pagination" or "check the German version." But the default path stays short.

## Give async jobs a narrative

Batch extraction is not instant. That is fine, as long as the tool gives the agent a narrative it can relay to the user.

Bad async response:

```json
{ "id": "job_123" }
```

Better:

```json
{
  "jobId": "job_123",
  "status": "queued",
  "submittedUrls": 80,
  "estimatedCompletionSeconds": 120,
  "nextAction": "Call check_extraction_job with this jobId."
}
```

Agents are very literal. If there is a next action, put it in the result. If there is no next action, say that too.
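One way to keep that promise is to build the narrative into the job-status helper itself, so a tool like check_extraction_job can never return a bare ID. A sketch; the status values and fields are hypothetical, modeled on the responses above.

```typescript
// Hypothetical job record as returned by the extraction backend.
interface JobStatus {
  jobId: string;
  status: "queued" | "running" | "partial" | "done" | "failed";
  submittedUrls: number;
  completedUrls: number;
}

// Every stage carries an explicit next action the agent can relay or follow.
function describeJob(job: JobStatus) {
  const nextAction =
    job.status === "done"
      ? "Results are ready. Fetch them and present them to the user."
      : job.status === "failed"
      ? "Report the failure to the user. Do not resubmit automatically."
      : `Call check_extraction_job again with jobId ${job.jobId} in about 30 seconds.`;

  return {
    ...job,
    progress: `${job.completedUrls} of ${job.submittedUrls} URLs processed`,
    nextAction,
  };
}
```

Routing every status response through a helper like this keeps the "agents are literal" rule from depending on anyone's memory.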
## Treat the registry as part of the release

The MCP Registry is now the official centralized metadata repository for publicly accessible MCP servers, currently in preview. That is good news for discovery, but it also raises the bar for packaging. Before submitting, check the unglamorous parts:

- Is the package name stable?
- Is the README installation flow tested from scratch?
- Are required environment variables documented?
- Does the server expose a useful version?
- Are tool names stable enough to avoid breaking users?
- Does the server fail gracefully without credentials?
- Is there a minimal example for at least one popular MCP client?

Registry metadata is not a substitute for a good first run. If the first command fails silently, discovery will not save you.

## Have opinions

The best MCP servers are not neutral pipes. They encode judgment. For thunderbit-mcp, those opinions were:

- Prefer structured output when the user asks for fields.
- Prefer cleaned Markdown when the user asks to read, summarize, or compare pages.
- Prefer batch tools when the user provides many URLs.
- Avoid browser actions unless the capability is explicitly supported.
- Return partial results clearly instead of pretending everything succeeded.
- Keep credentials out of prompts and tool outputs.

Your opinions will be different. The point is to have them. An MCP server that exposes every knob equally forces the model to become your product manager at runtime. That is rarely what you want.

## A checklist for next time

If I were starting another MCP server tomorrow, I would use this checklist:

1. Start with three to five tools.
2. Write tool descriptions like model instructions, not API docs.
3. Use structured inputs and outputs everywhere.
4. Put logs on stderr for stdio servers.
5. Add stable error codes before adding more features.
6. Test with real agent prompts, not only direct tool calls.
7. Include one copy-paste client config in the README.
8. Document auth failure, rate limits, retries, and partial results.
9. Decide which transport is primary before designing auth.
10. Treat registry submission as part of the release, not an afterthought.

## Where Thunderbit fits, and where it does not

Thunderbit is an AI web scraper and web extraction platform. The API is designed to turn web pages into clean Markdown or structured JSON while handling common scraping problems like JavaScript rendering, noisy HTML, anti-bot friction, geo-routing, batch jobs, and webhooks. That makes it a natural fit for MCP: agents often need fresh web data, but they should not have to manage a browser cluster or maintain brittle CSS selectors just to answer a question.

The weakness is also clear: Thunderbit is not the right tool for every MCP job. If you only need to read local files, query your own database, or call a simple internal API, a tiny custom MCP server will be cheaper and more direct. Thunderbit makes sense when the hard part is the public web. That distinction matters. MCP works best when each server has a sharp job.

Building an MCP server is easy in the same way building a CLI is easy: the first command can work in an afternoon. Shipping one people trust takes longer. You have to design the verbs, the defaults, the errors, the auth flow, the packaging, and the story the agent tells when something goes wrong. The protocol gives you the rail. The product work is deciding where the rail should go.

That was the biggest lesson from thunderbit-mcp: the server is not just how an AI calls your API. It is how your API becomes part of somebody's thinking loop.