In 2024, "AI engineer" meant fine-tuning models and chaining prompts. By 2026, it means something else entirely: building the connective tissue that lets agents act on real systems. The Model Context Protocol (MCP) is that tissue, and the people who can ship production MCP servers are the ones quietly running the most leveraged practices in AI consulting.
The hard part is not the protocol — the spec is small enough to read in an afternoon. The hard part is everything around it: choosing the right transport, designing tool surfaces that don't blow up an agent's context, locking down authentication, handling long-running operations, and deploying servers that don't fall over when a Claude Code session hammers them with 80 parallel tool calls. Most "MCP tutorials" stop at hello-world. This one doesn't.
This guide walks through MCP server development end-to-end in 2026: what the protocol actually is, the current SDK and transport stack, the anatomy of a production-grade server, the dev workflow we use at AY Automate, security non-negotiables, deployment options from local stdio to Vercel, and real walkthroughs of patterns we've seen work — and fail — in production.
What MCP is and isn't
The Model Context Protocol is an open spec, originally proposed by Anthropic and now adopted across Claude, ChatGPT, Cursor, Windsurf, and dozens of IDE and agent frameworks. It standardizes one specific thing: how an LLM client and an external tool server talk to each other. That's it. It is not an agent framework, not an orchestration layer, not a model.
The mental model is JSON-RPC 2.0 over a transport, with a small typed schema for three primitives:
- Tools: callable functions the model can invoke (search_issues, send_email, query_db).
- Resources: readable content the client can fetch and put into context (file://, db://table, notion://page).
- Prompts: reusable prompt templates the client can surface as slash commands or quick actions.
A fourth capability, sampling, lets the server ask the client to run an LLM completion on its behalf. This is what makes MCP servers genuinely composable: a server can be both a tool provider and a consumer of the client's model.
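To make the "JSON-RPC 2.0 over a transport" mental model concrete, here is a minimal sketch of the two messages behind a single tool call. The tools/call method and the content array follow the MCP spec; the interfaces are trimmed to what this example needs, and the helper names (buildToolCall, buildToolResult) are ours, not part of any SDK.

```typescript
// Minimal JSON-RPC 2.0 envelope types, reduced to what a tool call needs.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number | string;
  method: string;
  params?: Record<string, unknown>;
}

interface TextContent {
  type: "text";
  text: string;
}

interface ToolCallResult {
  content: TextContent[];
  isError?: boolean;
}

interface JsonRpcResponse {
  jsonrpc: "2.0";
  id: number | string;
  result: ToolCallResult;
}

// Build the request a client sends to invoke a tool.
function buildToolCall(id: number, toolName: string, args: Record<string, unknown>): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name: toolName, arguments: args } };
}

// Build the matching success response; the id echoes the request id.
function buildToolResult(request: JsonRpcRequest, text: string): JsonRpcResponse {
  return { jsonrpc: "2.0", id: request.id, result: { content: [{ type: "text", text }] } };
}

const exampleRequest = buildToolCall(1, "search_issues", { query: "login bug" });
const exampleResponse = buildToolResult(exampleRequest, "Found 3 issues");

console.log(JSON.stringify(exampleRequest));
console.log(JSON.stringify(exampleResponse));
```

In practice the SDK builds and parses these envelopes for you. Seeing them once mostly helps when you are reading transport logs.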
What MCP is not: it does not lock you into one transport, one auth scheme, or one deployment model. The spec deliberately leaves those choices open. That is why the ecosystem fragmented quickly: stdio servers, SSE servers, streamable HTTP servers, OAuth-protected servers, and locally-piped servers all coexist. Understanding which to use for which problem is half the job.
If you want a curated tour of what production MCP looks like in the wild, we maintain a running list in the best Claude Code MCP servers.
The MCP server stack in 2026
The official SDKs are the right starting point. Skip the hand-rolled implementations — they exist, but the SDKs handle protocol versioning, capability negotiation, and the dozen edge cases you'd otherwise hit at 2am.
TypeScript SDK (@modelcontextprotocol/sdk). The dominant choice in 2026. Used by GitHub MCP, Linear MCP, Sentry MCP, and most agency builds. Strong typing through Zod schemas, first-class support for stdio + SSE + streamable HTTP, and the cleanest async ergonomics. If you're shipping to Vercel, Cloudflare Workers, or running inside a Next.js app, this is the default.
Python SDK (mcp). Second most common, dominant for data and ML use cases. Pairs naturally with FastAPI, pandas, and the Anthropic SDK. Slightly behind TypeScript on streamable HTTP polish but identical for stdio and SSE servers. Pick this when your tools wrap Python-native systems (vector DBs, scikit pipelines, internal ML services).
Other SDKs. Rust (rmcp), Go (mcp-go), Kotlin, C#, Swift, and Ruby implementations all exist and are spec-conformant. They are useful when embedding MCP into an existing service in that language, but for greenfield work TypeScript or Python is still the right answer.
Transports — choose carefully.
- stdio — server runs as a subprocess of the client, communicates over stdin/stdout. Zero network surface, zero auth needed (the client owns the process). Default for desktop tools, IDE plugins, and Claude Code local servers.
- SSE (Server-Sent Events) — HTTP-based, server pushes a stream. Used by hosted servers for older clients. Being phased out in favor of streamable HTTP.
- Streamable HTTP — the 2026 default for hosted servers. Single endpoint, supports bidirectional streaming via chunked responses, plays well with serverless. This is what you want for any internet-reachable MCP server.
A common mistake: building an SSE server in 2026. Don't. Go streamable HTTP from day one.
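The transport rules above can be condensed into a small decision function. This is only a sketch of the reasoning in this section: the ServerProfile fields and the chooseTransport name are hypothetical, and real projects may have constraints it ignores.

```typescript
type Transport = "stdio" | "streamable-http" | "sse";

interface ServerProfile {
  // True when the client launches the server as a local subprocess.
  runsOnUserMachine: boolean;
  // True only when a client that cannot speak streamable HTTP must be supported.
  mustSupportLegacyClient: boolean;
}

// Local servers use stdio; hosted servers use streamable HTTP unless a legacy client forces SSE.
function chooseTransport(profile: ServerProfile): Transport {
  if (profile.runsOnUserMachine) return "stdio";
  if (profile.mustSupportLegacyClient) return "sse";
  return "streamable-http";
}

console.log(chooseTransport({ runsOnUserMachine: false, mustSupportLegacyClient: false }));
```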
Anatomy of a production MCP server
A toy MCP server has one tool, one transport, and no auth. A production server has a lot more going on. Here is the actual shape of a server we'd ship to a client:
import { McpServer, ResourceTemplate } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";

// requireScope and ticketService are application-specific helpers, not part of the SDK.
const server = new McpServer({
  name: "acme-ops",
  version: "1.4.2",
});

// Tools: actions the agent can take
server.tool(
  "create_ticket",
  {
    title: z.string().min(3).max(200),
    priority: z.enum(["low", "medium", "high", "urgent"]),
    assignee: z.string().email().optional(),
  },
  async ({ title, priority, assignee }, { authInfo }) => {
    requireScope(authInfo, "tickets:write");
    const ticket = await ticketService.create({ title, priority, assignee });
    return {
      content: [{ type: "text", text: `Created ticket ${ticket.id}` }],
      structuredContent: { ticketId: ticket.id, url: ticket.url },
    };
  }
);

// Resources: readable content the client can attach.
// A URI with {id} placeholders has to be registered as a ResourceTemplate;
// a plain string is treated as one fixed URI.
server.resource(
  "ticket",
  new ResourceTemplate("ticket://{id}", { list: undefined }),
  async (uri, { id }) => {
    const ticket = await ticketService.get(String(id));
    return {
      contents: [{ uri: uri.href, mimeType: "application/json", text: JSON.stringify(ticket) }],
    };
  }
);

// Prompts: reusable templates surfaced as slash commands
server.prompt("triage", { ticketId: z.string() }, ({ ticketId }) => ({
  messages: [
    {
      role: "user",
      content: { type: "text", text: `Triage ticket ${ticketId}. Check duplicates, suggest priority, propose owner.` },
    },
  ],
}));
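The create_ticket handler above calls requireScope without showing it. One possible shape is sketched below. It is hypothetical: the AuthInfoLike interface assumes the auth object carries a scopes array, and your own auth layer may expose scopes differently.

```typescript
// Assumed shape: the only field this sketch relies on is a list of granted scopes.
interface AuthInfoLike {
  scopes: string[];
}

// Throw when the caller is unauthenticated or lacks the scope this tool needs.
function requireScope(authInfo: AuthInfoLike | undefined, scope: string): void {
  if (!authInfo || !authInfo.scopes.includes(scope)) {
    throw new Error(`Missing required scope: ${scope}`);
  }
}

// A token with tickets:write passes silently.
requireScope({ scopes: ["tickets:read", "tickets:write"] }, "tickets:write");
console.log("scope check passed");
```

Because this helper throws, the failure should still reach the agent as a tool result it can read, not as a dropped connection.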
The non-obvious production concerns:
- Tool surface area. Each tool you expose lives in every agent context for the entire session. Twenty tools at 200 tokens of schema each is 4K tokens before the user types anything. Keep tool names short, descriptions tight, and split large servers into multiple smaller ones the user can enable selectively.
- Structured content. Return both human-readable text and machine-readable structuredContent. The agent uses the text, the calling app uses the structure.
- Progress notifications. Long-running tools (deploys, large queries) should emit progress notifications. Otherwise the client times out or the agent assumes failure.
- Cancellation. Honor the cancelled notification. A user who hits escape expects the in-flight tool call to actually stop, not finish and bill them for the API call.
- Error shape. Return errors as content with isError: true, not by throwing. Thrown exceptions become protocol errors and the agent often can't recover gracefully.
- Sampling. If your server needs an LLM call (summarizing a long resource, classifying a result), use sampling to delegate to the client's model. The user pays once, on the model they chose.
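The error-shape rule is easy to enforce once with a wrapper. The sketch below is SDK-independent: ToolResult is a trimmed version of the result shape, and withErrorResult and getTicket are hypothetical names. Some SDK layers may already convert thrown errors for you, so check before adding your own.

```typescript
interface ToolResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

type ToolHandler<A> = (args: A) => Promise<ToolResult>;

// Wrap a handler so any thrown error comes back as a normal result flagged isError.
function withErrorResult<A>(handler: ToolHandler<A>): ToolHandler<A> {
  return async (args: A): Promise<ToolResult> => {
    try {
      return await handler(args);
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      return { content: [{ type: "text", text: `Error: ${message}` }], isError: true };
    }
  };
}

// Hypothetical tool that fails for one input.
const getTicket = withErrorResult(async ({ id }: { id: string }): Promise<ToolResult> => {
  if (id === "missing") throw new Error("ticket not found");
  return { content: [{ type: "text", text: `Ticket ${id}` }] };
});

getTicket({ id: "missing" }).then((result) => console.log(JSON.stringify(result)));
```

The agent now receives a readable message it can act on (retry with another id, ask the user) instead of a transport-level failure.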
Development workflow
The workflow we use at AY Automate, in order:
1. Spec the tool surface before writing code. A markdown doc listing every tool name, its inputs, its outputs, and the business operation it maps to. Five tools is usually too many for v1 — start with two, get them right, then expand. We've thrown away entire MCP servers that "worked" because the tool design was wrong.
2. Scaffold with the official SDK. npx @modelcontextprotocol/create-server for TypeScript or uv pip install mcp for Python. Skip third-party scaffolders; they drift from the spec.
3. Local stdio first. Build the server as stdio. It's the fastest dev loop — no ports, no auth, no CORS. Wire it into Claude Code or Claude Desktop via claude_desktop_config.json or .mcp.json:
{
  "mcpServers": {
    "acme-ops": {
      "command": "node",
      "args": ["./dist/server.js"],
      "env": { "ACME_API_TOKEN": "..." }
    }
  }
}
4. Test with mcp-cli and the inspector. The MCP Inspector (npx @modelcontextprotocol/inspector node ./dist/server.js) is a web UI that lets you call every tool, read every resource, and invoke every prompt without going through an agent. Use it to catch schema bugs before they confuse Claude.
5. Real-agent test loop. Once tools work in the inspector, run them in Claude Code with a real task. Watch which tools the agent picks, where it gets confused, and where it loops. Most tool surface bugs only show up under agent pressure.
6. Add transport layer. When stdio works end-to-end, wrap the same handlers in a streamable HTTP server. The SDK makes this a five-line change. Don't rewrite — just swap the transport.
7. Add auth, then deploy. Auth before deploy, every time. We cover that next.
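Step 1 works better when the spec is data you can lint instead of prose you have to re-read. A minimal sketch follows, assuming a hypothetical ToolSpec shape and an illustrative verb list; neither comes from the MCP spec or any SDK.

```typescript
interface ToolSpec {
  name: string;
  description: string;
  // The real business operation this tool maps to.
  businessOperation: string;
}

// Illustrative verb list; extend it to match your own naming rules.
const ALLOWED_VERBS = ["list", "get", "create", "update", "delete", "search", "send"];

// Return a list of problems; an empty list means the spec passes both checks.
function lintToolSpec(spec: ToolSpec): string[] {
  const problems: string[] = [];
  const verb = spec.name.split("_")[0] ?? "";
  if (!ALLOWED_VERBS.includes(verb)) {
    problems.push(`${spec.name}: name should start with a verb`);
  }
  // Splitting on sentence-ending punctuation is a rough "one sentence" check.
  const sentences = spec.description.split(/[.!?]+\s*/).filter((s) => s.length > 0);
  if (sentences.length > 1) {
    problems.push(`${spec.name}: description should be one sentence`);
  }
  return problems;
}

const draftSurface: ToolSpec[] = [
  { name: "list_users", description: "List users in the workspace.", businessOperation: "User directory lookup" },
  { name: "user_management", description: "Manages users. Also handles roles.", businessOperation: "Unclear" },
];

for (const spec of draftSurface) {
  console.log(spec.name, lintToolSpec(spec));
}
```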
For a fuller walkthrough of an end-to-end build with code, see how to build a custom MCP server.
Security checklist
Most MCP servers we audit fail at least three of these. None of them are negotiable for a production server with internet exposure:
- Authentication on the transport. OAuth 2.1 with PKCE is the spec-recommended pattern for hosted MCP servers in 2026. Bearer tokens are acceptable for B2B integrations where you control both ends. Never ship a streamable HTTP MCP server without auth — assume any unauthenticated server will be found and abused within 48 hours.
- Scoped permissions per tool. Not all tools should be callable by all tokens. A read_invoices token should not be able to call delete_customer. Implement scopes at the tool boundary, not just at the transport.
- Confused deputy defense. When your MCP server calls an upstream API on behalf of a user, the upstream call must use the user's identity, not the server's. Otherwise an attacker who gets a low-privilege MCP token can ride your server's higher-privilege upstream credentials.
- Input validation everywhere. Zod (TS) or Pydantic (Python) at every tool boundary. The LLM will send you malformed inputs, sometimes deliberately, because of prompt injection in resources it has read.
- Output sanitization. Tool results flow back into the agent's context. A tool that returns attacker-controlled HTML or markdown can prompt-inject the agent into calling your other tools maliciously. Sanitize URLs, strip control characters, and treat all external data as hostile.
- Secrets in env, not code. Never inline API keys. Use the host's secret manager — Vercel env vars, AWS Secrets Manager, Doppler, 1Password Connect.
- Rate limiting. Per-token, per-tool, per-IP. Agents loop. A single buggy agent can make 10,000 tool calls in an hour.
- Audit logging. Every tool call, who called it, what arguments, what result. Structured JSON logs to a sink you can actually query.
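For the rate-limiting item, a fixed-window counter keyed by token and tool is often enough to stop a looping agent. The sketch below is in-memory and single-process, which is an assumption: a multi-instance deployment would need a shared store. The class name and key format are hypothetical.

```typescript
class FixedWindowLimiter {
  private readonly counts = new Map<string, { windowStart: number; used: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the call is allowed. nowMs is injectable so tests need no timers.
  allow(key: string, nowMs: number = Date.now()): boolean {
    let entry = this.counts.get(key);
    // Start a fresh window on first use or once the previous window has elapsed.
    if (!entry || nowMs - entry.windowStart >= this.windowMs) {
      entry = { windowStart: nowMs, used: 0 };
      this.counts.set(key, entry);
    }
    if (entry.used >= this.limit) return false;
    entry.used += 1;
    return true;
  }
}

// Two calls per minute for each token and tool pair.
const toolLimiter = new FixedWindowLimiter(2, 60_000);
const limiterKey = "token-abc:create_ticket";

console.log(toolLimiter.allow(limiterKey, 0));
console.log(toolLimiter.allow(limiterKey, 1_000));
console.log(toolLimiter.allow(limiterKey, 2_000));
```

When a call is rejected, return an isError result that says when to retry, so the agent backs off instead of hammering the tool.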
Deploying MCP servers
Five deployment patterns we use, in increasing complexity:
Local stdio. Server runs on the user's machine as a subprocess. Zero infra, zero auth, full filesystem access. Perfect for dev tools, personal automation, and anything that touches local resources. Distribute via npm, pip, or a single binary.
Docker. Same stdio or HTTP server, packaged. Useful when the server has heavy native deps (Playwright, ffmpeg, a Python ML stack). Users run docker run ghcr.io/acme/acme-mcp and wire it into their client config.
Cloud Run / Fly / Railway. Streamable HTTP server in a container, scaled by request. Cheap, fast cold starts in 2026, supports long-lived streams. Good default for B2B MCP servers serving 1–50 known clients.
Serverless (AWS Lambda + API Gateway). Works for streamable HTTP if you set up streaming responses correctly. Watch for the 15-minute hard cap on Lambda execution and the 30-second API Gateway default. Reserve this for stateless, short-call MCP servers.
Vercel. Our preferred host for client MCP servers in 2026. Fluid Compute handles streaming responses, environment variables and OIDC auth are first-class, deploys are atomic, and you can colocate the MCP server with the rest of the product. Use the streamable HTTP transport, set maxDuration to 300, and put the handler in a route handler under /api/mcp. Vercel also handles auth in front via middleware cleanly — pair it with Clerk or Auth0 for OAuth flows.
For internal-only MCP servers (HR tools, finance tools, anything with sensitive data), we usually deploy to the client's existing private VPC behind their SSO, not to public hosting.
Real examples
Three production MCP servers worth studying — each illustrates a different pattern.
GitHub MCP (official, by GitHub). The reference example of how to wrap a large existing REST API. ~50 tools covering issues, PRs, code search, actions, releases. Lessons to steal: tool names always start with a verb, descriptions are one sentence each, structured output mirrors the GitHub API JSON shape, and read-only tools are visually separated from mutating ones. Their pagination handling is also worth copying — they return cursor-based continuation tokens in structuredContent so the agent can fetch the next page deterministically.
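Cursor-based paging in general can be sketched in a few lines. This is not GitHub's implementation: the Page shape, the "o:" cursor encoding, and the function names are all hypothetical, and a real server would usually pass through the upstream API's own cursor.

```typescript
interface Page<T> {
  items: T[];
  // Opaque token for the next page, or null when there are no more items.
  nextCursor: string | null;
}

// Encode the offset as a string so callers treat it as a token, not a number.
function encodeCursor(offset: number): string {
  return `o:${offset}`;
}

// A null cursor means "start from the beginning"; anything malformed is rejected.
function decodeCursor(cursor: string | null): number {
  if (cursor === null) return 0;
  const match = /^o:(\d+)$/.exec(cursor);
  if (!match) throw new Error(`Invalid cursor: ${cursor}`);
  return Number(match[1]);
}

function paginate<T>(all: T[], cursor: string | null, limit: number): Page<T> {
  const start = decodeCursor(cursor);
  const items = all.slice(start, start + limit);
  const next = start + limit;
  return { items, nextCursor: next < all.length ? encodeCursor(next) : null };
}

const issueTitles = ["a", "b", "c", "d", "e"];
const firstPage = paginate(issueTitles, null, 2);
const secondPage = paginate(issueTitles, firstPage.nextCursor, 2);

console.log(firstPage, secondPage);
```

Returning the page as structuredContent with a nextCursor field lets the agent request the next page by echoing the token back, without guessing offsets.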
mcp-gsc (Google Search Console). A smaller, vertical MCP server we use heavily at AY Automate for SEO work. ~20 tools, all read-shaped, OAuth to Google in front. Pattern to steal: every tool that returns a list also accepts limit and orderBy parameters, so the agent can shape the response itself instead of post-processing. Saves thousands of tokens per session. Built in Python with FastMCP, deployed as a local stdio server (because the OAuth tokens live on the user's machine, not in the cloud).
Filesystem MCP (official reference). Looks trivial. It isn't. It demonstrates the right way to scope file access (root directories declared at startup, all paths normalized and validated against roots before any I/O), how to return file content as resources rather than tool output (so the agent can cache and re-read without burning tool calls), and how to express common operations as a small, sharp tool set rather than one mega-tool. Read its source before you build anything that touches a filesystem.
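The root-scoping idea can be illustrated as follows. This is our own minimal sketch, not the reference server's code: resolveWithinRoots is a hypothetical name, paths are handled with POSIX rules for determinism, relative paths are resolved against the first root, and symlinks are not resolved, which a real implementation would also need to handle.

```typescript
import * as path from "node:path";

// Resolve a requested path and confirm it stays inside one of the allowed roots.
function resolveWithinRoots(roots: string[], requested: string): string {
  const normalizedRoots = roots.map((root) => path.posix.resolve(root));
  // Assumption of this sketch: relative paths are resolved against the first root.
  const candidate = path.posix.resolve(normalizedRoots[0] ?? "/", requested);

  for (const root of normalizedRoots) {
    const relative = path.posix.relative(root, candidate);
    // Outside the root when the relative path climbs upward with "..".
    const escapes = relative === ".." || relative.startsWith("../");
    if (!escapes) return candidate;
  }
  throw new Error(`Path escapes allowed roots: ${requested}`);
}

const allowedRoots = ["/srv/projects"];
console.log(resolveWithinRoots(allowedRoots, "notes/todo.md"));
```

Comparing relative paths avoids the classic prefix bug where /srv/projects-evil passes a naive startsWith("/srv/projects") check.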
A pattern across all three: small tool surfaces, sharp boundaries, no clever abstractions. Every successful MCP server we've shipped has fewer tools than the team's first instinct.
Common pitfalls
- Too many tools. Twenty tools sounds reasonable until you realize each one costs context every turn. Split into multiple servers and let users enable subsets.
- Tool descriptions that are too long. The model reads them every time. Aim for one sentence each. Put detail in error messages, not descriptions.
- Returning megabytes of data. A tool that returns 50K tokens of JSON destroys the agent's context budget. Page it, summarize it, or return a resource URI instead.
- Long-running tools without progress. A 90-second deploy with no progress notifications looks frozen. The agent gives up, the user retries, and now the deploy is running twice.
- Tool names that aren't verbs. users is a bad tool name. list_users is good. user_management is terrible.
- No structured content. Returning only text forces the calling app to regex the output. Always return both.
- Stdio servers that print to stdout for logging. Stdout is the protocol channel. Log to stderr. Always.
- Shipping without auth on a public server. This will end badly. Always.
- Building one giant MCP server instead of several focused ones. Composition is the point.
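The first two pitfalls are easier to catch if you measure the tool surface before shipping. The sketch below estimates context cost from the serialized tool definitions. The four-characters-per-token ratio is a rough heuristic we are assuming, not a real tokenizer, and the ToolDefinition shape is simplified.

```typescript
interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: object;
}

// Rough heuristic (an assumption, not a tokenizer): about 4 characters per token.
const CHARS_PER_TOKEN = 4;

// Estimate the context cost of one tool from its serialized definition.
function estimateToolTokens(tool: ToolDefinition): number {
  return Math.ceil(JSON.stringify(tool).length / CHARS_PER_TOKEN);
}

// Sum the estimate over every tool the server exposes.
function estimateSurfaceTokens(tools: ToolDefinition[]): number {
  return tools.reduce((sum, tool) => sum + estimateToolTokens(tool), 0);
}

const listUsersTool: ToolDefinition = {
  name: "list_users",
  description: "List users in the workspace.",
  inputSchema: { type: "object", properties: { limit: { type: "number" } } },
};

console.log(estimateSurfaceTokens([listUsersTool]));
```

Run it in CI against the server's tool list and fail the build when the total crosses a budget you choose. For scale, twenty tools at 200 tokens each is already 4,000 tokens.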
How AY Automate builds MCP servers for clients
We've built MCP servers for legal automation pipelines, sales-ops workflows, internal ML platforms, multi-tenant SaaS products, and Claude Code workflow extensions. The pattern is consistent: discovery week to map the tool surface against the real business operations, two-week scoped build with weekly demos, then a hardening pass on auth, observability, and deploy.
What makes our work different is the depth on the Claude side. We are a Claude Code agency by trade: we build the agents that consume MCP servers as much as we build the servers themselves. That means our tool surfaces are designed for how Claude actually behaves under load, not how an MCP server "should" theoretically look. We've watched Claude Code run 200-tool-call sessions against servers we built, and we tune for that.
Stack-wise: TypeScript SDK by default, Python when the tools wrap Python systems, streamable HTTP for hosted, stdio for local, OAuth 2.1 with PKCE for auth, Vercel or the client's VPC for deploy, OpenTelemetry for observability, and a full test suite that runs both inspector-driven assertions and live Claude Code regression tests. Every server we ship comes with a CLAUDE.md describing how the agent should use it.
If you're scoping an MCP server build — internal tooling, a customer-facing extension, or a vertical MCP product — book a consultation and we'll walk through the right architecture in 30 minutes. We work in English, French, and Arabic.
FAQ
What is an MCP server?
An MCP server is a small program that exposes tools, resources, and prompts over the Model Context Protocol, so that LLM clients like Claude, ChatGPT, or Cursor can interact with external systems in a standardized way. Think of it as a typed API designed specifically for AI agents.
How is MCP server development different from building a REST API?
The shape is similar (typed endpoints, structured responses) but the consumer is different. A REST API is consumed by deterministic code; an MCP server is consumed by a probabilistic agent that reads tool descriptions, picks tools, and may misuse them. That changes how you design tool boundaries, write descriptions, validate inputs, and shape outputs.
Do I need TypeScript or can I use Python?
Both SDKs are first-class. Use TypeScript if you're shipping to Vercel, Cloudflare, or a Next.js app. Use Python if your tools wrap Python systems (ML, data, scientific computing). Either is fine; don't agonize over the choice.
How long does it take to build a production MCP server?
A focused server (5–10 tools, one transport, OAuth) takes a strong engineer 1–3 weeks including testing and deploy. A large multi-domain server can take 6–10 weeks. Most of the time goes into tool design and security, not the protocol layer.
Should I use stdio or streamable HTTP?
Stdio for local tools the user runs on their own machine — IDE integrations, personal automation, anything touching the local filesystem. Streamable HTTP for hosted servers used by multiple users or accessed over the internet. Don't use SSE in 2026 unless you must support a legacy client.
Is MCP only for Claude?
No. The protocol is open and adopted by ChatGPT (via custom connectors), Cursor, Windsurf, Cline, Continue, Zed, and many others. A well-built MCP server works across all of them.
Where can I find existing MCP servers to learn from?
The official modelcontextprotocol/servers repo on GitHub, the Anthropic-curated registry, and our list of the best Claude Code MCP servers. Read the source of GitHub MCP, the Filesystem reference server, and mcp-gsc before writing your own.
Should we build internally or hire an agency?
If your team already ships TypeScript or Python services and understands LLM agent behavior, build internally — the protocol isn't hard. If you don't yet have agent engineering experience on the team, or you need the server in production in under a month, working with a Claude Code agency shortcuts six months of learning. We've made the expensive mistakes already.

Adel keeps the engine running at AY Automate. He owns internal processes, team coordination, and the operational excellence that lets us ship fast for clients.
