Richard Oliver Bray

It's recently become 'cool' to say MCP servers suck, or that CLIs with skills are better than MCP servers. A lot of tools, from Playwright to Google Workspace and even Chrome DevTools, are adding CLIs alongside their existing MCP tools, and for good reason.

Tools from MCP servers use a lot of context, since all their metadata is loaded into the agent harness before you write your first prompt. You also can't replicate the result of a tool call by running it outside the agent, and being able to do that is useful for debugging.

MCP Server?

Just in case you're unaware or need a refresher on what MCP servers are, this section is for you. MCP (Model Context Protocol) is a standard that allows AI models to securely and easily connect to local or external data sources.

MCP servers take things a step further. They're pieces of software that implement the protocol by exposing any of the following:

tools: executable functions or actions,
prompts: specialised workflows,
and/or resources: structured data or files.

In my experience these servers mostly expose tools, so a weather MCP server will let an agent call a "getWeather" tool, which fetches the current weather for a specific location.
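To make that concrete, here is a minimal sketch of what a "getWeather" tool boils down to: a block of metadata the agent reads, plus a function the server runs. The `name`, `description` and `inputSchema` fields follow the general shape of an MCP tool listing, but the values, the `FAKE_READINGS` data and the `get_weather` handler are all made up for illustration, and no real MCP SDK is involved.

```python
import json

# Metadata the harness loads into context for one tool. The field names
# follow the general MCP tool-listing shape; the values are illustrative.
GET_WEATHER_TOOL = {
    "name": "getWeather",
    "description": "Get the current weather for a specific location.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City name, e.g. London"},
        },
        "required": ["location"],
    },
}

# Hypothetical canned data standing in for a real weather API call.
FAKE_READINGS = {"London": {"temperature_c": 14, "conditions": "cloudy"}}


def get_weather(location: str) -> dict:
    """Handler the server would run when the agent calls getWeather."""
    reading = FAKE_READINGS.get(location)
    if reading is None:
        return {"error": f"no data for {location}"}
    return {"location": location, **reading}


if __name__ == "__main__":
    print(json.dumps(GET_WEATHER_TOOL, indent=2))
    print(get_weather("London"))
```

The important part for this post is the metadata dictionary: that is the text that sits in the agent's context whether or not the tool is ever called.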

The Context Cost of MCP Tools

Now, I'm not knocking MCP in general. I think it's a great standard that was introduced to 'connect AI assistants to the systems where data lives'. However, AI coding assistants installed on a user's machine already have access to tools like Bash and WebSearch, which in most cases are more performant and use fewer tokens than running tools from an MCP server.

For example, let's go back to the weather server (which I know is overused for explaining MCP). To install it, you first have to connect it to your agent harness (Claude Code, Codex CLI, OpenCode, etc.), which then runs the server whenever the harness starts. The server typically has multiple tools, not just one. It could have one tool to get the latest weather details in Celsius and Fahrenheit, one for humidity, and one for a seven-day forecast, and all of these tools take up tokens. In some cases, a single server can use up to 8K tokens just by being installed, which is about 4% of a 200K-token context window, the size I'd expect from a typical state-of-the-art model like Opus 4.6. I will talk more about context windows later.
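If you want to sanity-check numbers like these for your own setup, the arithmetic is easy to sketch. The snippet below is a rough estimate only: it assumes about four characters per token (a crude heuristic, not a real tokenizer), a 200,000-token window (swap in your model's real limit), and three hypothetical tool definitions that a weather server might expose.

```python
import json

CHARS_PER_TOKEN = 4  # crude heuristic, not a real tokenizer


def estimate_tokens(tool_definitions: list[dict]) -> int:
    """Approximate tokens used by serialising tool metadata as JSON."""
    serialised = json.dumps(tool_definitions)
    return len(serialised) // CHARS_PER_TOKEN


def context_share(tokens: int, window: int) -> float:
    """Percentage of the context window taken up by `tokens`."""
    return 100 * tokens / window


# Hypothetical tools a weather server might expose.
WEATHER_TOOLS = [
    {"name": "getWeather", "description": "Current weather in Celsius and Fahrenheit."},
    {"name": "getHumidity", "description": "Current humidity for a location."},
    {"name": "getForecast", "description": "Seven day forecast for a location."},
]

if __name__ == "__main__":
    print(estimate_tokens(WEATHER_TOOLS), "tokens (rough)")
    print(context_share(8_000, 200_000), "% of an assumed 200K window")
```

Real tool definitions carry full input schemas and longer descriptions, so they will be considerably larger than this toy list. The point is only that the cost is paid up front, per tool, on every session.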

Also bear in mind that this server is usually written in Python or TypeScript and run via Node.js, Docker or an equivalent, all of which take up computer resources.

CLIs Do the Same Job For Less

Now, with a CLI, or with the model's built-in tools (Bash, WebSearch), you could ask the model to use the Bash tool to curl a URL that returns the weather for a location you provide. No unnecessary tools or context are used. I'm not sure whether a dedicated CLI exists for getting up-to-date weather information, but say one does and you download it: CLIs are usually lightweight and run processes on demand, so they use fewer computer resources. I completely understand that you have to be relatively technical to pull this off, and that the average person wouldn't feel comfortable using the terminal, let alone be aware of the default tools an agent harness provides.
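As a rough illustration of the curl approach, here is a small sketch that builds the command an agent could run through its Bash tool. The endpoint `https://weather.example.com` is a placeholder I've invented, not a real service, and the helper names are hypothetical; substitute whatever weather API you actually use.

```python
import subprocess
from urllib.parse import quote

# Hypothetical endpoint: swap in whichever weather service you actually use.
WEATHER_URL = "https://weather.example.com/{location}"


def build_curl_command(location: str) -> list[str]:
    """Build the curl invocation an agent could run through its Bash tool."""
    url = WEATHER_URL.format(location=quote(location, safe=""))
    # -f: fail on HTTP errors, -sS: stay quiet but still show errors
    return ["curl", "-fsS", url]


def fetch_weather(location: str) -> str:
    """Run the command; the same line also works pasted into a terminal."""
    result = subprocess.run(
        build_curl_command(location), capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    print(" ".join(build_curl_command("New York")))
```

Because the whole thing is one shell command, you can run it yourself outside the agent and compare the output, which is exactly the kind of debugging that is awkward with an MCP tool call.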

Where MCP Servers Shine

This is why I think MCP servers are great. For those using things like Claude Cowork, or building a consumer-facing agent with Mastra or similar frameworks, they are the ideal standard: a plug-and-play approach to connecting an agent to an external service. However, if you care about using as little context as possible and keeping a lean setup on your machine, and you're using Claude Code or an equivalent via the terminal, I would strongly urge you to switch to CLIs instead of MCP servers.

Context Rot and the Model Dumb Zone

My original aversion to MCP servers was predominantly due to the fact that I would have to download and run servers via npm/npx to do things I could do with a script, but it later grew because of context rot, or what people call the model dumb zone. Once a model goes beyond a certain percentage of its context window, usually 80% for Anthropic models, it starts to produce 'dumb' results: it hallucinates more, becomes more sycophantic and makes more coding errors. This happens even with models that have a 1M token context window.

What's more, Claude Code auto-compacts a session if more than 80% of the context window is used up. That is annoying if you have a 'golden session', where the information is perfect and the agent is giving amazing answers, and you have to give it all up because you ran out of context. I'm not the biggest fan of compaction, but that's a topic for another article. So if I can do things to use as little context as possible, I do, and since MCP tools tend to take up the most context, I use as few of them as possible.
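To put the trade-off into numbers, here is a small back-of-the-envelope sketch. The figures are assumptions for illustration only: a 200K-token window, the 80% threshold mentioned above, and three servers costing 8K tokens each.

```python
def usable_tokens(window: int, threshold_percent: int, overhead: int) -> int:
    """Tokens left for real work before the threshold is reached."""
    limit = window * threshold_percent // 100
    return max(limit - overhead, 0)


if __name__ == "__main__":
    # Assumed figures: 200K window, 80% threshold, three servers at 8K each.
    print(usable_tokens(200_000, 80, 0))          # no MCP servers: 160000
    print(usable_tokens(200_000, 80, 3 * 8_000))  # three MCP servers: 136000
```

Under those assumptions, three installed servers cost 24K tokens of working room in every single session, before any tool has been called.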

Now, I am fully aware of Anthropic's tool search tool, which (I believe) uses progressive disclosure to find the right tool for the job without clogging up context, and I do use it. But to my knowledge, it's not a common way of discovering MCP tools in agent harnesses other than Claude Code or Cursor. This is why I still advocate using zero or as few MCP servers as possible for coding assistants, disabling them in sessions where you know you don't need them, or even restricting them to certain projects.
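For anyone unfamiliar with the idea, progressive disclosure can be sketched in a few lines: keep only short summaries around, search those, and load the full tool definitions for the matches alone. This is a toy illustration of the general technique with a made-up catalogue and naive word matching. It is not how Anthropic's tool search is actually implemented.

```python
# Hypothetical catalogue: short summaries are cheap, full schemas are not.
CATALOGUE = {
    "getWeather": {
        "summary": "current weather for a location",
        "schema": {"type": "object", "properties": {"location": {"type": "string"}}},
    },
    "getForecast": {
        "summary": "seven day weather forecast",
        "schema": {
            "type": "object",
            "properties": {"location": {"type": "string"}, "days": {"type": "integer"}},
        },
    },
    "listEmails": {
        "summary": "list recent emails in an inbox",
        "schema": {"type": "object", "properties": {"limit": {"type": "integer"}}},
    },
}


def search_tools(query: str) -> list[str]:
    """Return names of tools whose name or summary shares a word with the query."""
    words = set(query.lower().split())
    matches = []
    for name, entry in CATALOGUE.items():
        haystack = set(entry["summary"].lower().split()) | {name.lower()}
        if words & haystack:
            matches.append(name)
    return matches


def load_schemas(names: list[str]) -> dict:
    """Only now pull the full definitions for the matched tools."""
    return {name: CATALOGUE[name]["schema"] for name in names}


if __name__ == "__main__":
    found = search_tools("weather forecast")
    print(found)
    print(load_schemas(found))
```

A query about weather never pulls the email tool's schema into context, which is the whole appeal.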

Skills: The Best of Both Worlds

But there is one huge problem with my CLI example above: you, the human, have to tell the agent which tool or CLI command to use. With tools from an MCP server, the agent has these loaded into context ahead of time, so it can figure out which tool to use based on your prompt alone. This is where Skills come in. These are markdown files that give your agent instructions on when and how to use certain CLI commands. The model then knows when to use the right command based on your prompt, although you can still explicitly tell it to use a tool if you want to be certain it will be used.
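Here is a hedged sketch of what such a skill file might contain, generated from Python so it stays in one language with the other examples. It assumes the Claude Code convention of a `SKILL.md` file with `name` and `description` frontmatter; other harnesses may expect a different layout. The weather URL is a placeholder and the helper names are hypothetical.

```python
from pathlib import Path


def render_skill(name: str, description: str, instructions: str) -> str:
    """Build the text of a SKILL.md file: YAML frontmatter plus a markdown body."""
    return (
        "---\n"
        f"name: {name}\n"
        f"description: {description}\n"
        "---\n\n"
        f"{instructions.strip()}\n"
    )


WEATHER_SKILL = render_skill(
    name="weather",
    description="Fetch current weather with curl. Use when the user asks about weather.",
    instructions="""
# Weather

Run `curl -fsS https://weather.example.com/<location>` in the terminal
and summarise the response. The URL is a placeholder.
""",
)


def write_skill(root: Path) -> Path:
    """Write the file to <root>/weather/SKILL.md and return its path."""
    path = root / "weather" / "SKILL.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(WEATHER_SKILL, encoding="utf-8")
    return path


if __name__ == "__main__":
    print(WEATHER_SKILL)
```

The description line does the job that tool metadata does for an MCP server: it is the short hint the model uses to decide when the skill applies, while the body is only read once the skill is actually needed.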

Just like calling an MCP tool, getting the agent to use the right skill can sometimes be hit and miss.

Nevertheless, I hope you found this useful. If you have any questions or want anything clarified, please feel free to reach out to me on X. This is a somewhat brief post and I didn't go into as much detail as I could have.

Until next time, happy coding 👋