Easy AI Chat Integrations Are a Trap

The Death of the “Junior Developer” is Overrated

A lot has been made of how the advent of AI-driven low-code tools and vibe coding means companies no longer need to hire “junior developers”. But in at least one way, AI makes junior developers’ jobs much easier and higher-leverage. To understand this, look no further than the AI “chat completions” API.

With chat completions available as a RESTful web service, and with LLM-based AI models equipped with a wide range of capabilities via Model Context Protocol (MCP) servers that wrap other APIs, all you have to do to access almost any functionality you want is call this single API with a properly crafted prompt.

Chat Completions for AI Mashups

Back in the old days, if you wanted to, say, get the current weather in a given city, first you’d have to research what open APIs were available. Then you’d have to read the documentation and sample code to figure out how to call one. Maybe the API takes a zip code as input, so then you’d need to find a second service to map a city name to a zip code. In a real production scenario, you’d also want to verify what kind of SLAs you could expect from these two APIs. You’d experiment with them with Postman or curl. And finally, you’d write the code to integrate the API-based workflow into your application.
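The two-step workflow above can be sketched in a few lines. The endpoint URLs and response shapes here are hypothetical placeholders for whichever geocoding and weather services you actually choose, and the fetcher is injectable so the composition logic can be unit tested against canned responses:

```python
# A minimal sketch of the old-school "mashup" workflow: city -> zip -> weather.
# GEO_URL and WEATHER_URL are hypothetical stand-ins, not real services.
import json
from urllib.request import urlopen

GEO_URL = "https://geo.example.com/zip?city={city}"        # hypothetical
WEATHER_URL = "https://weather.example.com/now?zip={zip}"  # hypothetical

def fetch_json(url):
    """Default fetcher; swap in a stub for tests."""
    with urlopen(url) as resp:
        return json.load(resp)

def current_weather(city, fetch=fetch_json):
    """Compose the two calls: map the city to a zip code, then look up weather."""
    zip_code = fetch(GEO_URL.format(city=city))["zip"]
    return fetch(WEATHER_URL.format(zip=zip_code))["temp_f"]
```

Because the fetcher is a parameter, the integration logic is deterministic and testable, which is precisely the property that gets harder to preserve once a prompt sits in the middle.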

If you recall the original meaning behind “Web 2.0”, you’ll recognize that we used to call this a “mashup”. Those mashups took real engineering work. Today, AI makes it look almost effortless.

Effortless Integration is a Seductive Illusion

But thanks to AI as a service, these days you can simply create a “prompt”, which in this case is just a glorified question like “What’s the current temperature in Denver?”, POST it to a chat completions API, and get a plausible result back. This is a very simple example of a broad class of “agentic AI” systems.
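That POST really is about this simple. Here is a sketch in the common OpenAI-style request shape; the endpoint URL and model name are made-up examples, and you’d substitute whatever provider, model, and key you actually use:

```python
# A sketch of the chat-completions call described above. The endpoint and
# model name are hypothetical; real providers differ in details.
import json
from urllib.request import Request

API_URL = "https://api.example.com/v1/chat/completions"  # provider-specific

def build_chat_request(prompt, model="some-model", api_key="YOUR_KEY"):
    """Package a plain-English question as a chat completions POST request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

req = build_chat_request("What's the current temperature in Denver?")
```

One HTTP call, one glorified question, and no weather API in sight from the caller’s perspective.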

The AI does the mashup for you. What’s likely happening behind the scenes is that somebody has taken the city zip code and weather APIs and created Model Context Protocol servers for them. (Don’t get confused by the overly fancy terminology: “Model Context Protocol server” is just a fancy way of saying “API wrapper that I tell an LLM about”.)
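To make the “API wrapper that I tell an LLM about” point concrete, here is a hypothetical sketch using the common function-calling convention: a tool is just a JSON-schema description of an API call plus a function that performs it. A real MCP server exposes the same kind of information over the MCP protocol rather than as a plain dict, but the idea is the same:

```python
# Sketch: an LLM "tool" is a machine-readable description of an API call
# plus a thin dispatch wrapper. All names here are illustrative.

def get_weather(zip_code: str) -> dict:
    """The wrapped API call (stubbed here; a real one would hit a weather API)."""
    return {"zip": zip_code, "temp_f": 68}

# The schema the LLM is shown so it knows the tool exists and how to call it.
WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Get current weather conditions for a US zip code.",
    "input_schema": {
        "type": "object",
        "properties": {"zip_code": {"type": "string"}},
        "required": ["zip_code"],
    },
}

# When the model decides to use the tool, the host just dispatches the call.
TOOLS = {"get_weather": get_weather}

def dispatch(tool_name, arguments):
    return TOOLS[tool_name](**arguments)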

Any “full stack” developer straight out of bootcamp can pull off this chat completions trick. This workflow is the literal business model of tons of “AI-based developer tools” and “vibe coding” startups. It’s the general idea behind the hypothetical “single-person billion-dollar startup”.

And it’s a potentially very bad idea.

AI Integration Is Great, But Have You Tried Actual Software Engineering?

What could possibly go wrong with relying on AI chat completions and MCP servers as a mashup replacement? Let’s count the ways.

It’s Slow

Typical REST API calls take on the order of milliseconds. Even the simplest AI chat calls take on the order of seconds. If you’re a consumer, you’re seeing all kinds of UI tricks to try to cover this up. But if you are the type who runs performance tests, the data is impossible to argue with. And this problem isn’t going away anytime soon. If you go back and read the last section, it should be easy to see that an AI-based service integration can never be as fast as a bespoke workflow, and it’s not even close.
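Some illustrative back-of-the-envelope numbers (assumed for illustration, not measurements) make the gap concrete:

```python
# Assumed, illustrative latencies: a cached REST lookup vs. an LLM-mediated
# call for the same answer. These are not benchmarks.
rest_latency_s = 0.05  # tens of milliseconds for a direct lookup
llm_latency_s = 2.0    # a couple of seconds for a chat completion round trip

slowdown = llm_latency_s / rest_latency_s  # 40x slower at these numbers
```

And remember that the LLM path still has to make the underlying API call somewhere inside it, so its latency is a strict superset of the direct call’s.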

It’s Unreliable

If you follow this space, you have probably read some of my discussions about integrating AI into production use cases and in particular how hallucinations and unreliability are features, not bugs, of LLMs. If 99.9% reliability is “good enough” for you, go for it, I guess? But if you’re planning on building a system or business that needs to be bulletproof at scale, relying on AI for all of your service integrations is maybe not the best idea.

It’s Expensive

When you inject an AI chat completions provider in between you and the APIs you need to call, you’ve just added another vendor to your cloud bill. And if you do the math, chat completions calls are far more expensive than a “typical” SaaS API call. Not only that, but I’ll remind you of the AI industry’s dirty little secret: right now, a lot of providers are hiding their true costs from you. When (not if) they eventually raise their prices after you’re locked in, you may come to regret some of your architectural decisions.
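“Do the math” with some assumed (not quoted) prices and the difference is stark even before any price hikes:

```python
# Illustrative cost comparison at one million calls per month.
# All prices are assumptions for the sake of the arithmetic.
calls_per_month = 1_000_000

# Direct API: a flat-rate weather service at an assumed $0.0001 per call.
rest_cost = calls_per_month * 0.0001  # $100/month

# LLM-mediated: assume ~500 input + 100 output tokens per call, at assumed
# rates of $3 per 1M input tokens and $15 per 1M output tokens. You still
# pay for the underlying weather API call, too.
llm_cost = calls_per_month * (500 * 3 / 1e6 + 100 * 15 / 1e6) + rest_cost
```

At these assumed numbers the LLM-mediated path runs about $3,100 per month against $100 for the direct integration, roughly a 30x markup for the same answer.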

It’s Opaque and Unmaintainable

When everything is mediated through a prompt, observability and debugging go out the window. You can’t unit test a sentence. You can’t easily version-control a “prompt” that’s been tweaked twenty times in production. When the output changes because the model weights changed upstream, you have no way to know why. Traditional software engineering disciplines like testing, reproducibility, and metrics don’t apply nearly as neatly to LLM interactions, which means your integration layer becomes a black box with a friendly smiley face.

It’s Suboptimal

Even when the LLM produces a “correct” result, the way it uses third-party APIs is often far from optimal. The model might choose to make multiple calls where a single call would suffice, request far more data than your application actually needs, use a less efficient endpoint that does the same thing, or otherwise follow a sequence of steps that adds latency and complexity. What works for a generic prompt may not be the most efficient or maintainable approach for your specific application use case. In short, relying on AI to orchestrate APIs can achieve “correctness” without achieving performance, cost, or clarity.

It’s Environmentally Unfriendly 

If I ask a traditional REST API service for the weather in a certain zip code, it does a simple database or cache lookup and returns the result. If I ask an LLM the same question, it fires up a bunch of GPU-based machinery to parse the question, delegate the call to another service, and unpack and formulate a response. (That’s also why it’s slow, by the way.) To pull this trick off at the scale that would be needed to support a large percentage of the business applications in the world requires a MASSIVE investment in datacenters, the energy to power them, and the water to cool them.

Use LLMs and Chat Completions The Smart Way

The best use of LLMs is as glue between people and systems, not between systems themselves. Let humans own the logic and structure, and let AI assist in the translation, summarization, and human interaction layers. Use it where flexibility is a feature, not where precision is a requirement.

That doesn’t mean “AI agents” are a dead end. But even there, the best agentic systems I’ve seen tend to act as human collaborators, not as fully autonomous middleware. The agent that interprets an email, drafts a response, and surfaces a few next actions is doing human-aligned orchestration. The agent that tries to replace your backend workflow engine by “deciding” which APIs to call and in what order is just vibe coding with a fancier name.

Don’t Let AI Replace Engineering Discipline

There are a lot of deeper sociological and philosophical questions lurking here, such as whether we should be so excited about releasing more carbon into the atmosphere so that we can lay off more people by building overall-worse-performing systems. But just focusing on software engineering, let’s not primarily use AI as a way to avoid doing real architecture, design, and implementation work.

Use it in places where it allows us to do things we couldn’t do before, and places where it provides a unique user experience advantage. The discipline, planning, and craftsmanship of engineering still matter, probably more than ever.

The Real Work Still Matters

AI won’t replace engineers. It will expose which teams and organizations understand real engineering discipline and which don’t. I help teams build AI workflows that are powerful, reliable, and maintainable, leveraging LLMs where they truly add value and without turning your stack into a black box.
