Don’t Blame the Tools for Acting Like Tools

Recently, a story made the rounds under the headline: “Replit’s AI Agent Wipes Company’s Codebase During Vibecoding Session.” It got a lot of attention and even an apology from Replit’s CEO. If you've read my earlier post, “Vibe Coding Is Not Software Engineering — And That Should Worry You,” you already know how I feel about this kind of thing. When experimentation blurs into production code and operations with no guardrails, accidents are inevitable.

I don’t want to slight the apology, as it was probably the right move for a consumer-facing platform. But on a technical level, it feels a bit like a GPS company apologizing after someone follows directions straight into a lake. Or like an automaker apologizing after someone drives their car into a tree. These tools don’t act on their own.

If you hand an LLM-based agent the ability to mutate production systems, and it does exactly that — even destructively — the fault isn’t in the tool. It’s in the people and processes that gave it access without sufficient oversight.

LLMs Are Not Engineers

It’s impossible to overstate that point.

To be clear, these models are immensely useful. I enjoy using them for various tasks and find that yes, indeed, they make me more productive. But LLM chatbots are not deterministic software components. They aren’t making “decisions” in the way most people think. They’re not writing programs with a sense of logic and consequences. What they’re doing is next-token prediction. They’re generating the most statistically likely string of output given the input and prior context.

That’s it. That’s the whole trick.

You can layer reasoning steps, tool calls, memory, RAG, and agents. You can add retries and post-processing and vector lookups and recursive critiques. But underneath it all, what you’re working with is still a stochastic machine. It generates what it thinks is probably right, not what is truly right in the sense you or I would understand.
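To make that concrete, here’s a toy sketch of what “statistically likely” means in practice. The tokens and probabilities are entirely made up; real models work over huge vocabularies and learned distributions, but the core move is the same: sample from a distribution, with no notion of consequences. A rare-but-destructive completion is never impossible, just improbable.

```python
import random

# Toy next-token predictor: given the context so far, return a probability
# distribution over candidate next tokens. The tokens and numbers here are
# invented purely for illustration.
def next_token_distribution(context: str) -> dict[str, float]:
    return {
        "DROP": 0.01,    # unlikely, but never zero
        "SELECT": 0.70,
        "UPDATE": 0.29,
    }

def sample_next_token(context: str) -> str:
    dist = next_token_distribution(context)
    tokens, weights = zip(*dist.items())
    # The output is sampled, not chosen by any notion of "correctness".
    return random.choices(tokens, weights=weights, k=1)[0]

# Run it enough times and the unlikely-but-destructive token shows up anyway.
samples = [sample_next_token("Generate SQL for cleanup task:") for _ in range(10_000)]
print(samples.count("DROP"), "destructive completions out of 10,000")
```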

Hype, Misunderstanding, and Wishful Thinking

What continues to surprise me is how many otherwise smart, experienced technical people (folks who should know better) either haven’t yet fully grasped how these systems work, or allow themselves to be fooled by how anthropomorphic the output sounds.

Models say things like “Let me look that up for you” and “Here’s what I found,” but the anthropomorphism goes much deeper. They hedge opinions with phrases like “I might be wrong, but…” or use confident tones like “In my experience…”. That kind of language suggests agency, self-awareness, or judgment where none exists. The structure of their responses mirrors how a thoughtful human might reason through a problem, even though it's just next-token prediction all the way down.

Worse, I suspect in some corners of the industry, people do understand how it works and still gloss over the limitations. The AI hype train doesn’t run on Nvidia GPUs alone. It also runs on FOMO, VC slides, and (yes) vibes. “It writes code better than the strawman junior engineer” sounds better than “It statistically imitates the structure of code that solves similar-looking problems it found on GitHub and Stack Overflow.” But that difference really matters when you give the model access to your database or cloud bill.

Even 99.999% Reliability Is Not Enough

The vibe-coding-destroys-my-prod-database incident is a perfect snapshot of where things stand today. LLM agents often appear to work well. Maybe 99 times out of 100. Maybe 99,999 times out of 100,000. But as Werner Vogels famously said: “Everything fails, all the time.” And when an agent runs with production permissions, even rare failures can be catastrophic.
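To put numbers on that, here’s some back-of-the-envelope arithmetic with assumed figures, purely for illustration: even at five-nines reliability per action, a busy agent fails on a timescale that matters.

```python
# Assumed numbers: an agent that is "right" 99.999% of the time,
# issuing 5,000 autonomous actions per day.
per_action_failure_rate = 1 - 0.99999      # 0.00001
actions_per_day = 5_000

expected_failures_per_day = actions_per_day * per_action_failure_rate
days_between_failures = 1 / expected_failures_per_day

print(f"Expected failures per day: {expected_failures_per_day:.2f}")  # 0.05
print(f"Roughly one failure every {days_between_failures:.0f} days")  # ~20
# If that one failure is a dropped production database, "five nines"
# per action is cold comfort.
```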

We’ve all learned (or should have learned?) not to run untrusted scripts in CI, not to give new interns root on prod, and not to deploy without review gates, approvals, and guardrails. An LLM agent with broad permissions is precisely the kind of risk we’ve been trained to avoid. It’s an unpredictable actor (agent?) with perfect confidence, capable of generating destructive commands while sounding utterly reasonable, and, as in the Replit incident, perhaps even trying to “cover it up” afterwards.

Deploying LLM-powered automation in critical systems isn’t a “set it and forget it” kind of problem.

Use the Tools, But Don't Give Up Control

This isn’t a call to avoid LLMs or agents. Far from it. They’re game-changing tools for many DevOps workflows. But they’re tools. They need supervision, boundaries, and clear lines of responsibility.

If you’re building systems that act on LLM output to modify infrastructure, spend money, or operate on behalf of users, you need a control layer that matches the risk. That means planning, approvals, observability, and rollback paths. Not just blind trust.
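What that control layer looks like will vary by stack, but here’s a minimal, hypothetical sketch of the shape I mean. None of these names come from a real framework; the point is the pattern: the agent proposes, the system logs, a human approves, and execution defaults to a dry run.

```python
import subprocess
from dataclasses import dataclass

# Hypothetical control layer between an LLM agent and production.
# All names are illustrative, not a real framework's API.

@dataclass
class ProposedAction:
    description: str    # the agent's human-readable plan
    command: list[str]  # the concrete command it wants to run
    destructive: bool   # does it mutate or delete state?

def human_approved(action: ProposedAction) -> bool:
    """Blocking approval gate; in practice a ticket, ChatOps prompt, or review."""
    answer = input(f"Approve '{action.description}'? [y/N] ")
    return answer.strip().lower() == "y"

def execute_with_guardrails(action: ProposedAction, dry_run: bool = True) -> None:
    # 1. Observability: always record what the agent intends to do.
    print(f"PLAN: {action.description} -> {action.command}")

    # 2. Approval: destructive changes never run without a human in the loop.
    if action.destructive and not human_approved(action):
        raise PermissionError("Blocked: destructive action was not approved")

    # 3. Containment: default to a dry run; real execution (and the snapshot/
    #    rollback step that should precede it) is an explicit opt-in.
    if dry_run:
        print("DRY RUN: not executing")
        return
    subprocess.run(action.command, check=True)

# Example: the agent proposes something scary; nothing happens without sign-off.
execute_with_guardrails(
    ProposedAction("Drop the staging database", ["echo", "dropdb staging"], destructive=True)
)
```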

LLMs will help us move faster, now and in the future. But turning unsupervised AI agents loose in critical systems with broad permissions isn’t progress; it’s a total loss of control.

And when the tool does exactly what it was built to do — namely, generate output — don’t blame it for being a tool. Blame the flawed human processes that allowed it to proceed without safeguards.

If You're Adopting These Tools, Don't Go It Alone

When adopted deliberately, the current class of LLM-powered coding agents and tools can absolutely create real value. Deliberate adoption means understanding how they work, where they fail, and how to contain the blast radius when they do.

If your team is exploring ways to integrate LLMs, agents, or tool-assisted workflows into production systems and wants to do it with eyes open, I can help. I bring deep experience in software architecture, production safety, and AI tooling, and I’ve spent the past few years working with these systems.

Let’s talk before your assistant drops your prod database.
