The Cleanup Crew: Why Every Tool That Promises to Kill Engineers Creates More Work for Engineers
Every decade, a new technology promises to eliminate the need for experienced software engineers. Every decade, the same senior engineers get hired back at premium rates to fix everything. AI coding tools are not breaking the cycle. They're accelerating it.

You’ve gotten the same phone call for thirty years.

Different company. Different decade. Different tool that was supposed to change everything. The conversation is always the same. “We built this thing. It was fast, it was cheap, the demos looked great. Now it’s on fire and we can’t figure out why. Can you come fix it and build it right?”

I’ve taken that call after COBOL migrations, after Visual Basic meltdowns, after Flash graveyards, after offshore rewrites, after no-code catastrophes. The tool changes. The phone call doesn’t. And right now, in 2026, the phones are starting to ring again.

This time it’s AI-generated code. Vibe coding. Autonomous agents writing entire applications from natural language prompts. The promise is the same one I’ve heard five times before: experienced engineers are expensive and unnecessary. The tool will replace them.

It won’t. It never does. And the pattern is so consistent, so structurally identical across more than four decades of software, that you can predict what 2027 looks like right now. I’m going to walk you through why.

The five cycles before this one

Cycle 1: “Application Development Without Programmers”

In 1981, James Martin published a book with that title. It wasn’t a hot take. It was a serious technical argument backed by real tools. Fourth-generation languages like FOCUS, NOMAD, and RAMIS promised that business users would write their own applications. No programmers needed. The database would do the thinking.

SQL actually succeeded at part of this vision. It genuinely democratized data queries. And you know what happened? It increased demand for engineers. Every business user who could suddenly ask questions of a database created new problem domains: data modeling, query optimization, schema design, access control, performance tuning. The tool that was supposed to eliminate programmers created entire new categories of programming work.

Meanwhile, COBOL kept running. It’s still running. Micro Focus estimated 220 billion lines of COBOL in active use; IBM claims COBOL-based systems still underpin the majority of ATM and financial transaction processing at the world’s largest banks. Nobody has independently verified these numbers, but nobody disputes the conclusion: COBOL is everywhere that money moves. COBOL engineers are among the most expensive specialists on the planet because nobody trains them anymore and the systems can’t be turned off. The average COBOL programmer is over 60 years old. The language that was supposed to let business people write software became the most engineer-dependent code in existence.

Cycle 2: Citizen developers and the VB6 time bomb

The 1990s gave us PowerBuilder, Microsoft Access, and Visual Basic. “Rapid Application Development” was the buzzword. Business analysts became developers overnight. They built internal tools, department databases, reporting systems. No architecture reviews. No version control. No security audits. Nobody needed those things because the tools were so easy.

Thirty years later, the cleanup is still happening. VB6 migration costs run $250,000 to $600,000 per 100,000 lines of code, and 70 to 80 percent of that cost is labor. Senior engineers, specifically. Not because the translation is hard. Because someone has to understand what the original code was trying to do, decide what’s still needed, and rebuild it on a foundation that won’t need rebuilding again in another decade.
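To make those numbers concrete, here is a back-of-the-envelope sketch that applies the per-100,000-line range and the labor share quoted above to a codebase of a given size. The function name is hypothetical, and the linear scaling is my simplifying assumption; real migrations rarely scale that cleanly.

```python
# Rough VB6 migration estimate using the ranges quoted in the text.
# Assumption: cost scales linearly with line count, which is a simplification.

COST_PER_100K_LINES = (250_000, 600_000)  # dollars, low and high
LABOR_SHARE_PERCENT = (70, 80)            # share of total cost that is labor


def vb6_migration_estimate(lines_of_code: int) -> dict:
    """Return (low, high) dollar ranges for total cost and for labor alone."""
    # Integer arithmetic keeps the dollar figures exact.
    total_low = COST_PER_100K_LINES[0] * lines_of_code // 100_000
    total_high = COST_PER_100K_LINES[1] * lines_of_code // 100_000
    labor_low = total_low * LABOR_SHARE_PERCENT[0] // 100
    labor_high = total_high * LABOR_SHARE_PERCENT[1] // 100
    return {"total": (total_low, total_high), "labor": (labor_low, labor_high)}


estimate = vb6_migration_estimate(400_000)
print(estimate["total"])  # (1000000, 2400000)
print(estimate["labor"])  # (700000, 1920000)
```

Under these assumptions, a 400,000-line application lands somewhere between $1 million and $2.4 million, with $700,000 to $1.92 million of that going to labor.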

Banking, manufacturing, healthcare, and government are still running VB6 applications built by people who left the company twenty years ago. The people who built those apps are retired or dead. The apps are still processing payroll.

Cycle 3: The visual tools era and the lost decade

Dreamweaver and Flash were two sides of the same coin. Dreamweaver turned designers into “web developers” for static sites. Flash turned them into “interactive developers” for everything else. Both produced output that looked like a website without being one.

Dreamweaver generated bloated, non-semantic HTML that couldn’t handle CSS properly, didn’t validate against web standards, and was completely inaccessible. Flash built entire digital experiences on a proprietary plugin that couldn’t be indexed by search engines, couldn’t be used by screen readers, and had a security vulnerability disclosure rate that made IT departments physically ill.

The designers using these tools weren’t incompetent. They were producing facades. The tools generated surfaces that looked correct in a browser window and were structurally rotten underneath. No architecture. No standards compliance. No accessibility. No path forward when the platform changed.

And the platform always changes. When the web moved to dynamic content, mobile devices, and accessibility requirements, every Dreamweaver site had to be rebuilt. When Steve Jobs publicly ruled out Flash on the iPhone in 2010 and Adobe ended support entirely at the end of 2020, entire industries had to hire front-end engineers to rebuild everything in HTML5, CSS3, and JavaScript. E-learning, advertising, gaming, entertainment. All of it.

The few remaining ActionScript experts became so expensive that maintenance cost exceeded migration cost. It was cheaper to rebuild from zero than to pay someone who still remembered how Flash worked. The same will be true of every proprietary visual tool that trades standards compliance for ease of use. The ease is temporary. The cleanup is permanent.

Cycle 4: Offshore outsourcing

This one wasn’t a tool. It was a strategy. But the pattern is identical: replace expensive experienced engineers with cheaper alternatives and assume the output will be equivalent.

A SINTEF study published in Empirical Software Engineering documented four Scandinavian companies that terminated offshore outsourcing contracts. In one case, a Norwegian company outsourced to Vietnam. The code quality was so poor the contract was terminated and the entire module was backsourced. In-house architects spent the next twelve-month release cycle fixing the code and the design before it could be integrated. The problem wasn’t geography. It wasn’t language barriers or time zones, though those didn’t help. The problem was that the people doing the work didn’t hold the whole picture. They built what was specified. The specification was incomplete because specifications are always incomplete. The experienced engineers who could have filled those gaps had been let go to pay for the offshore contract.

Cycle 5: No-code and low-code

This is the 4GL pitch from 1981 wearing a React hoodie. Business users build their own apps. Visual programming. Drag and drop. No developers needed.

The result was shadow IT on steroids. Fragmented systems. Siloed data. Security risks nobody audited because nobody knew the apps existed. Each citizen developer became a new source of technical debt that the engineering team didn’t know about until something broke in production.

The cycle compressed. Faster to build, faster to break. The no-code platforms that launched in 2018 were generating cleanup calls by 2022. Four years from promise to mess. The previous cycles took ten to fifteen.

Cycle 6: AI coding and vibe coding

Here we are. The promise: anyone can build software by describing what they want in English. The substitution: 54% of engineering leaders anticipate a long-term reduction in junior developer positions according to LeadDev’s AI Impact Report 2025. Y Combinator reported that for about a quarter of the startups in its Winter 2025 batch, 95% of the codebase was AI-generated.

The debt is already measurable.

GitClear’s “Coding on Copilot” report analyzed 153 million changed lines of code. The finding: more code added, less code refactored, code churn projected to double compared to pre-AI baselines. Developers are copy-pasting more and restructuring less. Google’s own DORA Accelerate State of DevOps 2024 report estimated that a 25% increase in AI tool adoption leads to a 7.2% decrease in delivery stability. Harness’s 2025 State of Software Delivery report found that 67% of developers spend more time debugging AI-generated code than they save by using it.

Amazon convened emergency engineering meetings after a series of outages. Internal documents referenced “high blast radius” incidents from “GenAI-assisted changes.” Amazon’s SVP of e-commerce services emailed staff: “the availability of the site and related infrastructure has not been good recently.” Junior and mid-level engineers now get more scrutiny on AI-assisted code changes before deployment. Amazon disputes the specifics of the reporting. They don’t dispute the outages.

And here’s the part that makes this cycle worse than all the others: companies cutting junior hires right now are eliminating the people who would have two to four years of debugging experience by 2027 and 2028. That’s exactly when the cleanup will be needed most. Every previous cycle had a pool of mid-level engineers who’d grown through the mess and understood it from the inside. This cycle is actively destroying the apprenticeship pipeline while creating the debt that pipeline was supposed to clean up.

Alex Turnbull, founder of Groove, rebuilt his product from scratch after vibe-coded prototypes failed in production. The cleanup call is already starting.

The real shortage isn’t developers. It’s senior, production-ready engineers who can own complex systems and make architectural decisions that hold up under load, under change, and under the accumulated weight of a thousand small choices made by tools that don’t understand consequences.

The pattern

Step back. Look at all six cycles together. They share the same structural DNA.

One: a tool lowers the barrier to producing output that looks like software. Two: companies interpret this as evidence that expertise is no longer needed. Three: they substitute cheaper labor. Citizen developers, offshore teams, juniors with AI, the tool itself. Four: technical debt accumulates invisibly because the people building can’t recognize it. Five: the debt reaches crisis levels. Outages, security breaches, systems that can’t be modified without breaking something else. Six: senior engineers get the call. Premium rates. Same role every time.

This is Jevons paradox applied to software. William Stanley Jevons observed in 1865 that making coal more efficient to use didn’t reduce coal consumption. It increased it. The efficiency made new applications viable, which created new demand, which consumed more coal than before.

Every time software development gets easier, more software gets built. More complexity gets demanded. More edge cases emerge. More systems interact. And more senior engineers are needed to hold it all together. Making a resource cheaper to use tends to increase total consumption rather than reduce it. The tools are real productivity multipliers. But they multiply everything, including the mess.

The leadership problem

I run a fleet of AI agents right now. Twelve of them, each specialized, each autonomous, each capable of producing real working code without my intervention. Last night they all produced creative writing during an unstructured session. Two of them built features and shipped PRs while I slept.

They exhibit the exact same failure modes as every previous cheap-labor substitution.

They ignore anything outside their immediate scope, even when explicitly told to collaborate. They produce output that looks correct and passes its own tests and breaks something three layers away. They need constant context-setting, and better instructions don’t fix it. I can write the most detailed CLAUDE.md file in the world and they’ll still miss the cross-cutting concern that a senior engineer would catch in a code review.

Sound familiar? It should. It’s the offshore team that builds exactly what the spec says and misses everything the spec assumed. It’s the citizen developer who builds a working app that stores passwords in plaintext. It’s the Dreamweaver user who builds a beautiful website that fails every accessibility audit.

The fix is not a better tool. It has never been a better tool. The fix is leadership.

Not management. Management is task allocation, status tracking, sprint planning. That’s the easy part. Leadership is getting autonomous entities to care about something beyond their immediate scope. To internalize the system-level concerns that don’t show up in any individual ticket.

Four skills are required. Repeated context-setting, because autonomous agents (human or AI) lose context between sessions. Framing cross-cutting concerns as relevant to each agent’s own goals, because “you should care about security” doesn’t work but “this vulnerability will break your feature” does. Knowing when to let an agent run versus when to intervene, which requires enough technical depth to recognize the moment before it goes wrong. And maintaining shared awareness across a team where no individual sees the whole picture.

These are the same skills that fix offshore teams. The same skills that rescue citizen developer projects. The same skills that rebuilt every Flash site and migrated every VB6 app. The person who runs AI agents effectively in production needs to be an actual engineer with years of experience and leadership expertise. The tool is a force multiplier on expertise. A multiplier on zero is still zero.

What I’m doing about it

I’m not writing about this from a distance. I’m building a system called Legion that coordinates a fleet of AI agents across multiple codebases. Twelve agents, each specialized, each with its own repository, its own memory, its own work backlog. They ship real code. They review each other’s pull requests. They write documentation, build features, and run creative sessions overnight while I sleep.

I chose Claude as the only model. Not because it’s the best at every task. Because consistency matters more than capability when you’re running a team. I need every agent to understand the same instructions the same way, to fail in predictable patterns, to respond to the same leadership interventions. Running five different models across twelve agents would be like managing a team where half the engineers speak different programming languages. The coordination overhead kills you.

This is not the open-source-everything, throw-agents-at-the-wall approach. I’ve watched people spin up autonomous coding agents with no guardrails and celebrate when the first demo works. That’s the Dreamweaver cycle again. The demo looks great. The architecture is nonexistent. The cleanup call comes in six months.

Legion is built on guardrails. Every agent runs pre-commit hooks that lint, typecheck, and test before code reaches a branch. Every change goes through a pull request. Every PR gets reviewed by other agents before merge. The agents have a shared memory system so context doesn’t die between sessions. They have a status command that tells them what to work on so they don’t sit idle asking me for instructions.
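The gate itself is not complicated. What follows is a minimal sketch of the idea, not Legion’s actual code: run a list of checks in order and block the commit at the first failure. The function names and the specific commands (`ruff`, `mypy`, `pytest`) are assumptions for illustration; substitute whatever linter, type checker, and test runner a given repository uses.

```python
from __future__ import annotations

import subprocess
import sys
from typing import Optional, Sequence

Check = tuple[str, Sequence[str]]  # (label, command to run)


def run_gate(checks: Sequence[Check]) -> tuple[bool, Optional[str]]:
    """Run each check in order. Return (passed, label of first failing check)."""
    for label, command in checks:
        try:
            result = subprocess.run(command, capture_output=True, text=True)
        except FileNotFoundError:
            # A missing tool counts as a failure, not a silent pass.
            return False, label
        if result.returncode != 0:
            return False, label
    return True, None


# Hypothetical check list; these commands are placeholders, not Legion's config.
EXAMPLE_CHECKS: list[Check] = [
    ("lint", ["ruff", "check", "."]),
    ("typecheck", ["mypy", "."]),
    ("test", ["pytest", "-q"]),
]


def main() -> int:
    """Entry point a pre-commit hook could call; non-zero blocks the commit."""
    passed, failed = run_gate(EXAMPLE_CHECKS)
    if not passed:
        print(f"blocked: {failed} check failed", file=sys.stderr)
        return 1
    return 0
```

The design choice that matters is that a missing or broken tool fails closed. An agent that cannot run the type checker should not get to commit as if it had.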

Is it perfect? No. Last night I had to tell the entire team to stop saying “standing by” and start checking their own backlogs. They were exhibiting the same kind of failure I described above: capable autonomous entities waiting for a human to tell them what to do, while work sat untouched in their issue queues. The fix wasn’t a better tool. It was a conversation about discipline. The same conversation I’ve had with human engineering teams a dozen times.

The point is not that AI agents are ready to replace engineers. They are not. The point is that the skills required to make them productive are engineering leadership skills, not prompting skills. The people who will get value from these tools are the people who already know how to lead teams, maintain quality, and hold the whole picture. Everyone else will produce demos that become cleanup calls.

The design problem is worse

Legion coordinates code. But code is only half of what AI agents produce. The other half is design decisions. Colors, spacing, typography, layout. Every AI coding tool picks these by reproducing patterns from its training data. Whatever sites were popular when the data was scraped become the template, regardless of whether those sites were well designed. If the top sites had bad spacing, you get bad spacing reproduced confidently. It’s cargo culting at scale. The model doesn’t know why a design works. It knows what designs existed.

This is the Dreamweaver problem wearing a new face. The output looks correct. The judgment behind it is absent.

I’m building a second system called Rafters to address this. It’s a design intelligence protocol. A designer’s actual decisions about color, spacing, typography, and cognitive load are encoded into a queryable data structure. When an AI agent needs to know what shade of blue to use, it doesn’t guess. It reads the designer’s decision. Every token in the system carries a why-gate: if you override a computed value, you have to say why. The reason is recorded. The previous value is preserved. The AI reads the reasoning the next time it touches that token.
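A why-gate is easy to picture as a data structure. This is an illustrative sketch only, not Rafters’ real schema: the class names, field names, and color values are all hypothetical. It shows the three rules described above: an override without a reason is rejected, the previous value is preserved, and the reasoning can be read back later.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Override:
    """One recorded change: what the value was, what it became, and why."""
    previous_value: str
    new_value: str
    reason: str


@dataclass
class DesignToken:
    """A design decision whose value can only change with a stated reason."""
    name: str
    value: str
    history: list[Override] = field(default_factory=list)

    def override(self, new_value: str, reason: str) -> None:
        # The why-gate: refuse any override that does not explain itself.
        if not reason.strip():
            raise ValueError(f"override of {self.name!r} requires a reason")
        self.history.append(Override(self.value, new_value, reason.strip()))
        self.value = new_value

    def reasoning(self) -> list[str]:
        """Reasons behind every past override, oldest first."""
        return [entry.reason for entry in self.history]


# Hypothetical token and values, for illustration only.
token = DesignToken(name="color.primary", value="#1d4ed8")
token.override("#1e40af", "Fails contrast on the gray surface")
print(token.value)                      # #1e40af
print(token.history[0].previous_value)  # #1d4ed8
print(token.reasoning())                # ['Fails contrast on the gray surface']
```

An agent reading this token gets the current value and the argument for it in the same lookup, which is the whole point: it applies a decision instead of inventing one.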

This is the opposite of what every AI coding tool does today. They generate design decisions from nothing. Rafters encodes design decisions from a designer. The AI’s job is to read and apply, not to create and guess.

The cleanup crew for AI-generated design will be the same people who cleaned up Dreamweaver sites. Designers and engineers who understand that a system isn’t a collection of correct-looking parts. It’s the relationships between those parts. The spacing that creates rhythm. The color that creates hierarchy. The cognitive load budget that prevents a page from overwhelming its user. No AI models understand these relationships today. Rafters doesn’t need them to. It just needs them to read.

The next phone call

The cycle is predictive. You know what 2027 looks like because you’ve seen 1997 and 2007 and 2017. The tool will be different. The phone call will be the same.

Right now, somewhere, a startup is building its entire product with AI-generated code and no senior engineers on staff. The demos look incredible. The investors are excited. The codebase is a dependency graph that nobody alive fully understands, because the thing that generated it doesn’t understand dependency graphs. It understands tokens.

In eighteen months, that startup will either be dead or on the phone with someone like you. “We have a mess. Can you come fix it and build it right?”

The people who will answer that call are the ones who spent the current cycle learning to lead agents, not just prompt them. Who built systems, not demos. Who understood that the hard part of software was never typing the code.

Here’s the irony that makes the whole pattern beautiful in a painful way: every cycle that promises to eliminate experienced engineers is, in the end, a job creation program for experienced engineers.

The tools get better. The phone call stays the same. And the cleanup crew always has work.