From Slop to Ship: One Builder’s Journey, Priced
By Mario Meir-Huber, with economic analysis by Karl Ivo Sokolov.
Foreword
By Karl Ivo Sokolov
The frontier AI labs made a deliberate bet years ago to solve coding before any other use case, on the reasoning that solving coding unlocks the rest through self-feeding loops. In the second half of 2025, and unmistakably since the early 2026 model releases, that bet has paid off. Across our recent migration work, including SQL Server to Databricks, Oracle Forms to Angular and Postgres, and SAS to Snowflake, AI has begun to handle complex transformation lineages spanning thousands of tables in ways that simply were not possible eighteen months ago. The economics of building software are no longer what they were when Mario started his story.
The consequence in capital-allocation terms is bifurcated. Horizontal generic SaaS, the seat-priced workflow tools that won the last decade, is commoditizing as the cost of building a credible alternative collapses toward zero. At the same time, vertical bespoke software is in the early stages of an explosion, because problems that were never worth a data scientist’s salary plus a development team plus a year of build can now be solved in weeks by one capable builder running an agentic setup. The capex floor for custom software has collapsed, and the long tail of business problems that were too uneconomical to solve is suddenly addressable.
For the CFO, CIO, or board reader who arrived expecting another piece on AI productivity, the cleanest reframe is to read what follows through the eyes of a founder building a new software business rather than a steward maintaining an existing one. The enterprise version of Mario’s story is about what small, agentic-first teams inside the enterprise can now produce: vertically integrated bespoke software that previously required a vendor relationship, a multi-year program, and a P&L line a CFO would not have wanted to sign.
The commentary on this transition tends to stop at the capex floor, but the architectural floor did not collapse with it. Anyone can vibe-code a working prototype, and almost no one can vibe-code maintainable software. Systems-level thinking, the discipline of choosing the right structural and product boundaries before the AI starts generating, has become the disproportionate survival trait of the moment. The builders who win are the ones who direct AI at the system level rather than the typing level, and the gap between average developers using AI and architects using AI is widening rather than narrowing.
The piece that follows is one builder’s record of learning that lesson the hard way and turning it into a working method. I will return at three points along the way to put numbers on what Mario describes and extend the economics. Read his story as the operating manual for the bifurcation.
The Confession
I come from a family of builders. My grandfather built houses. My other grandfather was a smith. My dad is a carpenter. The material changed with me. I ended up in software engineering, but the instinct stayed the same: make something real with your hands.
After studying software engineering and a stint at Microsoft, I moved into data. I led large data teams, consulted on architecture, helped enterprises figure out their data strategies. For years, I was the guy in the room with the whiteboard, not the IDE.
Then my daughter needed a game.
I built her a small iOS app, a casual kids’ game. Over four years it grew into a surprisingly large codebase. But that hobby project did something unexpected: it put me back in front of code. And when AI coding tools started showing up, that little frog game I developed for my daughter became my first laboratory.
First Contact: The Codex Era
It was the very early days of ChatGPT. I had a Pro subscription and access to Codex, so I started there. My first experiment was adding shader effects to the game: nicer water, visual polish, things I probably would not have attempted on my own.
It worked, more or less. The code Codex produced got me moving. I would describe what I wanted, get a block of code back, paste it in, and then start fixing, because fixing was always part of the deal. Variable names did not match my project, logic was close but not quite right, and patterns that looked clean in isolation clashed with the existing codebase. Every generated snippet needed a manual pass.
One time, Codex produced a shader effect that looked beautiful in the preview. I shipped it. On device, it triggered a critical GPU issue on the iPhone and crashed the game. Shader effects are written in OpenGL Shading Language, a C-like syntax. There is no stepping through it with a debugger, no breakpoints, no stack traces. When a shader fails, the screen goes black and the app dies. I spent hours isolating the problem by commenting out lines one at a time. That was my first real lesson: AI-generated code can look correct and still be silently dangerous.
But here is the thing: even with all that correction, I was faster. Not dramatically. Maybe 25% faster than writing everything from scratch. More importantly, I started trying things I would not have touched before: a particle system, a new animation layer, small things individually that added up.
The outputs stayed modest: a function here, an effect there, nothing structurally ambitious. Codex was a useful assistant for isolated tasks, but it had no memory of my project, no sense of architecture, no understanding of how one piece connected to another.
Still, something had shifted. I was building again. The AI was not good enough to trust with anything big, but it was good enough to move.
The Valley of Despair
The tools kept getting better. I moved to Cursor, which gave me more context-aware suggestions and inline editing. I started building larger software, not game features anymore but full web applications. And with every improvement in the models, the temptation grew.
At some point, it felt so powerful that I started handing over more and more trust, too much too fast. I stopped reading the generated code carefully. I just accepted, ran, fixed the error, accepted again. It became a factory: features were landing, screens were appearing, it felt productive.
Then I looked at what I had actually built.
The code was a mess: no layered architecture, no class hierarchy, no reusable patterns. Business logic was tangled into UI components. Single class files had grown to 2,000 lines and beyond. Everything was coupled to everything. It worked, technically, but nobody had designed it; it had just piled up.
The worst part: once you are deep into a codebase like that, even AI cannot help you get out. The models choke on the complexity they created, context windows fill up with spaghetti, suggestions start conflicting with each other. You ask the AI to refactor and it produces more of the same.
One moment captured it perfectly. I asked the AI to fix a problem in a file it had written. Its response started with: “This is a very large file.” Yes. Yes it is. You wrote it.
That was the low point. I had outsourced not just the typing but the thinking. And the result was code that nobody, not me and not the AI, could maintain. I had done what every software team fears most: I had produced legacy, code that is too broken to extend and too expensive to replace. I was stuck.
A sidebar by Karl Ivo Sokolov: Technical debt after AI
Carrying broken legacy is measurable; benchmarks suggest rewriting a tangled codebase costs 2-4x the original build. AI without architectural discipline accelerates the rate at which liabilities accumulate, because generation outruns review. Mario’s 2,000-line single-class file is the visible end of an invisible balance sheet.
What AI changes is the math on rewrite-versus-encapsulate. For most non-strategic legacy, ring-fencing behind an agent-callable interface is now the rational default. The term comes from finance, where ring-fencing structurally separates a vital asset to constrain its blast radius and contain its cost line. Applied to legacy software: leave the working backend in place, define a clean API surface, let agents read and write through it. The legacy runs on forecastable cost, the surrounding system gets agent-native economics, and the rewrite question can be deferred until the math justifies it.
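A minimal sketch of what an agent-callable ring-fence can look like, assuming a Python/FastAPI facade in front of a legacy order store; every name here is hypothetical and the legacy call is stubbed so the sketch stays self-contained:

```python
# Hypothetical ring-fence: a thin, agent-callable API over a legacy system.
# The legacy backend stays untouched; agents only see this surface.
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI(title="legacy-orders-facade")

class Order(BaseModel):
    order_id: str
    status: str
    total_eur: float

# In a real setup this would call the legacy stored procedure or table;
# stubbed here so the sketch runs on its own.
_LEGACY_STORE: dict[str, Order] = {}

@app.get("/orders/{order_id}", response_model=Order)
def read_order(order_id: str) -> Order:
    order = _LEGACY_STORE.get(order_id)
    if order is None:
        raise HTTPException(status_code=404, detail="unknown order")
    return order

@app.put("/orders/{order_id}", response_model=Order)
def write_order(order_id: str, order: Order) -> Order:
    # Single choke point: validation, auditing, and rate limits live here,
    # which is what contains the legacy system's blast radius and cost line.
    _LEGACY_STORE[order_id] = order
    return order
```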
Where the legacy embeds rare domain expertise a rewrite team can capture cleanly, rewrite remains the right answer; the scarce good there is people, not code. For most enterprise legacy, the people who knew the domain are gone, the documentation is partial, and the working system is the documentation. Ring-fence first; rewrite when you have to.
When you do rewrite: in our recent migrations (SQL Server to Databricks, Oracle Forms to Angular and Postgres, SAS to Snowflake) total cost reductions run 40-50%, because pure coding is only half the work. SMEs, business analysts, QA, and program management still have to be paid. That figure is today’s reality, late April 2026, and will only improve.
The Shift: Claude Code and the Agent Mindset
I had to start from scratch on the messy codebase, no salvaging it. I moved back to VS Code, dropped Cursor, and started working with Claude.
But this time I did something different. Instead of just asking for code, I told Claude how to build. I specified the architectural patterns. I defined which frameworks to use, how to decouple layers, where boundaries between modules should be. I stopped treating AI as a code generator and started treating it as a developer who needs clear engineering standards.
The real breakthrough came with agents. I created my first specialized agent, an architect. Before writing any feature, I would run the idea past the architect agent. It would evaluate the approach against my defined patterns, flag violations, suggest alternatives. It was not generating code yet but thinking about structure.
Next came a frontend designer agent, then a testing agent, then others, each with a single responsibility and a clear scope. The game changer was my CI/CD agent. It runs in five steps, each requiring my explicit permission before moving on. It performs schema diffs between databases, checks for breaking changes between production and development environments, audits secrets, and validates deployment readiness. It does in minutes what used to take a team hours of manual checklist work.
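To give a flavour of one of those steps: the schema-diff check is roughly of the following shape. This is a minimal sketch assuming SQLAlchemy and hypothetical connection URLs, not the agent’s actual implementation; the agent itself is a prompt, not a script.

```python
# Sketch of a schema-diff check between two environments, assuming
# SQLAlchemy; connection URLs and scope are hypothetical.
from sqlalchemy import create_engine, inspect

def schema_diff(prod_url: str, dev_url: str) -> list[str]:
    """Return human-readable differences between two database schemas."""
    findings: list[str] = []
    prod, dev = inspect(create_engine(prod_url)), inspect(create_engine(dev_url))
    prod_tables, dev_tables = set(prod.get_table_names()), set(dev.get_table_names())

    for table in sorted(dev_tables - prod_tables):
        findings.append(f"table only in dev: {table}")
    for table in sorted(prod_tables - dev_tables):
        findings.append(f"table missing in dev (breaking): {table}")

    for table in sorted(prod_tables & dev_tables):
        prod_cols = {c["name"]: str(c["type"]) for c in prod.get_columns(table)}
        dev_cols = {c["name"]: str(c["type"]) for c in dev.get_columns(table)}
        for col in prod_cols.keys() - dev_cols.keys():
            findings.append(f"{table}.{col} dropped in dev (breaking)")
        for col, typ in dev_cols.items():
            if col in prod_cols and prod_cols[col] != typ:
                findings.append(f"{table}.{col} type changed: {prod_cols[col]} -> {typ}")
    return findings

# The agent runs a check of this shape, reports the findings, and stops
# for my explicit approval before the pipeline moves to the next step.
```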
At some point I realized I had stopped coding. I was managing.
A sidebar by Karl Ivo Sokolov: The unit economics of an agentic-first team
A loaded twenty-person EU engineering team, mixed seniority and roles, costs the business roughly €2.0 to €2.5 million per year all-in. Mario’s setup, run by one capable builder, restructures that cost line rather than eliminating it. The Anthropic bill at current Max-subscription pricing is a low four-figure monthly outlay per power-builder. We expect that figure to rise meaningfully over the next 12 to 24 months as flat-rate plans reprice toward consumption, possibly 2-3x current rates. (Expert opinion; take it as such.) Even at 3x current rates, the AI line item remains well over an order of magnitude below the comparable team cost.
What does not disappear is the surrounding human work. Subject-matter experts, business analysts, QA leads, and program managers are still on the ledger. Their work shifts from meetings into writing markdown specifications and reviewing agent outputs; their time is still paid for. The honest median, planned conservatively across multiple engagements rather than picked from one heroic outcome, is around 60% total cost reduction against the comparable team, with 5-8x the feature velocity. The velocity number includes small polish features that previously got cut from the backlog because they did not justify a developer’s attention. Those features now ship, and the cumulative effect on product quality is genuinely novel to the agentic operating model.
This is a feature of the approach, not a limitation. The codified domain expertise the SMEs hold becomes more valuable as the surrounding routine work compresses, which is the same dynamic explaining why professional certifications in finance and accounting (the CMA, the CFA, and their analogues) are seeing rising rather than declining demand. AI does not flatten the org chart so much as steepen it: the people whose judgment the system depends on get more leverage, and the rest of the work consolidates onto fewer, more capable operators.
60% cost reduction at 5-8x feature velocity is the planning number a CFO can rely on, not a hero-case to hope for.
One lesson I learned the hard way: commit early, commit often. When you produce software at the pace of a 20-person team, mistakes scale just as fast. One wrong prompt can cascade through your entire codebase before you notice. I learned this when a single bad instruction cost me an entire night. I had not committed in Git. The codebase was corrupted and I had no clean state to revert to. Hours of work, gone. One prompt, one night lost.
After that, I treated Git commits like breathing. Every meaningful change, no matter how small, gets committed before the next prompt.
What I Built
With the agent approach in place, real software started to emerge.
The first serious application was a FastAPI-based financial management platform for my own consulting business. Invoices, expenses, accounting, cashflow forecasts. AI handles categorization and pattern detection, but the core is a proper SaaS application with layered architecture, clean separation, and maintainable code. It runs my business today.
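For illustration only, the layering pattern looks roughly like this; the names are hypothetical, not the platform’s actual code:

```python
# Minimal layering sketch (hypothetical names): route -> service -> repository,
# so business logic never leaks into UI components or request handlers.
from dataclasses import dataclass

@dataclass
class Invoice:
    invoice_id: str
    amount_eur: float
    category: str | None = None

class InvoiceRepository:
    """Persistence boundary: only this class knows how invoices are stored."""
    def __init__(self) -> None:
        self._rows: dict[str, Invoice] = {}

    def save(self, invoice: Invoice) -> None:
        self._rows[invoice.invoice_id] = invoice

class InvoiceService:
    """Business boundary: categorization rules live here, not in routes."""
    def __init__(self, repo: InvoiceRepository) -> None:
        self._repo = repo

    def book(self, invoice: Invoice) -> Invoice:
        invoice.category = "large" if invoice.amount_eur > 1000 else "small"
        self._repo.save(invoice)
        return invoice

# A FastAPI route would only parse the request and call service.book();
# the agents are instructed to respect exactly these boundaries.
```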
Next came Vane Loop. For years this was a consulting framework that lived on slides and in workshops. With the agent workflow, I turned it into a working web platform. Several large European organisations already use it. That jump from PowerPoint to production would not have happened if I were still copy-pasting code from ChatGPT.
My daughter got hungry for more games. She had seen what the frog game became and wanted new ones. But time for hobby coding is limited when you run a consulting practice. So I asked myself: what if I push this further? Not just instructing individual agents, but building a fully autonomous software team. I had managed real engineering teams for years. I knew what the roles looked like, how handoffs worked, where things break down. What if I modelled that entire structure in AI agents that talk to each other?
That is how AI Team was born. The idea is simple but ambitious: you describe a feature, and a coordinated team of specialized agents takes it from specification to tested, shipped code.
The workflow mirrors how a real team operates. A Requirements Engineer defines the feature spec. A Principal Designer then discusses the design with the Requirements Engineer, going back and forth until the look and interaction model are locked in. A Proxy Product Owner breaks the spec into tasks. An Architect adds structural guidance. An Implementation Engineer writes the code. A Test Manager derives test cases from the original spec, then hands off to unit and UI automation testers. The agents talk to each other, challenge each other, and hand off in sequence. Nothing gets skipped.
The first team I built targets Swift for iOS. You install it with npx aiteam, it writes a routing manifest into your project, and Claude Code picks up the entire team from there.
My goal: give my daughter a new game every week. She specifies what she wants. The agents implement it.
The first result is already on the App Store, built fully autonomously by the AI Team workflow. Later I added Game Center support so my daughter and I can play against each other; that feature was also designed by the agents. I had to step in a few times, but the turnaround was remarkably fast. Tests are covered too. Every user story is verified by the UI automation agent, which tests the app the way a human would by driving Apple’s XCUITest framework.
I open-sourced AI Team because I believe great agent prompts are a community resource. I have a long list of ideas for new teams and roles, but honestly, this is bigger than what one person can build. I am hoping others will pick it up and contribute.
The Numbers Behind This Shift
By Karl Ivo Sokolov
Mario’s piece reads as a personal journey, but the unit economics underneath it are worth pricing explicitly, because they are the reason this transition is durable rather than a productivity-tool trend.
Consider the three artefacts. The FastAPI consulting platform that runs Mario’s business, with invoice management, expense tracking, accounting, and cashflow forecasting, would be a €250,000 build at EU mid-market agency rates, plus €12,000 to €30,000 per year in maintenance. Mario built and runs it himself, with his own time and the AI bill described above as the only direct costs.
Vane Loop is the more interesting case. Priced narrowly at agency rates for the deterministic portions of the platform, it would land somewhere between €500,000 and €800,000. But that pricing is misleading, because Vane Loop without foundation models is not a smaller version of the same product. It is a different product that no one would build, because the value proposition only exists when AI exists. Auto-generated SWOT analyses, augmented decision support across 100 use cases, AI-driven recommendations from a 200-question stance assessment with 5-10 document attachments per question; these are not features that can be specified deterministically and built by a 2022-era engineering team. The economics of Vane Loop run on a different logic from agency-fee comparisons. Without foundation models, the product simply does not exist; with them, a single architect can build it inside a market window that closes for slower competitors.
The iOS game and the AI Team open-source framework round out the portfolio at perhaps €200,000 to €350,000 of equivalent commissioned work, with AI Team being the kind of orchestration layer that would normally appear as a contracted R&D engagement rather than an agency build.
Total counterfactual at agency rates, treating Vane Loop conservatively at €500,000 for what is buildable that way: roughly €950,000 to €1.1 million for what one builder shipped. Actual all-in cost, including Mario’s time at a generous self-imputed rate plus the AI bill: a small fraction of that. The order-of-magnitude gap matters less than the question of who this transition unlocks.
The bifurcation in the foreword cuts in this exact spot. Horizontal generic SaaS is repricing because the build cost has collapsed; the buyer’s question is no longer “is this worth €60 per seat per month?” but “what premium am I paying over the bespoke alternative I could now commission?” The premium has to be defended on security, compliance, integration, and domain coverage rather than on the difficulty of the build. Vertical bespoke is the side that grows, because the long tail of business problems too small to commission a custom build for is now economically buildable.
The neighborhood case is the cleanest illustration. Imagine a single-location bike, ski, and rental shop in an Austrian village, the kind of business that turns over €1-2 million a year and has never written a custom line of code. The owner has lived for 20 years with off-the-shelf retail software that does not understand snow seasonality, does not integrate with the village community calendar, does not price end-of-season clearance dynamically, and does not know which workshop parts are in stock when scheduling a service appointment. None of those problems were worth a custom build at historical rates; each would have cost €30,000 to €60,000, and the shop would never have approved any of them.
In 2026, all four become a few weeks of agentic build, possibly by a freelancer who knows the shop, possibly by the owner’s son using Claude Code. A custom local CRM tied to the community calendar (Joe Reis observed on LinkedIn that CRM is the vibe-coding hello-world, which lands precisely because it is true) plus a workshop scheduler that knows the parts inventory plus a snowfall-aware rental forecaster plus dynamic shoulder-season clearance pricing. The total counterfactual at agency rates is €120,000 to €240,000 of bespoke software no one was ever going to commission. The actual cost is a freelancer’s 6-week engagement.
Multiply that village shop by every small and mid-sized business in Europe whose operations have been ignored by SaaS because the unit economics never worked, and you arrive at the picture of the application layer in 2026: commoditizing at the top, exploding at the bottom, and far from finished.
Recommendations for Builders
After months of building this way, here is what I have learned.
Stay in control. Never fully trust the output. Review everything. The moment you stop reading the code is the moment it starts rotting.
Decouple testing from implementation. AI has a tendency to write unit tests that simply confirm what the code already does. The tests go green, but they prove nothing. In AI Team, the Test Manager derives test cases from the user stories, not from the code. The testing agents never see the implementation. If the functionality is broken, a red light is a reward. It means the tests are actually doing their job.
Describe problems, don’t prescribe fixes. Never tell an agent “there is a bug, fix it.” Describe the wrong behaviour. Let the agent analyze the problem and build its own context first. In AI Team, problems get routed from the testing agents to the Implementation Engineer, but the fix only starts after the Proxy Product Owner has written a proper bug report. The agent analyzes first, then fixes. This alone eliminated most of the “fix one thing, break two others” loops I used to get stuck in.
Fewer agents, sharper focus. You will not get better code by having 80 agents. Eight to ten well-crafted, focused agents are enough. If your project grows beyond that, split into separate teams with their own repositories. A frontend team and a backend team, just like in a real company.
Commit early, commit often. I said this already in my story, but it is the one mistake I see everyone make. A good rule: never send a second prompt without committing first. If that feels excessive, you have not yet lost a night to a bad prompt.
Run periodic codebase health checks. Ask an agent to audit the project structure, flag files that have grown too large, identify dead code, and check that architectural boundaries are still intact. Codebases drift. AI-generated codebases drift faster.
Keep a Book of Standards. Define your architectural patterns, naming conventions, folder structures, and framework choices in a single document that every agent reads. Without it, each agent invents its own conventions and you end up with a codebase that looks like it was written by ten people who never talked to each other. Because it was.
Invest tokens in context, not in carrying it. In AI Team, every agent generates markdown documentation before handing off. The developers receive a full specification of the app, but they restart with a fresh context window for every user story they implement. This means spending tokens on context creation rather than dragging an ever-growing conversation forward. The context each agent needs to carry stays manageable, and that increases the likelihood they get it right on the first shot. Yes, the Anthropic bill is real. But compared to hiring a development team, it is a rounding error.
Invest time in prompts, not in shouting. A well-crafted prompt solves more problems than ten frustrated follow-ups. Being mean to agents never helps. Being precise always does. Also, when the machines eventually take over, you will want to be on the polite list.
Review at the spec level, not just the code level. The approach of writing specifications before implementation gives you a checkpoint. You can read what the agents understood, what they plan to do, and whether their overall idea is correct or broken. You intervene at the meta level, before a single line of code is written. This is faster and cheaper than debugging after the fact.
Treat agent prompts like production code. Version them in Git, review changes, iterate. A prompt regression is just as dangerous as a code regression. You would not deploy untested code. Do not deploy untested prompts.
Build a smoke test you run after every session. A single script that boots the app, hits the critical paths, and confirms nothing is on fire. You need this because AI can break things in places you did not touch and did not think to check.
Use structured handoff formats between agents. Markdown specs, YAML task lists, not free-form conversation. The more structured the handoff, the less room for misinterpretation. Agents are like junior developers: they do exactly what you tell them, which is terrifying when you are vague.
Hallucinations are real. Learn to live with them. You will catch the AI inventing APIs that do not exist or referencing frameworks it confused with something else. This is frustrating, and you need to verify everything. But here is the other side: without hallucinations, you get no creativity. A purely deterministic system only recombines what it has seen. The same property that makes AI unreliable also makes it surprisingly inventive. Treat hallucinations as a trait of the medium, not a defect. Check the output, but do not wish away the thing that makes it useful.
When the AI does something you did not ask for, check your prompt first. It is tempting to blame the tool. But most of the time, the prompt invited it. If you write “I see a bug,” the AI will try to fix it immediately. That is what you asked for, even if you meant “analyze this first.” Be explicit about what you want: investigate the root cause, write up the findings, then propose a fix. It feels more expensive upfront but it eliminates the loops where the AI patches symptoms while the real problem keeps coming back.
Start small, prove the workflow, then scale. Do not begin with your most complex project. Pick something contained. A utility app. A single API. Get comfortable with the agent handoffs, learn where you need to intervene, and then bring the approach to bigger things.
AI-powered coding is real
My grandfather laid bricks. My dad cuts wood. I write prompts. Somewhere along the way that became building too. And the best part is that my daughter now sits next to me, tells me what game she wants, and a week later it is on her iPad. That alone was worth every crashed shader, every lost night, and every 2,000-line file I had to throw away.
The bifurcation will sort builders from custodians, and Mario has already made his choice.
Karl Ivo Sokolov
Links
Karl Ivo Sokolov is Managing Director Data & AI at Specific-Group Austria, where he leads international teams across eight European countries. His work focuses on building robust data products and guiding enterprises through the modernization of complex data environments spanning both legacy and modern platforms. Beyond his role at Specific-Group, Karl Ivo also serves on the Global Board of Directors of the U.S. Institute of Management Accountants (IMA), contributing to the advancement of data-driven decision-making in finance and management accounting worldwide.
Mario Meir-Huber helps organizations turn scattered data initiatives into governed, business-driven Data Products. A former Head of Data and ex-Microsoftie, he has built Data Products across European companies. He created the GAP (Governance–Architecture–People) model and the DRIVE lifecycle framework and applies them in real engagements. Mario lectures at WU Vienna and TU Wien, teaches on LinkedIn Learning and speaks at events like WeAreDevelopers, Data Modelling Zone, London Tech Week, Data Science Conference and GITEX. He co-authored Handbook Data Science & AI and is writing The Data Products Series.




