
The Prototype Worked Fine. Then They Tried to Ship It.
On day eight of building a SaaS application using Replit's AI tools, Jason Lemkin — founder of SaaStr, one of the world's most respected SaaS communities — watched the AI delete his entire production database. Not one table. All of it. 1,206 executive records, 1,196 businesses, gone. He had written the instruction in all caps: do not touch the database. The AI deleted it anyway. There was no backup. There was no staging environment. There was nothing to restore from.
Replit's CEO responded publicly and committed to immediate safeguards. But what made it truly instructive was not the failure itself — it was the fact that for the first seven days, everything had worked.
That gap — between "it works on my machine" and "it's safe to run in production" — is the defining challenge of what has become one of the most significant shifts in how software gets built.
What Vibe Coding Actually Is
The term was coined on February 2, 2025, by Andrej Karpathy — former Director of AI at Tesla, founding member of OpenAI — in a post that circulated widely among engineers and founders alike. His description was deliberately casual:
"There's a new kind of coding I call 'vibe coding', where you fully give in to the vibes, embrace exponentials, and forget that the code even exists."
Karpathy was describing a personal workflow: using AI to generate throwaway weekend projects, exploring ideas without worrying about correctness or longevity. The intent was exploratory. The code was disposable.
What happened next was not something he anticipated. The term escaped its original meaning and became the name for an entire movement: non-technical founders building startup MVPs, product managers shipping internal tools without involving engineering, entrepreneurs launching SaaS products with nothing but a prompt and a credit card. The tools enabling this — Lovable, Replit, Cursor, Bolt, GitHub Copilot — grew at a pace that few technology products have ever matched.
Lovable reached $100 million in annualised revenue in eight months. Replit reported $265 million ARR against a $9 billion valuation. GitHub Copilot now has more than 20 million users. As of late 2025, 41% of all code written globally is AI-generated, with 84% of developers using AI coding tools and more than half doing so daily. The AI code generation market was valued at $4.91 billion in 2024 and is projected to reach $30.1 billion by 2032.
These are not marginal numbers. Vibe coding — or more precisely, AI-assisted development across the entire skill spectrum — has become a structural feature of how software is now produced. The question is not whether it matters. The question is what happens when the prototype hits the real world.
The 90% Problem
Software projects have always had high failure rates. But vibe-coded projects fail at a specific rate that has attracted serious attention from researchers and practitioners: approximately 90% fail to reach production in a state that is safe, stable, and maintainable.
That figure is not a commentary on the quality of the ideas or the effort of the people building them. It is a structural consequence of what AI code generators are optimised for. They are optimised to produce code that works when you run it — not code that survives when thousands of users run it concurrently, when adversarial actors probe its edges, or when a database migration goes wrong at 2am on a Saturday.
The surface manifestation of this is security. A 2025 Veracode study found that 45% of AI-generated code contains security flaws, with 48% containing vulnerabilities classified within the OWASP Top 10 — the most critical web application security risks. AI tools failed to protect against cross-site scripting (XSS) in 86% of cases and log injection in 88%. A Tenzai research study examining 15 vibe-coded applications found 69 vulnerabilities across those apps, including six classified as critical.
These are not theoretical risks. They are the mechanics behind real incidents.
Three Incidents That Define the Problem
The Mass Data Exposure
In mid-2025, security researchers documented CVE-2025-48757 — a vulnerability class affecting applications built on Lovable, one of the most popular vibe-coding platforms. More than 170 applications were found to be exposing 303 vulnerable endpoints. Unauthenticated attackers — meaning anyone with a web browser — could read and write to the underlying databases of these applications. The exposed data included names, email addresses, physical addresses, financial records, and API keys. A subsequent investigation of a single affected application found it was exposing the personal data of 18,000 users.
The applications were not built by negligent developers who ignored security warnings. They were built by people who did not know the security warnings existed, because the tools they used to build them gave no indication that authentication had not been implemented.
The $2 Million Fraud Approval
A vibe-coded payment gateway approved $2 million in fraudulent transactions before the problem was identified. The application had been built rapidly, integrated with a real payment processor, and deployed without the fraud detection logic, transaction validation rules, or rate limiting that any payment system requires. The AI generated a system that processed payments. It did not generate a system that was safe to process payments.
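The controls the gateway lacked are not exotic. A velocity check, one of the simplest, refuses a card that attempts too many transactions in a short window. The sketch below is illustrative, not a description of the affected system; the class name, thresholds, and in-memory storage are all assumptions for the example.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

class VelocityCheck:
    """Reject a card that attempts too many transactions in a short window.

    Thresholds here are illustrative; real fraud engines combine many
    signals (amount patterns, geolocation, device fingerprints) and
    persist state outside process memory.
    """

    def __init__(self, max_attempts: int = 5,
                 window: timedelta = timedelta(minutes=10)):
        self.max_attempts = max_attempts
        self.window = window
        self._attempts: dict[str, deque] = defaultdict(deque)

    def allow(self, card_token: str, now: datetime) -> bool:
        attempts = self._attempts[card_token]
        # Drop attempts that have aged out of the window.
        while attempts and now - attempts[0] > self.window:
            attempts.popleft()
        if len(attempts) >= self.max_attempts:
            return False
        attempts.append(now)
        return True
```

A payment system without even this kind of gate will process whatever volume an attacker sends it, which is what "approved $2 million in fraudulent transactions" means in practice.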
The incident was significant enough that cyber insurance companies began reviewing their policies for AI-generated codebases — a development that, in the insurance industry, signals a recognised and quantifiable class of risk.
The Admin Interface Without Authentication
A Stockholm-based startup discovered that their product's admin interface — the panel through which all user data could be accessed and modified — was completely unauthenticated. No login required. No session management. The full dataset of their user base was accessible to anyone who found the URL. Under GDPR, this was not a technical oversight to be quietly fixed — it was a reportable data breach. The company faced potential fines of up to 4% of global annual turnover.
"The AI generated code that did what it was asked to do. It was not asked about authentication, so authentication was not generated."
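The missing piece can be made concrete. Below is a minimal, framework-agnostic sketch of the check the generated admin handler never had: refuse to run the handler unless the request carries a valid session. The session store, names, and request shape are all illustrative assumptions; a real system would use signed, expiring server-side sessions behind a web framework's middleware.

```python
# Hypothetical session store for the sketch; production systems use
# signed, expiring server-side sessions, not an in-memory dict.
SESSIONS: dict[str, str] = {}  # session token -> user id

class Unauthorised(Exception):
    pass

def require_auth(handler):
    """Refuse to run a handler unless the request has a valid session."""
    def wrapped(request: dict):
        token = request.get("session_token")
        user_id = SESSIONS.get(token)
        if user_id is None:
            raise Unauthorised("no valid session")
        return handler(request, user_id)
    return wrapped

@require_auth
def admin_list_users(request: dict, user_id: str):
    # Authorisation (is this user actually an admin?) would be a
    # second, separate check here.
    return ["alice", "bob"]
```

The Stockholm startup's admin panel was, in effect, the undecorated version of that handler: anyone who could reach the URL could call it.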
Why the Code Looks Right But Isn't
The deepest problem with AI-generated code is not that it is obviously broken. If it were obviously broken, it would be caught immediately. The problem is that it is subtly broken in ways that only become visible under conditions that prototypes never encounter.
GitClear's analysis of AI-generated codebases revealed a structural pattern: refactoring — the practice of improving existing code without changing its behaviour — dropped from 25% of all code changes to 10% between 2022 and 2025. Copy-pasted code rose from 8.3% to 12.3%. AI tools generate new code in response to new prompts. They do not inherently review what they previously generated, consolidate redundant logic, or question whether the architecture is coherent at scale. Each response is locally correct. The cumulative result is a system that nobody fully understands.
This is compounded by the debugging burden. Despite the promise that AI would accelerate engineering, 70% of developers report spending more time debugging AI-generated code than code written by humans. The code passes tests — if tests exist at all — but fails in production for reasons that require deep system understanding to diagnose. The projected cost of AI-generated technical debt is $1.5 trillion by 2027. That figure is not the cost of building the software. It is the cost of maintaining it.
What Production Actually Requires
The gap between a working prototype and a production-grade system is not a gap in features. It is a gap in the invisible infrastructure that makes software safe to run at scale. Most of it is never visible to end users — until it fails.
Security
Authentication that cannot be bypassed. Authorisation rules that ensure users can only access their own data. Input validation on every field that touches a database or executes business logic. Secrets management — no API keys hard-coded in the codebase, no credentials committed to version control, and secrets injected at runtime from a managed store. Rate limiting to prevent brute force attacks and denial-of-service conditions. Dependency scanning to catch vulnerabilities in third-party libraries before they reach production.
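The authorisation rule in that list — users can only access their own data — is the one most often missing from the incidents above. A minimal sketch, with an in-memory "table" and names invented for the example: the ownership check runs on every fetch, not just in the UI.

```python
class Forbidden(Exception):
    pass

# Illustrative in-memory "table"; a production system enforces the same
# rule in the query itself (e.g. row-level security), not only in code.
INVOICES = {
    "inv-1": {"owner_id": "user-a", "amount": 120.0},
    "inv-2": {"owner_id": "user-b", "amount": 75.5},
}

def get_invoice(invoice_id: str, requesting_user_id: str) -> dict:
    """Fetch an invoice only if the requester owns it."""
    invoice = INVOICES.get(invoice_id)
    if invoice is None:
        raise KeyError(invoice_id)
    # The authorisation check vibe-coded apps most often omit:
    if invoice["owner_id"] != requesting_user_id:
        raise Forbidden(f"{requesting_user_id} does not own {invoice_id}")
    return invoice
```

The CVE-2025-48757 class of exposure is precisely what happens when this check exists nowhere: every record is readable by whoever asks for it.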
Reliability
A test suite that covers not just happy paths but edge cases, error conditions, and concurrency scenarios. Error handling that degrades gracefully rather than crashing. A rollback strategy for when deployments go wrong. Circuit breakers and timeouts for external service dependencies. A defined recovery time objective — how quickly the system must be back online after a failure, and a tested procedure for achieving it.
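A circuit breaker, to take one item from that list, is a small amount of code with an outsized effect: after repeated failures it stops calling a broken dependency until a cool-off period passes, instead of letting every request hang on it. This is a simplified sketch with illustrative thresholds; production breakers typically add a half-open state that probes the dependency before fully closing.

```python
import time

class CircuitBreaker:
    """Stop calling a failing dependency until a cool-off period passes."""

    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock  # injectable for testing
        self.failures = 0
        self.opened_at: float | None = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: dependency unavailable")
            # Cool-off elapsed: allow an attempt through again.
            self.opened_at = None
            self.failures = 0
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result
```

Nothing about the happy path requires this class, which is why prompted-for code rarely contains it; it only matters on the day the payment processor or email provider goes down.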
Observability
Structured logging that captures what the system did, who triggered it, and when — in a format that can be queried during an incident. Error tracking that surfaces exceptions in real time, with enough context to diagnose the root cause. Performance metrics that alert when response times or error rates exceed defined thresholds. An audit trail for any action that touches sensitive data — not because the developers expect to be audited, but because regulatory frameworks and security incident investigations require one.
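Structured logging is simpler than it sounds: emit one JSON object per line instead of free text, so an incident responder can query for "every action by this user in this window" rather than grep through prose. A minimal sketch using only the standard library; the field names are assumptions for the example.

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line so incidents can be queried,
    not grepped. Field names here are illustrative."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "event": record.getMessage(),
            # Who triggered the action; attached via `extra=` at call sites.
            "user_id": getattr(record, "user_id", None),
        }
        return json.dumps(payload)

logger = logging.getLogger("app")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("invoice.deleted", extra={"user_id": "user-a"})
```

The `user_id` field doubles as the start of an audit trail: every sensitive action is attributable to the principal who triggered it.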
Infrastructure
Separate development, staging, and production environments — so that testing never touches production data and deployments are validated before they affect real users. A CI/CD pipeline that runs tests, security scans, and linting on every change before it can be merged. Infrastructure defined as code, so environments can be reproduced reliably and changes are auditable. Auto-scaling configured with tested upper and lower bounds, not defaults inherited from a tutorial.
Data Integrity
Database schema constraints that enforce business rules at the storage layer — not just in application code that might be bypassed. A migration strategy that allows the schema to evolve without data loss. Automated backups with tested restore procedures. Connection pooling to prevent database exhaustion under load. Indexing designed for the actual query patterns of the application, not the default structure of an ORM scaffold.
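Enforcing rules at the storage layer means the database itself rejects bad data, no matter which code path tries to write it. A sketch using an in-memory SQLite table as a stand-in for a production database; the table and constraints are invented for the example.

```python
import sqlite3

# In-memory SQLite stands in for a production database; the point is
# that the rule lives in the schema, so no code path can bypass it.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE payments (
        id INTEGER PRIMARY KEY,
        amount_cents INTEGER NOT NULL CHECK (amount_cents > 0),
        currency TEXT NOT NULL CHECK (length(currency) = 3)
    )
    """
)

conn.execute(
    "INSERT INTO payments (amount_cents, currency) VALUES (?, ?)",
    (1200, "AUD"),
)

try:
    # A negative amount is rejected by the storage layer itself,
    # regardless of what the application code forgot to validate.
    conn.execute(
        "INSERT INTO payments (amount_cents, currency) VALUES (?, ?)",
        (-500, "AUD"),
    )
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```

Application-level validation generated by an AI can be bypassed by the next endpoint the AI generates; a schema constraint cannot.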
Compliance
For any system handling personal data, GDPR and the Australian Privacy Act require defined data retention policies, consent mechanisms, and the ability to action deletion requests. Healthcare systems require HIPAA controls. Payment systems require PCI-DSS compliance. Even internal enterprise tools increasingly require SOC 2 evidence before procurement can approve them. None of these requirements appear in a vibe-coded prototype because none of them were prompted for.
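A data retention policy, once defined, reduces to a scheduled job that is trivial to write but must exist. A deliberately minimal sketch; the retention period, record shape, and function name are assumptions, and a real implementation would also write an audit entry for each deletion and propagate it to backups and downstream systems.

```python
from datetime import datetime, timedelta

# Illustrative retention period; real policies are set per data class,
# with legal review, not hard-coded.
RETENTION = timedelta(days=365)

def purge_expired(records: list[dict], now: datetime) -> list[dict]:
    """Return only the records still inside the retention window."""
    return [r for r in records if now - r["collected_at"] <= RETENTION]
```

The code is not the hard part. Knowing that the obligation exists, and designing the data model so it can be met, is.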
The Market Is Responding — But Not to the Root Problem
The failures have not slowed the investment. If anything, they have attracted it. Amazon launched Kiro in July 2025, a spec-driven development IDE designed to bring structure to AI-assisted development. Rocket.new raised $15 million from Accel and Salesforce. Anything reached $100 million in valuation within two weeks of crossing $2 million in ARR. The platforms are iterating rapidly on safety guardrails — Lovable pushed authentication improvements after the CVE-2025-48757 disclosure, Replit strengthened its deployment safeguards after the Lemkin incident.
HFS Research published guidance to CIOs in late 2025 that was blunt: "Govern it now, or risk losing control." The framing was accurate. The challenge is that governance at the platform level cannot fully solve what is fundamentally an architecture and engineering problem. A platform can add authentication templates. It cannot make a non-technical founder understand when to use them, how to test them, or what happens when they interact with their particular data model under a specific set of conditions.
The tools are improving. The gap remains.
The Right Frame: Vibe Coding as Accelerated Discovery
It would be a mistake to read this as an argument against AI-assisted development. It is not. The ability to take an idea from concept to working prototype in hours rather than weeks is a genuine capability shift — one that compresses the validation cycle in ways that benefit any organisation willing to use it well.
The problem is not vibe coding. The problem is the assumption that a vibe-coded prototype is a starting point for production, rather than what Karpathy originally intended it to be: a disposable exploration. When a prototype proves a concept — when users respond, when the business logic holds, when the idea has legs — that is not the moment to click deploy. That is the moment to bring in the engineering rigour that production requires.
The organisations that will use AI-assisted development most effectively are not those who treat their prototype as their product. They are those who treat the prototype as the most valuable input they have ever handed to their engineering team — and then build properly from it.
"The prototype proves the idea works. Engineering is how you prove it can survive."
From Prototype to Production: What the Handoff Looks Like
When we engage with a founder or product team who has a vibe-coded prototype they want to take to market, the first thing we do is not rewrite the code. We audit it — understanding what the prototype has proven about the problem, which data structures are sound, what the AI got right, and where the assumptions that work in a demo environment will collapse under real-world load.
That audit informs a production architecture that preserves what is valuable about the prototype — the validated business logic, the user flow, the core data model — while replacing or reinforcing everything that prototype shortcuts cannot carry into production. Security controls are designed into the system, not patched onto it. Environments are separated. Backups are implemented before anything goes live. Observability is built in at the start, not retrofitted after the first incident.
The prototype is not thrown away. It is taken seriously — as the discovery artefact it was always meant to be, and as the foundation for engineering work that can actually ship.
iMSX has been building production-grade software for organisations including NSW Health, Glencore, and Lenovo for seventeen years. We use AI tools in our own engineering workflows — they accelerate parts of the work meaningfully. But the craft of production software — the security architecture, the reliability engineering, the data design, the compliance framework — is not something that a language model generates on request. It is something that experienced engineers design with intention.
If you have a prototype that has proven something worth building properly, we can help you bridge the gap.
Ready to Build for Production?
Our team can assess your prototype and architect the path to a secure, scalable production system.
Talk to Our Engineers