Gavel Playbook · Strategy

Vibe-coded to production. Nine plays for finishing the last 20%.

Your MVP works in the demo and dies in production. The last 20 percent (Stripe, auth, security) isn't a coding problem. It's a decision: ship the janky thing, rebuild it, or hire. Nine operators on which call to make.

Plays: 9 cited plays
Sources: 4 channels
Read time: 9 minutes
Updated: June 2026

Do it in this order

How do you take a vibe-coded app to production?

The last 20 percent is not more coding; it is hardening the parts the demo never tested. Mark which features are prototype and which are production, then secure the ones that touch money, identity, and other people's data first. Hire narrowly for what you cannot trust the model with, like payments and auth. Debug with discipline instead of stacking fixes. Then ship the jankiest version that still works and let real users tell you what is left.

The hardening order

1. Draw the line. Mark which features are prototype and which are production. The prototype parts can stay vibe-coded; anything touching money, identity, or other people's data gets the engineering posture (play 03).
2. Audit security. Before launch, check for the lethal trifecta: untrusted input, access to private data, and the ability to act on the outside world. If you have all three with no guardrails, fix that first (play 04).
3. Hire narrowly. List the parts you cannot afford to get wrong, usually payments and auth, and pay one developer for just those hours instead of trying to vibe-code them yourself (play 05).
4. Debug with discipline. Commit every time something works so resets are cheap, and when a bug survives two fix attempts, reset to the last good commit rather than stacking a third fix (play 09).
5. Ship the janky version. Stop polishing and launch the version that still does the one thing users need. If they tolerate the rough edges, the problem is real and you finish in public (play 01).

You vibe-coded an MVP in a weekend and it felt like magic. Then you hit the wall everyone hits. The demo works and production does not. Stripe will not connect, you cannot tell if your code is secure, and the bug you have been chasing for three days will not die. The line founders keep repeating is honest: real engineers spend two hours coding and six debugging; vibe engineers spend ten minutes coding and three days debugging. The first 80 percent is the easy part that build-in-a-weekend tutorials show you. The last 20 percent, the part that makes an app real, is where most people give up.

This is not a coding tutorial. Gavel will not write your Stripe integration or audit your auth. What it does is the part the tutorials skip: the decision. The last 20 percent is rarely a code problem you can prompt your way out of. It is a strategic call (ship the janky version, rebuild it properly, or hire a developer for the risky part), and the right answer depends on your situation. Every play here comes from an operator on the record about this wall, in their own words, with a timestamp you can check. They do not all agree, which is the point.

"There's a significant gap between what the concept promises and what the tools actually execute."

Steven Sinofsky, former Microsoft Windows lead, on a16z

The Plays

Nine operators on the same wall, and the call each one made.

Garry Tan, YC · Y Combinator

Ship the jankiest thing that still works

Garry Tan's rule for the production wall is to lower the bar, not raise it. The instinct at the last 20 percent is to keep polishing in private until everything is perfect, which is the surest way to never finish. Tan's counter is to cut scope and ship the jankiest thing that still provides real value, then let users tell you whether the rough edges even matter. If somebody uses a janky product anyway, that speaks volumes about how badly they need the problem solved.

The point is not to ship garbage. It is to stop treating the unfinished 20 percent as a private failure and start treating it as a public test. Launching early is generally good as long as there is some minimum value the product actually provides, and most of the polish you were agonizing over turns out not to be the thing standing between you and a paying user.

Steal it

Stop polishing. Ship the version that still does the one thing users need, even if the edges are ugly. If people tolerate the janky product and keep using it, the problem is real and you finish in public. If they bounce, no amount of last-20-percent craft would have saved it.

Deep dive: Garry Tan's Launch-Jankiest Rule · Watch on YouTube · 08:44

Paul Graham (discussed) · Y Combinator

Do the unscalable finishing work by hand

Paul Graham's most-quoted essay, Do Things That Don't Scale, is the antidote to the weekend-build fantasy that the app will finish itself. The lesson founders skip is that startups do not take off on their own. You recruit the first users by hand and do the work that does not scale to delight them. Applied to a stuck vibe-coded MVP, that means the finishing work is not a background job you can prompt your way out of.

It is manual: onboarding early users one conversation at a time, watching them hit the broken edge, fixing it live. Airbnb did not scale by automating early. The founders went out and did obviously unscalable things to get the flywheel turning. The same hands-on grind tells you which parts of the last 20 percent users actually care about, so you spend your finishing effort on the integration that matters instead of gold-plating the one nobody touches.

Steal it

Do the unglamorous finishing work by hand before you automate it. Onboard the first ten users one at a time, fix their bugs live, wire up the one integration that matters yourself. The manual work that does not scale is exactly what tells you which parts of the last 20 percent are worth building right.

Deep dive: Paul Graham's Do Things That Don't Scale · Watch on YouTube · 04:09

Simon Willison · Lenny's Podcast

Know the line between vibe coding and engineering

Simon Willison draws the cleanest line in the debate, and it dissolves most of the confusion about why the last 20 percent is so hard. Vibe coding, in his framing, is a non-technical way to prototype: you describe what you want and accept the output without reading it. Agentic engineering is the separate, professional discipline of building production-ready software with AI, where you still own the architecture and the review. The trap is using the first mode to do the second mode's job.

A prototype that works in a demo and a system real users depend on are different problems, and the tools that feel magical for the first feel reckless for the second. Naming which one you are doing is the unlock. The production parts of your app, the ones that touch money, identity, and other people's data, need the engineering posture even if the rest stays vibe-coded.

Steal it

Write down which parts of your app are prototype and which parts are production. The prototype parts can stay vibe-coded. The production parts, the ones that touch money, identity, or other people's data, get treated as engineering: tests, review, and a human who understands every line.

Watch on YouTube · 08:01
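Writing down the prototype-versus-production split can be as mechanical as a tiny lookup, so the rule is applied the same way every time instead of being re-argued per feature. This is a minimal Python sketch, not a tool Willison describes: the tag names and the `classify` function are hypothetical, and it assumes you tag each feature by hand with what it touches.

```python
# Hypothetical tags for the three things the play says push a feature into production.
PRODUCTION_TRIGGERS = {"money", "identity", "other_peoples_data"}


def classify(features: dict[str, set[str]]) -> dict[str, str]:
    """Label each feature 'production' if it touches any trigger, else 'prototype'.

    `features` maps a feature name to the set of hand-assigned tags it touches.
    """
    return {
        # Set intersection is non-empty (truthy) when at least one trigger is present.
        name: "production" if touches & PRODUCTION_TRIGGERS else "prototype"
        for name, touches in features.items()
    }


if __name__ == "__main__":
    app = {
        "checkout": {"money"},
        "login": {"identity"},
        "theme-picker": set(),
    }
    for name, mode in classify(app).items():
        print(f"{name}: {mode}")
```

Everything labeled production is where the tests, review, and a human who understands every line go; the rest can stay vibe-coded.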

Simon Willison · Lenny's Podcast

Treat security as the wall, not a checkbox

Willison's second warning is the one most weekend builders never hear until it is too late: security is where vibe-coded apps fail, and the failure is structural, not a missed checkbox. He describes the lethal trifecta: the combination of untrusted input, access to private data, and the ability to take action on the outside world. An app with all three and no guardrails is a prompt-injection attack waiting to happen.

The deeper risk is cultural. The industry is normalizing unsafe AI development practices, shipping apps that handle real user data while treating security as a step to get to later. For a founder, the takeaway is to move the security question to the front. Before you worry about polish, ask whether your app exposes the trifecta, because that is the part of the last 20 percent that does not just embarrass you; it can sink you.

Steal it

Before you launch, audit the three things that combine into a real breach: untrusted input, access to private data, and the ability to act on the outside world. If your app has all three with no guardrails, fix that first. Security is the wall most vibe-coded apps hit, not a checkbox at the end.

Watch on YouTube · 26:22
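The pre-launch trifecta audit works as a simple checklist: a feature is a blocker only when all three legs are present and nothing stands in the way. This is a minimal Python sketch, assuming you fill in one row per feature by hand; the `Feature` fields and function names are hypothetical and not from Willison or any security tool. It flags the risky combination and does not detect prompt injection itself.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """Hand-filled audit row for one feature (hypothetical schema)."""
    name: str
    untrusted_input: bool     # reads text it did not author: user messages, web pages, emails
    private_data: bool        # can see data that should not leak: user records, keys, other accounts
    external_action: bool     # can act outside the app: send email, call APIs, write to other systems
    guardrails: bool = False  # e.g. human approval before every external action


def has_trifecta(f: Feature) -> bool:
    """True when a feature combines all three legs of the lethal trifecta."""
    return f.untrusted_input and f.private_data and f.external_action


def fix_first(features: list[Feature]) -> list[str]:
    """Names of features with the full trifecta and no guardrails: the pre-launch blockers."""
    return [f.name for f in features if has_trifecta(f) and not f.guardrails]


if __name__ == "__main__":
    audit = [
        Feature("support-bot", untrusted_input=True, private_data=True, external_action=True),
        Feature("landing-page", untrusted_input=True, private_data=False, external_action=False),
        Feature("email-agent", True, True, True, guardrails=True),
    ]
    print(fix_first(audit))  # ['support-bot']
```

A single boolean for guardrails is a deliberate simplification. Deciding whether a guardrail actually holds is the judgment call, and removing one leg of the trifecta entirely is usually the sturdier fix.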

George, Wrestle AI · Starter Story

Hire one dev for the part you can't trust AI with

George built Wrestle AI, an AI wrestling coach that analyzes match footage, to around 17,000 dollars a month, and he is a useful case study because he did not pretend AI could do everything. He used a vibe coding platform and ChatGPT to build most of the app in about a month, leaning on public APIs to move fast. Then, for payment integration, he hired a developer to do that specific part. That is the ship-versus-hire decision made in the wild.

He did not rebuild the whole app and he did not white-knuckle the riskiest integration himself. He vibe-coded the 80 percent that was forgiving and bought a few hours of real engineering for the part where a bug means lost revenue or a broken subscription. Hiring narrowly, just for payments or auth or the one compliance-sensitive feature, is almost always cheaper than the rebuild you do after that part silently breaks in production.

Steal it

Make a list of the parts you cannot afford to get wrong, usually payments, auth, and anything legal. Vibe-code everything else, then pay one developer for a few hours to do just those parts. Hiring narrowly is cheaper than rebuilding after a Stripe webhook silently drops a charge.

Watch on YouTube · 02:15

a16z panel · a16z

Separate vibe coding from enterprise coding

The a16z panel makes a distinction that tells you which posture to take at the wall. Vibe coding is the less formal mode, focused on the output and moving fast. Classic, enterprise-style coding involves more choices and pays attention to implementation details. Their read is that even enterprise users will vibe-code for the right tasks, which means the answer is not "vibe coding good, enterprise coding bad." It is matching the mode to the stakes.

The last 20 percent usually stalls because a founder built the whole thing in output-first mode and then tried to harden it without ever switching gears. The fix is to decide, feature by feature, which game you are playing. Validation and internal tools can stay loose. The parts real customers depend on get the implementation-detail attention that the enterprise mode is built for.

Steal it

Decide which game you are playing before you commit to a tool. If you are validating an idea, vibe-code for output and move fast. If you are shipping something people depend on, switch to the mode that pays attention to implementation detail. Trying to do both in one undifferentiated blob is why the last 20 percent never closes.

Watch on YouTube · 10:10

Steven Sinofsky · a16z

Mind the gap between the concept and the execution

Steven Sinofsky, who ran Windows at Microsoft, offers the most grounding frame for managing your own expectations. He separates vibe writing, which he sees as a relatively mature use of AI with real autonomy, from vibe coding, which he says is still constrained. There is a significant gap between what the concept promises and what the tools actually execute, which produces errors and demands careful human oversight. In plain terms, the tool that looked like it nearly finished your app in the demo will, on the parts that matter, give you a lot of errors and sometimes just not do the thing you asked at all.

That is not a sign you prompted wrong. It is the current state of the medium. The founders who finish are the ones who plan for that gap, verifying every output that touches the core experience rather than trusting the happy-path demo and discovering the holes in production.

Steal it

Lower your expectation of what the tool finishes on its own, and raise your own oversight to match. Expect errors on anything novel, and verify every output that matters instead of trusting the demo. The gap between what the prompt promised and what shipped is the work, so plan for it rather than being surprised by it.

Watch on YouTube · 01:30

YC founder survey · Y Combinator

Compete on taste and debugging, not on speed

The Y Combinator founder survey is the on-the-record version of the meme that AI is not very good at debugging, and debugging is most of the job. The survey found founders writing less code and spending more time thinking and reviewing, with human taste becoming more important, not less. AI generates code well, but debugging remains the hard part, sometimes coming down to giving very explicit instructions or just rolling the dice. The skills that still decide who ships, especially past the initial zero-to-one phase, are taste, debugging, and system thinking.

This reframes the last 20 percent. It is not a tooling problem you solve by switching to a better model. It is the exact zone where the human skills the tools do not replace become the whole game, so the most valuable thing you can build is your own ability to read the AI's output critically and fix what it got wrong.

Steal it

Spend your scarce hours on the two skills AI does not hand you: taste, knowing what good looks like, and debugging, knowing why the thing broke. Anyone can generate the first draft now. The founder who can read the AI's output critically and fix what it got wrong is the one who actually ships.

Watch on YouTube · 02:54

YC Startup School · Y Combinator

Reset instead of stacking fixes

Y Combinator's Startup School compresses its hard-won tactical advice into one habit that saves stuck projects. When the AI goes down a rabbit hole, the worst move is to keep prompting fixes on top of fixes. Each failed attempt accumulates bad code, and you end up in the doom spiral builders describe, where nothing runs and you cannot tell which patch broke what. The discipline is to use version control religiously and avoid multiple bug-fix attempts without resetting.

Commit every time something works, so that the moment a fix makes things worse you reset to the last good state and start clean with a sharper prompt. This single habit, cheap resets backed by frequent commits, is the difference between a vibe-coded project that gets finished and one that collapses under its own accumulated mess in the final stretch.

Steal it

When a bug survives two fix attempts, do not stack a third. Reset to your last working commit and try a cleaner prompt from there. Commit religiously so resetting is cheap, and you avoid the doom spiral of piling broken patches on top of broken patches until nothing runs.

Watch on YouTube · 02:21
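The two-strikes rule is small enough to write down as code, which makes it harder to talk yourself into "just one more prompt." This is a minimal Python sketch, assuming you only commit once something works, so `HEAD` is always the last good state; `FixLoop`, `git_commands`, and the threshold constant are hypothetical names, not from Startup School.

```python
MAX_FAILED_FIXES = 2  # the play's threshold: reset once a bug survives two fix attempts


class FixLoop:
    """Tracks fix attempts on one bug and says what to do next (hypothetical helper)."""

    def __init__(self) -> None:
        self.failed = 0

    def record(self, works: bool) -> str:
        if works:
            self.failed = 0
            return "commit"  # lock in the working state so the next reset is cheap
        self.failed += 1
        if self.failed >= MAX_FAILED_FIXES:
            self.failed = 0
            return "reset"   # do not stack a third fix; go back to the last good commit
        return "retry"


def git_commands(action: str, message: str = "checkpoint: working") -> list[list[str]]:
    """Git argv lists for each action; assumes failed fixes were never committed."""
    if action == "commit":
        return [["git", "add", "-A"], ["git", "commit", "-m", message]]
    if action == "reset":
        # Discard tracked changes back to HEAD (the last working commit), then delete
        # untracked files the failed attempts created. Ignored files are left alone.
        return [["git", "reset", "--hard", "HEAD"], ["git", "clean", "-fd"]]
    return []


if __name__ == "__main__":
    loop = FixLoop()
    for works in [False, False, True]:
        action = loop.record(works)
        print(action, git_commands(action))
```

The commands are returned rather than run so nothing destructive happens by accident. To execute them, pass each list to `subprocess.run(cmd, check=True)`, and preview what `git clean -fd` would delete with `git clean -n` first. If your AI tool auto-commits its own attempts, reset to the hash of the last working commit instead of `HEAD`.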

Read it for your situation

How to use this playbook

MVP works, stuck at the wall: Start with play 01 (ship the jankiest thing) and play 05 (hire one dev for the risky part). Decide what users will tolerate, then buy a few hours of real engineering only for the part you cannot afford to break. That is the ship-versus-hire call, made cheaply.
About to take it to real users: Bring play 03 (the vibe-versus-engineering line) and play 04 (the security trifecta). Before launch, mark which features are prototype and which are production, and fix the security wall first. The parts touching money, identity, and other people's data get the engineering posture.
Caught in the debugging doom spiral: Start with play 09 (reset, don't stack fixes) and play 08 (compete on taste and debugging). Commit every time something works so resets are cheap, and put your scarce hours into reading the AI's output critically instead of prompting blindly.

Gavel's chat sits on top of all nine. Tell it what you built, where it broke, and what you have already tried, and it points you at the play that fits your situation, with the same timestamped citations you just read. It will not write your code. It will tell you whether to ship, rebuild, or hire, and show you where these operators disagree.

Common founder questions

Frequently asked

Is vibe coding production ready?: For prototyping, yes. For production, not on its own. Simon Willison draws the line: vibe coding is a non-technical way to prototype, while shipping reliable software is agentic engineering, a separate discipline. Steven Sinofsky calls the current state a significant gap between what the concept promises and what the tools actually execute. The fix is judgment, not another prompt.
Why does my vibe-coded app break in production?: Because the demo and production are different problems. The first 80 percent (the happy path) is what AI tools are best at. The last 20 percent (payments, auth, security, and edge cases under real load) is where they break, because there is less training data and the cost of a wrong line is high. As the Y Combinator survey puts it, AI generates code well but debugging stays hard.
Should I ship, rebuild, or hire a developer for my vibe-coded MVP?: Ship if the janky version still delivers value and users tolerate it, which is Garry Tan's test. Hire a developer for the one or two parts you cannot afford to get wrong, the way George at Wrestle AI brought in a dev just for payment integration. Rebuild only when the architecture, not the polish, is the blocker.
Where does AI coding actually fail?: On security, novel problems, and anything needing system-level judgment. Simon Willison points to the lethal trifecta of prompt-injection risks the industry is normalizing. The a16z panel separates vibe coding, focused on output, from enterprise coding, which pays attention to implementation detail. AI raises the floor of what is buildable; it does not yet raise the ceiling on reliability.
Do I need to learn to code to finish a vibe-coded app?: Not necessarily. Lazar at Lovable argues a non-technical background can even be an advantage. But the Y Combinator survey is blunt that taste, debugging skill, and system thinking still decide who ships reliably. You do not need to write every line. You need to know which lines you cannot trust the model with, and decide whether to learn them, route around them, or hire them out.
How do you take a vibe-coded app to production?: In order: mark which features are prototype and which are production, then secure anything touching money, identity, or other people's data first. Hire one developer narrowly for the parts you cannot trust the model with, like payments and auth. Debug with discipline by committing often and resetting instead of stacking fixes. Then ship the jankiest version that still works.

Stuck at the last 20%?
Get the ship-rebuild-hire call for your app.

Gavel won't write your code. Describe what you built and where it broke, and it tells you whether to ship the janky version, rebuild, or hire, citing the same operators you just read.

Free with Google. 20 credits/month forever. Pick a plan in 30 seconds after sign-in.

The frameworks underneath

Go deeper on the shipping frameworks

Garry Tan

Launch-Jankiest Rule

Ship the jankiest MVP that still provides value. If users tolerate it, the problem is real. The fastest way to finish is to lower the bar and launch.

Read the framework

Paul Graham

Do Things That Don't Scale

Startups don't take off by themselves. Recruit the first users by hand and do the unscalable finishing work that delights them.

Read the framework

One new playbook every Monday morning.
