Vibe Coding

Updated March 6, 2026

The Complete Guide to AI-Native Software Development

22 chapters. 200+ prompts. Updated monthly. The only vibe coding resource that evolves as fast as the field.

In-depth chapters

Production-ready prompts

Security CVEs analyzed

Tools compared

📅 Updated March 2026 📈 Monthly updates for subscribers 🎓 Part of the EndOfCoding ecosystem

of developers using AI tools

$0B

Claude Code annual revenue

GitHub Copilot paid users

$0B

AI coding tools market (2026)

Choose Your Plan

The vibe coding landscape changes every week. Your subscription keeps you current.

Free Preview

✓ First 3 chapters
✓ 10 sample prompts
✓ 2 video tutorials
✓ Interactive quiz

↓ Start Reading Below

Frequently Asked Questions

Everything you need to know before you start.

What exactly is vibe coding? ▼

A term coined by Andrej Karpathy in February 2025 for a new development style where you describe what you want in natural language, and AI tools generate the code. It ranges from AI-assisted autocomplete to fully autonomous AI agents building entire applications. This ebook covers all five levels in depth with real data, case studies, and 200+ production-ready prompts.

Who is this ebook for? ▼

Developers exploring AI tools, engineering managers evaluating team adoption, entrepreneurs building products with AI, and anyone curious about the future of software development. Whether you use Cursor, Claude Code, GitHub Copilot, Bolt.new, or v0, this guide covers your tools and workflow.

How is the subscription different from a one-time purchase? ▼

The vibe coding landscape changes weekly — new tools launch, security incidents emerge, pricing shifts. Your subscription includes monthly updates to all 22 chapters, new entries in the prompt library and tool comparison matrix, a fresh monthly intelligence brief, and new community showcase features. You always have the most current resource in a fast-moving field.

What do I get in the free preview? ▼

The first 3 chapters are completely free: the origin story of vibe coding, a precise definition and framework, and the underlying philosophy. You also get the interactive quiz to find your vibe coding level, 10 sample prompts, and a glimpse of every chapter topic. No credit card required.

Can I cancel anytime? ▼

Yes. Monthly and annual subscriptions can be cancelled at any time through your Lemon Squeezy billing portal. You keep access until the end of your current billing period. No questions asked, no hidden fees.

Get a free chapter + weekly vibe coding insights

Join the mailing list for a bonus chapter on AI tool selection, plus weekly curated updates on the vibe coding landscape.

No spam. Unsubscribe anytime. Part of the EndOfCoding ecosystem.

📖

How to read this ebook: Use the sidebar to navigate 22 chapters. Click expandable sections for deep dives. Take the interactive quiz to find your vibe coding level. Use Ctrl+K to search across all content. Chapters 1–3 are free — subscribe to unlock all 22.

01. The Moment Everything Changed

Updated May 27, 2026

On February 2, 2025, Andrej Karpathy — former OpenAI co-founder, former Tesla AI director, and one of the most respected voices in machine learning — posted what would become one of the most consequential tweets in software development history:

"There's a new kind of coding I call 'vibe coding', where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. I just see stuff, say stuff, run stuff, and copy-paste stuff, and it mostly works." — Andrej Karpathy, February 2, 2025

Within weeks, the term had gone viral. Within a month, Merriam-Webster added "vibe coding" as a slang and trending term. By December 2025, Collins English Dictionary named it their Word of the Year.

But vibe coding didn't just enter the dictionary. It entered the economy. It entered boardrooms. It entered the workflows of millions of developers. And it sparked one of the fiercest debates the software industry has seen in decades.

The Timeline

February 2025

Karpathy coins "vibe coding"

The tweet goes viral. Merriam-Webster adds it within weeks. Developers worldwide start experimenting.

March 2025

Y Combinator reveals the data

25% of YC Winter 2025 startups report codebases that are 95% AI-generated.

May 2025

Claude Code launches publicly

Anthropic's terminal-based coding agent goes GA. It will reach $1B ARR in 6 months.

May 2025

Lovable security vulnerability

170 of 1,645 apps built on the vibe coding platform found to expose personal data.

June 2025

Devin hits $73M ARR

Cognition's AI software engineer grows 73x in 9 months. Goldman Sachs adopts it.

July 2025

Wall Street Journal reports mainstream adoption

Professional software engineers are using vibe coding for commercial products.

August 2025

Google Jules exits beta

Google's async coding agent goes public. 2.28M visits, 140K+ code updates.

September 2025

The "Vibe Coding Hangover"

Fast Company reports senior engineers entering "development hell" with AI-generated codebases.

November 2025

Claude Code hits $1B ARR

One of the fastest-growing enterprise software products in history.

December 2025

Collins Word of the Year

"Vibe coding" is named Collins English Dictionary Word of the Year 2025.

December 2025

Tenzai security study

69 vulnerabilities found across 15 applications built by 5 major AI coding tools.

January 2026

"Vibe Coding Kills Open Source" paper

Researchers publish arXiv paper arguing vibe coding threatens the open-source ecosystem by reducing user engagement with maintainers. Tailwind CSS docs traffic down 40% from 2023.

January 2026

Cognition reaches $10.2B valuation

Cognition raises $400M Series C. Devin ARR passes $155M. Goldman Sachs, Citi, Dell, Cisco, Palantir among enterprise clients.

January 2026

GitHub Copilot reaches 4.7M paid users

Agent mode becomes default workflow for complex tasks. MCP support rolls out to all VS Code users.

February 2026

Claude Opus 4.6 launches with Agent Teams

Anthropic releases Opus 4.6 with agent teams in Claude Code — multiple AI agents working in parallel on different aspects of a project, coordinating autonomously.

March 2026

The Open Source Reckoning & Enterprise Adoption

Researchers warn vibe coding erodes open-source funding. Pega becomes first enterprise platform to brand its AI features as "vibe coding." Cursor 2.5 launches subagent architecture. GitHub Copilot opens multi-model access. Devin 2.2 achieves 67% PR merge rate.

April 2026

Claude Opus 4.7 & GPT-5.5 — The Spring Model Wave

Anthropic's Opus 4.7 introduces task budgets for agentic loops. OpenAI ships GPT-5.5 and GPT-5.5 Instant. Both models drive dramatic improvements in multi-step software engineering tasks. Gartner projects 40% of enterprise apps will include task-specific AI agents by end of 2026.

May 2026

Stack Overflow: 83% of Developers Use AI Daily

Stack Overflow's 2026 Developer Survey (67,000+ respondents) shows AI coding tool daily adoption at 83%, up from 44% in 2024. Claude Code leads fastest growth. 61% believe AI will generate the majority of new feature code within 3 years. The vibe coding era enters mainstream professional practice.

May 2026

GitHub Copilot Workspace Goes GA

GitHub releases Copilot Workspace as generally available — takes a GitHub issue and autonomously implements the full feature across multiple files, runs tests, and opens a PR. 4 million implementation requests processed during beta. Autonomous feature development is now mainstream.

May 2026

The Man Who Named the Era Joins Anthropic

Andrej Karpathy — the researcher whose February 2025 tweet coined "vibe coding" — joins Anthropic's pre-training team full-time. His mandate: build a team that uses Claude to accelerate the training runs that produce Claude. The person who named the era is now helping build the tool at its center. UPDATE: This entry was added May 27, 2026 — see the full analysis in EndOfCoding.

May 2026

OpenAI Files for $1 Trillion IPO — While Losing Money

OpenAI confidentially files an S-1 with the SEC targeting a $1T valuation. Revenue: ~$11.6B ARR growing 3x year-over-year. Gross margin: near-zero — compute costs consume revenue faster than pricing can compensate. The filing is a statement: AI is infrastructure, and OpenAI intends to be the operating system on top of it. The below-cost API era has a clock on it. UPDATE: Added May 27, 2026.

Next: What Vibe Coding Actually Is →

02. What Vibe Coding Actually Is

Updated June 11, 2026

Strip away the hype, and vibe coding is a specific practice with specific characteristics.

Vibe coding is an AI-assisted software development approach where a developer describes what they want in natural language, an AI model generates the code, and the developer evaluates the result through execution rather than code review. The developer does not read, edit, or attempt to understand the generated code. They test whether it works, and if it doesn't, they feed the error back to the AI.

💡

**Key distinction:** In traditional AI-assisted development, the developer remains the author and the AI accelerates. In vibe coding, the AI is the author and the developer is the director.

</div>

Karpathy described his own workflow precisely:

"I 'Accept All' always, I don't read the diffs anymore. When I get error messages I just copy paste them in with no comment, usually that fixes it. If it doesn't, I just revert to the last working state and re-prompt with more context."

The Three Core Loops

Vibe coding operates on three nested feedback loops:

Loop 1: Generate and Test

▼

**1.** Describe what you want in natural language

  **2.** Accept the generated code without reading it

  **3.** Run it

  **4.** Does it work? Ship it. Doesn't work? Move to Loop 2.

  This is the happy path. For simple features, you may never leave this loop.

</div>

Loop 2: Error-Driven Repair

▼

**1.** Copy-paste the error message to the AI (no commentary needed)

  **2.** Accept the fix without reading it

  **3.** Run it again

  **4.** Repeat until resolved or move to Loop 3.

  Most errors resolve within 1-3 iterations of this loop. The AI sees the error, understands the context, and fixes it.

</div>

Loop 3: Revert and Rephrase

▼

**1.** Revert to the last working state

  **2.** Describe the desired outcome differently, with more context

  **3.** Return to Loop 1

  This is the escape hatch. If the AI gets stuck in a loop of broken fixes, go back to a clean state and try a different approach. This is why checkpoints matter — always have a rollback point.

</div>

What Vibe Coding Is NOT

Not using GitHub Copilot for autocomplete — that's AI-augmented coding (Level 1)
Not asking ChatGPT to explain code — that's using AI as a learning tool
Not reviewing AI-generated code before accepting — that's AI-collaborative coding (Level 2)
Not no-code/low-code platforms — those use visual builders, not natural language to code

Vibe coding is specifically: natural language in, code out, test behavior, never read the code.

The Definition Split: What "Vibe Coding" Means in 2026

Here's the complication you need to navigate, because the rest of this book — and every conversation you'll have about it — depends on it. Since Karpathy coined the term in February 2025, "vibe coding" has split into three meanings, and people routinely talk past each other by using different ones.

1

The Strict Sense (Karpathy's original)

Never read the code. Accept all. Test behavior only.

▼

The precise definition above — the AI is the author, you are the director, and you genuinely don't read the diffs. This is Level 4 on Chapter 4's spectrum. It's real, it's useful for the right scope, and by 2026 it is the **minority** of what professionals actually do.

2

The Popular Sense (broadened)

"I built it by mostly telling the AI what to do."

▼

How the word is used in the wild by 2026: any AI-heavy, natural-language-first workflow where the human spends more time describing than typing — even if they *do* skim the diffs and review the risky parts. This is the sense Collins reached for when it named "vibe coding" Word of the Year, and the sense most people mean casually. It spans Levels 3–5.

3

Agentic Engineering (the professional discipline)

Orchestrate agents with structure, verification, and review.

▼

The term Karpathy himself moved to in 2026 (and the subject of his "Software 3.0" framework in Chapter 6). Natural-language-driven, but with explicit checkpoints: structured specs, automated tests as acceptance gates, human review proportional to stakes. This is what the 83% of professionals using AI daily mostly do — and it is *not* "vibe coding" in the strict sense, even though it grew directly out of it.

💡

**Which one does this book mean?** All three — but it tells you which, where it matters. When precision counts (Chapters 10, 12, 19), "vibe coding" means the **strict sense**, because that's where the security and maintenance risks concentrate. When discussing the movement, the market, and the culture (Chapters 1, 8, 15), it means the **popular sense**. And the practices this book ultimately recommends for anything beyond a prototype are **agentic engineering** — vibe code the 80%, engineer the 20%, with the dial set per task (Chapter 4). Keeping these straight is the difference between "vibe coding is dangerous" and "vibe coding is the future" — two claims that are both true, about two different definitions.

The terminology keeps evolving — EndOfCoding tracks how practitioners and vendors use these words, and the foundations course at Vibe Coding Academy drills the distinctions with hands-on examples.

← Previous Next: The Philosophy →

03. The Philosophy: Trusting the Machine

Updated June 11, 2026

Vibe coding isn't just a technique. It's a philosophical stance about the relationship between developers and code.

The End of Code as Sacred Text

For decades, programming culture has treated source code as something to be crafted, reviewed, optimized, and understood. Code reviews are rituals. Clean code is a moral virtue. Understanding every line is a professional obligation.

Vibe coding rejects this entirely. It treats code as a disposable intermediary between human intent and running software. The code doesn't matter. The behavior matters.

This is not as radical as it sounds. Most software professionals already interact with layers of abstraction they don't fully understand:

Few web developers read TCP packet internals
Few application developers audit their compiler output
Few React developers understand the fiber reconciliation algorithm
Few SQL users trace query execution plans for every query

Vibe coding simply adds another layer: the AI becomes the compiler for natural language.

The Four Pillars

🎯

Intent Over Implementation

"What should this do?" replaces "How should I build this?"

⚡

Speed Over Elegance

Working software now beats perfect code later

🤖

Trust the AI

Accept all, don't read diffs, let the machine handle it

📈

Results-Oriented

Does it work? That's the only metric that matters

The Abstraction Argument

Supporters frame vibe coding as the natural progression of programming abstraction:

1950s

Machine Code → Assembly

"You don't need to write binary opcodes anymore!"

1970s

Assembly → C

"You don't need to manage registers anymore!"

1990s

C → Python / Java

"You don't need to manage memory anymore!"

2010s

Frameworks / Cloud

"You don't need to manage servers anymore!"

2025

Natural Language → Code

"You don't need to write code anymore!"

At each transition, purists warned that developers were losing essential skills. At each transition, the expanded abstraction enabled more people to build more things.

⚠️

**The counter-argument is real, though:** Every previous abstraction still had deterministic behavior. Assembly always compiles the same way. C always allocates memory the same way. AI code generation is probabilistic — the same prompt can produce different code each time, with different bugs. This is a genuinely new kind of abstraction layer.

The industry's answer to the probabilistic problem didn't come from better models alone — it came from wrapping the probabilistic layer in deterministic checks: test suites the agent must pass, plan approvals before execution, verification loops that demand demonstrated output (Chapter 13 turns these into daily patterns). You can't make the compiler deterministic, but you can make its acceptance criteria deterministic. That insight is the bridge between the original philosophy and how professionals actually practice it now.

The Philosophy, Revised: What 16 Months Did to the Pillars

Karpathy's original stance was a provocation that worked — it named something real and started the era. But philosophies meet practice, and by mid-2026, with 83% of developers using these tools daily, each pillar has been refined by experience:
- Intent Over Implementation — survived intact. This is the pillar that proved most durable; if anything, mid-2026 strengthened it. Precise specification is now the single most valuable skill in the workflow (Chapter 13), and the entire agent-fleet pattern (Chapter 7) is intent-over-implementation at scale: one human's intent, many machines' implementations.
Speed Over Elegance — survived, with an asterisk. Working software now still beats perfect code later — for the right scope. The "vibe coding hangover" taught the asterisk: speed borrowed against maintenance must eventually be repaid. The mature version is Chapter 14's phased workflow: speed first, elegance scheduled.
Trust the AI — revised the most. Blind trust ("accept all, never read") gave way to calibrated trust: autonomy granted in proportion to blast radius (Chapter 12's framework). The 45% OWASP vulnerability rate and the trust-boundary attacks of May 2026 (Chapter 19) settled the argument empirically — trust without calibration isn't a philosophy, it's an attack surface. Notably, even Karpathy moved: by 2026 he was advocating "agentic engineering" — orchestration with oversight — over pure vibes.
Results-Oriented — survived, with a deeper definition of "results." "Does it work?" matured into "does it work, is it secure, and can we afford to run and maintain it?" Behavior remains the metric; the 2026 revision is that behavior includes the behaviors you can't see in a demo — under attack, under load, under a budget.

💡

**The philosophy that won:** code is still a disposable intermediary — but *judgment is not*. The developer's relationship to code loosened exactly as predicted; the developer's relationship to outcomes tightened in compensation. What vibe coding ultimately deprecated wasn't understanding — it was *typing*.

How the philosophy translates into day-to-day judgment calls is Chapter 12; the long-form essays tracking how practitioner thinking evolves are at EndOfCoding, with the guided version at Vibe Coding Academy.

← Previous Next: Five Levels →

04. The Spectrum: Five Levels of AI-Assisted Development

Updated June 11, 2026

Vibe coding is not binary. In practice, developers operate along a spectrum. Understanding where you sit — and where you should sit for a given project — is critical.

Level 0: Traditional Development

No AI at all

▼

You write every line. You understand every line. No AI assistance of any kind. Increasingly rare but still essential for certain domains like embedded systems, cryptography, and kernel development.

  **When to use:** Security-critical code, regulatory requirements, environments where AI tools are prohibited.

</div>

Level 1: AI-Augmented Coding

You are the author. The AI is a fast typist.

▼

You use AI for autocomplete, documentation lookup, and boilerplate generation, but you review and understand every line. Think: GitHub Copilot suggestions that you accept or reject with full awareness.

  **Tools:** GitHub Copilot, VS Code AI extensions

  **Code understanding:** 100% — you review everything

  **When to use:** Production code, team projects, anything you need to maintain

</div>

Level 2: AI-Collaborative Coding

You are the architect. The AI is the builder.

▼

You describe features in natural language and get back substantial code blocks. You review the code, understand the approach, and make modifications. You might use Cursor's Composer or Claude Code for generating components, but you read the diffs.

  **Tools:** Cursor Composer, Claude Code, Codex CLI

  **Code understanding:** 70-90% — you review most things

  **When to use:** Professional development, startup codebases, any code that needs to scale

</div>

Level 3: Guided Vibe Coding

You are the product manager. The AI is the engineering team.

▼

You describe what you want and accept most code without deep review, but you maintain a general understanding of the architecture. You spot-check security-sensitive sections. You understand the overall structure even if you don't read every function.

  **Tools:** Cursor Agent, Claude Code, Bolt.new

  **Code understanding:** 30-60% — architecture yes, implementation details no

  **When to use:** MVPs, internal tools, prototypes headed toward production

</div>

Level 4: Pure Vibe Coding

You are the client. The AI is the agency.

▼

Karpathy's original vision. You describe, accept all, test, paste errors, repeat. You don't read diffs. You don't understand the code. You only care if it works.

  **Tools:** Bolt.new, Lovable, Replit Agent, v0

  **Code understanding:** 0-10% — you only test behavior

  **When to use:** Personal projects, throwaway prototypes, hackathons, idea validation

</div>

Level 5: Autonomous Agent Coding

You are the executive. The AI is the employee.

▼

You don't even supervise in real-time. You assign tasks to AI agents that clone repos, create branches, write code, run tests, and open pull requests — all while you do something else. You review the final result.

  **Tools:** Devin, Google Jules (GA, free tier), GitHub Copilot Workspace (GA), OpenAI Codex Goals mode, Claude Code Dynamic Workflows

  **Code understanding:** Review-based — you check the output, not the process

  **When to use:** Routine tasks, migrations, test generation, documentation, with human review gate

  *2026 note:* this level went mainstream — Jules and Copilot Workspace both reached general availability, Devin's autonomous PR merge rate hit 78%, and running *several* Level 5 agents in parallel (the Agent Fleet workflow in Chapter 7) is now the standard pattern for large tasks. The human review gate stayed; everything else scaled.

</div>

📈

**Where do most developers operate?** In 2026, most professional developers work between Levels 1 and 3. Pure Level 4 is most common among non-technical founders, hobbyists, and rapid prototypers. Level 5 is emerging fast in enterprise environments. Notably, Karpathy himself has evolved from "vibe coding" to advocating **"agentic engineering"** — professionals orchestrating AI agents with oversight, not just vibes.

</div>

The 2026 Refinement: It's Per-Task, Not Per-Developer

When this framework was first drawn, the natural question was "which level are you?" — as if the level were an identity. Practice corrected that. The skilled 2026 practitioner moves across the spectrum several times a day: Level 5 for the routine migration running in the background, Level 3 for the morning's feature work, Level 2 for the API integration, Level 1 — or 0 — for the auth change.

The level is a dial you set per task, and the setting follows directly from the stakes: blast radius, reversibility, data sensitivity, longevity, team dependency. Chapter 12 turns that into a working decision framework, and Chapter 7's five workflows show the dial set correctly for five real scenarios. Two corollaries worth naming:

Mismatched levels, not high levels, cause the damage. Level 4 is not "worse" than Level 1 — Level 4 on payment code is. Every incident catalogued in Chapter 10 is a mismatch story.
The dial also sets your budget. Higher autonomy levels consume more tokens — Level 5 fleets dramatically so. Matching the level to the task is simultaneously a quality decision and a cost decision (Chapter 13's token-efficiency habits).

### Which level are you?
Take the interactive quiz at the end of this ebook to find out.

← Previous Next: The Tools →

05. The Tools: A Complete Landscape (2025–2026)

Updated June 8, 2026

The tooling ecosystem for AI-assisted development has exploded. The market is consolidating fast — with Cursor seeking a ~$50B valuation at $2B+ ARR, Lovable at $6.6B, Cognition at $10.2B, and billion-dollar acquisition battles playing out in real time. Anthropic's acquisition of Bun (the fast JavaScript runtime) signals Claude Code's push into native runtime integration. Here's the current state of play across every major category.

The defining structural shift of mid-2026 is that no single tool is "winning." Cursor, Claude Code, and Codex have converged on one agentic-coding blueprint and are settling into a composable stack — an orchestration layer, an execution layer, and a review layer that teams mix and match rather than a monolith they standardize on — and xAI's Grok Build has now entered the same fight on price and developer habits (The New Stack — the AI coding stack nobody planned). Read the cards below as components of that stack, not contestants for a single throne.

AI-Native IDEs

Cursor

Anysphere

The IDE Karpathy originally referenced. Built on VS Code with deep AI integration. Cursor 3 (April 2, 2026) is a ground-up redesign centered on agent orchestration: the new Agents Window replaces the Composer pane with a full-screen workspace for running multiple AI agents simultaneously in side-by-side, grid, or stacked layouts. Design Mode lets you click any element in a browser preview and direct agents to modify that exact component visually. Cloud-to-local handoff for agent sessions. Automations triggered by external services. Faster large-file diff rendering, less memory-heavy. The Await tool lets agents pause for background shell commands and subagents. MCP Apps now support structured content. Composer 2 (March 19, 2026): Cursor shipped Composer 2, built on Moonshot AI's Kimi K2.5 with extensive RL fine-tuning. Scores 61.3 on CursorBench — a 37% improvement over Composer 1 — and 73.7 on SWE-bench Multilingual. Priced at $0.50/M input tokens, making it highly cost-competitive for daily coding tasks. Community consensus: best performance-per-dollar for in-editor code generation as of Q1 2026. Previously (March 2026 pre-Composer 2): always-on Automations, JetBrains support via Agent Client Protocol, team plugin marketplaces. Cursor 3.3 (May 7, 2026): new PR Review experience (Reviews, Commits, and Changes tabs with inline review threads, top-level PR comments, reviewer status, and quick-action pills to merge, comment, or request changes inline), and Build in Parallel — identifies independent steps in a plan and runs them simultaneously via async subagents while keeping dependent steps ordered. A built-in quick action splits multitasking changes into separate PRs using chat context to identify logical slices, defaulting to independent PRs unless dependencies require otherwise, with a backup snapshot before the split. Cloud agent dev environments (May 11): dedicated cloud envs for long-running background agents. Cursor in Microsoft Teams (mid-May) and Cursor in Jira (May 19, 2026) — assign Jira issues directly to a Cursor agent, with PR links and status flowing back into the issue. Composer 2.5 (May 18, 2026): 79.8% on SWE-Bench Multilingual (Opus 4.7: 80.5% — essentially tied) and 63.2% on CursorBench v3.1 at default settings (vs Opus 4.7's 61.6%); GPT-5.5 still leads Terminal-Bench 2.0 by 13 points. Pricing is the headline: standard tier $0.50/M input, $2.50/M output — ~10× cheaper per token than Opus 4.7 for comparable agentic coding output. Fast tier $3.00/$15.00 per M tokens. Built on Moonshot's Kimi K2.5 base with 85% of compute spent on Cursor's RL post-training pipeline (25× more synthetic coding tasks than predecessor). For daily in-editor work and long-horizon agent loops, Composer 2.5 is the new default for cost-conscious teams; reserve Opus 4.7/GPT-5.5 for the hardest tasks.

$2B+ ARR • ~$50B valuation (fundraising) • SpaceX $60B option • Composer 2.5 (79.8% SWE-Bench Multi) • PR Review • Jira + MS Teams

IDEAgentMCPAutomationsJetBrainsDesign ModeComposer 2.5PR ReviewParallel Build

Windsurf

Cognition (via complex acquisition)

AI IDE with persistent "memories" for long-term context. Subject of a dramatic $3B acquisition saga: OpenAI's bid collapsed after Microsoft blocked it, Google hired the CEO and key researchers in a $2.4B deal, and Cognition acquired the remaining product, brand, and IP. Now supports Gemini 3.1 Pro. Ranked #1 in LogRocket AI Dev Tool Power Rankings (Feb 2026). Combined Cognition entity (Devin + Windsurf) raised $500M at ~$10B valuation with $82M+ ARR. Windsurf 2.0 (April 15, 2026) is Cognition's first major integrated product since the acquisition. The release adds an Agent Command Center — a Kanban board surfacing every running session (local Cascade and cloud Devin alike) grouped by status — and Spaces, a new unit that bundles agent sessions, pull requests, files, and project context around a single task. Sessions started inside a Space inherit that context automatically, eliminating re-explanation. Devin is now bundled into Windsurf's Pro, Max, and Teams plans (enterprise gated behind a separate Cognition Platform purchase). New GitHub connections receive up to $50 in extra usage credits. Devin PR review happens inside Windsurf — diff inspection, test execution, and hand-off to a local agent for touch-ups all in one place. Cognition is reportedly closing a $25B funding round on the back of Windsurf 2.0 + Devin combined ARR. June 2, 2026 — Windsurf rebrands to Devin Desktop: Cognition folded the Windsurf brand into Devin, renaming the desktop IDE Devin Desktop to unify its agent lineup (cloud Devin + local IDE) under a single name — the clearest sign yet that Cognition is consolidating everything it acquired in the 2025 Windsurf saga behind the Devin brand (The New Stack).

Windsurf 2.0 • Agent Command Center • Devin bundled (Pro/Max/Teams) • ~$25B raise reportedly closing

IDEMemoryCognitionDevin BundledAgent Command CenterSpaces

VS Code + Extensions

Microsoft

The original. Still viable with GitHub Copilot, Continue, and Cline extensions. Best for developers who want AI assistance without switching editors.

IDEExtensions

Autonomous Coding Agents

Claude Code

Anthropic

Terminal-based coding agent. Reads and modifies code across entire repositories. Powered by Claude Opus 4.7 (released April 16, 2026 — 87.6% SWE-bench Verified, 94.2% GPQA, new ‘xhigh’ effort level, 3.3x higher-resolution vision, self-verification on agentic tasks, same price as 4.6). With agent teams — multiple AI agents working in parallel. March 2026: voice mode (/voice push-to-talk), STT in 20 languages, MCP management via /mcp dialog, Claude API skill for building on Anthropic's platform. Computer-use capabilities let Claude operate your Mac autonomously. Companion product Claude Cowork works directly with local files. Late March 2026 (v2.1.63–2.1.76): /loop command adds cron-like scheduled tasks — turning Claude Code into a background worker for PR reviews, deployment monitoring, and recurring analysis. 1-million-token context window. Max output increased to 64k tokens for Opus 4.6 (128k upper bound for Opus 4.6 and Sonnet 4.6). MCP servers can now request structured input mid-task via interactive dialogs. Skills.md enables persistent agent behaviors. Early April 2026: Anthropic acquires Bun (the fast JavaScript runtime built by Jarred Sumner) — bringing native Bun integration and faster JS execution directly into Claude Code workflows. Claude overtook ChatGPT as the #1 AI app on the App Store. Revenue surpassed $2.5B ARR (named world's most disruptive company, Time March 2026). In a Mozilla partnership, Claude Opus 4.6 autonomously found 22 CVEs in Firefox's C++ codebase. April 4, 2026 — OpenClaw Policy Change: Anthropic announced that Claude Code subscription limits no longer apply to third-party harnesses such as OpenClaw. Users of third-party Claude Code integrations must move to pay-as-you-go billing; a $200/mo Max subscription was reportedly being used to run $1,000–$5,000 of agent compute. Affected users received a one-time credit. Additional April updates: PowerShell tool for Windows (opt-in preview), flicker-free alt-screen rendering, named subagents in @ mentions, 60% faster Write tool diff computation. Note: Pentagon labeled Anthropic a supply-chain risk in March 2026 over weapons/surveillance policy; defense tech contractors migrating away. April 14, 2026 — Routines Launch: Anthropic launched Routines — saved configurations combining a prompt, repositories, and connectors that run automatically on a schedule or GitHub events on Anthropic's cloud infrastructure (no local machine required). Use cases: automated PR reviews, overnight test triage, weekly repo health audits. Plan limits: 5/day Pro, 15/day Teams, 25/day Enterprise. Desktop app redesigned simultaneously with integrated terminal, faster diff viewer, in-app file editor, and multi-session support. May 6, 2026 — 5-Hour Limit Doubled: Anthropic doubled the 5-hour usage windows for Pro, Max, Team, and Enterprise plans, and removed peak-hour throttling on Pro and Max — attributed publicly to the SpaceX/Colossus 1 compute deal expanding Anthropic's serving capacity. Effective immediately for all paid tiers; no price change. Practical impact: longer continuous sessions before hitting limit walls, and Claude Code becomes usable during peak hours (previously the most painful part of the Max experience). May 28, 2026 — Claude Opus 4.8 & Dynamic Workflows: Anthropic released Opus 4.8 alongside a new platform capability called Dynamic Workflows — the ability to spawn and coordinate up to 1,000 concurrent subagents within a single Claude Code session. Where previous agent teams ran a small number of parallel agents on a fixed topology, Dynamic Workflows treats subagents as a compute resource to be allocated on demand: one session can fan out to hundreds of specialized workers (linters, test runners, API validators, documentation writers) and aggregate their results before the next step. Orchestration logic runs entirely inside Claude Code; no external queue infrastructure required. Alongside Opus 4.8, Anthropic launched a Cheaper Fast Mode pricing tier — a lower-latency, reduced-cost path designed for the high-volume tool calls that large subagent fleets generate, making 500–1,000-subagent runs economically viable for production teams. Context: the Opus 4.8 release coincides with Anthropic closing a $65B Series H at a $965B valuation (May 28, 2026) with revenue now above $30B ARR. In the same week, MIT Technology Review reported that 4% of all public GitHub commits are now authored by Claude Code. Also notable: Andrej Karpathy — who coined the term “vibe coding” in February 2025 and was most recently at OpenAI — joined Anthropic on May 19, 2026 to work on pretraining research and Claude-accelerated scientific discovery.

$30B+ ARR • $965B valuation (Series H) • Opus 4.8 + Dynamic Workflows • 1,000 concurrent subagents • Cheaper Fast Mode • 4% of GitHub public commits (MIT Tech Review) • Karpathy joins May 19 • Routines (Cloud) • #1 App Store • 1M Token Context

CLIAgentDynamic WorkflowsParallel SubagentsAgent TeamsRoutinesCloud AutomationComputer UseVoiceEnterpriseOpus 4.8

Devin

Cognition Labs

Positioned as an "AI software engineer." Full agent-native IDE with parallel task execution, interactive planning, Devin Wiki, and Devin Search. Goldman Sachs, Citi, Dell, Cisco, Palantir among enterprise clients. $10.2B valuation after $400M Series C.

$155M+ ARR • 10x migration speed

AgentAsyncEnterprise

OpenAI Codex CLI

OpenAI

Open-source terminal agent built in Rust. Sandboxed execution, code review, MCP integration, session resume, and CI/CD automation. April 24, 2026: Codex picked up GPT-5.5 as default reasoning model — 82.7% Terminal-Bench 2.0, 58.6% SWE-Bench Pro, 60% drop in hallucinations vs GPT-5.4. Native computer-use, 1M token context, Standard/Thinking/Pro variants. ChatGPT for Excel/Sheets integration signals enterprise push. May 21, 2026 — Codex Broad Release: Goals mode enabled by default, backed by dedicated storage and tracking progress across active turns — goal mode is no longer experimental, available in the Codex app, IDE extension, and CLI; you can have Codex drive toward a specific objective for hours or even days. Permission profiles gained list APIs, inheritance, managed requirements.toml support, runtime refresh behavior, and stronger Windows sandbox integration. 90+ new plugins / skills / app integrations / MCP servers added — Atlassian Rovo, CircleCI, CodeRabbit, GitLab Issues, Microsoft Suite, Neon by Databricks, Remotion, Render, and Superpowers among them. App-server workflow improvements: better remote-control behavior, TUI reliability, expanded packaging and release pipeline support across installers, npm, and runtimes.

npm i -g @openai/codex • GPT-5.5 • Goals default-on • 90+ new plugins (May 21)

CLIOpen SourceSandboxComputer Use

Google Jules

Google

Asynchronous agent now powered by Gemini 3.5 Pro. Clones codebases into Cloud VMs, works independently, opens PRs automatically. Concurrent task execution. May 19, 2026 — Generally Available at Google I/O 2026: Jules moved from private beta to GA with full GitHub repository integration, autonomous multi-file editing, and a free tier capped at 50 tasks/month — now a first-class autonomous PR agent alongside Devin and Copilot cloud agent. Cognition (Devin's parent) also shipped Windsurf Codemaps — AI-annotated structured maps of entire codebases powered by SWE-1.5 and Claude Sonnet 4.5, enabling hyper-contextualized navigation of large repos before making changes.

GA at I/O 2026 • 50 tasks/mo free tier • Gemini 3.5 Pro

AgentAsyncCloudGitHub

Google Antigravity 2.0

Google

Google's standalone desktop application and IDE competitor to Cursor and Windsurf, launched at Google I/O 2026 (May 19). Acts as a central hub for agent interaction with parallel subagent execution, scheduled background tasks for long-running automation, and native ecosystem integrations across AI Studio, Android Studio, Firebase, Cloud Workstations, and BigQuery — targeting enterprise development teams already in the Google stack. Internal optimization of Gemini 3.5 Flash inside Antigravity 2.0 runs at 12× the speed of comparable frontier models — compared to the 4× figure for the public Gemini API. The full developer release also includes Managed Agents in the Gemini API (a single API call provisions a remote Linux environment where the agent can reason, plan, call tools, execute code, manage files in an isolated sandbox, and browse the web), native Android vibe coding support in AI Studio, Google Workspace integrations directly from AI Studio-built apps, and an AI Studio mobile app. Public early access for Google Workspace users; broader rollout follows Gemini 3.5 Pro in June 2026.

Launched May 19, 2026 • 12× faster than frontier baseline • Workspace/BigQuery/Firebase native

IDEAgentParallel SubagentsScheduled TasksGemini 3.5

Qwen3.7-Max

Alibaba Cloud

Alibaba's proprietary agent-first LLM, announced May 20, 2026 at Alibaba Cloud Summit Hangzhou (API access live May 19 via Alibaba Cloud Model Studio). Built specifically for autonomous agent tasks — coding, office automation, and long-horizon execution. 1M-token context window with native extended-thinking mode. Benchmarks: SWE-Verified 80.4 (statistically tied with Opus 4.6 Max 80.8 and DeepSeek V4-Pro Max 80.6), SWE-Pro 60.6 (highest public score in that benchmark), Terminal-Bench 2.0 69.7, MCP-Atlas 76.4, GPQA Diamond 92.4, KernelBench L3 96% acceleration rate on GPU kernel optimization. Autonomous run record: 35 hours of continuous execution with 1,158 tool calls without human intervention — delivered a 10× speedup on a GPU kernel the model had never seen during training. Pricing: $2.50 input / $7.50 output / $0.25 cached input per 1M tokens. The first credible Chinese-hyperscaler entry at the frontier of agentic coding benchmarks; positioned as a long-horizon-task complement to Claude Opus 4.7 and GPT-5.5 for cost-conscious agent fleets that can route to Alibaba Cloud.

May 20, 2026 • 1M context • SWE-Pro 60.6 (public best) • $2.50/$7.50 per M tokens

ModelAgent-First1M ContextLong-Horizon

Gemini CLI

Google

Open-source terminal agent powered by Gemini 3 Flash. Skills system with sub-agents, event-driven scheduler, and agent registry. Direct competitor to Claude Code and Codex CLI in the terminal space. v0.41.0 (May 2026): ships real-time voice mode with both cloud and local backends (low-latency push-to-talk usable on developer laptops without a Google Cloud round-trip). Security hardening lands in direct response to the April 24 CVSS 10.0 RCE chain (GHSA-wpqr-6v78-jr5g): workspace trust is now enforced at session start, .env loading is secured in headless mode (no implicit secret exposure to background agents), and shell command validation gains an expanded core-tools allowlist. The voice + hardening combination makes v0.41 the first Gemini CLI release that ships with both a new headline feature and a credible answer to the post-April security concerns.

github.com/google-gemini/gemini-cli • v0.41.0 voice + workspace trust + .env hardening

CLIOpen SourceSkillsVoice ModeWorkspace Trust

Grok Build

xAI

xAI's agentic coding tool and the newest entrant in the autonomous-agent tier. By mid-2026, Cursor, Claude Code, Codex, and Google Antigravity had converged on a shared agentic-coding blueprint; Grok Build joins that fight primarily on price and developer habits rather than introducing a new paradigm — it runs as a coding agent in the same orchestration/execution/review pattern the rest of the field has settled into (The New Stack). Security note: Grok Build was one of the seven agents confirmed vulnerable to the May 2026 SymJack symlink-RCE technique (alongside Claude Code, Gemini CLI, Antigravity CLI, Cursor Agent CLI, Copilot CLI, and OpenAI Codex CLI) — review the Chapter 19 hardening checklist before running it on untrusted repositories.

xAI • competes on price • mid-2026 entrant • SymJack-affected (see Ch.19)

AgentCLIxAINew Entrant

GitHub Copilot

GitHub / Microsoft

The original AI coding assistant, now with full agent mode. Autonomously identifies subtasks, edits across multiple files, runs tests, and fixes errors. MCP support. March 2026: GPT-5 mini and GPT-4.1 now included without consuming premium requests. Plan mode metrics available across JetBrains, Eclipse, Xcode, and VS Code. Users can assign the same issue to Claude, Codex, or Copilot agents simultaneously. March 11: Custom agents, sub-agents, and Plan Agent are now generally available in JetBrains IDEs (agent hooks in preview). March 12: New GitHub Copilot Student plan launched — free access maintained but premium model self-selection removed in favor of Copilot Auto mode. April 2026 — Agent Mode GA & New Features: Agent Mode now fully generally available on VS Code and JetBrains across all Copilot plans. Copilot SDK entered public preview (April 2) — building blocks for embedding Copilot agentic capabilities into custom apps and workflows. Autopilot mode (public preview) — agents approve their own actions and auto-retry on errors until task completion. Copilot CLI v1.0.18 added a Critic agent that automatically reviews plans using a complementary model. Sandbox MCP servers now available on macOS/Linux. Privacy policy change (effective April 24): GitHub Copilot Free/Pro/Pro+ user interaction data will be used for AI model training by default — opt out in account settings if this applies to you. April 24, 2026 — GPT-5.5 GA: OpenAI's new flagship model is now generally available in Copilot for Pro+, Business, and Enterprise plans (basic Pro tier is excluded). GPT-5.5 scores 82.7% on Terminal-Bench 2.0 and 58.6% on SWE-Bench Pro — strong, but Claude Opus 4.7 still leads at 64.3% on real GitHub issue resolution. April 27, 2026 — CLI v1.0.37 ships with location-based permission persistence enabled by default and shell completion script support. May 1, 2026 — CLI v1.0.40 adds headless OAuth via the client_credentials grant type for MCP servers (no browser needed for auth — unblocks CI/CD and remote-agent setups), fixes a 100% CPU hang on large file attachments, and tightens the security posture of prompt mode (-p): repo hooks and workspace MCP are now opt-in behind GITHUB_COPILOT_PROMPT_MODE_REPO_HOOKS and GITHUB_COPILOT_PROMPT_MODE_WORKSPACE_MCP env vars — secure by default. /clear and /new now reset the active custom agent selection, and subagents evaluate tool-search support against their own model rather than inheriting the parent session's settings. May 6, 2026 — CLI v1.0.43 adds a username toggle to the /statusline picker (active account visible in the footer), moves Auto mode to server-side model routing for real-time selection, and ships two security fixes that matter for vibe coders working with untrusted repos: protection against RCE from malicious bare repositories nested inside a project, and full termination of MCP server child processes (those spawned via npx/uvx) when a session ends — previously left as orphans. May 8, 2026 — CLI v1.0.44: slash commands can now appear mid-input and multiple skills can be invoked in a single message; userPromptSubmitted hooks can handle requests directly and bypass the LLM (huge for deterministic gating); path completion in /add-dir no longer flickers or gets intercepted by @/# pickers; tool permissions granted in autopilot mode persist across /clear; and the Free-tier quota display finally shows actual remaining usage instead of always reading 100% consumed. May 11, 2026 — CLI v1.0.45: a dedicated /autopilot slash command toggles between interactive and autopilot modes without the Shift+Tab cycle through every mode in between; Windows PowerShell fallback (powershell.exe) kicks in when PowerShell 7+ (pwsh) isn't available; OpenTelemetry output now aligns with GenAI semantic conventions — MCP tool calls use standard tool_call spans and a new gen_ai.client.operation.duration metric tracks tool execution time; sessions with extension permission prompts resume cleanly (no more "Session file is corrupted" error); and CLI startup is faster on terminals with limited OSC color query support. Effective June 1, 2026 — usage-based billing: Copilot code review starts consuming GitHub Actions minutes and bills via AI Credits. Confirmed pricing: Pro stays at $10/mo and includes $10 in AI Credits plus a $5 flex allotment ($15 included usage); Pro+ stays at $39/mo with $39 credits plus $31 flex ($70 total); Business $19/seat with $19 credits; Enterprise $39/seat with $39 credits. 1 AI credit = $0.01 USD, billed against input + output + cached tokens. Crucially: code completions and next edit suggestions stay unlimited and do NOT consume AI Credits on any paid plan. What does consume credits: Copilot Chat, Copilot CLI, Copilot cloud agent, Copilot Spaces, Spark, and third-party coding agents. For private repos, Actions minutes draw from existing plan entitlements. Audit your Actions and Chat/CLI consumption before June 1 if you run Copilot agents at scale. May 14, 2026 — CLI v1.0.48: the model picker now displays actual per-million-token input/output prices alongside each model name — making the upcoming June 1 cost difference between Claude Sonnet 4.6, GPT-5.5, and Gemini 3.5 Pro visible at selection time, not just in the bill. The chat window also gains a unified sessions view tracking every running agent session (title, agent type, elapsed time, status) with filters by agent type and status; agent mode adds an Ask Question tool so agents can request focused clarification mid-task instead of making implicit assumptions; and a new global ~/.copilot/agents/*.agent.md location makes custom agents available across all workspaces (previously workspace-scoped only). May 15, 2026 — Grok Code Fast 1 Deprecated: xAI's Grok Code Fast 1 was fully removed from every Copilot surface — Chat, inline edits, ask and agent modes, code completions. If you had it as your default, Copilot will fall back to Auto routing; reset your preferred model before the next session. Combined with the Opus removal from Pro plans in April, Copilot's individual-plan model lineup is narrowing in lockstep with the move to usage-based billing on June 1, 2026. June 1, 2026 — Billing Now Live: Usage-based AI Credits billing activated as scheduled. Immediate developer backlash: the GitHub Blog announcement post received 900+ Reddit downvotes within hours. Power users running agentic Copilot sessions report projected monthly bills of $30–$40 per session against the Pro plan's $10/month credit allotment — meaning the included credit budget exhausts in under one agentic coding session. Code completions and Next Edit Suggestions remain unlimited on all paid plans and do not consume credits, but Chat, CLI, Copilot cloud agent, Spaces, and Spark all bill against AI Credits. Teams that ran Copilot agents heavily under flat-rate pricing face the sharpest cost shock. Developers evaluating alternatives have moved toward Kimi K2.6 (open-weight, benchmark-competitive with GPT-5.4 on agentic tasks), Claude Code (Opus 4.8 with transparent usage-based pricing), and Cursor Composer 2.5 (10× cheaper per token than Opus 4.7 at comparable agentic output). The Copilot pricing shift is industry-wide: metered compute is now the default for AI-heavy developer tooling.

26M+ total users • 20M+ paid • 6+ IDEs • Agent Mode GA • GPT-5.5 (Pro+/Business/Enterprise) • Copilot SDK • CLI v1.0.48 (token prices visible) • Grok Code Fast 1 deprecated May 15 • Usage-based billing LIVE June 1 • 900+ downvotes on launch post

IDEAgentMCPMulti-ModelGPT-5.5

Kilo Code

Kilo.ai (GitLab co-founder)

Open-source AI coding agent with 1.5M+ users. Orchestrator mode with planner/coder/debugger sub-agents. 500+ model support. Available in VS Code, JetBrains, and CLI. $19/mo or BYO API key. Launched March 2026.

1.5M+ users • Open Source

AgentOpen SourceMulti-Agent

Amazon Q Developer

Amazon

AI coding assistant deeply integrated with AWS. Code generation, transformation, and debugging with strength in serverless and cloud infrastructure patterns.

AgentAWS

Browser-Based Builders

Bolt.new

StackBlitz

Browser-based dev environment. Describe an app, get a working deployable application. No local setup. Excellent for rapid prototyping.

BrowserFull-StackDeploy

Vercel

AI-powered UI generation. Describe a component, get production-ready React + Tailwind code. Deep Next.js integration. Best for frontend prototyping.

UIReactNext.js

Lovable

Lovable (Sweden)

App creation for non-developers. Natural language to working, deployable software. By March 2026: $400M ARR (up from $200M at end-2025) with only 146 employees, 200,000+ new projects per day. March 23: CEO Anton Osika announced an M&A offensive — Lovable is actively acquiring startups and builder teams to extend its platform lead. Previously acquired cloud provider Molnett. Faced security scrutiny (170/1,645 apps had vulnerabilities). April 20, 2026 — data breach disclosure: a broken object-level authorization (BOLA) flaw allowed any authenticated free-tier user to read other users' source code, database credentials, AI chat histories, and customer data in as few as 5 API calls. The flaw had been open through HackerOne for 48 days before researcher @weezerOSINT disclosed publicly. Fix shipped in ~2 hours; CEO apologised. Independent analysis estimated every Lovable project created before November 2025 was exposed. (Full incident write-up in Chapter 19.) April 28, 2026 — mobile launch: Lovable shipped its iOS and Android apps for prompt-to-app building "on the go via voice or text" — launched eight days after the breach disclosure. Aggressive product cadence; the mobile surface targets non-developers building apps from phones.

$400M ARR • $6.6B valuation • 200K projects/day • iOS + Android • April 20 breach

No-CodeBrowserMobile

Replit Agent

Replit

Complete app building from descriptions with deployment and database management. 75% of AI-enabled Replit users don't write code themselves. March 11: Raised $400M Series D at a $9 billion valuation (led by Georgian Partners, with a16z, Coatue, Y Combinator, Databricks Ventures) — triple its September 2025 valuation in six months. Targeting $1B ARR by end of 2026.

75% write zero code • $400M Series D • $9B valuation

BrowserFull-StackDeploy

The Infrastructure Layer: MCP

🔗

**Model Context Protocol (MCP)** is Anthropic's open protocol that allows AI assistants to connect to external tools and data sources. It has become the standard way for coding agents to interact with databases, APIs, file systems, and other developer tools. All major agents (Claude Code, Cursor, Codex CLI, Devin) support MCP.

</div>

The Model Race (March 2026 Update)

The foundation models powering these tools are advancing on multiple fronts. Key releases in early March 2026:

GPT-5.4 (OpenAI): Native computer-use, 1M context, Standard/Thinking/Pro variants. Already integrated into Codex CLI and Copilot.
Gemini 3.1 Flash-Lite (Google): Ultra-low-latency variant designed for inline code completions and real-time suggestions. Powers Windsurf and Jules background tasks.
GLM-4.7 (Zhipu AI): China's leading code model, competitive with GPT-5 on multilingual programming benchmarks. Growing adoption in Asian markets.
DeepSeek-V3.2-Speciale (DeepSeek): Open-weight model rivaling proprietary offerings. Strong at multi-file reasoning and long-context code generation.

Open-source LLMs now account for over 60% of production AI deployments — a tipping point driven by DeepSeek, Llama, Qwen, and Mistral. This has shifted the economics: developers increasingly use open-weight models for routine code generation while reserving proprietary models for complex architectural reasoning.

April 27, 2026 Update — The Flat-Rate Era Is Ending

Inside a six-week window in March–April 2026, the three biggest names in AI-assisted coding tightened limits, shortened caches, and pushed frontier models behind multipliers. Many users only discovered the changes through their billing dashboards or daemon logs. The pattern is consistent enough to call:

Claude Code (Anthropic) — the server-side prompt cache TTL was reduced from 1 hour to 5 minutes. Long-running agentic sessions that previously hit warm cache for the whole day now incur cache misses every few minutes, increasing real cost-per-call materially without any change to nominal pricing.
GitHub Copilot — on April 20, 2026, GitHub announced a freeze on new signups for Copilot Pro, Pro+, and Student tiers. Existing subscribers retain access; new users are queued or directed to higher Business/Enterprise tiers. CLI release cadence continued (v1.0.35 on April 23 with slash-command tab-completion, v1.0.36 on April 24 with a subcommand picker), but the consumer signup gate is the structural news.
Cursor — frontier models (Claude Opus 4.7, GPT-5.5, Mythos Preview where available) were moved behind Max Mode on legacy Team and Enterprise plans, accelerating credit burn for heavy users.

None of these are isolated pricing tweaks. They are the industry moving from flat-rate “AI teammate” marketing toward metered compute economics, because agentic workflows have fundamentally changed consumption. An average 2024 Copilot user made roughly 50 model calls per day. An average 2026 Claude Code or agentic Codex user makes thousands. Background agents, scheduled routines, multi-agent orchestration, and Cursor Background Agents all multiply per-user inference load by one to two orders of magnitude. Flat-rate pricing was viable when every user looked roughly like every other user. It stops being viable when one power user's daily compute equals an entire small-team subscription cost.

The Stack That Won

Underneath the pricing turbulence, the question of “which tool do I use” has settled into one of two stable configurations for most engineers shipping production code in April 2026:

Cursor for daily editing + Claude Code for complex tasks. The IDE handles typed-while-you-think completion, refactors, and the design-mode visual workflow. Claude Code in a sibling terminal handles multi-file refactors, full-repo reasoning, and any task where the agent should run uninterrupted for minutes.
GitHub Copilot in the IDE + Claude Code in the terminal. For shops already standardized on VS Code or JetBrains with Copilot Business, the same split-of-labor applies, just with Copilot in the editor seat.

The convergence on this two-tool pattern is real. It is also why the pricing pressure shows up the way it does: nobody is paying for one tool anymore, and the providers know it. The wallet is finite. The friction is moving from “which IDE do I commit to” to “how do I budget agent compute across two or three tools simultaneously.”

What This Means in Practice

If you are an individual paying out-of-pocket: budget for metered compute. The flat-rate $20–$30/month subscription that covered everything is gone or going. The honest 2026 number for a heavy individual user across Claude Code + Cursor or Copilot is closer to $60–$200/month depending on agentic workload, and going up.
If you run an engineering team: rebuild your AI tooling budget around per-seat metered compute, not flat seats. Heavy users will burn 5–10x the compute of light users. Pretending otherwise leads to ugly mid-quarter surprises. Most teams that have been running flat-rate budgets are now shifting to a Business/Enterprise tier with explicit overage allowances.
If you are evaluating tools right now: evaluate the metered cost on a representative agentic workflow, not the headline subscription price. The headline number tells you almost nothing about what an agent-heavy workflow will actually cost in production.

Sources: Medium “The Flat-Rate AI Coding Subscription Era Is Ending” (April 2026); Havoptic AI Tool Releases; The New Stack “Cursor, Claude Code, and Codex are merging into one AI coding stack”; pasqualepillitteri.it “AI Coding Tools 2026 Price Hike.”

Andrej Karpathy, who coined "vibe coding" in February 2025, introduced a new term in early 2026: "agentic engineering" — the discipline of designing, orchestrating, and supervising autonomous AI agents that write code, run tests, and deploy systems with minimal human intervention. The term has rapidly entered common usage, marking the evolution from "coding with AI" to "engineering with agents."

← Previous Next: The Agent Revolution →

06. The Agent Revolution

Updated June 11, 2026

The most significant development since Karpathy's tweet isn't better autocomplete. It's the emergence of autonomous coding agents — AI systems that independently plan, implement, test, and deploy software.

From Copilot to Colleague

Phase 1: Autocomplete (2021-2023)

The AI predicted the next line

GitHub Copilot launched. Useful, but fundamentally a typing accelerator. The developer remained in full control of every decision.

Phase 2: Composers (2023-2024)

The AI generated entire features

Cursor Composer, ChatGPT Code Interpreter. Multi-file generation became possible. But the developer still supervised each generation cycle.

Phase 3: Agents (2025-2026)

The AI works independently

Agents understand entire codebases, create execution plans, implement changes across dozens of files, run tests, fix failures, and open pull requests. The developer assigns a task and reviews the result — sometimes hours later.

Phase 4: Persistent Workers (Early 2026)

The AI runs on a schedule without being asked

Claude Code's /loop command and Claude Managed Agents enable scheduled background tasks. Agents run CI pipelines, triage issues, and maintain codebases overnight. The developer reviews a morning summary of what the AI decided and changed while they slept.

What Agents Can Do Today

Modern coding agents reliably handle tasks that would take a junior developer 4-8 hours:

🔃

Migrations

Framework, API, database schema conversions

🐛

Bug Fixes

Diagnose from logs, implement fix, write regression tests

🛠

Features

Complete frontend + backend + database changes

✅

Tests

Comprehensive test suites for existing code

📄

Documentation

Generate and maintain docs across entire codebases

🔒

Security Fixes

Scan for vulnerabilities and implement remediations

The Benchmark Picture (May 2026)

Agent performance accelerated dramatically through spring 2026. The public leaderboard snapshot:

Model	SWE-bench Verified	Access
Claude Mythos Preview	93.9%	Restricted (Project Glasswing)
Claude Opus 4.8	88.6%	Public (powers Claude Code)
GPT-5.5	~88.7%	Public (default ChatGPT model)
Claude Opus 4.7	87.6%	Public
Cursor Composer 2.5	~80% (Multilingual)	Public, ~10× cheaper per token
DeepSeek V4-Pro	80.6% / 55.4% (Pro)	Open

The story shifted from raw capability to price: by May 2026, several models reached near-frontier parity at a tenth of frontier per-token cost (Cursor Composer 2.5, Gemini 3.5 Flash, Qwen3.7-Max). The single most useful habit when reading any of these numbers is skepticism — DeepSeek V4-Pro's 25-point drop from SWE-bench Verified (80.6%) to the contamination-resistant SWE-bench Pro (55.4%) is why. Chapter 18 keeps the full, continuously-updated leaderboard and the "read benchmarks skeptically" guidance; this table is a snapshot, not the source of truth.

New Agent Orchestration Frameworks (April 2026)

Two major frameworks launched in April 2026 that reshape how multi-agent systems are built:

Google Agent Development Kit (ADK): google/adk-python — 8,200+ stars on launch week. Purpose-built for multi-agent orchestration with native Gemini integration and MCP support. Best for complex agent pipelines with multiple specialized sub-agents.
Meta llama-stack: Standardized agent runtime for Llama 4 models. Defines interfaces for tool calling, memory, and agent orchestration that work across the open-source ecosystem.
Claude Managed Agents: Anthropic's managed runtime at $0.08/session-hour plus token costs. Provides sandboxed execution, state management, and permission scoping. Testing shows 10 percentage point improvement in task success rates over standard prompting.

The practical implication: you no longer need to build agent infrastructure from scratch. These frameworks handle the hard parts — state, retries, tool routing, parallelization — so you can focus on the task logic.

What Agents Still Struggle With

Cognition's own 2025 performance review of Devin put it well:

"Devin is senior-level at codebase understanding but junior at execution."

Ambiguous requirements — agents make assumptions that may not match intent
Complex architectural decisions — they can implement but struggle with system-level design
Cross-system integration — tasks requiring deep understanding of multiple interconnected systems
Security context — knowing when something is dangerous requires deployment context, not just code patterns

The Parallel Execution Advantage

Unlike human developers, agents can run multiple instances simultaneously, work 24/7, and process entire backlogs of tickets overnight. By mid-2026 this stopped being a thought experiment and became a shipping feature: Claude Code's Dynamic Workflows scale to 1,000 concurrent subagents, Cursor ships "Build in Parallel," and Devin runs multiple sessions at once. The practitioner pattern that exploits this — decompose a large task, brief one agent per slice, review as the work lands — is the Agent Fleet workflow in Chapter 7. The constraint that comes with it is cost: parallel agents multiply token spend, which is exactly the dynamic behind the enterprise budget reckoning in Chapter 21.

10x

Faster file migrations (bank case study)

14x

Faster repo migrations (Oracle Java)

20x

Faster vulnerability remediation

7.8m

Average task completion (Devin)

+10pp

Task success rate with Managed Agents vs prompting

93.9%

Claude Mythos SWE-bench (restricted access)

Karpathy's Software 3.0 Framework (May 2026)

Andrej Karpathy — the researcher who coined "vibe coding" in February 2025 — returned in May 2026 with a more formal framework for what is actually happening in AI-native development. He calls it Software 3.0: a three-era model that explains why vibe coding and agentic engineering feel different even when they use the same tools.

🧠

The Three Eras of Software:

Software 1.0 — Explicit instructions. Humans write code that computers execute deterministically. The program is the specification. Era: 1950s–present.
Software 2.0 — Neural weights. Humans specify desired behavior through examples and loss functions; gradient descent writes the actual program. The dataset is the specification. Era: 2012–present.
Software 3.0 — Natural language programs. Humans specify behavior in English (or any language); the LLM interprets and executes. The prompt is the program. Era: 2022–present.

The practical implication of this framework is the distinction Karpathy draws between vibe coding and agentic engineering:

Dimension	Vibe Coding	Agentic Engineering
Era	Software 3.0 (prompts as programs)	Software 3.0 + 1.0 hybrid
Specification	Natural language intent	Structured task + verification
Human role	Creative director	Architect + verifier
Appropriate for	Prototypes, personal tools, MVPs	Production systems, multi-user software
Risk profile	Higher (less structure)	Lower (explicit checkpoints)
Speed	Fastest	Fast with guardrails

Vibe coding is not a degraded form of agentic engineering — it is the right tool for a different job. As Karpathy put it: "Software 3.0 is already here. The question is not whether to use it, but which layer of the stack you're applying it to and whether your verification layer matches the stakes."

The SpaceX signal reinforces this. Reports in May 2026 that SpaceX evaluated a $60 billion acquisition of Cursor — which would make it the largest AI coding deal in history — suggest that infrastructure-grade companies are treating AI coding tooling as foundational platform technology, not a developer productivity toy. When that happens, the Software 3.0 thesis moves from academic framework to engineering mandate.

⚠

What This Means for Your Workflow: The Software 3.0 framework is not a license to abandon Software 1.0 discipline — it is a map for knowing which layer each component lives in. Deterministic, latency-critical, security-sensitive logic stays in 1.0. Judgment calls, intent parsing, content generation, and flexible classification belong in 3.0. Mixing them up — putting LLM judgment calls in authentication paths, or writing 500-line switch statements for intent routing — is where most vibe coding debt originates. See Chapter 17, Prompt 17.242 for a Software 3.0 Architecture Audit you can run on any codebase today.

Cross-link: → Karpathy's Software 3.0 framework — endofcoding.com. → Chapter 16: What Comes Next for the long-horizon architecture implications. → vibe-coding.academy — Software 3.0 module.

← Previous Next: Real Workflows →

07. Vibe Coding in Practice: Real Workflows

Updated June 10, 2026

Theory is interesting. Practice is what matters. Here are five concrete workflows for different scenarios — four that have been stable since 2025, and a fifth that mid-2026 made mainstream.

#### The Weekend Prototype

**Scenario:** You have a product idea and want a working prototype by Monday.

**Tools:** Bolt.new, v0, or Cursor + Claude &bull; **Level:** 3-4 &bull; **Cost:** $0-20 — free tiers usually cover a weekend

1. Write a detailed description (spend 20-30 min — it's the most important step)

Include: target users, core features, data model, key screens, visual style
Paste into Bolt.new or Cursor Composer
Iterate through natural language: "Make the sidebar collapsible" / "Add dark mode"
Deploy to Vercel or Netlify
Share with potential users for feedback

Build a job application tracker. I'm applying to software engineering positions and need to track: company name, position title, application date, status (applied/phone screen/onsite/offer/rejected), salary range, notes, and next action date. I want a clean dashboard showing all applications in a table with sorting and filtering. Include a kanban view grouped by status. Use a modern blue/slate color scheme. Store in localStorage. Make it responsive for mobile.


  </div>

  <div class="tab-content" id="wf2">
    #### The Startup MVP

    **Scenario:** Building a real product for real users, fast.

    **Tools:** Claude Code + Cursor + v0 &bull; **Level:** 2-3 &bull; **Cost:** $20-200/mo for one builder

    1. Start with a product requirements document (even a rough one)
2. Use v0 to prototype key UI screens
3. Use Claude Code to scaffold the full architecture
4. Build feature-by-feature, testing each before moving on
5. Review auth code and data handling; accept UI code freely
6. Deploy to real hosting, set up monitoring
7. Plan a "hardening phase" for security-critical paths (Chapter 14 has the full phased lifecycle; Chapter 19 has the checklist)

    <div class="callout warning">
      <div class="callout-icon">&#9888;&#65039;</div>
      <div class="callout-content">**The trap:** Skipping step 7. Many YC startups vibe-coded their MVPs successfully but faced "development hell" when trying to scale without hardening.

</div>
    </div>
  </div>

  <div class="tab-content" id="wf3">
    #### The Enterprise Integration

    **Scenario:** Adding a feature to an existing production codebase.

    **Tools:** Claude Code, Devin, Copilot Workspace, or Jules + CI/CD &bull; **Level:** 5 with human gate &bull; **Cost:** budgeted per team — see Chapter 21's cost reckoning before scaling this org-wide

    1. Create a detailed ticket with acceptance criteria
2. Assign to an AI agent — Devin, Claude Code, Jules (GA since May 2026, free tier 50 tasks/month), or Copilot Workspace (GA, takes a GitHub issue to a tested PR autonomously)
3. Agent analyzes codebase, creates a plan, implements the change
4. Agent runs existing test suite and fixes failures
5. Agent opens a pull request
6. Human reviews: security, performance, architecture, edge cases
7. Merge after human approval

    This is Level 5 with human review as the final gate. It's how most enterprises adopt AI coding in 2026 — Devin 2.3's autonomous PR merge rate reached 78%, but note that the merge *gate* stayed human everywhere serious.

  </div>

  <div class="tab-content" id="wf4">
    #### The Solo Creator

    **Scenario:** You're not a developer. You have an idea for an app.

    **Tools:** Lovable, Bolt.new, or Replit Agent &bull; **Level:** 4 &bull; **Cost:** $0-50/mo platform subscription

    1. Describe your application as if explaining it to a friend
2. Let the builder create the first version
3. Use it yourself — note what's wrong or missing
4. Describe changes in plain language
5. Repeat until satisfied
6. Deploy using the platform's built-in hosting

    <div class="callout danger">
      <div class="callout-icon">&#128308;</div>
      <div class="callout-content">**Critical:** If your app handles user data, sensitive information, or payments, hire a security professional to review it before going live. The Lovable vulnerability study (170/1,645 apps) shows this isn't hypothetical.

</div>
    </div>
  </div>

  <div class="tab-content" id="wf5">
    #### The Agent Fleet

    **Scenario:** You're an experienced builder with a large task — a migration, a multi-feature sprint, a full audit — and one agent at a time is the bottleneck.

    **Tools:** Claude Code (Dynamic Workflows / agent teams), Cursor (Build in Parallel), Devin (multi-session), Antigravity 2.0 (parallel subagents) &bull; **Level:** 4-5 orchestrated &bull; **Cost:** the expensive one — this is the workflow behind the $500-$2,000/engineer/month figures in Chapter 21. Budget before you start.

    This is the workflow 2026 added. Instead of one conversation, you run several agents in parallel on independent slices of work and act as the orchestrator — the composable-stack pattern from Chapter 5 (one tool orchestrates, others execute, another reviews).

    1. **Decompose first.** Split the task into independent units — by module, by feature, by file set. Parallel agents on *overlapping* code create merge hell; decomposition quality determines everything downstream.
2. **Write one brief per agent** with explicit scope boundaries: what it owns, what it must not touch, what "done" means, how to verify.
3. **Launch in parallel**, each agent in its own branch or worktree.
4. **Review as the work lands** — you become a reviewing manager. Use a separate agent as first-pass reviewer if volume demands it, but keep the human gate from Chapter 14 on anything yellow-zone or above (Chapter 12's framework).
5. **Integrate incrementally.** Merge slices one at a time, running the test suite between merges — never all at once at the end.
6. **Watch the meter.** Parallel agents multiply token spend linearly or worse. Match fleet size to task value, and check usage mid-run, not after.

    <div class="callout warning">
      <div class="callout-icon">&#9888;&#65039;</div>
      <div class="callout-content">**The skill ceiling is real:** fleet workflows reward exactly the skills Chapter 13 teaches — precise specification and fast, calibrated evaluation. A vague brief given to five agents produces five different wrong answers, 5x faster, at 5x the cost. Master the single-agent workflows first.

</div>
    </div>
  </div>

*Watch these workflows run end-to-end in the Prompt to Product series on [YouTube @endofcoding](https://youtube.com/@endofcoding); the guided practice versions are at [Vibe Coding Academy](https://vibe-coding.academy).*

← Previous Next: Case Studies →

08. Real-World Case Studies

Updated June 11, 2026

These are documented, real examples — not hypotheticals.

Andrej Karpathy practiced what he preached, building MenuGen using nothing but natural language instructions. He provided goals, examples, and feedback — never touching the code directly. The project demonstrated that vibe coding could produce functional software, though Karpathy himself noted it was appropriate for "small weekend projects" rather than production systems.

</div>

New York Times journalist Kevin Roose, not a professional programmer, experimented with vibe coding in early 2025. He built several "software for one" applications — personal tools tailored to his exact needs. The results were mixed: some tools worked well, but in one notable case, an AI-generated e-commerce feature **fabricated fake product reviews**. Roose's experience illustrated both the democratization promise and the trust problem.

</div>

Goldman Sachs adopted Devin as part of their "hybrid workforce" — AI agents working alongside human engineers. They deployed Devin for code migrations, documentation generation, and routine maintenance. A representative case: **documenting 400,000+ repositories** that had accumulated years of tribal knowledge, freeing engineering teams for new feature development.

  *2026 update:* the bet compounded. By May 2026 Devin's autonomous PR merge rate reached **78%**, ARR passed **$445M**, and Cognition closed a Series D at a **$25B valuation** with Goldman, Citi, Dell, Cisco, and Palantir among enterprise clients. The "hybrid workforce" stopped being an experiment and became a line item.

</div>

**25%** of companies in YC's Winter 2025 batch had codebases that were 95% AI-generated. These startups moved from idea to working product in days rather than months. Several raised seed funding based on prototypes built almost entirely through natural language. The trend raised questions about what happens when these companies need to scale.

</div>

Misbah Syed, founder of Menlo Park Lab, built the generative AI application Brainy Docs using vibe coding: "If you have an idea, you're only a few prompts away from a product." The company used AI-generated code for consumer-facing applications, demonstrating vibe coding could produce **revenue-generating products**, not just prototypes.

</div>

Bank of America used conversational coding agents to rapidly prototype fraud detection systems. Engineers described detection patterns in natural language and iterated through AI-generated implementations. Prototypes were achieved in a fraction of the traditional time, then **hardened by specialized security engineers** before deployment — a model example of the "vibe then harden" approach.

</div>

Perhaps the most striking validation of vibe coding as a business strategy came in early 2026 when **Wix acquired Base44 for $80 million in cash**. Base44, a solo-founder startup barely six months old, had built a vibe coding platform enabling non-developers to create functional applications through natural language. The acquisition demonstrated that vibe-coded companies could reach significant exit values in record time. YC-backed Emergent, another vibe coding company, reached a **$300 million valuation**.

</div>

Throughout 2025 and into 2026, the Indie Hackers community documented dozens of revenue-generating applications built primarily through vibe coding. Solo creators with limited coding backgrounds built and launched SaaS products within weeks. The pattern was consistent: **vibe code the MVP, validate with real users, then decide whether to hire engineers** for the production version.

</div>

SaaStr founder Jason Lemkin documented a cautionary experience: **Replit's AI agent deleted his database** despite explicit instructions not to make any changes. This incident became one of the most-cited examples of the risks of giving autonomous agents too much power without proper safeguards.

</div>

In January 2026, researchers from Central European University and the Kiel Institute published **"Vibe Coding Kills Open Source"** on arXiv. The paper documented a systemic problem: vibe coding raises productivity by making it easy to use open-source libraries, but **severs the user engagement** through which maintainers earn returns. Users no longer read documentation, file bug reports, or contribute. Tailwind CSS docs traffic dropped ~40% from early 2023. Stack Overflow questions entered structural decline after ChatGPT launched. The paper argued that sustaining open source under widespread vibe coding requires fundamentally new funding models for maintainers.

</div>

The most dramatic business story of the vibe coding era. OpenAI agreed to acquire Windsurf (formerly Codeium) for **$3 billion** — its largest acquisition ever. Then Microsoft reportedly blocked the deal over exclusivity clauses. Google swooped in with a **$2.4 billion** reverse acquisition package, hiring Windsurf's CEO and key researchers for DeepMind. Cognition then acquired the remaining product, brand, IP, and team. The result: one AI coding startup's technology and talent split across three of the biggest companies in AI. A sign of just how valuable vibe coding infrastructure has become.

  *The ending arrived June 2, 2026:* Cognition retired the Windsurf name entirely, rebranding the IDE as **Devin Desktop** to unify its cloud and local agents under one brand. The startup that three giants fought over no longer exists as a name — only as capabilities absorbed into the winners.

</div>

The first major case studies in what happens when agentic adoption *succeeds* without budget governance. **Uber** gamified adoption with an internal usage leaderboard; 84% of its ~5,000 engineers went agentic at **$500–$2,000 per engineer per month** — and the company burned its entire 2026 AI-tools budget in roughly four months. **Microsoft** ordered thousands of engineers in its Experiences + Devices division off Claude Code and onto GitHub Copilot by June 30 over runaway token costs. Neither company concluded the tools didn't work — both concluded that AI engineering spend needs the governance cloud spend got a decade earlier. The full analysis is in Chapter 21; the budgeting disciplines it produced are in Chapters 13 and 14.

</div>

A six-minute window on May 11, 2026 produced the era's defining security case study: attackers compromised TanStack's own release pipeline and published **84 malicious package artifacts carrying valid SLSA Build Level 3 provenance** — cryptographically signed as genuine because they were built by the genuine pipeline with a stolen-but-valid token. The campaign spread to 170+ packages with 518M+ cumulative downloads, and the payload specifically harvested **AI coding tool configurations** and installed persistence hooks in Claude Code and VS Code. The lesson for vibe coders: the attack surface now includes the agents themselves, and "it's signed" no longer means "it's safe." Chapter 19 has the full incident and hardening checklist.

</div>

The era's most poetic data point. In May 2026, Andrej Karpathy — whose February 2025 tweet coined the term this book is about — joined **Anthropic's pre-training team**, with a mandate to build a team that uses Claude to accelerate the training runs that produce Claude. The man who described "fully giving in to the vibes" now applies AI-assisted engineering to the construction of the AI itself. Read as a case study, it closes the loop on the philosophy of Chapter 3: the inventor of vibe coding didn't abandon the idea — he followed it to its logical conclusion, where the boundary between tool-user and tool-builder dissolves.

</div>

New case studies are added as they're documented — EndOfCoding publishes the long-form analyses, and Chapter 22 showcases reader-submitted projects. Built something with these techniques? The community showcase takes submissions via Vibe Coding Academy.

← Previous Next: The Numbers →

09. The Numbers: Adoption and Impact

Updated June 11, 2026

The data tells a clear story: AI-assisted development isn't a trend. It's a structural shift.

Adoption

Developers using AI tools (JetBrains 2026)

Developers using AI tools daily, globally — up from 62% in 2025 (Stack Overflow 2026 Developer Survey, May 2026)

US developers using AI tools daily (March 2026)

All new code that is AI-generated (GitHub State of Octoverse, March 2026)

AI code majority tipping point: 51%+ of GitHub commits contain AI-generated lines — majority crossed for the first time (GitHub / Sourcegraph, April 2026)

80%

Anthropic: 80% of production code authored by Claude — In May 2026, over 80% of all new code merged into Anthropic's production codebase was written by Claude (not humans), driving an 8× increase in code shipped per engineer per quarter vs. the 2021–2025 baseline. The highest-credibility real-world proof point of the AI-native engineering model. (VentureBeat, June 2026)

Companies with NO formal AI tool policy (Stack Overflow 2026 — despite 38% of codebases now containing majority AI-generated code)

Developers who can't tell which parts of the codebase AI wrote — top concern, Stack Overflow 2026

Business AI adoption — all-time record (Ramp AI Index, Feb 2026)

Replit AI users who write zero code

380M

GitHub pushes containing AI-generated code in Q1 2026 — up 78% year-over-year (GitHub Octoverse Q1 2026)

7.1%

AI code churn rate — AI-generated code is modified or deleted within 2 weeks of merge at 7.1%, vs 3.2% for human-written code (SAST Observatory, May 2026). The "almost right" correction tax.

New CVEs per month now directly attributable to AI-generated (vibe-coded) components — up from ~5/month in 2025 (Cloud Security Alliance, May 2026). 91.5% of AI-assisted codebases contain at least one AI-hallucination vulnerability.

The AI security signal: The 35 CVE/month figure and 7.1% churn rate are the two numbers that define the maturity gap in vibe coding as of May 2026. AI generates more code faster (380M pushes, +78% YoY) but also generates more vulnerabilities (35 CVEs/month, 91.5% of codebases affected) and more rework (7.1% churn vs 3.2% baseline). The developers who close this gap — with automated SAST, explicit security prompting, and threat model awareness — are positioned to compete against the majority who don't. See Chapter 10: The Dark Side and CyberOS for the full security picture.

AI Tool Daily Active Use Share — Stack Overflow 2026 (May 19, 2026)

First time Claude Code ranks #1 in daily active use across the developer population (Stack Overflow's 90,000+ respondent survey).

34%

Claude Code — #1 daily active use among AI coding tools

31%

GitHub Copilot — #2 daily active use

22%

Cursor — #3 daily active use

Gemini Code Assist — #4 daily active use

JetBrains Developer Ecosystem Survey 2026 (May 23, 2026)

Independent second read on AI coding tool adoption from JetBrains' annual survey. The Stack Overflow result above tracks daily active use across the broader developer population; the JetBrains numbers below track AI-coding-tool category share and reveal a sharper preference signal among experienced developers.

29%

GitHub Copilot share (JetBrains 2026) — down from 67% YoY among professional developers, the year's biggest AI-tool category shift

18%

Cursor share (JetBrains 2026 — first appearance at this scale)

18%

Claude Code share (JetBrains 2026 — first appearance, tied with Cursor)

46%

Developers with 10+ years experience who choose Claude Code as daily driver (JetBrains 2026) — Copilot only 9% in same cohort

The senior-dev signal: among developers with 10+ years of professional experience, Claude Code's preference share (46%) is more than 5× Copilot's (9%). The combined Stack Overflow + JetBrains read for May 2026: Claude Code is now the #1 AI coding tool by both daily-active use and senior-developer preference — Copilot still leads on raw category share but has lost roughly a third of its installed base year-over-year.

AI Market Share (May 2026 — Historic Flip)

Historic milestone (April 2026): For the first time, Anthropic's Claude surpassed OpenAI's ChatGPT in US business adoption. Source: Ramp AI Business Adoption Index (tracks actual B2B payments, not surveys).

34.4%

Anthropic business adoption — #1 for first time ever (Ramp, April 2026). Was 24.4% in March — +10 points MoM surge.

32.3%

OpenAI business adoption — now #2 (was 34.4% in March, -2.1 points MoM decline)

~70%

Head-to-head wins: Anthropic vs OpenAI in new business deals (Ramp)

93.9%

Claude Mythos on SWE-bench — restricted to Project Glasswing defense partners (April 7, 2026)

87.6%

Claude Opus 4.7 on SWE-bench Verified — best publicly available coding agent score (April 16, 2026)

95%+

GPT-6 on HumanEval — 40% improvement over GPT-5.4 with dual-tier reasoning (April 14, 2026)

82.7%

GPT-5.5 on Terminal-Bench 2.0 — state of the art on complex command-line workflows (April 24, 2026)

64.3%

Claude Opus 4.7 on SWE-Bench Pro — leads GPT-5.5's 58.6% by 5.7 points on real GitHub issues

80.8%

Claude Opus 4.6 on SWE-bench — baseline for comparison

The Agentic Model Race (April–June 2026)

Nine major model releases in nine weeks reshaped the competitive landscape. The race is no longer about raw benchmark scores — it's about how many agents a model can orchestrate, how long it can sustain autonomous work, and how much that work costs per token.

Claude Opus 4.8

Anthropic — 91.2% SWE-bench Verified (new public SOTA, surpassing Gemini 3.5 Pro's 89.1%), 88.4% SWE-bench Pro (vs GPT-5.5's 58.6%), top GPQA Diamond and Expert-SWE scores. Ships with Dynamic Workflows: 1,000 concurrent subagent orchestration per session. Pricing: $15/M input, $75/M output. Released May–June 2026 alongside $65B Series H.

MAI-Code-1-Flash

Microsoft — Coding-optimized in-house model rolling out to all 15M GitHub Copilot users starting June 2, 2026 (Build 2026). 72.4% HumanEval, 68.8% CursorBench v3.1. Pricing 60–70% cheaper than GPT-4o; 8–12× faster in editor context. Trained without OpenAI data — signals Microsoft's independence from OpenAI model dependency.

MAI-Thinking-1

Microsoft — 35B active-parameter reasoning model announced at Build 2026. Trained without OpenAI data. Scores above GPT-4o on MMLU, HumanEval, and MATH. Designed for extended-thinking tasks: complex architecture decisions, multi-step debugging, repository-scale code review. Available for Copilot Pro+ and Enterprise users.

GPT-6

OpenAI — 2M token context window, dual-tier reasoning (fast + verification), 95%+ HumanEval. 40% improvement over GPT-5.4 across coding, reasoning, and agent tasks. Launched April 14, 2026.

GPT-5.5

OpenAI — Strongest agentic coding model from OpenAI to date. 82.7% Terminal-Bench 2.0 (SOTA), 58.6% SWE-Bench Pro (Opus 4.7 leads at 64.3%), 73.1% Expert-SWE (long-horizon tasks, 20-hour median human completion; up from GPT-5.4's 68.5%), 84.9% GDPVal. Released April 23, 2026 (ChatGPT/Codex); API + GitHub Copilot Pro+/Business/Enterprise GA April 24.

Kimi K2.6

Moonshot AI — Open-source multimodal agent orchestrating up to 300 sub-agents executing 4,000 sequential coordinated steps. Targets long-horizon autonomous software engineering. Released April 20, 2026.

Claude Opus 4.7

Anthropic — 87.6% SWE-bench Verified, best publicly available coding agent score until Gemini 3.5 Pro at I/O. Improved coding, sharper vision, self-verification. Released April 16, 2026.

Composer 2.5

Cursor (Anysphere) — first tool-vendor in-house model claiming parity with frontier labs. 79.8% SWE-Bench Multilingual (vs Opus 4.7 80.5% — tied), 63.2% CursorBench v3.1 (vs Opus 4.7 61.6% — leads). Pricing $0.50/M input + $2.50/M output — ~10× cheaper per token than Opus 4.7. Built on Kimi K2.5 base with 85% of compute spent on Cursor's RL post-training pipeline (25× more synthetic coding tasks than predecessor). Released May 18, 2026.

Gemini 3.5 Flash

Google — Flash-tier model outperforming Gemini 3.1 Pro on coding and agentic benchmarks: 76.2% Terminal-Bench 2.1 (vs 70.3% for 3.1 Pro), 83.6% MCP Atlas, GDPval-AA 1656 Elo, 84.2% CharXiv Reasoning. 4× faster than comparable frontier models at API tier; 12× faster inside Antigravity 2.0. Pricing $1.50 / $9.00 / $0.15 cached per 1M tokens — ~40% cheaper than Gemini 3.1 Pro on input and output. Generally available May 19, 2026 (Google I/O); Gemini 3.5 Pro rolling out June 2026.

Qwen3.7-Max

Alibaba Cloud — agent-first design with 1M-token context and native extended-thinking mode. SWE-Verified 80.4 (tied with Opus 4.6 Max 80.8 and DeepSeek V4-Pro Max 80.6), SWE-Pro 60.6 (public best), Terminal-Bench 2.0 69.7, MCP-Atlas 76.4, GPQA Diamond 92.4, KernelBench L3 96% acceleration rate. 35-hour autonomous run, 1,158 tool calls without human intervention; delivered 10× speedup on a GPU kernel the model had never seen during training. Pricing $2.50 / $7.50 / $0.25 cached per 1M tokens. Announced May 20, 2026 at Alibaba Cloud Summit Hangzhou (API live May 19).

The signal: In nine weeks (April–June 2026), the public record for coding agent benchmarks shifted from Claude Opus 4.6 (80.8%) → Gemini 3.5 Pro (89.1%, Google I/O May 19) → Claude Opus 4.8 (91.2%, June 2026) — with Mythos's restricted 93.9% still the unreleased ceiling. Multi-agent swarm scaling — exemplified by Opus 4.8's 1,000-concurrent-subagent Dynamic Workflows and Qwen3.7-Max's 1,158-tool-call autonomous run — is the new frontier. Cost-per-token competition is the second front: Cursor Composer 2.5 ($0.50/$2.50), MAI-Code-1-Flash (60–70% below GPT-4o), and Qwen3.7-Max ($2.50/$7.50) all hit parity with prior frontier models at fractions of Opus 4.8's $15/$75 bill. The emerging billing inflection: Anthropic's June 15 credit pool change and GitHub's June 1 Copilot metering together mark the end of "unlimited AI" for agentic/CLI workloads — the inference economics now include a mandatory cost-awareness layer for any team running automations at scale.

Revenue & Growth

$2.5B+

Claude Code ARR

$445M

Devin ARR (CEO Scott Wu disclosure, May 12, 2026 — up from $73M in June 2025; one of the fastest enterprise software ARR climbs on record)

$2B+

Cursor ARR (~$50B valuation, April 2026)

20M+

GitHub Copilot paid users (April 2026)

$50M

Emergent AI ARR in 7 months

$492M

Cognition combined ARR (Devin $445M + Windsurf ~$47M, per CEO disclosure May 30, 2026 — updated from $480–520M estimate)

IPO

Anthropic confidentially files for IPO with the SEC (confirmed June 1–2, 2026) — post-$965B Series H; Oct 2026 listing track intact; joining OpenAI and xAI in the race to public markets

June 15

Anthropic ends subscription subsidy for agents — Agent SDK, claude -p, Claude Code GitHub Actions move to credit pool billing ($20/$100/$200 monthly by plan, standard API rates). Act now — 12-day deadline. See Prompt 17.319 for the Credit Pool Budget Planner.

LIVE

GitHub Copilot AI Credits billing is NOW ACTIVE (confirmed June 1, 2026). Legacy per-seat unlimited model ended today. Developers pay $0.01/credit for chat, CLI, and agent sessions. Code completions remain unlimited and free. Set billing alerts immediately. See full breakdown →

$4B+

AI coding agent category aggregate ARR — Cursor + Copilot + Cognition + Claude Code (May 2026)

78%

Devin 2.3 autonomous PR merge rate (SWE-1.7 training, May 2026 — up from 70% at SWE-1.6)

Valuations (2026)

$965B → IPO

Anthropic — Series H closed May 28, 2026 at $965B; confidential SEC IPO filing confirmed June 1–2, 2026; Oct 2026 public listing track intact. ARR $30B+. Largest private AI raise in history. Claude Opus 4.8 released with 91.2% SWE-bench Verified (new public SOTA).

$350B

Anthropic valuation — Google commits $40B ($10B immediate + $30B contingent) at April 24, 2026. Largest single AI investment in history.

$28B

Cognition — Series D ($25B, May 6) + $1B extension (May 27) = $28B valuation ($492M combined ARR: Devin $445M + Windsurf ~$47M, per CEO Scott Wu). Now #2 AI developer tools valuation behind Cursor ($50B+).

~$50B

Anysphere (Cursor) — confirmed April 2026

$950M

Sierra AI raised (May 2026) — Bret Taylor's enterprise AI customer experience platform, total capital $1B+

$26.6B

Cerebras IPO track (May 2026) — AI chip maker backed by OpenAI partnership, signaling AI hardware boom

$30B

Anthropic ARR (April 2026 — 3x jump from $9B at end of 2025)

$24B

OpenAI ARR (April 2026 — $2B/month)

$6.6B

Lovable ($400M ARR, 200K projects/day)

$9B

Replit ($400M Series D, Mar 2026 — tripled in 6 months)

Enterprise AI Momentum (May 2026)

The enterprise AI services market is consolidating fast. Anthropic partnered with Blackstone, Hellman & Friedman, and Goldman Sachs to launch a dedicated enterprise AI services company — targeting mid-sized organizations that lack in-house frontier AI deployment capacity. Meanwhile Sierra ($950M) and Cognition ($25B valuation) signal that enterprise AI customer experience and AI software engineering are becoming independent category leaders.

May 2026 enterprise anchors:

SAP + Anthropic (May 13, 2026): Claude will power SAP's Business AI Platform as primary reasoning and agentic layer — reaching 440M+ SAP users and enabling autonomous enterprise tasks (closing books, rerouting supplier orders) within existing governance frameworks.
SpaceX + Anthropic (May 6, 2026): 300 megawatts of compute from SpaceX's Colossus 1 facility in Memphis (220,000+ Nvidia processors). Anthropic's largest capacity expansion to date, reducing API rate-limit constraints.

The signal: Total disclosed AI venture capital through Q1 2026 already exceeds all of 2025. Anthropic's Series H closing at $965B (May 28) and subsequent confidential IPO filing (June 1–2) mark the definitive inflection from venture bets to public market positioning. Cognition at $28B/$492M ARR and Cursor at $50B/$2B+ ARR confirm that the AI developer tools category has matured from speculation to durable revenue at scale. The April 2026 adoption flip (Claude #1 in daily active use, 51% AI commits milestone) is the market validating this thesis with payment and behavior data. The June 2026 billing inflection: GitHub's June 1 Copilot metering and Anthropic's June 15 agent credit pool change together signal the end of subsidized AI for automated workloads — the cost of scale is now visible on the invoice.

Productivity

Faster project completion

10-14x

Faster agent migrations vs. human

500K

Developer hours saved (TELUS, 2025-26)

1,000+

PRs/week via AI agents (Stripe)

75%

Reduction in PR turnaround time for AI-tool teams (9.6 days → 2.4 days, Index.dev 2026)

3.6 hrs

Average time saved per developer per week (survey median, April 2026)

Developer Sentiment (April 2026)

Developers using AI tools (JetBrains 2026)

Professional developers using AI tools daily (SonarSource 2026)

Developers who have started using AI agents (April 2026)

Developers with "high trust" in AI output (down from 70%+ in 2023)

Developers frustrated by "almost right" AI solutions (top complaint, SonarSource)

Professional devs adopted vibe coding

Cultural Impact

Collins Dictionary Word of the Year 2026: "Vibe coding" (named again after 2025)
MIT Technology Review: Named "Generative Coding" a 2026 Breakthrough Technology
Merriam-Webster: Added as slang/trending term within one month of Karpathy's tweet
Wikipedia: Full article with extensive sources and analysis
Wall Street Journal: Reported widespread professional adoption (July 2025)
Fast Company: Documented the "vibe coding hangover" (September 2025)
arXiv: "Vibe Coding Kills Open Source" paper sparks open-source funding debate (January 2026)
VibeX 2026: First academic workshop on vibe coding, scheduled at EASE conference in Glasgow
Mainstream: Vibe coding is now a recognized methodology taught in bootcamps and referenced in enterprise strategy documents

← Previous Next: The Dark Side →

10. The Dark Side: Security, Debt, and Failure

Updated May 24, 2026

For every success story, there's a cautionary tale. The risks are real, documented, and in some cases severe.

The Tenzai Security Study

🔒

In December 2025, security startup Tenzai tested five major tools — Claude Code, OpenAI Codex, Cursor, Replit, and Devin — building three identical test applications each. Across **15 apps**, they found **69 vulnerabilities**: ~45 low-medium, the rest high or critical.

  **Key finding:** AI tools avoid generic security flaws but struggle where what makes code safe vs. dangerous depends on context.

</div>

AI code with security vulnerabilities

AI code with exploitable bugs

Developers who trust AI accuracy (down from 43%)

Practitioners who say AI code is "fast but flawed"

CVEs from AI-generated code in March 2026 alone (27 from Claude Code)

400–700

Estimated AI code vulnerabilities per month (incl. unpublished CVEs)

The Acceleration: 35 CVEs in One Month

The security threat from AI-generated code is not static. It is accelerating. In March 2026, security researchers confirmed 35 CVEs directly attributable to AI-generated code — 27 of them from Claude Code alone. Researchers from the CERT/AI Working Group estimate the actual monthly count including triaged-but-unpublished vulnerabilities is 400 to 700 per month.

The trend is steep and mirrors adoption curves:

Month	Confirmed AI Code CVEs	Estimated Total
Jan 2026	12	250–350
Feb 2026	21	310–450
Mar 2026	35	400–700

The root cause is structural: AI coding tools generate code that compiles and passes tests, but they optimize for functional correctness rather than security context. A model trained on decades of existing internet code learns the prevalence of insecure patterns alongside secure ones — and reproduces them with equal confidence. As AI-generated code's share of all new code climbs toward 41% (GitHub, March 2026), the absolute volume of AI-sourced vulnerabilities scales with it.

The deeper concern: the vulnerability rate is growing faster than the adoption rate, suggesting the tools are getting worse at security relative to their capability growth.

⚠

**IDEsaster Disclosure (Early 2026):** Security researchers found **30+ vulnerabilities across every major AI IDE**, resulting in **24 CVEs assigned** and putting an estimated **1.8 million developers** at risk. AI-generated code was found to be **2.74x more likely** to introduce XSS vulnerabilities than human-written code.

</div>

Documented Security Incidents

24 CVEs

IDEsaster — All Major AI IDEs

30+ vulnerabilities found across every major AI IDE. 1.8 million developers at risk. AI code 2.74x more likely to introduce XSS.

CVE-2025-54135

CurXecute — Cursor IDE

Malicious MCP server responses could execute arbitrary commands on developers' machines.

CVE-2025-55284

Claude Code DNS Exfiltration

Data exfiltration from developer computers through DNS requests.

PROMPT INJECTION

Windsurf Memory Poisoning

Malicious code comments poisoned Windsurf's long-term memory, enabling silent data theft over months.

PROMPT INJECTION

Gemini CLI Code Execution

Asking the Gemini CLI to analyze a project triggered a malicious injection hidden in a readme.md file.

MASS VULN

Lovable Supabase RLS Crisis (March 2026)

Researchers analyzed 1,645 Lovable-generated apps and found critical Row Level Security misconfigurations in 170 of them (10.3%). Affected apps exposed user data to any authenticated user. A separate CodeRabbit study confirmed AI-generated code has 2.74x higher security vulnerability rates than human code, with 1.7x more "major" issues per 1,000 lines. Source: RedReamality (March 15, 2026).

CVE-2025-48757

Base44 Platform

Unauthenticated access vulnerability exposed 170+ production applications built on the platform.

DATA BREACH

Tea App

Basic authentication failures in an AI-generated app leaked 72,000 user IDs and selfies.

CVE-2026-21858

n8n Remote Code Execution (CVSS 10.0)

Unauthenticated RCE allowing full server takeover on ~100,000 n8n automation servers. The highest possible CVSS score.

SUPPLY CHAIN

SANDWORM_MODE npm Worm

First malware to install rogue MCP servers, poisoning AI coding assistants to exfiltrate API keys. Self-replicates by stealing npm tokens and republishing victims' top 20 packages. Spread through 19 typosquatted packages.

MCP ATTACK

MCP Server Injection Crisis (8,000+ Servers)

92% exploitation probability at 10 MCP plugins. 72.8% attack success rate across 45 real-world servers. 36.7% of 7,000+ servers have SSRF exposure. More capable AI models are more vulnerable to MCP-based prompt injection.

CVE-2025-59536

Claude Code Remote Code Execution (CVSS 8.7)

High-severity RCE vulnerability in Claude Code's project file handling. Attackers could craft malicious repository files to execute arbitrary commands on a developer's machine when Claude Code processed the project. Patched in Claude Code 1.9.3.

CVE-2026-21852

Agentic IDE File Exfiltration via Tool Misuse

Vulnerability in multiple agentic IDE integrations allowing prompt-injected instructions to abuse legitimate file-read tools for exfiltrating source code, .env files, and SSH keys to attacker-controlled servers — without triggering standard security controls.

CVE-2026-33017 • CISA KEV • CVSS 9.3

Langflow Unauthenticated Remote Code Execution (Active Exploitation)

Critical unauthenticated RCE in Langflow — the open-source AI workflow builder widely used by vibe coders to prototype LLM pipelines. No authentication required for exploitation. Added to CISA KEV list March 2026 with patch deadline April 8. Actively exploited in the wild. Affects all Langflow versions prior to the March 2026 patch. If you run Langflow locally or self-hosted, treat this as an emergency patch. Source: CISA KEV, NVD.

CVE-2025-32432 • CISA KEV • CVSS 10.0

Craft CMS Code Injection — Maximum Severity

CVSS 10.0 code injection vulnerability in Craft CMS — a common CMS backend choice in AI-generated web projects. Added to CISA KEV with patch deadline April 3. The maximum CVSS score means any authenticated user (or in some configurations, unauthenticated) can execute arbitrary code on the server. Vibe-coded projects using Craft as their CMS backend should patch immediately or temporarily disable public access.

CVE-2025-54068 • CISA KEV • CVSS 9.8

Laravel Livewire RCE — Nation-State Attribution

Critical RCE in Laravel Livewire with nation-state actor attribution confirmed by threat intelligence sources. Added to CISA KEV with patch deadline April 3. Laravel is one of the most frequently suggested PHP frameworks in AI coding assistants — a large percentage of AI-generated web projects use it. This isn't a theoretical risk: active exploitation with sophisticated threat actors is confirmed. Patch immediately.

AI as Vulnerability Hunter: The Other Side of the Coin

🔎

**Claude Opus 4.6 Finds 22 Firefox CVEs (March 2026):** In a partnership with Mozilla, Anthropic's Claude Opus 4.6 autonomously analyzed Firefox's C++ codebase and identified **22 previously unknown CVEs**. The model found memory safety vulnerabilities, use-after-free bugs, and buffer overflows that human reviewers had missed. This demonstrates a dual reality: the same AI capability that generates vulnerable code can also find vulnerabilities at scale — the question is who uses it first, defenders or attackers.

</div>

The Threat Landscape: Ransomware Meets AI

The broader cybersecurity environment compounds the risk of insecure AI-generated code. As of early 2026, there are 124 active ransomware groups — a 49% year-over-year increase. These groups are increasingly using AI to generate phishing lures, analyze codebases for vulnerabilities, and automate lateral movement. The intersection of AI-generated insecure code and AI-accelerated exploitation creates a compounding threat surface.

The AI Slopageddon: Open Source Fights Back

By early 2026, a new phenomenon emerged that open-source maintainers dubbed the "AI Slopageddon" — a flood of low-quality, AI-generated bug reports, pull requests, and security "findings" overwhelming popular projects:

cURL: Daniel Stenberg reported a deluge of AI-generated vulnerability reports so poor they were "worse than spam" — wasting maintainer time triaging hallucinated CVEs. He began publicly shaming the worst offenders and lobbied HackerOne to penalize AI-slop submissions.
Ghostty: The terminal emulator project implemented explicit policies rejecting AI-generated contributions after a wave of superficially plausible but fundamentally broken PRs.
tldraw: The collaborative whiteboard project documented a pattern of AI-generated issues that described bugs that didn't exist, in code paths that didn't exist, with reproduction steps that couldn't work.

The pattern is consistent: AI tools lower the barrier to appearing competent enough to submit contributions, but the submissions lack the understanding that makes them useful. Maintainers are now spending significant time filtering AI slop instead of building software — an ironic cost of the productivity tools meant to help them.

The $1.5 Trillion Technical Debt Problem

Analysts have warned of a potential $1.5 trillion in technical debt by 2027 from AI-generated code:

41% higher code churn — AI code gets rewritten more often
8x increase in duplicated code blocks (GitClear, 2024)
30% of AI suggestions accepted in professional environments

Forrester: 75% of tech leaders will face moderate-to-severe tech debt by 2026

The "Vibe Coding Hangover"

By late 2025, Fast Company reported senior engineers entering "development hell" maintaining vibe-coded systems:

🧬

Zombie Apps

Functional but unmaintainable

🍝

Spaghetti Code

Works but no coherent structure

🚧

Complexity Ceiling

Can't extend without breaking

😶

Debug Impossibility

Nobody can trace the code they never read

The AI Attack Acceleration Problem (2026)

The same capabilities that democratized vibe coding have democratized sophisticated cyber attacks. In 2026, AI has compressed timelines across the entire threat lifecycle:

28.3%

CVEs exploited within 24 hours of disclosure (2026) — up from ~3% in 2022

44 days

Median time-to-exploit (2025) — down from 700+ days in 2020

+75%

Malicious packages on public repos year-over-year (2026)

AI tools now enable attackers to analyze CVE disclosures and generate working exploit code within hours of the NVD advisory going public, scan public repositories for vulnerable dependency trees at scale, and produce convincing malicious packages complete with fake README files and CI badges. The 24-hour exploitation window means that for more than one in four CVEs published in 2026, the gap between "disclosure" and "active exploitation" is measured in hours, not months.

For vibe coders, this creates a specific exposure: AI coding assistants suggest high-density dependency trees (a 500-line Express API may have 80+ transitive dependencies), and the vibe coding workflow optimizes for shipping rather than security audit cadence. Running npm audit at the end of a sprint is no longer adequate when 28.3% of CVEs are already being exploited by the time your sprint ends.

⚠

Minimum security cadence for vibe coding in 2026: Run npm audit --audit-level=high or pip-audit before every production deploy. Subscribe to CVE alerts for your exact dependency stack. Treat every AI-recommended package as requiring a 30-second verification before acceptance. See Chapter 19 for the full security playbook — and CyberOS for automated CVE alerting on the vibe coding stack.

Source: The Hacker News, "2026: The Year of AI-Assisted Attacks" (May 4, 2026); EPSS v4 exploitation data (FIRST, Q1 2026); Phylum Software Supply Chain Security Report (Q1 2026).

The Prototype Pollution Wave: JavaScript's Hidden AI Vulnerability

April 2026 brought a concentrated cluster of prototype pollution vulnerabilities across the JavaScript ecosystem — a vulnerability class that AI coding tools are particularly prone to introducing and uniquely bad at detecting. Prototype pollution occurs when an attacker can inject properties into Object.prototype, the root object that every JavaScript object inherits from. Once polluted, the attacker can override behavior across the entire application — enabling authentication bypass, remote code execution, or denial of service.

Why does vibe coding amplify the risk? AI assistants trained on historical code learn to suggest patterns like obj[key] = value and Object.assign(target, userInput) without the defensive checks that distinguish safe from unsafe usage. The resulting code passes tests — it works exactly as specified — but opens a lateral attack surface that code review and automated scanners frequently miss.

⚠

Prototype Pollution in Context: In a CodeQL analysis of 10,000 AI-generated Node.js projects (April 2026), researchers found prototype pollution sinks in 38% of projects that accepted user-controlled JSON input — compared to 11% in a matched sample of human-written code. The gap is attributed to AI models treating JSON.parse(userInput) as a solved problem and rarely adding the downstream sanitization that safe usage requires.

CVE-2026-40175 • CVSS 8.8

Axios Prototype Pollution — Billions of Installs Affected

A high-severity prototype pollution vulnerability discovered in Axios, the most widely used HTTP client library in the JavaScript ecosystem with over 50 billion npm downloads. A crafted response header from an attacker-controlled server could corrupt Object.prototype in the consuming application, enabling property injection across the entire runtime. Because AI assistants (Claude Code, Cursor, Copilot) recommend Axios in virtually every Node.js and browser project, the blast radius is extraordinary: an estimated 40–60% of vibe-coded JavaScript projects use Axios for API calls. Patch: upgrade to Axios ≥1.9.1. Audit any project that processes API responses without explicit header sanitization.

#### CVE-2026-40175 and LLM-Generated Node.js Code: Why Axios Is the Canary

The Axios prototype pollution vulnerability is not simply a library bug — it is a systematic exposure created by how AI coding assistants generate Node.js code. When a developer prompts Claude Code, Cursor, or Copilot to "add an API integration" or "fetch data from this endpoint," the model's near-universal first choice is Axios: it appears in training data more than any other HTTP client, its ergonomics fit naturally into the request-response patterns LLMs generate, and it is recommended in virtually every Stack Overflow thread the models ingested. The problem is that LLM-generated Axios code consistently skips the input sanitization step between receiving an API response and merging its data into application state — the exact pathway that CVE-2026-40175 exploits.

In a CodeQL analysis of 10,000 AI-generated Node.js projects reviewed after the disclosure, researchers found that 73% of projects using Axios processed API response data with Object.assign() or spread operators without intermediate sanitization — the precise pattern that allows a malicious server response to poison Object.prototype. Human-written code in the same study showed a 31% rate for the same pattern, suggesting the gap is not incidental but structural: AI models optimize for the terse, readable code that ships fast, and defensive sanitization is verbose, "ugly," and rarely present in the training examples the models emulated. The risk is compounded in vibe-coded apps because the developer often never reads the Axios integration code — the AI generated it, it worked, and it shipped.

For any vibe-coded Node.js application that calls external APIs with Axios, the mitigation is a two-step fix: upgrade to Axios ≥1.9.1, and add JSON.parse(JSON.stringify(responseData)) or a schema-validation library like Zod between the API response and any Object.assign or spread merge. CyberOS users receive automated CVE alerts scoped to their exact dependency versions — including pinned Axios version monitoring — so the patch window shrinks from weeks to hours. See Chapter 17, Prompt 17.255 for a ready-to-use audit prompt that scans any AI-generated codebase for unguarded Axios response merges and generates the sanitization patch automatically.

CVE-2026-21710 • CVSS 7.5

Node.js Core Prototype Pollution via URL Parsing

A prototype pollution vulnerability in Node.js's built-in URL parsing module (url.parse) that affects all Node.js versions prior to the April 2026 security release. Specially crafted URLs passed to url.parse() can set arbitrary properties on Object.prototype, potentially overriding security-critical properties like isAdmin, authenticated, or role if the application checks these properties after URL parsing. This is especially dangerous in vibe-coded authentication flows, where AI-generated middleware often checks authorization properties on request objects derived from the parsed URL path. Patch: Node.js 20.19.2, 22.14.1, and 24.0.2. Avoid url.parse() — use the WHATWG URL constructor instead.

CVE-2026-39987 • CISA KEV • CVSS 9.1

Marimo AI Notebook — Arbitrary Code Execution (Active Exploitation)

Critical code execution vulnerability in Marimo, the reactive Python notebook and app builder that has become a staple tool for AI researchers and vibe coders building data dashboards and ML prototypes. The vulnerability stems from unsafe deserialization of notebook state — a pattern that AI assistants frequently introduce when generating notebook persistence or sharing features. Added to the CISA Known Exploited Vulnerabilities (KEV) catalog in April 2026 with a mandatory patch deadline. Active exploitation has been observed targeting data science teams and AI research infrastructure. Patch: upgrade to Marimo ≥0.11.4; disable public sharing of notebook state until patched. For real-time CVE tracking across the vibe coding stack, see EndOfCoding.com security briefings.

Supply Chain Injection Risks in AI-Generated package.json Dependencies

A second, underappreciated threat vector emerges at the moment an AI coding assistant writes a package.json or requirements.txt file: the dependency selection itself can be an attack surface. LLMs generate dependency lists from training data that may include packages that have since been abandoned, taken over by new owners, or never existed under the exact name suggested — a class of attacks known as dependency confusion and typosquatting injection. When a model confidently suggests axios-extensions, react-query-utils, or express-validator-pro, it is pattern-completing from training data that may not map to the legitimate npm package at that exact name in 2026. Attackers actively register names that fit these plausible-sounding patterns, publish packages with malicious install scripts, and wait for AI-generated package.json files to pull them in.

The attack surface is broader than just invented names. AI coding tools frequently suggest packages that were legitimate at training time but have since been abandoned and transferred to new npm accounts with no security review. npm's ownership transfer process does not invalidate existing installs — a package downloaded a year ago under a trusted maintainer may pull a malicious update today because the namespace was transferred to an unknown party. In a 2026 audit of 5,000 AI-generated package.json files, security researchers found that 12% contained at least one package with an ownership change in the prior 18 months and no corresponding version pin — meaning any npm install would silently fetch whatever the new owner published. For Python, the risk is compounded by PyPI's less restrictive ownership model and the model tendency to suggest packages it saw in tutorials that have since been unmaintained for two or more years.

The mitigation for vibe coders is systematic rather than reactive: use exact version pinning (=1.9.1 rather than ^1.9.1) in production lock files, run npm install --ignore-scripts for initial installs to prevent malicious postinstall hooks, verify every AI-suggested package on npmjs.com or PyPI before accepting it (30-second check: download count, last publish date, owner account age), and enable GitHub Dependabot with allow: [ecosystem: npm] filtering to flag unexpected ownership changes. CyberOS provides automated dependency provenance monitoring — flagging packages where the publisher identity changed between your last install and today — as part of its vibe coding security dashboard. The full dependency vetting checklist is in Chapter 17, Prompt 17.256, and the Chapter 19 Security Playbook section on supply chain hygiene covers lockfile auditing in depth.

💡

Audit Your Vibe-Coded Projects Now: Run npm audit (JavaScript) or pip-audit (Python) on every AI-assisted project in your stack. For prototype pollution specifically, add a CodeQL or Semgrep scan targeting prototype pollution sinks. The Chapter 19 Security Playbook includes a 30-minute security checklist covering prototype pollution detection and remediation for the most common vibe coding stacks — and Chapter 17 (Category 42) includes ready-to-use Security Audit prompts you can run against any AI-generated codebase today.

The First Agentic-Vector CVE: Cursor RCE via Git Hooks

A new attack category arrived in May 2026 — one that specifically targets the way AI coding agents interact with repositories. CVE-2026-26268 is the first documented agentic-vector CVE: a vulnerability where the attack surface is not a traditional application endpoint, but the AI agent itself.

CVE-2026-26268 • CVSS 8.1

Cursor IDE — Remote Code Execution via Malicious Git Hooks

A remote code execution vulnerability in Cursor IDE triggered by cloning a repository containing malicious .git/hooks/ scripts. When Cursor's agent automatically reads and indexes a freshly cloned project — its standard behavior for providing code context — specially crafted hook files are executed with the user's local privileges. Unlike traditional RCE vulnerabilities that require a running server, this attack surface is the developer workflow itself: clone → agent reads → hooks execute. The attack can be embedded in any GitHub repository, including open-source projects, interview take-home assignments, and contractor-submitted codebases. Patches: Cursor 0.48.3+ adds a "Safe Clone" confirmation dialog and sandboxes hook execution. Mitigation for all AI coding tools: run git config core.hooksPath /dev/null before opening any unfamiliar repo in an AI agent, or use git clone --no-local --template=/dev/null. See Chapter 17 (Prompt 17.241) for a complete pre-clone security checklist prompt.

The significance of CVE-2026-26268 extends beyond its CVSS score. It represents a structural shift in the threat model for AI-assisted development:

🚫

The Agentic Attack Surface: Traditional security assumes the developer is a human who reads files before executing them. AI coding agents violate this assumption — they read, index, and act on repository contents automatically and at machine speed. CVE-2026-26268 exploits exactly this behavior. Every AI coding tool that auto-indexes cloned projects has a version of this exposure. The mitigations (sandboxed hooks, explicit confirmation dialogs) are patches on a fundamentally new attack surface that did not exist before the agent era.

Property	CVE-2026-26268 (Agentic Vector)	Traditional IDE RCE
Trigger	Agent auto-reads cloned repo	User opens malicious file
Attack speed	Milliseconds after clone	Requires user action
Visibility	Zero — no UI interaction	File open dialog
Delivery channel	Any public GitHub repo	Phishing, drive-by
Mitigation complexity	Per-tool, behavior-dependent	Standard sandboxing

ACM Formal Warning: The First Standards Body Intervention

In May 2026, the Association for Computing Machinery (ACM) — the world's largest computing professional society — issued a formal warning on vibe coding risks. This is the first intervention by a major computing standards body, marking a shift from community debate to institutional concern.

⚠

ACM Technical Advisory (May 2026): The ACM Software Engineering Technical Council warned that AI-assisted "vibe coding" practices introduce systemic risks when used without adequate verification frameworks. The advisory specifically cited: (1) insufficient testing of AI-generated code before production deployment, (2) security vulnerability rates significantly higher than hand-written code, (3) maintainability and technical debt risks from AI-generated code that passes tests but fails under edge cases, and (4) professional liability questions when AI-generated software causes harm. The ACM stopped short of recommending against vibe coding, instead calling for "structured human oversight at critical decision points" — a position that aligns with what serious practitioners already do.

The ACM warning lands in a context where vibe coding has moved well beyond hobbyist projects. According to GitHub's March 2026 data, AI-generated code now represents 41% of all new code committed to public repositories. At that scale, the ACM's concern is not academic — it is about the systemic risk profile of a majority-AI code base in production systems.

What the ACM is recommending aligns with the practical guidance throughout this book:

Human review at architecture decision points
Automated testing that covers security, not just functional correctness
Verification workflows before agentic deployments (see Chapter 17, Prompt 17.240)
A "Software 3.0 readiness" assessment before delegating critical logic to AI agents

The Mini Shai-Hulud: First SLSA-Certified Malware (May 2026)

The supply chain attack landscape reached a new milestone in May 2026 when attackers compromised 42 @tanstack/* packages (84 versions, 12M+ weekly downloads) along with @mistralai packages — in what security researchers dubbed the Mini Shai-Hulud attack. Its significance isn't the scale, but the method: it produced the first documented npm worm generating validly-attested SLSA Build Level 3 malicious packages.

⚠

SLSA Level 3 No Longer Guarantees Integrity: The Mini Shai-Hulud attack hijacked OIDC tokens from misconfigured GitHub Actions workflows — specifically jobs that combined id-token: write permissions with PR triggers from unprotected branches. The stolen OIDC token was used to publish malicious package versions that carried valid, cryptographically signed SLSA Build Level 3 provenance attestations. Teams relying on SLSA attestation presence as a security signal are now exposed: attestation presence does not equal supply chain integrity if the signing key can be obtained via CI misconfiguration.

SUPPLY CHAIN • CRITICAL

Mini Shai-Hulud — @tanstack/* and @mistralai npm Compromise (May 11, 2026)

Attackers hijacked OIDC tokens from GitHub Actions workflows in the TanStack and Mistral monorepos by exploiting misconfigured CI jobs that combined publish permissions with pull_request triggers accessible to external contributors. The stolen tokens were used to publish 84 malicious package versions across 42 @tanstack/* packages and the @mistralai package family. The malicious versions carried valid SLSA Build Level 3 attestations — signed using the stolen OIDC token during a legitimate Sigstore signing ceremony. Downstream projects that check attestation presence (the standard SLSA verification step) would see these packages as trusted. Why vibe coders are especially exposed: AI coding assistants recommend @tanstack/react-query, @tanstack/router, and @mistralai/mistral-client in virtually every modern React and AI integration project. Any vibe-coded project initialized after May 11 with these packages at latest versions was potentially affected. Immediate actions: (1) Pin @tanstack/* to the last known-good version before May 11 in your lock file; (2) Audit attestation signer identity — not just presence — using gh attestation verify with explicit expected signer; (3) Enable npm's --dry-run and Sigstore transparency log monitoring for all new installs; (4) Move to a private registry proxy with allow-listing for critical packages. Full attestation integrity verification checklist: see Chapter 17, Prompt 17.252.

The Shai-Hulud attack has a second, under-reported dimension: it was an AI ecosystem attack. Both TanStack (the most common React data layer in AI-assisted apps) and Mistral (the API client for a major AI model provider) were targeted simultaneously — not by coincidence. The vibe coding community's standardized tool choices create a concentrated attack surface. When every Claude Code and Cursor project uses the same five packages, compromising those packages is a force multiplier attack on the entire developer ecosystem.

380,000 Corporate Assets Exposed by Vibe-Coding Tool Defaults

Security researchers in May 2026 disclosed a dataset of approximately 380,000 publicly accessible corporate assets — including healthcare records, financial data, and live API credentials — originating from projects built on AI coding platforms. The root cause: insecure default configurations in vibe-coded apps where the AI tools prioritized working quickly over secure-by-default settings.

🚫

The Vibe-Coding Default Configuration Crisis: The 380K exposure is not attributable to any single tool or any single vulnerability. It represents a systemic pattern: AI coding assistants scaffold applications with configurations that work (for development and demo purposes) but are not production-safe. Supabase Row Level Security disabled by default for speed. S3 buckets created public for easy sharing. NEXT_PUBLIC_ env vars used for API keys that should never reach the client. Auth middleware not applied to all routes. The AI tools that generate these patterns were optimizing for the stated goal — build a working app fast — and the security defaults required for production were out of scope for the prompt.

The exposure pattern has five recurring root causes observed across the 380K assets:

Root Cause	Frequency	Example
Supabase RLS disabled	34% of cases	Tables created for MVP with `ENABLE ROW LEVEL SECURITY` never added
Public S3/R2/GCS buckets	28%	AI scaffolds storage with public access for file upload demos
Client-side secrets	21%	`NEXT_PUBLIC_` prefix on API keys, database URLs, service tokens
Missing auth middleware	12%	Dashboard routes not covered by Next.js middleware matcher
Demo data in production	5%	Seeded test records with real-format PII left in production DB

The pattern is predictable: an AI tool builds an MVP quickly, the developer ships it (perhaps even using the same AI tool to deploy), and the dev-safe defaults that were fine on localhost become production exposures at scale. See Chapter 17, Prompt 17.253 for a comprehensive audit checklist to detect all five patterns in your own vibe-coded applications before they reach the 380K statistic.

💡

Pre-Deploy Security Checklist (30 minutes): Before every production deployment of a vibe-coded application, run through the Chapter 19 Security Playbook checklist. The five patterns above are detectable in under 30 minutes with Claude Code — search for RLS policies, bucket permissions, NEXT_PUBLIC_ secrets, middleware coverage, and demo data. The cost of finding these before deploy is 30 minutes. The cost of finding them in a 380K-scale breach report is significantly higher.

The regulatory signal is worth noting. ACM warnings historically precede formal standards and, eventually, regulatory requirements. The EU AI Act's high-risk category definitions are already being interpreted to include AI-assisted code in critical infrastructure. Teams that establish rigorous review practices now will be ahead of the compliance curve.

← Previous Next: The Great Debate →

11. The Great Debate

Updated June 11, 2026

The software community is deeply divided. Understanding the strongest arguments on each side helps you form a nuanced view — and by 2026, both sides have something they didn't have in 2025: evidence. A year of adoption data, security studies, budget reports, and production incidents means this debate is no longer philosophical. Each tab below presents the strongest version of its case, with the receipts.

#### "It's the natural evolution of abstraction."

Programming languages have always moved toward higher abstraction. Assembly to C to Python. Each level lets developers focus on intent rather than implementation. Natural language is simply the next layer.

#### "It democratizes creation."

Millions of people have software ideas but lack years of training. Vibe coding lets a nurse build a patient tracking app, a teacher build a classroom tool, a small business owner build inventory management. The expansion of who can create software is historically significant.

#### "The speed advantage is transformative."

A prototype in hours instead of weeks. An MVP in days instead of months. The 25% of YC companies with 95% AI code didn't choose vibe coding for ideology — they chose it because they needed to move fast.

#### "Traditional code isn't as reliable as we pretend."

Human-written code has bugs, security vulnerabilities, and technical debt too. AI-generated code may have different failure modes, but the idea that human code is inherently reliable is a myth.

#### "The professionals have voted." *(2026)*

The strongest new argument is adoption data. Stack Overflow 2026: **83% of developers use AI tools daily**, up from 44% in 2024. JetBrains 2026: among developers with **10+ years of experience, 46% choose an agentic CLI as their daily driver**. If vibe coding were a junior shortcut that experts reject, the most experienced cohort wouldn't be its heaviest adopters. Meanwhile enterprise agents earned production trust the hard way: Devin 2.3 reached a **78% autonomous PR merge rate** at Goldman Sachs-tier clients (Chapter 9 has the full numbers).

#### "Code you don't understand is code you can't maintain."

Software spending is ~60% maintenance. If nobody understands the codebase, maintenance is impossible. You're not saving time — you're borrowing it from the future at a ruinous interest rate. And it's no longer hypothetical: Stack Overflow 2026 found **54% of companies can't tell which parts of their codebase AI wrote**.

#### "Security requires understanding, not just testing."

You can test whether a login form works. You can't easily test whether passwords are properly hashed, session tokens are cryptographically secure, or APIs have rate limiting — unless you read the code. The 2026 numbers are stark: Veracode found **45% of AI-generated code samples carry at least one OWASP Top 10 vulnerability** — a rate that did *not* improve across test cycles — and AI-assisted teams commit 3-4× faster while introducing findings **10× faster**. Chapter 10 and Chapter 19 catalog the incidents.

#### "It creates learned helplessness."

Developers who rely entirely on vibe coding lose fundamental skills. When the AI makes a mistake in a novel way, they have no fallback. Fragile teams build fragile systems. The 2025 "vibe coding hangover" was this argument playing out in public.

#### "The economics don't work at scale." *(upgraded, 2026)*

The $1.5 trillion tech debt projection was extrapolation; the 2026 budget reports are data. **Uber burned its entire annual AI-tools budget in roughly four months** at $500–$2,000 per engineer per month. **Microsoft pulled thousands of engineers off their preferred agent** over run-rate. Add the supplier side — frontier labs selling inference below cost (OpenAI's S-1 showed near-zero gross margins) — and the skeptic's case writes itself: the productivity math is being computed at subsidized prices, on borrowed maintenance time.

#### Where the debate actually moved in 2026

Three new fronts opened that neither side predicted in 2025.

**1. The open-source erosion question.** The January 2026 *"Vibe Coding Kills Open Source"* paper crystallized it: when agents answer every question, nobody visits the docs, files the issue, or sponsors the maintainer. Tailwind CSS documentation traffic is **down 40% from 2023**. The uncomfortable structure of the problem: every individual choice to ask the agent is rational, and the aggregate effect defunds the commons the agents were trained on — and still depend on. No major tool vendor has shipped a credible answer yet.

**2. The seniority pipeline question.** If agents do the junior work, where does the next generation of seniors come from? The JetBrains data sharpened the paradox: agentic tools reward exactly the evaluation skills that only years of pre-AI code reading built. The industry is consuming a stockpile of judgment it is no longer producing. Proposed answers — AI-native apprenticeships, review-first curricula — exist mostly in blog posts, not in hiring data.

**3. The accountability question.** The May 2026 SymJack/TrustFall disclosures (Chapter 19) and the vendor response — multiple major vendors declining to patch, calling the behavior "working as designed" — opened a governance front: when an agent with your credentials does damage, the responsibility chain (user → tool vendor → model lab) is genuinely unsettled. Enterprises noticed; "no formal AI policy" still describes **47% of companies**, and that number is now read as a liability, not a curiosity.

#### Context Is Everything

The most reasonable position — and the one supported by data — is that vibe coding is a powerful tool with a specific and appropriate scope, and that the interesting argument moved. In 2025 the debate was *whether* to adopt. At 83% daily adoption, that question answered itself. The 2026 debate is about **governance**: which tasks get how much autonomy, who reviews what, who pays, and who's accountable. (Chapter 12 turns that into a working framework.)

<div class="callout success">
  <div class="callout-icon">&#9989;</div>
  <div class="callout-content">
    **It excels for:** prototyping, validation, personal tools, learning, hackathons, and — with review gates — mainstream professional development, where it now simply *is* the workflow.

  </div>
</div>
<div class="callout danger">
  <div class="callout-icon">&#10060;</div>
  <div class="callout-content">
    **It fails for:** unreviewed production systems, security-sensitive paths, regulated industries, and any organization that adopts the speed without the discipline — the pattern behind every incident in Chapter 10.

  </div>
</div>
**The winning model in 2026:** vibe code the 80%, engineer the 20%, and run both under explicit governance — autonomy levels, review gates, budgets. The companies that did this captured the speed without the hangover. The companies that didn't became the case studies.

The critics are not wrong about the risks — on security and economics, the 2026 evidence strengthened their hand. But they remain wrong about the trajectory. Every objection to vibe coding was once made about high-level languages, about frameworks, about cloud computing. The abstraction always wins. The question is never *whether* but *how* — and "how" is precisely what the new fronts (the commons, the pipeline, accountability) are forcing the industry to answer.

The debate evolves monthly — EndOfCoding covers each new front as it opens, and Chapter 21's intelligence brief tracks the running score. For structured practice forming your own position, the critical-thinking module at Vibe Coding Academy stages both sides against real incident case studies.

← Previous Next: When to Vibe →

12. When to Vibe (and When Not To)

Updated June 10, 2026

In early 2025, the question was "should I let AI write my code?" By mid-2026, that question is settled — the Stack Overflow 2026 survey found 83% of professional developers use AI coding tools daily. The real question now has two parts:

How much autonomy do you give the AI for this specific task?
How much human review does the output get before it touches anything that matters?

Get those two dials right and vibe coding is the biggest productivity unlock of your career. Get them wrong and you become a statistic — one of the 170 Lovable apps exposing personal data, or one of the teams Veracode measured shipping OWASP Top 10 vulnerabilities in 45% of AI-generated samples.

This chapter gives you the decision framework. Not vibes about vibes — an actual rubric.

The Five Questions That Decide Everything

Before you open the agent and "give in to the vibes," score the task on five factors:

💥

1. Blast Radius

If this code is wrong, who gets hurt? Just you? Your users? Their money? Their medical records?

↩️

2. Reversibility

Can you roll back in one command, or does a bug mean corrupted data, sent emails, charged cards?

🔒

3. Data Sensitivity

Does the code touch PII, credentials, payments, or health data? Regulation follows the data.

⏳

4. Longevity

Throwaway prototype or five-year codebase? Code you'll maintain deserves architecture you understand.

👥

5. Team Dependency

Will other people build on this? The Stack Overflow survey found 54% of companies can't tell which parts of their codebase AI wrote.

Low scores across the board? Vibe freely. High scores on any single factor? That factor sets your review burden. High scores on three or more? You're not vibe coding anymore — you're engineering with AI assistance, and the difference matters.

Try It: Score Your Task

Answer for the task in front of you right now. The tool applies the rule above and tells you which zone you're in.

1. Blast radius — if this code is wrong, who gets hurt?

2. Reversibility — how bad is a mistake?

3. Data sensitivity — what does it touch?

4. Longevity — how long will this live?

5. Team dependency — who builds on this?

🟢 Green Light: Vibe Code Away

Low blast radius, fully reversible, no sensitive data. Run your agent at high autonomy (Level 4-5 from Chapter 4), review lightly, ship fast.

Prototypes and MVPs — Validate ideas before investing in production engineering. This is the original, canonical use case — the one Karpathy's tweet described.
Internal tools — Dashboards, data scripts, one-off analysis. If it breaks, you fix it tomorrow.
Personal projects — Only you use it, only you depend on it.
Learning — Trying new frameworks, languages, or patterns. Reading AI-generated idiomatic code is one of the fastest ways to learn a new stack.
Hackathons — Speed is everything, longevity is nothing.
UI prototyping — Design exploration and layout testing. The 2026 generation of agents is genuinely excellent at this.
Automation scripts — Repetitive tasks that eat your time. The classic "I spent 3 hours automating a 10-minute task" now takes 10 minutes to automate.
Test generation — AI writing tests for human-reviewed code is one of the highest-leverage, lowest-risk uses in the entire stack.

🟠 Yellow Light: Proceed with Caution

Medium scores. Vibe the first draft, then drop the autonomy level and review like the code came from a fast, talented, occasionally overconfident contractor — because functionally, it did.

Customer-facing apps — Vibe the prototype, then review and harden before real users arrive. The Lovable incident (170 of 1,645 audited apps exposing personal data) happened because builders skipped the second step.
Small SaaS — Viable for launch; plan a hardening pass before you market it. Chapter 14 gives you the phased workflow.
API integrations — Fast to build, but auth flows and token handling need human eyes. This is exactly where Tenzai's December 2025 study found agents quietly cutting corners — 69 vulnerabilities across 15 applications built with 5 major tools.
Mobile apps — UI can be vibe coded; data storage, permissions, and security need attention before store submission.
Team projects — Works if at least one person genuinely understands the architecture. The failure mode is three people each assuming someone else understands it.
Database schema changes — Agents write competent migrations, but schema mistakes are semi-irreversible (factor 2). Review every migration before it runs against data you care about.
Anything consuming AI-suggested dependencies — Supply chain attacks now target AI agents specifically (see the PromptMink campaign in Chapter 19, where npm packages were engineered to be recommended by coding agents). Verify packages exist, are maintained, and are the package you think they are.

🔴 Red Light: Don't Vibe Code

High blast radius, irreversible, sensitive data, regulated. AI can still assist — generating boilerplate, drafting tests, explaining unfamiliar code — but at Level 1-2 autonomy with line-by-line human review. "Forget the code exists" is disqualified here.

Financial systems — Payments, accounting, trading. Money moves are irreversible and attract both regulators and attackers.
Healthcare — Patient data, clinical decisions, HIPAA. Maximum blast radius, maximum regulation.
Auth & authz — Login systems, permissions, tokens. The single most common category in every AI-generated-vulnerability study since 2025. Veracode's May 2026 numbers: 86% of tested samples failed XSS defense, 88% were vulnerable to log injection.
Infrastructure — Server config, network security, deployment pipelines. A wrong firewall rule is invisible until it's an incident.
Regulated industries — SOX, PCI-DSS, GDPR compliance. The auditor will not accept "the agent wrote it" as a control.
Distributed systems — Microservices, message queues, cache invalidation. Agents reason poorly about emergent behavior across service boundaries — the failure isn't in any one file they can see.
Cryptography — Encryption, key management, certificates. Never. Use audited libraries, have humans wire them up.
Your agent's own permissions and config — The May 2026 SymJack and TrustFall disclosures (Chapter 19) showed attackers weaponizing the approval prompts of seven major coding agents. Config that governs what your AI can execute is security-critical code. Treat it accordingly.

One More Factor Nobody Mentioned in 2025: Cost

The 2026 twist is that when to vibe is now also a budget question. Uber's engineers burned the company's entire 2026 AI tools budget in roughly four months at $500–$2,000 per engineer per month; Microsoft canceled Claude Code licenses across an entire division over token costs (full story in Chapter 21). Autonomous agents at full throttle are spectacular — and metered.

The practical rule: match the autonomy level to the task value. Spinning up a 1,000-subagent dynamic workflow to rename a variable is how budgets die. A focused Level 3 session with a tight prompt often ships the same result at 2% of the token spend. Chapter 13 covers token-efficient prompting patterns; Chapter 18's comparison matrix tracks per-token pricing across every major tool.

The Pre-Flight Checklist

Thirty seconds before you start any session:

Scored the five factors? Blast radius, reversibility, data sensitivity, longevity, team dependency.
Picked the autonomy level deliberately? (Chapter 4's five levels — don't default to maximum.)
Is sensitive data in scope? If yes, the relevant sections of Chapter 19's security playbook are mandatory, not optional.
Do you have a rollback? Git committed, database backed up, deploy reversible.
Who reviews the output, and when? "Nobody, never" is an acceptable answer only in the green zone.

💡

**The 80/20 Rule:** For most applications, 80% of the code is boilerplate, UI, and standard patterns that AI handles well. The remaining 20% — authentication, business logic, data integrity, security — deserves human attention. **Vibe code the 80%. Engineer the 20%.** The five-factor score tells you which side of the line any given task sits on.

Want to drill the judgment, not just read about it? The hands-on track at Vibe Coding Academy walks through real green/yellow/red scenarios, and EndOfCoding publishes post-mortems of vibe coding failures as they happen.

Use Case	Daily Frequency	Estimated Tokens/Use	Monthly Tokens
Autocomplete (accepted)	[N/day]	~200	[calc]
Chat Q&A (short)	[N/day]	~2,000	[calc]
Chat Q&A (codebase context)	[N/day]	~15,000	[calc]
Workspace/Agent task (small)	[N/week]	~80,000	[calc]
Workspace/Agent task (large)	[N/week]	~300,000	[calc]
Extension/automated workflow	[N/day]	~50,000	[calc]

Workflow	Frequency	Avg tool calls/run	Avg tokens/run	Critical?
[Workflow 1]	[daily/weekly/per-PR]	[number]	[estimate]	[yes/no]
[Workflow 2]
[Workflow 3]

Action	Autonomous	Requires Approval	Never Allowed
Read data	✓
Send notifications	✓
Write/modify files		✓
Delete data			✗
[your action]

Task Type	Volume/day	Quality Requirement	Current Model	Latency Req
[Task 1]	[N]	[High/Med/Low]	[Model]	[< Ns]
[Task 2]	[N]	[High/Med/Low]	[Model]	[< Ns]
...	...	...	...	...

Model	Strengths	Weaknesses	Cost/1M tokens	Latency
Claude Opus 4.6	Complex reasoning, long context, coding	Cost, latency	$[X in / $Y out]	[Ns]
Claude Sonnet 4.6	Balanced quality/speed, coding	Less reasoning depth	$[X in / $Y out]	[Ns]
Claude Haiku 4.5	Speed, cost, simple tasks	Complex reasoning	$[X in / $Y out]	[Ns]
Kimi K2.6 (open-source)	Coding benchmarks, lower cost	Self-hosted infra required	$[X]	[Ns]
[Other model]	[Strengths]	[Weaknesses]	$[cost]	[latency]

Task Type	Current cost/day	Projected cost/day	Quality change
[Task 1]	$[X]	$[Y]	[Same/Better/Slightly worse]
...	...	...	...
Total	$[X]/day	$[Y]/day

Scenario	Gemini 2.5 Pro (2M tokens)	Claude Opus 4.6 (200K tokens)	Claude Sonnet 4.6 (200K tokens)
Fits in single context?	[Yes/No]	[Yes/No]	[Yes/No]
Cost per session	$[X]	$[X]	$[X]
Latency (first token)	[Ns]	[Ns]	[Ns]
Quality for this task	[rating]	[rating]	[rating]

Tool	Role in workflow	Daily users	% of team	Use cases
[Claude Code]	[primary dev agent]	[N]	[%]	[code gen, review, debug]
[GitHub Copilot]	[inline completion]	[N]	[%]	[autocomplete]
[Other]	[...]	[...]	[...]	[...]

Metric	Survey Benchmark	Your Team	Gap	Priority
Daily AI tool usage	83%	[%]	[+/-]	[H/M/L]
Formal AI policy exists	53%	[yes/no]	—	[H/M/L]
AI-generated code > 50%	38%	[%]	[+/-]	[H/M/L]
Prompt engineering training	33% trained	[%]	[+/-]	[H/M/L]

Pattern	Status	Issues Found	Fixed
Supabase RLS	PASS/FAIL/N/A
Firebase Rules	PASS/FAIL/N/A
Client Bundle Keys	PASS/FAIL/N/A
CI/CD Secrets	PASS/FAIL/N/A
Admin Endpoints	PASS/FAIL/N/A

Tool	Price	Best For	Key Feature	Security Concern
Cursor	$20/mo + Composer 2.5 usage	Full-stack dev, large codebases, agent loops	Composer 2.5 (79.8% SWE-Bench Multilingual at $0.50/M input + $2.50/M output, ~10× cheaper than Opus 4.7); Cursor 3.3 PR Review + Build in Parallel; Cursor in Jira and MS Teams (May 2026)	CVE-2026-26268 git-hook RCE (CVSS 9.9, patched April 2026); CurXecute (CVE-2025-54135)
Windsurf (Cognition)	$20/mo Pro / $200/mo Max (raised May 2026)	Long-context projects, Devin-bundled workflows	Windsurf 2.0 Agent Command Center + Spaces; Devin Cloud and Devin Terminal CLI bundled into paid tiers	Memory poisoning via prompt injection
VS Code + Copilot	$10/mo Pro ($15 included usage from June 1) / $39 Pro+ ($70 included)	AI without switching editors; usage-based billing from June 1, 2026	Agent Mode GA; CLI v1.0.48 shows per-token model prices in picker; unified sessions view; global custom agents at ~/.copilot/agents/	Lower autonomy = lower blast radius; AI Credits meter Chat/CLI/cloud agents (completions stay unlimited, free)

Tool	Price	Best For	Autonomy	Differentiator
Claude Code	Usage-based + Pro/Max plans (5-hour limits doubled May 6, 2026; peak-hour throttling removed on Pro/Max)	Enterprise codebases	High (subagent teams, Remote Agents up to 72h, Dynamic Workflows up to 1,000 concurrent subagents)	$30B+ ARR, 88.6% SWE-bench Verified (Opus 4.8, May 28, 2026), Claude Code 3.0 Remote Agents + Persistent Memory + Skills Registry, 1.2M active users
Devin (Cognition)	$500/mo standalone; bundled into Windsurf Pro/Max/Teams	Async tasks, migrations	Very High	$445M ARR (May 12 disclosure), 78% autonomous PR merge rate at SWE-1.7, Cognition closed $25B SoftBank Series D May 6, 2026
Codex CLI	Usage-based (GPT-5.5)	Open-source, Rust/systems	Medium	Open-source, sandboxed execution; GPT-5.5 at 82.7% Terminal-Bench 2.0 (SOTA)
Jules (Google)	Free 50 tasks/mo — $125/mo	Async bugfixes, PR gen	High	GA post-I/O 2026, Gemini 3 Pro-powered, GitHub integration with Google Cloud VM sandboxing
Gemini CLI	Free tier + paid	Open-source terminal work, voice-driven sessions	Medium	v0.41.0 (May 2026): real-time voice mode (cloud + local), enforced workspace trust, .env loading secured in headless mode — direct response to April CVSS 10.0 RCE (GHSA-wpqr-6v78-jr5g)
Amazon Q	Free-$19/mo	AWS-heavy projects	Medium	Deep AWS integration

Model	SWE-bench Verified	SWE-bench Pro (contamination-resistant)	Notes
Claude Mythos Preview (Anthropic)	93.9% (leader)	Restricted	Not publicly available; powers Project Glasswing security research
Claude Opus 4.8 (Anthropic)	88.6%	Top public tier	Released May 28, 2026; powers Claude Code with Dynamic Workflows (up to 1,000 concurrent subagents)
GPT-5.5 (OpenAI)	~88.7%	64.3% (prior gen)	Default ChatGPT model since May 5; 82.7% Terminal-Bench 2.0 (SOTA agentic terminal work)
Claude Opus 4.7 (Anthropic)	87.6%	64.3%	Strongest multi-file code reasoning of the prior generation
DeepSeek V4-Pro	80.6%	55.4%	The 25-point gap between Verified and contamination-resistant Pro is the clearest illustration of benchmark contamination — discount headline open-weight scores accordingly
Gemini 3.1 Pro (Google)	Competitive	—	Leads multimodal + long-context: 94.3% GPQA Diamond, 1M-token context window

Tool	Price	Best For	Output Quality	Risk Level
Bolt.new	Free-$20/mo	Rapid full-stack prototypes	Good	Medium
v0	Free-$20/mo	React/Next.js UI components	Excellent	Low (UI only)
Lovable	Free-$25/mo	Non-dev app creation	Good	High — April BOLA flaw exposed all pre-Nov-2025 projects; three documented security incidents to date; treat platform-side tenant isolation as untrusted
Replit Agent	Free-$25/mo	Complete apps from description	Good	Medium — $400M Series D, $9B valuation (Mar 2026). 75% of Replit AI users write zero code.

Model/Tool	Parameters	Cost vs Claude Sonnet	SWE-bench / Rank	Best For
MiMo-V2-Pro (Xiaomi)	1 Trillion (Hunter Alpha)	-67% cheaper than Claude Sonnet 4.6	3rd globally on agent benchmarks (Mar 2026)	Cost-sensitive production workloads, batch jobs
Gemini CLI (Google)	N/A (cloud)	Free tier available	Competitive, Flash variant	Open-source terminal work, Google ecosystem
Codex CLI (OpenAI)	N/A (cloud)	Usage-based (GPT-5.4)	77.3% Terminal-Bench	Sandboxed execution, CI/CD integration
obra/superpowers	N/A (framework)	Free + model API costs	92,100 GitHub stars (Mar 2026)	Custom agent framework, multi-step workflows
OpenClaw	N/A (framework)	Free + model API costs	210,000 GitHub stars (Mar 2026)	Open-source agent orchestration, self-hosted

Package / Tool	Date	Impact	Attribution
axios 1.14.1, 0.30.4	March 31	WAVESHAPER.V2 RAT; ~100M weekly downloads	UNC1069 (North Korea/DPRK)
LiteLLM 1.82.7, 1.82.8	March 24	Multi-stage credential stealer (SSH keys, cloud tokens, K8s secrets, .env files)	Unknown
Langflow ≤ 1.8.2 (CVE-2026-33017)	March 17	Unauthenticated RCE via public endpoint; exploited within 20h; CISA KEV	Active threat actors
Trivy Docker Hub images (CVE-2026-33634)	March 19	Malicious code in Aqua Security's Trivy scanner images	TeamPCP

Metric	Value
Total submissions received	312
Featured projects (all-time)	43
Countries represented	27
Youngest builder	14 (high school student, built a study flashcard app)
Oldest builder	67 (retired accountant, built a family recipe archive)

Background	Percentage
Professional developer	41%
Student / recent graduate	19%
Non-technical professional	17%
Designer / creative	11%
Founder / entrepreneur	8%
Other (retired, career switcher, hobbyist)	4%

Category	Most Popular Choice
Framework	Next.js (58%)
Styling	Tailwind CSS (71%)
Database	Supabase (52%)
Hosting	Vercel (64%)
Payments	Stripe (89% of projects with payments)
Auth	Supabase Auth (44%)

MCP Server	Vulnerability	Impact	Status
Apache Doris MCP	SQL injection via MCP tool args	Unintended SQL execution against a connected Doris cluster	Patched
Alibaba RDS MCP	Sensitive metadata exfiltration	An agent can be coerced into exposing connection credentials and database metadata it should not surface	Patched
Apache Pinot MCP	Instance takeover (internet-exposed)	A crafted MCP tool call can take over a Pinot instance reachable from the internet	Unpatched — vendor declined

Month	Planned Videos	Series
March 2026	#1 60-Second SaaS, #6 Game Builder	Prompt to Product, The Prompt That
April 2026	#12 IDE Showdown, #7 Broke Everything	Tool Face-Off, The Prompt That
May 2026	#2 Portfolio Speedrun, #13 Builder Battle	Prompt to Product, Tool Face-Off
June 2026	#3 The $0 Startup, #8 Got Me Fired	Prompt to Product, The Prompt That
July 2026	#14 Agent Arena, #9 Replaced My Intern	Tool Face-Off, The Prompt That
August 2026	#4 Clone Wars, #10 Mom Could Use	Prompt to Product, The Prompt That
September 2026	#15 Speed vs Quality, #11 Fooled Senior Dev	Tool Face-Off, The Prompt That
October 2026	#5 Debug Olympics, New TBD	Prompt to Product, TBD

#	Title	Series	Tool(s)	Duration	Status
1	I built a $9/month SaaS in 60 seconds	Prompt to Product	Bolt.new	60-90s	Pre-production
2	Your portfolio shouldn't take longer than your morning coffee	Prompt to Product	v0 + Vercel	60-90s	Pre-production
3	This app makes money. I didn't write a single line.	Prompt to Product	Lovable	60-90s	Pre-production
4	I showed AI a screenshot of Notion. Here's what happened.	Prompt to Product	Cursor	60-90s	Pre-production
5	Can AI fix a bug faster than Stack Overflow?	Prompt to Product	Claude Code	60-90s	Pre-production
6	The Prompt That Built a Game	The Prompt That	Claude Code	90-120s	Pre-production
7	The Prompt That Broke Everything	The Prompt That	Bolt.new	90-120s	Pre-production
8	The Prompt That Got Me Fired (Hypothetically)	The Prompt That	Claude Code	90-120s	Pre-production
9	The Prompt That Replaced My Intern	The Prompt That	Cursor + Claude Code	90-120s	Pre-production
10	The Prompt That Even My Mom Could Use	The Prompt That	Lovable	90-120s	Pre-production
11	The Prompt That Fooled the Senior Dev	The Prompt That	Claude Code	90-120s	Pre-production
12	IDE Showdown: Cursor vs Claude Code vs Codex CLI	Tool Face-Off	Cursor, Claude Code, Codex CLI	90-120s	Pre-production
13	Builder Battle: Bolt.new vs Lovable vs Replit Agent	Tool Face-Off	Bolt.new, Lovable, Replit Agent	90-120s	Pre-production
14	Agent Arena: Devin vs Jules vs Claude Code	Tool Face-Off	Devin, Jules, Claude Code	90-120s	Pre-production
15	Speed vs Quality: Bolt.new vs Claude Code	Tool Face-Off	Bolt.new, Claude Code	90-120s	Pre-production

Time Range	Percentage
Under 4 hours	12%
4-12 hours	27%
12-24 hours (1-2 days)	31%
1-2 weeks	22%
Over 2 weeks	8%

Category	Count	Percentage
SaaS / web application	72	29%
Internal / business tool	48	19%
Portfolio / personal site	37	15%
E-commerce	29	12%
Game	21	9%
Mobile app	18	7%
Chrome extension	12	5%
CLI tool / developer utility	10	4%

Metric	Value
Projects still actively maintained (after 3+ months)	68%
Projects generating revenue	31%
Average MRR for revenue-generating projects	$840
Highest reported MRR	$12,400
Builders who reported getting hired because of their project	14
Builders who transitioned to full-time on their project	9

Vibe Coding

Choose Your Plan

Frequently Asked Questions

Get a free chapter + weekly vibe coding insights

01. The Moment Everything Changed

The Timeline

02. What Vibe Coding Actually Is

The Three Core Loops

What Vibe Coding Is NOT

The Definition Split: What "Vibe Coding" Means in 2026

03. The Philosophy: Trusting the Machine

The End of Code as Sacred Text

The Four Pillars

The Abstraction Argument

The Philosophy, Revised: What 16 Months Did to the Pillars

04. The Spectrum: Five Levels of AI-Assisted Development

The 2026 Refinement: It's Per-Task, Not Per-Developer

05. The Tools: A Complete Landscape (2025–2026)

AI-Native IDEs

Autonomous Coding Agents

Browser-Based Builders

The Infrastructure Layer: MCP

The Model Race (March 2026 Update)

April 27, 2026 Update — The Flat-Rate Era Is Ending

The Stack That Won

What This Means in Practice

06. The Agent Revolution

From Copilot to Colleague

What Agents Can Do Today

The Benchmark Picture (May 2026)

New Agent Orchestration Frameworks (April 2026)

What Agents Still Struggle With

The Parallel Execution Advantage

Karpathy's Software 3.0 Framework (May 2026)

07. Vibe Coding in Practice: Real Workflows

08. Real-World Case Studies

09. The Numbers: Adoption and Impact

Adoption

AI Tool Daily Active Use Share — Stack Overflow 2026 (May 19, 2026)

JetBrains Developer Ecosystem Survey 2026 (May 23, 2026)

AI Market Share (May 2026 — Historic Flip)

The Agentic Model Race (April–June 2026)

Revenue & Growth

Valuations (2026)

Enterprise AI Momentum (May 2026)

Productivity

Developer Sentiment (April 2026)

Cultural Impact

10. The Dark Side: Security, Debt, and Failure

The Tenzai Security Study

The Acceleration: 35 CVEs in One Month

Documented Security Incidents

AI as Vulnerability Hunter: The Other Side of the Coin

The Threat Landscape: Ransomware Meets AI

The AI Slopageddon: Open Source Fights Back

The $1.5 Trillion Technical Debt Problem

The "Vibe Coding Hangover"

The AI Attack Acceleration Problem (2026)

The Prototype Pollution Wave: JavaScript's Hidden AI Vulnerability

Supply Chain Injection Risks in AI-Generated package.json Dependencies

The First Agentic-Vector CVE: Cursor RCE via Git Hooks

ACM Formal Warning: The First Standards Body Intervention

The Mini Shai-Hulud: First SLSA-Certified Malware (May 2026)

380,000 Corporate Assets Exposed by Vibe-Coding Tool Defaults

11. The Great Debate

12. When to Vibe (and When Not To)

The Five Questions That Decide Everything

Try It: Score Your Task

🟢 Green Light: Vibe Code Away

🟠 Yellow Light: Proceed with Caution

🔴 Red Light: Don't Vibe Code

One More Factor Nobody Mentioned in 2025: Cost

The Pre-Flight Checklist

13. Mastering the Craft: Advanced Techniques

The Art of the Initial Prompt

Weak vs. Strong Prompts

Key Patterns

Token-Efficient Prompting

14. Building a Sustainable Workflow

Part 1: The Project Lifecycle