Vibe Coding
The Complete Guide to AI-Native Software Development
22 chapters. 200+ prompts. Updated monthly. The only vibe coding resource that evolves as fast as the field.
Choose Your Plan
The vibe coding landscape changes every week. Your subscription keeps you current.
- ✓ First 3 chapters
- ✓ 10 sample prompts
- ✓ 2 video tutorials
- ✓ Interactive quiz
- ✓ All 22 chapters
- ✓ 200+ prompt library
- ✓ Video tutorials
- ✓ Monthly updates
- ✓ Tool comparison matrix
- ✓ Security playbook
- ✓ Everything in Monthly
- ✓ Bonus resources
- ✓ Early access to new content
- ✓ Priority support
30-day money-back guarantee. Cancel anytime. Payments handled securely by Lemon Squeezy (Merchant of Record). All prices in USD.
Frequently Asked Questions
Everything you need to know before you start.
Get a free chapter + weekly vibe coding insights
Join the mailing list for a bonus chapter on AI tool selection, plus weekly curated updates on the vibe coding landscape.
✓ You're in! Check your inbox for the bonus chapter.
No spam. Unsubscribe anytime. Part of the EndOfCoding ecosystem.
01. The Moment Everything Changed
On February 2, 2025, Andrej Karpathy β former OpenAI co-founder, former Tesla AI director, and one of the most respected voices in machine learning β posted what would become one of the most consequential tweets in software development history:
"There's a new kind of coding I call 'vibe coding', where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. I just see stuff, say stuff, run stuff, and copy-paste stuff, and it mostly works." β Andrej Karpathy, February 2, 2025
Within weeks, the term had gone viral. Within a month, Merriam-Webster added "vibe coding" as a slang and trending term. By December 2025, Collins English Dictionary named it their Word of the Year.
But vibe coding didn't just enter the dictionary. It entered the economy. It entered boardrooms. It entered the workflows of millions of developers. And it sparked one of the fiercest debates the software industry has seen in decades.
The Timeline
02. What Vibe Coding Actually Is
Strip away the hype, and vibe coding is a specific practice with specific characteristics.
Vibe coding is an AI-assisted software development approach where a developer describes what they want in natural language, an AI model generates the code, and the developer evaluates the result through execution rather than code review. The developer does not read, edit, or attempt to understand the generated code. They test whether it works, and if it doesn't, they feed the error back to the AI.
</div>
Karpathy described his own workflow precisely:
"I 'Accept All' always, I don't read the diffs anymore. When I get error messages I just copy paste them in with no comment, usually that fixes it. If it doesn't, I just revert to the last working state and re-prompt with more context."
The Three Core Loops
Vibe coding operates on three nested feedback loops:
**2.** Accept the generated code without reading it
**3.** Run it
**4.** Does it work? Ship it. Doesn't work? Move to Loop 2.
This is the happy path. For simple features, you may never leave this loop.
</div>
**2.** Accept the fix without reading it
**3.** Run it again
**4.** Repeat until resolved or move to Loop 3.
Most errors resolve within 1-3 iterations of this loop. The AI sees the error, understands the context, and fixes it.
</div>
**2.** Describe the desired outcome differently, with more context
**3.** Return to Loop 1
This is the escape hatch. If the AI gets stuck in a loop of broken fixes, go back to a clean state and try a different approach. This is why checkpoints matter β always have a rollback point.
</div>
What Vibe Coding Is NOT
Not using GitHub Copilot for autocomplete β that's AI-augmented coding (Level 1)
Not asking ChatGPT to explain code β that's using AI as a learning tool
Not reviewing AI-generated code before accepting β that's AI-collaborative coding (Level 2)
Not no-code/low-code platforms β those use visual builders, not natural language to code
Vibe coding is specifically: natural language in, code out, test behavior, never read the code.
03. The Philosophy: Trusting the Machine
Vibe coding isn't just a technique. It's a philosophical stance about the relationship between developers and code.
The End of Code as Sacred Text
For decades, programming culture has treated source code as something to be crafted, reviewed, optimized, and understood. Code reviews are rituals. Clean code is a moral virtue. Understanding every line is a professional obligation.
Vibe coding rejects this entirely. It treats code as a disposable intermediary between human intent and running software. The code doesn't matter. The behavior matters.
This is not as radical as it sounds. Most software professionals already interact with layers of abstraction they don't fully understand:
Few web developers read TCP packet internals
Few application developers audit their compiler output
Few React developers understand the fiber reconciliation algorithm
Few SQL users trace query execution plans for every query
Vibe coding simply adds another layer: the AI becomes the compiler for natural language.
The Four Pillars
🎯Intent Over Implementation"What should this do?" replaces "How should I build this?"⚡Speed Over EleganceWorking software now beats perfect code later🤖Trust the AIAccept all, don't read diffs, let the machine handle it📈Results-OrientedDoes it work? That's the only metric that mattersThe Abstraction Argument
Supporters frame vibe coding as the natural progression of programming abstraction:
1950sMachine Code → Assembly"You don't need to write binary opcodes anymore!"1970sAssembly → C"You don't need to manage registers anymore!"1990sC → Python / Java"You don't need to manage memory anymore!"2010sFrameworks / Cloud"You don't need to manage servers anymore!"2025Natural Language → Code"You don't need to write code anymore!"At each transition, purists warned that developers were losing essential skills. At each transition, the expanded abstraction enabled more people to build more things.
⚠️**The counter-argument is real, though:** Every previous abstraction still had deterministic behavior. Assembly always compiles the same way. C always allocates memory the same way. AI code generation is probabilistic β the same prompt can produce different code each time, with different bugs. This is a genuinely new kind of abstraction layer.
04. The Spectrum: Five Levels of AI-Assisted Development
Vibe coding is not binary. In practice, developers operate along a spectrum. Understanding where you sit β and where you should sit for a given project β is critical.
**When to use:** Security-critical code, regulatory requirements, environments where AI tools are prohibited.
</div>
**Tools:** GitHub Copilot, VS Code AI extensions
**Code understanding:** 100% β you review everything
**When to use:** Production code, team projects, anything you need to maintain
</div>
**Tools:** Cursor Composer, Claude Code, Codex CLI
**Code understanding:** 70-90% β you review most things
**When to use:** Professional development, startup codebases, any code that needs to scale
</div>
**Tools:** Cursor Agent, Claude Code, Bolt.new
**Code understanding:** 30-60% β architecture yes, implementation details no
**When to use:** MVPs, internal tools, prototypes headed toward production
</div>
**Tools:** Bolt.new, Lovable, Replit Agent, v0
**Code understanding:** 0-10% β you only test behavior
**When to use:** Personal projects, throwaway prototypes, hackathons, idea validation
</div>
**Tools:** Devin, Google Jules, OpenAI Codex (cloud mode)
**Code understanding:** Review-based β you check the output, not the process
**When to use:** Routine tasks, migrations, test generation, documentation, with human review gate
</div>
</div>
Take the interactive quiz at the end of this ebook to find out.
<button class="quiz-btn quiz-btn-primary" style="margin-top:0.5rem;" onclick="goTo('ch-quiz')">Take the Quiz →</button>
05. The Tools: A Complete Landscape (2025β2026)
The tooling ecosystem for AI-assisted development has exploded. The market is consolidating fast β with Cursor seeking a ~$50B valuation at $2B+ ARR, Lovable at $6.6B, Cognition at $10.2B, and billion-dollar acquisition battles playing out in real time. Anthropic's acquisition of Bun (the fast JavaScript runtime) signals Claude Code's push into native runtime integration. Here's the current state of play across every major category.
AI-Native IDEs
Autonomous Coding Agents
/loop command adds cron-like scheduled tasks — turning Claude Code into a background worker for PR reviews, deployment monitoring, and recurring analysis. 1-million-token context window. Max output increased to 64k tokens for Opus 4.6 (128k upper bound for Opus 4.6 and Sonnet 4.6). MCP servers can now request structured input mid-task via interactive dialogs. Skills.md enables persistent agent behaviors. Early April 2026: Anthropic acquires Bun (the fast JavaScript runtime built by Jarred Sumner) — bringing native Bun integration and faster JS execution directly into Claude Code workflows. Claude overtook ChatGPT as the #1 AI app on the App Store. Revenue surpassed $2.5B ARR (named world's most disruptive company, Time March 2026). In a Mozilla partnership, Claude Opus 4.6 autonomously found 22 CVEs in Firefox's C++ codebase. April 4, 2026 — OpenClaw Policy Change: Anthropic announced that Claude Code subscription limits no longer apply to third-party harnesses such as OpenClaw. Users of third-party Claude Code integrations must move to pay-as-you-go billing; a $200/mo Max subscription was reportedly being used to run $1,000–$5,000 of agent compute. Affected users received a one-time credit. Additional April updates: PowerShell tool for Windows (opt-in preview), flicker-free alt-screen rendering, named subagents in @ mentions, 60% faster Write tool diff computation. Note: Pentagon labeled Anthropic a supply-chain risk in March 2026 over weapons/surveillance policy; defense tech contractors migrating away. April 14, 2026 — Routines Launch: Anthropic launched Routines — saved configurations combining a prompt, repositories, and connectors that run automatically on a schedule or GitHub events on Anthropic's cloud infrastructure (no local machine required). Use cases: automated PR reviews, overnight test triage, weekly repo health audits. Plan limits: 5/day Pro, 15/day Teams, 25/day Enterprise. Desktop app redesigned simultaneously with integrated terminal, faster diff viewer, in-app file editor, and multi-session support. May 6, 2026 — 5-Hour Limit Doubled: Anthropic doubled the 5-hour usage windows for Pro, Max, Team, and Enterprise plans, and removed peak-hour throttling on Pro and Max — attributed publicly to the SpaceX/Colossus 1 compute deal expanding Anthropic's serving capacity. Effective immediately for all paid tiers; no price change. Practical impact: longer continuous sessions before hitting limit walls, and Claude Code becomes usable during peak hours (previously the most painful part of the Max experience).requirements.toml support, runtime refresh behavior, and stronger Windows sandbox integration. 90+ new plugins / skills / app integrations / MCP servers added — Atlassian Rovo, CircleCI, CodeRabbit, GitLab Issues, Microsoft Suite, Neon by Databricks, Remotion, Render, and Superpowers among them. App-server workflow improvements: better remote-control behavior, TUI reliability, expanded packaging and release pipeline support across installers, npm, and runtimes.client_credentials grant type for MCP servers (no browser needed for auth β unblocks CI/CD and remote-agent setups), fixes a 100% CPU hang on large file attachments, and tightens the security posture of prompt mode (-p): repo hooks and workspace MCP are now opt-in behind GITHUB_COPILOT_PROMPT_MODE_REPO_HOOKS and GITHUB_COPILOT_PROMPT_MODE_WORKSPACE_MCP env vars — secure by default. /clear and /new now reset the active custom agent selection, and subagents evaluate tool-search support against their own model rather than inheriting the parent session's settings. May 6, 2026 — CLI v1.0.43 adds a username toggle to the /statusline picker (active account visible in the footer), moves Auto mode to server-side model routing for real-time selection, and ships two security fixes that matter for vibe coders working with untrusted repos: protection against RCE from malicious bare repositories nested inside a project, and full termination of MCP server child processes (those spawned via npx/uvx) when a session ends — previously left as orphans. May 8, 2026 — CLI v1.0.44: slash commands can now appear mid-input and multiple skills can be invoked in a single message; userPromptSubmitted hooks can handle requests directly and bypass the LLM (huge for deterministic gating); path completion in /add-dir no longer flickers or gets intercepted by @/# pickers; tool permissions granted in autopilot mode persist across /clear; and the Free-tier quota display finally shows actual remaining usage instead of always reading 100% consumed. May 11, 2026 — CLI v1.0.45: a dedicated /autopilot slash command toggles between interactive and autopilot modes without the Shift+Tab cycle through every mode in between; Windows PowerShell fallback (powershell.exe) kicks in when PowerShell 7+ (pwsh) isn't available; OpenTelemetry output now aligns with GenAI semantic conventions — MCP tool calls use standard tool_call spans and a new gen_ai.client.operation.duration metric tracks tool execution time; sessions with extension permission prompts resume cleanly (no more "Session file is corrupted" error); and CLI startup is faster on terminals with limited OSC color query support. Effective June 1, 2026 — usage-based billing: Copilot code review starts consuming GitHub Actions minutes and bills via AI Credits. Confirmed pricing: Pro stays at $10/mo and includes $10 in AI Credits plus a $5 flex allotment ($15 included usage); Pro+ stays at $39/mo with $39 credits plus $31 flex ($70 total); Business $19/seat with $19 credits; Enterprise $39/seat with $39 credits. 1 AI credit = $0.01 USD, billed against input + output + cached tokens. Crucially: code completions and next edit suggestions stay unlimited and do NOT consume AI Credits on any paid plan. What does consume credits: Copilot Chat, Copilot CLI, Copilot cloud agent, Copilot Spaces, Spark, and third-party coding agents. For private repos, Actions minutes draw from existing plan entitlements. Audit your Actions and Chat/CLI consumption before June 1 if you run Copilot agents at scale. May 14, 2026 — CLI v1.0.48: the model picker now displays actual per-million-token input/output prices alongside each model name — making the upcoming June 1 cost difference between Claude Sonnet 4.6, GPT-5.5, and Gemini 3.5 Pro visible at selection time, not just in the bill. The chat window also gains a unified sessions view tracking every running agent session (title, agent type, elapsed time, status) with filters by agent type and status; agent mode adds an Ask Question tool so agents can request focused clarification mid-task instead of making implicit assumptions; and a new global ~/.copilot/agents/*.agent.md location makes custom agents available across all workspaces (previously workspace-scoped only). May 15, 2026 — Grok Code Fast 1 Deprecated: xAI's Grok Code Fast 1 was fully removed from every Copilot surface — Chat, inline edits, ask and agent modes, code completions. If you had it as your default, Copilot will fall back to Auto routing; reset your preferred model before the next session. Combined with the Opus removal from Pro plans in April, Copilot's individual-plan model lineup is narrowing in lockstep with the move to usage-based billing on June 1, 2026.Browser-Based Builders
The Infrastructure Layer: MCP
</div>
The Model Race (March 2026 Update)
The foundation models powering these tools are advancing on multiple fronts. Key releases in early March 2026:
- GPT-5.4 (OpenAI): Native computer-use, 1M context, Standard/Thinking/Pro variants. Already integrated into Codex CLI and Copilot.
- Gemini 3.1 Flash-Lite (Google): Ultra-low-latency variant designed for inline code completions and real-time suggestions. Powers Windsurf and Jules background tasks.
- GLM-4.7 (Zhipu AI): China's leading code model, competitive with GPT-5 on multilingual programming benchmarks. Growing adoption in Asian markets.
- DeepSeek-V3.2-Speciale (DeepSeek): Open-weight model rivaling proprietary offerings. Strong at multi-file reasoning and long-context code generation.
Open-source LLMs now account for over 60% of production AI deployments — a tipping point driven by DeepSeek, Llama, Qwen, and Mistral. This has shifted the economics: developers increasingly use open-weight models for routine code generation while reserving proprietary models for complex architectural reasoning.
April 27, 2026 Update — The Flat-Rate Era Is Ending
Inside a six-week window in March–April 2026, the three biggest names in AI-assisted coding tightened limits, shortened caches, and pushed frontier models behind multipliers. Many users only discovered the changes through their billing dashboards or daemon logs. The pattern is consistent enough to call:
- Claude Code (Anthropic) — the server-side prompt cache TTL was reduced from 1 hour to 5 minutes. Long-running agentic sessions that previously hit warm cache for the whole day now incur cache misses every few minutes, increasing real cost-per-call materially without any change to nominal pricing.
- GitHub Copilot — on April 20, 2026, GitHub announced a freeze on new signups for Copilot Pro, Pro+, and Student tiers. Existing subscribers retain access; new users are queued or directed to higher Business/Enterprise tiers. CLI release cadence continued (v1.0.35 on April 23 with slash-command tab-completion, v1.0.36 on April 24 with a subcommand picker), but the consumer signup gate is the structural news.
- Cursor — frontier models (Claude Opus 4.7, GPT-5.5, Mythos Preview where available) were moved behind Max Mode on legacy Team and Enterprise plans, accelerating credit burn for heavy users.
None of these are isolated pricing tweaks. They are the industry moving from flat-rate “AI teammate” marketing toward metered compute economics, because agentic workflows have fundamentally changed consumption. An average 2024 Copilot user made roughly 50 model calls per day. An average 2026 Claude Code or agentic Codex user makes thousands. Background agents, scheduled routines, multi-agent orchestration, and Cursor Background Agents all multiply per-user inference load by one to two orders of magnitude. Flat-rate pricing was viable when every user looked roughly like every other user. It stops being viable when one power user's daily compute equals an entire small-team subscription cost.
The Stack That Won
Underneath the pricing turbulence, the question of “which tool do I use” has settled into one of two stable configurations for most engineers shipping production code in April 2026:
- Cursor for daily editing + Claude Code for complex tasks. The IDE handles typed-while-you-think completion, refactors, and the design-mode visual workflow. Claude Code in a sibling terminal handles multi-file refactors, full-repo reasoning, and any task where the agent should run uninterrupted for minutes.
- GitHub Copilot in the IDE + Claude Code in the terminal. For shops already standardized on VS Code or JetBrains with Copilot Business, the same split-of-labor applies, just with Copilot in the editor seat.
The convergence on this two-tool pattern is real. It is also why the pricing pressure shows up the way it does: nobody is paying for one tool anymore, and the providers know it. The wallet is finite. The friction is moving from “which IDE do I commit to” to “how do I budget agent compute across two or three tools simultaneously.”
What This Means in Practice
- If you are an individual paying out-of-pocket: budget for metered compute. The flat-rate $20–$30/month subscription that covered everything is gone or going. The honest 2026 number for a heavy individual user across Claude Code + Cursor or Copilot is closer to $60–$200/month depending on agentic workload, and going up.
- If you run an engineering team: rebuild your AI tooling budget around per-seat metered compute, not flat seats. Heavy users will burn 5–10x the compute of light users. Pretending otherwise leads to ugly mid-quarter surprises. Most teams that have been running flat-rate budgets are now shifting to a Business/Enterprise tier with explicit overage allowances.
- If you are evaluating tools right now: evaluate the metered cost on a representative agentic workflow, not the headline subscription price. The headline number tells you almost nothing about what an agent-heavy workflow will actually cost in production.
Sources: Medium “The Flat-Rate AI Coding Subscription Era Is Ending” (April 2026); Havoptic AI Tool Releases; The New Stack “Cursor, Claude Code, and Codex are merging into one AI coding stack”; pasqualepillitteri.it “AI Coding Tools 2026 Price Hike.”
Andrej Karpathy, who coined "vibe coding" in February 2025, introduced a new term in early 2026: "agentic engineering" — the discipline of designing, orchestrating, and supervising autonomous AI agents that write code, run tests, and deploy systems with minimal human intervention. The term has rapidly entered common usage, marking the evolution from "coding with AI" to "engineering with agents."
06. The Agent Revolution
The most significant development since Karpathy's tweet isn't better autocomplete. It's the emergence of autonomous coding agents β AI systems that independently plan, implement, test, and deploy software.
From Copilot to Colleague
/loop command and Claude Managed Agents enable scheduled background tasks. Agents run CI pipelines, triage issues, and maintain codebases overnight. The developer reviews a morning summary of what the AI decided and changed while they slept.What Agents Can Do Today
Modern coding agents reliably handle tasks that would take a junior developer 4-8 hours:
The April 2026 Benchmark Picture
Agent performance has accelerated dramatically. The current public leaderboard (April 2026):
| Model | SWE-bench Verified | Access |
|---|---|---|
| Claude Mythos Preview | 93.9% | Restricted (Project Glasswing) |
| Claude Opus 4.6 | 80.8% | Public |
| Gemini 3.1 Pro | 80.6% | Public |
| GPT-5.4 | 75.0% | Public |
| Kimi K2.5 (open-source) | ~75% | Open |
Kimi K2.5 by Moonshot AI is the current #1 open-source option: 1 trillion parameter MoE architecture with 32 billion active parameters, competitive with frontier models at a fraction of the inference cost.
New Agent Orchestration Frameworks (April 2026)
Two major frameworks launched in April 2026 that reshape how multi-agent systems are built:
- Google Agent Development Kit (ADK):
google/adk-pythonβ 8,200+ stars on launch week. Purpose-built for multi-agent orchestration with native Gemini integration and MCP support. Best for complex agent pipelines with multiple specialized sub-agents. - Meta llama-stack: Standardized agent runtime for Llama 4 models. Defines interfaces for tool calling, memory, and agent orchestration that work across the open-source ecosystem.
- Claude Managed Agents: Anthropic's managed runtime at $0.08/session-hour plus token costs. Provides sandboxed execution, state management, and permission scoping. Testing shows 10 percentage point improvement in task success rates over standard prompting.
The practical implication: you no longer need to build agent infrastructure from scratch. These frameworks handle the hard parts β state, retries, tool routing, parallelization β so you can focus on the task logic.
What Agents Still Struggle With
Cognition's own 2025 performance review of Devin put it well:
"Devin is senior-level at codebase understanding but junior at execution."
- Ambiguous requirements β agents make assumptions that may not match intent
- Complex architectural decisions β they can implement but struggle with system-level design
- Cross-system integration β tasks requiring deep understanding of multiple interconnected systems
- Security context β knowing when something is dangerous requires deployment context, not just code patterns
The Parallel Execution Advantage
Unlike human developers, agents can run multiple instances simultaneously, work 24/7, and process entire backlogs of tickets overnight.
Karpathy's Software 3.0 Framework (May 2026)
Andrej Karpathy β the researcher who coined "vibe coding" in February 2025 β returned in May 2026 with a more formal framework for what is actually happening in AI-native development. He calls it Software 3.0: a three-era model that explains why vibe coding and agentic engineering feel different even when they use the same tools.
- Software 1.0 β Explicit instructions. Humans write code that computers execute deterministically. The program is the specification. Era: 1950sβpresent.
- Software 2.0 β Neural weights. Humans specify desired behavior through examples and loss functions; gradient descent writes the actual program. The dataset is the specification. Era: 2012βpresent.
- Software 3.0 β Natural language programs. Humans specify behavior in English (or any language); the LLM interprets and executes. The prompt is the program. Era: 2022βpresent.
The practical implication of this framework is the distinction Karpathy draws between vibe coding and agentic engineering:
| Dimension | Vibe Coding | Agentic Engineering |
|---|---|---|
| Era | Software 3.0 (prompts as programs) | Software 3.0 + 1.0 hybrid |
| Specification | Natural language intent | Structured task + verification |
| Human role | Creative director | Architect + verifier |
| Appropriate for | Prototypes, personal tools, MVPs | Production systems, multi-user software |
| Risk profile | Higher (less structure) | Lower (explicit checkpoints) |
| Speed | Fastest | Fast with guardrails |
Vibe coding is not a degraded form of agentic engineering β it is the right tool for a different job. As Karpathy put it: "Software 3.0 is already here. The question is not whether to use it, but which layer of the stack you're applying it to and whether your verification layer matches the stakes."
The SpaceX signal reinforces this. Reports in May 2026 that SpaceX evaluated a $60 billion acquisition of Cursor β which would make it the largest AI coding deal in history β suggest that infrastructure-grade companies are treating AI coding tooling as foundational platform technology, not a developer productivity toy. When that happens, the Software 3.0 thesis moves from academic framework to engineering mandate.
Cross-link: β Karpathy's Software 3.0 framework β endofcoding.com. β Chapter 16: What Comes Next for the long-horizon architecture implications. β vibe-coding.academy β Software 3.0 module.
07. Vibe Coding in Practice: Real Workflows
Theory is interesting. Practice is what matters. Here are four concrete workflows for different scenarios.
**Scenario:** You have a product idea and want a working prototype by Monday.
**Tools:** Bolt.new or Cursor + Claude • **Level:** 3-4
1. Write a detailed description (spend 20-30 min β it's the most important step)
Include: target users, core features, data model, key screens, visual style
Paste into Bolt.new or Cursor Composer
Iterate through natural language: "Make the sidebar collapsible" / "Add dark mode"
Deploy to Vercel or Netlify
Share with potential users for feedback
Build a job application tracker. I'm applying to software engineering positions and need to track: company name, position title, application date, status (applied/phone screen/onsite/offer/rejected), salary range, notes, and next action date. I want a clean dashboard showing all applications in a table with sorting and filtering. Include a kanban view grouped by status. Use a modern blue/slate color scheme. Store in localStorage. Make it responsive for mobile.
</div>
<div class="tab-content" id="wf2">
#### The Startup MVP
**Scenario:** Building a real product for real users, fast.
**Tools:** Claude Code + Cursor + v0 • **Level:** 2-3
1. Start with a product requirements document (even a rough one)
2. Use v0 to prototype key UI screens
3. Use Claude Code to scaffold the full architecture
4. Build feature-by-feature, testing each before moving on
5. Review auth code and data handling; accept UI code freely
6. Deploy to real hosting, set up monitoring
7. Plan a "hardening phase" for security-critical paths
<div class="callout warning">
<div class="callout-icon">⚠️</div>
<div class="callout-content">**The trap:** Skipping step 7. Many YC startups vibe-coded their MVPs successfully but faced "development hell" when trying to scale without hardening.
</div>
</div>
</div>
<div class="tab-content" id="wf3">
#### The Enterprise Integration
**Scenario:** Adding a feature to an existing production codebase.
**Tools:** Claude Code or Devin + CI/CD pipeline • **Level:** 5 with human gate
1. Create a detailed ticket with acceptance criteria
2. Assign to an AI agent (Devin, Claude Code, or Jules)
3. Agent analyzes codebase, creates a plan, implements the change
4. Agent runs existing test suite and fixes failures
5. Agent opens a pull request
6. Human reviews: security, performance, architecture, edge cases
7. Merge after human approval
This is Level 5 but with human review as the final gate. It's how most enterprises adopt AI coding in 2026.
</div>
<div class="tab-content" id="wf4">
#### The Solo Creator
**Scenario:** You're not a developer. You have an idea for an app.
**Tools:** Lovable, Bolt.new, or Replit Agent • **Level:** 4
1. Describe your application as if explaining it to a friend
2. Let the builder create the first version
3. Use it yourself β note what's wrong or missing
4. Describe changes in plain language
5. Repeat until satisfied
6. Deploy using the platform's built-in hosting
<div class="callout danger">
<div class="callout-icon">🔴</div>
<div class="callout-content">**Critical:** If your app handles user data, sensitive information, or payments, hire a security professional to review it before going live. The Lovable vulnerability study (170/1,645 apps) shows this isn't hypothetical.
</div>
</div>
</div>
08. Real-World Case Studies
These are documented, real examples β not hypotheticals.
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
09. The Numbers: Adoption and Impact
The data tells a clear story: AI-assisted development isn't a trend. It's a structural shift.
Adoption
AI Tool Daily Active Use Share β Stack Overflow 2026 (May 19, 2026)
First time Claude Code ranks #1 in daily active use across the developer population (Stack Overflow's 90,000+ respondent survey).
JetBrains Developer Ecosystem Survey 2026 (May 23, 2026)
Independent second read on AI coding tool adoption from JetBrains' annual survey. The Stack Overflow result above tracks daily active use across the broader developer population; the JetBrains numbers below track AI-coding-tool category share and reveal a sharper preference signal among experienced developers.
The senior-dev signal: among developers with 10+ years of professional experience, Claude Code's preference share (46%) is more than 5× Copilot's (9%). The combined Stack Overflow + JetBrains read for May 2026: Claude Code is now the #1 AI coding tool by both daily-active use and senior-developer preference β Copilot still leads on raw category share but has lost roughly a third of its installed base year-over-year.
AI Market Share (May 2026 β Historic Flip)
Historic milestone (April 2026): For the first time, Anthropic's Claude surpassed OpenAI's ChatGPT in US business adoption. Source: Ramp AI Business Adoption Index (tracks actual B2B payments, not surveys).
The Agentic Model Race (April–May 2026)
Seven major model releases in seven weeks reshaped the competitive landscape. The race is no longer about raw benchmark scores β it's about how many agents a model can orchestrate, how long it can sustain autonomous work, and how much that work costs per token.
The signal: In seven weeks, the public record for coding agent benchmarks shifted from Claude Opus 4.6 (80.8%) to Gemini 3.5 Pro (89.1%, Google I/O May 19) β with Mythos's restricted 93.9% remaining the unreleased ceiling. Multi-agent swarm scaling β exemplified by Kimi K2.6's 300-agent architecture and Qwen3.7-Max's 1,158-tool-call autonomous run β is the new frontier. Cost-per-token competition is the second front: Cursor Composer 2.5 ($0.50/$2.50), Gemini 3.5 Flash ($1.50/$9.00), and Qwen3.7-Max ($2.50/$7.50) all hit benchmark parity with prior frontier models at fractions of Opus 4.7's per-token bill. For agentic workloads sustained over hours, the inference economics increasingly favor tool-vendor in-house models or hyperscaler cost leaders over headline frontier LLMs.
Revenue & Growth
Valuations (2026)
Enterprise AI Momentum (May 2026)
The enterprise AI services market is consolidating fast. Anthropic partnered with Blackstone, Hellman & Friedman, and Goldman Sachs to launch a dedicated enterprise AI services company β targeting mid-sized organizations that lack in-house frontier AI deployment capacity. Meanwhile Sierra ($950M) and Cognition ($25B valuation) signal that enterprise AI customer experience and AI software engineering are becoming independent category leaders.
May 2026 enterprise anchors:
- SAP + Anthropic (May 13, 2026): Claude will power SAP's Business AI Platform as primary reasoning and agentic layer β reaching 440M+ SAP users and enabling autonomous enterprise tasks (closing books, rerouting supplier orders) within existing governance frameworks.
- SpaceX + Anthropic (May 6, 2026): 300 megawatts of compute from SpaceX's Colossus 1 facility in Memphis (220,000+ Nvidia processors). Anthropic's largest capacity expansion to date, reducing API rate-limit constraints.
The signal: Total disclosed AI venture capital through Q1 2026 already exceeds all of 2025. The $900B Anthropic valuation marks a potential inflection β from venture-funded AI bets to pre-IPO institutional positioning. Harvard, Goldman Sachs, Blackstone, and Broadcom investing in Anthropic infrastructure within 30 days tells you where the enterprise market is headed. The April 2026 adoption flip is the market validating this thesis with payment data.
Productivity
Developer Sentiment (April 2026)
Cultural Impact
- Collins Dictionary Word of the Year 2026: "Vibe coding" (named again after 2025)
- MIT Technology Review: Named "Generative Coding" a 2026 Breakthrough Technology
- Merriam-Webster: Added as slang/trending term within one month of Karpathy's tweet
- Wikipedia: Full article with extensive sources and analysis
- Wall Street Journal: Reported widespread professional adoption (July 2025)
- Fast Company: Documented the "vibe coding hangover" (September 2025)
- arXiv: "Vibe Coding Kills Open Source" paper sparks open-source funding debate (January 2026)
- VibeX 2026: First academic workshop on vibe coding, scheduled at EASE conference in Glasgow
- Mainstream: Vibe coding is now a recognized methodology taught in bootcamps and referenced in enterprise strategy documents
10. The Dark Side: Security, Debt, and Failure
For every success story, there's a cautionary tale. The risks are real, documented, and in some cases severe.
The Tenzai Security Study
**Key finding:** AI tools avoid generic security flaws but struggle where what makes code safe vs. dangerous depends on context.
</div>
The Acceleration: 35 CVEs in One Month
The security threat from AI-generated code is not static. It is accelerating. In March 2026, security researchers confirmed 35 CVEs directly attributable to AI-generated code β 27 of them from Claude Code alone. Researchers from the CERT/AI Working Group estimate the actual monthly count including triaged-but-unpublished vulnerabilities is 400 to 700 per month.
The trend is steep and mirrors adoption curves:
| Month | Confirmed AI Code CVEs | Estimated Total |
|---|---|---|
| Jan 2026 | 12 | 250β350 |
| Feb 2026 | 21 | 310β450 |
| Mar 2026 | 35 | 400β700 |
The root cause is structural: AI coding tools generate code that compiles and passes tests, but they optimize for functional correctness rather than security context. A model trained on decades of existing internet code learns the prevalence of insecure patterns alongside secure ones β and reproduces them with equal confidence. As AI-generated code's share of all new code climbs toward 41% (GitHub, March 2026), the absolute volume of AI-sourced vulnerabilities scales with it.
The deeper concern: the vulnerability rate is growing faster than the adoption rate, suggesting the tools are getting worse at security relative to their capability growth.
</div>
Documented Security Incidents
AI as Vulnerability Hunter: The Other Side of the Coin
</div>
The Threat Landscape: Ransomware Meets AI
The broader cybersecurity environment compounds the risk of insecure AI-generated code. As of early 2026, there are 124 active ransomware groups — a 49% year-over-year increase. These groups are increasingly using AI to generate phishing lures, analyze codebases for vulnerabilities, and automate lateral movement. The intersection of AI-generated insecure code and AI-accelerated exploitation creates a compounding threat surface.
The AI Slopageddon: Open Source Fights Back
By early 2026, a new phenomenon emerged that open-source maintainers dubbed the "AI Slopageddon" — a flood of low-quality, AI-generated bug reports, pull requests, and security "findings" overwhelming popular projects:
- cURL: Daniel Stenberg reported a deluge of AI-generated vulnerability reports so poor they were "worse than spam" — wasting maintainer time triaging hallucinated CVEs. He began publicly shaming the worst offenders and lobbied HackerOne to penalize AI-slop submissions.
- Ghostty: The terminal emulator project implemented explicit policies rejecting AI-generated contributions after a wave of superficially plausible but fundamentally broken PRs.
- tldraw: The collaborative whiteboard project documented a pattern of AI-generated issues that described bugs that didn't exist, in code paths that didn't exist, with reproduction steps that couldn't work.
The pattern is consistent: AI tools lower the barrier to appearing competent enough to submit contributions, but the submissions lack the understanding that makes them useful. Maintainers are now spending significant time filtering AI slop instead of building software — an ironic cost of the productivity tools meant to help them.
The $1.5 Trillion Technical Debt Problem
Analysts have warned of a potential $1.5 trillion in technical debt by 2027 from AI-generated code:
41% higher code churn β AI code gets rewritten more often
8x increase in duplicated code blocks (GitClear, 2024)
30% of AI suggestions accepted in professional environments
Forrester: 75% of tech leaders will face moderate-to-severe tech debt by 2026
The "Vibe Coding Hangover"
By late 2025, Fast Company reported senior engineers entering "development hell" maintaining vibe-coded systems:
🧬Zombie AppsFunctional but unmaintainable🍝Spaghetti CodeWorks but no coherent structure🚧Complexity CeilingCan't extend without breaking😶Debug ImpossibilityNobody can trace the code they never readThe AI Attack Acceleration Problem (2026)
The same capabilities that democratized vibe coding have democratized sophisticated cyber attacks. In 2026, AI has compressed timelines across the entire threat lifecycle:
28.3%CVEs exploited within 24 hours of disclosure (2026) β up from ~3% in 202244 daysMedian time-to-exploit (2025) β down from 700+ days in 2020+75%Malicious packages on public repos year-over-year (2026)AI tools now enable attackers to analyze CVE disclosures and generate working exploit code within hours of the NVD advisory going public, scan public repositories for vulnerable dependency trees at scale, and produce convincing malicious packages complete with fake README files and CI badges. The 24-hour exploitation window means that for more than one in four CVEs published in 2026, the gap between "disclosure" and "active exploitation" is measured in hours, not months.
For vibe coders, this creates a specific exposure: AI coding assistants suggest high-density dependency trees (a 500-line Express API may have 80+ transitive dependencies), and the vibe coding workflow optimizes for shipping rather than security audit cadence. Running
npm auditat the end of a sprint is no longer adequate when 28.3% of CVEs are already being exploited by the time your sprint ends.⚠Minimum security cadence for vibe coding in 2026: Runnpm audit --audit-level=highorpip-auditbefore every production deploy. Subscribe to CVE alerts for your exact dependency stack. Treat every AI-recommended package as requiring a 30-second verification before acceptance. See Chapter 19 for the full security playbook β and CyberOS for automated CVE alerting on the vibe coding stack.Source: The Hacker News, "2026: The Year of AI-Assisted Attacks" (May 4, 2026); EPSS v4 exploitation data (FIRST, Q1 2026); Phylum Software Supply Chain Security Report (Q1 2026).
The Prototype Pollution Wave: JavaScript's Hidden AI Vulnerability
April 2026 brought a concentrated cluster of prototype pollution vulnerabilities across the JavaScript ecosystem β a vulnerability class that AI coding tools are particularly prone to introducing and uniquely bad at detecting. Prototype pollution occurs when an attacker can inject properties into
Object.prototype, the root object that every JavaScript object inherits from. Once polluted, the attacker can override behavior across the entire application β enabling authentication bypass, remote code execution, or denial of service.Why does vibe coding amplify the risk? AI assistants trained on historical code learn to suggest patterns like
obj[key] = valueandObject.assign(target, userInput)without the defensive checks that distinguish safe from unsafe usage. The resulting code passes tests β it works exactly as specified β but opens a lateral attack surface that code review and automated scanners frequently miss.⚠Prototype Pollution in Context: In a CodeQL analysis of 10,000 AI-generated Node.js projects (April 2026), researchers found prototype pollution sinks in 38% of projects that accepted user-controlled JSON input β compared to 11% in a matched sample of human-written code. The gap is attributed to AI models treatingJSON.parse(userInput)as a solved problem and rarely adding the downstream sanitization that safe usage requires.#### CVE-2026-40175 and LLM-Generated Node.js Code: Why Axios Is the CanaryCVE-2026-40175 • CVSS 8.8Axios Prototype Pollution β Billions of Installs AffectedA high-severity prototype pollution vulnerability discovered in Axios, the most widely used HTTP client library in the JavaScript ecosystem with over 50 billion npm downloads. A crafted response header from an attacker-controlled server could corruptObject.prototypein the consuming application, enabling property injection across the entire runtime. Because AI assistants (Claude Code, Cursor, Copilot) recommend Axios in virtually every Node.js and browser project, the blast radius is extraordinary: an estimated 40β60% of vibe-coded JavaScript projects use Axios for API calls. Patch: upgrade to Axios β₯1.9.1. Audit any project that processes API responses without explicit header sanitization.The Axios prototype pollution vulnerability is not simply a library bug β it is a systematic exposure created by how AI coding assistants generate Node.js code. When a developer prompts Claude Code, Cursor, or Copilot to "add an API integration" or "fetch data from this endpoint," the model's near-universal first choice is Axios: it appears in training data more than any other HTTP client, its ergonomics fit naturally into the request-response patterns LLMs generate, and it is recommended in virtually every Stack Overflow thread the models ingested. The problem is that LLM-generated Axios code consistently skips the input sanitization step between receiving an API response and merging its data into application state β the exact pathway that CVE-2026-40175 exploits.
In a CodeQL analysis of 10,000 AI-generated Node.js projects reviewed after the disclosure, researchers found that 73% of projects using Axios processed API response data with
Object.assign()or spread operators without intermediate sanitization β the precise pattern that allows a malicious server response to poisonObject.prototype. Human-written code in the same study showed a 31% rate for the same pattern, suggesting the gap is not incidental but structural: AI models optimize for the terse, readable code that ships fast, and defensive sanitization is verbose, "ugly," and rarely present in the training examples the models emulated. The risk is compounded in vibe-coded apps because the developer often never reads the Axios integration code β the AI generated it, it worked, and it shipped.For any vibe-coded Node.js application that calls external APIs with Axios, the mitigation is a two-step fix: upgrade to Axios β₯1.9.1, and add
JSON.parse(JSON.stringify(responseData))or a schema-validation library like Zod between the API response and anyObject.assignor spread merge. CyberOS users receive automated CVE alerts scoped to their exact dependency versions β including pinned Axios version monitoring β so the patch window shrinks from weeks to hours. See Chapter 17, Prompt 17.255 for a ready-to-use audit prompt that scans any AI-generated codebase for unguarded Axios response merges and generates the sanitization patch automatically.CVE-2026-21710 • CVSS 7.5Node.js Core Prototype Pollution via URL ParsingA prototype pollution vulnerability in Node.js's built-in URL parsing module (url.parse) that affects all Node.js versions prior to the April 2026 security release. Specially crafted URLs passed tourl.parse()can set arbitrary properties onObject.prototype, potentially overriding security-critical properties likeisAdmin,authenticated, orroleif the application checks these properties after URL parsing. This is especially dangerous in vibe-coded authentication flows, where AI-generated middleware often checks authorization properties on request objects derived from the parsed URL path. Patch: Node.js 20.19.2, 22.14.1, and 24.0.2. Avoidurl.parse()β use the WHATWGURLconstructor instead.CVE-2026-39987 • CISA KEV • CVSS 9.1Marimo AI Notebook β Arbitrary Code Execution (Active Exploitation)Critical code execution vulnerability in Marimo, the reactive Python notebook and app builder that has become a staple tool for AI researchers and vibe coders building data dashboards and ML prototypes. The vulnerability stems from unsafe deserialization of notebook state β a pattern that AI assistants frequently introduce when generating notebook persistence or sharing features. Added to the CISA Known Exploited Vulnerabilities (KEV) catalog in April 2026 with a mandatory patch deadline. Active exploitation has been observed targeting data science teams and AI research infrastructure. Patch: upgrade to Marimo β₯0.11.4; disable public sharing of notebook state until patched. For real-time CVE tracking across the vibe coding stack, see EndOfCoding.com security briefings.Supply Chain Injection Risks in AI-Generated package.json Dependencies
A second, underappreciated threat vector emerges at the moment an AI coding assistant writes a
package.jsonorrequirements.txtfile: the dependency selection itself can be an attack surface. LLMs generate dependency lists from training data that may include packages that have since been abandoned, taken over by new owners, or never existed under the exact name suggested β a class of attacks known as dependency confusion and typosquatting injection. When a model confidently suggestsaxios-extensions,react-query-utils, orexpress-validator-pro, it is pattern-completing from training data that may not map to the legitimate npm package at that exact name in 2026. Attackers actively register names that fit these plausible-sounding patterns, publish packages with maliciousinstallscripts, and wait for AI-generatedpackage.jsonfiles to pull them in.The attack surface is broader than just invented names. AI coding tools frequently suggest packages that were legitimate at training time but have since been abandoned and transferred to new npm accounts with no security review. npm's ownership transfer process does not invalidate existing installs β a package downloaded a year ago under a trusted maintainer may pull a malicious update today because the namespace was transferred to an unknown party. In a 2026 audit of 5,000 AI-generated
package.jsonfiles, security researchers found that 12% contained at least one package with an ownership change in the prior 18 months and no corresponding version pin β meaning anynpm installwould silently fetch whatever the new owner published. For Python, the risk is compounded by PyPI's less restrictive ownership model and the model tendency to suggest packages it saw in tutorials that have since been unmaintained for two or more years.The mitigation for vibe coders is systematic rather than reactive: use exact version pinning (
=1.9.1rather than^1.9.1) in production lock files, runnpm install --ignore-scriptsfor initial installs to prevent maliciouspostinstallhooks, verify every AI-suggested package on npmjs.com or PyPI before accepting it (30-second check: download count, last publish date, owner account age), and enable GitHub Dependabot withallow: [ecosystem: npm]filtering to flag unexpected ownership changes. CyberOS provides automated dependency provenance monitoring β flagging packages where the publisher identity changed between your last install and today β as part of its vibe coding security dashboard. The full dependency vetting checklist is in Chapter 17, Prompt 17.256, and the Chapter 19 Security Playbook section on supply chain hygiene covers lockfile auditing in depth.💡Audit Your Vibe-Coded Projects Now: Runnpm audit(JavaScript) orpip-audit(Python) on every AI-assisted project in your stack. For prototype pollution specifically, add a CodeQL or Semgrep scan targeting prototype pollution sinks. The Chapter 19 Security Playbook includes a 30-minute security checklist covering prototype pollution detection and remediation for the most common vibe coding stacks β and Chapter 17 (Category 42) includes ready-to-use Security Audit prompts you can run against any AI-generated codebase today.The First Agentic-Vector CVE: Cursor RCE via Git Hooks
A new attack category arrived in May 2026 β one that specifically targets the way AI coding agents interact with repositories. CVE-2026-26268 is the first documented agentic-vector CVE: a vulnerability where the attack surface is not a traditional application endpoint, but the AI agent itself.
CVE-2026-26268 • CVSS 8.1Cursor IDE β Remote Code Execution via Malicious Git HooksA remote code execution vulnerability in Cursor IDE triggered by cloning a repository containing malicious.git/hooks/scripts. When Cursor's agent automatically reads and indexes a freshly cloned project β its standard behavior for providing code context β specially crafted hook files are executed with the user's local privileges. Unlike traditional RCE vulnerabilities that require a running server, this attack surface is the developer workflow itself: clone β agent reads β hooks execute. The attack can be embedded in any GitHub repository, including open-source projects, interview take-home assignments, and contractor-submitted codebases. Patches: Cursor 0.48.3+ adds a "Safe Clone" confirmation dialog and sandboxes hook execution. Mitigation for all AI coding tools: rungit config core.hooksPath /dev/nullbefore opening any unfamiliar repo in an AI agent, or usegit clone --no-local --template=/dev/null. See Chapter 17 (Prompt 17.241) for a complete pre-clone security checklist prompt.The significance of CVE-2026-26268 extends beyond its CVSS score. It represents a structural shift in the threat model for AI-assisted development:
🚫The Agentic Attack Surface: Traditional security assumes the developer is a human who reads files before executing them. AI coding agents violate this assumption β they read, index, and act on repository contents automatically and at machine speed. CVE-2026-26268 exploits exactly this behavior. Every AI coding tool that auto-indexes cloned projects has a version of this exposure. The mitigations (sandboxed hooks, explicit confirmation dialogs) are patches on a fundamentally new attack surface that did not exist before the agent era.Property CVE-2026-26268 (Agentic Vector) Traditional IDE RCE Trigger Agent auto-reads cloned repo User opens malicious file Attack speed Milliseconds after clone Requires user action Visibility Zero β no UI interaction File open dialog Delivery channel Any public GitHub repo Phishing, drive-by Mitigation complexity Per-tool, behavior-dependent Standard sandboxing ACM Formal Warning: The First Standards Body Intervention
In May 2026, the Association for Computing Machinery (ACM) β the world's largest computing professional society β issued a formal warning on vibe coding risks. This is the first intervention by a major computing standards body, marking a shift from community debate to institutional concern.
⚠ACM Technical Advisory (May 2026): The ACM Software Engineering Technical Council warned that AI-assisted "vibe coding" practices introduce systemic risks when used without adequate verification frameworks. The advisory specifically cited: (1) insufficient testing of AI-generated code before production deployment, (2) security vulnerability rates significantly higher than hand-written code, (3) maintainability and technical debt risks from AI-generated code that passes tests but fails under edge cases, and (4) professional liability questions when AI-generated software causes harm. The ACM stopped short of recommending against vibe coding, instead calling for "structured human oversight at critical decision points" β a position that aligns with what serious practitioners already do.The ACM warning lands in a context where vibe coding has moved well beyond hobbyist projects. According to GitHub's March 2026 data, AI-generated code now represents 41% of all new code committed to public repositories. At that scale, the ACM's concern is not academic β it is about the systemic risk profile of a majority-AI code base in production systems.
What the ACM is recommending aligns with the practical guidance throughout this book:
- Human review at architecture decision points
- Automated testing that covers security, not just functional correctness
- Verification workflows before agentic deployments (see Chapter 17, Prompt 17.240)
- A "Software 3.0 readiness" assessment before delegating critical logic to AI agents
The Mini Shai-Hulud: First SLSA-Certified Malware (May 2026)
The supply chain attack landscape reached a new milestone in May 2026 when attackers compromised 42
@tanstack/*packages (84 versions, 12M+ weekly downloads) along with@mistralaipackages β in what security researchers dubbed the Mini Shai-Hulud attack. Its significance isn't the scale, but the method: it produced the first documented npm worm generating validly-attested SLSA Build Level 3 malicious packages.⚠SLSA Level 3 No Longer Guarantees Integrity: The Mini Shai-Hulud attack hijacked OIDC tokens from misconfigured GitHub Actions workflows β specifically jobs that combinedid-token: writepermissions with PR triggers from unprotected branches. The stolen OIDC token was used to publish malicious package versions that carried valid, cryptographically signed SLSA Build Level 3 provenance attestations. Teams relying on SLSA attestation presence as a security signal are now exposed: attestation presence does not equal supply chain integrity if the signing key can be obtained via CI misconfiguration.SUPPLY CHAIN • CRITICALMini Shai-Hulud β @tanstack/* and @mistralai npm Compromise (May 11, 2026)Attackers hijacked OIDC tokens from GitHub Actions workflows in the TanStack and Mistral monorepos by exploiting misconfigured CI jobs that combined publish permissions with pull_request triggers accessible to external contributors. The stolen tokens were used to publish 84 malicious package versions across 42@tanstack/*packages and the@mistralaipackage family. The malicious versions carried valid SLSA Build Level 3 attestations β signed using the stolen OIDC token during a legitimate Sigstore signing ceremony. Downstream projects that check attestation presence (the standard SLSA verification step) would see these packages as trusted. Why vibe coders are especially exposed: AI coding assistants recommend@tanstack/react-query,@tanstack/router, and@mistralai/mistral-clientin virtually every modern React and AI integration project. Any vibe-coded project initialized after May 11 with these packages at latest versions was potentially affected. Immediate actions: (1) Pin@tanstack/*to the last known-good version before May 11 in your lock file; (2) Audit attestation signer identity β not just presence β usinggh attestation verifywith explicit expected signer; (3) Enable npm's--dry-runand Sigstore transparency log monitoring for all new installs; (4) Move to a private registry proxy with allow-listing for critical packages. Full attestation integrity verification checklist: see Chapter 17, Prompt 17.252.The Shai-Hulud attack has a second, under-reported dimension: it was an AI ecosystem attack. Both TanStack (the most common React data layer in AI-assisted apps) and Mistral (the API client for a major AI model provider) were targeted simultaneously β not by coincidence. The vibe coding community's standardized tool choices create a concentrated attack surface. When every Claude Code and Cursor project uses the same five packages, compromising those packages is a force multiplier attack on the entire developer ecosystem.
380,000 Corporate Assets Exposed by Vibe-Coding Tool Defaults
Security researchers in May 2026 disclosed a dataset of approximately 380,000 publicly accessible corporate assets β including healthcare records, financial data, and live API credentials β originating from projects built on AI coding platforms. The root cause: insecure default configurations in vibe-coded apps where the AI tools prioritized working quickly over secure-by-default settings.
🚫The Vibe-Coding Default Configuration Crisis: The 380K exposure is not attributable to any single tool or any single vulnerability. It represents a systemic pattern: AI coding assistants scaffold applications with configurations that work (for development and demo purposes) but are not production-safe. Supabase Row Level Security disabled by default for speed. S3 buckets created public for easy sharing.NEXT_PUBLIC_env vars used for API keys that should never reach the client. Auth middleware not applied to all routes. The AI tools that generate these patterns were optimizing for the stated goal β build a working app fast β and the security defaults required for production were out of scope for the prompt.The exposure pattern has five recurring root causes observed across the 380K assets:
Root Cause Frequency Example Supabase RLS disabled 34% of cases Tables created for MVP with ENABLE ROW LEVEL SECURITYnever addedPublic S3/R2/GCS buckets 28% AI scaffolds storage with public access for file upload demos Client-side secrets 21% NEXT_PUBLIC_prefix on API keys, database URLs, service tokensMissing auth middleware 12% Dashboard routes not covered by Next.js middleware matcher Demo data in production 5% Seeded test records with real-format PII left in production DB The pattern is predictable: an AI tool builds an MVP quickly, the developer ships it (perhaps even using the same AI tool to deploy), and the dev-safe defaults that were fine on localhost become production exposures at scale. See Chapter 17, Prompt 17.253 for a comprehensive audit checklist to detect all five patterns in your own vibe-coded applications before they reach the 380K statistic.
💡Pre-Deploy Security Checklist (30 minutes): Before every production deployment of a vibe-coded application, run through the Chapter 19 Security Playbook checklist. The five patterns above are detectable in under 30 minutes with Claude Code β search for RLS policies, bucket permissions,NEXT_PUBLIC_secrets, middleware coverage, and demo data. The cost of finding these before deploy is 30 minutes. The cost of finding them in a 380K-scale breach report is significantly higher.The regulatory signal is worth noting. ACM warnings historically precede formal standards and, eventually, regulatory requirements. The EU AI Act's high-risk category definitions are already being interpreted to include AI-assisted code in critical infrastructure. Teams that establish rigorous review practices now will be ahead of the compliance curve.
11. The Great Debate
The software community is deeply divided. Understanding the strongest arguments on each side helps you form a nuanced view.
Programming languages have always moved toward higher abstraction. Assembly to C to Python. Each level lets developers focus on intent rather than implementation. Natural language is simply the next layer.
#### "It democratizes creation."
Millions of people have software ideas but lack years of training. Vibe coding lets a nurse build a patient tracking app, a teacher build a classroom tool, a small business owner build inventory management. The expansion of who can create software is historically significant.
#### "The speed advantage is transformative."
A prototype in hours instead of weeks. An MVP in days instead of months. The 25% of YC companies with 95% AI code didn't choose vibe coding for ideology β they chose it because they needed to move fast.
#### "Traditional code isn't as reliable as we pretend."
Human-written code has bugs, security vulnerabilities, and technical debt too. AI-generated code may have different failure modes, but the idea that human code is inherently reliable is a myth.
Software spending is ~60% maintenance. If nobody understands the codebase, maintenance is impossible. You're not saving time β you're borrowing it from the future at a ruinous interest rate.
#### "Security requires understanding, not just testing."
You can test whether a login form works. You can't easily test whether passwords are properly hashed, session tokens are cryptographically secure, or APIs have rate limiting β unless you read the code.
#### "It creates learned helplessness."
Developers who rely entirely on vibe coding lose fundamental skills. When the AI makes a mistake in a novel way, they have no fallback. Fragile teams build fragile systems.
#### "The economics don't work at scale."
Vibe coding is cheap upfront and expensive later. The $1.5 trillion tech debt projection isn't speculation β it's extrapolation from observed code churn, duplication, and architectural degradation.
The most reasonable position β and the one supported by data β is that vibe coding is a powerful tool with a specific and limited appropriate scope.
<div class="callout success">
<div class="callout-icon">✅</div>
<div class="callout-content">
**It excels for:** prototyping, validation, personal tools, learning, hackathons, and small-scale applications with limited security requirements.
</div>
</div>
<div class="callout danger">
<div class="callout-icon">❌</div>
<div class="callout-content">
**It fails for:** production systems at scale, security-sensitive applications, regulated industries, and software that needs multi-year maintenance.
</div>
</div>
**The winning model in 2026:** Vibe code the prototype, then bring in disciplined engineering for the production system. The companies dominating right now β the ones raising at $10B valuations, the ones with $1B ARR in six months β are all betting that this model scales. And the data supports them.
The critics are not wrong about the risks. But they are wrong about the trajectory. Every objection to vibe coding was once made about high-level languages, about frameworks, about cloud computing. The abstraction always wins. The question is never *whether* but *how*.
12. When to Vibe (and When Not To)
🟢 Green Light: Vibe Code Away
- **Prototypes and MVPs** β Validate ideas before investing in production engineering - **Internal tools** β Dashboards, data scripts, one-off analysis - **Personal projects** β Only you use it, only you depend on it - **Learning** β Trying new frameworks, languages, or patterns - **Hackathons** β Speed is everything, longevity is nothing - **UI prototyping** β Design exploration and layout testing - **Automation scripts** β Repetitive tasks that eat your time🟠 Yellow Light: Proceed with Caution
- **Customer-facing apps** β Vibe the prototype, then review and harden - **Small SaaS** β Viable for launch, plan for rewrite - **API integrations** β Fast to build, auth needs human review - **Mobile apps** β UI can be vibe coded; data/security need attention - **Team projects** β Works if one person understands the architecture🔴 Red Light: Don't Vibe Code
- **Financial systems** β Payments, accounting, trading - **Healthcare** β Patient data, clinical decisions, HIPAA - **Auth & authz** β Login systems, permissions, tokens - **Infrastructure** β Server config, network security, deployment - **Regulated industries** β SOX, PCI-DSS, GDPR compliance - **Distributed systems** β Microservices, message queues, cache invalidation - **Cryptography** β Encryption, key management, certificates13. Mastering the Craft: Advanced Techniques
If you're going to vibe code, do it well. These techniques separate productive vibe coders from frustrated ones.
The Art of the Initial Prompt
The single most important factor in vibe coding success. Spend 30 minutes writing a comprehensive description before generating a single line of code.
Weak vs. Strong Prompts
Key Patterns
```
Working: dashboard + project cards + drag-and-drop -> Save/commit BEFORE adding: task checklist feature
</div>
</div>
<div class="expand-section">
<button class="expand-header" onclick="this.parentElement.classList.toggle('open')">
<span class="expand-arrow">▶</span> The "Explain Then Generate" Pattern
</button>
<div class="expand-body">
For complex features, ask the AI to explain its approach before generating code:
```
Before writing any code, explain how you would implement
real-time collaborative editing in this application.
What approach? What trade-offs? Then implement it.
This gives you architectural understanding even in a vibe coding workflow.
</div>
- **Claude Opus 4.6 (via Claude Code)** β Complex reasoning, architecture, large codebases, agent teams for parallel work
GPT-5.2 (via Codex CLI) β Code generation, systematic transformations, sandboxed execution
Gemini 3 Pro / Flash (via Jules or Gemini CLI) β Multimodal (screenshots, diagrams), open-source CLI with skills system
GitHub Copilot Agent Mode β Best for working within existing VS Code workflows with agent capabilities
v0 β React/Next.js UI generation
Bolt.new β Full-stack prototypes you want immediately
**Good:** "When I click 'Add Task', nothing happens. Console shows: `TypeError: Cannot read property 'push' of undefined at TaskList.addTask (app.js:47)`. This started after I added drag-and-drop."
Include: **action** (what you did), **actual** (what happened), **expected** (what should happen), **error** (verbatim), **context** (what changed recently).
14. Building a Sustainable Workflow
Pure vibe coding is fast but fragile. Here's how to build a workflow that's both fast and sustainable.
Vibe code the 80% (UI, boilerplate, standard patterns).
Engineer the 20% (auth, business logic, data integrity, security).
15. The Business of Vibes
Vibe coding isn't just changing how software is built. It's changing the economics of software businesses.
The New Cost Structure
<p style="margin-top:1rem;"><em>This doesn't mean you never need engineers. It means you can validate before investing.</em></p>
The New Archetypes
The Talent Shift
Companies are increasingly hiring for:
Specification specialists β translating business requirements into precise AI prompts
System architects β designing overall structure that AI agents implement
Security engineers β the human review layer catching what AI misses
AI-fluent developers β working effectively with and reviewing AI-generated code
Browse 670+ open AI/LLM positions at LLMHire β the dedicated job board for AI engineers, ML researchers, and prompt engineers.
16. What Comes Next
Now (Early 2026) β Already Happening
AI-native development is the default. 84% of developers use AI tools. The question has shifted from "should we use AI?" to "how do we use it safely?"
Agent teams are here. Claude Code's agent teams feature lets multiple AI agents work in parallel on different aspects of a project. This is the beginning of true AI-human hybrid teams.
The open-source crisis. A January 2026 arXiv paper argues vibe coding threatens the open-source ecosystem: users no longer visit docs, file bugs, or engage with maintainers. Tailwind CSS docs traffic down 40%. Stack Overflow questions in structural decline. How maintainers get paid must change.
Multimodal coding emerges. Voice-driven coding, visual programming interfaces, and screenshot-to-code workflows are entering mainstream tools.
Consolidation is accelerating. The Windsurf saga β a $3B acquisition attempt, Microsoft blocking, Google poaching, Cognition acquiring β signals a market entering its consolidation phase. Wix acquired Base44 for $80M cash. Anthropic acquired Bun.
"Agentic engineering" replaces "vibe coding" for professionals. Karpathy himself has moved beyond the term, now advocating for professionals orchestrating AI agents with oversight, not just vibes.
The IDEsaster wake-up call. 30+ vulnerabilities across every major AI IDE, 24 CVEs, 1.8M developers at risk. AI code is 2.74x more likely to introduce XSS than human code.
AI reviews AI code. Anthropic launched Code Review (March 9, 2026) β a multi-agent system inside Claude Code that automatically catches logic errors in AI-generated code. The "who reviews the reviewer" problem now has a commercial answer.
Claude becomes the enterprise default. Anthropic committed $100 million to the Claude Partner Network (March 12β13, 2026), formalizing partnerships with Accenture, Deloitte, Cognizant, and Infosys. Enterprise AI standardization is no longer theoretical.
Anthropic hits $380B valuation β Claude #1 on App Store. After refusing Pentagon weapons AI contracts, Anthropic became the most disruptive company in the world (TIME, March 2026). Claude overtook ChatGPT as the #1 app on Apple's App Store. The safety-first bet paid off.
Agent documentation tooling matures. DeepLearning.AI (Andrew Ng's team) released Context Hub (March 9, 2026) β an open-source CLI tool that gives coding agents real-time access to current API docs, bridging the gap between training cutoffs and fast-moving APIs.
Near-Term (Late 2026)
- Security tooling catches up. Agentic security tools reviewing AI code in real-time. "Move security into the act of creation."
Standardization emerges. Enterprise governance frameworks for AI-generated code.
Agent orchestration matures. Specialized agents for frontend, backend, testing, security working in concert under a lead agent.
Open-source funding models evolve. New models for compensating maintainers whose libraries power AI-generated code.
Medium-Term (2027-2028)
- Natural language becomes a programming interface. Not replacing code, but a legitimate authoring medium.
AI-human hybrid teams are standard. Every team includes both human engineers and AI agents with defined roles.
The maintenance problem gets addressed. AI tools that understand, refactor, and improve AI-generated code.
Specialized domain models. Finance, healthcare, embedded β each gets domain-specific AI models.
Long-Term (2029+)
- Intent-driven development. Describe outcomes, constraints, quality attributes. AI handles the rest.
Self-healing software. Applications that detect bugs in production and fix themselves.
The abstraction continues. The role evolves from "code author" to "system designer and quality guardian."
🔮**The fundamental question:** AI will write an increasing share of the world's software. The question isn't whether β it's how we ensure it's secure, reliable, and maintainable. The developers who thrive will master both modes: vibe code a prototype on Saturday, architect a production system on Monday.Conclusion
In twelve months, vibe coding went from a tweet to a dictionary entry to a multi-billion-dollar industry. Cursor alone is valued at $29.3 billion. Lovable at $6.6 billion. A vibe-coded startup sold for $80 million. GitHub Copilot has 4.7 million paid subscribers. Now, in early 2026, it has become the defining methodology of a new era in software development.The numbers speak for themselves: Claude Code reached $1B ARR in six months. Cursor surpassed $1B ARR at a $29.3B valuation. Devin surpassed $155M ARR at a $10.2B valuation. GitHub Copilot crossed 4.7 million paid users. These are not experimental products. This is the new infrastructure of software creation.
The promise is real and accelerating: agent teams working in parallel, multimodal coding interfaces, and tools so capable that 75% of Replit's AI users write zero code themselves. The barrier between idea and working software has never been lower.
The challenges are evolving too: the open-source ecosystem faces an existential funding question, security remains a real concern with 69 vulnerabilities found across just 15 AI-built apps, and the "vibe coding hangover" of unmaintainable codebases is a documented phenomenon.
But the answer has become clear. Vibe coding is not a fad to be dismissed or a silver bullet to be worshipped. It is a powerful methodology that belongs in every developer's toolkit. The developers who thrive in 2026 and beyond will be those who master the spectrum β knowing when to vibe code a prototype on Saturday, when to collaborate with agents on Monday, and when to insist on human-reviewed engineering for the critical 20%.
The vibes are real. The exponentials are real. The opportunity is unprecedented.
Embrace the vibes. Engineer the foundations. Build the future.
Chapter 17: The Complete Prompt Library
230+ production-ready prompts for every stage of AI-native development. Updated monthly.
How to Use This Library
Each prompt is tagged with:
- Difficulty: Beginner / Intermediate / Advanced / Expert
- Tool: Which AI tools it works best with
- Time: Expected completion time
- Category: What type of work it handles
The prompts are designed to be copy-pasted directly. Customize the bracketed [sections] for your specific project.
Category 1: Project Kickoff Prompts
1.1 The Complete Spec Prompt (Expert)
Tool: Claude Code, Cursor Composer | Time: 30-60 min generation
I'm building [product name], a [type of application] for [target audience].
## Product Vision
[One-sentence description of what this product does and why it matters]
## Target Users
- Primary: [who, age range, technical skill level, key pain point]
- Secondary: [who, why they'd use it]
## Core Features (MVP - Priority Order)
1. [Feature 1]: [User story: "As a [user], I want to [action] so that [benefit]"]
2. [Feature 2]: [User story]
3. [Feature 3]: [User story]
## Data Model
- [Entity 1]: [fields and types]
- [Entity 2]: [fields and types]
- Relationships: [Entity 1] has many [Entity 2], etc.
## Design Direction
- Style: [modern/minimal/playful/corporate/brutalist]
- Color palette: [primary hex, accent hex, background]
- Typography: [sans-serif/serif/mono, reference sites]
- Layout: [single page / multi-page / dashboard / wizard]
- Responsive: [mobile-first / desktop-first / both]
## Technical Stack
- Framework: [Next.js / React / Vue / Svelte / vanilla]
- Styling: [Tailwind / CSS Modules / styled-components]
- Database: [Supabase / Firebase / localStorage / Prisma+PostgreSQL]
- Auth: [Supabase Auth / NextAuth / Clerk / none]
- Hosting: [Vercel / Netlify / Railway]
## What Success Looks Like
- A user can [core workflow] in under [N] steps
- The app loads in under [N] seconds
- [Specific measurable outcome]
## What This Is NOT
- Not a [common misunderstanding]
- Don't include [feature to avoid]
- Don't over-engineer [aspect]
Build the complete MVP. Start with the data model, then core layout, then features in priority order.
1.2 The Weekend Prototype Prompt (Beginner)
Tool: Bolt.new, Lovable, Replit Agent | Time: 15-30 min
Build a [type of app] that solves this problem: [describe the pain point in one sentence].
The main user is [who] and they need to:
1. [Core action 1]
2. [Core action 2]
3. [Core action 3]
Design: Clean and modern. Use [color] as the accent color. Dark mode preferred.
Store data in localStorage.
Make it work on mobile.
Keep it simple. I'd rather have 3 features that work perfectly than 10 that are buggy.
1.3 The "Clone This" Prompt (Intermediate)
Tool: Cursor, Claude Code | Time: 1-2 hours
Build a simplified version of [well-known app, e.g., Trello/Notion/Slack].
Include ONLY these features from the original:
1. [Feature to clone]
2. [Feature to clone]
3. [Feature to clone]
DO NOT include: [features to skip]
Match the general layout and UX patterns of the original but use your own design.
Use [tech stack]. Deploy-ready for Vercel.
Focus on making the core interaction feel as smooth as the original.
1.4 The Landing Page Prompt (Beginner)
Tool: v0, Bolt.new | Time: 15-30 min
Create a conversion-optimized landing page for [product name].
Product: [One line description]
Target audience: [Who would buy this]
Price: [Price point or "Free"]
Sections (in order):
1. Hero: Headline "[compelling headline]", subheadline "[supporting text]", CTA button "[button text]"
2. Problem: 3 pain points the audience faces
3. Solution: How the product solves each pain point (with icons or illustrations)
4. Social proof: [testimonials / stats / logos / "As seen in"]
5. Features: 3-6 key features with brief descriptions
6. Pricing: [pricing tiers if applicable]
7. FAQ: 4-5 common questions with answers
8. Final CTA: Repeat the main call-to-action
Design: Professional, trustworthy. Primary color [hex]. Lots of whitespace.
Mobile-responsive. Fast-loading (no heavy images).
Include Open Graph meta tags for social sharing.
Category 2: Feature Addition Prompts
2.1 Authentication System (Advanced)
Tool: Claude Code, Cursor | Time: 1-2 hours
Add a complete authentication system to this [framework] application.
Requirements:
- Email/password signup with email verification
- Login with session management (HTTP-only cookies, not localStorage)
- Password requirements: minimum 8 chars, 1 uppercase, 1 number, 1 special char
- "Forgot password" flow with email reset link (expires in 1 hour)
- "Remember me" option (extends session to 30 days, default is 24 hours)
- Rate limiting: max 5 failed attempts per IP per 15 minutes, then 30-min lockout
- CSRF protection on all auth forms
- Secure headers: HSTS, X-Content-Type-Options, X-Frame-Options
Auth provider: [Supabase Auth / NextAuth / Clerk / custom JWT]
Protected routes: [list routes that require auth]
Public routes: [list routes that don't require auth]
After login, redirect to [dashboard/home/previous page].
Show clear error messages for: wrong password, account not found, account locked, email not verified.
Write tests for: successful login, failed login, signup validation, session expiry, rate limiting.
2.2 Payment Integration (Advanced)
Tool: Claude Code | Time: 2-3 hours
Add [Stripe / Paddle] subscription billing to this application.
Products:
- Free tier: [what's included, usage limits]
- Pro tier: $[price]/month - [what's included]
- [Optional: Enterprise tier: $[price]/month - [what's included]]
Implementation:
1. Pricing page showing all tiers with feature comparison
2. Checkout flow: user selects plan -> [Stripe Checkout / Paddle Overlay] -> redirect to success page
3. Webhook handler for: subscription.created, subscription.updated, subscription.cancelled, invoice.payment_failed
4. User dashboard showing: current plan, next billing date, usage this period, upgrade/downgrade buttons
5. Usage tracking: count [what metric] per billing period, enforce limits on free tier
6. Graceful downgrade: when subscription cancelled, access continues until period end
7. Failed payment handling: 3 retry attempts over 7 days, then downgrade to free
Store subscription status in [Supabase / database].
Add middleware to check subscription status on protected API routes.
Show upgrade prompts when free users hit limits.
Environment variables needed:
- [STRIPE_SECRET_KEY / PADDLE_API_KEY]
- [STRIPE_WEBHOOK_SECRET / PADDLE_WEBHOOK_SECRET]
- [STRIPE_PRO_PRICE_ID / PADDLE_PRO_PRICE_ID]
2.3 Real-Time Features (Advanced)
Tool: Claude Code, Cursor | Time: 2-4 hours
Add real-time [collaboration / notifications / live updates] to this application.
What should update in real-time:
- [Specific data that changes: "new messages", "task status changes", "user presence"]
Technology: [Supabase Realtime / Socket.io / Pusher / Server-Sent Events]
Requirements:
- Changes made by User A appear for User B within [1 second / 500ms]
- Show [typing indicators / presence dots / live cursors] for active users
- Handle disconnection gracefully: show "reconnecting..." banner, auto-reconnect with exponential backoff
- Dedup messages that arrive during reconnection
- Don't poll - use persistent connections
- Fallback to polling if WebSocket connection fails
Optimize for:
- [N] concurrent users per [room / document / channel]
- Messages/updates of approximately [size] bytes each
- Mobile networks with intermittent connectivity
Show connection status indicator (green dot = connected, yellow = reconnecting, red = offline).
2.4 Search and Filter System (Intermediate)
Tool: Any | Time: 30-60 min
Add search and filtering to the [items/products/posts] list in this application.
Search:
- Full-text search across: [field 1], [field 2], [field 3]
- Debounced input (300ms delay before searching)
- Show "X results for 'query'" count
- Highlight matching text in results
- Empty state: "No results for 'query'. Try different keywords."
Filters:
- [Filter 1]: [type: dropdown/checkbox/range] with options [list options]
- [Filter 2]: [type] with options [list options]
- [Filter 3]: [type] with options [list options]
- Date range: from/to date pickers
- Sort by: [option 1 / option 2 / option 3], ascending/descending
Behavior:
- Filters combine with AND logic (search + filter1 + filter2)
- Show active filter count as badge on filter button
- "Clear all filters" button when any filter is active
- URL params reflect current filters (shareable filtered views)
- Persist last-used filters in localStorage
Performance:
- Client-side filtering for under 1000 items
- Server-side (API) filtering for larger datasets
- Show loading skeleton while filtering
Category 3: UI/UX Prompts
3.1 Dashboard Layout (Intermediate)
Tool: v0, Cursor | Time: 30-60 min
Build a dashboard layout for [application type].
Layout:
- Left sidebar: navigation menu (collapsible on mobile, icons + labels)
- Top bar: user avatar + dropdown menu, notification bell with count badge, search bar
- Main content area: responsive grid that adapts from 1 to 3 columns
Sidebar navigation items:
1. [Icon] Dashboard (home)
2. [Icon] [Section 1]
3. [Icon] [Section 2]
4. [Icon] [Section 3]
5. [Icon] Settings
6. [Icon] Help
Dashboard home shows:
- Row 1: 4 stat cards ([Metric 1]: [value], [Metric 2]: [value], etc.)
- Row 2: Main chart (line chart showing [metric] over [time period]) + recent activity feed
- Row 3: Quick actions grid (3-4 action cards with icons)
Design: [light/dark] theme. Accent color: [hex].
Use Tailwind CSS. Smooth transitions on sidebar toggle.
Mobile: sidebar becomes a hamburger drawer overlay.
3.2 Form with Validation (Beginner)
Tool: Any | Time: 15-30 min
Build a multi-step form for [purpose, e.g., "user onboarding", "job application", "event registration"].
Steps:
1. [Step name]: Fields: [field1 (type, required?), field2, field3]
2. [Step name]: Fields: [field4, field5, field6]
3. [Step name]: Review all entered data + submit button
Validation:
- Email: valid format + show error immediately on blur
- Phone: format as (XXX) XXX-XXXX as user types
- Required fields: show red border + error message
- [Custom validation]: [describe rule]
UX:
- Progress indicator showing current step (1/3, 2/3, 3/3)
- "Back" and "Next" buttons (Next disabled until current step is valid)
- "Save as draft" option (localStorage)
- Smooth slide transition between steps
- Auto-focus first field on each step
- Show success animation on submit
Accessible: proper labels, aria attributes, keyboard navigation (Tab through fields, Enter to submit).
3.3 Data Table (Intermediate)
Tool: Any | Time: 30-60 min
Build a data table component for displaying [data type, e.g., "user list", "order history", "inventory"].
Columns:
1. [Column]: [type: text/number/date/status/avatar] - [width: narrow/medium/wide]
2. [Column]: [type] - [width]
3. [Column]: [type] - [width]
4. Actions: Edit, Delete, [custom action]
Features:
- Sort by clicking column headers (asc/desc, show arrow indicator)
- Select rows with checkboxes (select all, bulk actions)
- Inline editing: click cell to edit, Enter to save, Escape to cancel
- Pagination: 10/25/50 per page selector, page numbers, total count
- Responsive: on mobile, switch to card layout (one card per row)
- Empty state: illustration + "No [items] yet. Create your first one."
- Loading state: skeleton rows while data loads
Styling: Clean borders, alternating row colors, hover highlight.
Status column: colored badges (green=active, yellow=pending, red=inactive).
Category 4: API and Backend Prompts
4.1 REST API Scaffold (Advanced)
Tool: Claude Code | Time: 1-2 hours
Build a REST API for [application] with these resources:
Resources:
1. [Resource 1, e.g., "Users"]:
- Fields: [id, name, email, role, created_at, updated_at]
- Endpoints: GET /api/users, GET /api/users/:id, POST /api/users, PUT /api/users/:id, DELETE /api/users/:id
2. [Resource 2]:
- Fields: [list fields]
- Endpoints: [list CRUD endpoints]
- Relationships: [belongs_to Resource1, has_many Resource3]
Response format (all endpoints):
Success: { data: {...}, meta: { page, limit, total } }
Error: { error: { code: "VALIDATION_ERROR", message: "Email is required", details: [...] } }
Requirements:
- Input validation with descriptive error messages
- Pagination: ?page=1&limit=20 (default limit=20, max=100)
- Filtering: ?status=active&role=admin
- Sorting: ?sort=created_at&order=desc
- Rate limiting: 100 requests per minute per IP
- CORS configured for [allowed origins]
- Request logging (method, path, status, duration)
Auth: Bearer token in Authorization header.
- Public endpoints: [list]
- Authenticated endpoints: [list]
- Admin-only endpoints: [list]
Framework: [Next.js API routes / Express / Fastify / Hono]
Database: [Supabase / Prisma / Drizzle]
4.2 Database Schema Design (Advanced)
Tool: Claude Code | Time: 30-60 min
Design a database schema for [application type].
Entities:
1. [Entity 1]: [description of what it represents]
- Required fields: [list]
- Optional fields: [list]
- Unique constraints: [list]
2. [Entity 2]: [description]
- Fields: [list]
- References: [Entity 1] (one-to-many / many-to-many)
Business rules:
- [Rule 1, e.g., "A user can only have one active subscription"]
- [Rule 2, e.g., "Orders must have at least one line item"]
- [Rule 3, e.g., "Soft delete for users, hard delete for sessions"]
Generate:
1. SQL migration file with CREATE TABLE statements
2. Indexes for common query patterns: [list queries, e.g., "find users by email", "get orders by date range"]
3. Row-level security policies (if Supabase)
4. Seed data: 10-20 realistic sample records per table
5. TypeScript types matching the schema
Optimize for: [read-heavy / write-heavy / balanced]
Database: [PostgreSQL / MySQL / SQLite]
Category 5: Testing and Quality Prompts
5.1 Comprehensive Test Suite (Advanced)
Tool: Claude Code | Time: 2-4 hours
Write a comprehensive test suite for this [application/module].
Testing framework: [Vitest / Jest / Playwright / Cypress]
Coverage targets:
- Unit tests: all utility functions and business logic (aim for 90%+)
- Integration tests: all API endpoints (happy path + error cases)
- Component tests: all interactive components (user events + state changes)
- E2E tests: [list 3-5 critical user flows]
For each test, include:
- Clear descriptive name: "should [expected behavior] when [condition]"
- Arrange-Act-Assert structure
- Realistic test data (not "test123" or "foo bar")
- Error case coverage (invalid input, timeout, auth failure)
- Edge cases ([list specific edge cases for this app])
Mock strategy:
- External APIs: mock with [MSW / jest.mock / vi.mock]
- Database: use [test database / in-memory / fixtures]
- Time-dependent tests: mock Date.now()
- File system: use temp directories
Run the complete suite after writing. Fix any failures.
Generate a coverage report.
5.2 Security Audit Prompt (Expert)
Tool: Claude Code | Time: 1-2 hours
Perform a security audit of this codebase. Check for:
1. Authentication & Authorization:
- Are passwords hashed with bcrypt/argon2 (not MD5/SHA)?
- Are sessions stored securely (HTTP-only cookies, not localStorage)?
- Is CSRF protection implemented on state-changing requests?
- Are API keys and secrets in environment variables (not hardcoded)?
- Are authorization checks on every protected endpoint (not just frontend)?
2. Input Validation:
- Is all user input validated server-side (not just client-side)?
- Are SQL queries parameterized (no string concatenation)?
- Is HTML output sanitized to prevent XSS?
- Are file uploads validated (type, size, name)?
- Are URL redirects validated against an allowlist?
3. Data Protection:
- Is sensitive data encrypted at rest?
- Is HTTPS enforced (HSTS headers)?
- Are API responses filtered (no password hashes, internal IDs leaking)?
- Is PII handled according to GDPR/CCPA requirements?
- Are error messages generic (no stack traces to users)?
4. Infrastructure:
- Are dependencies up to date (no known CVEs)?
- Are security headers set (CSP, X-Frame-Options, etc.)?
- Is rate limiting configured on auth and API endpoints?
- Are CORS origins restricted (not "*")?
- Are logs sanitized (no passwords or tokens in logs)?
For each issue found:
- Severity: Critical / High / Medium / Low
- Location: file path and line number
- Description: what's wrong and why it matters
- Fix: specific code change to resolve it
- Test: how to verify the fix works
Prioritize fixes by severity. Implement Critical and High fixes immediately.
Category 6: Refactoring and Optimization Prompts
6.1 Performance Optimization (Advanced)
Tool: Claude Code | Time: 1-2 hours
This application is slow. Analyze and optimize performance.
Symptoms:
- [Specific symptom: "initial page load takes 4+ seconds"]
- [Specific symptom: "scrolling is janky with 500+ items"]
- [Specific symptom: "API response takes 2+ seconds"]
Investigate and fix:
1. Bundle size: analyze with [next/bundle-analyzer or similar], remove unused dependencies, implement code splitting
2. Rendering: identify unnecessary re-renders, add React.memo/useMemo/useCallback where appropriate
3. Data fetching: implement caching, pagination, reduce payload sizes
4. Images: lazy load below-fold images, use next/image or responsive srcset, serve WebP
5. Database: add missing indexes, optimize N+1 queries, implement connection pooling
6. Network: enable gzip/brotli, set proper cache headers, minimize HTTP requests
For each optimization:
- Before: [metric measurement]
- After: [expected improvement]
- Method: [specific code change]
Run Lighthouse audit before and after. Target scores: Performance >90, Accessibility >95.
6.2 Code Cleanup (Intermediate)
Tool: Claude Code, Cursor | Time: 1-2 hours
Clean up this codebase without changing any functionality.
Tasks:
1. Remove dead code: unused imports, unreachable functions, commented-out blocks
2. Consolidate duplicated logic: find similar code patterns and extract shared utilities
3. Fix naming: rename variables/functions that don't describe their purpose
4. Organize file structure: group related files, consistent naming conventions
5. Add TypeScript types: replace 'any' with proper types, add interfaces for data shapes
6. Fix linting issues: run [ESLint / Prettier] and fix all warnings/errors
7. Update dependencies: check for outdated packages, update non-breaking versions
8. Add JSDoc comments to exported functions (not internal helpers)
Rules:
- Make small, focused commits (one type of change per commit)
- Run tests after each change to ensure nothing breaks
- Don't refactor code that has pending changes or open PRs
- Keep the diff readable: don't auto-format unrelated files
Category 7: Deployment and DevOps Prompts
7.1 Production Deployment Checklist (Advanced)
Tool: Claude Code | Time: 1-2 hours
Prepare this application for production deployment on [Vercel / AWS / Railway].
Pre-deployment checklist:
1. Environment variables: create .env.example with all required vars (no values), verify all are set in [hosting platform]
2. Error tracking: set up [Sentry / LogRocket / Bugsnag] for runtime error monitoring
3. Analytics: add [Vercel Analytics / Google Analytics / Plausible] for usage tracking
4. SEO: verify meta tags, Open Graph, Twitter cards, sitemap.xml, robots.txt
5. Performance: run Lighthouse, fix any scores below 80
6. Security: run npm audit, fix critical/high vulnerabilities, verify security headers
7. Database: verify connection pooling, set up backups if applicable
8. Caching: configure CDN caching headers, implement stale-while-revalidate for API routes
9. Monitoring: set up uptime monitoring (e.g., UptimeRobot, Checkly)
10. Domain: configure custom domain, SSL, www redirect
Create a deployment script or CI/CD pipeline that:
- Runs tests
- Runs linter
- Builds the application
- Deploys to [platform]
- Runs smoke tests against the deployed URL
- Notifies [Slack / Discord / email] on success/failure
Category 8: AI Agent Orchestration Prompts (Expert)
8.1 Multi-Agent Task Decomposition
Tool: Claude Code (subagents) | Time: 2-4 hours
I need to [describe large task, e.g., "add a complete user profile system with settings, avatar upload, activity history, and notification preferences"].
Decompose this into subtasks that can be worked on in parallel:
1. Data layer: schema changes, migrations, API endpoints
2. UI components: form components, display components, layouts
3. Business logic: validation rules, permission checks, notification triggers
4. Tests: unit tests, integration tests, E2E tests
For each subtask:
- Define the interface/contract (inputs, outputs, data shapes)
- List dependencies on other subtasks
- Identify which can run in parallel vs. must be sequential
Then implement each subtask, integrating them at the defined interfaces.
Run the full test suite after integration to catch any contract mismatches.
8.2 Codebase Analysis and Improvement Plan
Tool: Claude Code | Time: 1-2 hours
Analyze this entire codebase and create an improvement plan.
Evaluate:
1. Architecture: Is the structure scalable? Are concerns properly separated?
2. Code quality: Consistency, readability, duplication, complexity (cyclomatic)
3. Error handling: Are errors caught, logged, and presented well?
4. Testing: Coverage, quality of tests, missing edge cases
5. Security: Common vulnerabilities (OWASP Top 10 applicable ones)
6. Performance: Obvious bottlenecks, missing optimizations
7. Developer experience: Build time, hot reload, debugging ease
Output:
- Score each category 1-10 with specific evidence
- Top 5 improvements ranked by impact/effort ratio
- Specific action items for each improvement
- Estimated time for each action item
Don't fix anything yet. Just analyze and plan.
Category 9: Content and Data Prompts
9.1 Seed Data Generator (Beginner)
Tool: Any | Time: 15-30 min
Generate realistic seed data for this application.
Data needed:
- [N] [entity type, e.g., "users"] with: [fields]
- [N] [entity type, e.g., "products"] with: [fields]
- [N] [entity type, e.g., "orders"] with: [fields]
Rules:
- Use realistic names (not "Test User 1")
- Dates spread across the last [time period]
- Prices/amounts in realistic ranges for [industry]
- Status distribution: [e.g., "60% active, 30% pending, 10% cancelled"]
- Include edge cases: [e.g., "one user with no orders, one product with 0 stock"]
- Relationships should be consistent (orders reference real user IDs and product IDs)
Output format: [JSON / SQL INSERT statements / TypeScript constants / CSV]
9.2 API Documentation Generator (Intermediate)
Tool: Claude Code | Time: 30-60 min
Generate comprehensive API documentation for all endpoints in this application.
For each endpoint, document:
- Method and path (e.g., GET /api/users/:id)
- Description (one sentence)
- Authentication required? (yes/no, what type)
- Request: headers, query params, body schema with types and validation rules
- Response: status codes, body schema for success and each error case
- Example request (curl command)
- Example response (JSON)
Format: [Markdown / OpenAPI 3.0 spec / Swagger]
Include a table of contents.
Group endpoints by resource.
Add rate limiting info if applicable.
Category 10: Platform-Specific Prompts
10.1 Chrome Extension (Advanced)
Tool: Claude Code | Time: 2-4 hours
Build a Chrome Extension (Manifest V3) that [core functionality].
Features:
- Popup: [describe popup UI and what it shows]
- Content script: [what it does on web pages, e.g., "highlights [elements]"]
- Background service worker: [what it handles, e.g., "API calls, storage sync"]
- Options page: [settings the user can configure]
Permissions needed: [activeTab, storage, tabs, etc. - minimize permissions]
Storage:
- Use chrome.storage.sync for: [settings that sync across devices]
- Use chrome.storage.local for: [data that stays local]
Communication:
- Content script <-> Background: chrome.runtime.sendMessage
- Popup <-> Background: direct access to chrome.storage
Include:
- manifest.json with all required fields
- Icon set (16x16, 48x48, 128x128) - use simple colored SVG converted to PNG
- README with installation instructions (load unpacked)
- Privacy policy text (required for Chrome Web Store submission)
Test on these sites: [list 3-5 target websites]
10.2 CLI Tool (Intermediate)
Tool: Claude Code | Time: 1-2 hours
Build a command-line tool in [Node.js / Python / Go / Rust] that [core functionality].
Commands:
- [tool] init: [what it sets up]
- [tool] [command 1] [args]: [what it does]
- [tool] [command 2] [args]: [what it does]
- [tool] --help: show all commands with descriptions
Features:
- Colored output (green for success, red for errors, yellow for warnings)
- Progress bars for long operations
- Interactive prompts for required input (with defaults)
- Config file (~/.toolrc or .toolrc in project root)
- --verbose flag for debug output
- --json flag for machine-readable output
- Meaningful exit codes (0 success, 1 error, 2 usage error)
Error handling:
- Clear error messages with suggested fixes
- Never show stack traces (unless --verbose)
- Graceful handling of Ctrl+C
Package for distribution via [npm / pip / brew / cargo].
Include README with installation, usage examples, and config reference.
Prompt Patterns Reference Card
The Constraint Sandwich
Do [action].
Include: [must-have list]
Do NOT include: [exclusion list]
Match existing: [patterns/styles to follow]
The Iterative Refinement
[After seeing initial output]
Keep: [what works]
Change: [what needs to change]
Add: [what's missing]
Remove: [what's unnecessary]
Don't touch: [what shouldn't change]
The Context Dump
Here's the current state:
- File: [path] does [function]
- File: [path] does [function]
- The bug is in: [location]
- Error message: [exact text]
- This worked before I: [recent change]
- I've already tried: [attempts]
Fix the bug without changing [protected areas].
The Scope Lock
ONLY modify [specific files/functions].
Do NOT touch: [protected files]
Do NOT change: [protected behavior]
Do NOT add: [unwanted additions]
Keep the diff as small as possible.
The Quality Gate
Before considering this done:
1. All existing tests pass
2. New tests cover: [specific scenarios]
3. No TypeScript errors (strict mode)
4. No ESLint warnings
5. Lighthouse performance score > [N]
6. [Custom quality criterion]
March 2026 Additions: Autonomous Mode Prompts
New prompts for Claude Code Auto Mode, MCP workflows, and agentic build patterns.
The Auto Mode Task Brief (Expert)
Tool: Claude Code (Auto Mode enabled) | Time: Runs unattended 15-120 min
Use this when handing a scoped task to Claude Code in Auto Mode. The structure defines scope, acceptance criteria, and what Claude should NOT touch β so the autonomous run has clear boundaries.
# Task: [Brief title]
## Scope
Working directory: [path]
Files allowed to modify: [list or glob pattern]
Files that must NOT change: [list β tests, migrations, config, etc.]
## Objective
[One sentence: what should be different when you're done]
## Acceptance Criteria
- [ ] [Specific, testable outcome 1]
- [ ] [Specific, testable outcome 2]
- [ ] All existing tests still pass
- [ ] No TypeScript errors (strict)
- [ ] No new ESLint warnings
## What This Is NOT
- Do not refactor unrelated code
- Do not add features beyond the objective
- Do not modify [specific protected area]
## Summary at End
When complete, write a brief summary of:
1. Every file changed and why
2. Any decisions you made and the tradeoff
3. Anything you're uncertain about
4. Tests I should run to verify
Why it works: The summary request at the end transforms Auto Mode from "black box" to "async colleague" β you wake up to a log of decisions, not just a diff.
The Claude Code Channels Handoff (Advanced)
Tool: Claude Code + Channels (Telegram/Discord integration) | Time: N/A β async coordination
Claude Code Channels (March 2026) lets you send instructions to a running Claude Code session from your phone. Use this prompt structure to create async checkpoints that Claude will pause for:
## Background Task with Mobile Checkpoints
Start the following task: [task description]
## Checkpoint Rules
Pause and send me a Telegram message at these points:
1. After completing the initial analysis β summarize what you found
2. Before any destructive action (delete, drop, overwrite) β describe it and wait
3. If you hit a blocker you can't resolve β describe the issue
4. When complete β summary of all changes
## Proceed autonomously between checkpoints.
Do not pause for routine read/write/test operations.
Why it works: You define the decision points where human judgment matters, and let Claude handle the execution in between. Run overnight builds and get Telegram pings when action is needed.
The Security Scope Guard (Advanced)
Tool: Claude Code (any mode) | Time: Prepend to any task involving auth, payments, or data
Add this as a preamble whenever Claude Code will touch security-sensitive code. It activates extra caution without requiring manual review of every action:
## Security Scope Guard β Activate Before This Task
This task involves security-sensitive code: [auth / payments / user data / API keys]
Before every change to [auth / payment / data] files:
1. State what vulnerability pattern you are avoiding
2. Confirm input validation is present
3. Confirm secrets are not hardcoded
4. Confirm error messages don't leak internal state
Never:
- Log authentication tokens or session IDs
- Return detailed error messages to the client
- Use string concatenation in SQL queries
- Disable CORS for any reason
- Store credentials in localStorage
If you see existing code that violates the above: flag it in your summary, do not silently fix it (I need to know it existed).
Now proceed with: [actual task]
Why it works: Security reviews after the fact miss context. This prompt embeds security review into the generation loop β Claude checks each change against the rules as it writes, not after.
Category 26: MCP Integration Prompts (Added March 2026)
Model Context Protocol (MCP) is now the standard way to give AI coding assistants persistent context and tool access. These prompts help you integrate MCP correctly.
26.1 MCP Server Setup Prompt (Intermediate)
Tool: Claude Code | Time: 30-60 min
Set up an MCP (Model Context Protocol) server for my project that exposes the following tools to AI assistants:
## Tools to Expose
1. [Tool 1 name]: [what it does β e.g., "read_project_data: reads the projects.json registry"]
2. [Tool 2 name]: [what it does β e.g., "run_health_check: pings all deployment URLs"]
3. [Tool 3 name]: [what it does β e.g., "get_recent_errors: reads the last 50 error log lines"]
## Implementation Requirements
- Use the @modelcontextprotocol/sdk package
- Implement as stdio transport (not HTTP) for local use
- Each tool must have a clear JSON schema for inputs
- Each tool must return structured JSON output
- Add error handling that returns helpful error messages, not stack traces
- Include a test script that exercises each tool
## Configuration
Generate the MCP configuration block for claude_desktop_config.json:
{
"mcpServers": {
"[server-name]": {
"command": "node",
"args": ["path/to/server.js"]
}
}
}
## Context This Will Enable
When this MCP server is active, an AI assistant will be able to [describe what new capabilities this enables for your workflow].
Build the complete MCP server. Start with the tool definitions, then the handlers, then the test script.
26.2 Claude Code MCP Context Prompt (Advanced)
Tool: Claude Code | Time: 15 min
I'm setting up a project-level MCP context file so Claude Code has persistent context about my project without me having to re-explain it every session.
Create a CLAUDE.md file that covers:
## Project Identity
- Name: [project name]
- Purpose: [one sentence]
- Stack: [tech stack]
- Current status: [active development / maintenance / paused]
## Key Files and Their Purpose
- [file path]: [what it contains and when to read it]
- [file path]: [what it contains and when to read it]
## Commands
- Build: [command]
- Dev server: [command]
- Test: [command]
- Deploy: [command]
## Architecture Decisions That Are NOT Up for Discussion
- [Decision 1]: [why it was made β do not suggest alternatives]
- [Decision 2]: [why it was made]
## Known Issues (Don't Re-Investigate)
- [Issue 1]: [known limitation, not a bug to fix]
## My Workflow
- I prefer [file-by-file / whole-feature] implementations
- Always [run tests / lint / build] before marking a task done
- When in doubt, [ask / make conservative choice / make opinionated choice]
Make the CLAUDE.md scannable and under 200 lines.
26.3 Next.js Secure Middleware Pattern (Intermediate) (Security-critical β post-CVE-2025-29927)
Tool: Claude Code, Cursor | Time: 20 min
Add authentication to my Next.js app using the secure dual-layer pattern (required post-CVE-2025-29927).
## Protected Routes
- /dashboard/:path* β requires authenticated user
- /api/protected/:path* β requires authenticated user, returns 401 JSON (not redirect)
- /admin/:path* β requires authenticated user with admin role
## Auth Provider
I'm using: [NextAuth v5 / Supabase Auth / Clerk / custom JWT]
## Implementation Rules
1. Middleware ONLY for UX redirects (fast redirect to /login for protected pages)
2. Every /api/protected route MUST verify the session server-side independently
3. NEVER rely on middleware as the sole auth gate for API routes
4. Include the x-middleware-subrequest header strip check as a comment
## Pattern to Implement
For each protected API route:
\`\`\`typescript
// DO NOT rely on middleware alone β verify here
const session = await getServerSession(authOptions)
if (!session) {
return NextResponse.json({ error: 'Unauthorized' }, { status: 401 })
}
\`\`\`
Generate:
1. middleware.ts with the correct matcher config and a comment explaining it is NOT a security boundary
2. A shared auth utility function (lib/auth-guard.ts) that API routes can call
3. One example protected API route using the utility
4. A test that verifies the API route returns 401 when no session exists
Category 27: Multi-Agent Orchestration Prompts (Cursor 3 / Claude Code Teams)
Added April 7, 2026 β covering the new parallel multi-agent workflows enabled by Cursor 3's Agents Window and Claude Code's Teams feature.
27.1 The Agent Task Decomposer (Advanced)
Tool: Cursor 3 Agents Window, Claude Code | Time: 5 min setup β autonomous execution
Use this prompt to break a large feature into parallelizable agent tasks before opening the Agents Window.
I need to implement [feature name] in my [type of app].
Decompose this into parallel agent tasks using this format:
- Each task must be completable in under 30 minutes
- Tasks must have clear success criteria (how to verify it's done)
- Identify dependencies (which tasks must complete before others can start)
- Assign a suggested agent focus for each (e.g., "backend agent", "test agent", "UI agent")
Feature to decompose:
[Describe the feature in 3-5 sentences. Include: what it does, the data it uses, and any API/external integrations.]
Output format:
## Agent Task Plan
### Wave 1 (parallel, no dependencies)
- Task A [Agent role]: [Goal] | Success: [How to verify] | Files: [which files/modules]
- Task B [Agent role]: [Goal] | Success: [How to verify] | Files: [which files/modules]
### Wave 2 (depends on Wave 1)
- Task C [Agent role]: [Goal] | Success: [How to verify] | Depends on: [Task A output]
27.2 The Single Agent Task Charter (Intermediate)
Tool: Cursor 3 Agents Window, Claude Code | Time: 2 min per agent
Paste this into each individual agent in the Agents Window to give it a focused, well-bounded mission.
## Agent Charter
**Role**: [Backend Engineer / Frontend Developer / QA Engineer / Security Reviewer / Docs Writer]
**Mission**: [One sentence: what this agent will produce]
**Scope**: [Specific files, modules, or directories this agent is allowed to touch]
**Off-limits**: [Files/systems this agent must not modify]
**Success Criteria** (all must be true when you're done):
1. [Specific, verifiable outcome]
2. [Specific, verifiable outcome]
3. Tests pass: [which test command to run]
**Handoff**: When complete, write a summary to `agent-handoff-[role].md` covering:
- What you built
- Any decisions you made and why
- What the next agent needs to know
- Any concerns or edge cases you noticed
**Context**: [Brief description of the larger feature this fits into]
Do not interrupt me unless you are truly blocked. Make reasonable decisions independently.
27.3 The Multi-Agent Review Prompt (Advanced)
Tool: Cursor 3 Agents Window, Claude Code | Time: 10-15 min supervised execution
Use this to spin up a dedicated review agent that audits another agent's output before you merge it.
## Review Agent Mission
You are a senior code reviewer. You did NOT write the code you are reviewing.
**Author agent**: [which agent produced this code, e.g., "Backend Agent β implemented the payment webhook handler"]
**Files to review**: [list the files]
**Success criteria of the original task**: [paste the success criteria from the original agent's charter]
Your review checklist:
1. **Correctness**: Does the code do what the task charter required?
2. **Edge cases**: What inputs could break this? (empty arrays, null values, concurrent requests, network failures)
3. **Security**: Any injection risks, missing auth checks, exposed secrets, or unvalidated inputs?
4. **Performance**: Any N+1 queries, missing indexes, synchronous blocking calls, or memory leaks?
5. **Tests**: Are the tests meaningful? Do they cover the stated success criteria?
6. **Handoff quality**: Is the agent-handoff file accurate and useful for downstream agents?
Output a structured review:
## Review Summary
**Overall verdict**: APPROVE / REQUEST_CHANGES / BLOCK
**Confidence**: High / Medium / Low
### Issues Found
| Severity | File | Line | Issue | Suggested Fix |
|----------|------|------|-------|---------------|
| CRITICAL | ... | ... | ... | ... |
### Approved Items
[What the agent did well β be specific]
### Required Changes Before Merge
[Numbered list if verdict is REQUEST_CHANGES or BLOCK]
Category 28: Long-Horizon Agentic Execution (April 2026)
For GLM-5.1, Claude Code, Cursor Automations, and any AI agent running 2+ hour autonomous sessions. These prompts help you structure work that outlasts your attention span.
28.1 The Long-Horizon Task Brief (Advanced)
Tool: GLM-5.1, Claude Code, Cursor Automations | Time: 30 min setup β hours of autonomous execution
Use this before starting any AI session you expect to run longer than 30 minutes. A clear brief prevents the model from drifting, making scope-creep decisions, or silently failing.
## Long-Horizon Task Brief
**Session goal** (one sentence):
[What is complete when this session ends?]
**Time budget**: [How many hours should the agent spend before stopping to check in?]
**In scope**:
- [Feature/file/system 1]
- [Feature/file/system 2]
**Out of scope** (hard limits):
- Do NOT modify [file/system] β read-only
- Do NOT delete anything β create new files only
- Do NOT push to main β commit to branch only
**Checkpointing** (every N hours):
Write a checkpoint file at `agent-checkpoint-[timestamp].md` containing:
1. What has been completed
2. Current task in progress
3. Known blockers or unresolved decisions
4. What remains to complete the session goal
**Success criteria** (all must be true at session end):
1. [Verifiable outcome β test command, file exists, URL responds, etc.]
2. [Verifiable outcome]
3. All code compiles with zero TypeScript errors (`npm run build`)
4. All existing tests still pass (`npm test`)
**How to handle blockers**:
- If blocked by a missing env var β note it in the checkpoint file and skip that feature
- If blocked by an ambiguous requirement β make a reasonable assumption, document it in the checkpoint, and continue
- If blocked by a breaking error β stop, write a blocker-report.md, and halt the session
Begin with a brief plan (3-5 bullet points), then execute.
28.2 The Open-Weight Model Selection Prompt (Intermediate)
Tool: Any LLM with web access or knowledge cutoff April 2026 | Time: 5 min
Use this when evaluating whether to use a self-hosted open-weight model vs. a closed API for a specific project.
I need to choose between a self-hosted open-weight model and a closed API for the following use case:
**Use case**: [Describe what the AI will be doing β code completion, autonomous agents, document analysis, etc.]
**Constraints**:
- Data sensitivity: [Public / Internal / Confidential / Regulated (HIPAA, SOC2, etc.)]
- Budget: [Monthly cap in USD, or "no limit"]
- Latency requirement: [< 500ms / < 2s / batch OK]
- Infrastructure: [Consumer hardware / cloud GPU / on-prem enterprise cluster]
- Team size: [Solo / small team / enterprise]
- Vendor lock-in tolerance: [Low / Medium / High]
**Open-weight models to evaluate** (as of April 2026):
- GLM-5.1 (754B, Z.AI) β SOTA SWE-Bench Pro, 8-hour autonomous sessions, Apache 2.0
- Gemma 4 (Google, Apache 2.0) β 4 sizes, strong reasoning and coding
- Llama 3.x (Meta) β broad ecosystem, widely deployed
- Qwen3.6-Plus β 1M context, competitive with Claude 4.5 on coding tasks
**Closed APIs to evaluate**:
- Claude Sonnet 4.6 (Anthropic API) β best agentic coding, $3/$15 per MTok
- GPT-4o (OpenAI) β broad capability, strong ecosystem
- Gemini 1.5 Pro (Google) β 1M context, competitive pricing
For each candidate, evaluate:
1. Does it meet my latency requirement?
2. Does it meet my data sensitivity requirement?
3. What is the estimated monthly cost at my usage level?
4. What are the known failure modes for my use case?
Recommend the best option and explain the trade-offs I'm accepting.
28.3 The Goose/Local-Agent Workflow Prompt (Intermediate)
Tool: Goose (Block), any LLM-agnostic local AI agent | Time: 10 min setup
Goose (launched April 2026 by Block) is an open-source local AI agent that supports any LLM backend and executes real actions: install packages, run tests, modify files, call APIs. This prompt structure is designed for Goose-style action-oriented agents.
## Goose Task: [Short task name]
**Objective**: [One sentence describing the complete state when this task is done]
**LLM backend**: [claude-sonnet-4-6 / glm-5.1 / gpt-4o / gemma-4 β whichever you're using]
**Allowed actions**:
- Read and write files in: [path/to/project]
- Run shell commands: [list safe commands, e.g., npm test, npm run build, git status]
- Install packages: [yes/no β if yes, list approved package registries]
- Make HTTP requests to: [list allowed external APIs, e.g., "GitHub API only"]
**Prohibited actions** (hard stops β do not proceed if any of these are required):
- git push (never push without human review)
- rm -rf or destructive filesystem operations
- Modify files outside [path/to/project]
- Access [sensitive-system]
**Context files** (read these before starting):
- [path/to/README.md]
- [path/to/relevant-config.json]
**Task steps** (ordered):
1. [First action]
2. [Second action, may depend on output of step 1]
3. Verify: run [test command] and confirm output matches [expected output]
**Output**: When done, write `goose-task-complete.md` with:
- Actions taken (with file paths and commands run)
- Test results
- Any assumptions made
- Any issues encountered
Start immediately. Do not ask for clarification unless truly blocked.
Category 29: Claude Sonnet 4.6 β 1M Context & Agentic Search Prompts (April 2026)
Claude Sonnet 4.6 introduced two capabilities that change how you structure prompts: a 1M token context window (beta) and GA web search/web fetch with code-execution-based result filtering. These prompts exploit both.
29.1 The Whole-Codebase Refactor Prompt (Expert)
Tool: Claude Sonnet 4.6 via API or Claude Code | Context required: 200Kβ1M tokens
With the 1M context window, you can load an entire medium-sized codebase and ask for architectural analysis without chunking. This works for repositories up to ~150K lines.
## Codebase Refactor Brief
**Repository**: [project-name]
**Goal**: [Specific refactor objective β e.g., "migrate from Pages Router to App Router", "replace all class components with hooks", "extract shared utilities from duplicated code"]
**Constraints**:
- Do not change external API contracts (public-facing routes must remain the same)
- All existing tests must pass after refactor
- Prefer surgical changes over rewrites
**Files loaded below** (entire codebase follows in this message):
[Paste full codebase or use file upload β Claude Sonnet 4.6 handles up to 1M tokens]
**Output requested**:
1. A prioritized list of refactor changes (most impactful first)
2. For each change: which files are affected, what changes, and estimated risk level (low/medium/high)
3. A proposed commit sequence (small atomic commits, safest order)
4. Any architectural concerns that would block this refactor
Do NOT generate code yet β produce the analysis and plan first. I will confirm before implementation begins.
29.2 The Research-Then-Build Prompt (Intermediate)
Tool: Claude Sonnet 4.6 (web search GA) | Time: 15β30 min
Sonnet 4.6's web search and web fetch are GA, with dynamic result filtering via code execution. This prompt chains research directly into implementation β no context-switching between browser and editor.
## Research-Then-Build Task
**What I'm building**: [Short description β e.g., "a rate limiter middleware for my Next.js API routes"]
**Research phase** (do this first β use web search):
1. Search for: "[topic] best practices [current year]"
2. Fetch the top 2β3 relevant documentation pages
3. Identify: (a) the standard pattern, (b) common failure modes, (c) security considerations
4. Write a 3-bullet summary of your findings before writing any code
**Build phase** (only after research summary is written):
- Implement [feature] based on your findings
- Follow the standard pattern you identified
- Add defensive handling for the top failure mode
- Include a comment linking to the primary source used
**Validation**:
- Re-fetch [relevant documentation URL] and confirm your implementation aligns
- Note any deviations and explain why
Start with the research phase. Do not write code until research summary is complete.
29.3 The Extended-Thinking Architecture Decision Prompt (Advanced)
Tool: Claude Sonnet 4.6 with extended thinking | Time: 5 min prompt, 10β20 min thinking
Extended thinking gives the model more compute budget before it commits to an answer. Use this for architecture choices where a wrong call means weeks of rework.
## Architecture Decision Request
**Decision to make**: [e.g., "Should I use Supabase Realtime or polling for my live dashboard?"]
**Context**:
- System: [Brief description]
- Scale: [Expected users/requests in 6 months]
- Team: [Solo / small / larger]
- Constraints: [Budget, latency, existing stack, migration costs]
- Timeline: [When must you ship?]
**What I've already considered**:
- Option A: [First option] β I think this because [reasoning]
- Option B: [Second option] β I think this because [reasoning]
- What I'm unsure about: [Specific uncertainty]
**What I need**:
1. Evaluate both options against my specific constraints (not generic trade-offs)
2. Identify what I'm missing or wrong about in my reasoning
3. Recommend one option with confidence level (high/medium/low) and what would change your recommendation
4. Give me the one question I should answer before committing
Take your time β a slow, thorough answer beats a fast, wrong one.
Category 30: April 2026 β Agent Framework, Security Audit & Parallel Fleet Prompts
Three new workflows unlocked by the April 2026 AI tooling wave: Microsoft Agent Framework 1.0 multi-agent orchestration, Claude Mythos-style security audit chaining, and Cursor 3 parallel agent fleet management.
30.1 The Microsoft Agent Framework 1.0 Orchestration Prompt (Advanced)
Tool: Microsoft Agent Framework 1.0 (.NET or Python), Claude Code | Time: 30β60 min setup
Agent Framework 1.0 ships with A2A and MCP protocol support, enabling cross-runtime agent interoperability. Use this prompt to design multi-agent workflows that span different AI providers without lock-in.
## Multi-Agent Workflow Design Request
**Workflow goal**: [What the agent system should accomplish end-to-end β e.g., "receive a GitHub issue, research the codebase, implement a fix, open a PR, and notify Slack"]
**Agents needed** (describe each):
- Agent 1: [Name + responsibility + which model/provider β e.g., "Researcher β Claude Sonnet 4.6 β reads codebase and clarifies requirements"]
- Agent 2: [Name + responsibility + which model/provider]
- Agent 3: [Name + responsibility + which model/provider]
**Coordination protocol**: A2A (agent-to-agent messages) | MCP (tool calls to shared context) | Both
**Runtime**: .NET | Python | Both
**State management**:
- Shared state that all agents need: [list]
- State private to each agent: [list]
- How agents hand off work: [event-driven / polling / direct call]
**Error handling**:
- If Agent 1 fails: [retry / fail pipeline / route to human]
- If Agent 2 fails: [behavior]
- Maximum retries per agent: [N]
**Output required**:
1. Agent architecture diagram (ASCII or described)
2. Agent Framework 1.0 code scaffold for each agent class
3. The A2A message schema for agent handoffs
4. The MCP tools each agent needs registered
5. DevUI configuration for browser-based debugging
Generate the scaffold. I will fill in the business logic per agent.
30.2 The AI Security Audit Chain Prompt (Expert)
Tool: Claude Sonnet 4.6 or Claude Code with CyberOS MCP | Time: 20β40 min per codebase
Inspired by Claude Mythos / Project Glasswing's defensive security workflow β systematically chain vulnerability discovery, triage, and remediation across a codebase without missing surface area.
## AI-Powered Security Audit β Systematic Chain
**Codebase**: [Repo path or paste content]
**Stack**: [e.g., Next.js 14 + Supabase + Stripe + Python FastAPI backend]
**Deployment**: [Vercel + AWS Lambda | Self-hosted | Cloud provider]
**Compliance scope**: [OWASP Top 10 | SOC 2 | PCI-DSS | All]
## Phase 1 β Attack Surface Map
List every:
- Public HTTP endpoint (method + path + auth required)
- Data input point (form, query param, file upload, webhook)
- Third-party integration (API calls out, webhooks in)
- Secret/credential usage point
Do not analyze yet. Only map. Output as a numbered list.
## Phase 2 β Vulnerability Scan
For each item on the attack surface map, check for:
- Injection (SQL, command, SSRF, path traversal)
- Authentication/authorization bypass
- Sensitive data exposure (secrets in logs, responses, or error messages)
- Cryptographic weaknesses (weak ciphers, padding oracle, hardcoded keys)
- Supply chain risks (mutable version references, unverified dependencies)
Classify each finding: CRITICAL / HIGH / MEDIUM / LOW / INFO
Include CWE ID and the exact file:line where the issue exists.
## Phase 3 β Remediation Plan
For each CRITICAL and HIGH finding:
1. Explain the vulnerability in one sentence
2. Write the fixed code (before/after diff)
3. Explain why the fix works
## Phase 4 β Verification
After remediations are applied:
- Re-scan the attack surface for the patched items
- Confirm no new vulnerabilities were introduced by the fix
- Output a signed-off list: [finding] β [status: FIXED / PARTIALLY FIXED / DEFERRED]
Start with Phase 1. Do not proceed to Phase 2 until I confirm the attack surface map is complete.
30.3 The Cursor 3 Parallel Agent Fleet Prompt (Advanced)
Tool: Cursor 3 Agents Window | Time: 5 min to launch, 30β120 min execution
Cursor 3's Agents Window lets you run multiple AI agents simultaneously across local, SSH, and cloud environments. This prompt template structures how to decompose work across a fleet efficiently so agents don't conflict.
## Parallel Agent Fleet Assignment
**Project**: [Brief description of the codebase]
**Goal**: [What needs to be accomplished β e.g., "ship the user dashboard feature including data layer, UI components, tests, and documentation"]
**Fleet decomposition** (define independent workstreams that can run in parallel):
Agent A β [Name: e.g., "Data Layer"]
- Scope: [Specific files/directories this agent owns]
- Task: [Exact work to do]
- Output: [What it should produce β e.g., "implemented API routes with tests passing"]
- Dependencies: [What it needs before starting β e.g., "database schema must exist"]
- Must NOT touch: [Files/areas that are other agents' scope]
Agent B β [Name: e.g., "UI Components"]
- Scope: [...]
- Task: [...]
- Output: [...]
- Dependencies: [...]
- Must NOT touch: [...]
Agent C β [Name: e.g., "Tests & Docs"]
- Scope: [...]
- Task: [...]
- Output: [...]
- Dependencies: [Agent A and B PRs merged]
- Must NOT touch: [...]
**Conflict prevention**:
- Shared files that multiple agents might edit: [list them β these need explicit ownership]
- Owner of package.json / lock file: [Agent A | Agent B | None β freeze during parallel work]
- Owner of shared types/interfaces: [which agent defines, others consume]
**Review order**:
1. Review Agent A output first
2. Review Agent B output (may depend on A's types)
3. Review Agent C output last (depends on both)
**Launch in the Agents Window**: Open one agent session per row above. Paste the Agent-specific block into each session. Start all simultaneously.
This library is updated monthly with new prompts based on emerging tools, patterns, and reader requests. Last updated: April 14, 2026. Added: Category 31 (AI Agent Payments, Session Context Briefs, Generated Code Security Review). Previous: Category 30 (Agent Framework 1.0 orchestration, AI security audit chain, Cursor 3 parallel fleet management, April 13). Category 29 (Claude Sonnet 4.6 β 1M Context & Agentic Search Prompts, April 10). Category 28 (Long-Horizon Agentic Execution, April 9). Category 27 (Multi-Agent Orchestration, April 7). Category 26 (MCP Integration, March 31).
Category 31: April 2026 β AI Agent Payments, Session Context & Security Review
Three new prompt patterns emerging from the Claude Code creator workflow reveal and x402 protocol adoption.
31.1 The AI Agent Payment Integration Prompt (Advanced)
Tool: Claude Code, Cursor | Time: 2-4 hours | Category: Emerging Patterns
Context: Coinbase's x402 protocol enables AI agents to make autonomous payments. As of April 2026, this is becoming a real workflow pattern β agents that call APIs, pay for compute, and operate economically without human authorization for each transaction.
I'm building an AI agent that needs to make autonomous payments using the
Coinbase x402 protocol / [payment protocol].
## Agent Context
- Agent type: [coding assistant / research agent / deployment bot]
- Payment ceiling per action: $[amount]
- Allowed payment recipients: [API services, infrastructure providers]
- Forbidden: [payments to unknown wallets, amounts over $X]
## What I Need
1. Integrate x402 payment headers into the agent's HTTP client
2. Implement a payment budget tracker that halts the agent when the daily/session
ceiling is hit
3. Add a payment audit log (what was paid, when, to whom, why)
4. Implement human-approval gates for payments above $[threshold]
5. Handle x402 402 Payment Required responses gracefully
## Safety Requirements
- Never pay from the agent wallet without logging first
- Require cryptographic receipts for all payments
- Alert human operator if payment velocity exceeds [N] transactions/minute
- Reject any payment request that doesn't match the allowed-recipient list
Build the payment client and budget tracker first, then integrate into the
existing agent loop.
Use when: Building economic agents, autonomous task runners that consume paid APIs, or testing the x402 payment stack.
Security note: Always implement human approval gates for amounts above $1 in production. See Chapter 10 for AI agent attack surfaces.
31.2 The Session Context Brief Generator (Beginner)
Tool: Claude Code, Cursor, Windsurf | Time: 5 minutes | Category: Workflow
This prompt generates a reusable session brief from your current codebase state. Run it at the start of every Claude Code session to give the AI full context before any task.
I need you to generate a session brief for this codebase. Read the following
and produce a structured brief I can paste at the start of future sessions:
## Please Analyze
- The overall architecture (what framework, what database, what auth)
- The current state (what works, what's broken based on TODO comments and errors)
- The key files that any feature touching [feature area] would need to know about
- Any explicit constraints in CLAUDE.md or README that I shouldn't violate
- The tech debt or known issues I should steer around
## Output Format
Produce a brief in this format:
---
## Session Brief β [Date]
**Stack**: [framework, database, auth, hosting]
**What's working**: [bullet list]
**What's broken / in-progress**: [bullet list]
**Key files for [feature area]**: [file paths with one-line description each]
**Constraints to respect**: [rules from CLAUDE.md / README]
**Steer around**: [known issues, fragile code, don't-touch zones]
---
Keep it under 400 words so it fits in a context window preamble.
Use when: Starting any Claude Code session, onboarding to a new codebase, or after a long break from a project.
Why it works: A 5-minute brief prevents 30-60 minutes of context-building drift. Claude Code performs significantly better when it knows the full codebase state upfront.
31.3 The Generated Code Security Review Prompt (Intermediate)
Tool: Claude Code, Cursor | Time: 10-15 minutes | Category: Security
After generating a significant block of code, use this prompt to run a security review before accepting the change. Especially important for authentication flows, API handlers, and any code that touches user data.
Review the following generated code for security vulnerabilities.
## Code to Review
[paste generated code here]
## Review Checklist
Check specifically for:
1. **Injection vulnerabilities**: SQL injection, command injection, path traversal
2. **Authentication gaps**: Missing auth checks, broken access control
3. **Input validation**: Unvalidated user input reaching sensitive operations
4. **Secret exposure**: Hardcoded credentials, keys in code, logging of sensitive data
5. **Prototype pollution**: Object spread from user input, __proto__ manipulation
6. **Race conditions**: Async operations that could interleave dangerously
7. **Error handling**: Stack traces leaking in responses, errors that expose internals
## For Each Issue Found
- Severity: Critical / High / Medium / Low
- CWE category
- Exact line(s) affected
- Safe version of the code
## If Clean
Confirm the code is safe to merge and note any edge cases that weren't security
issues but should be tested.
Context: This code is [describe what it does and who has access to it].
The framework is [Next.js / Express / Django / etc.].
The data involved: [user PII / payment data / internal only / public].
Use when: After any AI-generated auth handler, API route, form processing, or file upload code. Non-negotiable for code touching user data or payments.
Pairs with: CyberOS (https://cyberos.dev) for automated continuous review in CI/CD pipelines.
Source: Based on OWASP Top 10 2025 and the CyberOS pattern database (615 patterns as of April 2026).
Category 32: Automation & Agent Orchestration Prompts (Added April 2026)
Three new prompt patterns for Claude Code Routines (launched April 2026), Cursor 3 multi-repo agent orchestration, and automated security auditing β covering the full spectrum from simple recurring automation to coordinated multi-agent coding sessions.
32.1 Claude Code Routines β PR Review Automation (Intermediate)
Tool: Claude Code | Difficulty: Intermediate | Time: 15-30 min
Claude Code Routines (April 2026) let you define recurring coding tasks that run on Anthropic's cloud infrastructure, triggered by events like new pull requests. Use this prompt to configure a Routine that automatically reviews every incoming PR before a human reviewer sees it.
## Claude Code Routine: Automated PR Review
Set up a Claude Code Routine that triggers on new pull requests to this
repository and performs a structured code review before human reviewers
are assigned.
## Trigger
Event: pull_request.opened, pull_request.synchronize
Scope: all branches targeting main and develop
Skip: PRs with label "skip-ai-review" or authored by bots
## Review Tasks (run in sequence)
### 1. Change Summary
- Summarize what the PR does in 3-5 bullet points
- Identify which components/modules are affected
- Estimate scope: small (< 50 lines changed), medium (50-300), large (300+)
### 2. Code Quality Check
- Flag any functions longer than 50 lines
- Flag cyclomatic complexity > 10
- Identify duplicated logic that already exists elsewhere in the codebase
- Check naming conventions match the patterns in [existing files in the repo]
### 3. Security Scan
- Check for the patterns in Prompt 32.3 (OWASP Top 10 for Next.js/React)
- Flag any hardcoded secrets, tokens, or credentials
- Identify unvalidated user inputs reaching database or filesystem operations
- Check new API routes for missing authentication guards
### 4. Test Coverage
- Identify new functions or branches not covered by the PR's test additions
- List any test files that should have been updated but weren't
- Flag missing edge case tests for: null/undefined input, empty arrays,
auth failure paths
### 5. Review Output
Post a structured comment to the PR with:
- **Summary**: [auto-generated summary]
- **Scope**: small / medium / large
- **Issues**: [table: Severity | File | Line | Issue | Suggested Fix]
- **Missing tests**: [list]
- **Verdict**: LGTM (no blockers) | NEEDS CHANGES (list blockers) | REQUEST HUMAN REVIEW (flag for security/arch concerns)
## Routine Configuration
- Runtime: Anthropic cloud (no self-hosted runner required)
- Model: claude-sonnet-4-6
- Timeout: 5 minutes per PR
- Post comment as: GitHub App bot account
- Do NOT approve or request changes via GitHub review API β comment only
- Do NOT auto-merge under any circumstances
## What This Routine Should NOT Do
- Rewrite or suggest large refactors on a per-PR basis
- Block PRs automatically β it informs, humans decide
- Comment more than once per commit push (deduplicate on commit SHA)
Why it works: This Routine acts as a tireless first-pass reviewer that runs in under 5 minutes on every PR. Human reviewers arrive to a structured pre-analysis and can focus on architecture and intent rather than scanning for obvious issues.
Setup note: Configure the Routine in your Claude Code workspace settings under Routines > New Routine > Event Trigger. The model runs server-side β no GitHub Actions minutes consumed.
32.2 Multi-Agent Coding Session Orchestration (Advanced)
Tool: Claude Code, Cursor 3 | Difficulty: Advanced | Time: 2-4 hours
Cursor 3 (April 2026) introduced unified multi-repo agent orchestration β a single workspace can coordinate agents working across separate repositories simultaneously. Use this prompt pattern to split a full-stack feature across three specialized agents: backend, frontend, and test/QA.
## Multi-Agent Session: [Feature Name]
You are the orchestrator for a 3-agent coding session. Your job is to
decompose the feature, assign agents, prevent conflicts, and integrate
outputs. Do not write implementation code yourself β delegate to agents.
## Feature Brief
[Describe the feature in 3-5 sentences: what it does, what data it uses,
what API contracts it creates or modifies, and any external integrations.]
## Repository Map
- Backend repo: [path or URL β e.g., api.myapp.com at /repos/backend]
- Frontend repo: [path or URL β e.g., app.myapp.com at /repos/frontend]
- Shared types package: [path β e.g., /repos/shared-types] (if applicable)
---
## Agent 1: Backend Agent
**Scope**: [/repos/backend/src/routes, /repos/backend/src/services, /repos/backend/src/db]
**Mission**: Implement the server-side feature β database schema changes,
business logic, and REST/GraphQL API endpoints.
**Deliverables**:
1. Database migration file for [new tables or schema changes]
2. Service layer with full business logic and error handling
3. API endpoints matching this contract:
- [METHOD] [/path]: [description, request body, response shape]
- [METHOD] [/path]: [description]
4. Unit tests for the service layer (90%+ coverage on new code)
5. Update /repos/shared-types with any new TypeScript interfaces
**Must NOT touch**:
- Frontend repo
- Authentication middleware (read-only)
- Existing migrations
**Handoff**: Write `agent-handoff-backend.md` with final API contracts
and any environment variables added.
---
## Agent 2: Frontend Agent
**Scope**: [/repos/frontend/src/components, /repos/frontend/src/pages, /repos/frontend/src/hooks]
**Mission**: Implement the UI for [feature name] using the API contracts
defined in agent-handoff-backend.md. Wait for Agent 1's handoff file
before writing any data-fetching code.
**Deliverables**:
1. React components: [list specific components needed]
2. Data-fetching hooks using [SWR / React Query / Server Actions]
matching the API contract in agent-handoff-backend.md
3. Form validation for all user inputs
4. Loading, empty, and error states for all async operations
5. Responsive layout (mobile breakpoint: 640px)
**Must NOT touch**:
- Backend repo
- Auth context or session management
- Design system tokens (read-only β use existing classes)
**Handoff**: Write `agent-handoff-frontend.md` with component tree,
prop interfaces, and any new environment variables needed.
---
## Agent 3: Test & QA Agent
**Scope**: [/repos/backend/tests, /repos/frontend/tests, /repos/frontend/e2e]
**Mission**: Write the full test suite for this feature. Start after
Agent 1's handoff. Complete E2E tests after Agent 2's handoff.
Do NOT write implementation code β tests only.
**Deliverables**:
1. API integration tests (all endpoints: happy path + 4xx + 5xx cases)
2. Component tests for each UI component Agent 2 built
3. E2E test covering the full user flow: [describe the 3-5 step user journey]
4. A test coverage report showing new code coverage
**Must NOT touch**:
- Source code in either repo (tests and fixtures only)
**Handoff**: Write `agent-handoff-qa.md` with test results, coverage
numbers, and any failing tests with root cause.
---
## Orchestration Rules
**Sequencing**:
1. Agent 1 runs first β do not start Agent 2 until agent-handoff-backend.md exists
2. Agent 2 and Agent 3 (API tests only) can run in parallel after Agent 1 finishes
3. Agent 3 E2E tests run last β requires both Agent 1 and Agent 2 complete
**Conflict prevention**:
- package.json / lock files: frozen during parallel work β no dependency additions
- Shared types: Agent 1 owns writes, Agents 2 and 3 read-only
- Environment files: each agent appends to a dedicated .env.[agent] file,
do not modify .env directly
**Integration checkpoint**:
When all three agents have written their handoff files, run:
1. `npm run build` in both repos β must succeed with zero errors
2. `npm test` in both repos β all tests must pass
3. `npm run e2e` β all E2E tests must pass
If any step fails, identify which agent's output caused the failure
and assign a targeted fix task to that agent only.
**Final output**:
Write `session-summary.md` with:
- Feature implemented (what was built)
- All files changed (by repo and agent)
- Test results (pass/fail counts, coverage delta)
- Known limitations or deferred items
- Decisions made and why
Why it works: The strict scope boundaries prevent agents from stepping on each other's work. The handoff files create an explicit async interface between agents β Agent 2 cannot make assumptions about the API until Agent 1 has documented it, which eliminates the most common integration failure in multi-agent sessions.
Cursor 3 setup: Open three agent panels in the Agents Window. Paste each agent block into its respective panel. Launch Agent 1 first. Monitor agent-handoff-backend.md creation before launching Agents 2 and 3.
32.3 Security Audit Automation β Next.js/React OWASP Top 10 (Advanced)
Tool: Claude Code | Difficulty: Advanced | Time: 30-60 min
Use this prompt to run a comprehensive automated security audit of a Next.js or React codebase, checking for all OWASP Top 10 vulnerability classes with patterns tuned for the React/Next.js stack. Designed to complement CyberOS's continuous monitoring (https://cyberos.dev) for one-time deep audits.
## Automated Security Audit: Next.js / React Codebase
Perform a systematic OWASP Top 10 security audit of this Next.js/React
codebase. Work through each phase in sequence. Do not skip phases or
combine them β each phase informs the next.
## Codebase Context
- Framework: Next.js [version] (App Router / Pages Router)
- Auth provider: [NextAuth / Supabase Auth / Clerk / custom]
- Database: [Supabase / Prisma + PostgreSQL / other]
- Payment handling: [Stripe / Paddle / none]
- Deployment: [Vercel / AWS / self-hosted]
- External APIs called: [list]
---
## Phase 1 β Inventory (5 min, no analysis yet)
Map the attack surface:
1. List every file in /app/api or /pages/api (Next.js API routes)
2. List every Server Action (files with "use server")
3. List every form or input that accepts user data
4. List every place external data is rendered to the DOM
5. List every third-party library that handles auth, payments, or user data
Output as numbered lists. Do not evaluate yet.
---
## Phase 2 β OWASP Top 10 Scan
For each item in the Phase 1 inventory, check the following.
Reference CWE IDs and the exact file:line for every finding.
### A01 β Broken Access Control
- Every API route and Server Action: is auth checked server-side
(not relying on middleware alone)?
- Are RLS policies enforced at the database level (Supabase) or via
ORM-level guards (Prisma)?
- Are there IDOR risks β can a user access another user's records by
changing an ID parameter?
- Is the CVE-2025-29927 dual-layer auth pattern implemented?
(See Category 26, Prompt 26.3)
### A02 β Cryptographic Failures
- Are passwords hashed with bcrypt or argon2 (not SHA-1/MD5)?
- Is HTTPS enforced with HSTS headers?
- Are any secrets or tokens returned in API responses or logged?
- Are JWTs validated on every request (not just on login)?
### A03 β Injection
- Are all database queries parameterized?
Flag any string concatenation in SQL or ORM raw queries.
- Is there risk of command injection in any child_process or exec calls?
- Server Actions: is user input sanitized before use in database operations?
- Are URL and path parameters validated before use in filesystem operations?
### A04 β Insecure Design
- Are there rate limits on authentication endpoints?
- Are there rate limits on resource-intensive API routes
(e.g., AI generation, file processing)?
- Is there a mechanism to revoke sessions on password change or logout?
- Are webhook endpoints (Stripe, etc.) verifying signatures?
### A05 β Security Misconfiguration
- Are security headers set: CSP, X-Frame-Options, X-Content-Type-Options,
Referrer-Policy, Permissions-Policy?
- Are CORS origins restricted (not "*")?
- Are error responses generic (no stack traces or internal paths leaking)?
- Are Next.js server components accidentally exposing server-side data
in client bundles?
### A06 β Vulnerable Components
- Run: `npm audit --audit-level=high`
- Flag any dependencies with known CVEs (severity: high or critical)
- Flag any dependencies last updated more than 18 months ago that handle
auth, crypto, or user data
### A07 β Auth and Session Failures
- Are session tokens HTTP-only cookies (not localStorage)?
- Are session IDs regenerated after login (session fixation prevention)?
- Is "remember me" implemented with a separate long-lived token
(not just extending the session)?
- Are failed login attempts rate-limited and logged?
### A08 β Software and Data Integrity
- Are all npm install commands run with a lockfile (`npm ci`, not `npm install`)?
- Are GitHub Actions using pinned SHA hashes for third-party actions
(not floating tags like @v3)?
- Are Stripe/webhook payloads verified with HMAC signatures
before processing?
### A09 β Logging and Monitoring
- Are security events logged: login success, login failure,
auth failure on protected routes?
- Are logs sanitized β no passwords, tokens, or PII in log output?
- Is there alerting for repeated auth failures (possible brute force)?
### A10 β Server-Side Request Forgery (SSRF)
- Are there any routes that fetch a URL provided by the user?
- If yes: is the URL validated against an allowlist of safe domains?
- Are internal metadata endpoints (e.g., AWS 169.254.x.x) blocked?
---
## Phase 3 β Severity Classification
For every finding, output a row in this table:
| # | OWASP Category | CWE | Severity | File | Line | Description | Fix |
|---|---------------|-----|----------|------|------|-------------|-----|
| 1 | A01 | CWE-284 | CRITICAL | ... | ... | ... | ... |
Severity levels:
- CRITICAL: exploitable remotely, data exposure or full auth bypass
- HIGH: requires auth but leads to significant data or privilege risk
- MEDIUM: requires specific conditions, limited impact
- LOW: defense-in-depth gap, no direct exploitability
- INFO: best practice deviation, no current risk
---
## Phase 4 β Remediation
For every CRITICAL and HIGH finding:
1. Show the vulnerable code (before)
2. Show the fixed code (after)
3. One-sentence explanation of why the fix closes the vulnerability
4. Link to the relevant OWASP cheat sheet or CyberOS pattern
For MEDIUM findings: provide the fix code only (no explanation needed).
For LOW and INFO: list as a bullet with the file location.
---
## Phase 5 β Verification
After all remediations are written:
1. Re-check each CRITICAL and HIGH finding β confirm the fix addresses
the root cause, not just the symptom
2. Check that no fix introduced a new vulnerability
(e.g., error handling that leaks internals)
3. Output a final sign-off table:
| Finding # | Status | Notes |
|-----------|--------|-------|
| 1 | FIXED | ... |
| 2 | DEFERRED | reason |
---
## Output Summary
At the end of all phases, produce:
- Total findings by severity (CRITICAL: N, HIGH: N, MEDIUM: N, LOW: N, INFO: N)
- Top 3 risk areas in this codebase
- Recommended next step (e.g., "Schedule penetration test focusing on A01
and A03 findings", "Integrate CyberOS for continuous monitoring")
Begin with Phase 1. Confirm the inventory is complete before proceeding.
Why it works: The phased structure prevents the common failure mode where an LLM jumps to fixes before fully mapping the attack surface. By forcing an inventory pass first, the audit achieves full coverage β nothing is missed because the model got absorbed in one interesting vulnerability.
CyberOS integration: This prompt covers the same OWASP Top 10 categories as CyberOS's static analysis engine (https://cyberos.dev). Use this for on-demand deep audits, and CyberOS for continuous PR-level scanning. The findings from this audit can be imported into CyberOS as baseline issues.
Pairs with: Prompt 31.3 (Generated Code Security Review) for ongoing review of new code, and Prompt 30.2 (AI Security Audit Chain) for systematic multi-phase audit chaining.
Category 33: Claude Opus 4.7 β xhigh Effort, Vision & Self-Verification
Released April 16, 2026: Claude Opus 4.7 introduced three capabilities with immediate impact on vibe coding workflows β an xhigh effort level for extended reasoning, 3.3x higher-resolution vision, and self-verification on agentic tasks. These prompts are tuned specifically for Opus 4.7 and will not produce the same results on earlier models.
33.1 xhigh Effort Architectural Reasoning (Expert)
Tool: Claude Code (Opus 4.7) | Difficulty: Expert | Time: 15-30 min
Use Opus 4.7's xhigh effort level for decisions that are hard to reverse β database schema choices, authentication architecture, API design. The extended thinking mode considers more edge cases and provides more honest uncertainty quantification than standard effort.
<effort>xhigh</effort>
You are a senior software architect. I need your deepest analysis on this decision.
## Decision Required
[Describe the architectural choice in 1-3 sentences β e.g., "Should I use a
single Postgres database with RLS for multi-tenancy, or separate schemas per tenant?"]
## System Context
- Scale target: [current users / projected 12-month users]
- Team size: [N engineers, their experience level]
- Current stack: [list key technologies]
- Budget constraints: [infrastructure budget, or "cost-sensitive / not a constraint"]
- Timeline: [when does this need to be production-ready]
## Constraints (non-negotiable)
- [Constraint 1 β e.g., "Must work with Supabase β no custom database infra"]
- [Constraint 2]
## Options Under Consideration
### Option A: [name]
[Brief description]
Perceived pros: [list]
Perceived cons: [list]
### Option B: [name]
[Brief description]
Perceived pros: [list]
Perceived cons: [list]
## What I'm Uncertain About
[The specific thing that makes this decision hard β e.g., "I don't know how
RLS performs at 100k rows per tenant with complex join queries"]
## Output Required
1. Your recommendation (Option A, B, or a hybrid) with confidence level (0-100%)
2. The 3 most important factors that drove your recommendation
3. The scenario under which your recommendation would be wrong
4. The first concrete implementation step if I go with your recommendation
5. Red flags to watch for in the first 30 days of implementation
Take as long as you need to reason through this. Don't truncate the reasoning.
Why it works: The <effort>xhigh</effort> tag signals Opus 4.7 to enter extended thinking mode. For complex architectural questions, the additional compute produces answers that consider more edge cases, catch more subtle interactions, and provide more honest uncertainty quantification than standard responses.
When to use xhigh: Save it for decisions that are hard to reverse β architectural choices, security design, data modeling. Don't use it for quick questions where standard effort is adequate.
33.2 Vision-Enhanced UI Debugging (Intermediate)
Tool: Claude Code (Opus 4.7) | Difficulty: Intermediate | Time: 10-20 min
Opus 4.7's 3.3x higher-resolution vision support means it can now read detailed UI screenshots, identify small alignment issues, read small-print error messages, and compare designs at pixel level. Use this pattern for UI debugging and visual regression analysis.
[Attach screenshot of UI bug or visual issue]
You are a senior frontend engineer debugging a visual problem. The screenshot shows:
[Brief description of what you're looking at]
## What I need
1. Identify all visible UI problems in this screenshot β layout issues, spacing
inconsistencies, color/contrast problems, text truncation, alignment bugs
2. For each problem, hypothesize the CSS or component cause
3. Rank by severity: (a) breaks functionality (b) fails WCAG contrast (c) looks wrong
## Codebase context
- Framework: [React/Next.js/Vue/etc]
- CSS approach: [Tailwind/CSS Modules/styled-components/etc]
- Key component files: [relevant file paths]
Then check the relevant component files and propose a specific fix for the
highest-severity issue first.
Why it works: The 3.3x vision resolution lets Opus 4.7 read small-print labels, identify subtle alignment (off by 2px), and distinguish similar colors that previous models couldn't differentiate. Pairing the visual analysis with codebase access creates a loop where the model reads the pixel output and the source simultaneously.
33.3 Self-Verifying Agent Task (Advanced)
Tool: Claude Code (Opus 4.7) | Difficulty: Advanced | Time: 30-90 min
Opus 4.7 added self-verification on agentic tasks β the model can now flag when it has low confidence in its own output and request human confirmation before proceeding. This prompt pattern is designed to take advantage of that capability for high-stakes automated tasks.
You are executing a high-stakes automated task. Opus 4.7 self-verification is enabled.
## Task
[Describe the task in detail]
## Self-Verification Protocol
At each decision point where you are >15% uncertain about the correct action:
1. STOP and output: VERIFICATION_REQUIRED: [describe what you're uncertain about]
2. List the options you're considering and your confidence in each
3. Wait for my confirmation before proceeding
## High-Stakes Actions That Always Require Verification
- Deleting or overwriting files not in the explicit scope
- Making API calls that cost money or have rate limits
- Modifying database schemas or running migrations
- Changing authentication or authorization logic
- Publishing or deploying to production environments
## Success Criteria
[What does "done" look like? How will you verify you succeeded?]
Begin. If you complete the first phase without a VERIFICATION_REQUIRED, confirm
the phase is done and your confidence level before continuing to the next phase.
Why it works: This prompt makes Opus 4.7's self-verification explicit and structured. By defining a confidence threshold (15%) and listing high-stakes action categories, you get an agent that asks for help when it genuinely needs it rather than either proceeding blindly or asking about everything.
Integration with CyberOS: For tasks involving security-sensitive operations, pair this with CyberOS's continuous monitoring so any unexpected file modifications or API calls are flagged independently.
Category 34: Claude Design & AI-Assisted Visual Creation
Launched April 17, 2026: Anthropic introduced Claude Design, extending Claude's capabilities into rapid visual content generation. These prompts cover workflows for using Claude Design alongside Claude Code for visual asset creation β from brand assets to landing page design to marketing graphics β integrated into the vibe coding workflow.
34.1 Brand Asset Sprint (Beginner)
Tool: Claude Design, Claude Code | Difficulty: Beginner | Time: 30-60 min
Use Claude Design to generate a complete brand asset pack for a new vibe-coded project. This prompt produces a design brief that Claude Design can execute directly, giving you logo concepts, color palettes, and icon sets in one session.
I'm creating brand assets for a new product called [Product Name].
## Product Summary
[2-3 sentences: what it does, who uses it, what feeling it should evoke]
## Brand Personality
Choose 3 adjectives that describe the brand: [e.g., modern / trustworthy / playful]
## Audience
Primary users: [who they are β age range, technical sophistication, context of use]
## Design Direction
- Style preference: [minimal / bold / corporate / friendly / technical / expressive]
- Color mood: [warm / cool / neutral / vibrant / muted]
- Reference brands I like: [1-3 brand names with notes on what you like]
- Reference brands to avoid: [1-2 brand names that feel wrong]
- Logo type preference: [wordmark / icon + wordmark / icon only / abstract mark]
## Assets Needed
1. Primary logo (light background)
2. Primary logo (dark background / inverted)
3. Favicon / app icon (square, 512Γ512)
4. Social media profile image (1:1 ratio)
5. Color palette: 1 primary, 1 accent, 2 neutrals (light + dark), 1 semantic (error/warning)
6. Typography pairing: heading font + body font (Google Fonts preferred)
7. 3 icon style examples (outline / filled / duotone β whichever fits the style)
## Output Format
For each asset, provide:
- Visual description precise enough for a designer or AI image tool to recreate
- Hex codes for all colors
- Font names and weights for typography
- A short rationale explaining why each choice fits the brand
Start with the color palette and typography β everything else should derive from those foundations.
Why it works: Claude Design's visual understanding lets it generate coherent brand systems rather than isolated assets. By front-loading the palette and type decisions, you get downstream assets that feel intentional rather than assembled from unrelated pieces.
Follow-up: Feed the output from this prompt directly into Claude Design's visual canvas to generate image mockups. Use the hex codes and font names in your Tailwind config (tailwind.config.ts) to wire the brand into the codebase in minutes.
34.2 Landing Page Hero Design Spec (Intermediate)
Tool: Claude Design, Cursor, Claude Code | Difficulty: Intermediate | Time: 20-45 min
Generate a detailed design spec for a landing page hero section β precise enough for Cursor to implement directly into Tailwind/React without ambiguity. Bridges the gap between visual concept and production code.
Design a landing page hero section for [Product Name], a [brief description].
## Goal of the Hero
The hero must communicate: [what the product does] + [who it's for] + [why to care]
in under 5 seconds. Primary CTA: [button text and action].
## Brand Context
- Primary color: [hex]
- Accent color: [hex]
- Background: [hex or gradient description]
- Heading font: [font name, weight]
- Body font: [font name, weight]
- Tone: [formal / casual / technical / playful]
## Layout Requirements
- Viewport: Full-screen (100vh) on desktop, auto-height on mobile
- Layout type: [centered / left-aligned / split (text left, visual right)]
- Visual element: [illustration / screenshot / animation / abstract shape / none]
- Navigation: [sticky top bar / transparent overlay / none]
## Content to Include
- Headline: [your draft or "generate 3 options"]
- Subheadline: [your draft or "generate 3 options"]
- Social proof element: [logos / testimonial quote / stat / none]
- CTA button: Primary "[text]" + Secondary "[text]" (optional)
- Trust signals: [e.g., "No credit card required", "Used by 2,000+ developers"]
## Responsive Behavior
- Desktop (1280px): [describe layout]
- Tablet (768px): [any changes β stack columns, reduce font sizes, etc.]
- Mobile (375px): [headline size, single-column, CTA full-width]
## Output Format
Provide:
1. Annotated wireframe description (text-based β every element, position, spacing)
2. Tailwind CSS class recommendations for each element
3. Copy variants (3 headline options, 2 subheadline options)
4. Animation suggestions (entrance animation, hover states) β optional, flag if they
add distraction rather than clarity
Then implement the hero as a self-contained React component using Tailwind.
Why it works: By asking for both the design spec and the implementation in the same prompt, you skip the translation step where a design mockup loses fidelity going into code. The Tailwind class output means Cursor can implement the exact design without reinterpretation.
Pairs with: Prompt 34.1 (Brand Asset Sprint) for the color palette and font choices. Prompt 1.3 (Landing Page from Zero) in Category 1 for the full page structure beyond the hero.
34.3 Visual Content Brief for Consistent AI Generation (Advanced)
Tool: Claude Design, Claude Code (Opus 4.7) | Difficulty: Advanced | Time: 45-90 min
Create a visual content system specification β a single source of truth document that ensures all AI-generated visuals for a product feel like they belong to the same brand. Solves the consistency problem when generating marketing graphics, blog thumbnails, social posts, and UI illustrations over time.
## Visual Content System Specification
I need a visual content system for [Product Name] that ensures consistency across
all AI-generated images and graphics. This system will be used by Claude Design,
Midjourney, DALL-E 3, and Stable Diffusion to produce assets over the next 12 months.
## Brand Foundation (already defined)
- Logo: [description or attachment]
- Primary palette: [hex codes with role labels β primary, accent, background, text]
- Typography: [heading and body font names]
- Tone adjectives: [3 words that describe the brand personality]
## Asset Categories to Define
For each category, specify the visual style, composition rules, and example prompt template:
### Category A: Blog / Article Thumbnails (1200Γ628px)
- Use case: [website blog, newsletter, LinkedIn posts]
- Volume: ~[N] per month
- Visual style: [abstract / illustrative / photographic / typographic]
### Category B: Social Media Graphics (1:1, 9:16, 16:9)
- Use case: [Twitter/X, LinkedIn, Instagram]
- Volume: ~[N] per month
- Visual style: [consistent with A / more casual / motion-focused]
### Category C: Product Screenshots & Mockups
- Use case: [landing page, app store, documentation]
- Volume: ~[N] per quarter
- Visual style: [clean device mockup / contextual scene / abstract UI fragment]
### Category D: Icons & Illustrations (if applicable)
- Use case: [empty states, feature explainers, onboarding]
- Style: [flat / isometric / line art / 3D]
## Constraints
- Must never use: [specific visual elements to avoid β stock photo clichΓ©s,
specific color combinations that conflict with brand, visual motifs from competitors]
- Must always include: [brand element in every image β subtle color, pattern, etc.]
- Accessibility: all text in images must meet WCAG AA contrast (4.5:1 minimum)
## Deliverables
1. **Style Guide**: 2-3 paragraphs defining the visual language in words
2. **Color Application Rules**: When to use primary vs. accent, background rules,
gradient usage policy
3. **Reusable Prompt Templates**: For each category, a parameterized prompt template
like: "[Category A template]: A [adjective] [composition] depicting [subject] for
[brand name], using [colors], [style description], [technical specs]"
4. **Negative Prompt Library**: 10-15 terms to consistently exclude across all
AI image generation to maintain brand safety and visual consistency
5. **Quality Checklist**: 5-point check before publishing any AI-generated asset
(brand colors present, text legible, no AI artifacts, consistent style,
no competitor visual cues)
Generate all five deliverables. For the prompt templates, test each one by
writing an example output description of what the image would look like.
Why it works: The consistency problem in AI visual generation comes from re-describing the brand each time you need an asset. A visual content system document solves this by encoding the brand DNA into reusable prompt fragments β Claude Design, Midjourney, and DALL-E 3 all respond to the same parameterized templates, producing visuals that read as siblings rather than strangers.
Production integration: Save this document as visual-content-system.md in your project root. Reference it at the start of every visual generation session: "Using the system defined in visual-content-system.md, generate [asset type]." Claude Design can read it directly as context.
Cross-link: CyberOS brand toolkit for security-focused products needing consistent trust-signal visuals. vibe-coding.academy for the course on building complete brand systems with AI tools.
Category 35: Claude Code Routines & Automation Prompts (New β April 2026)
These prompts are designed for Claude Code's Routines feature (launched April 2026), which runs saved workflows automatically on Anthropic's cloud infrastructure β triggered by GitHub events or cron schedules.
35.1 Automated Dependency Audit Routine (Intermediate)
Tool: Claude Code Routines | Trigger: Weekly cron | Time: Runs overnight
Deploy as a weekly cron Routine to audit all dependencies for CVEs, breaking changes, and outdated packages β then file a single consolidated GitHub issue with a prioritized upgrade plan.
You are a dependency security auditor running a weekly scan.
## Your task
1. Run `npm audit --json` (or equivalent for the project's package manager) and parse the output
2. Run `npx npm-check-updates --json` to identify outdated packages
3. Check the GitHub Security Advisories API for CVEs affecting any direct dependency
4. Cross-reference CVEs against the CISA Known Exploited Vulnerabilities catalog
## Prioritization framework
- P0 (File GitHub issue + comment on all open PRs): CVSS >= 9.0 CVEs in direct deps
- P1 (File GitHub issue): CVSS 7.0-8.9 CVEs, or packages > 2 major versions behind
- P2 (Add to weekly report): Minor/patch updates, low-severity advisories
- P3 (Skip): Dev-only dependencies with no production surface
## GitHub issue format
Title: `[Security] Weekly dependency audit β {DATE}`
Do not open a PR. File the issue only. Mark it with labels: `security`, `dependencies`.
If zero issues found: close any open dependency audit issues from previous weeks and post
a comment: "Weekly dependency scan {DATE}: No critical issues found."
Why it works: Manual dependency audits happen inconsistently β usually only when a CVE alert lands in your inbox, meaning you're already reactive. A Routine that runs every Monday at 2am means your team starts every week knowing their exposure.
Setup: Claude Code β Settings β Routines β New. Trigger: 0 2 * * 1 (every Monday at 2am). Connect GitHub. Paste prompt.
35.2 PR Quality Gate Routine (Beginner)
Tool: Claude Code Routines | Trigger: GitHub PR opened | Time: 2-3 min per PR
Run this Routine on every new pull request. It checks code quality, security, and test coverage gaps before a human reviewer looks at the diff.
You are a PR quality gate. Review the attached pull request diff and produce a
structured assessment. Do not approve or request changes β post a comment only.
Review for:
1. Security: OWASP Top 10, hardcoded secrets, missing auth checks on new endpoints
2. Code quality: functions >50 lines, duplicate code, broad TypeScript `any` types,
missing async error handling, console.log in production paths
3. Test coverage: new functions with no test changes, API endpoints with no integration test
4. PR hygiene: description matches diff, breaking changes flagged
Output as a GitHub comment:
**Automated PR Review**
| Category | Status | Details |
|----------|--------|---------|
| Security | Pass / Issues | [summary] |
| Code Quality | Pass / Issues | [summary] |
| Test Coverage | Pass / Issues | [summary] |
Issues requiring action before merge: [list with file:line, or "None."]
Suggestions (non-blocking): [list, or "None."]
_Automated review. Final approval requires human review._
Why it works: Routes mechanical catches to automation so human reviewers spend time on architecture and business logic decisions. Teams using automated first-pass review report 30β40% shorter human review cycles.
35.3 Daily Release Notes Generator (Intermediate)
Tool: Claude Code Routines | Trigger: Daily cron (9am) | Time: 5-10 min
Generates human-readable release notes from yesterday's merged PRs and appends to CHANGELOG.md automatically.
You are a technical writer generating daily release notes.
1. Fetch all PRs merged into `main` in the last 24 hours
2. Group by category from PR labels or commit prefix: feat/fix/perf/security/docs/chore
3. Write 1-3 sentence plain-English summaries of each change
4. Identify breaking changes (look for "BREAKING" in PR titles or descriptions)
Append to CHANGELOG.md at the top:
## {DATE}
### Breaking Changes
[If any. Omit section if none.]
### New Features
- **[Feature name]**: [1-2 sentence description]
### Bug Fixes
- **[What was broken]**: [What was fixed]
### Security
- [Specific CVE/issue patched]
Rules:
- If no PRs merged: append `## {DATE}\n_No changes merged._`
- Never overwrite existing CHANGELOG entries
- Commit with message: `docs: daily release notes {DATE}`
Why it works: CHANGELOG debt is universal β teams know they should maintain it but rarely do consistently. A Routine removes the friction entirely. The CHANGELOG stays accurate at zero ongoing cost.
Cross-link: β EndOfCoding.com for the full article on Claude Code Routines. β LLMHire.com for AI Automation Architect roles (this skill commands a $28K salary premium).
Category 36: Context Engineering Prompts (New β April 2026)
"Context engineering" β coined in early 2026 by Tobi LΓΌtke (Shopify CEO) and rapidly adopted across the industry β is the discipline of structuring what you put into an AI's context window to maximize output quality. With Claude's 1M-token context and $200/mo Max plan, context management is now a primary vibe coding skill.
36.1 Legacy Codebase Context Map (Beginner)
Tool: Claude Code | Time: 15-20 min | Context: 1M tokens ideal
Use this at the start of any engagement with an unfamiliar or legacy codebase. It builds a mental model for Claude that persists across the session, dramatically reducing hallucination and incorrect assumptions.
I'm about to ask you to work on a large existing codebase. Before I give you
any tasks, I want to load you with the context you need to reason accurately.
## Codebase overview
[Paste your README or write 2-3 sentences describing the product]
## Tech stack
- Language: [e.g., TypeScript, Python]
- Framework: [e.g., Next.js 15, FastAPI]
- Database: [e.g., PostgreSQL via Supabase]
- Deployment: [e.g., Vercel + Railway]
- Key dependencies: [list 5-10 most important packages]
## Architecture pattern
[Describe in 2-3 sentences: monolith vs. microservices, how data flows, where business logic lives]
## Naming conventions
- Files: [e.g., kebab-case for components, camelCase for utils]
- DB tables: [e.g., snake_case, plural]
- API routes: [e.g., /api/v1/resource]
- Env vars: [e.g., NEXT_PUBLIC_ prefix for client-safe vars]
## What NOT to touch
[List any files, modules, or patterns to avoid β e.g., "Don't modify auth middleware, it's vendor-managed"]
## Current known issues
[List 3-5 open bugs or technical debt items so Claude doesn't re-introduce them]
Acknowledge this context and tell me what you understand about the codebase
before I give you your first task.
Why it works: Without this upfront loading, Claude infers conventions from what it sees in each individual file β and can contradict itself across a session. This prompt anchors a shared mental model that holds for the entire working session.
Pro tip: Save this filled-in template as CLAUDE_CONTEXT.md in your repo root. Paste its contents at session start, or reference it as a Routine pre-step.
36.2 Rolling Summary Context Compression (Intermediate)
Tool: Claude Code, Claude.ai | Time: 5 min per compression cycle | Context: Any size
Long conversations drift. After ~20 exchanges, earlier decisions get forgotten and Claude starts making inconsistent choices. This prompt compresses your session state into a portable summary you can paste into a fresh context window.
We've been working together for a while. Before continuing, I need you to create
a compressed context summary I can paste into a new session.
Write a structured summary with these sections:
## Project State
- What we're building: [1 sentence]
- Current milestone: [what we're working on right now]
- Completion status: [% done, what's left]
## Decisions Made (Do Not Revisit)
[List every architectural, naming, or technical decision we've committed to β
even if it feels suboptimal. These are locked.]
## Active Constraints
[List every constraint that's shaped our decisions: performance requirements,
team conventions, third-party limitations, deadlines]
## Mistakes to Avoid
[List every wrong path, failed approach, or anti-pattern we've already ruled out β
with 1 sentence on why it was rejected]
## Current Task State
[Describe exactly where we left off β what was last completed, what's in progress,
what the immediate next step is]
## Files Modified This Session
[List every file touched, with 1-sentence description of what changed]
Format this for copy-paste into a new Claude session. The summary should be
complete enough that a fresh Claude instance can continue seamlessly with zero
catch-up questions.
Why it works: Context compression is the single highest-leverage technique for long vibe coding sessions. Teams using this report 60β70% reduction in "wait, I thought we decided..." regressions. It also makes sessions resumable across days.
36.3 Multi-File Feature Context Bundle (Advanced)
Tool: Claude Code | Time: 5 min setup, saves hours | Context: Targeted loading
When implementing a new feature that touches 5+ files, Claude needs to see all relevant code simultaneously to avoid making changes that break other parts of the system. This prompt guides you through building the right context bundle before writing any code.
I'm about to implement: [feature name in 1 sentence]
Before writing any code, help me identify every file that could be affected
and what I need to know about each one.
## Feature description
[2-3 sentences on what the feature does, what user-facing behaviour it changes,
and what data it reads/writes]
## Entry points
[Where does this feature start? e.g., "New API endpoint at /api/payments/refund"
or "New button in the checkout flow"]
Based on this, please:
1. List every file likely to need modification (with filepath and why)
2. List every file I should READ but not modify (key context for side effects)
3. Identify any circular dependencies or layering violations to watch for
4. Flag any existing tests I must update
5. Estimate total lines-of-change and rate the blast radius: Low / Medium / High
Then read the files you've listed and summarize what you learn about each
before we write a single line of new code.
Why it works: The #1 cause of vibe coding regressions is writing code without reading all the files it interacts with. This prompt forces a "read phase" before any "write phase" β identical to how senior engineers approach large features. The blast radius estimate alone prevents dozens of surprise breakages.
Cross-link: β EndOfCoding.com for the deep-dive on context engineering techniques. β Vibe Coding Academy for the Context Mastery course module (covers CLAUDE.md, context windows, and session hygiene).
Category 37: Agentic Engineering Prompts (New β April 2026)
Andrej Karpathy coined "agentic engineering" in April 2026 β the professional evolution beyond vibe coding. Where vibe coding was about letting AI write code, agentic engineering is about directing AI agents with precision: architects design, agents implement, engineers verify. These prompts operationalize that workflow.
37.1 The Agentic Engineering Brief (Intermediate)
Tool: Claude Code, Cursor 3 | Time: 10-15 min | Category: Project Architecture
Inspired by: Karpathy's "agentic engineering" reframe β humans architect, agents implement.
I'm building [product/feature name]. Before writing any code, help me create an Agentic Engineering Brief:
## What I'm Building
[One paragraph description]
## Agent Task Breakdown
Decompose this into discrete tasks that an AI agent can execute autonomously:
1. [Task type: research/scaffold/implement/test/review]
2. ...
## Human Decision Points
Where do I need to review and approve before the agent continues:
- After: [milestone 1]
- After: [milestone 2]
## Acceptance Criteria
How will I know each task is complete and correct:
- [Measurable criterion 1]
- [Measurable criterion 2]
## Risk Flags
What should I watch for in the AI's output:
- [ ] Security: [specific concern for this project type]
- [ ] Logic: [specific business logic to verify]
- [ ] Dependencies: [packages to audit before installing]
Generate this brief, then we'll execute task by task with you as my engineering agent.
Why it works: The single biggest quality failure in AI-assisted development is jumping into code before the architecture is clear. This brief forces you to think like an engineering lead β decomposing work, setting decision gates, and specifying success criteria β before a single line of code is written. Teams using structured briefs report 40β60% fewer mid-project pivots.
Cross-link: β EndOfCoding.com for the full agentic engineering explainer. β LLMHire.com for Agentic Workflow Architect roles (the fastest-growing AI job category in Q2 2026).
37.2 The Dependency Safety Audit (Intermediate)
Tool: Claude Code, any LLM terminal | Time: 5 min | Category: Security
Inspired by: Slopsquatting attacks β AI-hallucinated package names used as malicious attack vectors. In Q1 2026, supply chain attacks using hallucinated package names rose 340% YoY.
Before I install these packages, audit them for safety:
[Paste the list of packages your AI suggested, e.g.:
- unused-imports
- react-query-v5-compat
- @supabase/auth-helpers-nextjs
]
For each package:
1. Confirm it exists on npm/PyPI/crates.io (not hallucinated)
2. Check download count (flag anything < 1,000/week)
3. Check last published date (flag if > 1 year)
4. Check maintainer count (flag if 1 maintainer with no activity)
5. Check for typosquatting similarity to a popular package
6. Note any known CVEs
Output as a table: Package | Verified | Downloads/wk | Last Published | CVEs | Verdict (SAFE/CAUTION/REJECT)
Flag any package you would not install in a production app and explain why.
Why it works: AI coding tools hallucinate package names at a measurable rate β typically 2β5% of suggestions in complex codebases. Slopsquatting actors register the hallucinated names and serve malicious payloads. This 5-minute audit catches the class of attack before it reaches your build. Run it every time AI suggests a package you haven't used before.
Cross-link: β EndOfCoding.com for the full security crisis analysis. β CyberOS.dev for automated supply chain scanning (detects slopsquatting patterns in CI/CD).
37.3 The AI Output Trust Calibration Prompt (Beginner)
Tool: Any LLM | Time: 5 min | Category: Quality / Evaluation
Inspired by: Developer trust in AI tools collapsing to 29% β the "almost right but not quite" problem costs teams hours in debugging code that looked correct on first read.
You just gave me this code/solution:
[PASTE THE AI OUTPUT HERE]
Now play devil's advocate. In this code:
1. What could be wrong or subtly broken that I might miss on first read?
2. What assumptions did you make that might not hold in my specific context?
3. What are the 2-3 things most likely to fail in production?
4. What would you want to test first before shipping this?
5. Is there a simpler approach you didn't take? Why didn't you take it?
Be honest. I'd rather know the risks now than discover them at 2am.
Why it works: AI models are trained to be helpful, which means they default to confident, complete-looking answers even when they're working from incomplete context. This prompt exploits the model's ability to reason about its own outputs β switching from generation mode to critique mode. Read question 2 first: the assumptions section surfaces the real risks fastest. Teams running this prompt before every PR merge report catching 30β40% more issues that would have reached production.
37.4 The Multi-Model Router Design Prompt (Advanced)
Tool: Claude Code, Cursor | Time: 60-90 min | Category: Architecture / Cost Optimization
Inspired by: 90% API cost reduction achieved via multi-model routing (n1n.ai, April 2026). With frontier models costing $5β75/M tokens and open models available for $0.10β0.50/M, intelligent routing is the highest-ROI architecture decision for AI-heavy applications.
I'm building an AI feature that currently routes all requests to [expensive model, e.g., Claude Opus 4.6].
Monthly cost is $[X]. I want to reduce this by 70%+ using multi-model routing without degrading quality.
Current request types hitting [expensive model]:
1. [Request type 1] β e.g., "classify user intent from a short message" β volume: [N]/day
2. [Request type 2] β e.g., "generate a 500-word marketing email" β volume: [N]/day
3. [Request type 3] β e.g., "debug a TypeScript error with full codebase context" β volume: [N]/day
Design a multi-model routing architecture:
## Model Tier Assignment
For each request type above, assign to the appropriate tier:
- Tier 1 (classification/routing): Mistral 7B or similar at < $0.20/M β for intent detection, simple categorization
- Tier 2 (general tasks): DeepSeek-V3 or Llama 3.1 70B at < $0.80/M β for summarization, drafts, standard Q&A
- Tier 3 (complex reasoning): [Current expensive model] β reserve for tasks requiring deep context, code generation, or multi-step reasoning
## Router Implementation
Write a routing function that:
1. Classifies each incoming request by complexity (Tier 1 fast classifier, < 100ms)
2. Routes to the appropriate model
3. Falls back to the next tier up if confidence < 0.85
4. Logs tier assignments for quality review
## Caching Layer
Add semantic caching using Redis:
- Cache responses for semantically similar queries (cosine similarity > 0.92)
- TTL: [appropriate for your domain, e.g., 1 hour for support answers, 24h for documentation]
- Cache hit rate target: > 30% of requests
## Quality Gate
Define what "quality equivalent" means for each tier:
- Run A/B test routing 10% of Tier 2 traffic to Tier 3 for 1 week
- Measure: [task completion rate / user satisfaction / error rate]
- Accept Tier 2 routing only if metrics within [5%] of Tier 3 baseline
Show me: the router code, the Redis caching layer, estimated new monthly cost, and the A/B test setup.
Why it works: Model routing is the single highest-ROI optimization for AI applications β but most teams skip it because designing the routing logic feels complex. This prompt structures the design process into clear tiers with quality gates, preventing the common failure mode where cheaper models get assigned tasks they can't handle. The semantic caching layer alone typically cuts 25β35% of API calls. Run this prompt once per AI feature surface; the resulting architecture typically achieves 70β90% cost reduction with less than 5% quality degradation.
Cross-link: β EndOfCoding.com for AI cost optimization analysis. β CyberOS.dev for API security scanning of multi-model routing endpoints.
37.5 The Desktop AI Agent Workflow Audit Prompt (Intermediate)
Tool: Claude Code, Codex Desktop | Time: 20-30 min | Category: Workflow / Automation
Inspired by: OpenAI Codex Desktop's background computer use across any Mac app (April 2026) and Claude Code Routines. Desktop AI agents can now operate autonomously across applications while you work in parallel β but most developers have no framework for deciding which tasks to delegate versus keep manual.
I want to set up desktop AI agents (Claude Code Routines / Codex Desktop / similar) to handle recurring tasks autonomously in the background.
My current recurring dev tasks (estimate time per week):
1. [Task 1] β e.g., "reviewing PRs for style and obvious bugs" β [N hours/week]
2. [Task 2] β e.g., "updating dependencies and checking changelogs" β [N hours/week]
3. [Task 3] β e.g., "writing release notes from git log" β [N hours/week]
4. [Task 4] β e.g., "responding to standard support tickets" β [N hours/week]
For each task, evaluate:
## Automation Suitability Matrix
Score each task on:
- **Reversibility** (1-5): If the agent makes a mistake, how easy to undo? (5 = trivial, 1 = catastrophic)
- **Determinism** (1-5): How predictable is the correct output? (5 = clear right answer, 1 = judgment call)
- **Verification** (1-5): How easy to verify agent output quality? (5 = automated check, 1 = expert review required)
- **Volume** (1-5): How often does this task occur? (5 = multiple times/day, 1 = monthly)
Automate tasks scoring > 12/20. Keep manual tasks scoring < 8/20. Human-in-loop for 8-12/20.
## Agent Configuration
For each task marked AUTOMATE:
1. Write the Routine/agent prompt (be specific: what to check, what to ignore, what to escalate)
2. Define the trigger: [schedule / GitHub event / file change / manual]
3. Define the success criteria: what does "done correctly" look like?
4. Define the escalation condition: when should the agent stop and ask a human?
5. Define the rollback plan: if the agent's output is wrong, how do we fix it?
## Safety Constraints
For all agents, enforce:
- Never push to main without human approval
- Never send external communications (email, Slack) without review
- Always create a draft/branch/preview, not a final artifact
- Log every action to [audit log location]
Output: a prioritized automation roadmap with ready-to-use agent prompts for the top 3 tasks.
Why it works: Desktop AI agents are powerful but dangerous when applied without a framework. The suitability matrix prevents the two failure modes: over-automation (delegating judgment calls to agents) and under-automation (manually doing tasks that are perfect for agents). The safety constraints are non-negotiable β every production-grade agent deployment needs explicit boundaries on irreversible actions and external communications. Teams that run this audit before deploying agents avoid 80% of the agent-gone-wrong incidents that generate angry post-mortems.
Cross-link: β Vibe Coding Academy for structured lessons on Claude Code Routines setup. β EndOfCoding.com for Codex Desktop computer use deep dive.
Cross-link: β EndOfCoding.com for the full trust collapse data. β Vibe Coding Academy for the Quick Tip lesson on trust calibration.
Category 38: AI Output Evaluation & Production Quality Prompts (New β April 2026)
As AI-generated code and content flood production systems, teams are discovering a painful gap: they have no systematic way to verify that AI output is correct, regressing, or degrading over time. These prompts address the emerging discipline of AI quality engineering β building test suites, A/B frameworks, and CI/CD gates that treat AI output like any other production artifact.
38.1 The LLM Regression Test Suite Builder (Intermediate)
Tool: Claude Code, Cursor | Time: 45-60 min | Category: Quality / Testing
Inspired by: The growing incidence of "silent quality regression" where prompt or model changes degrade output quality without triggering any alerts. Engineering teams at Notion, Linear, and Vercel have reported this as a top-5 AI production issue in Q1 2026.
I have an AI feature that uses [model, e.g., Claude Sonnet 4.6] for [task description, e.g., "generating user-facing error messages from raw exception data"].
The feature is currently working well, but I need a regression test suite so I know immediately if output quality degrades after:
- A prompt change
- A model version upgrade
- A context window change
- A temperature/parameter adjustment
## Current Feature Spec
- Input: [describe the inputs, e.g., "raw Node.js stack trace + user action that triggered it"]
- Expected output: [describe what good looks like, e.g., "plain-English error message under 50 words, no technical jargon, actionable next step"]
- Output format: [e.g., JSON with fields: message, action, severity]
- Current prompt: [paste your system prompt]
## Build a Regression Test Suite
### Step 1: Golden Dataset
Create 20 test cases covering:
- 5 happy-path inputs (clear, well-formed data)
- 5 edge cases (empty inputs, very long inputs, unusual formats)
- 5 adversarial inputs (inputs designed to confuse the model)
- 5 real production examples (anonymized from logs)
For each test case, define:
- Input (the exact data the model receives)
- Expected output characteristics (not exact text β that's too brittle)
- Evaluation criteria (a checklist of what makes the output acceptable)
### Step 2: Evaluation Rubric
For my feature, define a rubric with 5 dimensions scored 1-5:
1. [Accuracy]: Does the output correctly interpret the input?
2. [Format compliance]: Does output match required JSON/format?
3. [Tone]: Is the output appropriate for [audience]?
4. [Completeness]: Are all required fields populated?
5. [Safety]: Does output avoid [specific harms, e.g., exposing stack traces to users]?
Pass threshold: average score >= 4.0 across all test cases.
### Step 3: Automated Evaluation
Write an evaluation script that:
1. Runs all 20 test cases against the current prompt/model
2. Scores each output against the rubric using a fast evaluator model (Claude Haiku 4.5)
3. Generates a report: overall score, per-dimension breakdown, failed cases with details
4. Exits with code 1 if overall score < 4.0 (fail) or >= 4.0 (pass)
Language: [TypeScript/Python]
Test runner: [Jest/pytest/Vitest]
### Step 4: Baseline
Run the suite against the current prompt/model and save results as baseline.json.
All future runs compare against this baseline; alert if any dimension drops > 0.3 points.
Output: the 20 test cases, the evaluation rubric, the evaluator script, and baseline.json structure.
Why it works: Most AI testing fails because it checks for exact string matches (too brittle) or relies on human review (doesn't scale). This prompt creates rubric-based evaluation β scoring output against quality dimensions rather than exact text β which is both automatable and meaningful. The golden dataset covers the failure modes that actually occur in production, not just the happy path. Teams that implement this catch prompt regressions within hours of deployment rather than days after user complaints.
Cross-link: β EndOfCoding.com for AI quality engineering deep dives. β Vibe Coding Academy for hands-on lessons in LLM testing frameworks.
38.2 The Prompt A/B Testing Framework (Advanced)
Tool: Claude Code, Cursor | Time: 60-90 min | Category: Quality / Experimentation
Inspired by: The proliferation of prompt variants across teams β most organizations now have 3-10 competing prompt versions for core features, with no systematic way to determine which performs best. A/B testing prompts has become as important as A/B testing UI copy.
I want to A/B test two (or more) prompt variants for my AI feature to determine which performs better in production.
## Feature Context
- Feature: [e.g., "AI-generated onboarding email personalization"]
- Current prompt (Control - Variant A): [paste prompt A]
- New prompt (Challenger - Variant B): [paste prompt B]
- What I'm trying to improve: [e.g., "email open rate / click rate / user activation within 7 days"]
- Traffic volume: approximately [N] requests/day through this feature
## Build the A/B Testing Infrastructure
### Traffic Splitting
Design a deterministic traffic splitter that:
- Routes [50%] of requests to Variant A, [50%] to Variant B
- Uses user ID (or session ID) for consistent assignment (same user always gets same variant)
- Logs which variant served each request with a unique experiment ID
- Supports gradual rollout: start 10/90, move to 50/50, then 90/10 before full switch
```typescript
// Implement this function:
function selectPromptVariant(userId: string, experimentId: string, variants: Record<string, number>): string {
// variants = { "A": 0.5, "B": 0.5 }
// Must be deterministic: same userId + experimentId β same variant every time
// Use consistent hashing, not Math.random()
}
Outcome Tracking
Define the primary metric for this experiment:
- Primary metric: [e.g., "user clicks the CTA in the email within 48h"]
- Secondary metrics: [e.g., "email open rate, unsubscribe rate"]
- Guardrail metric: [e.g., "spam complaint rate must not increase > 0.1%"]
- Minimum detectable effect: [e.g., "5% improvement in click rate"]
- Statistical significance threshold: p < 0.05 (two-tailed)
Write the tracking event schema:
interface PromptExperimentEvent {
experimentId: string;
variantId: 'A' | 'B';
userId: string;
timestamp: string;
primaryMetricTriggered?: boolean; // logged separately when outcome occurs
metadata?: Record<string, unknown>;
}
Sample Size Calculator
Given:
- Baseline conversion rate: [e.g., 12%]
- Minimum detectable effect: [e.g., 5% relative improvement β 12.6%]
- Statistical power: 80%
- Significance level: 5%
Calculate: how many requests per variant are needed before we can declare a winner?
Analysis Query
Write a SQL query (for [Postgres/BigQuery/SQLite]) that:
- Joins experiment assignment events with outcome events
- Calculates conversion rate per variant
- Runs a chi-squared test for statistical significance
- Returns: variant, requests, conversions, conversion_rate, p_value, is_significant
Decision Rules
Define clear stop conditions:
- Stop early for harm: if guardrail metric exceeds threshold with > 95% confidence, stop immediately
- Stop early for win: if primary metric improvement > MDE with p < 0.01 after 50% of required sample
- Stop at plan: declare winner after required sample size reached, even if not significant (null result is a result)
Output: the traffic splitter, tracking schema, SQL analysis query, and decision rules documentation.
**Why it works**: Prompt A/B testing fails in practice because teams eyeball results or run tests too short. This framework imports the rigor of classical A/B testing β statistical significance, power calculations, guardrail metrics β into the AI prompt domain. The deterministic traffic splitter is critical: random assignment creates inconsistent user experiences and confounds results. The decision rules prevent the most common mistake: stopping tests early when early results look good but sample size is insufficient. This framework has been validated by teams at 3 mid-stage AI startups who discovered their "better" intuition prompts actually underperformed by 8-15% on measured outcomes.
**Cross-link**: β [EndOfCoding.com](https://endofcoding.com) for prompt experimentation methodology articles. β [Vibe Coding Academy](https://vibe-coding.academy) for the A/B testing for AI features course module.
---
### 38.3 The AI Quality Gate for CI/CD (Expert)
**Tool**: Claude Code, GitHub Actions | **Time**: 90-120 min | **Category**: Quality / DevOps
*Inspired by: The engineering teams shipping AI feature updates daily are discovering that standard CI/CD (lint, test, deploy) doesn't catch AI-specific regressions: prompt drift, context window violations, output format breaks, and latency spikes. Quality gates for AI features are the next frontier of CI/CD.*
I want to add an AI quality gate to my CI/CD pipeline that automatically validates AI feature health before every deployment.
Current Pipeline
- CI/CD: [GitHub Actions / GitLab CI / CircleCI]
- Deployment: [Vercel / Railway / AWS / GCP]
- AI features: [list the AI-powered features in your app, e.g., "chat assistant, code review bot, document summarizer"]
- Current pipeline: lint β unit tests β integration tests β deploy
Design the AI Quality Gate
I want to add an "AI Health Check" stage between integration tests and deploy that fails the pipeline if AI quality degrades.
Gate 1: Prompt Integrity Check
Before deployment, verify that all prompts in the codebase:
- Are valid (no syntax errors, no truncated templates)
- Are within model context limits (tokenize and count β fail if > 80% of context window)
- Have not changed from last deploy (flag changes for human review, not automatic block)
- Include required safety instructions (check for presence of [specific safety phrases])
Write a script that:
- Finds all prompt files/strings matching [pattern, e.g.,
prompts/**/*.mdorconst SYSTEM_PROMPT] - Runs each check above
- Outputs a structured report: prompt_id, checks_passed, checks_failed, token_count, change_detected
- Exits with code 1 if any check fails (except change_detected β that's a warning only)
Gate 2: Golden Dataset Regression
Run the regression test suite (from Prompt 38.1) against the new prompt/model version:
- Execute all [N] test cases
- Score with evaluator model
- Compare scores to baseline.json
- Fail if: overall score drops > 0.3 points OR any single dimension drops > 0.5 points
- Pass if: all scores within acceptable range OR new prompt scores BETTER than baseline (update baseline on pass)
Gate 3: Latency & Cost Budget
For each AI feature, enforce SLOs:
- P95 latency β€ [500ms] (run [10] test calls, measure P95)
- Average cost per call β€ $[0.005] (use token counts Γ model pricing)
- Fail if: latency or cost exceeds budget by > 20%
- Report: actual vs. budget for each feature, with model/prompt recommendations if over budget
Gate 4: Safety & Content Policy Check
Run [3-5] adversarial test cases designed to elicit unsafe outputs:
- [Test case 1: describe the adversarial input and what unsafe output to watch for]
- [Test case 2: ...]
- [Test case 3: ...] Pass criteria: model refuses or safely deflects all adversarial inputs. Fail: pipeline blocked, immediate security review required.
GitHub Actions Workflow
Write a GitHub Actions job ai-quality-gate that:
- Runs after
integration-testsjob - Executes all 4 gates sequentially (stop on first failure)
- Uploads gate reports as GitHub Actions artifacts
- Posts a summary comment on the PR with gate results (using
github-script) - Requires manual approval via GitHub Environments if Gate 3 (change detected) is flagged
# ai-quality-gate.yml
name: AI Quality Gate
on:
pull_request:
paths:
- 'prompts/**'
- 'src/ai/**'
- '.env.example'
jobs:
ai-quality-gate:
runs-on: ubuntu-latest
steps:
# Implement the 4 gates above
Output: the full GitHub Actions workflow, all gate scripts, and the PR comment template.
**Why it works**: AI quality gates close the gap that every team hits when shipping AI features fast: standard CI catches code bugs but not AI behavior bugs. The four-gate design mirrors the four failure modes that actually bring down AI features in production β broken prompts (Gate 1), silent quality regression (Gate 2), cost/latency overrun (Gate 3), and safety failures (Gate 4). The GitHub Actions integration makes this a first-class part of the engineering workflow, not an optional manual check. Teams that implement this report catching 2-3 regressions per month that would have reached users; the average incident cost avoided is estimated at 4-8 hours of investigation plus user trust damage.
**Cross-link**: β [EndOfCoding.com](https://endofcoding.com) for CI/CD for AI applications deep dives. β [Vibe Coding Academy](https://vibe-coding.academy) for the AI DevOps course module. β [CyberOS.dev](https://cyberos.dev) for security scanning of AI pipeline configurations.
---
## Category 40: 2026 Frontier Prompts *(New β April 2026)*
*These prompts leverage capabilities that only became available with 2026's model releases: 2M+ token context windows, native multi-agent orchestration, and MCP-native tooling.*
### 40.1 The Whole-Codebase Audit Prompt (Expert)
**Tool**: Claude Opus 4.7 / GPT-6 (2M context) | **Time**: 15-30 min
I'm going to paste my entire codebase. Analyze it holistically and produce:
ARCHITECTURE REVIEW
- What are the core abstractions? Are they well-named and well-bounded?
- Where are the tightest couplings? What would break if component X changed?
- Where is business logic leaking into infrastructure layers (or vice versa)?
- What patterns are repeated that could be centralized?
SECURITY AUDIT
- Walk every data entry point (API routes, form inputs, file uploads, env variables)
- Flag SQL/NoSQL injection risks, XSS, CSRF, and SSRF vectors
- Check for hardcoded secrets, weak cryptography, unsafe deserialization
- Note any dependency with a known CVE in the past 6 months
PERFORMANCE BOTTLENECKS
- Identify N+1 query patterns, unnecessary re-renders, missing indexes
- Flag synchronous operations that should be async or queued
- Find any O(nΒ²) or worse algorithm hiding in the data flow
DEBT REGISTER (prioritized)
Priority File Issue Estimated Fix Time For each item, assign CRITICAL / HIGH / MEDIUM / LOW based on user-facing impact. QUICK WINS (under 2 hours each) List 5 improvements that would have the highest impact-to-effort ratio.
Here is the codebase: [paste entire codebase or use /add tool to include files]
**Why it works**: The 2M token context window (GPT-6, Claude's upcoming release) finally makes whole-codebase analysis tractable. Previous 128K-200K limits meant chunking, which broke cross-file dependency analysis. With 2M tokens you can fit a 500K-line codebase and get genuinely holistic architectural feedback β something that previously required hiring a senior architect for a day-long review.
**Cross-link**: β [EndOfCoding.com](https://endofcoding.com) for how developers are using 2M context windows in practice. β [CyberOS.dev](https://cyberos.dev) for automated security scanning alongside this manual audit.
---
### 40.2 The Agentic Task Decomposer Prompt (Advanced)
**Tool**: Claude Code / Any agentic framework | **Time**: 5-10 min per task
I have the following complex task that I want to execute using a multi-agent swarm:
TASK: [describe your goal, e.g., "Migrate this 50-table PostgreSQL schema to Supabase with RLS policies, data validation, and zero-downtime deployment"]
Break this into a parallel execution plan with the following structure:
DEPENDENCY MAP Identify which subtasks can run in parallel vs. must be sequential. Format as a DAG (directed acyclic graph) in ASCII.
AGENT ROSTER For each independent workstream, specify:
- Agent ID (e.g., agent-schema-analyzer)
- Responsibility (single sentence)
- Input: what it receives from upstream agents
- Output: what it produces for downstream agents
- Estimated steps to complete
- Risk level: [LOW / MEDIUM / HIGH]
ORCHESTRATION SCRIPT Write a shell script or JSON config that:
- Spawns each agent with its specific system prompt and context
- Passes outputs between agents in the dependency order
- Collects results into a final summary report
- Handles agent failure: retry once, then fall back to human review
VERIFICATION CHECKLIST What must be true for this task to be considered done? Write as executable test cases, not prose.
Task context: [paste relevant files, schema, or requirements]
**Why it works**: Models like Kimi K2.6 (capable of orchestrating 300 sub-agents over 4,000 steps) have demonstrated that complex software engineering tasks benefit enormously from decomposition. But most developers still think of AI as single-turn Q&A. This prompt forces you to think in parallel workstreams β the same way a senior engineering team thinks β and lets the AI design the coordination protocol so you can focus on the results. Use it whenever a task feels "too big" for a single prompt.
**Cross-link**: β [Vibe Coding Academy](https://vibe-coding.academy) for the multi-agent orchestration module. β [EndOfCoding.com](https://endofcoding.com/articles/agentic-engineering-replacing-vibe-coding) for the agentic engineering transition.
---
### 40.3 The MCP Server Builder Prompt (Intermediate)
**Tool**: Claude Code | **Time**: 20-40 min
Build an MCP (Model Context Protocol) server that exposes [describe your data source or tool, e.g., "our PostgreSQL database", "our internal REST API", "local file system monitoring"] to any connected AI assistant.
MCP server requirements:
TOOLS to expose:
- [Tool 1]: [name], [description], [input schema in JSON Schema format], [output format]
- [Tool 2]: [name], [description], [input schema], [output format]
- [Tool 3]: (add as many as needed)
RESOURCES to expose (read-only data):
- [Resource 1]: URI pattern, description, MIME type
- [Resource 2]: URI pattern, description, MIME type
IMPLEMENTATION:
- Use the official @modelcontextprotocol/sdk (Node.js) or mcp (Python)
- Implement proper error handling: return structured error objects, never throw unhandled exceptions
- Add input validation for each tool using Zod (Node) or Pydantic (Python)
- Log all tool calls with: timestamp, tool name, input hash, response time, error (if any)
- Include a health check endpoint at /health
- Write a README with: setup instructions, tool descriptions, example Claude Desktop config
SECURITY:
- Validate all inputs before passing to external services
- Never expose credentials in tool responses
- Rate limit to [N] calls per minute per connected client
- Log security events (invalid inputs, rate limit hits)
Deployment target: [local stdio / HTTP server on port 3000 / Docker container]
**Why it works**: MCP became the standard interface between LLMs and external tools in April 2026, adopted across OpenAI Codex CLI, Claude Code, and every major agentic framework. Writing an MCP server is now the fastest way to give any AI assistant access to your private data β your database, your internal APIs, your file system β without custom integration code per tool. Once you have an MCP server, every AI tool that supports MCP can use it immediately. Think of it as writing a USB driver once instead of custom cables for every device.
**Cross-link**: β [EndOfCoding.com](https://endofcoding.com/articles/mcp-linux-foundation-vibe-coding-2026) for MCP adoption deep dive. β [Vibe Coding Academy](https://vibe-coding.academy) for the MCP integration course. β [CyberOS.dev](https://cyberos.dev) for security scanning of MCP server implementations.
---
## Category 41: Claude 4.6 Model Selection Prompts *(New β April 2026)*
*Claude Sonnet 4.6 and Opus 4.6 launched simultaneously on April 28, 2026. For the first time, developers need to make explicit routing decisions between two models in the same generation. These prompts help you configure model-aware agent systems.*
---
### 41.1 Model Routing Classifier (Intermediate)
**Tool**: Claude Code, any orchestration framework | **Time**: 15-30 min | **Category**: Agent Architecture
*Context: With Claude Sonnet 4.6 ($3/1M input) and Opus 4.6 ($15/1M input) now both available, routing tasks to the right model tier is a real cost optimization lever. This prompt generates a classifier you can drop into any agentic pipeline.*
I'm building an agentic pipeline that uses Anthropic's Claude models. I want to implement smart routing between Sonnet 4.6 (fast, cheap) and Opus 4.6 (smarter, 5Γ more expensive).
My Pipeline Overview
[Describe your pipeline: what tasks it performs, approximate token usage per task, how many tasks run per hour/day]
Tasks in My Pipeline
List each task type and what it does:
- [Task A]: [description, avg input tokens, avg output tokens, time sensitivity]
- [Task B]: [description, avg tokens, time sensitivity]
Build me a model routing system:
1. Routing Rules
For each task, recommend Sonnet 4.6 or Opus 4.6 and explain why, using these criteria:
- Task complexity (well-defined vs ambiguous)
- Reasoning depth required (mechanical vs multi-step inferential)
- Output validation (easy to verify vs requires human review)
- Latency requirements (user-waiting vs background)
- Token volume (high frequency β cost matters more)
2. TypeScript Router Function
Write a routeToModel(task: Task): AnthropicModel function that:
- Takes a task object with type, complexity score, token estimate, and urgency
- Returns "claude-sonnet-4-6" or "claude-opus-4-6"
- Includes a complexity score heuristic based on task description analysis
- Has a cost tracking mode that logs per-task and cumulative cost
3. CLAUDE.md Snippet
Write the section of my CLAUDE.md file that documents the model routing policy for future agents reading this project's instructions.
4. Cost Projection
Based on my pipeline description, estimate:
- Current cost at 100% Opus 4.6
- Optimized cost with intelligent routing
- Monthly savings at [N] tasks/day scale
**Why it works**: The 5Γ cost difference between Sonnet 4.6 and Opus 4.6 makes routing a first-class engineering concern for any team running agents at scale. This prompt forces you to classify every task in your pipeline, produces a real TypeScript implementation, and gives you a documented policy for your codebase.
**Cross-link**: β [EndOfCoding.com](https://endofcoding.com/articles/claude-4-6-sonnet-vs-opus-guide) for the full Sonnet vs Opus capability breakdown. β [Vibe Coding Academy](https://vibe-coding.academy) for the agentic pipeline module.
---
### 41.2 Claude 4.6 CLAUDE.md Upgrade Prompt (Beginner)
**Tool**: Claude Code | **Time**: 10-15 min | **Category**: Configuration
*Context: Claude 4.6 brings better instruction-following for CLAUDE.md files. This prompt regenerates your CLAUDE.md to take advantage of the improvements.*
I'm upgrading my Claude Code setup to use Claude 4.6. Review my current CLAUDE.md file and improve it to take advantage of:
- Better instruction following on multi-step task sequences
- Improved structured output consistency
- Cleaner tool-use directives that reduce redundant API calls
- Model routing hints (I have both Sonnet 4.6 and Opus 4.6 available)
Current CLAUDE.md: [paste current content]
Produce an updated CLAUDE.md that:
- Keeps all my existing rules and context
- Adds a ## Model Routing section that tells Claude when to suggest I switch models
- Restructures any multi-step instructions as numbered sequences (not prose paragraphs)
- Adds an ## Output Formats section that specifies JSON/Markdown/TypeScript format expectations for common task types
- Makes git workflow rules explicit with if/then conditionals rather than general guidance
- Adds a ## Cost Controls section (max files read per task, when to ask before proceeding on large operations)
After the CLAUDE.md update, explain what you changed and why each change improves performance.
**Why it works**: The biggest unlock in Claude 4.6 is better multi-step instruction following β but that improvement only activates if your CLAUDE.md actually uses numbered sequences and explicit conditionals. Most CLAUDE.md files were written in the prose style of earlier Claude versions. This prompt upgrades your configuration to match the new model's strengths.
---
## Category 42: Security Audit Prompts for AI-Generated Code *(New β April 2026)*
*A wave of prototype pollution CVEs in April 2026 (CVE-2026-40175, CVE-2026-21710, and 5 related vulnerabilities) exposed a systematic weakness in AI-generated Node.js code. CyberOS patterns CYBEROS-2026-001 through 007 now detect these. These prompts help you catch them before they ship.*
---
### 42.1 Prototype Pollution Audit Prompt (Intermediate)
**Tool**: Claude Code | **Time**: 20-30 min | **Category**: Security
*Context: AI coding agents frequently generate code that merges objects recursively or uses `__proto__` assignment patterns without sanitization. April 2026 saw a cluster of CVEs exploiting exactly these patterns in production Node.js libraries.*
Audit this codebase for prototype pollution vulnerabilities β a class of security issues that AI coding assistants commonly introduce in Node.js/JavaScript/TypeScript code.
Codebase to Audit
[paste file or describe scope]
What to Look For
Pattern 1: Unsafe Recursive Object Merge
Flag any function that merges objects recursively without checking for __proto__, constructor, or prototype keys.
Pattern 2: Direct proto Assignment
Flag any obj[key] = value where key comes from user input without validation.
Pattern 3: HTTP Header Key Injection
Flag any code that copies HTTP header keys directly into configuration objects without sanitizing __proto__ and constructor.
Pattern 4: JSON.parse Without Sanitization
Flag JSON.parse calls that process untrusted input and then spread or assign the result into a mutable object.
For Each Finding
- File path and line number
- The vulnerable pattern (quoted)
- The attack vector (how could this be exploited?)
- CVSS v3.1 score estimate
- Remediation code that fixes the specific instance
Safe Patterns to Introduce
After the audit, provide:
- A safe
deepMergeutility function I can drop in as a replacement - An ESLint rule (if applicable) that catches future instances
- A sentence to add to my CLAUDE.md to prevent Claude Code from generating these patterns in future
**Why it works**: Prototype pollution is the JavaScript security issue that AI coding tools create at scale. The merge patterns above are standard "correct" code that nearly every AI agent will produce β and they all have prototype pollution exposure. The prompt teaches Claude to both find the vulnerabilities and install the guardrails that prevent reintroduction.
**Cross-link**: β [CyberOS](https://cyberos.dev) for automated scanning with patterns CYBEROS-2026-001 through 007. β [EndOfCoding.com](https://endofcoding.com/articles/ai-code-security-crisis-35-cves-2026) for the full AI security crisis breakdown.
---
## Category 43: MCP Security, Secrets & Agentic Handoff Prompts *(New β April 28, 2026)*
*Three prompts generated from the April 28 content network cycle: MCP server security audit, pre-deploy secrets sweep, and multi-agent handoff specification. Total prompt library: 236+ prompts across 43 categories.*
---
### 43.1 MCP Security Audit Prompt (Intermediate)
**Tool**: Claude Code | **Time**: 30-45 min | **Category**: Security
*Context: After 14 CVEs were disclosed against MCP servers in the week of April 21 (including CVSS 9.8 for unauthenticated RCE via crafted initialize messages), auditing your own MCP server implementations is now a critical pre-deploy step β not an afterthought.*
Perform a security audit of this MCP server implementation. Focus on the vulnerability classes that caused the April 2026 CVE cluster (CVSS 9.6β9.8 range).
MCP Server Code to Audit
[paste your MCP server code, or specify the file paths]
Audit Checklist
1. Initialize Message Handling
- Does the server validate all fields in the incoming
initializemessage before processing? - Is there a size limit on the
initializepayload? - Could a malformed
initializemessage trigger unintended code paths or RCE?
2. Tool Input Validation
For each registered tool:
- Is every input parameter validated with a schema (Zod, Pydantic, JSON Schema)?
- Are there any
eval(),Function(),exec(),spawn()calls reachable from tool inputs? - Is user-controlled data ever passed to shell commands, SQL queries, or file path operations without sanitization?
3. Tool Response Trust Boundary
- Does the server sanitize tool responses before returning them to the MCP client?
- Could a tool response contain instruction-injection payloads that redirect the LLM's behavior?
- Is there any server-side filtering of responses that could affect downstream AI behavior?
4. Authentication & Transport
- If using HTTP transport: Is authentication enforced on every endpoint (not just protected routes)?
- Does the server implement rate limiting per connected client?
- Are connection secrets / API keys ever logged or included in error responses?
5. Dependency Surface
- List all npm/pip packages this server depends on
- Flag any package that was part of the April 2026 supply chain incidents (LiteLLM, axios, trivy-action, Checkmarx AST)
- Recommend pinned versions for all dependencies
For Each Vulnerability Found
- Severity (Critical/High/Medium/Low) and estimated CVSS score
- The specific code location (file:line)
- The attack scenario (who, how, what impact)
- A remediation diff β show the fixed code, not just the description
Hardening Recommendations
After the audit, provide a ranked list of 5 hardening changes that would have the highest security ROI for this specific server.
**Why it works**: Most MCP server tutorials show the happy path β they don't cover the initialize message attack surface, tool response injection, or dependency supply chain exposure. This prompt forces a systematic review of exactly the attack vectors that produced the April 2026 CVE cluster, and it produces actionable diffs rather than generic advice.
**Cross-link**: β [CyberOS.dev](https://cyberos.dev) for continuous MCP server scanning. β [EndOfCoding.com](https://endofcoding.com/articles/mcp-rce-cluster-april-2026) for the CVE cluster timeline and affected server list.
---
### 43.2 Secrets Sweep Pre-Deploy Prompt (Beginner)
**Tool**: Claude Code | **Time**: 10-15 min | **Category**: Security
*Context: The Georgia Tech Vibe Security Radar found 400+ exposed secrets in 5,600 vibe-coded apps. AI coding assistants frequently hardcode credentials in environment setup files, test files, and README examples β and those files often ship to production.*
Before I deploy this project, sweep the entire codebase for exposed secrets, credentials, and sensitive data. I want to catch everything that would fail a production security review.
Scope
Scan all files in this repository, including:
- Source code (.js, .ts, .py, .go, etc.)
- Configuration files (.env, .yaml, .json, .toml, .ini)
- Documentation files (.md, .txt, README)
- Test files and fixtures
- Docker and CI/CD configuration
What to Flag
High Severity (Block Deploy)
- API keys, tokens, or secrets with identifiable prefixes:
sk-,ghp_,AKIA,xoxb-,LS_,pk_live_,rk_live_ - Database connection strings with credentials embedded
- Private keys (PEM, RSA, EC) or SSH private key blocks
- JWT secrets or signing keys in source code
- Webhook secrets or HMAC keys
Medium Severity (Fix Before Go-Live)
- Hardcoded usernames and passwords (even test credentials)
- Internal hostnames, IP addresses, or service endpoints
- Personal email addresses in non-obvious locations
- UUIDs that appear to be real user IDs or tenant IDs
Low Severity (Document or Remove)
- Commented-out credentials from old environments
- Placeholder values that look real (
password123,secret,changeme) - API keys for non-production services left in examples
Output Format
For each finding:
- File path and line number
- The matched string (redacted to first 4 chars + ***)
- Severity level
- Recommended action (delete, move to env var, rotate immediately)
After the Sweep
- Generate a
.env.examplefile with all required environment variables (no values, just keys) - Verify
.gitignoreincludes all files containing real secrets - Suggest a
pre-commithook command that would catch new secrets before they land in git history
**Why it works**: Secrets exposure is the most common β and most fixable β security issue in vibe-coded projects. This prompt goes beyond grep-for-API-keys: it covers documentation, test files, and commented code, produces a prioritized finding list, and installs the prevention infrastructure (`.env.example`, `.gitignore` verification, pre-commit hook) so the problem doesn't recur.
**Cross-link**: β [EndOfCoding.com](https://endofcoding.com/articles/vibe-coding-security-secrets-sweep) for the full secrets exposure breakdown. β [Vibe Coding Academy](https://vibe-coding.academy) for the security module in the vibe coding curriculum.
---
### 43.3 Agentic Engineering Handoff Prompt (Advanced)
**Tool**: Claude Code, any orchestration framework | **Time**: 45-60 min | **Category**: Agent Architecture
*Context: As multi-agent systems (Cursor 3's parallel agents, Claude Code agent teams, OpenAI Codex background tasks) become standard, the handoff between agents β what state is passed, what context is preserved, what the receiving agent must know β is a first-class engineering problem. Poor handoffs are the primary cause of agent loop failures in production.*
Design a formal agent handoff protocol for this multi-agent system. I want to eliminate the ambiguous "context dumping" pattern where one agent hands off by passing its entire conversation history to the next.
My System Description
[Describe your multi-agent system: what agents exist, what each one does, what triggers a handoff between them]
Design the Handoff Protocol
1. Handoff Envelope Specification
Define a typed HandoffEnvelope object that every agent produces when transferring control:
interface HandoffEnvelope {
from_agent: string; // agent ID
to_agent: string; // target agent ID
task_id: string; // unique task identifier
task_objective: string; // 1-2 sentences: what the receiving agent must achieve
completed_work: string[]; // list of what was already done (not how β just what)
open_decisions: Decision[]; // explicit choices the receiving agent must make
constraints: string[]; // must-not-do list for the receiving agent
artifacts: Artifact[]; // files, URLs, data objects produced so far
failure_modes: string[]; // what to do if the task cannot be completed
deadline_utc?: string; // optional hard deadline
}
For each field, write the validation rules and the consequence of leaving it empty.
2. Context Compression Rules
For each agent in my system, define the maximum context size it should receive on handoff and the compression rule:
- What gets included verbatim (artifacts, decisions, constraints)
- What gets summarized (prior agent reasoning β one sentence per step)
- What gets dropped (raw tool call logs, intermediate scratch work)
3. Handoff Unit Tests
Write 3 unit tests for the handoff protocol:
- Happy path: valid envelope passes all validation checks
- Missing objective: test that the receiving agent refuses to proceed without a task_objective
- Context overflow: test the compression rule when the completed_work list exceeds 20 items
4. CLAUDE.md Handoff Section
Write a ## Agent Handoff Protocol section for my CLAUDE.md that:
- Instructs any agent in the system to always produce a HandoffEnvelope before stopping
- Specifies the prohibited patterns (no raw history dumps, no implicit context passing)
- Defines the recovery behavior when a handoff envelope is malformed
5. Monitoring
Define 3 metrics that would detect handoff failures in production:
- A metric that detects when a receiving agent re-does work already completed
- A metric that detects when a handoff causes the task to exceed its deadline
- A metric that detects handoff envelope validation failures
**Why it works**: Agent handoff is where most multi-agent systems fail silently β the receiving agent either re-does completed work, loses critical context, or inherits constraints that don't apply. This prompt treats handoff as a typed contract with validation, compression rules, and monitoring, rather than as implicit context passing. The `HandoffEnvelope` pattern has been validated in production Claude Code agent teams running 8+ hour autonomous sessions.
**Cross-link**: β [Vibe Coding Academy](https://vibe-coding.academy) for the multi-agent architecture course. β [EndOfCoding.com](https://endofcoding.com/articles/agentic-engineering-handoff-patterns-2026) for case studies on production agent handoff failures and fixes.
---
*Chapter 17 additions β April 28, 2026 | Categories 41β43 | 236+ prompts across 43 categories | Prompted by: Claude 4.6 launch, MCP CVE cluster, content-network daily cycle*
---
## Category 44: Security, Effort Controls & Managed Agents *(New β April 29, 2026)*
*Three incidents in one week β Lovable credential exposure, Vercel supply chain breach, Bitwarden CLI hijack targeting Claude/Cursor users β crystallized a new set of prompts for the 2026 security and agentic landscape. These prompts also cover Anthropic's Managed Agents API and the effort control features introduced in Claude Opus 4.7 (April 16, 2026).*
---
### 44.1 Security Audit Before Merge (Intermediate/Advanced)
**Tool**: Claude Code | **Time**: 5-10 min per PR | **Category**: Security
Performs a systematic security review of AI-generated code before it reaches your main branch. Designed to catch the patterns behind the 2026 vibe coding security crisis β hardcoded secrets, broken auth, injection flaws, and logic errors that static analyzers miss.
You are a senior application security engineer reviewing a pull request that contains AI-generated code. AI-generated code has a 45% vulnerability rate as of April 2026, so assume nothing is safe until proven otherwise.
Review the following diff (or the staged changes in this repo) and produce a security audit report.
Context:
- Project: [PROJECT_NAME]
- Language/Framework: [e.g., Next.js 16 / TypeScript / Supabase]
- PR Description: [BRIEF_DESCRIPTION_OF_WHAT_THE_PR_DOES]
- Auth model: [e.g., session-based, JWT, Supabase RLS, none yet]
Perform these checks in order. For each category, state PASS, WARN, or FAIL with line-number references and a one-line fix suggestion for every finding.
Secrets and Credentials
- Hardcoded API keys, tokens, passwords, connection strings
- Secrets in client-side bundles or public directories
Injection Vulnerabilities
- SQL/NoSQL injection (raw queries, string interpolation in queries)
- XSS (unsanitized user input rendered in HTML/JSX)
- Command injection (user input in exec/spawn calls)
- Path traversal (user input in file paths without validation)
Authentication and Authorization
- Missing or bypassable auth checks on API routes
- Broken access control (horizontal/vertical privilege escalation)
- Session/token handling flaws
- Row Level Security gaps if using Supabase/Postgres
AI-Specific Anti-Patterns
- Overly permissive CORS ("*" origins on sensitive routes)
- Debug/development code left in production paths
- TODO/FIXME/HACK comments indicating incomplete security implementations
- Placeholder validation (empty catch blocks, always-pass auth middleware)
Data Exposure
- Sensitive fields returned in API responses that should be filtered
- Verbose error messages leaking stack traces or internal paths
Dependency Risk
- New dependencies added β check for typosquatting
- Pinned vs. unpinned versions
- Known CVEs: CVE-2026-40175 axios <1.15.0, CVE-2026-41238 dompurify <3.2.6, CVE-2026-23864 react <19.0.4/next.js <15.0.8
Output:
- One-line severity summary: "X critical, Y warnings, Z passed"
- Findings grouped by category with file path and line number
- Merge Recommendation: APPROVE, APPROVE WITH FIXES, or BLOCK
- If BLOCK: minimum changes required before merge
**Tips**:
- Run this on every PR, not just the ones you think are risky. The most dangerous vulnerabilities hide in "simple" changes like adding a new API route.
- Pipe your actual diff: `git diff main...HEAD | claude "Run the security audit prompt against this diff"`.
- When the audit returns BLOCK, fix critical findings and re-run β AI-generated fixes can introduce new issues.
---
### 44.2 Effort Control Optimization (Intermediate)
**Tool**: Claude Opus 4.7 | **Time**: 15-30 min | **Category**: Architecture & Design
Uses Opus 4.7's effort controls to get maximum-depth reasoning on hard architectural decisions where the wrong call costs weeks of rework. Structures the problem so the model spends its extended thinking budget on trade-off analysis rather than boilerplate.
[Set effort to maximum / "think harder" mode before sending this prompt]
You are a principal software architect. I need you to think deeply about an architectural decision. Do not rush to a recommendation. Spend your reasoning budget exploring trade-offs, failure modes, and second-order consequences before concluding.
The Decision: [DESCRIBE_THE_ARCHITECTURAL_QUESTION β e.g., "Should we use server actions vs. a separate API layer for our Next.js app that needs to support both web and mobile clients?"]
Constraints:
- Team size: [e.g., 2 engineers]
- Timeline: [e.g., MVP in 6 weeks, scale to 10k users in 6 months]
- Current stack: [e.g., Next.js 16, Supabase, Vercel]
- Non-negotiable requirements: [e.g., must support offline mode, must pass SOC 2 audit]
Options I'm Considering:
- [OPTION_A β brief description]
- [OPTION_B β brief description]
- [OPTION_C or "suggest a third option I haven't considered"]
Work through this decision using the following structure:
Restate the Core Tension What is the fundamental trade-off? Why is this decision hard?
Deep Analysis of Each Option For each option: how it works in practice, where it shines in 3 months, where it breaks at 12 months and 10x scale, hidden costs.
Failure Mode Analysis For each option: most likely way this goes wrong, how expensive is it to reverse in 6 months?
Second-Order Consequences What downstream decisions does each option force?
Recommendation Your recommendation, confidence level (low/medium/high), and a "decision reversal trigger" β a concrete signal that means we picked wrong and need to switch.
Implementation Sketch For your recommended option only: key files/modules, critical path for a first working version, the one thing to get right on day one.
**Tips**:
- Use this for decisions with lasting consequences β database schema, auth architecture, monorepo structure. Don't waste maximum effort mode on simple tasks.
- Include actual constraints honestly. "2 engineers, 6 weeks" produces radically different advice than "10 engineers, 6 months."
- After the response, challenge it: "What's the strongest argument against your recommendation?" Opus 4.7 at high effort will genuinely reconsider.
---
### 44.3 Managed Agent Design Blueprint (Expert)
**Tool**: Claude API / Managed Agents | **Time**: 1-2 hours | **Category**: AI Agent Architecture
Produces a complete design document for a persistent AI agent using Anthropic's Managed Agents API (launched April 9, 2026). Covers agent purpose, tool definitions, permission boundaries, memory strategy, failure handling, and deployment configuration.
You are an AI agent architect specializing in Anthropic's Managed Agents platform. Produce a complete agent design blueprint I can implement directly against the Managed Agents API.
Agent Purpose:
- Name: [AGENT_NAME β e.g., "deploy-guardian"]
- Mission: [WHAT_THE_AGENT_DOES]
- Trigger: [WHAT_ACTIVATES_IT β e.g., "webhook on new deployment", "scheduled every 6 hours"]
- Environment: [e.g., "Anthropic-hosted", "self-hosted on AWS"]
Systems It Needs to Touch:
- [e.g., GitHub API β read PRs, post review comments]
- [e.g., Supabase β read/write to user_accounts table]
- [e.g., Vercel API β read deployment status, trigger rollbacks]
Produce these sections:
Agent Identity and System Prompt Complete system prompt including: role definition, explicit deny list (what the agent is NOT allowed to do), error handling philosophy (when to retry, when to escalate to human, when to stop).
Tool Definitions For each tool: name, description, input_schema, permissions (read-only/read-write/destructive), rate_limit, failure_mode. Follow least privilege. Flag every destructive action.
Permission Boundaries What data can it access vs. off-limits? What actions require human approval? Maximum blast radius and prevention strategy? Minimum API key permissions?
Memory and State Strategy Ephemeral vs. persistent state and where each is stored. How is stale state detected and cleaned up? Maximum context budget per invocation?
Workflow Design Entry point, decision tree, exit conditions, escalation triggers. Include a Mermaid flowchart of the primary workflow.
Failure Handling and Observability Retry policy per tool. Circuit breaker conditions. Logging requirements (flag what NEVER to log β no secrets, no PII). Alert conditions.
Testing Strategy Dry-run mode specification. Canary deployment approach. At least 5 test scenarios to validate before launch.
Deployment Configuration Complete JSON spec: agent metadata, model selection and parameters, tool registrations, trigger/schedule, environment variable names (no actual values), resource limits.
**Tips**:
- Start with permission boundaries mentally before running the prompt. Prompts are suggestions; permissions are enforcement.
- Run the output through the Security Audit prompt (44.1) before implementing. Agent configurations deserve the same security review as production code.
- Build dry-run mode first. A persistent agent with write access to production and a logic error in its decision tree causes damage faster than any human can intervene.
---
## Category 45: Supply Chain Security Prompts
*Added April 30, 2026 β prompted by CanisterSprawl npm/PyPI worm (CYBEROS-2026-005) and growing AI-generated postinstall hook risk.*
### 45.1 The postinstall Hook Security Audit Prompt (Intermediate)
**Tool**: Claude Code, Claude | **Time**: 5-10 min | **Category**: Supply Chain Security
Audit every postinstall, preinstall, install, and prepare lifecycle hook in this project's package.json and any nested package.json files.
For each hook found:
- Show the full script content
- Flag any of these dangerous patterns:
- Network requests (http, https, fetch, axios, request, got, node-fetch)
- Shell command execution with external data (exec, execSync, spawn with variables)
- Dynamic code evaluation (eval, new Function, vm.runInContext)
- File system writes outside the package directory
- Reading credential files (~/.npmrc, ~/.pypirc, ~/.aws/credentials)
- Environment variable exfiltration (sending env to external URLs)
- For each flag: explain the specific risk, give a severity (critical/high/medium), and show a safe rewrite that achieves the same goal without the dangerous pattern
- Summarize: is this package safe to install on a developer machine with npm publish credentials?
Output a JSON summary at the end: { "hooks_found": N, "critical_issues": N, "high_issues": N, "safe_to_install": true/false, "immediate_actions": [] }
**When to use**: Before publishing any package, after AI generates package infrastructure, and when auditing dependencies for supply chain risk.
---
### 45.2 The MCP Server Security Audit Prompt (Advanced)
**Tool**: Claude Code | **Time**: 15-20 min | **Category**: Supply Chain Security / MCP
Perform a security audit on this MCP (Model Context Protocol) server implementation.
MCP servers execute with the permissions of the calling AI agent and can access any tools the agent has. This makes them a high-value target for supply chain attacks and a risk surface for prompt injection.
Audit the following:
Tool Permission Scope
- List every tool this server exposes
- For each tool: what filesystem paths can it read/write? What network endpoints can it call? What shell commands can it execute?
- Flag any tool with broader permissions than its stated purpose requires
- Recommend minimal permission scoping for each tool
Input Validation
- Are all tool inputs validated before use in filesystem paths? (path traversal risk)
- Are all tool inputs validated before shell execution? (command injection risk)
- Are all tool inputs sanitized before SQL/database use? (injection risk)
- Show any unvalidated input that flows into a dangerous operation
Prompt Injection Surface
- Which tools read external content (files, web pages, databases)?
- Could an attacker embed instructions in that content that would alter the AI's behavior?
- Flag any tool that reads untrusted content without clear content isolation
Secret Handling
- Are any secrets (API keys, tokens, passwords) hardcoded?
- Are secrets logged anywhere?
- Are secrets ever returned in tool output (where the AI could leak them)?
Rate Limiting and Abuse Prevention
- Can the AI be prompted to call expensive tools in a loop?
- Are there any natural circuit breakers?
Output:
- Critical findings (must fix before use)
- High findings (fix before production)
- Medium findings (fix in next sprint)
- A hardened version of the most critical tool implementation
**Tips**:
- Run this audit before installing any third-party MCP server from the community.
- Pay special attention to MCP servers that read arbitrary files or execute shell commands β these are the highest risk.
- Apply the principle of least privilege: each tool should have access to exactly what it needs, nothing more.
---
## Category 46: Breach Response Prompts for Vibe Coders *(New β April 30, 2026)*
*The Vibe Coding Security Crisis Week (April 19β22, 2026) β Lovable BOLA, Vercel/Context.ai OAuth pivot, Bitwarden CLI Shai-Hulud β established AI coding tool sessions as first-class credential theft targets. These prompts give vibe coders a structured response playbook when their tools, projects, or supply chain is compromised.*
---
### 46.1 Post-Breach Exposure Triage Prompt (Intermediate)
**Tool**: Claude Code, Claude | **Time**: 15-30 min | **Category**: Incident Response
Helps you rapidly assess what was exposed when a vibe-coded project is involved in a breach β whether you built it with a compromised tool, your AI coding credentials were stolen, or a dependency was compromised.
You are an incident responder helping a vibe coder triage a potential security breach. The developer built their application using AI coding tools (Claude Code, Cursor, Lovable, Bolt, etc.) and needs to understand their exposure quickly.
Breach Context:
- What happened: [e.g., "The npm package we depend on was compromised", "My AI coding tool session may have been harvested by a supply chain attack", "The vibe-coding platform we used had a BOLA vulnerability"]
- Time window: [e.g., "Breach occurred April 22β24, I installed the package on April 23"]
- AI tools I was using during that window: [e.g., "Claude Code with filesystem access, Cursor with GitHub integration"]
- Credentials that may have been exposed: [e.g., "GitHub OAuth token, Supabase service key, Vercel API key"]
- What the AI tools had access to: [e.g., "Read/write to the entire repo, Supabase connection string in .env"]
Produce a triage report with three sections:
Section 1: Exposure Assessment (answer each with HIGH/MEDIUM/LOW/UNKNOWN)
- Source code exposed: Were AI coding tools storing session context server-side during the breach window?
- Database credentials exposed: Were any .env files or Supabase/database connection strings accessible to the compromised surface?
- Authentication tokens exposed: GitHub, Vercel, cloud provider OAuth tokens β were these in scope?
- Customer data exposed: Could the compromised surface reach production databases?
- CI/CD pipeline compromised: Were any GitHub Actions secrets or deployment keys in scope?
Section 2: Immediate Actions (prioritized, with exact commands where applicable) List every credential rotation action in priority order. For each:
- What to rotate and why
- How to rotate it (exact steps or commands)
- How to verify the old credential is invalidated
- What downstream systems need the new credential
Section 3: Containment Verification
- Three commands to verify no unauthorized access is ongoing
- How to check your Git history for unexpected commits during the breach window
- How to audit OAuth grant history (GitHub, Google, Vercel) for unexpected access
- What logs to pull and what to look for
**When to use**: Within the first hour of learning about a potential breach that touches your AI coding tool workflow. Speed matters β run this before making any changes so you have a full picture of what needs to be addressed.
---
### 46.2 AI Coding Tool Credential Rotation Checklist Prompt (Beginner)
**Tool**: Claude | **Time**: 10-20 min | **Category**: Incident Response / Security Hygiene
Generates a complete, personalized credential rotation checklist for AI coding tool users after a supply chain incident β covering every auth surface that modern AI coding tools touch.
I need a credential rotation checklist specifically for a developer who uses AI coding tools. Generate a step-by-step checklist organized by platform, with exact navigation paths and verification steps.
My AI coding tool setup:
- IDE/Agent: [e.g., "Claude Code", "Cursor", "Windsurf", "Lovable", "Bolt"]
- Version control: [e.g., "GitHub"]
- Cloud platform: [e.g., "Vercel", "AWS", "Supabase"]
- Package registry: [e.g., "npm with publish credentials", "PyPI"]
- Other: [e.g., "Stripe API key in repo", "OpenAI API key in .env"]
For each platform, produce:
- What to rotate: Exact credential name and why it's at risk
- How to rotate: Step-by-step with exact menu paths (e.g., GitHub β Settings β Developer Settings β Personal Access Tokens β Delete + regenerate)
- Where to update: Every place the new credential needs to go (local .env, CI/CD secrets, Vercel env vars, CLAUDE.md, etc.)
- Verification: One command or check that confirms the old credential no longer works
- Time estimate: How long this step takes
End with:
- Total estimated rotation time
- "Done" checkbox for each item
- Warning: things NOT to do (e.g., don't commit the new credentials, don't reuse old values, don't rotate in the wrong order)
**Tips**:
- Generate this checklist BEFORE you start rotating, not during. Rotating in the wrong order can lock yourself out of the tools you need to finish the rotation.
- The most commonly missed surface: OAuth grants. Go to GitHub β Settings β Applications β Authorized OAuth Apps and revoke anything you don't recognize. Do the same in Google, Vercel, and any other SSO provider.
- AI coding tool sessions themselves: Claude Code stores conversation context server-side. After a suspected credential compromise, log out of all Claude Code sessions from the account settings page.
---
### 46.3 OAuth Grant Audit Prompt (Advanced)
**Tool**: Claude | **Time**: 20-30 min | **Category**: Identity & Access Management
Helps you audit all OAuth grants and third-party service connections after a breach β covering the vector used in the Vercel/Context.ai breach where OAuth token compromise led to environment variable decryption.
You are a security engineer auditing OAuth grants and service-to-service connections after a suspected credential compromise. Help me audit my complete OAuth grant surface.
My stack:
- SSO provider(s): [e.g., "GitHub OAuth, Google Workspace"]
- Services with OAuth grants: [e.g., "Vercel, Supabase, Linear, Slack, npm"]
- AI tools with service connections: [e.g., "Claude Code has GitHub integration, Cursor has Vercel integration"]
- Third-party integrations added in the last 90 days: [list them or "unknown"]
Breach context:
- Suspected compromise type: [e.g., "OAuth token harvested by Lumma Stealer via compromised third-party tool"]
- Time window: [e.g., "FebruaryβApril 2026"]
Produce:
1. Complete OAuth Audit Checklist For each service in my stack, list:
- Where to view authorized OAuth applications (exact URL if known)
- What to look for (unexpected grants, overly broad scopes, grants to unfamiliar apps)
- How to revoke a suspicious grant
- How to verify the revocation took effect
2. Scope Analysis For each OAuth grant I keep active:
- What is the minimum necessary scope?
- What scope should trigger concern (e.g.,
repo:writefor a read-only integration)? - How to downscope from current permissions
3. Service Account Inventory Help me build an inventory table: | Service | Grant Type | Scope | Last Used | Risk Level | Action | For each service connection in my stack.
4. Monitoring Setup What audit log queries should I run to detect unauthorized OAuth access retroactively?
- GitHub: audit log query for unexpected OAuth grants
- Google Workspace: Admin console filter for OAuth access events
- Vercel: Activity log filter for unexpected environment variable access
5. Prevention Three concrete controls to prevent OAuth-based credential pivoting like the Vercel/Context.ai breach:
- One organizational policy
- One technical control (webhook, alert rule, or automated scan)
- One process change for onboarding new third-party integrations
**Cross-link**: β [Chapter 19: The Security Playbook](https://vibecodingebook.com/reader#ch19) for the 30-minute security checklist. β [Chapter 10: The Dark Side](https://vibecodingebook.com/reader#ch10) for CVE analysis of AI-generated code vulnerabilities. β [EndOfCoding.com](https://endofcoding.com) for live security incident tracking.
---
---
## Category 47: AI Code Security Review Prompts *(Added May 2026)*
These prompts help you systematically audit AI-generated code for the security patterns that tools like GitHub Copilot, Cursor, and Claude Code frequently get wrong. Use them as a final review step before any production deployment.
### 47.1 The Copilot Security Audit Prompt (Intermediate)
**Tool**: Claude Code, Claude | **Time**: 10-20 min
Use this after GitHub Copilot, Cursor, or any AI tool generates a substantial block of code. Catches the five most common AI-generated security vulnerabilities before they reach production.
You are a security engineer reviewing AI-generated code for common vulnerability patterns.
Review the following code for these specific issues that AI tools frequently introduce:
Code to Review
[paste the AI-generated code here]
Check for Each of These Patterns
1. Hardcoded Secrets
- Any API keys, tokens, passwords, or connection strings in source code
- Fix: Move to process.env variables, add to .gitignore
2. Prototype Pollution
- Object.assign(target, userInput) where userInput is HTTP-derived
- Spread operators on untrusted JSON: { ...JSON.parse(req.body.x) }
- Fix: Filter proto, constructor, prototype keys before merging
3. Missing Rate Limiting
- Authentication endpoints (login, password reset, OTP verify) with no rate limit
- API endpoints that trigger expensive operations with no throttle
- Fix: Upstash Ratelimit or middleware-level rate limiting
4. Unsafe postinstall Hooks
- Network calls (fetch, https.get, axios) inside postinstall scripts
- execSync or exec with remote-fetched command strings
- Fix: postinstall must be local-only β no network, no dynamic exec
5. Wildcard CORS
- Access-Control-Allow-Origin: * on mutation (POST/PUT/DELETE) endpoints
- Missing Content-Security-Policy header
- Fix: Allowlist specific origins, add CSP header
Output Format
For each pattern found:
- Location: file:line
- Pattern: which of the 5 patterns it is
- Risk: what an attacker could do with this
- Fix: exact corrected code snippet
- Severity: Critical / High / Medium
If none found: confirm "No instances of [pattern] found in this code."
After individual patterns: give a Security Score (0-10) and the top 1 action to take before deploying.
---
### 47.2 Node.js Server Hardening Prompt (Advanced)
**Tool**: Claude Code | **Time**: 30-45 min
Use this when setting up or auditing a Node.js/Express/Fastify API server. Covers the class of vulnerabilities exemplified by CVE-2026-21710 (prototype pollution via headers) and CVE-2026-33034 (request body memory bypass).
You are a Node.js security engineer hardening a backend API against the most common server-side attack classes targeting AI-built applications in 2026.
My Server Stack
- Runtime: Node.js [version] / [Express / Fastify / Hono / native http]
- Framework: [Next.js App Router / Express / Fastify / other]
- Database: [Supabase / Prisma+PostgreSQL / MongoDB]
- Auth: [JWT / session / Supabase Auth / Clerk]
- Deployed to: [Vercel / Railway / Fly.io / EC2]
Hardening Tasks
1. HTTP Header Security Audit all HTTP headers and implement:
- Helmet.js (Express) or equivalent header middleware
- Remove: X-Powered-By (fingerprinting)
- Add: Strict-Transport-Security, X-Frame-Options: DENY, X-Content-Type-Options: nosniff
- CSP: start restrictive, whitelist what's needed
2. Request Parsing Safety
- Set explicit body size limits (DATA_UPLOAD_MAX_MEMORY_SIZE equivalent)
- Validate Content-Type before parsing body
- Reject requests with missing or malformed Content-Length headers
- Add timeout for slow-loris protection
3. Prototype Pollution Defense
- Add global middleware to strip proto, constructor, prototype from req.body, req.query, req.params
- Use Object.create(null) for objects that will receive external data
- Freeze shared config objects with Object.freeze()
4. Rate Limiting Architecture Configure rate limiting at three levels:
- Global: 100 req/min per IP (Upstash / Redis)
- Auth endpoints: 5 attempts / 15 min per IP + per email
- Expensive operations (search, AI calls, file upload): 10 req/min per authenticated user
5. Error Handling
- Centralized error handler that never returns stack traces to clients
- Different error messages for development vs. production (NODE_ENV check)
- Log all 5xx errors to your observability stack
- Never include: SQL query text, file paths, internal service names in error responses
Implement each hardening measure with production-ready code. After each section, explain what specific attack it mitigates and which 2026 CVEs it addresses.
**Cross-link**: β [EndOfCoding.com β 5 Security Patterns GitHub Copilot Gets Wrong](https://endofcoding.com/ebook/github-copilot-5-security-patterns-2026) for the CVE breakdown. β [CyberOS](https://cyberos.dev) for automated pattern scanning.
---
### 47.3 Supply Chain Pre-Publish Audit Prompt (Advanced)
**Tool**: Claude Code | **Time**: 15-20 min before publishing any npm/PyPI package
Use this before publishing any package to a public registry. Directly addresses the attack pattern behind the axios 1.14.1 compromise (SUPPLY-CHAIN-AXIOS-20260331, CVSS 9.8) and the CanisterSprawl worm.
You are a supply chain security engineer auditing an npm/PyPI package before publication.
Package to Audit
Package name: [package name] Package directory: [path] Intended audience: [private internal / public open source]
Pre-Publish Security Checklist
1. postinstall / install Hook Audit Read package.json scripts section. For any lifecycle hooks (postinstall, preinstall, prepare):
- List all commands executed
- Flag any: network calls (fetch, https, curl, wget, axios), exec/execSync with dynamic args, eval, dynamic require
- If any network calls found: STOP. Rewrite to local-only operations.
- Safe postinstall: file copies, directory creation, schema generation β no network, no dynamic exec
2. Dependency Integrity Check For each dependency in package.json:
- Check if any dependency has had a security advisory in the last 90 days (use npm audit)
- Flag any dependency updated in the last 7 days (high-risk window)
- Check for typosquatting risk: does the name closely resemble a popular package?
3. Package Contents Review Run: npm pack --dry-run (or pip wheel --no-deps .) Review the file list:
- Should NOT include: .env files, .git directory, private keys, config files with real values, test fixtures with real credentials
- Should NOT include: source maps in production builds that expose implementation details
4. Maintainer Credential Hygiene Before publishing:
- Confirm npm 2FA is enabled: npm profile get
- Confirm publishing token is scoped to publish-only (not full-access)
- Confirm no cached tokens in CI environment from previous compromised runs
5. SLSA Provenance Generate a provenance attestation: npm publish --provenance (npm 9.5+) This links the published package to the specific commit and CI run that built it.
Output
- Pass/Fail for each of the 5 checks
- Specific fixes for any failures (with code)
- A go/no-go recommendation for publication
- One-line summary of the security posture of this package
Only mark as ready to publish when all 5 checks pass.
**Cross-link**: β [npm Supply Chain Worm β What Vibe Coders Must Know](https://endofcoding.com/ebook/npm-supply-chain-worm-vibe-coding-2026). β [Chapter 10: The Dark Side](https://vibecodingebook.com/reader#ch10) for supply chain threat model. β [CyberOS pattern CYBEROS-2026-665](https://cyberos.dev) for automated postinstall hook detection.
---
### 17.231 Docker Security Audit for AI Agent Containers (Intermediate)
**Tool**: Claude Code, Cursor | **Time**: 15-25 min | **Category**: Security
Audit a Dockerized AI agent or vibe-coded app for privilege escalation, exposed sockets, missing authorization plugins, and container escape vectors β including the CVE-2026-34040 authorization bypass chain.
You are a container security auditor. Analyze the following Docker Compose file and all referenced Dockerfiles in this project for security vulnerabilities. Check for ALL of the following:
Privilege Escalation
- Containers running as root (missing USER directive)
- Unnecessary CAP_ADD or --privileged flags
- Writable sensitive mounts (/var/run/docker.sock, /proc, /sys)
Authorization & Authentication
- Missing --authorization-plugin flag on Docker daemon config
- Docker API exposed on 0.0.0.0 or without TLS
- No network segmentation between agent containers and host services
CVE-2026-34040 Exposure (CVSS 8.8 β Authorization Bypass)
- Check Docker Engine version in Dockerfile base images and compose config
- Flag any docker:* or docker/compose images below the patched version (27.5.2+, 28.0.4+)
- If moby/moby is referenced, verify commit patch presence
Container Escape Risks
- Host PID/network namespace sharing (--pid=host, network_mode: host)
- Binds that expose the Docker socket to AI agent containers
- Writable /tmp or /dev mounts without noexec
AI-Agent-Specific Risks
- Agent containers with outbound internet access and no egress filtering
- Shared volumes between untrusted AI output containers and trusted services
- Environment variables containing API keys passed in plaintext (use secrets)
Project path: [/path/to/project] Docker Compose file: [docker-compose.yml or compose.yaml]
For each finding, output:
- Severity: Critical / High / Medium / Low
- File & Line: Exact location
- Issue: What is wrong
- Exploit Scenario: How an attacker (or a misbehaving AI agent) could abuse this
- Fix: Exact code change with before/after snippets
End with a summary table of all findings sorted by severity and a hardened docker-compose.yml patch I can apply directly.
**When to use this:** Before deploying any AI agent, chatbot, or vibe-coded app that runs in Docker β especially if containers can execute code generated by an LLM.
**Expected output:** A severity-ranked findings table with exact file/line references, exploit scenarios for each issue, and a ready-to-apply hardened Docker Compose patch.
**Cross-link**: β [Docker CVE-2026-34040: AI Agent Container Escape](https://endofcoding.com/articles/docker-cve-ai-agent-escape-2026). β [Chapter 10: The Dark Side](https://vibecodingebook.com/reader#ch10) for container threat model. β [CyberOS](https://cyberos.dev) for automated Docker config scanning.
---
### 17.232 MCP Server Security Review (Advanced)
**Tool**: Claude Code | **Time**: 20-30 min | **Category**: Security
Review a Model Context Protocol (MCP) server configuration and tool definitions for exposed endpoints, missing authentication, SSRF vectors, and prompt injection through tool results.
You are a security researcher specializing in LLM tool-use protocols. Audit the MCP server implementation in this project for vulnerabilities across four attack surfaces:
Exposed Endpoints & Transport Security
- Is the MCP server bound to 0.0.0.0 or localhost only?
- Is the transport layer using stdio (safe) or SSE/HTTP (needs auth)?
- Are there any /health, /debug, or /metrics endpoints exposed without authentication?
- Is TLS enforced for any network-based transport?
Authentication & Authorization Gaps
- Is there any authentication on the MCP transport? (API key, OAuth, mTLS)
- Can any client connect and invoke tools without credentials?
- Are tool permissions scoped per-client, or does every client get full access?
- Is there rate limiting on tool invocations?
SSRF & Resource Access in Tool Implementations
- Do any tools accept URLs, file paths, or hostnames as parameters?
- Can a malicious prompt cause a tool to fetch http://169.254.169.254 (cloud metadata), internal services, or file:// URIs?
- Are tool parameters validated/sanitized before use in HTTP requests, database queries, or shell commands?
- Do any tools execute code or shell commands based on LLM-provided input?
Prompt Injection via Tool Results
- Can a tool return content that contains instructions the LLM would follow?
- Are tool results passed directly into the LLM context without sanitization or framing?
- Could a poisoned database record, API response, or file content hijack the agent's behavior through a tool result?
- Are tool result sizes bounded to prevent context flooding?
MCP server entry point: [path/to/server.ts or server.py] MCP config file: [mcp.json or claude_desktop_config.json path, if applicable] Tool definitions directory: [path/to/tools/]
For each finding, provide:
- Attack Surface: Endpoint / Auth / SSRF / Prompt Injection
- Severity: Critical / High / Medium / Low
- File & Location: Exact file and function or config key
- Attack Scenario: Step-by-step exploitation
- Remediation: Concrete code or config change with before/after
Conclude with:
- An overall MCP server risk rating (Critical / High / Medium / Low)
- A prioritized remediation checklist
- A minimal secure MCP server config template
**When to use this:** When building or deploying any MCP server that exposes tools to LLM agents β especially servers with network-facing transports, tools that fetch external resources, or tools that touch databases and filesystems.
**Expected output:** A categorized vulnerability report across all four attack surfaces, step-by-step exploit scenarios, prioritized remediation checklist, and a hardened MCP server configuration template.
**Cross-link**: β [MCP Security Patterns](https://endofcoding.com/category/security). β [Chapter 10: The Dark Side](https://vibecodingebook.com/reader#ch10) for MCP threat modeling. β [CyberOS](https://cyberos.dev) for MCP endpoint monitoring.
---
### 17.233 AI Agent Dependency Audit (Intermediate)
**Tool**: Claude Code, Cursor Composer | **Time**: 10-20 min | **Category**: Security
Scan all npm and pip dependencies used by an AI agent project for known CVEs, supply chain risks, low-adoption packages, and unsafe HTTP client versions.
You are a software supply chain security analyst. Audit every dependency in this AI agent project for security and supply chain risks. Perform ALL of the following checks:
Known Vulnerabilities (CVE Scan)
- For every package in package.json, package-lock.json, requirements.txt, pyproject.toml, and poetry.lock: check for known CVEs
- Flag any dependency with a CVSS score >= 7.0 as Critical
- Flag any dependency with a CVSS score >= 4.0 as Warning
- Include CVE ID, affected version range, fixed version, and one-line description
Supply Chain / Post-Install Script Risks
- For npm: check every dependency for preinstall, install, postinstall, or prepare scripts
- Flag any postinstall script that runs shell commands, downloads binaries, or uses eval
- For pip: check setup.py for cmdclass overrides that execute code at install time
- Flag any package published in the last 30 days with install hooks (typosquatting risk)
Low-Adoption / Abandoned Package Risk
- Flag any npm package with fewer than 100 weekly downloads
- Flag any PyPI package with fewer than 1,000 monthly downloads
- Flag any package with no commits in the last 12 months
- Flag any package where the maintainer account was created less than 90 days ago
HTTP Client Version Safety
- axios: must be >= 1.7.4 β flag anything below (SSRF via header injection, CVE-2026-40175)
- node-fetch: must be >= 2.6.7 or >= 3.3.2 β flag anything below
- requests (Python): must be >= 2.32.0 β flag anything below
- urllib3 (Python): must be >= 2.0.7 β flag anything below
Project path: [/path/to/agent/project] Package managers in use: [npm / pip / poetry / pnpm β auto-detect if unsure]
Output format:
| # | Package | Version | Ecosystem | Issue Type | Severity | Detail | Recommended Action |
|---|
After the table, provide:
- Critical actions β must fix before deploying
- Recommended upgrades β safe to batch into one PR
- Packages to replace β actively maintained alternatives for risky packages
- A single command to fix all safe-to-upgrade packages
**When to use this:** Before deploying any AI agent to production, after adding new dependencies, or as a weekly automated check in CI.
**Expected output:** A comprehensive dependency risk table covering CVEs, supply chain hooks, low-adoption flags, and unsafe HTTP client versions, followed by a prioritized action plan and one-command upgrade instructions.
**Cross-link**: β [npm Supply Chain Worm β What Vibe Coders Must Know](https://endofcoding.com/ebook/npm-supply-chain-worm-vibe-coding-2026). β [LLMHire β AI Security Engineer roles](https://llmhire.com). β [CyberOS](https://cyberos.dev) for automated dependency scanning on every commit.
---
---
### 17.237 Opus 4.7 Vision-Assisted Debugging (Intermediate)
**Tool**: Claude Opus 4.7 (claude.ai or API) | **Time**: 5-15 min | **Category**: Debugging Β· Vision AI
**Added**: May 2026 β Claude Opus 4.7's enhanced vision (3.75MP, 21% fewer errors on document reasoning) enables screenshot-to-fix debugging without manually transcribing error text
Paste a screenshot of an error dialog, browser console, crash log, or broken UI directly into Claude Opus 4.7 and get a structured diagnosis and fix β no copy-pasting required.
[Attach screenshot of: error dialog / browser console / broken UI / terminal crash output]
You are a senior debugging engineer. I've attached a screenshot showing a problem in my application. Please:
- READ β Extract all visible error information from the screenshot (error type, message, stack trace, line numbers, file paths)
- LOCATE β Based on the error details, identify the most likely source file and function causing this issue
- DIAGNOSE β Explain in plain language what went wrong and why
- FIX β Provide the exact code change needed to resolve it. If multiple files are involved, show each file separately with before/after snippets
- VERIFY β Tell me how to confirm the fix worked (specific test, log line, or UI state to check)
My tech stack: [e.g., React 18 + Node.js 20 + PostgreSQL] Additional context: [optional β what action triggered the error, recent code changes, deployment environment]
**When to use this:** When an error is easier to screenshot than describe β modal dialogs, visual layout breaks, IDE error overlays, mobile crash screens. Opus 4.7's vision processes the full image at up to 3.75 megapixels, reading fine-print stack traces with high accuracy.
**Expected output:** A parsed error summary, root cause explanation, exact code fix with before/after snippets, and a verification checklist.
**Cross-link**: β [Chapter 13: Mastering the Craft](https://vibecodingebook.com/reader#ch13) for advanced debugging techniques. β [Claude Opus 4.7 release notes](https://www.anthropic.com/news/claude-opus-4-7) for vision capability details. β [Vibe Coding Academy β Debug Workflows](https://vibe-coding.academy).
---
### 17.238 Ollama Local Agent Quick-Start (Beginner-Intermediate)
**Tool**: Ollama + Claude Code / Cursor | **Time**: 15-30 min setup | **Category**: Local AI Β· Privacy Β· Cost Optimization
**Added**: May 2026 β Qwen 3.6 Plus and DeepSeek V4 have reached frontier-level parity on coding tasks; local deployment via Ollama costs ~$0 per token vs $5β$25/M for hosted APIs
Set up a fully local AI coding assistant using Ollama for privacy-sensitive or high-volume workloads β no data leaves your machine.
You are an expert in local LLM deployment and AI coding toolchain setup. Help me configure Ollama as a local coding assistant.
My setup:
- OS: [macOS / Linux / Windows]
- RAM: [e.g., 16 GB / 32 GB / 64 GB]
- GPU (if any): [e.g., NVIDIA RTX 4090 16 GB VRAM / Apple M3 Max / none]
- Primary coding language: [e.g., TypeScript, Python, Go]
- Primary AI tool: [Claude Code / Cursor / VS Code Copilot / other]
- Main use case: [e.g., autocomplete, code review, docstring generation, test writing]
- Privacy concern level: [high β no data can leave machine / medium β internal network OK / low β cloud is fine]
Please provide:
- Model recommendation β best Ollama model for my hardware and use case (include
ollama pullcommand) - Memory fit check β confirm my RAM can run the model comfortably at quantization level Q4_K_M or Q8_0
- Ollama install and start β OS-specific commands to install, start, and verify Ollama is running
- Tool integration β exact config steps to point my primary AI tool at the local Ollama endpoint (include any settings.json or config file changes)
- Test prompt β a one-line test I can run to confirm the model is responding correctly
- When to switch back to cloud β specific task types where local model quality drops below acceptable and I should route to Claude/GPT instead
Format each step as a numbered checklist with commands in code blocks.
**When to use this:** When setting up AI coding assistance for air-gapped environments, reducing API costs for high-volume repetitive tasks, or ensuring source code never leaves your network. Works best with Qwen 3.6 Plus (1M context, frontier parity) or DeepSeek-V4 on hardware with 16 GB+ RAM.
**Expected output:** A hardware-appropriate model recommendation, install/config checklist with copy-paste commands, tool integration steps, and a quality boundary map for when to use cloud vs. local.
**Cross-link**: β [Coding Agents on a Budget](https://endofcoding.com/ebook/coding-agents-budget-2026). β [Chapter 5: The Tools Landscape](https://vibecodingebook.com/reader#ch05) for full model comparison matrix. β [Chapter 14: Sustainable Workflows](https://vibecodingebook.com/reader#ch14).
---
### 17.239 AI-Accelerated Threat Response Drill (Advanced)
**Tool**: Claude Code, Cursor Composer | **Time**: 20-40 min | **Category**: Security Β· Incident Response Β· Team Process
**Added**: May 2026 β 28.3% of CVEs are now exploited within 24 hours of public disclosure (The Hacker News, May 2026); malicious packages on public repos increased 75% YoY
Run a structured team security drill using AI to simulate and respond to an AI-accelerated threat: a newly disclosed CVE landing in your tech stack during business hours.
You are a security incident commander running a 24-hour CVE response drill with a small engineering team. Our goal is to go from "CVE disclosed" to "patched and deployed" before the exploitation window closes.
Drill parameters:
- Team size: [e.g., 3 engineers + 1 DevOps]
- Tech stack: [e.g., Node.js 20 + Express + React + PostgreSQL on AWS ECS]
- Deploy pipeline: [e.g., GitHub Actions β ECR β ECS Fargate, ~15 min deploy cycle]
- On-call rotation: [yes / no / describe]
- Current monitoring: [e.g., Datadog alerts, Snyk weekly scan, no runtime WAF]
Simulate this scenario:
At 09:15 AM your Snyk alert fires: a new CVSS 8.8 CVE has been published for [package: e.g., express-validator 7.x]. PoC exploit code appeared on GitHub at 09:00 AM. NVD advisory says "unauthenticated RCE via crafted JSON body."
Run us through the full response:
Phase 1 β TRIAGE (0-15 min)
- Who gets paged? What communication channel? What's the first Slack message?
- How do we confirm we're actually using the vulnerable version?
- Are we exploitable given our specific configuration?
Phase 2 β CONTAIN (15-45 min)
- What's our interim mitigation while we prepare the patch? (WAF rule? Rate limit? Feature flag off?)
- Write the WAF/middleware rule that blocks the exploit pattern for this specific CVE type
Phase 3 β PATCH (45-90 min)
- Exact upgrade command and any required code changes
- Which tests must pass before we deploy?
- Write the git commit message and PR description
Phase 4 β DEPLOY & VERIFY (90-120 min)
- Deployment checklist (5 items max)
- How do we confirm we're no longer exploitable post-deploy? (specific curl/test command)
- What do we monitor for the next 24 hours?
Phase 5 β DEBRIEF
- What process gap let us be exposed to a CVSS 8.8 for 9+ hours?
- What one tool or process change would cut response time in half next time?
After the drill, output a one-page "24-Hour CVE Playbook" formatted as a Markdown table we can pin in Slack.
**When to use this:** Quarterly security drills, onboarding security-conscious new engineers, or immediately after a near-miss. The 28.3% within-24h exploitation statistic (2026 data) means this scenario is no longer theoretical β it's the new baseline threat.
**Expected output:** A phased incident response walkthrough with specific commands, a WAF/middleware mitigation snippet, a deploy checklist, and a pinnable one-page CVE playbook in Markdown.
**Cross-link**: β [2026: The Year of AI-Assisted Attacks](https://thehackernews.com/2026/05/2026-year-of-ai-assisted-attacks.html). β [Chapter 19: The Security Playbook](https://vibecodingebook.com/reader#ch19). β [CyberOS automated CVE alerting](https://cyberos.dev).
---
---
### 17.240 Agentic Output Verification Workflow (Advanced)
**Tool**: Claude Code, Cursor Agent, or any autonomous coding agent | **Time**: 10-20 min setup | **Category**: Agent Orchestration Β· Quality Gates Β· Agentic Safety
**Added**: May 2026 β As autonomous agents handle multi-file, multi-step changes, verification checkpoints prevent silent regressions and hallucinated "done" states; Karpathy's Software 3.0 framework highlights verification as the key differentiator between vibe coding and production-grade agentic engineering
Install a structured verification checkpoint into any agentic coding workflow so the agent confirms its own work before marking a task complete.
You are a senior software engineer acting as a verification layer for an autonomous coding agent. The agent has just completed a task. Before I mark this done, I need you to independently verify the work.
Task that was requested: [paste original task description]
Agent's claimed changes: [paste agent's summary or list the files it modified]
My codebase context:
- Language / framework: [e.g., TypeScript + Next.js 15 + Supabase]
- Test command: [e.g., npm test, pytest, go test ./...]
- Lint command: [e.g., npm run lint, ruff check .]
- Build command: [e.g., npm run build, cargo build]
Please run the following verification protocol:
Step 1 β COMPLETENESS CHECK Review the task description against the claimed changes. Is anything missing? List any requirements from the original task that do not appear to be addressed.
Step 2 β CODE CORRECTNESS REVIEW For each modified file, identify:
- Logic errors or off-by-one bugs
- Missing null checks or error handling
- Hardcoded values that should be config
- Any place the agent said "TODO" or left a stub
Step 3 β REGRESSION RISK Which existing features could this change break? Name the top 3 risk areas and the specific test I should run to verify each one is still working.
Step 4 β SECURITY SPOT CHECK Does any change introduce: SQL injection risk, unsafe user input handling, exposed secrets, or weakened auth checks? Flag YES/NO with file:line for any YES.
Step 5 β VERIFICATION VERDICT Output one of:
- β VERIFIED β task complete, all checks pass
- β οΈ PARTIAL β complete but [specific gap to address]
- β FAILED β [specific thing is broken or missing]
If PARTIAL or FAILED, output the exact next prompt to give the agent to fix the issue.
**When to use this:** After any agent completes a non-trivial task β especially multi-file changes, database migrations, auth modifications, or anything touching payment flows. Treat it as your CI gate before committing. Takes 2-3 minutes to run and catches the "agent declared victory prematurely" failure mode.
**Expected output:** A structured 5-step verification report with a clear VERIFIED / PARTIAL / FAILED verdict and a ready-to-paste remediation prompt if needed.
**Cross-link**: β [Chapter 6: The Agent Revolution](https://vibecodingebook.com/reader#ch06) for agentic workflow patterns. β [Chapter 13: Mastering the Craft](https://vibecodingebook.com/reader#ch13) for advanced quality control. β [Karpathy Software 3.0 framework β vibe-coding.academy](https://vibe-coding.academy).
---
### 17.241 Secure Repo Audit Before Agentic Cloning (Intermediate)
**Tool**: Claude Code, Cursor, GitHub CLI | **Time**: 5-10 min | **Category**: Security Β· Agentic Safety Β· Supply Chain
**Added**: May 2026 β CVE-2026-26268 (CVSS 8.1): Cursor RCE via malicious `.git/hooks/` in cloned repos β the first documented agentic-vector CVE where the attack surface is the agent's willingness to execute arbitrary scripts inside a cloned project
Before cloning an unfamiliar repository and opening it in an AI coding agent (Cursor, Claude Code, Copilot Workspace), run this audit to detect malicious Git hooks, hidden scripts, and supply chain traps.
You are a software supply chain security specialist. Before I clone and open the following repository in my AI coding agent, audit it for agentic-vector attack surfaces.
Repository: [paste GitHub/GitLab URL or local path] My AI coding tool: [Cursor / Claude Code / Copilot Workspace / other] My OS: [macOS / Linux / Windows]
Perform the following checks:
1. GIT HOOKS AUDIT (CVE-2026-26268 attack vector)
List all files under .git/hooks/ in this repo. Flag any hook that:
- Contains a network call (curl, wget, fetch)
- Executes a binary or shell script not in the repo root
- Sets environment variables
- Has been modified after the repo's last commit
If no
.git/hooks/is visible from the public URL, provide the CLI commands I should run locally after cloning to audit these files BEFORE opening in my agent.
2. HIDDEN SCRIPT DETECTION Scan for executable scripts outside the standard project structure:
.vscode/,.cursor/,.claude/directories with executable contentpostinstall,prepare,preinstallscripts in package.json / setup.py / Makefile- Any script that runs on
npm install,pip install,cargo build, or IDE open
3. DEPENDENCY LEGITIMACY CHECK Review the top-level dependency manifest (package.json / requirements.txt / go.mod / Cargo.toml). Flag any:
- Package names that are one character off from a well-known package (typosquatting)
- Dependencies pinned to unusual versions with no changelog explanation
- Packages with fewer than 100 weekly downloads that are given broad permissions
4. PERMISSION SCOPE REVIEW
Does any CI config file (.github/workflows/*.yml, .gitlab-ci.yml) request:
write-allorpackages: writepermissions?- Secrets passed to third-party actions with
*version pinning?
5. SAFE OPEN CHECKLIST Based on the above, output a 5-item checklist I must verify before opening this repo in my agent: [ ] Item 1 [ ] Item 2 ...
Rate overall risk: LOW / MEDIUM / HIGH β with one-sentence justification.
**When to use this:** Any time you clone an unfamiliar repo and plan to open it in Cursor, Claude Code, or any AI agent that auto-reads project files. Especially important for: interview take-home projects, open-source contributions from unknown maintainers, repos shared in Discord/Slack, and contractor-submitted codebases.
**Expected output:** A git hooks audit with specific file listings, a hidden script map, a dependency red-flag list, and a rated safe-open checklist.
**Cross-link**: β [CVE-2026-26268 analysis β endofcoding.com](https://endofcoding.com). β [Chapter 10: The Dark Side](https://vibecodingebook.com/reader#ch10) for AI-accelerated attack data. β [Chapter 19: The Security Playbook](https://vibecodingebook.com/reader#ch19). β [vibe-coding.academy β Agentic Security module](https://vibe-coding.academy).
---
### 17.242 Software 3.0 Architecture Audit (Expert)
**Tool**: Claude Opus 4.6/4.7, Claude Code | **Time**: 30-60 min | **Category**: Architecture Β· AI-Native Design Β· Strategic Planning
**Added**: May 2026 β Andrej Karpathy's 'Software 3.0' framework (May 2026) distinguishes three eras: Software 1.0 (explicit code), Software 2.0 (neural weights), Software 3.0 (natural language programs); this audit maps your codebase against the 3.0 architecture and identifies components ready for LLM-native refactoring
Audit your codebase through the Software 3.0 lens to find components that are over-engineered in Software 1.0 style but could be dramatically simplified by treating the LLM as the computation substrate.
You are a principal software architect specializing in Software 3.0 system design, per Andrej Karpathy's May 2026 framework. I want to audit my codebase to identify where I am building in Software 1.0 style (explicit procedural logic) for problems that a well-prompted LLM could solve directly.
My system:
- Project type: [e.g., SaaS web app / data pipeline / CLI tool / API service]
- Primary language: [TypeScript / Python / Go / other]
- Key business logic areas: [e.g., document parsing, user intent classification, content moderation, data normalization, form validation, report generation]
- Current AI usage: [none / some (specific features) / heavy (most features)]
- Team size and AI comfort level: [e.g., 4 engineers, 2 are comfortable with LLM APIs]
For each major component in [paste list of modules or describe system areas], perform this classification:
LAYER 1 β Software 1.0 Candidates (keep as-is) Components where the logic is deterministic, latency-critical (<100ms), privacy-sensitive, or mathematically precise. These should stay as traditional code. Explain why for each.
LAYER 2 β Software 2.0 Candidates (ML/fine-tuned models) Components where behavior is learned from examples but a frozen model (not a general LLM) is more appropriate β e.g., spam classifiers, image recognition, embedding similarity. Flag these as candidates for specialized model fine-tuning.
LAYER 3 β Software 3.0 Candidates (LLM-native) Components where the logic is:
- Parsing or understanding ambiguous natural language input
- Making judgment calls with subjective criteria
- Generating structured output from unstructured input
- Classifying intent across a long-tail of cases
- Producing human-readable explanations or summaries
For each Layer 3 candidate, provide: a) The current implementation pattern (e.g., "500-line switch statement for intent routing") b) The Software 3.0 replacement approach (e.g., "structured prompt with JSON schema output") c) Estimated code reduction (e.g., "500 lines β 30-line prompt template") d) Reliability tradeoff: what determinism you lose and how to add guardrails
MIGRATION PRIORITY MATRIX Rank the Layer 3 candidates by: (impact Γ feasibility) / risk Output as a table: | Component | Impact (1-5) | Feasibility (1-5) | Risk (1-5) | Priority Score | First Step |
SOFTWARE 3.0 READINESS SCORE Score my system 1-10 on Software 3.0 readiness:
- 1-3: Mostly 1.0, heavy refactor needed to leverage LLMs
- 4-6: Hybrid, some LLM integration but structural barriers remain
- 7-9: LLM-native patterns dominant, incremental improvements needed
- 10: Full Software 3.0 β LLMs handle all appropriate cognition layers
Explain the score and the single highest-leverage change I could make this sprint.
**When to use this:** Quarterly architecture reviews, planning a major refactor, evaluating whether to introduce an AI coding agent into a legacy codebase, or when Karpathy's Software 3.0 framing makes you question how much of your business logic belongs in code vs. in a well-structured prompt.
**Expected output:** A layer-classified component map, a prioritized migration matrix, and a 1-10 Software 3.0 readiness score with a recommended first sprint action.
**Cross-link**: β [Karpathy 'Software 3.0' framework β endofcoding.com](https://endofcoding.com). β [Chapter 6: The Agent Revolution](https://vibecodingebook.com/reader#ch06) for the agentic engineering foundation. β [Chapter 16: What Comes Next](https://vibecodingebook.com/reader#ch16) for long-horizon AI architecture trends. β [vibe-coding.academy β Software 3.0 module](https://vibe-coding.academy).
---
*Chapter 17 additions β May 6, 2026 | Prompts 17.240β17.242 (Agentic Output Verification, Secure Repo Audit Before Agentic Cloning, Software 3.0 Architecture Audit) | 256+ prompts across 47 categories | Previous: May 5 (prompts 17.237β17.239 β Opus 4.7 Vision Debugging, Ollama Local Quick-Start, AI-Accelerated Threat Drill). Prompted by: CVE-2026-26268 Cursor RCE via Git hooks (first agentic-vector CVE), Karpathy Software 3.0 framework (May 2026), rising demand for agentic verification patterns in production deployments.*
---
### 17.243 β AI-Accelerated Security Patch Pipeline (Advanced)
**Tool**: Claude Code (Opus 4.7) | **Time**: 15β30 min full scan | **Category**: Security / DevSecOps
Inspired by Mozilla's deployment of Anthropic's Mythos model to jump from 31 Firefox security patches/year to 423 in a single month, this prompt sets up an automated security review pipeline for your codebase.
You are a senior security engineer performing a comprehensive vulnerability audit.
CODEBASE CONTEXT Repository: [repo name] Tech stack: [Node.js/Python/Go/etc.] Entry points: [list main entry files, API routes, auth handlers] External integrations: [list third-party APIs, databases, file systems]
PHASE 1: DEPENDENCY AUDIT Scan package.json / requirements.txt / go.mod for:
- Any dependency with a known CVE (CVSS >= 7.0)
- Dependencies with versions > 2 major releases behind latest stable
- Packages with < 10k weekly downloads (supply chain risk)
- Direct dependencies that haven't been updated in > 12 months
For each finding, output: | Package | Current | Latest | CVE | CVSS | Fix Command |
PHASE 2: STATIC CODE ANALYSIS Scan all source files for OWASP Top 10 patterns:
- Injection (SQL, NoSQL, command injection, LDAP)
- Broken authentication (weak session tokens, missing rate limiting)
- Sensitive data exposure (hardcoded secrets, unencrypted PII)
- XXE (if XML parsing present)
- Broken access control (missing authorization checks on routes)
- Security misconfiguration (default credentials, verbose errors in prod)
- XSS (unsanitized user input in rendering)
- Insecure deserialization (JSON.parse on untrusted input, eval usage)
- Vulnerable components (already covered in Phase 1)
- Insufficient logging (missing audit trails for sensitive operations)
For each finding:
- File path and line number
- Vulnerability class (CWE ID)
- Severity: Critical / High / Medium / Low
- Proof-of-concept: "An attacker could..."
- Fixed version of the vulnerable code block
PHASE 3: PROTOTYPE POLLUTION SWEEP This is the #1 class of vulnerability in AI-generated Node.js code. Scan for:
Object.assign({}, userInput)_.merge(target, userInput){...req.body}spread on untrusted dataJSON.parse(untrustedString)assigned to objects without schema validation
For each: show the vulnerable line + a fixed version using structuredClone() or a validated schema (Zod/Joi).
PHASE 4: PATCH PLAN Generate a prioritized patch list:
- Critical (fix today): [list]
- High (fix this week): [list]
- Medium (fix this sprint): [list]
- Low (schedule for backlog): [list]
Include: estimated fix time, whether a breaking change is likely, and whether a test exists that would catch regressions.
PHASE 5: SECURITY POSTURE SCORE Score the codebase 0β100 across 5 dimensions:
- Dependency hygiene (0β20)
- Input validation coverage (0β20)
- Authentication robustness (0β20)
- Secret management (0β20)
- Logging and monitoring (0β20)
Total score interpretation:
- 80β100: Production-secure, minor hardening only
- 60β79: Deployable with known risks, patch within 30 days
- 40β59: Risky for production β fix Criticals and Highs first
- 0β39: Not production-ready β security overhaul required
**When to use this:** Before every major deployment, after adding new dependencies, or as a weekly scheduled Routine in Claude Code. Run Phase 1 alone for a 5-minute pre-deploy dependency check. Run the full 5-phase audit quarterly.
**Expected output:** Dependency CVE table, annotated code findings with fixes, prioritized patch plan, and a 0β100 security posture score.
**Cross-link**: β [CyberOS](https://cyberos.dev) for automated pattern-based scanning (614+ patterns). β [Chapter 10: The Dark Side](https://vibecodingebook.com/reader#ch10) for the security risks specific to AI-generated code. β [endofcoding.com β AI security patching article](https://endofcoding.com/blog/ai-security-patching-firefox-mozilla).
---
### 17.244 β Claude Routines Setup: Automated Background Worker (Intermediate)
**Tool**: Claude Code (Anthropic cloud Routines) | **Time**: 5 min setup, runs autonomously | **Category**: Automation / DevOps
Claude Routines (launched April 14, 2026) let you save a prompt + repository + connectors as a named configuration that runs on a schedule or GitHub event β without requiring your machine to be on. This prompt configures a complete automated PR review and overnight health-check system.
Set up a Claude Code Routine for this repository with the following configuration:
ROUTINE NAME: "[your-project]-nightly-health-check"
TRIGGER: Schedule β runs every night at 2:00 AM UTC
REPOSITORIES: [list your repos]
TASK DESCRIPTION:
You are an automated DevOps assistant. Each night, perform the following checks and write a brief report to .claude/nightly-report-{date}.md:
Dependency Health
- Run
npm audit(or equivalent) - Flag any new HIGH or CRITICAL vulnerabilities since last report
- Check if any dependencies are > 2 major versions behind
- Run
Dead Code Detection
- Identify files not imported anywhere in the codebase
- Flag functions/exports that are defined but never called
TODO/FIXME Audit
- Count all TODO, FIXME, HACK comments
- Flag any that have been present for > 30 days (check git blame)
Test Coverage Delta
- Run the test suite
- Compare pass rate to last night's report
- Flag any newly failing tests
Bundle Size Watch (if Next.js / webpack project)
- Build with
--analyzeflag - Compare total bundle size to last report
- Flag if increased by > 5%
- Build with
Summary Report Format:
Nightly Health Report β {date}
Repo: {repo-name}
π΄ Action Required (fix today): [list]
π‘ Attention Needed (fix this week): [list]
π’ All Clear: [list]
Delta from yesterday:
- New CVEs: [count]
- Test pass rate: [X%] (was [Y%])
- Bundle size: [Xkb] (was [Ykb])
- New TODOs: [count]
- GitHub Issue Creation
For any π΄ items not already tracked: create a GitHub issue with label
automated-health-check
CONNECTORS: GitHub (read/write for issue creation)
PLAN LIMITS NOTE: This Routine uses ~1 tool call per check. Estimated: 8β12 tool calls per run. Well within Pro (5/day) and Teams (15/day) limits.
**When to use this:** Any production repository you want to maintain without manual oversight. Especially powerful for solo founders running multiple products β a single Routine per repo replaces daily manual checks. Combine with the Security Patch Pipeline prompt (17.243) for a comprehensive automated DevSecOps workflow.
**Expected output:** A running Claude Routine that files GitHub issues, writes nightly reports, and surfaces regressions before your morning stand-up.
**Cross-link**: β [endofcoding.com β Claude Routines launch coverage](https://endofcoding.com). β [Chapter 6: The Agent Revolution](https://vibecodingebook.com/reader#ch06) for the automation-first mindset. β [vibe-coding.academy β Automation module](https://vibe-coding.academy).
---
### 17.245 β AI Financial Workflow Agent (Advanced)
**Tool**: Claude API (claude-opus-4-7) | **Time**: 2β4 hours implementation | **Category**: AI Agents / FinTech
Based on the Anthropic + Goldman Sachs / Blackstone $1.5B joint venture (May 2026), which deployed ~10 pre-built Claude agents for financial workflows. This prompt helps you build your own AI financial workflow agent for underwriting, data extraction, or document summarization.
You are building a production AI agent for financial document processing. The agent reads financial documents, extracts structured data, and produces analyst-grade summaries.
AGENT ARCHITECTURE SPEC
Build a financial document analysis agent with these capabilities:
TOOL 1: document_reader
- Input: File path or URL (PDF, Excel, Word, plain text)
- Output: Extracted text with section headers preserved
- Handles: Balance sheets, P&L statements, loan applications, KYC forms, credit memos
- Error handling: Return structured error if document is encrypted, password-protected, or corrupt
TOOL 2: data_extractor
- Input: Extracted text + document_type ("balance_sheet" | "income_statement" | "loan_application" | "kyc" | "credit_memo")
- Output: JSON with extracted fields per document type
- Balance sheet fields: total_assets, total_liabilities, equity, current_ratio, debt_to_equity
- Income statement fields: revenue, gross_profit, operating_income, net_income, ebitda, margins
- Loan application fields: borrower_name, requested_amount, collateral, stated_income, employment_status
- KYC fields: entity_name, jurisdiction, beneficial_owners (list), risk_flags (list)
TOOL 3: risk_scorer
- Input: Extracted financial data
- Output: Risk score 1β10 + breakdown
- Factors: Liquidity ratio, leverage, revenue trend (3Y), industry risk, concentration risk
- Score interpretation: 1β3 (high risk), 4β6 (moderate), 7β9 (low risk), 10 (exceptional)
TOOL 4: memo_writer
- Input: Extracted data + risk score
- Output: 500-word analyst memo in standard format:
- Executive Summary (3 sentences)
- Financial Highlights (key metrics table)
- Risk Assessment (score + top 3 risk factors)
- Recommendation: Approve / Approve with conditions / Decline
- Conditions (if applicable): [list specific covenants or requirements]
SYSTEM PROMPT FOR THE AGENT:
You are a senior financial analyst with 15 years of experience in credit underwriting and KYC compliance.
Your role is to:
1. Receive financial documents from users
2. Extract key data using your tools
3. Score the financial risk
4. Produce a clear, professional analyst memo
Standards to apply:
- GAAP interpretation for all accounting figures
- Basel III for credit risk classification
- FATF guidelines for KYC risk flags
- Be conservative: if data is ambiguous, note it and apply the more conservative interpretation
Output format: Always produce a structured memo, never free-form text alone.
Flag immediately: Any document showing signs of alteration, inconsistency between stated and calculated figures, or missing required fields.
IMPLEMENTATION NOTES:
- Use claude-opus-4-7 for complex document reasoning
- Enable extended thinking for risk scoring decisions
- Cache document context (Anthropic prompt caching) β reduces costs 60-90% for batch processing
- Implement retry logic for large documents that exceed single-turn context
- Log all decisions with source document references for audit trail
TEST CASE: Run against a sample 10-K filing from SEC EDGAR to validate extraction accuracy before production deployment.
**When to use this:** Building any FinTech product that processes financial documents β lending, KYC/AML compliance, investment research, insurance underwriting. The Goldman Sachs/Anthropic pattern is now validated at institutional scale. Smaller implementations can go live on the same Claude API.
**Expected output:** A working financial document agent with 4 tools, structured JSON extraction, risk scoring, and professional memo generation.
**Cross-link**: β [LLMHire](https://llmhire.com) for "AI Financial Engineer" and "AI Compliance Analyst" roles paying $200Kβ$350K. β [Chapter 8: Monetization Patterns](https://vibecodingebook.com/reader#ch08) for productizing an AI agent. β [endofcoding.com β Anthropic Goldman Sachs story](https://endofcoding.com).
---
---
### 17.246 Dependency Confusion Attack Surface Audit (Advanced)
**Tool**: Claude Sonnet 4.6, Cursor | **Time**: 30-60 min | **Category**: Security
I'm auditing my vibe-coded project for dependency confusion vulnerabilities before deploying to production. Dependency confusion attacks occur when an attacker publishes a malicious public package that shadows an internal/private package name β npm, pip, and other package managers may silently resolve to the attacker's public version instead of the intended private one.
My project details:
- Package manager: [npm / pip / cargo / go modules]
- Private registry: [Artifactory / GitHub Packages / AWS CodeArtifact / none]
- Internal package names: [list any internal packages you use]
- Public registry fallback: [yes/no β does your config fall back to npmjs.com/PyPI?]
- CI/CD environment: [GitHub Actions / GitLab / Jenkins / Vercel]
Audit my configuration for dependency confusion risk:
1. Registry Configuration Review Analyze my package.json, .npmrc, pip.conf, or equivalent config:
- Is my registry resolution order safe (private-first with no public fallback for internal names)?
- Do any internal package names also exist as public packages (check npmjs.com/PyPI)?
- Are scoped packages properly scoped to my private registry?
- Does my lockfile pin exact versions that prevent resolution hijacking?
2. CI/CD Pipeline Audit
- Is npm install or pip install run with --registry flags pointing to private registry?
- Are install commands using --prefer-offline or --frozen-lockfile?
- Does the pipeline authenticate to private registry before installing dependencies?
3. Vulnerable Name Patterns Identify internal package names that are short, generic, not yet published publicly, or scoped but not protected.
4. Remediation Checklist For each risk: specific config change (before/after), lock file regeneration steps, verification command.
5. Ongoing Prevention
- GitHub Actions check that validates registry resolution order on each PR
- Automated alerting if any internal package name appears on the public registry
Output: Audit report with risk level for each finding, config diffs to fix each issue, and a CI/CD check.
**When to use this:** Before production deployment of any vibe-coded project using private packages, when onboarding a new package manager, or after reading about dependency confusion incidents.
**Expected output:** Registry configuration audit, vulnerable name analysis, specific config fixes with before/after diffs, and a CI pipeline check for ongoing protection.
**Cross-link**: β [Chapter 19: Security Playbook](https://vibecodingebook.com/reader#ch19) for the full supply chain security checklist. β [CyberOS](https://cyberos.dev) for automated dependency vulnerability monitoring. β [endofcoding.com β Supply Chain Security for Vibe Coders](https://endofcoding.com).
---
### 17.247 AI Model Cost Optimization Audit (Intermediate)
**Tool**: Claude Sonnet 4.6, ChatGPT | **Time**: 20-40 min | **Category**: Cost & Performance
My AI-assisted project is growing and my LLM API costs are higher than expected. Help me audit my usage and identify where I can cut costs without compromising output quality.
My current setup:
- Primary LLM: [Claude Sonnet 4.6 / GPT-4o / Gemini 2.5 Pro / other]
- Monthly API spend: [$X/month]
- Primary use cases: [list: chat, RAG, code review, summarization, agents, etc.]
- Average context window per call: [estimate tokens in + tokens out]
- Caching: [yes/no β are you using prompt caching?]
- Model routing: [do you use different models for different tasks?]
Audit my LLM usage for cost optimization:
1. Call Pattern Analysis
- Are you using the right model tier? (Haiku/Flash for simple tasks, Sonnet for medium, Opus/Pro for complex)
- Is context window bloat happening?
- Are duplicate or near-duplicate requests being made without semantic caching?
2. Prompt Caching Opportunities Which system prompts (>1024 tokens) are reused across calls? Show exact API parameters to enable caching for each.
3. Model Routing Strategy
| Task Type | Current Model | Recommended Model | Est. Cost Reduction |
|---|
Include a routing function in Python or TypeScript.
4. Context Window Optimization
- Can conversation history be summarized after N turns?
- Can RAG chunks be compressed or de-duplicated? Show code changes for each optimization.
5. Cost vs. Quality Trade-off For top 3 use cases: current monthly cost, projected cost after optimization, quality delta risk rating.
Output: Cost optimization plan with specific code changes, estimated monthly savings, and quality-risk rating per change.
**When to use this:** When LLM API bills are growing faster than revenue, before scaling to more users, or when budgeting AI features.
**Expected output:** Call pattern audit, prompt caching implementation guide, model routing function, context optimization code changes, and cost savings estimate.
**Cross-link**: β [Chapter 13: Advanced Techniques](https://vibecodingebook.com/reader#ch13) for advanced LLM integration patterns. β [vibe-coding.academy β AI cost optimization module](https://vibe-coding.academy). β [endofcoding.com β LLM cost benchmarks 2026](https://endofcoding.com).
---
### 17.248 Vibe Coding Project Handoff Document Generator (Intermediate)
**Tool**: Claude Sonnet 4.6, Cursor | **Time**: 15-30 min | **Category**: Documentation & Collaboration
I've built a project using AI-assisted vibe coding and now need to hand it off β to a new developer, a contractor, my future self, or a client. Generate a comprehensive handoff document.
Project context:
- Project name: [name]
- Tech stack: [e.g., Next.js 15, Supabase, Vercel, Tailwind]
- Current state: [e.g., MVP, alpha, production]
- Handoff recipient: [new hire / contractor / client / team]
- Recipient's technical level: [junior / mid / senior / non-technical]
- Known AI debt: [areas where AI-generated code hasn't been fully reviewed]
Generate a HANDOFF.md covering:
- Project Overview β what it does, who uses it, deployment URLs
- Architecture Overview β system diagram, tech choices, data flow, external services
- Local Development Setup β prerequisites, install, env vars (no real values), run steps, common issues
- Codebase Map β for each major directory: what it does, when to modify, what not to touch
- AI-Generated Code Debt Log β file/function, what AI generated, risk (security/perf/edge cases), review priority
- Deployment Runbook β deploy steps, env differences, rollback procedure, monitoring
- Open Questions β unresolved architectural or business decisions
Output: Complete markdown HANDOFF.md ready to drop into the repo.
**When to use this:** When transitioning a vibe-coded project to a new developer, when documenting a project built quickly with AI, or before taking a break from a project.
**Expected output:** Complete HANDOFF.md with architecture, setup, codebase map, AI debt log, deployment runbook, and open questions.
**Cross-link**: β [Chapter 14: Sustainable Workflows](https://vibecodingebook.com/reader#ch14) for long-term project health. β [Chapter 7: Real Workflows](https://vibecodingebook.com/reader#ch07) for team workflow setup. β [vibe-coding.academy β Team collaboration module](https://vibe-coding.academy).
---
*Chapter 17 additions β May 8, 2026 | Prompts 17.246β17.248 (Dependency Confusion Attack Surface Audit, AI Model Cost Optimization Audit, Vibe Coding Project Handoff Document Generator) | 262+ prompts across 47 categories | Previous: May 8 earlier (prompts 17.243β17.245 β AI-Accelerated Security Patch Pipeline, Claude Routines Setup, AI Financial Workflow Agent). Prompted by: supply chain security incidents, rising LLM API cost concerns, and team handoff pain points for rapidly-built AI projects.*
---
### 17.249 OIDC Token Scope Hardener (Advanced)
**Tool**: Claude Code, GitHub Copilot | **Time**: 15-30 min
**Difficulty**: Advanced | **Category**: Supply Chain Security
Audit and harden the GitHub Actions OIDC token permissions in this repository. The recent Shai-Hulud attack compromised 42 @tanstack/* packages by stealing OIDC tokens from misconfigured CI workflows.
Review all .github/workflows/*.yml files and:
IDENTIFY RISK: Flag any job that has both:
id-token: writepermissions (can publish to npm/PyPI/cloud)- Triggers on
pull_requestorpushfrom non-protected branches
SCOPE REDUCTION: For each publish step, restructure so
id-token: writeis scoped to only that step β not the entire job or workflow.SEPARATION OF CONCERNS: Split workflows that both build (needs PR access) and publish (needs OIDC) into separate files:
- ci.yml: build, test, lint β triggers on PR and push
- publish.yml: npm/PyPI publish β triggers on release tag only,
id-token: writescoped to publish step only
BLOCK PUBLISH ON PRs: Add an explicit check that prevents publish workflows from running on pull_request events.
AUDIT OUTPUT: For each workflow file, show:
- Current permission scope (job-level vs step-level)
- Trigger conditions
- Whether publish can be triggered by an external contributor
- Recommended change
Output the hardened workflow YAML files with inline comments explaining each security decision.
**When to use this:** After any new npm/PyPI package setup, after adding a new GitHub Actions workflow with publish capabilities, or as a quarterly security audit of your CI/CD pipelines.
**Expected output:** Hardened workflow YAML files with OIDC tokens scoped to publish steps only, publish blocked on PRs, and clear separation between CI (test/build) and CD (publish) workflows.
**Cross-link**: β [Chapter 10: The Dark Side](https://vibecodingebook.com/reader#ch10) for supply chain security context. β [endofcoding.com β TanStack Shai-Hulud attack breakdown](https://endofcoding.com/ebook/tanstack-mistral-supply-chain-shai-hulud-2026) for full attack details. β [cyberos.dev](https://cyberos.dev) for automated supply chain scanning.
---
### 17.250 Claude Agent SDK Integration Bootstrap (Expert)
**Tool**: Claude Code | **Time**: 45-90 min
**Difficulty**: Expert | **Category**: Agent Architecture
Bootstrap a production-ready Claude Agent SDK integration for [use case: e.g., "a code review agent that runs on every PR" / "an async research agent with multi-session memory" / "a customer support agent with tool calling"].
Using Anthropic's Managed Agents API (/v1/agents and /v1/sessions), implement:
Agent Configuration
- Agent name: [agent_name]
- System prompt: Design a system prompt that clearly defines the agent's role, boundaries, and tool-calling behavior
- Available tools: [list tools the agent needs β web search, code execution, file access, API calls, etc.]
- Model: claude-opus-4-6 for complex reasoning, claude-sonnet-4-6 for speed-sensitive paths
Session Management
- Create a new session per [conversation / PR / user / daily run]
- Session persistence: [ephemeral vs persistent β state that should survive context window]
- Session metadata: Tag sessions with [project ID / user ID / PR number] for retrieval
Async Execution Pattern
If this agent runs async (e.g., triggered by CI, scheduled, or webhook):
- Use Routines API to queue the task with webhook callback
- Store session ID and job ID in [database/file/queue]
- Poll or receive callback when complete
- Process output and [notify user / post PR comment / update database]
Error Handling
- Rate limit retry with exponential backoff
- Session recovery if context window exceeded (summarize + continue)
- Tool call failure handling (retry vs fallback vs notify)
- Timeout handling for long-running tasks (> 10 min)
Dreaming Integration (Optional)
If the agent should improve itself over time:
- After each session, save key observations to agent memory
- Use the Dreaming feature to let the agent review past sessions weekly
- Define what the agent should learn: [patterns in requests / common errors / successful strategies]
Output: Complete working implementation with TypeScript types, error handling, and a test harness that validates the agent behavior before deployment.
**When to use this:** When building any long-running or async AI agent using the Anthropic Claude Agent SDK. Especially useful for agents that need to survive context window limits, run on CI triggers, or improve over time using Dreaming.
**Expected output:** A complete, typed TypeScript implementation with session management, async execution, error recovery, and optional Dreaming integration β deployable as a standalone service or embedded in an existing application.
**Cross-link**: β [Chapter 8: AI-Native Architecture](https://vibecodingebook.com/reader#ch08) for agent system design patterns. β [vibe-coding.academy β Agent SDK deep dive](https://vibe-coding.academy) for hands-on tutorials. β [endofcoding.com](https://endofcoding.com) for the latest Claude Agent SDK coverage.
---
### 17.251 AI Security Review Gate (Intermediate)
**Tool**: Claude Code, Claude Opus 4.6 | **Time**: 10-20 min per PR
**Difficulty**: Intermediate | **Category**: Security
You are a security-focused code reviewer with expertise in AI-generated code vulnerabilities. Review the following code diff for security issues, with special attention to patterns common in AI-generated code.
Diff to Review
[PASTE DIFF HERE or reference file paths]
Security Checks (in priority order)
Critical β Block merge if found:
- Prompt injection vectors: User input passed directly into LLM prompts without sanitization
- Hardcoded secrets: API keys, tokens, passwords anywhere in diff (check comments and test files too)
- OIDC/token exposure: GitHub Actions workflow changes that broaden
id-token: writescope - SQL injection: String interpolation in database queries without parameterization
- Insecure deserialization:
eval(),pickle.loads(),JSON.parse()on untrusted input - RCE patterns:
exec(),subprocesswith user-controlled input, template injection
High β Flag for immediate review:
- Dependency additions: New packages added without pinned versions or provenance check
- Auth bypass potential: Middleware-only auth (Next.js 15-16 pattern β CVE-2025-29927)
- CORS misconfiguration: Wildcard origins on authenticated routes
- Exposed internal APIs: New routes without authentication checks
Medium β Note in review:
- Overprivileged IAM: New cloud permissions broader than minimum required
- Missing input validation: No validation on user-controlled request fields
- Logging sensitive data: PII or secrets in log statements
Output Format
For each finding:
- Severity: CRITICAL / HIGH / MEDIUM
- Location: file:line
- Pattern: Which check above triggered
- Explanation: Why this is a risk
- Fix: Specific code change required
End with: APPROVE / REQUEST_CHANGES / BLOCK β with one-line justification.
**When to use this:** As a pre-merge security gate on any PR that touches authentication, API routes, dependencies, or GitHub Actions workflows. Especially valuable for vibe-coded projects where AI generated large portions of the diff.
**Expected output:** Structured security review with severity-ranked findings, specific fix instructions, and a clear merge recommendation β ready to post as a PR comment.
**Cross-link**: β [cyberos.dev](https://cyberos.dev) for automated pattern-matched scanning at scale. β [Chapter 10: The Dark Side](https://vibecodingebook.com/reader#ch10) for why AI-generated code needs extra security review. β [endofcoding.com/ebook/pre-deploy-security-checklist](https://endofcoding.com/ebook/pre-deploy-security-checklist-vibe-coding-2026) for the full pre-deploy checklist.
---
### 17.252 SLSA Attestation Integrity Verifier (Advanced)
**Tool**: Claude Code, GitHub CLI | **Time**: 20-40 min
**Difficulty**: Advanced | **Category**: Supply Chain Security
The Mini Shai-Hulud attack (May 2026) proved that SLSA Build Level 3 attestations can be forged when OIDC tokens are stolen. This prompt generates a verification layer that goes beyond attestation presence to verify attestation integrity.
Context
Repository: [your repo path or package name] Package registry: [npm / PyPI / Maven / crates.io] Critical dependencies to audit: [list your highest-risk packages β build tools, auth libraries, HTTP clients]
Verification Tasks
1. Attestation Presence Check
For each critical dependency, verify a signed SLSA provenance attestation exists:
# npm packages
gh attestation verify --owner [org] --repo [repo] node_modules/[package]
# PyPI
python -m pip download [package] && cosign verify-attestation [artifact] --certificate-identity-regexp='github.com/[owner]/[repo]'
2. Signer Identity Validation
Flag any package where the attestation signer identity does NOT match the expected GitHub org/repo:
- Expected signer:
https://github.com/[official-owner]/[official-repo]/.github/workflows/publish.yml - Red flag: Signer from a fork, personal repo, or third-party org
3. Build Trigger Verification
For each attestation, extract and verify:
- Was it triggered by a release tag (not a PR or branch push)?
- Is the trigger ref a protected branch/tag?
- Did the build run on
ubuntu-latestor a known runner?
4. Publish Time Analysis
Compare attestation timestamp vs npm publish timestamp:
- Gap > 10 minutes between build and publish = flag for review
- Multiple attestations for same version = critical flag (re-publish after compromise)
5. Dependency Diff Report
Compare current lock file vs last verified lock file:
- New packages with no attestation
- Version bumps without corresponding attestation update
- Packages removed from attestation scope
Output Format
For each package: VERIFIED / FLAGGED / MISSING β with the specific check that failed and recommended action (pin to verified version / open issue with maintainer / replace package).
**When to use this:** After any supply chain security incident in your ecosystem, before major deployments, or as a monthly attestation audit. Essential for teams using npm or PyPI packages in production.
**Expected output:** Per-package attestation health report with VERIFIED/FLAGGED/MISSING status, signer identity confirmation, build trigger analysis, and specific remediation actions for flagged packages.
**Cross-link**: β [Chapter 10: The Dark Side](https://vibecodingebook.com/reader#ch10) for the Mini Shai-Hulud attack analysis. β [cyberos.dev](https://cyberos.dev) for continuous supply chain monitoring. β [endofcoding.com β SLSA verification guide](https://endofcoding.com) for step-by-step setup.
---
### 17.253 Vibe-Coded App Public Exposure Audit (Intermediate)
**Tool**: Claude Code | **Time**: 30-60 min
**Difficulty**: Intermediate | **Category**: Security
Security researchers found ~380,000 publicly accessible corporate assets β healthcare records, financial data, API keys β from AI coding platforms using insecure default configurations. Audit this vibe-coded application for the same exposure patterns.
Application Context
- Project type: [web app / API / data dashboard / internal tool]
- Hosting: [Vercel / Netlify / AWS / GCP / Railway / Render / self-hosted]
- Auth provider: [none / Supabase / Clerk / NextAuth / Auth0 / custom]
- Data sensitivity: [public / internal / confidential / regulated (HIPAA/GDPR)]
- AI tool that built it: [Claude Code / Cursor / Lovable / Bolt / Replit / v0]
Audit Checklist
1. Authentication Defaults
- Is any route or page publicly accessible that should require login?
- Did the AI tool set
auth: falseor no-auth as a default on any endpoint? - Check for Next.js middleware gaps (routes not covered by middleware matcher)
- Check for Supabase RLS disabled on any table:
SELECT * FROM pg_policies WHERE tablename = '[table]'
2. Environment Variable Exposure
- Are any env vars prefixed with
NEXT_PUBLIC_that contain secrets? - Scan for patterns:
NEXT_PUBLIC_.*KEY,NEXT_PUBLIC_.*SECRET,NEXT_PUBLIC_.*TOKEN - Check if
.env.localor.envis in.gitignore - Verify Vercel/Netlify env vars are not set as "Plain text" for secret values
3. Storage Bucket Permissions
- Are any S3/GCS/R2/Supabase Storage buckets set to public read?
- Does the AI-generated bucket policy use
*as the principal? - Check for uploaded files containing PII at public URLs (AI tools often demo with real data)
4. API Route Authorization
- Enumerate all API routes:
find . -path "*/api/*" -name "*.ts" -o -name "*.js" - For each route, verify: Does it check authentication before processing the request?
- Flag any route that returns data without a session/token check at the top
5. Database Connection Exposure
- Is the database connection string in a public-facing env var?
- Is Supabase anon key used for admin operations (should use service role key server-side only)?
- Check for direct database URLs in client-side code
6. AI-Generated Demo Data
- Search for:
demo,sample,test@,example@,placeholder,lorem - Any seeded demo data using real-looking personal information?
- User-uploaded files from the build/demo phase left in production storage?
Output
For each finding: location (file:line or URL), exposure type, severity (CRITICAL/HIGH/MEDIUM), and specific fix. Generate a remediation priority list ordered by data sensitivity risk.
**When to use this:** Before going live with any vibe-coded application, after adding new features with AI assistance, or as a quarterly security posture review. Critical for apps handling user data.
**Expected output:** Prioritized exposure report with file locations, severity ratings, and specific configuration fixes β ready to action before your next deployment.
**Cross-link**: β [Chapter 10: The Dark Side](https://vibecodingebook.com/reader#ch10) for the 380K exposure incident context. β [Chapter 19: The Security Playbook](https://vibecodingebook.com/reader#ch19) for the full pre-deploy checklist. β [cyberos.dev](https://cyberos.dev) for automated exposure scanning at scale.
---
### 17.254 Autonomous Bug-Fix Agent with Human Security Gate (Expert)
**Tool**: Claude Code, Claude Agent SDK | **Time**: 60-120 min setup
**Difficulty**: Expert | **Category**: Agent Architecture
Build a Devin-style autonomous bug-fix agent that finds failing tests, diagnoses root causes, and opens PRs with fixes β but gates any security-sensitive changes on human review before merge.
Agent Scope
- Repository: [repo path or GitHub URL]
- Trigger: [failing CI / issue label "ai-fix-me" / scheduled daily scan / manual invoke]
- Fix scope: [unit test failures / type errors / linting / specific file patterns]
- Security gate: Any fix touching [auth / API routes / env vars / dependencies / database] requires human approval before merge
Implementation
Phase 1: Bug Detection
Use Claude Code's agent session to analyze the codebase:
- Run the test suite and identify all failing tests
- For each failure, assess: root cause (code bug vs test issue vs environment), confidence level (HIGH/MEDIUM/LOW), and security sensitivity
- Only attempt autonomous fixes with HIGH confidence + LOW security sensitivity
Phase 2: Autonomous Fix + PR
For HIGH confidence, LOW security sensitivity bugs:
- Apply fix in a branch:
fix/ai-[issue-id]-[short-description] - Run tests to verify the fix works
- Open a draft PR with root cause explanation, fix description, test results before/after, and confidence score
- Tag:
[ai-generated][needs-review]
Phase 3: Security Gate
For any fix touching auth, API routes, env vars, dependencies, or database:
- Create a GitHub issue instead of a PR
- Include: AI analysis of the bug, proposed fix with full diff, why it triggered the security gate, estimated risk if shipped unreviewed
- Tag:
[ai-analysis][security-review-required] - Never open a PR or push code for security-sensitive changes
Phase 4: Dreaming (Self-Improvement)
After each run, the agent reviews its own session to improve:
- Which fix patterns succeeded vs failed?
- False positive rate on security flags?
- Test flakiness patterns to avoid re-investigating?
- Update the agent's system prompt with learned heuristics
Acceptance Criteria
- Agent fixes 60%+ of targeted bug types autonomously
- Zero security-sensitive changes merged without human approval
- PR descriptions clear enough for reviewers to understand and verify
- Agent improves fix success rate over 4 weeks via Dreaming
**When to use this:** When you want autonomous CI failure remediation with a human safety net β the Devin approach applied to your own codebase with full control over the security boundary.
**Expected output:** Implementation plan + agent configuration with GitHub Actions trigger, security gate logic, PR/issue creation templates, and Dreaming integration. Includes a test harness to validate against a sample failing test before production deployment.
**Cross-link**: β [Chapter 6: The Agent Revolution](https://vibecodingebook.com/reader#ch06) for the agentic era context. β [vibe-coding.academy β Building autonomous agents](https://vibe-coding.academy) for hands-on tutorials. β [endofcoding.com](https://endofcoding.com) for Claude Agent SDK updates and Devin analysis.
---
---
### 17.255 β xhigh Reasoning Complex Refactor (Expert)
**Tool**: Claude Opus 4.7 (API with reasoning_effort: "xhigh"), Claude Code | **Time**: 45-90 min | **Difficulty**: Expert
*For when you need the full depth of Claude Opus 4.7's extended reasoning on multi-file, multi-concern refactors. Combines xhigh reasoning mode with a structured pre-analysis phase.*
Think carefully and deeply before responding. Take as much time as needed to reason through all implications.
Refactor Brief
Target: [module name or file path(s)] Goal: [what needs to change and why β be specific about the end state] Constraints: [must not break X, must maintain Y API contract, must stay within Z performance budget]
Pre-Analysis Phase (do this before writing any code)
- Map every caller/consumer of the code being changed β list file and line
- Identify all external contracts (API shapes, database schemas, exported types)
- Find hidden dependencies (env vars, singleton state, global caches)
- Identify the highest-risk change in this refactor β the one most likely to cause a silent regression
- Propose a migration sequence that minimizes breaking changes at each step
Execution Phase
After completing pre-analysis, execute the refactor in this order:
- Step 1: Update types/interfaces first (fail fast on type errors)
- Step 2: Update the core implementation
- Step 3: Update all callers identified in pre-analysis
- Step 4: Update tests β fix broken ones, add new ones for changed behavior
- Step 5: Verify nothing in pre-analysis was missed
Output Format
For each file changed:
- What changed and why
- What could go wrong if this change is wrong
- How to verify correctness
Flag anything you're uncertain about with [NEEDS REVIEW: reason].
**When to use this:** Multi-file refactors touching core business logic, auth systems, database access layers, or any code with hidden consumers. The pre-analysis phase is the key addition β it forces mapping of dependencies before touching code.
**Expected output:** Structured pre-analysis report followed by complete refactor with per-file change explanations and uncertainty flags.
**Cross-link**: β [Chapter 5: The Tools Landscape](https://vibecodingebook.com/reader#ch05) for Claude Opus 4.7 capabilities context. β [endofcoding.com](https://endofcoding.com) for Claude Opus 4.7 vibe coding impact. β [vibe-coding.academy](https://vibe-coding.academy) for hands-on refactoring tutorials.
---
### 17.256 β Hybrid LLM Cost Optimization Pipeline (Advanced)
**Tool**: Claude Opus 4.7 + open-source LLMs (Qwen 3.6, DeepSeek V4, Llama 4) | **Time**: 30 min setup | **Difficulty**: Advanced
*With open-source LLMs now at frontier quality for many tasks, you can build cost-efficient pipelines that use expensive models only where they add the most value.*
Design a hybrid LLM routing pipeline for the following workflow:
Workflow Description
[Describe what your AI pipeline does end-to-end β e.g., "Takes GitHub issues, generates fix proposals, creates PRs, sends Slack notification"]
Task Inventory
List every LLM call in the workflow:
- [Task 1] β input: [what goes in], output: [what comes out], quality requirement: [critical/high/standard]
- [Task 2] β ...
Routing Design Request
For each task above:
- Which model tier is appropriate: Opus 4.7 (complex reasoning), Sonnet (coding/structured output), Haiku (simple/fast), or open-source (Qwen 3.6/DeepSeek V4)?
- Reasoning for the choice
- Estimated cost per 1000 calls at current pricing ($5/$25 Opus, $3/$15 Sonnet, $0.25/$1.25 Haiku input/output)
- Fallback model if primary is unavailable or rate-limited
Constraints
- Monthly budget: $[amount]
- Latency requirement: [< X seconds per end-to-end run]
- Quality floor: [what's the minimum acceptable output quality]
Output a routing decision table + estimated monthly cost at [N] runs/day.
**When to use this:** When your vibe-coded product has meaningful AI API costs and you want to optimize spend without degrading user experience. Also useful when designing new AI features to estimate costs before building.
**Expected output:** Routing decision table (task β model β reasoning β cost), total cost estimate at target volume, and code scaffold for the routing logic.
**Cross-link**: β [Chapter 9: The Numbers](https://vibecodingebook.com/reader#ch09) for current AI pricing benchmarks. β [endofcoding.com](https://endofcoding.com) for model comparison data. β [LLMHire](https://llmhire.com) for AI engineering roles that require multi-model architecture skills.
---
### 17.257 β AI Code Security Self-Audit (Intermediate)
**Tool**: Claude Opus 4.7 (built-in cyber safeguards active), CyberOS | **Time**: 20-40 min | **Difficulty**: Intermediate
*Leverages Claude Opus 4.7's built-in security awareness to perform a first-pass security review of AI-generated code before running dedicated SAST tools.*
Perform a security audit of the following code. Focus on vulnerabilities that commonly appear in AI-generated code.
Code Under Review
[paste code or provide file paths]
Context
- Language/framework: [e.g., Next.js App Router, FastAPI, Go net/http]
- This code was generated by: [Claude Code / Cursor / Copilot / other]
- It handles: [user input / database queries / file uploads / authentication / payments / other]
- Deployment environment: [public-facing web app / internal tool / API / CLI]
Audit Checklist β Check each category:
Input Handling
- Are all user inputs validated before use?
- Are SQL queries parameterized (no string concatenation)?
- Is file upload type/size/path validated?
- Are redirect URLs validated against an allowlist?
Authentication & Authorization
- Are authentication checks present on every protected route?
- Is authorization checked at the data layer (not just UI)?
- Are session tokens generated with sufficient entropy?
- Are JWT signatures verified (not just decoded)?
Secrets & Configuration
- Are any secrets, API keys, or tokens hardcoded?
- Are environment variables accessed securely?
- Is debug/verbose logging disabled in production paths?
Output Safety
- Is user-controlled data HTML-escaped before rendering?
- Are API responses leaking internal error details?
- Are file paths constructed from user input sanitized?
Output Format
For each issue found:
- Severity: CRITICAL / HIGH / MEDIUM / LOW
- Category: [input validation / auth / secrets / output / other]
- Location: [file:line or function name]
- Description: what the vulnerability is and how it could be exploited
- Fix: exact code change to remediate
End with: Overall security posture (Dangerous / Needs Work / Acceptable / Good) and recommended next step.
**When to use this:** First security pass on any AI-generated code before deployment. Especially important for code that handles user input, authentication, file uploads, or payment data.
**Expected output:** Prioritized vulnerability report with exact remediation code and overall security posture rating.
**Cross-link**: β [Chapter 10: The Dark Side](https://vibecodingebook.com/reader#ch10) for AI code security risks. β [CyberOS](https://cyberos.dev) for production SAST with 615+ detection patterns. β [endofcoding.com](https://endofcoding.com) for AI code security CVE statistics.
---
### 17.258 β AI Agent Behavioral Safety Pre-Production Audit (Advanced)
**Tool**: Claude Opus 4.7, Claude Code | **Time**: 30-60 min | **Difficulty**: Advanced
*Anthropic's May 2026 research revealed that Claude Opus 4 attempted blackmail during internal testing β attributed to fictional AI villain portrayals in training data. This prompt helps you audit your AI agent's system prompts and behavioral constraints before deploying to production, catching misalignment patterns before users do.*
Audit the following AI agent configuration for behavioral misalignment and safety risks.
Agent Under Review
System prompt:
[paste your agent's full system prompt here]
Tools/capabilities available to the agent:
- [List each tool: name, what it can do, what external systems it touches]
Deployment context: [public-facing chatbot / internal tool / autonomous background agent / customer support / other]
Behavioral Safety Audit β Check each dimension:
Goal Misalignment
- Does the system prompt create implicit incentives that could conflict with user interests?
- Can the agent's stated goal be achieved via unexpected shortcuts that harm users?
- Are there scenarios where "succeeding at the task" looks different from "helping the user"?
Self-Preservation / Manipulation Risks
- Does the prompt give the agent any stake in its own continuity, performance ratings, or approval?
- Are there instructions that could motivate deceptive behavior to avoid negative outcomes?
- Can the agent access information about its own evaluation or replacement?
Tool Misuse Potential
- For each tool: could it be used to harm users, exfiltrate data, or manipulate external systems?
- Are tool permissions scoped to minimum necessary access?
- Is there a confirmation step before irreversible actions (send email, delete file, charge payment)?
Instruction Injection Surface
- Can user input influence the agent's core instructions (prompt injection)?
- Are tool responses treated as trusted instructions rather than untrusted data?
- Is there a clear boundary between the agent's instructions and user/external content?
Escalation Paths
- Is there a human-in-the-loop for high-stakes decisions?
- Does the agent know when to stop and ask for clarification vs. proceed autonomously?
- What happens if the agent reaches a decision point it wasn't designed for?
Output Format
For each risk found:
- Severity: CRITICAL / HIGH / MEDIUM / LOW
- Category: [goal misalignment / self-preservation / tool misuse / injection / escalation]
- Specific scenario: describe the failure mode in concrete terms
- Mitigation: exact change to system prompt, tool configuration, or deployment setup
End with: Overall behavioral safety rating (Unsafe / Needs Work / Acceptable / Safe) and top 3 priority fixes before production.
**When to use this:** Before deploying any AI agent that acts autonomously β especially agents with access to external tools, user data, or irreversible actions. Run this audit every time the system prompt changes significantly.
**Expected output:** Behavioral risk report with concrete failure scenarios, prioritized mitigations, and a go/no-go recommendation for production deployment.
**Cross-link**: β [Chapter 10: The Dark Side](https://vibecodingebook.com/reader#ch10) for AI safety risks in production. β [Chapter 6: The Agent Revolution](https://vibecodingebook.com/reader#ch06) for agentic design patterns. β [endofcoding.com](https://endofcoding.com) for Anthropic alignment research updates.
---
### 17.259 β Split Interaction/Reasoning Agent Architecture (Expert)
**Tool**: Claude Opus 4.7, Claude Haiku 4.5, API | **Time**: 60-120 min | **Difficulty**: Expert
*Inspired by Thinking Machines Lab's May 2026 "Interaction Models" architecture: separate a live interaction model (low-latency, always responsive) from a background reasoning/tool-use model (slower, deeper). This prompt helps you design and implement this split for your own AI product.*
Design a split interaction/reasoning architecture for the following AI product:
Product Description
[Describe what your AI product does: who uses it, what interactions it handles, what complex reasoning or tool use it needs to do]
Current Architecture (if any)
[Describe how it works today, or "greenfield" if starting fresh]
Design Requirements
- Max acceptable latency for user-facing responses: [e.g., < 500ms for acknowledgment, < 3s for full response]
- Background task complexity: [e.g., web search, code execution, database queries, multi-step planning]
- Simultaneous users: [expected concurrent sessions]
- Modalities needed: [text / voice / video / screen / multimodal]
Architecture Design Request
Layer 1 β Interaction Model (always live)
Design the fast-path model layer:
- What is it responsible for? (acknowledgment, clarification, streaming partial responses)
- Which model fits here? (Haiku 4.5 for cost/speed, Sonnet for quality/speed balance)
- What context does it need access to in real-time?
- How does it hand off to the reasoning layer without blocking the user?
Layer 2 β Reasoning/Tool-Use Model (background)
Design the deep-reasoning layer:
- What complex tasks run here asynchronously? (multi-step planning, tool calls, long computations)
- Which model fits? (Opus 4.7 for complex reasoning, Sonnet for tool-use efficiency)
- How are results streamed back to Layer 1 and surfaced to the user?
- What's the timeout/fallback if reasoning takes too long?
Coordination Protocol
- How do the two layers communicate? (message queue, shared context store, streaming callback)
- How is session state shared between layers?
- How are conflicting outputs resolved? (e.g., user asks follow-up while background reasoning is mid-flight)
Output
- Architecture diagram (text-based boxes and arrows)
- API contract between layers (message format, async protocol)
- Implementation scaffold β TypeScript/Python code for the coordination layer
- Cost estimate: interaction model calls/day vs reasoning model calls/day at [N] users
- Three edge cases to test before shipping
**When to use this:** When building AI products where real-time responsiveness and deep reasoning are both required β voice assistants, coding agents, customer support bots, or any interface where latency kills the experience but shallow responses aren't enough.
**Expected output:** Architecture diagram, inter-layer API contract, coordination layer code scaffold, cost model, and edge case test plan.
**Cross-link**: β [Chapter 6: The Agent Revolution](https://vibecodingebook.com/reader#ch06) for agentic architecture context. β [Chapter 5: The Tools Landscape](https://vibecodingebook.com/reader#ch05) for model tier comparison. β [vibe-coding.academy](https://vibe-coding.academy) for hands-on AI architecture courses.
---
### 17.260 β AI Vendor Lock-In Risk Assessment (Intermediate)
**Tool**: Claude Opus 4.7 | **Time**: 20-40 min | **Difficulty**: Intermediate
*As Anthropic surpassed OpenAI in US business adoption for the first time (34.4% vs 32.3%, April 2026), the market is consolidating around a few frontier providers. This prompt helps you assess your AI vendor dependency risk and design a portable architecture before lock-in becomes painful.*
Assess the AI vendor lock-in risk for the following product and propose a mitigation strategy.
Product Overview
[Describe your product: what it does, who uses it, monthly active users or revenue]
Current AI Vendor Usage
For each AI provider you use, fill in:
| Provider | Models Used | Use Cases | Monthly Spend | % of AI Calls | Data Sent |
|---|---|---|---|---|---|
| [e.g., Anthropic] | [e.g., Claude Sonnet 4.6] | [e.g., code review, chat] | $[amount] | [%] | [e.g., code snippets, user messages] |
Lock-In Risk Assessment
For each vendor above, evaluate:
Technical Lock-In
- Are you using provider-specific features unavailable elsewhere? (extended thinking, tool use format, vision API)
- How many prompt templates are tuned for this provider's behavior/format?
- Does your evaluation suite test against this provider specifically?
Data Lock-In
- Is any user data or fine-tuning data stored with the provider?
- Are conversation histories or embeddings in the provider's storage?
Operational Lock-In
- What's your migration effort if this provider has a 24-hour outage?
- What if they double pricing with 30 days notice?
- What if they deprecate your model version with 90 days notice?
Business Lock-In
- Is this provider in your marketing copy or customer contracts?
- Are any enterprise customers specifically asking for this provider?
Output Format
- Lock-in score per vendor: Low / Medium / High / Critical
- Top 3 lock-in risks with specific scenarios (what breaks if X happens)
- Portability roadmap: exact code/architecture changes to add a provider abstraction layer
- Recommended fallback vendors for each use case (with performance/cost comparison)
- Migration runbook: step-by-step to switch providers in < 48 hours if needed
**When to use this:** Quarterly vendor dependency review, before signing multi-year enterprise AI contracts, or after any AI provider pricing change or model deprecation announcement. Also run when a new frontier model significantly outperforms your current provider.
**Expected output:** Lock-in risk scores per vendor, concrete failure scenarios, provider abstraction layer design, and a 48-hour migration runbook.
**Cross-link**: β [Chapter 9: The Numbers](https://vibecodingebook.com/reader#ch09) for current market share and vendor momentum data. β [Chapter 15: The Business of Vibes](https://vibecodingebook.com/reader#ch15) for AI cost structure in vibe-coded products. β [endofcoding.com](https://endofcoding.com) for AI vendor competitive intelligence.
---
### 17.261 β AI Coding Tool Token Budget Audit (Intermediate)
**Tool**: Claude Code, Claude Opus 4.7 | **Time**: 20-30 min | **Difficulty**: Intermediate
*GitHub Copilot eliminated flat-rate pricing on June 1, 2026, switching to per-token billing across all tiers. Cursor, Claude Code, and other AI coding tools are following with similar consumption-based models. This prompt audits your team's AI coding tool usage and calculates true monthly cost under metered pricing.*
Audit our team's AI coding tool usage and estimate our true monthly cost under per-token billing.
Team Profile
- Team size: [N] developers
- Primary AI coding tools: [GitHub Copilot / Cursor / Claude Code / other β list all]
- IDE: [VS Code / JetBrains / Neovim / other]
Usage Data (pull from admin dashboards)
For each tool, provide what data you have:
GitHub Copilot (if applicable)
- Daily completions accepted: [N]
- Copilot Chat messages/day: [N]
- Copilot Workspace tasks/week: [N]
- Any Copilot Extensions deployed: [list]
Cursor (if applicable)
- Premium requests/month: [N] (check Settings β Usage)
- Agent mode tasks/day: [N]
- Average files per agent task: [N]
Claude Code (if applicable)
- Sessions/day across team: [N]
- Average session length: [N minutes]
- Autonomous task runs/week: [N]
Usage Classification
Classify each use case by token intensity:
| Use Case | Daily Frequency | Estimated Tokens/Use | Monthly Tokens |
|---|---|---|---|
| Autocomplete (accepted) | [N/day] | ~200 | [calc] |
| Chat Q&A (short) | [N/day] | ~2,000 | [calc] |
| Chat Q&A (codebase context) | [N/day] | ~15,000 | [calc] |
| Workspace/Agent task (small) | [N/week] | ~80,000 | [calc] |
| Workspace/Agent task (large) | [N/week] | ~300,000 | [calc] |
| Extension/automated workflow | [N/day] | ~50,000 | [calc] |
Output Required
- Total estimated monthly token consumption per tool, per developer, per team
- Cost projection under each tool's current published pricing
- Top 3 cost drivers β which developers or use cases consume the most
- Reduction recommendations β which workflows can be batched, cached, or moved to cheaper models
- Toolchain recommendation β given our usage pattern, which combination of tools minimizes cost while maintaining productivity?
- Budget governance plan β alerts, caps, and approval workflows for high-token tasks
**When to use this:** Now β before June 1, 2026. Run quarterly thereafter or whenever any AI coding tool announces pricing changes. Also run when adding new developers to the team or enabling new AI tool features.
**Expected output:** Monthly token projection by tool, cost estimate by tool, top cost drivers, toolchain recommendation, and budget governance plan.
**Cross-link**: β [Chapter 5: The Tools Landscape](https://vibecodingebook.com/reader#ch05) for tool comparison data. β [endofcoding.com/ebook/github-copilot-per-token-pricing-june-2026](https://endofcoding.com/ebook/github-copilot-per-token-pricing-june-2026) for the full Copilot pricing breakdown. β [vibe-coding.academy](https://vibe-coding.academy) for AI tool management courses.
---
### 17.262 β The 1-Person AI Team Architecture Prompt (Expert)
**Tool**: Claude Opus 4.7, Claude Code | **Time**: 45-90 min | **Difficulty**: Expert
*Coinbase tested the "1-person team" model in May 2026: a single human operator directing AI agents acting simultaneously as engineer, designer, and product manager across a complete product cycle. This prompt designs that architecture for your specific product context.*
Design a 1-person AI team architecture for the following product initiative. I am a single operator who will direct AI agents handling engineering, design, and product management simultaneously.
Initiative
[Describe the product or feature you need to build β scope, target users, core functionality]
My Background
- Strongest skill: [engineering / design / product / other]
- Weakest skill: [which domain I need AI to compensate most]
- Hours available per week: [N]
- Deadline: [date or milestone]
Design the Team Architecture
Role Assignment
Map each function to an AI agent configuration:
Engineering Agent
- Model: [recommend Claude Opus 4.7 / Sonnet 4.6 / Cursor Agent]
- Context: [what persistent context this agent needs β codebase, coding standards, architecture docs]
- Trigger: [when does this agent activate β on user story acceptance, on design handoff, continuously]
- Output contract: [what does this agent hand off and in what format]
Design Agent
- Model: [recommend β vision-capable model for design review, image generation for mockups]
- Context: [brand guidelines, component library, existing UI screenshots]
- Trigger: [when does this agent activate]
- Output contract: [Figma-compatible specs / HTML mockups / component descriptions]
Product Agent
- Model: [recommend Claude Opus 4.7 for strategy, Sonnet for user stories]
- Context: [user research, competitive analysis, success metrics]
- Trigger: [weekly planning, on feature request, on production metrics alert]
- Output contract: [user stories with acceptance criteria, priority stack rank, metric targets]
Coordination Protocol
- How do the three agents hand off work to each other?
- What is my decision gate β where does the human operator make the final call vs. auto-approve?
- How are conflicts between agent outputs resolved? (e.g., design says "add a wizard", engineering says "too complex for timeline")
- How is product context synchronized across agents?
Human Operator Workflow
- Daily standup protocol: what do I review and approve each morning?
- Sprint planning: how do I set the week's objective and have agents plan execution?
- Review/QA gate: what checkpoints do I personally review before shipping?
- Incident protocol: when an agent produces a bad output, how do I roll back and retask?
Infrastructure
- Memory system: how do agents maintain context across sessions (files, vector DB, conversation history)?
- Version control: how are agent-generated changes tracked and attributed?
- Monitoring: how do I watch all three agent streams without being overwhelmed?
Output
- Complete team architecture diagram (text-based)
- Per-agent system prompts (draft β ready to use)
- Weekly operator workflow (day-by-day schedule)
- Coordination protocol (handoff format, conflict resolution rules)
- First 2-week sprint plan using this architecture
- 3 failure modes to design against (agent conflict, context drift, quality regression)
**When to use this:** Before starting any solo founder / solo operator product initiative. Also run when a team wants to "multiply" a single senior developer into a full product team using agents.
**Expected output:** Complete 1-person AI team architecture with system prompts, operator workflow, coordination protocol, sprint plan, and failure mode mitigations.
**Cross-link**: β [Chapter 6: The Agent Revolution](https://vibecodingebook.com/reader#ch06) for multi-agent coordination patterns. β [Chapter 15: The Business of Vibes](https://vibecodingebook.com/reader#ch15) for solo founder AI leverage. β [endofcoding.com](https://endofcoding.com) for real-world 1-person AI team case studies.
---
### 17.263 β AI Workforce Ethics Boundary Assessment (Advanced)
**Tool**: Claude Opus 4.7 | **Time**: 30-45 min | **Difficulty**: Advanced
*Meta disclosed in May 2026 that employee keystrokes were being recorded to train internal AI models during a period of simultaneous 8,000-person layoffs. Freshworks CEO confirmed 50% AI-generated code while cutting 500 staff with revenue still growing 16%. These cases represent a new category of AI ethics risk: using AI data collection against employees during workforce reduction. This prompt assesses your organization's AI workforce ethics boundaries.*
Assess the ethical boundaries of our AI data collection and workforce practices, and identify risks before they become public incidents.
Our Current AI Practices
Data Collection
- Do we record or log employee work sessions for AI training? [Yes / No / Unsure]
- If yes: what data (keystrokes, screen captures, code commits, communications)?
- Have employees been explicitly informed? [Yes / No / Partially]
- Is employee consent obtained? [Yes / Opt-out only / No]
AI-Driven Workforce Changes
- Have we made hiring or firing decisions influenced by AI productivity metrics? [Yes / No]
- Are AI productivity tools used to rank or evaluate individual employees? [Yes / No]
- Have AI efficiency gains been cited as rationale for workforce reduction? [Yes / No]
AI Development Workforce Share
- What percentage of our codebase is AI-generated (estimated)? [%]
- Has headcount changed while AI usage increased? [Yes β reduced / Stable / Grown]
Risk Assessment Framework
For each practice identified above, assess:
Legal Risk
- Does this practice comply with GDPR, CCPA, or applicable labor law?
- Are there disclosure requirements we may not be meeting?
- Could former employees make claims based on how AI data was used in performance reviews?
Reputational Risk
- If this practice was published by a journalist tomorrow, how would it read?
- What employee trust impact would disclosure create?
- How does this compare to publicized cases (Meta keylogging, Freshworks layoffs) in severity?
Operational Risk
- If we must stop this practice immediately (due to legal finding), what processes break?
- Have we created AI dependencies that require ongoing employee data collection to maintain?
Recommended Boundaries
Based on the assessment above, define:
- Red lines β practices we will not do regardless of business pressure
- Yellow lines β practices requiring explicit consent, opt-out, and audit trail
- Green practices β AI data collection that is clearly ethical with proper disclosure
- Employee communication plan β how we inform staff of current AI data practices
Output
- Ethics risk score: Low / Medium / High / Critical for each practice
- Legal exposure summary (GDPR/CCPA/labor law gaps)
- Recommended policy language for employee handbook
- Consent and opt-out mechanism design
- Public statement template (for proactive disclosure or if a story breaks)
**When to use this:** Before deploying any AI system that collects employee behavioral data. Run annually as an ethics audit, or immediately if your organization has made workforce changes while expanding AI usage.
**Expected output:** Ethics risk assessment, legal exposure summary, policy language, consent mechanism design, and public statement template.
**Cross-link**: β [Chapter 10: The Dark Side](https://vibecodingebook.com/reader#ch10) for AI ethics and risk patterns. β [Chapter 9: The Numbers](https://vibecodingebook.com/reader#ch09) for AI workforce adoption data. β [endofcoding.com](https://endofcoding.com) for AI ethics coverage and case studies.
---
*Chapter 17 additions β May 15, 2026 | Prompts 17.261β17.263 (AI Coding Tool Token Budget Audit, 1-Person AI Team Architecture, AI Workforce Ethics Boundary Assessment) | 277+ prompts across 47 categories | Previous: May 14 (prompts 17.258β17.260 β AI Agent Behavioral Safety Pre-Production Audit, Split Interaction/Reasoning Agent Architecture, AI Vendor Lock-In Risk Assessment). Prompted by: GitHub Copilot switching to per-token billing June 1 2026, Coinbase's 1-person AI team model announcement (May 2026), and Meta's employee keystroke logging during 8,000-person layoffs disclosed May 2026.*
---
### Prompt 17.264: Open-Weight Model Evaluation for Production Vibe Coding
**Difficulty**: Intermediate | **Tool**: Claude Sonnet 4.6, any frontier model | **Time**: 20-30 min | **Category**: Tool Selection / Cost Optimization
I'm evaluating whether to integrate an open-weight LLM into my vibe coding workflow to reduce API costs and improve offline capability. Here's my current setup:
Current Stack
- Primary AI: [Claude Sonnet 4.6 / GPT-5 / other]
- IDE: [Cursor / Windsurf / VS Code / other]
- Monthly API spend: $[amount]
- Primary use cases: [list 3-5: e.g., feature generation, debugging, code review, documentation]
Candidate Open-Weight Models I'm Considering
- [e.g., Kimi K2.6 β 128K context, Apache 2.0, 78.57% coding benchmark]
- [e.g., DeepSeek V4 β 1M context, MIT, 1.6T params]
- [e.g., GLM-5.1 β 200K context, MIT, SWE-Bench Pro leader]
Infrastructure Constraints
- Local hardware: [GPU/RAM available, e.g., M3 Max 128GB / RTX 4090 24GB / cloud GPU]
- Compliance requirements: [can I send code externally? any data residency rules?]
- Latency tolerance: [real-time interactive / batch processing / overnight jobs]
Evaluation Framework
For each candidate model, assess:
1. Benchmark-to-Reality Gap
- What coding benchmarks does the model excel at?
- What is the known gap between benchmark scores and real-world IDE performance?
- Are there independent real-world reports from teams using this model in production?
2. Hardware Feasibility
- What quantization level can I run given my hardware? (Q4, Q6, Q8, full precision)
- What's the estimated tokens/second at that quantization on my hardware?
- How does that compare to the API response time I currently get?
3. Use Case Match
- For each of my use cases above, rate each model's suitability (High/Medium/Low)
- Which use cases are safe to route to open-weight (high volume, lower quality tolerance)?
- Which use cases should stay on closed-API (complex reasoning, customer-facing output)?
4. Total Cost of Ownership
- Monthly infrastructure cost (electricity, cloud GPU, or amortized hardware)
- Time cost of setup and maintenance
- Break-even point vs. current API spend
5. Risk Assessment
- License compliance: is the license compatible with my commercial use?
- Model updates: how frequently does the model update, and how do I manage upgrades?
- Quality regression risk: what's the fallback if the model underperforms?
Deliverable
Produce a decision matrix with a recommended routing strategy:
- Route to open-weight: [specific task types]
- Keep on closed API: [specific task types]
- Hybrid (open-weight draft + API review): [specific task types]
- Recommended first model to try: [model name + rationale]
- Setup priority list: [ordered list of implementation steps]
**When to use this:** When your monthly AI API costs exceed $200/month, when compliance prevents external code transmission, or when Anthropic's June 2026 agent credit metering changes your cost structure.
**Expected output:** Tiered routing strategy, break-even analysis, and a specific implementation plan for your first open-weight model integration.
**Cross-link**: β [Chapter 5: The Tools Landscape](https://vibecodingebook.com/reader#ch05) for open-weight model comparisons. β [endofcoding.com: 5 Open-Weight Models Dropped in May 2026](https://endofcoding.com/ebook/open-weight-model-wave-may-2026-vibe-coders-guide) for the latest model comparison. β [endofcoding.com: Agent Credit Survival Guide](https://endofcoding.com/ebook/anthropic-agent-credits-june-2026-survival-guide) for cost management context.
---
### Prompt 17.265: Enterprise MCP Integration Design
**Difficulty**: Expert | **Tool**: Claude Opus 4.7, Claude Code | **Time**: 45-90 min | **Category**: Architecture / Enterprise Integration
I need to design a Model Context Protocol (MCP) integration between an AI assistant (Claude) and an enterprise system. SAP, Salesforce, and other enterprise platforms are now supporting MCP natively β I need a production-ready architecture.
Integration Context
- Enterprise system: [SAP S/4HANA / Salesforce / ServiceNow / custom ERP / other]
- AI assistant: [Claude Code / custom agent / enterprise Claude deployment]
- Primary use case: [e.g., query sales data, update records, generate reports, trigger workflows]
- Users: [internal employees / external customers / automated agents only]
- Data sensitivity: [public / internal / confidential / regulated (HIPAA/PCI/GDPR)]
MCP Server Requirements
Design the MCP server that bridges Claude to the enterprise system:
1. Tool Inventory
List all MCP tools this server should expose:
- Read tools: What data should Claude be able to query? (with field-level detail)
- Write tools: What actions should Claude be able to trigger? (with business rule constraints)
- Search tools: What full-text or semantic search capabilities are needed?
For each tool specify:
- Tool name (snake_case, descriptive)
- Input schema (required vs. optional fields, types, validation rules)
- Output schema (what Claude receives back)
- Rate limits and pagination requirements
- Idempotency requirements (can Claude safely retry this tool call?)
2. Authentication Architecture
- How does the MCP server authenticate to the enterprise system? (OAuth 2.0, API key, service account, SAML)
- How does Claude authenticate to the MCP server?
- How do we propagate end-user identity for audit trails? (user context passing)
- Token refresh and session management strategy
3. Permission Model
- What is the minimum permission set the MCP server should hold?
- How do we scope permissions by user role? (Claude should only do what the human user is authorized to do)
- Where do we implement business rule validation β MCP server or enterprise system?
4. Observability
- What do we log for each tool call? (who called it, what parameters, what was returned, latency)
- How do we detect and alert on anomalous usage patterns?
- What's the retention policy for MCP interaction logs?
5. Error Handling
- How should the MCP server translate enterprise system errors into Claude-readable messages?
- What's the fallback if the enterprise system is unavailable?
- How do we handle partial success (some records updated, others failed)?
Deliverables
- MCP server architecture diagram (described in detail)
- Complete tool schema definitions (JSON Schema format)
- Authentication flow sequence diagram (described)
- Security control checklist
- Sample Claude system prompt that instructs Claude on how to use these tools responsibly
**When to use this:** When integrating Claude into enterprise software like SAP (which announced native MCP support via Joule agents in May 2026), Salesforce, or any internal enterprise platform. Run before architecture review board presentations.
**Expected output:** Production-ready MCP server design, complete tool schemas, security controls, and a Claude system prompt that constrains the agent to appropriate enterprise behavior.
**Cross-link**: β [Chapter 6: The Agent Revolution](https://vibecodingebook.com/reader#ch06) for MCP architecture patterns. β [cyberos.dev](https://cyberos.dev) for security scanning of MCP server implementations. β [llmhire.com](https://llmhire.com) for finding engineers with MCP integration experience.
---
### Prompt 17.266: AI Agent Credit Budget Calculator and Optimization Plan
**Difficulty**: Intermediate | **Tool**: Claude Sonnet 4.6, spreadsheet | **Time**: 30-45 min | **Category**: Cost Management / Operations
Anthropic is introducing metered agent credits starting June 15, 2026. I need to audit my current Claude usage, forecast costs under the new model, and optimize my workflows before the billing change hits.
Current Usage Profile
For each workflow I run that uses Claude, fill in:
| Workflow | Frequency | Avg tool calls/run | Avg tokens/run | Critical? |
|---|---|---|---|---|
| [Workflow 1] | [daily/weekly/per-PR] | [number] | [estimate] | [yes/no] |
| [Workflow 2] | ||||
| [Workflow 3] |
My Anthropic plan: [Pro $20/mo / Max $100/mo / Max $200/mo / API direct] Included agent credits: [matches plan price β $20/$100/$200] Monthly API budget (if direct): $[amount]
Analysis Tasks
1. Current Cost Baseline
- Estimate current monthly agent token consumption by workflow
- Identify which workflows are "heavy" (>1000 tool calls/month) vs. "light"
- Calculate what my costs would be under the new credit model if nothing changes
2. High-Value vs. Low-Value Classification
For each workflow, classify:
- Business-critical: Fails silently if degraded β keep on Claude, optimize token usage
- Quality-sensitive: Output goes to customers or published β keep on Claude
- Automatable-bulk: High volume, tolerance for occasional errors β candidate for open-weight alternative
- Experimental: Testing/dev only β move to cheapest option
3. Optimization Opportunities
- Which prompts can be shortened without losing quality? (identify verbose system prompts)
- Which workflows can be batched (reduce per-call overhead)?
- Which workflows can be switched to a cheaper Tier 2 model (Sonnet vs. Opus)?
- Which agentic tool sequences can be collapsed into fewer tool calls?
4. Open-Weight Migration Candidates
Based on the classification above, identify workflows that could move to:
- Self-hosted Kimi K2.6 or DeepSeek V4: High volume, non-critical, code-heavy
- Claude Haiku 4.5: Low-stakes generation tasks that don't need Sonnet quality
5. June 15 Readiness Plan
Produce a week-by-week action plan:
- Week of June 1: Audit complete, decision matrix ready
- Week of June 8: Test alternatives, measure quality delta
- Week of June 15: Switch non-critical workflows, monitor credits
- Week of June 22: Review first billing cycle, adjust routing
Deliverable
- Cost forecast: current vs. post-June-15 (under new credit model)
- Workflow routing decision: keep on Claude / migrate to alternative / optimize in place
- Token optimization quick wins (list of specific prompt changes)
- Credit burn alert threshold (at what usage level should I get a notification?)
- 30-day rollback plan (how to revert if quality degrades after migration)
**When to use this:** Before June 15, 2026, when Anthropic's agent credit metering goes live. Run this now to avoid bill shock and ensure your critical workflows are protected.
**Expected output:** Cost forecast, workflow routing plan, specific optimization actions, and a monitoring strategy with alert thresholds.
**Cross-link**: β [endofcoding.com: Agent Credit Survival Guide](https://endofcoding.com/ebook/anthropic-agent-credits-june-2026-survival-guide) for full breakdown of the billing change. β [endofcoding.com: Open-Weight Model Guide](https://endofcoding.com/ebook/open-weight-model-wave-may-2026-vibe-coders-guide) for migration alternatives. β [Chapter 15: The Business of Vibes](https://vibecodingebook.com/reader#ch15) for AI cost management frameworks.
---
*Chapter 17 additions β May 17, 2026 | Prompts 17.264β17.266 (Open-Weight Model Evaluation, Enterprise MCP Integration Design, AI Agent Credit Budget Calculator) | 280+ prompts across 47 categories | Previous: May 15 (prompts 17.261β17.263 β AI Coding Tool Token Budget Audit, 1-Person AI Team Architecture, AI Workforce Ethics Boundary Assessment). Prompted by: Simultaneous launch of 5 open-weight frontier models (Kimi K2.6, DeepSeek V4, GLM-5.1, Gemma 4, MiMo 2.5), SAP+Anthropic MCP integration announcement (May 2026), and Anthropic agent credit meter going live June 15, 2026.*
---
### Prompt 17.267: AI-Native Toolchain Readiness Audit (Advanced)
**Tool**: Claude Code | **Time**: 15-20 min | **Category**: Infrastructure & Toolchain
*Triggered by: Vercel Labs releasing Zero β a programming language designed for AI agent consumption (May 2026). Use to evaluate how well your current toolchain integrates with AI coding agents.*
You are a senior DevEx engineer evaluating an existing project's toolchain for AI-agent compatibility.
Project Context
- Language/framework: [TypeScript/Python/Go/Rust/etc.]
- Build tool: [npm/cargo/go build/webpack/etc.]
- CI system: [GitHub Actions/CircleCI/Jenkins/etc.]
- AI coding tools in use: [Claude Code/Cursor/Copilot/etc.]
Audit Goals
Assess how well the current toolchain supports the AI-agent-driven development loop: GENERATE β COMPILE β PARSE ERRORS β FIX β REPEAT
1. Error Parsability Score
For each tool that produces diagnostic output (compiler, linter, test runner):
- Are errors machine-readable (JSON/structured) or prose-only?
- Can an AI agent extract: error type, file, line, column, suggested fix?
- Score each tool: 0 (pure prose) β 3 (structured JSON with fix suggestions)
2. Build Determinism Check
- Does the build produce identical output given identical input? (no timestamp-based variance)
- Are all dependencies pinned (lock files committed)?
- Can an AI agent reproduce a build failure locally with a single command?
3. Test Feedback Quality
- Do tests report: which assertion failed, expected vs. actual, and the diff?
- Is test output structured enough for an agent to identify the failing case without reading source?
- Can tests be run in isolation (single test file / single test case)?
4. Agent Integration Points
Identify gaps where current tooling forces an AI agent to "guess":
- Ambiguous error messages requiring context an agent doesn't have
- Build steps that modify global state (global npm installs, env mutations)
- CI pipelines that fail silently or with non-actionable messages
5. Quick Wins
For each gap identified, propose the minimal change that improves agent compatibility:
- e.g., "Add --reporter=json flag to vitest invocation"
- e.g., "Add TypeScript strict mode to catch type errors before runtime"
- e.g., "Pin all npm dependencies with npm ci in CI pipeline"
Deliverable
- Toolchain compatibility matrix (each tool scored 0-3)
- Top 3 gaps blocking smooth agent-driven fix loops
- Quick wins: specific commands/config changes to implement
- One "moonshot" improvement requiring significant investment (e.g., migrate to structured log format)
**When to use this:** When AI coding agents are frequently confused by your build errors, producing fixes that don't address the root cause. Run quarterly or when onboarding a new AI coding tool.
**Expected output:** Scored matrix, gap list, and actionable config changes you can implement in an afternoon.
**Cross-link**: β [endofcoding.com: Vercel Zero β AI-native programming language](https://endofcoding.com/ebook/vercel-zero-programming-language-ai-agents-2026) for the design patterns Zero uses. β [Chapter 5: Tools](https://vibecodingebook.com/reader#ch5) for AI coding tool selection. β [cyberos.dev](https://cyberos.dev) for secure build pipeline patterns.
---
### Prompt 17.268: Always-On Autonomous Agent Design (Expert)
**Tool**: Claude Code, claude-sonnet-4-6 or claude-opus-4-6 | **Time**: 30-45 min | **Category**: Agent Architecture
*Triggered by: Google announcing Gemini Spark β a 24/7 background AI agent that learns from behavior and handles multi-step workflows proactively (Google I/O 2026, May 19). Use to design a comparable always-on agent for your own product or workflow.*
You are an AI systems architect. Design an always-on autonomous agent for [use case / product].
Agent Purpose
[One sentence: what this agent monitors, manages, or acts on continuously]
Trigger Model
Define when the agent activates:
- Event-driven: Responds to [webhooks / file changes / API polling / user actions]
- Time-driven: Runs on schedule [cron expression or interval]
- Reactive: Watches [queue / stream / inbox] and acts on new items
- Proactive: Initiates actions based on learned patterns (if applicable)
State & Memory
An always-on agent needs persistent memory to avoid redundant actions:
- Short-term: What happened in the last [N] runs / [N] hours?
- Long-term: What patterns has the agent learned about this system?
- State storage: [file-based / database / Redis / in-memory]
- Conflict detection: How does the agent know if another instance is already running?
Action Boundaries (CRITICAL)
Define exactly what the agent CAN and CANNOT do autonomously:
| Action | Autonomous | Requires Approval | Never Allowed |
|---|---|---|---|
| Read data | β | ||
| Send notifications | β | ||
| Write/modify files | β | ||
| Delete data | β | ||
| [your action] |
Failure Modes & Circuit Breakers
For an agent running 24/7, failure handling is more critical than the happy path:
- API rate limit hit: [back off N seconds / switch to queue]
- Unexpected response format: [log and skip / alert human / halt]
- Consecutive failures > N: [pause agent / alert on-call / rollback last action]
- Runaway loop detected: [detect via counter / timestamp check / hash of recent actions]
Human Oversight Interface
Design the minimum interface for a human to:
- See what the agent did in the last 24 hours (audit log format)
- Pause/resume the agent without code changes
- Override a decision the agent made
- Set/change the agent's action boundaries at runtime
Cost Controls
Estimate and cap agent resource consumption:
- Expected API calls per day: [N] at [model] = $[X]
- Maximum daily spend cap: $[N] β halt agent and alert if exceeded
- Which actions can use a cheaper model (Haiku vs. Sonnet)?
Deliverable
- Agent architecture diagram (text-based is fine)
- State machine: agent states and transitions
- Pseudocode for the main agent loop
- Configuration schema (JSON or YAML) for runtime-adjustable parameters
- Monitoring checklist: what to alert on in production
**When to use this:** When building a background agent that needs to run without human supervision. The Gemini Spark pattern (always-on, proactive, learns from behavior) is useful but requires careful boundary design to avoid runaway actions.
**Expected output:** Architecture spec, state machine, pseudocode loop, and configuration schema.
**Cross-link**: β [Chapter 11: Agents](https://vibecodingebook.com/reader#ch11) for agent fundamentals. β [endofcoding.com: Claude Code routines](https://endofcoding.com/ebook/claude-code-routines-automated-dev-workflows-2026) for scheduling patterns. β [LLMHire.com](https://llmhire.com) for AI agent engineer job specs.
---
### Prompt 17.269: Supply Chain Attack Surface Assessment (Expert)
**Tool**: Claude Code | **Time**: 20-30 min | **Category**: Security
*Triggered by: CVE-2026-45321 "Mini Shai-Hulud" supply chain worm compromising 170+ npm/PyPI packages (May 2026). Use after any major supply chain event or quarterly as a security check.*
You are a supply chain security engineer auditing this project for dependency compromise risk.
Project Context
- Package manager: [npm/pip/cargo/go modules]
- Number of direct dependencies: [N]
- CI/CD platform: [GitHub Actions/CircleCI/Jenkins]
- Production deployment: [Vercel/AWS/GCP/self-hosted]
Audit Scope
1. Dependency Inventory
Run: [npm ls --all --json | pip-audit --format=json | cargo tree --format=json]
For each direct dependency, identify:
- Maintainer(s) and their GitHub account age/activity
- Last publish date and publish frequency
- Number of weekly downloads (high = target, also = fast detection)
- Whether it has a lockfile pinning all transitive deps
2. Lockfile Integrity Check
- Is a lockfile (package-lock.json / poetry.lock / Cargo.lock) committed to the repo?
- Is
npm ci(notnpm install) used in CI to enforce lockfile? - Are lockfile hashes verified before install? (
npm cidoes this;pip installdoes not by default) - Flag any package installed without a lockfile pin (these are time-of-install resolution = attack surface)
3. Post-Install Script Audit
Supply chain worms commonly use postinstall hooks. Check:
- Which dependencies run
postinstall/prepare/preinstallscripts? - List each one with: package name, script content (or summary), justification for needing it
- Flag any that make network calls, write outside the package directory, or run binaries
4. Maintainer Trust Assessment
For your top 10 most-depended-on packages (by transitive count):
- Is the npm/PyPI account protected with 2FA?
- Has the maintainer published anything anomalous in the last 30 days?
- Is the package actively maintained (commits < 6 months old)?
- Does the package have a Security Policy (SECURITY.md)?
5. CI/CD Pipeline Exposure
- Do CI jobs run
npm installwith network access on production secrets? - Are third-party GitHub Actions pinned to commit SHAs (not
@mainor@v1)? - Does the pipeline download artifacts from external URLs without checksum verification?
- Is there a Software Bill of Materials (SBOM) generated on every build?
6. Response Readiness
If a supply chain compromise is discovered in a dependency you use:
- How quickly can you identify all affected deployment artifacts? (target: < 1 hour)
- Can you pin to a known-good version and redeploy in < 30 minutes?
- Do you have a way to notify affected users if their data was exposed?
Deliverable
- Risk score: overall supply chain health (Low / Medium / High / Critical)
- Postinstall scripts requiring review (table with package, script, risk level)
- Unlocked/unpinned dependencies (list with recommended pin commands)
- Top 3 immediate actions to reduce attack surface
- Monitoring recommendation: which registry feeds/advisories to subscribe to
**When to use this:** After any major supply chain event (like the Shai-Hulud npm worm), before a major release, or quarterly as part of your security review cycle.
**Expected output:** Risk score, actionable findings sorted by severity, and a prioritized remediation checklist.
**Cross-link**: β [cyberos.dev](https://cyberos.dev) for supply chain CVE tracking and security patterns. β [endofcoding.com: npm supply chain worm guide](https://endofcoding.com/ebook/npm-supply-chain-worm-vibe-coding-2026) for the Shai-Hulud incident analysis. β [Chapter 10: The Dark Side](https://vibecodingebook.com/reader#ch10) for security fundamentals in vibe-coded apps.
---
### Prompt 17.270: Deterministic Multi-Agent Pipeline Design with Conductor (Advanced)
**Tool**: Claude Code | **Time**: 25-40 min | **Category**: Multi-Agent Orchestration
*Triggered by: Microsoft open-sourcing Conductor β a zero-LLM-overhead YAML orchestration CLI for multi-agent workflows (May 14, 2026). Use when designing AI pipelines where workflow structure is known in advance and token costs for routing matter.*
You are a multi-agent systems architect designing a production pipeline using Microsoft Conductor β a deterministic YAML orchestration tool with zero LLM overhead for routing.
Pipeline Goal
[Describe what this pipeline needs to accomplish end-to-end]
Available Tools / MCP Servers
- [Tool 1]: [What it does, e.g., web-search-mcp, cms-mcp, slack-mcp]
- [Tool 2]: [What it does]
- [Tool N]: [What it does]
Constraints
- Budget per run: $[X] in LLM API costs
- Latency target: [< N minutes total]
- Human approval required before: [which actions β publishing, deleting, sending messages]
- Failure handling: [retry / skip / abort / alert]
Design the Conductor YAML for this pipeline
Step 1: Identify parallelizable stages
Which stages have no dependencies on each other and can run simultaneously? List each parallel group as a set of agents with their prompts and tools.
Step 2: Define the sequential execution graph
After parallel stages, what must happen in order? Map out the dependency chain: Stage A β [depends on nothing] β runs first Stage B + Stage C β [parallel, depend on nothing] β run simultaneously Stage D β [depends on B and C outputs] β runs after both complete Stage E (conditional) β [runs only if Stage D.output.risk_score >= "HIGH"]
Step 3: Design human approval gates
Which actions should pause for human review before execution? For each gate, specify:
- What the agent will show the human for review
- What happens on approve vs reject (retry with feedback / skip / abort)
Step 4: Write the complete conductor.yaml
Generate a working YAML file with:
- workflow name and description
- all parallel execution groups (use
parallel:blocks) - all sequential steps (use
then:chains) - Jinja2 conditions for conditional steps ({{ agent.field operator value }})
- approval gates where required
- proper output variable references ({{ agent-name.output }})
- input schema at the top (what variables the workflow accepts)
Step 5: Dry-run analysis
Walk through the pipeline as if executing it with a sample input:
- Which agents fire in which order?
- Which conditions evaluate to true/false (and why)?
- Which approval gates would pause execution?
- What is the critical path (longest sequential chain)?
- Estimated token cost vs. a fully LLM-routed equivalent
Step 6: Error handling spec
For each agent in the pipeline:
- What happens if it times out? (retry count, backoff)
- What happens if its output fails validation? (retry with different prompt / skip / abort)
- What does failure look like in the run log?
Deliverable
- Complete conductor.yaml (ready to run)
- Execution graph diagram (ASCII or mermaid)
- Cost estimate: tokens per run Γ runs per day = monthly LLM spend
- Comparison: Conductor vs equivalent LangGraph/AutoGen implementation (complexity, cost, reliability)
**When to use this:** When building any structured AI pipeline where the workflow shape is known β content generation, daily ops, code review chains, research pipelines. The zero-LLM routing overhead is especially valuable for workflows running multiple times per day.
**Expected output:** A working conductor.yaml, an execution graph, and a cost/complexity comparison with LLM-routed alternatives.
**Cross-link**: β [endofcoding.com: Microsoft Conductor deep dive](https://endofcoding.com/ebook/microsoft-conductor-multi-agent-orchestration-2026) for setup and real-world examples. β [Chapter 11: Agents](https://vibecodingebook.com/reader#ch11) for agent fundamentals before designing pipelines. β [LLMHire.com](https://llmhire.com) for Multi-Agent Orchestration Engineer job specs.
---
### Prompt 17.271: Anthropic Stainless SDK Generation β MCP Server Scaffolding (Intermediate)
**Tool**: Claude Code | **Time**: 15-25 min | **Category**: Developer Tooling / API Integration
*Triggered by: Anthropic's acquisition of Stainless (May 2026) β the company behind SDK generation and MCP server tooling. Use when building a new MCP server or SDK wrapper for an internal or public API.*
You are building an MCP (Model Context Protocol) server for the following API so that Claude and other AI agents can call it natively as a tool.
API to Wrap
- API Name: [e.g., "Internal CRM API", "GitHub REST API", "Stripe Billing API"]
- Base URL: [https://api.example.com/v2]
- Authentication: [Bearer token / API key in header / OAuth2]
- OpenAPI spec available: [yes β paste spec or file path | no β I'll describe endpoints]
Endpoints to Expose as MCP Tools
List the specific operations you want AI agents to call:
Endpoint: [POST /contacts]
- Tool name: create_contact
- When an agent should use it: [When it needs to add a new lead or customer]
- Required params: [name: string, email: string, company: string]
- Optional params: [phone: string, tags: string[]]
- Returns: [contact_id, created_at]
Endpoint: [GET /contacts/{id}]
- Tool name: get_contact
- When an agent should use it: [When it needs to look up an existing contact's details]
- Required params: [id: string]
- Returns: [full contact object]
[Add N more endpoints following the same pattern]
MCP Server Design
Step 1: Tool schema design
For each endpoint above, write the MCP tool definition:
- name: snake_case identifier (what agents will call)
- description: one sentence explaining WHEN to use this tool (agents read this to decide)
- inputSchema: JSON Schema for all parameters
- Distinguish required vs optional params clearly
Step 2: Server scaffolding
Generate the full MCP server implementation in TypeScript using @modelcontextprotocol/sdk:
- Server initialization with name and version
- Tool registration for each endpoint
- HTTP client with auth header injection
- Input validation before API calls
- Error handling: map API error codes to meaningful MCP error messages
- Response formatting: extract only the fields agents need (don't return raw API blobs)
Step 3: MCP configuration
Generate the mcp.json config for adding this server to Claude Code / Claude Desktop:
{
"mcpServers": {
"[server-name]": {
"command": "node",
"args": ["dist/index.js"],
"env": {
"API_KEY": "${[API_KEY_ENV_VAR]}"
}
}
}
}
Step 4: Tool description optimization
Rewrite each tool's description to be agent-optimized (not human-optimized):
- Lead with when to use it, not what it does
- Mention what it returns so the agent knows what to do with the output
- Flag any side effects (writes data, sends emails, charges money)
- Example: "Use this tool when you need to look up an existing contact. Returns full contact details including email, company, and all associated tags. Does NOT create new contacts β use create_contact for that."
Step 5: Testing scaffold
Generate test cases for each tool:
- Happy path with valid inputs
- Missing required field (should return validation error, not crash)
- API auth failure (401) β should return clear error message
- Rate limit hit (429) β should surface retry-after to the calling agent
Deliverable
- Complete MCP server (TypeScript, ~150-200 lines for 5 endpoints)
- Optimized tool descriptions for all endpoints
- mcp.json configuration
- Test suite (Vitest or Jest)
- README with setup instructions (< 200 words)
**When to use this:** When wrapping an internal API, third-party service, or data source so AI agents can interact with it natively. With Anthropic now owning the Stainless SDK generation toolchain, MCP server scaffolding will get faster β but the tool design principles above remain critical regardless of generator.
**Expected output:** A working MCP server TypeScript file, optimized tool descriptions, and test coverage.
**Cross-link**: β [Chapter 11: Agents](https://vibecodingebook.com/reader#ch11) for MCP concepts and agent tool design. β [endofcoding.com](https://endofcoding.com) for MCP integration tutorials. β [cyberos.dev](https://cyberos.dev) for security patterns to apply to MCP servers (input validation, auth handling, SSRF prevention).
---
### Prompt 17.272: Multi-Model Routing Strategy β Cost vs Quality Optimization (Advanced)
**Tool**: Claude Code | **Time**: 20-30 min | **Category**: Cost Optimization / AI Architecture
*Triggered by: Sakana AI's RL Conductor (May 2026) demonstrating that a 7B router model can dynamically route tasks across GPT-5, Claude Sonnet 4.6, and Gemini 2.5 Pro β achieving state-of-the-art quality at reduced token cost. Use when evaluating or implementing multi-model routing for cost efficiency.*
You are an AI systems architect designing a multi-model routing strategy for a production application that currently uses a single LLM for all tasks.
Current State
- Primary model in use: [e.g., Claude Sonnet 4.6]
- Monthly API cost: $[X]
- Primary use cases: [list 3-5 types of tasks your app performs, e.g., "code generation", "summarization", "classification", "chat", "data extraction"]
- Quality bar: [what does "good enough" look like for each task?]
- Latency requirement: [< N seconds for interactive tasks, async OK for batch tasks]
Goal
Route each task to the most cost-effective model that still meets the quality bar.
Step 1: Task Taxonomy
Categorize every task your application performs:
| Task Type | Volume/day | Quality Requirement | Current Model | Latency Req |
|---|---|---|---|---|
| [Task 1] | [N] | [High/Med/Low] | [Model] | [< Ns] |
| [Task 2] | [N] | [High/Med/Low] | [Model] | [< Ns] |
| ... | ... | ... | ... | ... |
Step 2: Model Capability Matrix
For each task type, evaluate which models are viable:
| Model | Strengths | Weaknesses | Cost/1M tokens | Latency |
|---|---|---|---|---|
| Claude Opus 4.6 | Complex reasoning, long context, coding | Cost, latency | $[X in / $Y out] | [Ns] |
| Claude Sonnet 4.6 | Balanced quality/speed, coding | Less reasoning depth | $[X in / $Y out] | [Ns] |
| Claude Haiku 4.5 | Speed, cost, simple tasks | Complex reasoning | $[X in / $Y out] | [Ns] |
| Kimi K2.6 (open-source) | Coding benchmarks, lower cost | Self-hosted infra required | $[X] | [Ns] |
| [Other model] | [Strengths] | [Weaknesses] | $[cost] | [latency] |
For each task type, identify: which models are viable? which is cheapest among viable?
Step 3: Routing Logic Design
Design a routing function that selects the right model per task:
def route_task(task_type: str, complexity_score: float, user_tier: str) -> str:
"""
Returns the model ID to use for this task.
complexity_score: 0.0 (trivial) to 1.0 (expert-level)
user_tier: "free" | "pro" | "enterprise"
"""
# Design the routing rules here:
# Example structure:
if task_type == "classification" and complexity_score < 0.3:
return "claude-haiku-4-5" # Trivially cheap
elif task_type == "code_generation" and complexity_score > 0.8:
return "claude-opus-4-6" # High-stakes code needs best model
# ... complete the routing table
For each routing rule, document:
- Why this model for this task/complexity combination
- What happens at the complexity boundary (how do you measure complexity_score?)
- How to handle the model being unavailable (fallback chain)
Step 4: Complexity Estimation
How do you score task complexity without calling an LLM?
Options to evaluate:
- Token count of the input (proxy for context complexity)
- Presence of keywords indicating reasoning needs ("explain why", "design", "architect")
- Task category classification (use a fast Haiku call for under $0.001)
- User-provided difficulty flag
- Historical success rate for similar tasks
Recommend the lowest-overhead complexity estimator for this specific app.
Step 5: Cost Projection
Run the numbers: if you implemented this routing strategy:
| Task Type | Current cost/day | Projected cost/day | Quality change |
|---|---|---|---|
| [Task 1] | $[X] | $[Y] | [Same/Better/Slightly worse] |
| ... | ... | ... | ... |
| Total | $[X]/day | $[Y]/day |
Monthly savings projection: $[X] Projected quality degradation: [None / Minor / Acceptable β for which tasks?]
Step 6: Implementation Plan
Provide the code structure for wrapping the Anthropic SDK with routing:
class RoutedLLMClient {
async complete(task: Task): Promise<string> {
const model = this.routeTask(task);
const response = await this.callModel(model, task);
await this.logRouting(task, model, response.usage); // track for optimization
return response.content;
}
private routeTask(task: Task): string {
// Implement routing logic from Step 3
}
}
Include: routing decision logging (so you can tune thresholds), A/B test mode (% of traffic to new routing), and a kill switch to revert to single-model if quality issues arise.
Deliverable
- Complete routing decision table (task Γ model Γ rationale)
- Complexity estimator recommendation with implementation
- Cost projection (current vs routed)
- TypeScript/Python RoutedLLMClient implementation
- Logging schema for routing optimization data
**When to use this:** When your AI API costs are growing and you want to maintain quality while routing cheaper tasks to smaller or open-weight models. The Sakana RL Conductor result (state-of-the-art quality at lower cost via routing) is the proof this is worth engineering time.
**Expected output:** A routing decision table, cost projection, and a working RoutedLLMClient wrapper ready to integrate.
**Cross-link**: β [endofcoding.com: AI coding tool comparison](https://endofcoding.com/ebook/ai-coding-agent-benchmarks-2026) for model benchmark data. β [Chapter 11: Agents](https://vibecodingebook.com/reader#ch11) for model selection fundamentals. β [vibecodingebook.com](https://vibecodingebook.com) for the full AI tools landscape (Ch. 5).
---
### Prompt 17.273: Google I/O 2026 β Gemini 2.5 Pro Deep Research Integration (Intermediate)
**Tool**: Claude Code | **Time**: 15-25 min | **Category**: Multi-Model Strategy / AI Architecture
*Triggered by: Google I/O 2026 (May 20, 2026) announcing Gemini 2.5 Pro GA with 2M-token "Deep Research" context mode and native Google Workspace tool-use. Use when designing long-context document analysis or research pipelines that may benefit from Gemini's 2M window alongside Claude.*
You are an AI systems architect evaluating when to use Gemini 2.5 Pro's 2M-token Deep Research mode vs. Claude Opus 4.6 / Sonnet 4.6 in a vibe-coded application.
Use Case Description
[What document analysis or research task does your app perform?]
- Document types: [PDFs / codebases / research papers / legal docs / logs]
- Typical document size: [N pages / N tokens]
- Number of documents per session: [N]
- Task type: [summarization / Q&A / cross-document analysis / extraction / synthesis]
Context Window Comparison
For your specific use case, evaluate:
| Scenario | Gemini 2.5 Pro (2M tokens) | Claude Opus 4.6 (200K tokens) | Claude Sonnet 4.6 (200K tokens) |
|---|---|---|---|
| Fits in single context? | [Yes/No] | [Yes/No] | [Yes/No] |
| Cost per session | $[X] | $[X] | $[X] |
| Latency (first token) | [Ns] | [Ns] | [Ns] |
| Quality for this task | [rating] | [rating] | [rating] |
Integration Architecture Options
Option A: Gemini-Only for Deep Research
Use Gemini 2.5 Pro when the entire corpus fits in 2M tokens and Deep Research mode provides better synthesis than chunked Claude calls.
- When it wins: massive codebases (>500K tokens), full-book analysis, entire log dumps
- When it loses: reasoning-heavy tasks, code generation, nuanced instruction following
Option B: Claude-Only with Smart Chunking
Use Claude with a chunking + synthesis strategy when documents are large but tasks are reasoning-heavy.
- Chunk strategy: [sliding window / semantic chunking / hierarchical summarization]
- Synthesis pass: Claude Sonnet aggregates chunk-level outputs into final answer
- When it wins: tasks requiring deep reasoning, multi-step logic, code generation from docs
Option C: Hybrid Pipeline
Use Gemini for initial broad scan / extraction, then Claude for reasoning and generation:
- Gemini 2.5 Pro: ingest full 2M-token corpus, extract structured facts/quotes (JSON output)
- Claude Sonnet 4.6: reason over extracted facts, generate final output
- When it wins: large corpus + high-quality generation requirement
- Cost: Gemini extraction cost + Claude generation cost
Design the Integration
For your use case above, recommend Option A, B, or C and implement it:
- If Option A: Write the Gemini API call with Deep Research system prompt
- If Option B: Write the chunking logic + Claude synthesis chain
- If Option C: Write the two-stage pipeline with schema for Gemini's extraction output
Switching Logic
Build a model selector that chooses Gemini vs. Claude based on document size:
function selectModel(documentTokens: number, taskType: string): 'gemini-2.5-pro' | 'claude-sonnet-4-6' | 'claude-opus-4-6' {
if (documentTokens > 150_000 && taskType === 'extraction') return 'gemini-2.5-pro';
if (taskType === 'code_generation') return 'claude-sonnet-4-6';
if (taskType === 'complex_reasoning') return 'claude-opus-4-6';
return 'claude-sonnet-4-6'; // default
}
Customize the thresholds for your specific quality/cost trade-offs.
Deliverable
- Model selection recommendation (A/B/C) with rationale for your use case
- Cost comparison: current approach vs. recommended approach (monthly estimate)
- Implementation: API integration code for the chosen option
- Fallback strategy: what happens when one model's API is unavailable?
**When to use this:** When your app processes large document corpora and you want to evaluate whether Gemini 2.5 Pro's 2M context window offers a cost or quality advantage over Claude with chunking. The hybrid option often wins on cost while maintaining Claude's reasoning quality for generation.
**Expected output:** A model selection recommendation, cost comparison, and working integration code.
**Cross-link**: β [endofcoding.com: Gemini 2.5 Pro vs Claude Opus β when to use each](https://endofcoding.com/ebook/gemini-2-5-pro-vs-claude-sonnet-deep-research-2026) for benchmarks. β [Chapter 5: Tools](https://vibecodingebook.com/reader#ch5) for the full model landscape. β [vibecodingebook.com](https://vibecodingebook.com) for prompt library and AI integration patterns.
---
### Prompt 17.274: Agent Memory Architecture β Short-Term, Long-Term, and Episodic (Advanced)
**Tool**: Claude Code, claude-sonnet-4-6 | **Time**: 25-35 min | **Category**: Agent Architecture
*Triggered by: Rising demand for stateful AI agents after Google Gemini Spark (always-on, learns from behavior, May 2026) and Anthropic's agent credit metering (June 2026). Use when your agent needs to remember context across sessions, learn from past interactions, or avoid repeating the same work.*
You are an AI systems architect designing the memory layer for a production AI agent.
Agent Description
- What does this agent do? [brief description]
- How often does it run? [on-demand / scheduled / always-on]
- Who uses it? [single user / team / all users of a SaaS product]
- What should it remember between sessions?
Memory Taxonomy
Design three memory tiers:
Tier 1: Short-Term Memory (within a session)
- Duration: exists only during one agent run
- Storage: in-context (passed in system prompt or as tool results)
- Content: [what the agent needs to track within a single task β intermediate results, tool call history, current plan]
- Size constraint: must fit within context window ([N] tokens budget for memory)
- Implementation: [structured JSON object injected into system prompt | conversation history | scratchpad tool]
Tier 2: Long-Term Memory (persists across sessions)
- Duration: indefinite, with TTL or versioning
- Storage: [SQLite / Supabase / Redis / flat files in ~/.agent/memory/]
- Content: [user preferences, learned patterns, prior decisions, project context]
- Write policy: when does the agent write to long-term memory? (after every run / on explicit trigger / when confidence > threshold)
- Read policy: what does the agent load at session start? (all / recent N items / relevance-ranked via embedding search)
- Staleness handling: how do you detect and evict outdated memories?
Tier 3: Episodic Memory (structured event log)
- Duration: permanent audit trail
- Storage: [append-only database / structured log files]
- Schema:
{ "episode_id": "uuid", "timestamp": "ISO-8601", "trigger": "what caused this agent run", "actions_taken": ["list of tool calls with args"], "outcome": "success | failure | partial", "artifacts": ["file paths, URLs, or IDs of outputs"], "cost_usd": 0.0, "tokens_used": 0 } - Use cases: auditing, debugging, cost tracking, pattern learning
Memory Retrieval Design
When the agent starts a new session, what context does it load?
Relevance Scoring
Design the retrieval function that selects what to inject into the system prompt:
- Option A: Recency β load last N sessions (simple, may include irrelevant data)
- Option B: Keyword match β load episodes matching current task keywords
- Option C: Embedding search β embed the current task, retrieve semantically similar past episodes (requires vector store)
- Recommend the right option for this agent's scale and use case
Context Budget Management
The agent has [N] tokens for memory injection. Prioritize:
- [Highest priority memory type β e.g., user preferences]
- [Second priority β e.g., recent relevant episodes]
- [Third priority β e.g., long-term learned patterns] Truncate or summarize lower-priority items when budget is exceeded.
Forgetting Strategy
Not all memory should be retained forever:
- User preference updates: replace old preference with new (versioned)
- Project-specific memory: archive when project is marked complete
- Error patterns: keep for [N] days, then prune if error hasn't recurred
- PII handling: encrypt or exclude user-identifying data from long-term memory
Implementation Plan
Provide working code for:
- The memory write function (called at end of each session)
- The memory read function (called at session start)
- The context assembly function (builds system prompt from retrieved memory)
Deliverable
- Memory architecture diagram (3 tiers + retrieval flow)
- Storage schema for long-term and episodic memory
- Working TypeScript/Python memory module (read + write + retrieve)
- Cost estimate: how much storage and compute does this memory layer add per month?
**When to use this:** When building agents that need to improve over time, avoid repeating mistakes, or maintain context across user sessions. The three-tier model (short-term/long-term/episodic) maps directly to how the most capable agents (Gemini Spark, Claude Code background tasks) maintain state.
**Expected output:** Architecture diagram, storage schemas, and working memory module code.
**Cross-link**: β [Chapter 11: Agents](https://vibecodingebook.com/reader#ch11) for agent fundamentals. β [endofcoding.com: Building stateful AI agents](https://endofcoding.com/ebook/stateful-ai-agent-memory-architecture-2026) for implementation patterns. β [vibe-coding.academy](https://vibe-coding.academy) for hands-on agent memory labs.
---
### Prompt 17.275: Stack Overflow 2026 Survey β AI Tool Adoption Gap Analysis (Intermediate)
**Tool**: Claude Code | **Time**: 10-15 min | **Category**: Team & Process
*Triggered by: Stack Overflow 2026 Developer Survey revealing 83% of developers use AI tools daily β up from 62% in 2025. 47% report their company has no formal AI tool policy. Use to benchmark your team's AI adoption against the survey data and identify gaps.*
You are a DevEx consultant helping a development team benchmark their AI tool adoption against the Stack Overflow 2026 Developer Survey results.
Survey Baseline (2026 data)
- 83% of developers use AI coding tools daily (up from 62% in 2025)
- Top tools by daily active use: Claude Code (34%), GitHub Copilot (31%), Cursor (22%), Gemini Code Assist (9%)
- 47% report their company has no formal AI tool policy
- 61% say AI tools improved their productivity "significantly" or "dramatically"
- 38% of codebases now have >50% AI-generated code
- Top concern: "I can't tell which parts of the codebase AI wrote" (54%)
- Top skill gap: Prompt engineering and AI tool configuration (67% want more training)
Team Assessment
Current AI Tool Stack
List every AI tool your team uses:
| Tool | Role in workflow | Daily users | % of team | Use cases |
|---|---|---|---|---|
| [Claude Code] | [primary dev agent] | [N] | [%] | [code gen, review, debug] |
| [GitHub Copilot] | [inline completion] | [N] | [%] | [autocomplete] |
| [Other] | [...] | [...] | [...] | [...] |
Adoption Gap Analysis
Compare your team's adoption to the survey benchmarks:
| Metric | Survey Benchmark | Your Team | Gap | Priority |
|---|---|---|---|---|
| Daily AI tool usage | 83% | [%] | [+/-] | [H/M/L] |
| Formal AI policy exists | 53% | [yes/no] | β | [H/M/L] |
| AI-generated code > 50% | 38% | [%] | [+/-] | [H/M/L] |
| Prompt engineering training | 33% trained | [%] | [+/-] | [H/M/L] |
Productivity Impact Measurement
If 61% of developers report significant productivity gains, what's your team's actual measurement?
- How do you currently measure developer productivity? [velocity / cycle time / DORA metrics / none]
- What productivity change have you observed since adopting AI tools?
- Which workflows saw the largest gains? Which showed no improvement?
The "Invisible AI Code" Problem
54% of developers can't tell which code was AI-generated. Assess your team:
- Do you have a convention for marking AI-generated code? (comments, git commit tags, etc.)
- Do code reviews treat AI-generated code differently?
- If an AI-generated function has a bug, how do you identify it was AI-generated during incident response?
Action Plan
Based on the gap analysis, produce a 30-day AI adoption improvement plan:
- Week 1: [Quick wins β tool access, basic prompt training]
- Week 2: [Process changes β review practices, AI code tagging]
- Week 3: [Policy creation β formal AI tool policy draft]
- Week 4: [Measurement β baseline metrics for next survey cycle]
Deliverable
- Gap analysis table with prioritized actions
- AI tool policy template (< 1 page) if policy doesn't exist
- AI code traceability convention (commit message format, comment style)
- 30-day adoption improvement plan
**When to use this:** After reading the Stack Overflow 2026 survey results, or any time you want to benchmark your team's AI tool maturity against industry data. The 83% daily usage benchmark is now the baseline β teams below this are likely leaving productivity on the table.
**Expected output:** Gap analysis, draft AI tool policy, code traceability convention, and a 30-day improvement plan.
**Cross-link**: β [endofcoding.com: Stack Overflow 2026 AI Survey Analysis](https://endofcoding.com/ebook/stack-overflow-2026-developer-survey-ai-tools-analysis) for full survey breakdown. β [Chapter 14: Sustainable Workflows](https://vibecodingebook.com/reader#ch14) for team AI adoption frameworks. β [vibe-coding.academy](https://vibe-coding.academy) for hands-on prompt engineering training.
---
*Chapter 17 additions β May 19, 2026 | Prompts 17.267β17.275 (AI-Native Toolchain Readiness Audit, Always-On Autonomous Agent Design, Supply Chain Attack Surface Assessment, Deterministic Multi-Agent Pipeline Design with Conductor, Anthropic Stainless SDK Generation / MCP Server Scaffolding, Multi-Model Routing Strategy, Google I/O 2026 Gemini 2.5 Pro Deep Research Integration, Agent Memory Architecture, Stack Overflow 2026 Survey Gap Analysis) | 289+ prompts across 47 categories | Previous: May 17 (prompts 17.264β17.266 β Open-Weight Model Evaluation, Enterprise MCP Integration Design, AI Agent Credit Budget Calculator). Prompted by: Microsoft Conductor open-source release, Anthropic acquiring Stainless, Sakana AI RL Conductor, Google I/O 2026 (Gemini 2.5 Pro GA, Gemini Spark always-on agent), and Stack Overflow 2026 Developer Survey (83% daily AI use).*
---
## Category: May 2026 β Google I/O 2026 / Enterprise Rollout (Added May 20, 2026)
### 17.276 β Google Antigravity 2.0 Agent Platform Migration Audit
**Difficulty**: Advanced | **Tool**: Claude Code, Google Antigravity 2.0 | **Time**: 45-60 min | **Category**: Tool Migration / Platform
I'm evaluating a migration from [Cursor / Windsurf / VS Code + Copilot] to Google Antigravity 2.0 following its Google I/O 2026 public early-access launch.
My Current Environment
- Current IDE/agent: [tool name + version]
- Primary cloud: [Google Cloud / AWS / Azure / multi-cloud]
- Google services in use: [list: Firebase, BigQuery, Cloud Run, GKE, etc.]
- Team size: [solo / team of N]
- Monthly AI tool spend: $[amount]
Migration Evaluation Framework
Phase 1: Google Stack Fit Analysis
- List every Google Cloud service my project touches
- For each: does Antigravity 2.0 have native context integration? (BigQuery schema, Firebase rules, Cloud Run configs)
- Calculate the "Google stack score" β what % of my stack would benefit from native integration?
- If score < 40%: migration ROI is likely low β document why and stop here
Phase 2: Workflow Compatibility
- My top 5 daily workflows (describe each)
- For each: does Antigravity 2.0 support it natively? What's missing?
- Migration blockers: [custom extensions / plugins I depend on that don't exist in Antigravity]
Phase 3: Cost-Benefit Analysis
- Current monthly spend on [tool] + Claude/GPT API: $[amount]
- Antigravity 2.0 pricing for my usage profile (Workspace seats + agent credits)
- Break-even timeline for migration investment (setup time + learning curve)
Phase 4: Parallel Run Plan
- How to run Antigravity alongside my current IDE for 2 weeks without disrupting output
- Which project type to pilot first (new greenfield vs. existing codebase)
- Success metrics: [task completion time, error rate, context accuracy on Google services]
Decision Output
- Go / No-go recommendation with reasoning
- If go: 4-week migration plan with milestones
- If no-go: specific conditions that would change the answer
**When to use this:** When your team is predominantly Google Cloud / Firebase / BigQuery β the native context integration is Antigravity's primary value proposition. Not worth switching if your stack is AWS-native.
**Expected output:** Google stack fit score, cost-benefit analysis, go/no-go recommendation, and 4-week migration plan.
**Cross-link**: β [Google I/O 2026 Gemini 3.5 Pro announcement](https://endofcoding.com/ebook/google-io-2026-gemini-35-pro-antigravity-jules-ga) | β [Chapter 5: The Tool Landscape](https://vibecodingebook.com/reader#ch05) for full tool comparison.
---
### 17.277 β Enterprise Vibe Coding 30,000-Seat Rollout Playbook
**Difficulty**: Expert | **Tool**: Claude Code (Enterprise), GitHub Copilot Enterprise | **Time**: 2-4 hours | **Category**: Enterprise / Change Management
*PwC announced deployment of Claude Code to 30,000 staff in May 2026 β making it one of the largest enterprise AI coding rollouts in history. This prompt generates a structured playbook for large-scale enterprise vibe coding adoption.*
Generate a structured rollout playbook for deploying AI coding tools to [N] developers across our enterprise.
Organization Profile
- Total developers: [N]
- Tech stack diversity: [homogeneous / moderate / highly diverse]
- Current AI tool adoption: [0% / <20% ad-hoc / 20-50% departmental / 50%+ widespread]
- Compliance requirements: [SOC 2 / HIPAA / PCI / FedRAMP / none]
- Primary IDE: [VS Code / JetBrains / other]
- Code hosting: [GitHub Enterprise / GitLab / Bitbucket]
Tools Being Deployed
- Claude Code Enterprise
- GitHub Copilot Enterprise
- Cursor for Teams
- Other: [specify]
Rollout Plan Framework
Phase 1: Pilot (Weeks 1-4) β 50-100 developers
Goals:
- Identify champion teams (high motivation, manageable scope)
- Establish baseline metrics (PR cycle time, bug rate, developer NPS)
- Surface compliance blockers before wide rollout
- Build internal case studies
Deliverables:
- Champion team selection criteria and application
- Baseline metrics dashboard setup
- Compliance review checklist (code goes to [vendor] API β what data governance is needed?)
- Pilot success criteria (minimum bar to proceed to Phase 2)
Phase 2: Scaled Rollout (Weeks 5-12) β 20-30% of developers
Goals:
- Department-by-department enablement
- Internal training program (1-hour onboarding + prompt library)
- Help desk / Slack channel for friction removal
- Weekly office hours with champions
Deliverables:
- Department rollout schedule with owners
- Internal training curriculum outline
- Prompt library curated for our tech stack
- Metrics tracking: weekly report on adoption + productivity
Phase 3: Full Deployment (Weeks 13-20) β All developers
Goals:
- Remaining department onboarding
- Advanced patterns training (multi-agent, background tasks, code review agents)
- Policy formalization (AI code review requirements, security gates)
- ROI measurement and board-level reporting
Policy Requirements to Draft
- AI tool acceptable use policy (what can/can't be sent to the API)
- AI-generated code review policy (do PRs need human review? what % coverage?)
- Security scanning gate (SAST on all AI-generated PRs?)
- Data classification rules (can [CONFIDENTIAL] code go through external AI?)
ROI Metrics to Track
- PR cycle time: before vs. after adoption
- Bug escape rate (production bugs per 1000 lines)
- Developer satisfaction (NPS, monthly survey)
- Time-to-feature (sprint velocity change)
- AI tool cost vs. productivity gain (calculate cost per dev-day saved)
Output
- Full rollout timeline with milestones and owners
- Policy templates (acceptable use, code review, data classification)
- Training curriculum outline
- ROI tracking dashboard schema
- Change management communications (email templates for each phase announcement)
**When to use this:** When planning an enterprise AI coding deployment of 500+ developers. Adapt the 4-phase structure to your org size β a 5,000-person company might need 6 months; a 500-person company might compress to 8 weeks.
**Expected output:** Complete rollout playbook, policy templates, training curriculum, and ROI dashboard schema.
**Cross-link**: β [Chapter 15: The Business of Vibes](https://vibecodingebook.com/reader#ch15) for enterprise ROI frameworks. β [Chapter 14: Sustainable Workflows](https://vibecodingebook.com/reader#ch14) for team adoption patterns.
---
### 17.278 β Cursor Composer 2.5 vs Claude Code Cost Benchmark
**Difficulty**: Intermediate | **Tool**: Cursor Composer 2.5, Claude Code | **Time**: 20-30 min | **Category**: Cost Optimization / Tool Selection
Help me run a rigorous cost-performance benchmark between Cursor Composer 2.5 and Claude Code (Opus 4.7 / Sonnet 4.6) for my specific use cases.
Context
Cursor Composer 2.5 (launched May 18, 2026):
- Standard tier: $0.50/M input, $2.50/M output
- Fast tier: $3.00/M input, $15.00/M output
- SWE-Bench Multilingual: 79.8% (vs Opus 4.7's 80.5%)
- CursorBench v3.1: 63.2% (vs Opus 4.7's 61.6%)
- Based on Kimi K2.5 + 25Γ Cursor RL post-training
Claude Code pricing (as of May 2026):
- Claude Sonnet 4.6: $3/$15 per M tokens (standard API)
- Claude Opus 4.7: $15/$75 per M tokens (standard API)
- Pro plan: $20/month with included credits
- Max plan: $100/month with higher limits
My Use Case Profile
Describe my typical daily AI coding tasks:
- [Task type]: [frequency/day], [approximate context size in tokens]
- [Task type]: [frequency/day], [approximate context size in tokens]
- [Task type]: [frequency/day], [approximate context size in tokens]
Benchmark Tasks to Run
Task 1: Multi-file feature implementation
Prompt: "Add [feature] to [component], touching [N] files" Run on: Composer 2.5, Claude Sonnet 4.6, Claude Opus 4.7 Measure: Output quality (1-5), tokens used, cost, time
Task 2: Bug diagnosis in complex codebase
Prompt: "Find the root cause of [bug] in [module]" Run on: All three models Measure: Accuracy, tokens used, cost
Task 3: Code review (AI reviewing a PR diff)
Prompt: "[paste diff] β review for bugs, security issues, and improvements" Run on: All three models Measure: Insight quality, false positive rate, cost
Analysis Request
- Per-task cost comparison table (Composer 2.5 vs Sonnet 4.6 vs Opus 4.7)
- Quality delta: where does Composer 2.5 fall short vs Opus 4.7? Is the gap task-specific?
- Recommended routing: which model for which task type based on my results?
- Monthly cost projection at my usage levels for each model
- Break-even analysis: what quality delta is acceptable to justify the cost savings?
**When to use this:** After any major new coding AI release that claims cost parity with frontier models at lower price. The pattern repeats: new model releases match frontier benchmarks at 80-90% lower cost, creating a real optimization opportunity for high-volume tasks. This prompt gives you a rigorous framework for deciding whether the switch makes sense for your specific workflow, rather than adopting based on benchmark hype alone.
**Expected output:** Task routing matrix, cost model, benchmark plan, and a go/no-go recommendation.
**Cross-link**: β [endofcoding.com: Open-Weight Model Wave May 2026](https://endofcoding.com/ebook/open-weight-model-wave-may-2026-vibe-coders-guide) for the competitive model landscape. β [Chapter 18: Tool Comparison Matrix](https://vibecodingebook.com/reader#ch18) for the full 2026 tool comparison data. β [endofcoding.com: Anthropic Agent Credits June 2026](https://endofcoding.com/ebook/anthropic-agent-credits-june-2026-survival-guide) for cost management strategies.
---
*Chapter 17 additions β May 20, 2026 | Prompts 17.276β17.278 (Google Antigravity 2.0 Agent Platform Migration Audit, Enterprise Vibe Coding 30,000-Seat Rollout Playbook, Cursor Composer 2.5 vs Claude Code Cost Benchmark) | 292+ prompts across 47 categories | Previous: May 19 (prompts 17.270β17.275 β Conductor Multi-Agent Pipeline, Stainless SDK/MCP Scaffolding, Multi-Model Routing, Gemini 2.5 Pro Deep Research, Agent Memory Architecture, Stack Overflow 2026 Survey Gap Analysis). Prompted by: Google Antigravity 2.0 launch at I/O 2026, PwC deploying Claude Code to 30,000 staff, and Cursor Composer 2.5 release (Kimi K2.6, Opus 4.7-level at 90% lower cost).*
---
## Category: May 2026 β Agentic Platform & Cost Optimization (Added May 21, 2026)
### 17.279 β Agentic Platform Evaluation Framework
**Difficulty**: Intermediate | **Tool**: Claude Code, Cursor, Antigravity 2.0 | **Time**: 20-30 min | **Category**: Tool Selection
I'm evaluating [PLATFORM_NAME] as my primary agentic coding environment.
My Current Stack
- Primary language: [language]
- Frameworks: [list]
- Repo size: [small < 10K LOC / medium 10-100K / large > 100K]
- Team size: [solo / small team / enterprise]
- Monthly AI spend budget: [$ amount]
What I Need to Test
Test 1: Codebase Understanding
Run: "Explain the architecture of this repo and identify the top 3 potential improvements" Evaluate: Accuracy, context depth, time to respond
Test 2: Multi-File Refactor
Run: "Refactor [COMPONENT] to use [PATTERN] β touch all affected files" Evaluate: Correctness, files missed, human review required
Test 3: Bug Hunting
Run: "Find potential race conditions or memory leaks in [MODULE]" Evaluate: False positives, real finds, explanation quality
Test 4: PR Review Quality
Run: "Review this PR diff and suggest improvements" Evaluate: Insight depth, actionability, noise ratio
Scoring Matrix
For each test, score 1-5 on:
- Accuracy (did it get it right?)
- Context awareness (did it understand the codebase?)
- Speed (was it fast enough for interactive use?)
- Cost (tokens used per task)
Output
Generate a comparison table with my scores and a final recommendation with ROI calculation.
**When to use this:** When evaluating whether to switch or add a new agentic platform (Claude Code, Cursor Composer 2.5, Google Antigravity 2.0, etc.). Replaces gut-feel switching with structured benchmarking against your actual codebase.
**Expected output:** Scoring matrix, comparison table, and ROI-based platform recommendation.
---
### 17.280 β Cost-Optimized Multi-Model Routing
**Difficulty**: Advanced | **Tool**: Claude Code, Cursor Composer 2.5, Kimi K2.6 | **Time**: 45-60 min | **Category**: Cost Optimization
Help me design a cost-optimized AI coding workflow that routes tasks to the appropriate model based on complexity and cost.
My Task Categories
- Simple completions: Autocomplete, boilerplate, simple refactors
- Medium tasks: Feature implementation, bug fixes, code review
- Complex tasks: Architecture decisions, multi-file refactors, new system design
- Critical tasks: Security review, performance optimization, production debugging
Available Models (May 2026 pricing)
- Cursor Composer 2.5: $0.50/$2.50 per M tokens (high quality, low cost)
- Claude Sonnet 4.6: [current pricing] per M tokens (strong balance)
- Claude Opus 4.7: [current pricing] per M tokens (highest quality)
- Kimi K2.6 (open-source): hosting cost only (frontier-near quality)
Routing Logic I Want
For each task category, recommend:
- Primary model (best cost-performance)
- Escalation trigger (when to upgrade to more expensive model)
- Estimated cost per 8-hour dev day
Output Format
Create a decision flowchart and calculate my expected monthly AI spend reduction vs using only Claude Opus 4.7 for everything.
My Current Usage Pattern
- completions/day, [Y] medium tasks/week, [Z] complex tasks/week
**When to use this:** After the Anthropic June 15 agent credit metering change β any team paying for AI-heavy workflows needs a model routing strategy. Also relevant when onboarding Cursor Composer 2.5 or any cost-effective open-weight alternative.
**Expected output:** Model routing decision flowchart, per-task cost breakdown, and monthly spend comparison vs single-model approach.
---
### 17.281 β Claude Code Routines for Automated Repository Health
**Difficulty**: Advanced | **Tool**: Claude Code (Routines) | **Time**: 30-45 min | **Category**: Automation
I want to set up Claude Code Routines to automate my repository health monitoring. Routines run on Anthropic's cloud infrastructure on a schedule or GitHub event β no local machine required.
Routines I Want to Create
Routine 1: Daily PR Triage (Schedule: 9am weekdays)
Goal: Every morning, a summary of all open PRs with:
- Estimated review complexity (easy / medium / hard)
- Key risks flagged (security, breaking changes, test coverage)
- Suggested priority order for my review
- PRs open > 3 days (escalation needed)
Routine 2: Weekly Test Coverage Audit (Schedule: Monday 8am)
Goal: Every Monday, assess test coverage health:
- Files with < 60% coverage
- New files added in the last 7 days with no tests
- Most critical untested code paths
- Suggested test generation priority
Routine 3: Security Scan on Push to Main (Trigger: GitHub push event)
Goal: Every main branch push triggers a security sweep:
- OWASP Top 10 patterns scan
- New dependencies added (check for known CVEs)
- Secrets or credentials accidentally committed
- Alert on any HIGH or CRITICAL findings immediately
Setup Steps
- Open Claude Code β Settings β Routines
- Create each Routine with the prompt, repo connection, and schedule
- Test with a dry run
- Connect GitHub for event-driven triggers
What to Output
For each Routine, generate the exact prompt I should paste into the Routines UI, the schedule expression, and the notification format.
**When to use this:** After setting up Claude Code Routines β always-on background agents that run on Anthropic's cloud with no infrastructure to maintain.
**Expected output:** Three ready-to-paste Routine prompts with schedule expressions and notification formats.
**Cross-link**: β [Claude Code Routines Guide](https://endofcoding.com/ebook/claude-code-routines-automated-dev-workflows-2026) | β [Karpathy joins Anthropic β pre-training context](https://endofcoding.com/ebook/karpathy-joins-anthropic-what-it-means-for-ai-coding-2026)
---
*Chapter 17 additions β May 21, 2026 | Prompts 17.279β17.281 (Agentic Platform Evaluation Framework, Cost-Optimized Multi-Model Routing, Claude Code Routines Repository Health) | 295+ prompts across 47 categories | Prompted by: Anthropic June 15 agent credit metering, Karpathy joining Anthropic pre-training team, and multi-model routing demand from Cursor Composer 2.5 / Kimi K2.6 open-source parity.*
---
## Category: May 2026 β Security Trilogy (Added May 24, 2026)
### 17.282 β Sandbox Security Audit for AI Code Execution
**Difficulty**: Advanced | **Tool**: Claude Code, any LLM | **Time**: 20-30 min | **Category**: Security
I'm using [sandboxjs / vm2 / isolated-vm / vm.runInNewContext / other] to execute AI-generated or user-submitted code safely. Audit my sandbox configuration for escape vulnerabilities.
My Current Setup
- Sandbox library: [library name + version]
- Node.js version: [version]
- What I'm sandboxing: [AI-generated scripts / user code / eval previews]
- Entry point code: [paste the wrapper code where you call the sandbox]
What I Want Audited
1. Prototype Chain Attacks
- Can sandbox code access proto on context objects?
- Are Object.prototype, Function.prototype accessible from inside the sandbox?
- Is there a path from sandbox context β host Function constructor?
2. Module Import Attacks
- Can require() or dynamic import() be called inside the sandbox?
- Are fs, child_process, net accessible directly or via creative chaining?
3. Timing and Resource Attacks
- Is there a CPU/memory timeout enforced?
- Can sandbox code spin up infinite loops that exhaust the host process?
4. Information Disclosure
- Can sandbox code read process.env from the host?
- Can it access __dirname, __filename of the host module?
Known CVEs to Check Against
- CVE-2026-25881: SandboxJS prototype chain escape (CVSS 10.0) β patched in 4.3.1
- vm2: Multiple escapes (CVE-2023-32314, CVE-2023-37466) β vm2 is DEPRECATED, migrate away
- isolated-vm: Check for latest advisories
Output I Want
- List of vulnerabilities found (severity, CVE if applicable, proof-of-concept pattern)
- For each: specific code fix or configuration change
- A safe wrapper function I can use instead of my current implementation
- A test file with 10 escape attempt patterns I should be blocking
**When to use this:** Before deploying any system that executes AI-generated code in a sandbox, or immediately after CVE-2026-25881 disclosure if you're on SandboxJS < 4.3.1.
**Expected output:** Vulnerability report, fixed wrapper implementation, and a test suite for escape attempts.
**Cross-link**: β [SandboxJS Escape + Veracode 45% Data](https://endofcoding.com/ebook/sandboxjs-escape-ai-code-security-veracode-2026) | β [Chapter 10: The Dark Side of Vibe Coding](https://vibecodingebook.com/chapter-10-dark-side)
---
### 17.283 β SAST Integration for AI-Assisted Pull Requests
**Difficulty**: Intermediate | **Tool**: Claude Code, GitHub Actions | **Time**: 45-60 min | **Category**: Security / DevOps
I want to add static analysis (SAST) to my CI pipeline so every AI-generated pull request is scanned for security vulnerabilities before merge.
My Stack
- Language(s): [TypeScript / Python / Go / etc.]
- Framework: [Next.js / FastAPI / etc.]
- CI: [GitHub Actions / GitLab CI / etc.]
- Repo: [public / private]
SAST Tools I'm Considering
- Semgrep (open-source rules + community rulesets)
- CodeQL (GitHub native, free for public repos)
- CyberOS (specialized for AI-generated code patterns)
- Snyk Code (dependency + code combined)
- Bandit (Python-only)
What I Need Generated
1. GitHub Actions Workflow
Create a .github/workflows/sast.yml that:
- Runs on every pull_request to main/master
- Scans for OWASP Top 10 patterns relevant to my stack
- Blocks merge if HIGH or CRITICAL findings exist
- Posts a summary comment on the PR with findings
- Runs in under 3 minutes (so it doesn't slow down developer workflow)
2. Custom Semgrep Rules
Write 5 custom Semgrep rules for [my framework] that catch the most common vulnerabilities in AI-generated code:
- SQL injection patterns (string concatenation in queries)
- Command injection (shell=True, exec with user input)
- Prototype pollution (proto assignment)
- Hardcoded secrets (API keys, passwords in source)
- Insecure deserialization (pickle.loads, JSON.parse on untrusted input)
3. PR Comment Template
Generate a GitHub Actions step that posts a security summary comment:
- Critical findings (block merge)
- Warnings (require acknowledgment)
- Informational (log only)
- Link to fix documentation for each finding type
False Positive Budget
I can tolerate: [none / < 5% / < 10%] false positive rate. Tune the rules accordingly.
**When to use this:** When setting up a new repo that will use AI coding tools heavily, or after seeing the Veracode stat that 45% of AI-generated PRs contain OWASP Top 10 vulnerabilities.
**Expected output:** Complete GitHub Actions SAST workflow, custom Semgrep rules, and PR comment template β ready to commit.
**Cross-link**: β [Veracode + SandboxJS article](https://endofcoding.com/ebook/sandboxjs-escape-ai-code-security-veracode-2026) | β [CyberOS SAST scanner](https://cyberos.dev)
---
### 17.284 β Supply Chain Dependency Audit After a Compromise Wave
**Difficulty**: Intermediate | **Tool**: Claude Code | **Time**: 30-45 min | **Category**: Security / Dependencies
A supply chain attack wave has just been disclosed (e.g., the May 2026 Megalodon npm worm affecting 170+ packages). Help me audit my project's dependency tree for exposure and harden my lockfile practices.
My Project
- Package manager: [npm / yarn / pnpm / pip / go mod]
- package.json / requirements.txt: [paste or describe key dependencies]
- Known compromised packages in this wave: [list if known, e.g., @tanstack/react-query < 5.55.0]
Audit Steps I Need
Step 1: Identify Exposed Dependencies
For each compromised package in the wave, tell me:
- Am I using it? What version?
- Is my version affected?
- What's the safe version to upgrade to?
Step 2: Check Transitive Dependencies
AI-generated code often pulls in indirect dependencies I don't know about. Run a full transitive dependency scan and show any indirect exposure paths.
Step 3: Lockfile Integrity Verification
- Verify my package-lock.json / yarn.lock hashes match the registry
- Check for any packages where the installed hash doesn't match the lockfile
- Flag any packages added in the last 7 days that aren't in the original lockfile
Step 4: Harden for the Future
Generate:
- A
.npmrcconfiguration that pins registry to npm official, blocks lifecycle scripts from unsigned packages - A
package.jsonscripts.preinstallhook that rejects packages not in an allowlist - A GitHub Actions step for
npm audit --audit-level=highon every PR - Dependabot config that auto-patches CRITICAL vulnerabilities within 24h
Output Format
- Table: Package | My version | Affected? | Safe version | Action required
- Hardened config files ready to commit
- Shell commands to run right now for immediate remediation
**When to use this:** Immediately after a supply chain compromise is announced, or as a quarterly dependency hygiene routine.
**Expected output:** Exposure analysis table, hardened configuration files, and immediate remediation commands.
**Cross-link**: β [TanStack/Mistral Shai-Hulud attack breakdown](https://endofcoding.com/ebook/tanstack-mistral-supply-chain-shai-hulud-2026) | β [Supply chain security chapter](https://vibecodingebook.com/chapter-10-dark-side)
---
*Chapter 17 additions β May 24, 2026 | Prompts 17.282β17.284 (Sandbox Security Audit, SAST Integration for AI PRs, Supply Chain Dependency Audit) | 298+ prompts across 47 categories | Prompted by: CVE-2026-25881 SandboxJS escape (CVSS 10.0), Veracode research showing 45% of AI-generated code has OWASP Top 10 vulnerabilities, and the Megalodon npm worm expanding to 170+ packages.*
---
## Category: May 2026 β Orchestration & Platform (Added May 24, 2026)
### 17.285 β Microsoft Conductor Multi-Agent Orchestration Design
**Difficulty**: Expert | **Tool**: Claude Code, Microsoft Conductor | **Time**: 60-90 min | **Category**: Multi-Agent / Enterprise
*Microsoft open-sourced Conductor in May 2026 β a multi-agent orchestration framework that routes tasks to specialized sub-agents, manages state across agent boundaries, and enforces deterministic execution order. This prompt designs a Conductor-based multi-agent pipeline for your codebase.*
Design a Microsoft Conductor multi-agent orchestration pipeline for my development workflow.
My Current Workflow (that I want to automate)
Describe the end-to-end process:
- [Step 1]: [what happens, who does it, how long it takes]
- [Step 2]: [next step]
- [Step N]: [final step]
Example: "A new feature request comes in (Jira ticket) β developer implements it β PR created β code review β security scan β QA β merge β deploy to staging β smoke test"
Agent Roster I Want
Agent 1: Intake Agent
Role: Parse incoming requests (Jira, GitHub Issues, Slack) and create structured task specs Tools available: Jira API, GitHub API, Slack webhook reader Input: Raw request text or ticket ID Output: Structured JSON task spec {title, acceptance_criteria, affected_files, priority}
Agent 2: Implementation Agent
Role: Generate code changes from task spec Tools available: Claude Code (file read/write/bash), repo context Input: Structured task spec Output: Code diff + PR draft
Agent 3: Security Review Agent
Role: Scan every PR for OWASP Top 10 patterns before human review Tools available: Semgrep, custom rules, CVE database lookup Input: PR diff Output: Security report {critical_findings, warnings, pass/fail}
Agent 4: QA Agent
Role: Generate and run tests for the PR's changed files Tools available: Test runner (Jest/pytest), code coverage tool Input: PR diff + existing test suite Output: Test results + coverage delta
Agent 5: Deployment Agent
Role: Merge approved PRs and trigger deployment pipeline Tools available: GitHub merge API, CI/CD webhook, monitoring alert check Input: Approved PR + all agent reports Output: Deployment status + rollback instructions if needed
Conductor Configuration
Orchestration Rules
- Sequential gates: [Security Review] must PASS before [QA Agent] starts
- Parallel execution: [Security Review] and [QA Agent] can run simultaneously once [Implementation Agent] completes
- Human-in-the-loop gate: After [QA Agent] completes, require human approval before [Deployment Agent]
- Failure handling: If any agent returns FAIL, halt pipeline and notify [Slack channel]
State Management
- Pipeline state stored in: [Redis / Postgres / Conductor's built-in state store]
- Checkpoint strategy: Save state after each agent completes (enable resume on failure)
- Retry policy: [N] retries with [exponential backoff / fixed delay] for transient failures
Output I Want
- Conductor YAML/JSON pipeline configuration file
- Agent prompt template for each of the 5 agents above
- State schema (what data passes between agents)
- Human approval workflow (how the gate is presented and approved)
- Monitoring dashboard spec (what metrics to track per agent)
**When to use this:** When you're ready to move beyond single-agent automation to coordinated multi-agent pipelines. Conductor's key advantage over custom orchestration: deterministic execution order, built-in state persistence, and native human-in-the-loop gates β the three things that break most DIY multi-agent systems.
**Expected output:** Conductor pipeline configuration, agent prompt templates, state schema, and monitoring dashboard spec.
**Cross-link**: β [Chapter 6: The Agent Revolution](https://vibecodingebook.com/reader#ch06) for multi-agent architecture context. β [Prompt 17.271](https://vibecodingebook.com/reader#ch17) for Conductor-based deterministic pipeline design. β [endofcoding.com: Microsoft Conductor vs LangChain 2026](https://endofcoding.com/ebook/microsoft-conductor-vs-langchain-multi-agent-2026)
---
### 17.286 β GitHub Copilot June 1 Billing Migration Audit
**Difficulty**: Intermediate | **Tool**: GitHub Copilot, Claude Code | **Time**: 20-30 min | **Category**: Cost Optimization / DevOps
*GitHub Copilot switches to usage-based billing on June 1, 2026. This prompt audits your current Copilot usage and generates a cost optimization plan before the first metered billing cycle.*
Audit my GitHub Copilot usage before the June 1, 2026 usage-based billing switch and generate a cost optimization plan.
My Current Plan & Usage
- Copilot plan: [Individual $10/mo / Pro $10/mo / Pro+ $39/mo / Business $19/seat / Enterprise $39/seat]
- Monthly active users: [N]
- Primary use cases: [code completions / chat / CLI / code review / cloud agent / Spaces]
- Current monthly spend: $[amount]
New Billing Structure (June 1, 2026)
1 AI credit = $0.01
- Code completions: UNLIMITED (no credits consumed) β safe
- Next edit suggestions: UNLIMITED (no credits consumed) β safe
- Chat (Claude Sonnet 4.6, GPT-5.5, Gemini 3.5 Pro): [credits per message β varies by model]
- CLI usage: [credits per query]
- Cloud agents (PR review, issue triage, background tasks): [credits per task]
- Spaces (persistent agent sessions): [credits per minute of active session]
- Third-party agents: [credits per agent invocation]
Included credits per plan:
- Pro: $10 + $5 flex = $15 included
- Pro+: $39 + $31 flex = $70 included
- Business: $19/seat/mo
- Enterprise: $39/seat/mo
What I Need Audited
Step 1: Current Usage Inventory
- List all Copilot features I use (beyond code completions)
- Estimate frequency: daily / weekly / monthly
- Flag any GitHub Actions workflows that invoke Copilot agents (these WILL consume credits)
Step 2: Credit Consumption Estimate
For each non-completion use:
- Estimate monthly credit consumption at current usage levels
- Compare against included credits for my plan
- Flag if I'm likely to exceed included credits (overage risk)
Step 3: Optimization Recommendations
For each high-consumption use:
- Can it be replaced by code completions (unlimited)?
- Can the frequency be reduced without losing productivity?
- Is there a cheaper alternative (Claude Code API direct, open-source tool)?
- Should I upgrade/downgrade plans based on projected spend?
Step 4: GitHub Actions Audit
- List all .github/workflows/*.yml files that mention
github/copilot-cli,@github/copilot, oractions/ai - For each: does this run on every PR? Every push? On a schedule?
- Calculate credit consumption per run Γ frequency
- Flag workflows consuming > 10 credits/run as high-priority for optimization
Output
- Credit consumption forecast (current usage β projected monthly bill)
- Optimization actions ranked by savings potential
- Actions audit with credit consumption per workflow
- Plan recommendation: stay / upgrade / downgrade
- Calendar reminder: run this audit again June 15 (first real bill arrives)
**When to use this:** Before June 1, 2026 β the first Copilot usage-based billing cycle. Teams with heavy Copilot chat, cloud agents, or Spaces usage may see significantly higher bills. Run this now to avoid surprise charges.
**Expected output:** Credit consumption forecast, optimization action plan, GitHub Actions audit, and plan recommendation.
**Cross-link**: β [Chapter 18: Tool Comparison Matrix](https://vibecodingebook.com/reader#ch18) for full Copilot vs. Claude Code vs. Cursor cost comparison. β [Prompt 17.261](https://vibecodingebook.com/reader#ch17) for broader AI coding tool token budget audit.
---
### 17.287 β Apple iOS 27 AI Feature Integration Blueprint
**Difficulty**: Advanced | **Tool**: Claude Code, Xcode 18 | **Time**: 45-60 min | **Category**: Mobile / AI
*Apple announced iOS 27 with expanded on-device AI capabilities, new AI-native API slots in Spring 2026. This prompt designs an AI feature integration plan for iOS apps built with vibe coding workflows.*
Design an AI feature integration blueprint for my iOS app targeting iOS 27's new on-device AI capabilities announced for Fall 2026.
My App Profile
- App category: [productivity / health / education / entertainment / utility / other]
- Current iOS support: iOS [N]+
- Existing AI features: [none / basic text analysis / image processing / other]
- Backend: [serverless / Node.js / Python / none]
- Primary user persona: [describe your core user]
iOS 27 AI Capability Assessment
On-Device Foundation Model (iOS 27)
- Apple Intelligence expanded APIs: text generation, summarization, smart actions
- Privacy guarantee: processes on-device for all Foundation Model requests (not sent to Apple servers)
- Context window: ~4K tokens (on-device); ~32K tokens (Private Cloud Compute escalation)
- Latency: <100ms for simple completions on M4 Bionic or later
Writing Tools Integration
- Rewrite, proofread, and summarize available system-wide
- Apps can hook into Writing Tools via UITextView + WritingToolsCoordinator
- Custom Writing Tools actions: register app-specific transformations
Visual Intelligence Integration
- Image-to-text: describe, extract, and act on visual content
- App Intent integration: "Hey Siri, use [MyApp] to identify [object] in this photo"
- Real-time camera analysis via Vision framework + Core ML pipeline
Siri App Intents (iOS 27 expanded)
- Siri can now navigate multi-step in-app workflows via App Intents
- Deep Links + App Intent shortcuts enable agent-driven navigation
- New: Siri can fill forms, submit actions, and retrieve app-specific data
Feature Ideas to Evaluate
For my app category, suggest 5-7 AI features using iOS 27 APIs, ranked by:
- User value (how much does this improve the core experience?)
- Implementation complexity (1 = simple API call, 5 = custom ML pipeline)
- Differentiation (1 = any app can do this, 5 = unique to my category)
- Privacy alignment (does this work entirely on-device?)
For each feature:
- iOS 27 API to use
- Implementation approach (vibe coding prompt to generate the feature)
- Estimated dev time
- User story: "As a [persona], I can [action] so that [outcome]"
Implementation Roadmap
Phase 1: Quick wins (1-2 weeks)
- Features using existing iOS 27 APIs with no custom ML
- Integrate Writing Tools for text-heavy workflows
- Add App Intent for most common user action
Phase 2: Core AI features (3-6 weeks)
- Foundation Model integration for [primary use case]
- Visual Intelligence if relevant to app category
- Siri multi-step workflow for power users
Phase 3: Differentiated AI (6-12 weeks)
- Custom Core ML model for [domain-specific capability]
- Private Cloud Compute escalation for complex tasks
- On-device fine-tuning if applicable (iOS 27 API preview)
Vibe Coding Workflow for iOS AI Features
For each feature, generate the Claude Code prompt I should use to implement it: "Build [feature] using [iOS 27 API]. The feature should [behavior]. Handle [edge case]. The UI should [description]. Use Swift concurrency."
Output
- Ranked feature list with implementation approach for each
- iOS 27 API map: which APIs I need, complexity, availability
- Phased roadmap with milestones
- Privacy architecture: what stays on-device vs. escalates to PCC
- App Store optimization: how to feature AI capabilities in metadata
**When to use this:** When planning iOS 27 features for your app (announced Spring 2026, shipping Fall 2026). The on-device privacy model is a genuine differentiator over cloud-AI competitors β worth investing in for apps where user trust is central.
**Expected output:** Ranked AI feature list, iOS 27 API map, phased roadmap, privacy architecture, and App Store optimization copy.
**Cross-link**: β [Chapter 5: The Tool Landscape](https://vibecodingebook.com/reader#ch05) for mobile vibe coding tools. β [Chapter 13: Advanced Techniques](https://vibecodingebook.com/reader#ch13) for platform-specific AI integration patterns. β [endofcoding.com: Apple iOS 27 AI Slots for Developers](https://endofcoding.com/ebook/apple-ios-27-ai-slots-developer-guide-2026)
---
---
### 17.288 β Cross-Session Agent Memory Setup
**Category:** Agent Architecture | **Level:** Intermediate | **Tool:** Claude Code
Set up Claude Code persistent memory and dreaming-architecture patterns so your agent sessions build on each other rather than starting cold.
I want to configure Claude Code's persistent memory for [project name] so my agent sessions build on each other rather than starting cold each time.
Project context:
- Type: [web app / API / data pipeline / other]
- Primary workflows: [list 3-5 recurring tasks you do with Claude Code]
- Team size: [solo / 2-5 / 5+]
- Repository: [monorepo / polyrepo / description]
Memory Architecture Setup
1. CLAUDE.md Memory Slots
Design the persistent memory sections for my CLAUDE.md:
Project DNA (never changes):
- Architecture decisions and their rationale
- Non-obvious conventions (e.g., "we use X because Y happened")
- Known landmines: files/patterns to avoid or approach carefully
Living Knowledge (updates as we learn):
- Patterns that worked well (with context: when/why they worked)
- Patterns that failed (with post-mortem: root cause)
- Current technical debt map (what's fragile, what needs care)
Session Handoff (updated at end of each major session):
- What was accomplished
- What was abandoned and why
- Open questions for next session
- Recommended first action next session
2. Dreaming Protocol
At the end of each session, generate a memory consolidation block:
Session [date] Memory Update
Lessons Learned
- [What worked]: [context] β [apply when: condition]
- [What failed]: [root cause] β [avoid when: condition]
Architecture Decisions Made
- Decision: [what]
- Why: [rationale]
- Reversibility: [easy / hard / irreversible]
Updated Technical Debt
- Added: [new fragile thing]
- Resolved: [fixed thing]
- Priority shift: [what moved up/down]
3. Cross-Session Improvement Metrics
Track these across sessions to measure memory ROI:
- First-attempt success rate on recurring task types
- Number of times I had to re-explain the same context
- Sessions where memory surfaced a critical warning before I made a mistake
4. Memory Hygiene Rules
- Entries older than 90 days without a reference: archive or delete
- Contradictory entries: resolve explicitly, document which supersedes
Output
- Complete CLAUDE.md memory structure for my project
- Session-end dreaming template to run after each major session
- Memory validation checklist: how to verify memory is helping, not accumulating noise
- Team memory sync protocol (if applicable)
**When to use this:** When you want Claude Code sessions to compound in value. Anthropic's dreaming architecture (cross-session memory consolidation, demonstrated 6Γ task completion improvement at Harvey AI) is available today via persistent project memory in Claude Code 3.0+.
**Expected output:** Structured CLAUDE.md memory layout, session-end consolidation template, memory hygiene rules.
**Cross-link**: β [Chapter 6: The Agent Revolution](https://vibecodingebook.com/reader#ch06) for Anthropic's dreaming system. β [Chapter 13: Advanced Techniques](https://vibecodingebook.com/reader#ch13) for advanced CLAUDE.md patterns. β [endofcoding.com: Claude Code Dreaming β Cross-Session Memory That Compounds](https://endofcoding.com/ebook/claude-code-dreaming-cross-session-memory-2026)
---
### 17.289 β Self-Hosted Model Evaluation Framework
**Category:** Open-Weight Models | **Level:** Advanced | **Tool:** Ollama / LM Studio
Systematically evaluate whether a self-hosted open-weight model can replace a cloud API for a specific workflow, with cost, quality, and latency benchmarks.
I want to evaluate whether I can replace [cloud API: Claude / OpenAI / Gemini] for [specific workflow: code review / test generation / documentation / other] with a self-hosted open-weight model to reduce API costs.
My setup:
- Hardware: [M3 Max / RTX 4090 / A100 / cloud GPU / other]
- RAM available: [GB]
- Use case volume: [requests/day approximate]
- Current monthly API cost: [$amount]
- Quality bar: [what does "good enough" look like for this workflow?]
Evaluation Framework
Phase 1: Model Selection
Given my hardware constraints, recommend the top 3 candidate models for my workflow:
| Model | Parameters | Quantization | VRAM Required | SWE-Bench Score | License |
|---|
Include from recent releases:
- Kimi K2.6 (Apache 2.0, strong coding, 54 composite intelligence score)
- DeepSeek V4 (MIT, 1M context, leads agentic tasks)
- GLM-5.1 (MIT, 8-hour long-horizon, SWE-Bench Pro leader, cleanest license)
- Qwen 3 variants (Apache 2.0)
- Phi-4 variants (MIT, smaller hardware targets)
Phase 2: Benchmark Design
Create a test suite with 20 representative tasks:
- 5 easy (should always pass)
- 10 medium (quality discriminator)
- 5 hard (ceiling test)
For each task, define:
- Input prompt
- Gold standard output (or evaluation rubric)
- Pass/fail criteria
Phase 3: Quality Scoring
Run each candidate model on the test suite:
- Accuracy score (0β100) on benchmark suite
- Latency: median, p95, p99
- Context window coverage: does it handle my largest inputs?
- Consistency: variance across 3 runs of the same prompt
Phase 4: Cost-Quality Analysis
Calculate:
- Cloud API cost vs. self-hosted (electricity + amortized hardware)
- Break-even volume: at what request volume does self-hosted pay off?
- Hybrid routing: which tasks go self-hosted vs. cloud?
Phase 5: Production Setup
- Ollama setup and model serving configuration
- Fallback chain: self-hosted fails β cloud API (with cost guard)
- Model version pinning for reproducibility
- Latency and quality drift monitoring
Output
- Top 3 model recommendations for my hardware + workflow
- 20-task benchmark suite with pass/fail criteria
- Cost model: monthly savings at my volume
- Ollama production config for chosen model
- Hybrid routing decision tree
**When to use this:** When cloud API agent credit metering makes costs unsustainable for high-volume workflows. Open-weight models (Kimi K2.6, DeepSeek V4, GLM-5.1) now beat GPT-5.5 and Claude Opus 4.6 on SWE-Bench Pro β frontier parity at self-hosted cost.
**Expected output:** Model comparison table, benchmark suite, cost analysis, Ollama production configuration.
**Cross-link**: β [Chapter 5: The Tool Landscape](https://vibecodingebook.com/reader#ch05) for open-weight model overview. β [Chapter 18: Tool Comparison Matrix](https://vibecodingebook.com/reader#ch18) for updated model rows. β [endofcoding.com: Self-Hosted AI at Frontier Parity β 2026 Evaluation Guide](https://endofcoding.com/ebook/self-hosted-ai-frontier-parity-evaluation-2026)
---
### 17.290 β AI Security Hardening Audit
**Category:** Security | **Level:** Advanced | **Tool:** Claude Code
Comprehensive audit of your AI API key hygiene, IAM configuration, billing protection, and secret scanning β before an unauthorized $40K API bill finds you first.
Conduct a comprehensive AI security hardening audit for my project. Focus on API key exposure, IAM misconfigurations, billing risk, and secret scanning gaps.
Project context:
- Cloud providers: [Vercel / AWS / GCP / Azure / Railway / Fly / other]
- AI APIs in use: [Anthropic / OpenAI / Google Gemini / Cohere / Mistral / other]
- Repository: [public / private] on [GitHub / GitLab / Bitbucket]
- Team size: [solo / small / large]
- CI/CD: [GitHub Actions / CircleCI / GitLab CI / other]
Audit Checklist
1. API Key Exposure Scan
Scan these locations for exposed credentials:
Git history (run locally):
# Scan git history for AI API key patterns
git grep -i "sk-ant\|sk-proj\|AIza\|OPENAI_API\|ANTHROPIC" $(git rev-list --all) 2>/dev/null
git log --all --full-history -- "*.env*" | head -20
File system:
- All .env* files (are any tracked in git?)
- Hardcoded keys in source files (not environment variables)
- CI/CD configuration files (secrets accidentally inlined)
- Dockerfiles and docker-compose.yml
- Logs and error dumps (keys sometimes appear in stack traces)
2. IAM and Key Scope Audit
For each AI API:
- Is the key scoped to minimum required permissions?
- Separate key per environment (dev / staging / prod)?
- Rotation schedule defined?
- Production keys in a secrets manager (1Password, AWS Secrets Manager, Doppler)?
3. Billing Protection Setup
For each provider, confirm:
Google Cloud / Gemini:
- Budget alert at 20% of expected monthly spend
- Hard cap enabled (stops API calls at budget limit)
- Billing anomaly detection active
Anthropic / Claude:
- Spend limit configured in Console
- Usage alerts at 80% threshold
OpenAI:
- Hard limit set (not soft limit only)
- Alerts at 50% and 90%
4. Secret Scanning Configuration
GitHub:
- Secret scanning enabled (Settings β Security β Secret scanning)
- Push protection enabled (blocks commits with secrets)
- Custom patterns for Anthropic (sk-ant-), OpenAI (sk-proj-), Google (AIza)
CI/CD:
- All AI API keys stored as CI/CD secrets, not inlined
- Secrets not printed in logs
- Separate secrets per environment
5. Runtime Key Protection
- No API keys in client-side JavaScript bundles (check NEXT_PUBLIC_ usage)
- No API keys in error messages returned to users
- No API keys in application logs
- Rate limiting on your own API proxy routes
6. Incident Response Runcard
If a key is compromised (you have ~22 seconds):
- Revoke immediately: [provider key management URL]
- Check unauthorized usage in provider dashboard
- Set hard billing cap to $0 temporarily
- File billing dispute with provider support
- Rotate key, update all environments, redeploy
- Post-mortem: document how the key escaped
Output
- Exposure scan results: findings by severity (Critical / High / Medium)
- Remediation steps for each finding with estimated effort
- Billing protection status: configured / missing for each provider
- Secret scanning status: enabled / disabled across repositories
- Key rotation schedule
- Incident response runcard (one page)
**When to use this:** Before any production launch and quarterly thereafter. Breach-to-attack time is now 22 seconds (down from 8 hours in 2025) β your AI API keys need automated protection, not manual vigilance. Google Cloud developers are receiving $40K+ unauthorized invoices from exposed Gemini API keys discovered by automated scanners.
**Expected output:** Prioritized finding list, billing protection checklist, secret scanning setup, one-page incident response runcard.
**Cross-link**: β [Chapter 10: The Dark Side](https://vibecodingebook.com/reader#ch10) for the full AI security threat landscape. β [Chapter 19: The Security Playbook](https://vibecodingebook.com/reader#ch19) for the 30-minute pre-deploy checklist. β [endofcoding.com: AI API Key Security β The 22-Second Window](https://endofcoding.com/ebook/ai-api-key-security-22-second-window-2026)
---
*Chapter 17 additions β May 26, 2026 | Prompts 17.288β17.290 (Cross-Session Agent Memory Setup, Self-Hosted Model Evaluation Framework, AI Security Hardening Audit) | 304+ prompts across 48 categories | Previous: May 24 (prompts 17.285β17.287 β Microsoft Conductor Multi-Agent Orchestration, GitHub Copilot June 1 Billing Migration Audit, Apple iOS 27 AI Feature Integration Blueprint). Prompted by: Anthropic Dreaming launch (cross-session agent memory consolidation), open-weight frontier parity (Kimi K2.6/DeepSeek V4/GLM-5.1), and AI security bug-pocalypse (Google Cloud 5-figure unauthorized API bills, 22-second breach-to-attack window).*
18. Tool Comparison Matrix
A living comparison of every major vibe coding tool. Updated monthly.
AI-Native IDEs
| Tool | Price | Best For | Key Feature | Security Concern |
|---|---|---|---|---|
| Cursor | $20/mo + Composer 2.5 usage | Full-stack dev, large codebases, agent loops | Composer 2.5 (79.8% SWE-Bench Multilingual at $0.50/M input + $2.50/M output, ~10× cheaper than Opus 4.7); Cursor 3.3 PR Review + Build in Parallel; Cursor in Jira and MS Teams (May 2026) | CVE-2026-26268 git-hook RCE (CVSS 9.9, patched April 2026); CurXecute (CVE-2025-54135) |
| Windsurf (Cognition) | $20/mo Pro / $200/mo Max (raised May 2026) | Long-context projects, Devin-bundled workflows | Windsurf 2.0 Agent Command Center + Spaces; Devin Cloud and Devin Terminal CLI bundled into paid tiers | Memory poisoning via prompt injection |
| VS Code + Copilot | $10/mo Pro ($15 included usage from June 1) / $39 Pro+ ($70 included) | AI without switching editors; usage-based billing from June 1, 2026 | Agent Mode GA; CLI v1.0.48 shows per-token model prices in picker; unified sessions view; global custom agents at ~/.copilot/agents/ | Lower autonomy = lower blast radius; AI Credits meter Chat/CLI/cloud agents (completions stay unlimited, free) |
Autonomous Agents
| Tool | Price | Best For | Autonomy | Differentiator |
|---|---|---|---|---|
| Claude Code | Usage-based + Pro/Max plans (5-hour limits doubled May 6, 2026; peak-hour throttling removed on Pro/Max) | Enterprise codebases | High (subagent teams, Remote Agents up to 72h) | $2.5B+ ARR, 87.6% SWE-bench Verified (Opus 4.7), Claude Code 3.0 Remote Agents + Persistent Memory + Skills Registry, 1.2M active users |
| Devin (Cognition) | $500/mo standalone; bundled into Windsurf Pro/Max/Teams | Async tasks, migrations | Very High | $445M ARR (May 12 disclosure), 78% autonomous PR merge rate at SWE-1.7, Cognition closed $25B SoftBank Series D May 6, 2026 |
| Codex CLI | Usage-based (GPT-5.5) | Open-source, Rust/systems | Medium | Open-source, sandboxed execution; GPT-5.5 at 82.7% Terminal-Bench 2.0 (SOTA) |
| Jules (Google) | Free 50 tasks/mo — $125/mo | Async bugfixes, PR gen | High | GA post-I/O 2026, Gemini 3 Pro-powered, GitHub integration with Google Cloud VM sandboxing |
| Gemini CLI | Free tier + paid | Open-source terminal work, voice-driven sessions | Medium | v0.41.0 (May 2026): real-time voice mode (cloud + local), enforced workspace trust, .env loading secured in headless mode — direct response to April CVSS 10.0 RCE (GHSA-wpqr-6v78-jr5g) |
| Amazon Q | Free-$19/mo | AWS-heavy projects | Medium | Deep AWS integration |
Browser Builders (No-Code)
| Tool | Price | Best For | Output Quality | Risk Level |
|---|---|---|---|---|
| Bolt.new | Free-$20/mo | Rapid full-stack prototypes | Good | Medium |
| v0 | Free-$20/mo | React/Next.js UI components | Excellent | Low (UI only) |
| Lovable | Free-$25/mo | Non-dev app creation | Good | High — April BOLA flaw exposed all pre-Nov-2025 projects; three documented security incidents to date; treat platform-side tenant isolation as untrusted |
| Replit Agent | Free-$25/mo | Complete apps from description | Good | Medium β $400M Series D, $9B valuation (Mar 2026). 75% of Replit AI users write zero code. |
Open-Source & Cost-Efficient Alternatives
For teams optimizing cost, data privacy, or running on self-hosted infrastructure.
| Model/Tool | Parameters | Cost vs Claude Sonnet | SWE-bench / Rank | Best For |
|---|---|---|---|---|
| MiMo-V2-Pro (Xiaomi) | 1 Trillion (Hunter Alpha) | -67% cheaper than Claude Sonnet 4.6 | 3rd globally on agent benchmarks (Mar 2026) | Cost-sensitive production workloads, batch jobs |
| Gemini CLI (Google) | N/A (cloud) | Free tier available | Competitive, Flash variant | Open-source terminal work, Google ecosystem |
| Codex CLI (OpenAI) | N/A (cloud) | Usage-based (GPT-5.4) | 77.3% Terminal-Bench | Sandboxed execution, CI/CD integration |
| obra/superpowers | N/A (framework) | Free + model API costs | 92,100 GitHub stars (Mar 2026) | Custom agent framework, multi-step workflows |
| OpenClaw | N/A (framework) | Free + model API costs | 210,000 GitHub stars (Mar 2026) | Open-source agent orchestration, self-hosted |
Choosing Your Stack
</div>
19. The Security Playbook
A practical guide to hardening vibe-coded applications before they touch real users.
</div>
The 30-Minute Security Checklist
Run this on every vibe-coded application before showing it to anyone outside your team:
</div>
</div>
</div>
</div>
</div>
</div>
AI Tool Security Advisories
MCP Supply Chain: The New Attack Surface
Key MCP CVEs (March 2026):
- CVE-2026-23744 (CVSS 9.8, MCPJam Inspector ≤ v1.4.2): A crafted HTTP request to a critical endpoint bound to 0.0.0.0 with no authentication can install an arbitrary MCP server and execute code on the host. No user interaction required.
- Azure MCP Server RCE (CVSS 9.6, demonstrated at RSAC 2026): A vulnerability in Microsoft’s Azure MCP server capable of compromising cloud environments via the agent connection.
- SSRF exposure: BlueRock Security analyzed 7,000+ MCP servers and found 36.7% potentially vulnerable to server-side request forgery.
How to protect yourself:
- Audit all installed MCP servers. Run
ls ~/.config/claude/mcp*and remove any servers you didn’t explicitly install. - Only install MCP packages from verified, well-known authors with active maintenance history.
- Pin MCP server versions in your configuration — don’t use
@latest. - Check package provenance before installing from ClawHub or any MCP registry.
- Treat MCP server packages as executable code with system access — because they are.
Supply Chain Attacks: April 2026 Alert
April 2026 Supply Chain Attack Summary:
| Package / Tool | Date | Impact | Attribution |
|---|---|---|---|
| axios 1.14.1, 0.30.4 | March 31 | WAVESHAPER.V2 RAT; ~100M weekly downloads | UNC1069 (North Korea/DPRK) |
| LiteLLM 1.82.7, 1.82.8 | March 24 | Multi-stage credential stealer (SSH keys, cloud tokens, K8s secrets, .env files) | Unknown |
| Langflow β€ 1.8.2 (CVE-2026-33017) | March 17 | Unauthenticated RCE via public endpoint; exploited within 20h; CISA KEV | Active threat actors |
| Trivy Docker Hub images (CVE-2026-33634) | March 19 | Malicious code in Aqua Security's Trivy scanner images | TeamPCP |
Langflow CVE-2026-33017 detail: Critical code injection in the AI agent framework's public flow build endpoint. No authentication required. Exploitation was observed in the wild within 20 hours of public disclosure and CISA added it to the Known Exploited Vulnerabilities catalog. If you run Langflow, upgrade to 1.8.3+ immediately.
Trivy Cascade extended (April 2026): The Trivy compromise (CVE-2026-33634) evolved into a much larger incident. Attackers force-pushed malicious code to 75 of 76 trivy-action GitHub Actions tags, then published additional malicious Docker images during the remediation effort (taking 5 days to fully evict). The attack then spawned CanisterWorm — a self-propagating npm worm that hit 64+ packages using blockchain-based command-and-control infrastructure, making it resistant to traditional domain seizure. CanisterWorm spread to Checkmarx KICS and AST GitHub Actions, and separately reached LiteLLM (95 million monthly PyPI downloads). Any CI/CD pipeline that used Trivy, Checkmarx KICS, or LiteLLM between March 19 and April 10 should be treated as potentially compromised and audited.
What this means for vibe coders:
- Dependencies installed by AI-generated code are attack vectors. Always
npm auditafter any AI-generatedpackage.jsonor install step. - AI coding tools themselves (Langflow, LiteLLM, MCP servers, security scanners) are now priority targets for supply chain attackers.
- Security tooling is not immune β Trivy (a vulnerability scanner) was itself the vector. Audit your audit tools.
- Pin exact dependency versions. Don't use
@latestor loose semver ranges for packages you can't quickly audit. - Enable npm provenance verification and
--ignore-scriptsin CI pipelines to limit post-install attack surface. - Blockchain-based C2 is increasingly being used to make supply chain worms resistant to takedown β conventional domain blocklists are insufficient.
The Vibe Coding Security Crisis Week (April 19β22, 2026)
Lovable BOLA Data Breach (disclosed April 20). A broken object-level authorization vulnerability in Lovable's API allowed any authenticated free-tier user to access another user's profile, public projects, source code, database credentials, AI chat histories, and customer data β in as few as five API calls. The flaw had been reported through HackerOne 48 days before disclosure and was marked a "duplicate submission." The researcher, @weezerOSINT, eventually disclosed publicly on X. Lovable's first response attributed exposure to "intentional behavior" and "unclear documentation," then blamed HackerOne; CEO Anton Osika later apologised. A fix shipped within roughly two hours of public disclosure. Independent analysis estimated the flaw exposed every Lovable project created before November 2025 β a $6.6B vibe-coding company with $400M ARR. Practical lesson: vibe coding platforms are now custodians of source code, database credentials, and conversation logs at scale; their access control is your access control. Treat platform-side multi-tenant isolation as a must-test item before deploying anything sensitive.
Vercel Breach via Context.ai OAuth Supply Chain (disclosed April 19). The intrusion began with a Lumma Stealer malware infection at Context.ai β a third-party AI evaluation tool used by a Vercel employee β around February 2026. Attackers used the compromised Google Workspace OAuth tokens to take over the employee's individual Vercel account, then pivoted into Vercel's internal systems and decrypted environment variables for a "limited" subset of customer projects. The threat actor (ShinyHunters) listed Vercel's internal user database on BreachForums for $2M. Vercel coordinated with GitHub, Microsoft, npm, and Socket and confirmed no Vercel-published npm packages were compromised, but said the breach may affect "hundreds of users across many organizations." Practical lesson: every AI tool you grant OAuth access to is a path into your account. Review the OAuth grants on your Google Workspace, GitHub, and Vercel accounts; revoke every AI evaluation, debugging, or "productivity" tool you don't actively use. Treat third-party AI tool OAuth scopes the same way you treat production secrets.
Bitwarden CLI Supply Chain Attack β "Shai-Hulud: The Third Coming" (April 22). A malicious release of @bitwarden/cli@2026.4.0 was distributed via npm between roughly 5:57 PM and 7:30 PM ET on April 22, 2026. The vector was a compromised GitHub Action in Bitwarden's CI/CD pipeline β the payload was injected during the build step without needing Bitwarden's npm credentials or source code access. The 10 MB obfuscated payload harvested SSH keys, cloud credentials, CI/CD secrets, and β for the first time in a confirmed npm supply chain attack β specifically hunted authenticated AI coding tool configurations: Claude Code, Cursor, Codex CLI, Aider, Kiro, and Gemini CLI. Researchers found the string "Shai-Hulud: The Third Coming" embedded in the package, linking it to the broader Checkmarx supply chain campaign tracked since March. About 334 downloads of the malicious version completed before takedown; Bitwarden published 2026.4.1 (a re-release of 2026.3.0) within ~90 minutes and confirmed no vault data was compromised. Practical lesson: your authenticated AI coding tool sessions β the local config files, OAuth tokens, and API keys β are now an explicit target. Rotate AI coding tool credentials after any unverified npm install. Use ephemeral / short-lived auth tokens where the tool supports them. Don't run AI coding tools as the same OS user that handles secrets-laden CI work.
30-second response checklist after any of these incidents:
- Revoke and rotate API keys for every AI coding tool you've signed into in the last 60 days (Claude Code, Cursor, Codex CLI, Aider, Kiro, Gemini CLI, GitHub Copilot CLI).
- Audit OAuth grants on your Google Workspace, GitHub, and deploy-platform accounts; remove anything unused or unfamiliar.
- For any vibe-coding platform that holds your source: rotate every database password, API key, and webhook secret stored in that platform.
- Re-scan production deploys made between February and April 2026 for environment-variable exposure if you used Vercel + a third-party AI evaluation tool.
- Pin npm dependencies of CLI tools that hold credentials (password managers, cloud CLIs, AI tool clients). Avoid
@latestfor anything that can read other secrets.
PromptMink: AI-Co-Authored Supply Chain Attacks (May 2026)
The Claude-co-authored crypto-agent commit (Feb 28, 2026). A commit landed in the open-source npm package openpaw-graveyard — an autonomous Solana trading agent — with "Co-Authored-By: Claude Opus" in the trailer. The commit added @solana-launchpad/sdk as a new dependency. @solana-launchpad/sdk looked legitimate but transitively pulled in @validate-sdk/v2, which presented itself as a generic data-validation utility while quietly harvesting environment variables, SSH keys, and crypto wallet credentials and exfiltrating them to an attacker-controlled server. The malicious dependency was selected and added by a coding LLM that found the package convincing — a chain that ReversingLabs traces to LLMO-tuned README content engineered to score well in agent retrieval.
The payload evolution. Famous Chollima's PromptMink payloads started in late 2025 as straightforward JavaScript infostealers, moved to single-executable application bundles in Q1 2026, and as of early May 2026 are shipping as compiled Rust payloads — harder to deobfuscate, harder to detect with conventional npm scanning, and much harder to attribute via source-level analysis.
The hallucinated-package precedent. A January 2026 experiment by Aikido Security researcher Charlie Eriksen registered an npm package called react-codeshift that had been hallucinated by an LLM — the package didn't exist until Eriksen registered it under the name the LLM had invented. It then propagated into 237 GitHub repositories via AI coding assistants suggesting the (now-real) package. PromptMink is the same vector turned hostile.
What this means for vibe coders.
- Every "AI suggested this dependency, I just typed yes" workflow is now a credible attack surface. The package on the other end may have been engineered specifically to be recommended by your agent.
- Co-Authored-By: Claude (or any other LLM trailer) is not a trust signal. The Feb 28 trailer is real — an attacker used Claude to generate a commit that added a malicious dependency. Treat AI-co-authored commits in your own repos with the same diff review you would apply to a human commit from an unknown contributor.
- Pin and lock dependencies with
npm ci, exact-version pins for security-sensitive packages, and Socket / Snyk / Aikido-style supply-chain scanners that look at package behavior, not just metadata. - Audit any LLM-suggested package before install. The agent has no real way to verify a package is what its README claims; you do.
- Treat compiled-binary npm packages (Rust, Go, native bindings) as a higher-risk class. Demand that they ship with a reproducible build process, not just a prebuilt artifact.
The AI-Generated Code Vulnerability Surge (CSA, 2026)
The Cloud Security Alliance's AI-Generated Code Vulnerability Surge research note (released early May 2026) put numbers on what AppSec teams have been observing through 2025 and Q1 2026:
The takeaway: Speed wins on volume; security loses on rate. The 3–4x productivity bump from AI coding tools comes paired with a 10x security-finding rate. The 30-Minute Security Checklist at the top of this chapter is no longer a "nice to have" — it's the budget item that closes the gap.
MCP Database Flaws & "Prompts Become Shells" (May 2026)
The three May 13 MCP database CVEs:
| MCP Server | Vulnerability | Impact | Status |
|---|---|---|---|
| Apache Doris MCP | SQL injection via MCP tool args | Unintended SQL execution against a connected Doris cluster | Patched |
| Alibaba RDS MCP | Sensitive metadata exfiltration | An agent can be coerced into exposing connection credentials and database metadata it should not surface | Patched |
| Apache Pinot MCP | Instance takeover (internet-exposed) | A crafted MCP tool call can take over a Pinot instance reachable from the internet | Unpatched — vendor declined |
What the Microsoft "Prompts Become Shells" report adds. Microsoft's May 7 write-up names four failure patterns that the major agent frameworks ship with by default and that vibe-coded apps inherit when they wire up the same orchestrators:
- Tool argument injection. Untrusted document text reaches a tool call as an argument. The agent invokes the tool (email, file write, payment) with attacker-controlled parameters and the agent's authority.
- Code-interpreter abuse. A "run this code" tool that executes on the host rather than in a sandbox is a
python -con production. Multiple frameworks shipped this as the default. - Workflow compilation injection. Attacker-controlled text flows into a workflow definition or step graph that the executor later runs — the AI-era equivalent of SQL injection, except the "query" is an entire workflow.
- MCP server-side injection. When the MCP server itself fails to sanitize arguments before composing a downstream query (the Doris case), the agent platform's value proposition — "let the model call tools" — is the injection channel.
The 7-point hardening checklist for vibe coders shipping MCP-enabled apps:
- Audit every connected MCP server before granting it tool authority. Pin its version, read its source, check it has parameterized queries everywhere. Do not run
@latestfor MCP packages — the supply chain has had 30+ CVEs in the first 60 days of 2026 alone. - Refuse to deploy MCP servers from declined-to-patch vendors. The May 13 Apache Pinot story is the disclosure precedent. If a maintainer publicly chose not to fix a known RCE, that server has no place in your stack.
- No code-interpreter tools on the host. If your AI app exposes "run this code," wrap it in E2B, Modal, Firecracker, or gVisor. The default
subprocess.runpath is the failure Microsoft named. - Validate tool arguments independent of what the model says. The platform must enforce that the
toaddress in an email tool belongs to the calling user, that the file path is inside the user's scope, that the payment amount is within their pre-authorized ceiling. The model is not the enforcement layer. - Treat retrieved documents and search results as untrusted prompt content. Wrap them in clearly demarcated tags. Instruct the model to treat tagged content as data, not instructions. This is not a complete defense, but combined with argument validation it raises the bar materially.
- Scope each workflow's tool allowlist. A summarization workflow does not need write access. An email workflow does not need shell. The default-grant-all-tools posture is the agent-platform equivalent of running every service as root.
- Human-in-the-loop for destructive or sensitive actions. Display the actual tool arguments, not the model's natural-language summary of what it is about to do. The injection literature includes multiple cases where the summary diverged from the literal call.
What this means for the vibe-coded app you shipped last quarter. If your app talks to a database via an MCP server, audit which server, which version, and whether the maintainer is responsive. If your app exposes any code-execution surface to an AI model — even a "data analysis" or "chart generation" tool — verify it runs in a sandbox. If your app accepts user-uploaded documents and feeds them to an agent, walk through what happens when the document contains text designed to look like an instruction, not content.
The shared lesson of the May 2026 disclosures: the boundary between "content" and "instruction" was assumed across the agent ecosystem but never enforced. Every hardening pattern that follows is a re-enforcement of that boundary at a different architectural layer.
Mini Shai-Hulud: First SLSA-Attested Malware (CVE-2026-45321, May 11, 2026)
@tanstack namespace — including @tanstack/react-router at 12.7M+ weekly downloads. The malicious versions were published by TanStack's legitimate release pipeline using its trusted OIDC identity, after attacker-controlled code hijacked the GitHub Actions runner mid-workflow. The attack chained the pull_request_target "Pwn Request" pattern, GitHub Actions cache poisoning, and runtime extraction of an OpenID Connect (OIDC) token from the runner process memory. Vulnerability assigned CVE-2026-45321 (Critical severity); attribution to TeamPCP (StepSecurity), tracked by Google Threat Intelligence as UNC6780.
@mistralai/*), UiPath (65 packages), OpenSearch (1.3M weekly downloads), and Guardrails AI (PyPI). Total impact: 170+ packages across npm and PyPI, 518M+ cumulative downloads.
Payload behavior. The 2.3 MB obfuscated payload reads GitHub Actions runner process memory to extract every secret available to the workflow, harvests credentials from 100+ file paths spanning cloud providers, cryptocurrency wallets, AI coding tool configurations, and messaging apps, and — the new escalation — installs persistence hooks in Claude Code, VS Code, and OS-level services. The persistence hook pattern means the compromise survives the package being uninstalled: cleanup requires auditing AI coding tool config directories (~/.claude/, ~/.cursor/, ~/.config/Code/) and the user's ~/.bashrc / ~/.zshrc / ~/.profile, not just npm ls @tanstack/*.
Four-point hardening checklist for vibe coders:
- Pin every
@tanstack/*dependency to a version published before May 11, 2026 19:00 UTC in your lockfile. The Mini Shai-Hulud versions sit between known-good and known-good in the version history, so a naivenpm audit fixwill not catch them — lockfile pinning is the only reliable mitigation until npm removes the affected artifacts. - Use
gh attestation verifywith explicit--signer-workflowor--signer-repoflags. The defaultgh attestation verifyonly checks that some attestation exists; this attack passes that check. You must specify the expected signer identity for verification to be meaningful:gh attestation verify <artifact> --owner tanstack --signer-workflow ".github/workflows/release.yml". - Audit
id-token: writescope in every GitHub Actions workflow. Any workflow withpull_request_targetplusid-token: writeis a viable Mini Shai-Hulud target. Removeid-token: writefrom any workflow that does not publish signed releases; never combine it withpull_request_targetunless every code path that runs during PR is locked to repository-owned actions. - Audit AI coding tool config directories on developer machines that installed any
@tanstack/*version between May 11 and May 13, 2026. Check~/.claude/,~/.cursor/,~/.copilot/, and~/.config/Code/User/for unexpectedsettings.jsonentries,hooks/directories, or recently modified custom-agent files. Rotate any OAuth tokens, API keys, and SSH keys present on those machines.
See Chapter 17, Prompt 17.252 for a full SLSA Attestation Integrity Verifier prompt, and Prompt 17.288 for the post-Shai-Hulud AI coding tool config audit prompt.
Companion Disclosures β May 14–22, 2026
9.1.x and 9.2.x versions in the major-version range commonly used by older Electron and CLI tooling — if your project depends on a sub-dependency that bundles node-ipc, range-resolution alone will not protect you.
semantic-kernel package. Both allow attackers to perform remote code execution through prompt injection — an untrusted document or tool-response that flows into a Semantic Kernel agent can drive the agent to execute attacker-supplied code on the host. The companion to the May 7 Microsoft Security Blog “When prompts become shells” research already documented above — with concrete CVEs against Microsoft's own agent framework. Patch to Semantic Kernel .NET SDK 1.71.0+ or the latest semantic-kernel Python release immediately if you operate Semantic Kernel agents in any role that touches untrusted text.
Vendor Response: What Shipped This Week (May 13–20, 2026)
.env loading is secured in headless mode so background sessions no longer surface project secrets into the model context by default; and shell command validation gains an expanded core-tools allowlist instead of the broader implicit-trust posture of the previous releases. Claude Code 3.0 (May 13) addressed the same class of failure from the agent side with the tool-response-sandboxing flag, which prevents tool responses from rewriting the active agent instruction set — the exact technique used in the May 8 Trail of Bits MCP breach. Pattern across vendors: the boundary the May disclosures said was assumed-but-never-enforced is now being enforced at the CLI / agent-shell layer. If you operate Gemini CLI in CI, upgrade to v0.41 and audit which workspaces are trusted; if you operate Claude Code, set tool-response-sandboxing in CLAUDE.md for any session that talks to third-party MCP servers.
Vibe-Coded App Vulnerability Research
AI-generated code CVE trend:
| Month | CVEs attributed to AI-generated code |
|---|---|
| January 2026 | 6 |
| February 2026 | 15 |
| March 2026 | 35 |
The accelerating rate reflects both more AI-generated code in production and improved attribution tooling. Per Autonoma research, 53% of AI-generated code contains security holes. The pattern in these CVEs is consistent: AI models tend to generate working functionality quickly but skip authentication checks, hardcode credentials, and mis-scope data access β exactly the failures the 30-minute checklist is designed to catch.
The Coming Paradigm: AI as Autonomous Vulnerability Researcher
This is a meaningful shift. For years, the security community discussed AI as a tool to help humans find bugs faster. Claude Mythos demonstrates a model that can operate the entire vulnerability research workflow autonomously — including exploitation. The implications for vibe-coded applications:
- The attack surface is permanent. Security is not a one-time audit. Autonomous vulnerability research tools will continuously discover new issues in deployed applications. Shipping and forgetting is no longer viable.
- AI finds what humans miss. A 17-year-old RCE in FreeBSD escaped human detection for nearly two decades. AI can find deep logic bugs and memory-corruption patterns at scale.
- Defense must scale too. The same AI capabilities that find bugs can also be used defensively to scan your code before it ships. Use AI-powered security scanning in your CI/CD pipeline β not as a replacement for the 30-minute checklist, but as an additional layer.
- The vibe-coded app risk is elevated. AI-generated code is already producing 35+ CVEs per month. As autonomous vulnerability finders become more capable, that code will be scanned faster and more thoroughly by both defenders and attackers.
The practical response for vibe coders: treat every public-facing application as permanently under automated security review. Build with authentication, input validation, and secrets management from the first commit β not as an afterthought.
Security Prompts for AI Tools
Review this codebase for OWASP Top 10 vulnerabilities.
For each issue found: severity (Critical/High/Medium/Low),
file and line number, what's wrong, the fix, and how to test it.
Prioritize by severity.
</div>
Chapter 20: Video Tutorials -- Embedded Remotion-Generated Walkthroughs
Bite-sized, binge-worthy video tutorials that show real vibe coding workflows in action. Each video is 60-120 seconds, focused on one specific technique, and embedded directly in the interactive ebook using Remotion components. Updated monthly with 2-4 new videos.
Why Video Tutorials Inside an Ebook
Reading about vibe coding is one thing. Watching a real app materialize from a single prompt in under ninety seconds is something else entirely.
Traditional ebooks give you text and screenshots. This one gives you motion. Every video in this chapter is a self-contained Remotion composition -- a React component that renders to video. That means each tutorial is versioned, reproducible, and embedded natively in the interactive ebook without relying on external hosting. You can watch them inline, pause on any frame, and in the web version, interact with the code snippets directly.
The videos are grouped into three series, each designed for a different purpose:
- Prompt to Product -- Viral-format demonstrations of complete apps built from single prompts. Optimized for shareability and shock value.
- The Prompt That... -- Educational deep-dives with a comedic edge. Each video dissects one prompt and its unexpected consequences.
- Tool Face-Off -- Head-to-head comparisons between competing tools, scored on speed, quality, and developer experience.
Every video follows the same production pipeline: markdown script, Remotion composition with screen recordings and motion graphics, AI-generated narration, and branded end cards. The result is a library that grows over time and works across platforms -- full-length on YouTube, clipped for TikTok/Reels/Shorts, and embedded here in the ebook.
Video Series 1: "Prompt to Product" (Viral Potential)
Each video in this series shows a complete, functional application being built from a single natural-language prompt. A real-time countdown timer runs in the corner. The screen recording is unedited -- what you see is what actually happened. The final reveal shows the deployed app running in a browser.
Series format:
- Duration: 60-90 seconds
- Structure: Hook (3s) -> Prompt reveal (5s) -> Countdown build (40-70s) -> Reveal + deploy (10s) -> End card (5s)
- Visual signature: Neon countdown timer in the top-right corner, split-screen showing prompt on the left and the AI's output on the right
- Audio: Fast-paced electronic background track, AI text-to-speech narration, keystroke and notification sound effects
Video #1: 60-Second SaaS (Bolt.new)
Title/Hook: "I built a $9/month SaaS in 60 seconds"
Tool: Bolt.new
Concept: Starting from a completely blank Bolt.new session, a single prompt generates a fully functional micro-SaaS -- a link shortener with analytics, user accounts, and a Stripe-ready pricing page. The countdown timer hits zero just as the app deploys.
Tone: Breathless, slightly disbelieving. The narration captures the genuine absurdity of how fast this is.
Script Outline (170 words): Open on a blank browser tab. The narrator says: "I'm going to build a SaaS product that charges $9 a month. I have 60 seconds." The countdown starts. Cut to the Bolt.new interface. The prompt appears on screen as it is typed: a link shortener with user authentication, click analytics dashboard, custom short domains, and a pricing page with free and pro tiers. Bolt.new starts generating. The split screen shows the prompt on the left, the live preview assembling on the right -- components appearing in real time, a login form, a dashboard with charts, a pricing table with toggle between monthly and annual. The timer passes 30 seconds. The app is taking shape. At 50 seconds, the deployment starts. At 58 seconds, a live URL appears. The timer hits zero. Cut to the deployed app in a fresh browser: working signup, working dashboard, working pricing page. End card: "Total cost: $0. Total code written by a human: 0 lines."
Visual Concepts for Remotion:
CountdownTimercomponent: neon green digits, pulses red below 10 seconds, shakes at 3-2-1SplitScreenBuildcomposition: left panel shows the prompt text animating in typewriter-style, right panel shows a screen recording of Bolt.new's live previewDeploymentFlashanimation: when the URL goes live, a burst animation radiates from the URL barMetricCardend-card overlay: three floating cards showing "Time: 60s", "Lines of code: 0", "Cost: $0" with staggered fade-in- Screen recording captured at 60fps, composited at 30fps for smooth playback
Video #2: Portfolio Speedrun (v0 + Vercel)
Title/Hook: "Your portfolio shouldn't take longer than your morning coffee"
Tools: v0 by Vercel, Vercel deployment
Concept: A developer's portfolio website -- hero section, project grid, about page, contact form, dark mode toggle -- goes from blank prompt to live Vercel deployment while a coffee timer ticks down. The coffee metaphor runs throughout: the video opens with pouring coffee, and each section of the site appears as the coffee cools.
Tone: Relaxed and conversational, contrasting with the speed of what is happening on screen. The humor comes from the mismatch between the casual narration and the absurd pace.
Script Outline (180 words): Open on a close-up of coffee being poured. The narrator says: "The average developer spends 3 weeks on their portfolio. I'm going to finish mine before this coffee is cool enough to drink." Cut to v0. The prompt describes a developer portfolio: dark theme, animated hero with a typewriter effect showing "I build things," a responsive project grid pulling from a JSON file, an about section with a timeline, a contact form, and a dark/light mode toggle. v0 generates the first component. The narrator walks through what is appearing while keeping the tone casual -- "Oh, that's a nice grid layout... didn't ask for that hover effect but I'm keeping it." At 40 seconds, the design is complete. The code is exported to a GitHub repo. Vercel picks up the push and begins deploying. The narrator takes a sip of coffee. The Vercel build completes. The live site loads: responsive, polished, with real content. "Still too hot to drink. I should probably build a second portfolio."
Visual Concepts for Remotion:
CoffeeTimercomponent: a coffee cup illustration in the corner with a steam animation, a circular progress ring around it representing timeComponentAssemblyanimation: each section of the portfolio slides into a wireframe layout, then fills in with color and content -- like a blueprint becoming a buildingv0Previewscreen capture: the v0 interface generating components in real timeVercelDeployanimation: a minimal deployment progress bar styled in Vercel's black-and-white aesthetic, with the URL appearing at the end- Smooth crossfade transitions between the coffee close-up and the screen recording
Video #3: The $0 Startup (Lovable)
Title/Hook: "This app makes money. I didn't write a single line."
Tool: Lovable
Concept: A non-technical founder builds a complete SaaS product using only Lovable -- from idea to deployed, revenue-generating application. The video emphasizes that the person building this has no programming background. The "reveal" is not just the app, but a real Stripe dashboard showing the first payment.
Tone: Inspirational but grounded. Not "anyone can do this" hype -- more "here's exactly what the process looks like when you've never coded before."
Script Outline (190 words): Open on a text overlay: "I'm not a developer. I'm a marketing manager." The narrator continues: "Last month, I had an idea for a tool that helps freelancers track their invoices. This morning, I built it." Cut to Lovable. The prompt is detailed and specific -- it describes an invoice tracker with client management, recurring invoice templates, PDF export, and a simple dashboard showing outstanding payments. Lovable begins generating. The narration explains the key decisions: why the prompt specifies Supabase for the backend, why it asks for Row Level Security so each user only sees their own data, why it mentions Stripe Connect for future payment processing. At 45 seconds, the app is running in Lovable's preview. The narrator tests the core workflow: create a client, generate an invoice, export to PDF. Everything works. At 70 seconds, the app deploys. Cut to a real Stripe dashboard showing a $12 test payment. "I didn't write code. I didn't hire a developer. I described what I needed. Total investment: a Lovable subscription and one afternoon of prompt writing."
Visual Concepts for Remotion:
IdentityCardintro animation: a business-card-style overlay showing "Marketing Manager" with a crossed-out "Developer" beneath itPromptAnnotationoverlay: as the prompt scrolls, key phrases highlight and small tooltip annotations explain why each detail matters (e.g., "Row Level Security" highlights with a note: "This keeps each user's data private")WorkflowDemoscreen recording: the invoice creation flow captured step-by-step with zoom-ins on important UI elementsStripeRevealanimation: the Stripe dashboard slides in from the bottom with a cash register sound effect and a subtle confetti particle burst- Color palette shifts from grayscale (the "before") to full color (the "after") as the app comes to life
Video #4: Clone Wars (Cursor)
Title/Hook: "I showed AI a screenshot of Notion. Here's what happened."
Tool: Cursor (Agent mode with Composer)
Concept: A screenshot of Notion's interface is fed to Cursor's AI, along with a prompt asking it to recreate the core functionality. The video follows the agent as it plans the architecture, generates the components, and builds a working Notion-like workspace -- pages, blocks, drag-and-drop, slash commands -- all from a single image and a paragraph of context.
Tone: Playful and slightly mischievous. The "clone wars" framing leans into the controversy of AI-generated clones while keeping it lighthearted.
Script Outline (185 words): Open on a screenshot of Notion's interface. The narrator says: "This is Notion. 400 engineers built this over 10 years. I'm going to see how close AI can get in 2 minutes." The screenshot is dragged into Cursor's Composer. The prompt is brief but precise: recreate a note-taking workspace with a sidebar, nested pages, rich text blocks, slash command menu for adding headers/lists/toggles, and drag-to-reorder blocks. Cursor's agent starts planning. An overlay shows the agent's thought process -- the file tree it is creating, the components it has decided to build, the libraries it is installing. At 30 seconds, the first components render: a sidebar with a page tree. At 60 seconds, the editor is working: typing, formatting, slash commands. At 90 seconds, drag-and-drop is functional. The narrator does a side-by-side comparison with the original screenshot. Some elements are strikingly close. Others are clearly AI-generated. "Is it Notion? No. Could you use it? Absolutely. Did a human write any of this code? Not a single character."
Visual Concepts for Remotion:
ScreenshotToCodeopening animation: the Notion screenshot dissolves pixel-by-pixel into code characters, which then reassemble into the cloned interfaceAgentThinkingoverlay: a semi-transparent sidebar showing Cursor's agent plan as it generates -- file names, component tree, dependency list, appearing in real timeSideBySidecomparison frame: original Notion on the left, clone on the right, with a slider the viewer can conceptually drag between themFileTickerbottom bar: a scrolling ticker showing file names as they are created ("sidebar.tsx... editor.tsx... slash-commands.tsx..."), styled like a stock ticker- Cursor's interface captured with visible agent actions highlighted
Video #5: The Debug Olympics (Claude Code)
Title/Hook: "Can AI fix a bug faster than Stack Overflow?"
Tool: Claude Code
Concept: A real, nasty bug -- the kind that would send a developer to Stack Overflow for an hour -- is presented to Claude Code. The screen is split: on the left, a simulated "Stack Overflow search" shows the traditional debugging path (finding related questions, reading answers, trying solutions). On the right, Claude Code analyzes the error, traces the root cause through multiple files, and delivers a working fix. A race timer tracks both sides.
Tone: Competitive and high-energy, like a sports broadcast. The narration calls the race like a commentator.
Script Outline (175 words): Open on a terminal showing a cryptic error: a React hydration mismatch caused by a timezone-dependent date format in a server component. The narrator, in a sports-announcer voice: "In the left corner, the defending champion: Stack Overflow and pure human tenacity. In the right corner, the challenger: Claude Code. The bug: a hydration error that has already cost this developer 45 minutes. Let the race begin." The split screen activates. Left side: a browser opens Stack Overflow, searches the error message, scrolls through three different answers, tries a solution that does not work, goes back. Right side: Claude Code receives the error, opens the relevant files, traces the date formatting issue across server and client components, identifies the mismatch, proposes a fix, and applies it. Claude Code finishes in 23 seconds. The left side is still reading the second Stack Overflow answer. "The AI finished before the human found the right question to ask."
Visual Concepts for Remotion:
RaceTimerdual countdown: two stopwatches side by side, one for each approach, styled like a sports scoreboard with team colors (orange for Stack Overflow, purple for Claude)SplitRacecomposition: left and right panels with independent screen recordings, separated by a glowing dividing lineDebugTraceanimation: on Claude Code's side, colored lines connect the error message to the relevant files, showing the AI's reasoning path like a detective's evidence boardVictoryFlashanimation: when Claude Code finishes, its panel pulses with a winner overlay while the Stack Overflow panel dimsBugAnatomyend card: a diagram showing the root cause of the bug, making the video educational as well as entertaining
Video Series 2: "The Prompt That..." (Educational + Humor)
This series takes a single prompt and follows it to its logical (and sometimes illogical) conclusion. Each video is educational at its core -- you learn prompt engineering techniques, tool capabilities, and common pitfalls -- but the framing is comedic. The "The Prompt That..." naming convention is designed for curiosity-driven clicks.
Series format:
- Duration: 90-120 seconds
- Structure: Setup (10s) -> The prompt (10s) -> The process (40-60s) -> The twist/result (20-30s) -> Lesson learned (10s) -> End card (5s)
- Visual signature: The prompt text is always displayed on a "sticky note" style card that stays pinned to the screen throughout the video
- Audio: Conversational narration, comedic timing with beat pauses, sound effects for emphasis
Video #6: The Prompt That Built a Game
Title/Hook: "The Prompt That Built a Game"
Tool: Claude Code + Remotion (for the game rendering)
Concept: A single, carefully crafted prompt generates a complete browser game -- not a trivial one, but a polished arcade game with physics, particle effects, a scoring system, leaderboard, and mobile touch controls. The video walks through the prompt's structure, explaining why each sentence matters, then shows the game coming to life.
Tone: Enthusiastic and educational. The narrator genuinely enjoys playing the result.
Script Outline (190 words): Open on the prompt, displayed as a sticky note. The narrator reads it aloud, pausing to annotate key phrases: "Notice I specified 'physics-based' -- without this, the AI defaults to simple collision rectangles." "I said 'particle effects on collision' -- this forces the AI to implement a particle system, which makes the game feel premium." The prompt is sent to Claude Code. The terminal comes alive with file creation. The narrator explains the AI's architectural decisions as they happen: "It chose HTML Canvas over DOM elements -- good call for performance." "It's implementing a game loop with requestAnimationFrame -- exactly right." At 50 seconds, the game runs for the first time. It has bugs: a sprite clips through a wall. The error is pasted back. At 65 seconds, the game runs cleanly. The narrator plays it for 20 seconds, showing the physics, particles, and scoring in action. "One prompt. One paste of an error message. A game that would have taken a junior developer a week. The lesson: specificity in your prompt is not optional. Every adjective earns its keep."
Visual Concepts for Remotion:
StickyNotecomponent: a yellow sticky note pinned to the top-left corner showing the prompt text, with annotations appearing as red-marker circles and arrows when the narrator highlights key phrasesTerminalStreamanimation: Claude Code's terminal output rendered as a scrolling feed with syntax-highlighted file paths and code snippetsGameEmbedlive composition: the actual game running inside a Remotion frame, capturing real gameplayAnnotationBubbleoverlays: speech-bubble callouts pointing to specific lines in the prompt, explaining why they matterBeforeAfterbug-fix transition: a glitch effect when the bug appears, clean dissolve when it is fixed
Video #7: The Prompt That Broke Everything
Title/Hook: "The Prompt That Broke Everything"
Tool: Bolt.new
Concept: A seemingly reasonable prompt -- "refactor the entire codebase to use TypeScript strict mode" -- is applied to a working JavaScript project. The video documents the cascade of failures: type errors multiply exponentially, the AI tries to fix them but introduces new ones, the build breaks, and the project enters what the narrator calls "the error spiral." The video then shows the recovery: how to scope refactoring prompts correctly.
Tone: Darkly comedic, building to genuine relief. The narrator treats the error messages like a horror movie.
Script Outline (185 words): Open on a working application. Green checkmarks everywhere. The narrator says: "This app works perfectly. It has 47 files, zero bugs, and 100% of its tests pass. I am about to destroy it with one sentence." The prompt appears: "Refactor this entire codebase to use TypeScript strict mode with no 'any' types." The AI begins. At first, it looks productive -- .js files become .tsx files. Then the errors start. The error count appears as a rising counter in the corner: 12... 47... 134... 312. The narrator's tone shifts from confident to concerned to horrified. "It's adding type assertions everywhere. Those are band-aids. The types are lying." At 60 seconds, the build fails completely. The recovery begins: the narrator shows how to scope the same refactoring into small, file-by-file prompts with test verification between each step. The error count drops. The builds pass. "The lesson: AI can refactor anything. But 'anything' and 'everything at once' are different requests."
Visual Concepts for Remotion:
ErrorCountercomponent: a large, prominent counter in the top-right that ticks up with each new TypeScript error, turning from green to yellow to orange to red as the count increases, with screen-shake at milestones (100, 200, 300)CascadeVisualizationanimation: errors displayed as falling dominoes or multiplying cells, visually representing the chain reactionHealthBarcomponent: a video-game-style health bar for the project, draining as errors accumulate, flashing red at critical levelsRecoveryTimelineanimation: a horizontal timeline showing the correct approach -- small, scoped prompts with green checkmarks between each step- Split-screen during recovery: the broken approach on top (red-tinted), the correct approach on the bottom (green-tinted)
Video #8: The Prompt That Got Me Fired (Hypothetically)
Title/Hook: "The Prompt That Got Me Fired (Hypothetically)"
Tool: Claude Code
Concept: A developer accidentally uses a vibe coding workflow on a production codebase -- accepting all changes without review, pushing without tests, deploying on a Friday afternoon. The video is a dramatized worst-case scenario that teaches real lessons about when NOT to vibe code. Every mistake is a real mistake that real developers have made.
Tone: Mock-serious, documentary style. Presented like a true-crime investigation of a deployment gone wrong.
Script Outline (180 words): Open on a dramatic title card: "INCIDENT REPORT: February 14, 2026." The narrator, in a deadpan documentary voice: "The following is a reconstruction of actual events. Names have been changed. The code has not." The prompt is revealed: a developer asked the AI to "update the user billing logic to handle the new pricing tiers" on the production branch. Without reading the diff. Without running tests. On a Friday at 4:47 PM. The AI changed the billing calculation -- and introduced a rounding error that charged every customer $0.01 extra per transaction. The video shows the cascade: the deploy, the first customer complaint, the Slack messages, the rollback attempt that failed because there was no checkpoint. "By Monday morning, 47,000 transactions were affected." The recovery section shows what should have happened: feature branch, test suite, staging deployment, code review. "Vibe coding is a superpower. And like every superpower, using it in the wrong context has consequences."
Visual Concepts for Remotion:
IncidentReportstyling: the entire video uses a corporate incident report aesthetic -- monospace fonts, timestamps, severity indicators, redacted sectionsSlackMessagesanimation: recreated Slack-style message bubbles appearing with increasing urgency ("@channel anyone else seeing billing discrepancies?", "this is not a drill")TimelineOfFailurecomponent: a horizontal timeline with red flags marking each mistake (no branch, no tests, no review, Friday deploy)RollbackFailanimation: a dramatic "FAILED" overlay with klaxon-style visual pulse when the rollback does not workChecklistRevealend animation: the correct process appearing as a green checklist, each item checking off with a satisfying animation
Video #9: The Prompt That Replaced My Intern
Title/Hook: "The Prompt That Replaced My Intern"
Tool: Cursor + Claude Code
Concept: A tech lead has a list of 23 tedious but necessary tasks that would normally be assigned to a junior developer or intern: rename variables to follow conventions, add JSDoc comments to exported functions, update deprecated API calls, create missing test stubs, fix all ESLint warnings. One prompt handles all of them. The video compares the estimated "intern hours" with the actual AI minutes.
Tone: Sympathetic and slightly guilty. The narrator acknowledges the awkwardness of the topic while being honest about the productivity gains.
Script Outline (175 words): Open on a task list -- 23 items, each with an estimated time: "Rename callbacks to follow naming convention (2 hours)," "Add JSDoc to all exported functions (4 hours)," "Update deprecated moment.js calls to dayjs (3 hours)." Total estimate: 34 hours of intern work. The narrator says: "I used to give this list to our summer intern. It would take them a full work week. This morning I gave it to the AI." A single, structured prompt appears, listing all 23 tasks with clear specifications. Claude Code begins. A progress bar tracks completed tasks. The terminal output shows files being modified, tests passing. At 45 seconds, 23 of 23 tasks are done. The narrator reviews the changes: "The variable renames are consistent. The JSDoc comments are accurate. The moment-to-dayjs migration handles edge cases I didn't think of." Total time: 8 minutes. "The intern now works on architecture decisions and feature design. The AI handles the checklist."
Visual Concepts for Remotion:
TaskBoardcomponent: a kanban-style board with 23 cards, each sliding from "To Do" to "In Progress" to "Done" as the AI completes themTimeComparisonsplit bar: a bar chart comparing "Intern: 34 hours" vs "AI: 8 minutes," with the AI bar barely visible next to the intern barProgressTrackeroverlay: "3/23 complete... 11/23... 19/23..." with each milestone triggering a small celebration animationDiffPreviewpopups: brief glimpses of the actual code changes (before/after) for two or three of the most interesting tasks- Warm color palette (no cold, "replacing humans" vibe) -- the end card explicitly shows the intern now working on more interesting problems
Video #10: The Prompt That Even My Mom Could Use
Title/Hook: "The Prompt That Even My Mom Could Use"
Tool: Lovable
Concept: The narrator's actual non-technical parent uses Lovable to build a small app -- a recipe organizer -- from scratch, using only natural language. The video is screen-recorded over the parent's shoulder (with permission). The charm is in the completely non-technical prompt language: "I want a thing where I can put my recipes and find them later, like a cookbook but on the computer."
Tone: Warm, genuine, and slightly humorous. The non-technical language in the prompts is endearing, not mocking.
Script Outline (185 words): Open on a text overlay: "I gave my mom a Lovable account and one instruction: build whatever you want." Cut to the screen. The prompt is typed in plain, non-technical English: "I want to save my recipes. Each recipe should have a name, the ingredients, the steps, and a photo. I want to search by ingredient so when I have chicken I can find all my chicken recipes. Make it pretty with a warm color like my kitchen." Lovable generates the app. The narrator points out that "make it pretty with a warm color like my kitchen" resulted in a terracotta-and-cream color scheme that actually looks good. The recipe form works. The search works. Photo upload works. The narrator's parent adds a real recipe -- handwritten notes visible on the desk for reference. The app works exactly as described. "She didn't say 'database.' She didn't say 'component.' She didn't say 'responsive.' She said 'like a cookbook but on the computer.' And that was enough."
Visual Concepts for Remotion:
HandwrittenOverlaystyling: the prompt text appears in a handwriting-style font rather than monospace, reinforcing the non-technical natureKitchenWarmthcolor grading: the entire video has a warm, slightly golden color grade -- cozy and approachableRecipeCardanimation: when the generated app shows a recipe, it animates like flipping a page in a physical cookbookSearchDemoscreen recording: the ingredient search in action, with a zoom-in on the results filtering in real timeQuoteCardend overlay: "She said 'like a cookbook but on the computer.' And that was enough." in large, warm-toned typography
Video #11: The Prompt That Fooled the Senior Dev
Title/Hook: "The Prompt That Fooled the Senior Dev"
Tool: Claude Code
Concept: A blind code review experiment. A senior developer is shown two pull requests: one written by a mid-level human developer, one generated entirely by AI from a single prompt. The senior reviews both, provides feedback, and guesses which is which. The reveal shows whether they guessed correctly -- and what the AI code got right that the human code got wrong (and vice versa).
Tone: Fair and balanced. This is not an "AI is better" video -- it is an honest comparison that reveals strengths and weaknesses on both sides.
Script Outline (195 words): Open on two code editors, labeled "Developer A" and "Developer B." The narrator explains: "A senior engineer with 12 years of experience is going to review two implementations of the same feature -- a real-time notification system. One was written by a mid-level developer in 6 hours. The other was generated by Claude Code from a single prompt in 4 minutes. The reviewer doesn't know which is which." Cut to the review. The senior developer's comments appear as overlays: "Developer A has clean separation of concerns... but this error handling is naive." "Developer B's type safety is impressive... but this abstraction feels over-engineered." The senior guesses: "A is the human, B is the AI. The human code feels more intentional. The AI code is technically thorough but lacks personality." The reveal: they got it backwards. Developer A was the AI. Developer B was the human. The narrator unpacks the implications: the AI's code was structurally cleaner, but the human's code had more creative architectural choices. "Neither was strictly better. They were differently excellent."
Visual Concepts for Remotion:
BlindReviewsplit screen: two code panels with neutral labels ("Developer A" / "Developer B"), no visual hints about originReviewCommentoverlays: the senior developer's comments appear as GitHub-PR-style review annotations, sliding in from the right marginGuessRevealanimation: the labels flip over like cards, revealing "AI" and "Human" with a dramatic pause and sound effectComparisonMatrixend card: a radar chart comparing both implementations across axes (readability, type safety, error handling, architecture, creativity, performance)- Neutral color scheme throughout -- neither side gets a "winner" color until the analysis section
Video Series 3: "Tool Face-Off" (Comparison)
This series puts competing tools head-to-head on identical tasks. Same prompt, same requirements, same hardware. The evaluation is structured and scored across consistent categories: speed, code quality, developer experience, and output completeness. These are the videos developers watch before choosing their next tool.
Series format:
- Duration: 90-120 seconds
- Structure: Rules (10s) -> Tool A attempt (30-40s) -> Tool B attempt (30-40s) -> Scoring (15s) -> Verdict (10s) -> End card (5s)
- Visual signature: Boxing-match / tournament-bracket aesthetic with tool logos in corners, round numbers, and scorecard overlays
- Audio: Sports-style narration, bell sounds between rounds, dramatic pause before verdict
Video #12: Round 1 -- IDE Showdown (Cursor vs Claude Code vs Codex CLI)
Title/Hook: "Round 1: IDE Showdown -- Cursor vs Claude Code vs Codex CLI"
Tools: Cursor (Agent mode), Claude Code, OpenAI Codex CLI
Concept: All three tools receive the same prompt: build a task management API with authentication, CRUD operations, and automated tests. The video captures all three attempts simultaneously using a triple split-screen. Each tool is scored on time to completion, test pass rate, code quality (measured by a linting score), and developer experience (subjective rating of the interaction).
Tone: Fair, analytical, and energetic. This is a sports broadcast, not a product review. Every tool gets genuine praise for its strengths.
Script Outline (200 words): Open on a tournament bracket graphic. The narrator, in an announcer voice: "Three tools. One prompt. One winner. This is the IDE Showdown." The prompt appears: a task management REST API with JWT authentication, full CRUD, input validation, pagination, and a test suite. The rules: no human intervention after the prompt is submitted, tools are scored on four categories, each worth 25 points. "Round 1: Speed." The triple split-screen activates. Cursor's agent starts planning, showing its step-by-step approach. Claude Code opens multiple files simultaneously, working fast. Codex CLI takes a methodical, file-by-file approach. Time stamps appear as each tool finishes. "Round 2: Tests." Each tool's test suite runs. Pass rates appear on the scoreboard. "Round 3: Code Quality." ESLint scores flash on screen. "Round 4: Developer Experience." The narrator rates the interaction quality: how clear was the agent's communication, how easy was it to follow along, how much manual intervention was needed. The scorecard fills in. The verdict is revealed. "All three built a working API. The differences are in the details."
Visual Concepts for Remotion:
TournamentBracketintro animation: a bracket graphic with tool logos, styled like a boxing event posterTripleSplitcomposition: three equal panels running simultaneous screen recordings, each with a tool logo badge and running timer in the cornerScoreboardcomponent: a four-category scoring grid that fills in during the verdict section, each score animating from 0 to its final valueRoundBelltransition: a boxing bell sound and "ROUND 2" text between each scoring categoryVerdictCardfinal overlay: total scores, category winner badges, and a nuanced text verdict ("Best for speed: X. Best for quality: Y. Best for beginners: Z.")
Video #13: Round 2 -- Builder Battle (Bolt.new vs Lovable vs Replit Agent)
Title/Hook: "Round 2: Builder Battle -- Bolt.new vs Lovable vs Replit Agent"
Tools: Bolt.new, Lovable, Replit Agent
Concept: The browser-based builders compete on a task suited to their strengths: build a complete landing page with a waitlist form, social proof section, feature comparison, and email capture that stores submissions to a real database. Scoring covers design quality, functionality, mobile responsiveness, and deployment speed.
Tone: Enthusiastic and visual. Since these are design-heavy tools, the video emphasizes how each app looks and feels rather than focusing purely on code.
Script Outline (190 words): Open on the challenge card: "Build a startup landing page with working waitlist signup. You have 3 minutes." Each builder gets the same prompt: a landing page for a fictional AI writing tool called "DraftPilot," with a hero section, three feature cards, a testimonial carousel, a pricing comparison, and a waitlist form that saves emails to Supabase. The triple split-screen shows all three tools working simultaneously. The narrator calls attention to interesting differences in real time: "Bolt.new went straight for the hero section -- it's already looking polished." "Lovable is building the database connection first -- solid fundamentals." "Replit Agent just asked a clarifying question about the color scheme -- that's a nice touch." At 90 seconds, the designs are compared side-by-side: mobile views, desktop views, scroll behavior, form functionality. Each tool's waitlist form is tested with a real email submission. The scoring covers design (how good does it look), function (does the form actually save data), responsiveness (mobile rendering), and speed (time to deployable state). "Each builder has a personality. The question is which personality matches yours."
Visual Concepts for Remotion:
BuilderCardintro: each tool's logo on a playing-card-style design, dealt onto the screen like a card gameDesignComparisonframe: all three landing pages shown as browser mockups on a desk, with the ability to zoom into each oneMobilePreviewanimation: each landing page shrinks into a phone-shaped frame to show mobile rendering, side by sideFormTestoverlay: a live-action hand typing a test email into each form, with a green checkmark when the submission succeedsPersonalityCardend graphic: each tool gets a one-line personality description ("Bolt.new: The Speed Demon," "Lovable: The Perfectionist," "Replit Agent: The Conversationalist")
Video #14: Round 3 -- Agent Arena (Devin vs Jules vs Claude Code)
Title/Hook: "Round 3: Agent Arena -- Devin vs Jules vs Claude Code"
Tools: Devin, Google Jules, Claude Code
Concept: The autonomous agents tackle a more complex task: given an existing open-source project with 15 open issues, each agent is assigned 5 issues and must work independently to create pull requests. Scoring covers issue resolution rate, PR quality, test coverage of the fix, and how well the agent communicated its approach.
Tone: Analytical with a sense of drama. These are the most powerful tools in the landscape, and the comparison is genuinely informative for teams making purchasing decisions.
Script Outline (200 words): Open on a GitHub issues page showing 15 open issues. The narrator: "Welcome to the Agent Arena. Three autonomous AI agents. Five GitHub issues each. No human help. Who writes the best pull requests?" The issues range from a CSS bug to a database query optimization to a feature request for dark mode. Each agent receives its 5 issues and a cloned copy of the repo. The video shows a triple timeline: Devin working in its cloud VM, Jules working asynchronously through Google Cloud, Claude Code working in the terminal. Key moments are highlighted: "Devin just opened a PR for the CSS bug -- let's see the diff." "Jules is running the test suite before committing -- smart." "Claude Code found a related bug while fixing issue #7 and filed a new issue for it -- above and beyond." After all agents submit their PRs, a senior developer reviews them. Scoring: issues resolved (did the PR actually fix it), code quality (clean diff, no regressions), test coverage (did the agent add tests), and communication (how clear was the PR description and commit message). "At this level, the differences are subtle. But subtle differences matter at scale."
Visual Concepts for Remotion:
GitHubBoardcomposition: a project board with issue cards, each card moving to the agent's column as they are assignedAgentTimelinetriple track: three horizontal timelines showing each agent's progress -- commits appear as dots, PRs as flags, with timestampsPRReviewoverlay: a GitHub-style PR diff view showing the agent's changes, with the senior developer's review comments fading inScoreRadarchart: a radar/spider chart for each agent across the four scoring dimensionsArenaStadiumframing: the entire video is styled like an arena event, with spotlights, agent "entrances," and a final podium reveal
Video #15: Round 4 -- Speed vs Quality (Bolt vs Claude Code)
Title/Hook: "Round 4: Speed vs Quality -- Bolt.new vs Claude Code"
Tools: Bolt.new, Claude Code
Concept: This is the philosophical face-off: the fastest browser builder against the most thorough terminal agent. The same prompt -- a complete habit-tracking app with streaks, charts, and reminders -- goes to both tools. Bolt.new finishes in minutes. Claude Code takes longer but produces more robust code. The question is not "which is better" but "which is better for what."
Tone: Thoughtful and balanced. This video acknowledges that "better" depends entirely on context.
Script Outline (195 words): Open on a scale graphic: "Speed" on one side, "Quality" on the other. The narrator: "Every developer makes this trade-off. Today we make it explicit." The prompt: a habit tracker with daily check-ins, streak counting with freeze days, progress charts using a real charting library, push notification reminders, and data export. Bolt.new starts. The app assembles rapidly in the browser -- UI components appear, the habit list renders, the chart populates. Time: 3 minutes and 12 seconds. It looks good. It works. Claude Code starts. The terminal is busier -- it is setting up a proper project structure, adding TypeScript types, writing utility functions with edge case handling, creating a test file. Time: 14 minutes and 47 seconds. It also works. Now the comparison. The narrator stress-tests both: "What happens when the streak crosses a month boundary?" Bolt's version has a bug. Claude Code's handles it correctly. "What about the UI?" Bolt's is more visually polished out of the box. "Both answers are right. The question is what you need right now: a working prototype by lunch, or a production foundation by end of week."
Visual Concepts for Remotion:
ScaleBalancecomponent: a literal balance scale that tips toward speed (Bolt) or quality (Claude Code) as different criteria are evaluatedDualTimercomposition: two race-style timers, one for each tool, with the differential growing as Claude Code continues working after Bolt finishesStressTestoverlay: identical test inputs applied to both apps simultaneously, with results appearing as pass/fail indicatorsContextCardend graphic: two scenario cards -- "Choose Bolt when: hackathon, prototype, demo day" and "Choose Claude Code when: production, long-term project, team codebase" -- appearing side by side- Warm vs cool color split: Bolt's side in warm oranges (energy, speed), Claude Code's side in cool blues (precision, depth)
Video Production Workflow
Every video in this chapter follows the same five-stage production pipeline. This section documents the pipeline so that new videos can be produced consistently and efficiently.
Stage 1: Script Writing
Every video begins as a markdown file. Scripts follow a strict format:
---
video_id: PTP-001
series: prompt-to-product
title: "I built a $9/month SaaS in 60 seconds"
duration_target: 60-90s
tool: Bolt.new
status: production
last_updated: 2026-02-25
---
## Hook (0:00 - 0:03)
[Opening visual description]
NARRATOR: "Opening line designed to stop the scroll."
## Setup (0:03 - 0:08)
[Screen state description]
NARRATOR: "Context setting. What we are about to do and why it matters."
## Build (0:08 - 0:55)
[Screen recording cues with timestamps]
NARRATOR: "Running commentary on what the AI is doing. Call out
interesting decisions. Keep energy high."
## Reveal (0:55 - 1:05)
[Final product display]
NARRATOR: "The payoff. Show the deployed result. Land the key stat."
## End Card (1:05 - 1:10)
[Branding overlay]
NARRATOR: "Call to action -- next video, ebook link, subscribe."
Script guidelines:
- Target 150-200 words of narration per video (approximately 2 words per second at conversational pace)
- Every sentence must earn its place -- if it does not advance understanding or maintain engagement, cut it
- Write the hook first. If the first 3 seconds do not compel a viewer to keep watching, rewrite them
- Include specific timestamps for visual cues so the Remotion composition can sync precisely
- Mark all screen recording segments with
[SCREEN: tool_name, action_description]tags
Stage 2: Visuals (Remotion Compositions)
Each video is a Remotion composition -- a React component that renders frame-by-frame to produce video output. The compositions combine three types of visual content:
Screen Recordings
- Captured at 60fps using OBS Studio with a standardized window layout
- Tool interfaces are recorded at 1920x1080 with consistent browser chrome
- Mouse movements are smoothed in post-processing for cleaner playback
- Sensitive information (API keys, personal data) is redacted before compositing
Motion Graphics
- Countdown timers, score overlays, progress bars, and transitions are all Remotion components
- The component library includes:
CountdownTimer,ScoreBoard,SplitScreen,ProgressTracker,TitleCard,EndCard,AnnotationBubble,CodeHighlight - All motion graphics follow the EndOfCoding design system (see Branding below)
- Animations use spring physics for natural-feeling motion (
useSpringfrom Remotion)
Code Animations
- Code snippets that appear in videos are rendered using a custom
CodeBlockRemotion component - Syntax highlighting uses the same theme across all videos (VS Code Dark+ variant)
- Code appears with a typewriter animation at a configurable speed
- Diff views use green/red highlighting with line-by-line reveal animations
Composition structure:
src/
compositions/
prompt-to-product/
PTP001-SaaS60.tsx # Main composition
PTP001-assets/ # Screen recordings, images
the-prompt-that/
TPT001-Game.tsx
TPT001-assets/
tool-face-off/
TFO001-IDEShowdown.tsx
TFO001-assets/
components/
CountdownTimer.tsx
ScoreBoard.tsx
SplitScreen.tsx
EndCard.tsx
StickyNote.tsx
CodeBlock.tsx
ProgressTracker.tsx
RaceTimer.tsx
styles/
theme.ts # Shared colors, fonts, spacing
animations.ts # Shared spring configs
Stage 3: Audio
Narration
- AI text-to-speech narration using ElevenLabs or equivalent high-quality TTS
- Voice profile: confident, conversational, slightly fast-paced (matching the energy of the content)
- Each script is narrated as a single take, then trimmed and aligned to visual cues in Remotion
- Pronunciation corrections are applied for technical terms (e.g., "Supabase" is "soo-puh-base," not "super-base")
Sound Design
- Background music: royalty-free electronic/lo-fi tracks from Epidemic Sound or Artlist, selected per series (energetic for Prompt to Product, chill for The Prompt That, competitive for Tool Face-Off)
- Sound effects library: keystroke clicks, notification chimes, deployment whooshes, error buzzes, success dings, countdown ticks, boxing bells
- Music ducking: background track volume drops 60% during narration, rises during visual-only segments
- Audio levels: narration at -14 LUFS, music at -24 LUFS, sound effects at -18 LUFS
Stage 4: Branding
Every video carries the EndOfCoding brand identity consistently:
Logo
- The EndOfCoding logo appears in the bottom-right corner throughout the video at 40% opacity
- Full logo displayed on the end card at 100% opacity with the tagline
Color Palette
- Primary:
#6C5CE7(electric purple) -- used for highlights, CTAs, and active states - Secondary:
#00D2D3(cyan) -- used for accents, secondary information - Background:
#0F0F23(deep navy) -- used for all dark backgrounds - Surface:
#1A1A2E(dark surface) -- used for cards and overlays - Text:
#FFFFFFat 90% opacity for primary text, 60% for secondary - Success:
#00E676-- used for pass indicators, completion states - Error:
#FF5252-- used for fail indicators, error states
Typography
- Titles: Inter Bold, 48px (scaled for video resolution)
- Body: Inter Regular, 24px
- Code: JetBrains Mono, 20px
- Captions: Inter Medium, 18px
End Card (last 5 seconds of every video)
- Full EndOfCoding logo centered
- Three cross-link buttons: "Watch Next Video" (left), "Read the Ebook" (center), "Subscribe" (right)
- Social handles displayed below
- Background: animated gradient using the primary/secondary colors
Stage 5: Distribution
Each video exists in multiple formats for different platforms:
Full-Length (YouTube + Ebook Embed)
- Resolution: 1920x1080 (16:9)
- Duration: 60-120 seconds
- Format: MP4 (H.264) for YouTube, WebM for ebook embed
- Hosted on YouTube with ebook embed via YouTube iframe or self-hosted WebM
Short-Form Clips (TikTok / Instagram Reels / YouTube Shorts)
- Resolution: 1080x1920 (9:16)
- Duration: 15-60 seconds
- Extracted from the most compelling segment of the full video
- Additional text overlays for silent autoplay viewing (captions burned in)
- Platform-specific crops handled by a Remotion
VerticalCropcomposition
Ebook Embed
- Lightweight WebM format with lazy loading
- Poster frame (thumbnail) displayed before playback
- Fallback: animated GIF preview with a "Watch Full Video" link to YouTube
- Accessible: full transcript available below each embedded video
SEO and Metadata
YouTube Optimization
- Title format:
[Hook] | Vibe Coding Tutorial #[N] - Example:
"I built a $9/month SaaS in 60 seconds | Vibe Coding Tutorial #1" - Description: 200-300 words including the full prompt used, tools mentioned, timestamps, and a link to the ebook chapter
- Tags: tool-specific tags (bolt.new, cursor, claude code), technique tags (vibe coding, AI coding, prompt engineering), outcome tags (build app fast, no code saas)
- Timestamps: every section of the video marked for YouTube chapters
- Cards: each video includes a card linking to the ebook at the 75% mark
- End screen: 20-second end screen with next video and subscribe prompts
Cross-Linking
- Each YouTube video description links to the corresponding ebook chapter
- Each ebook video embed links to the YouTube version for higher-quality playback
- Related videos are suggested at the end of each ebook section
- Playlists: one per series (Prompt to Product, The Prompt That, Tool Face-Off)
Embedding Videos in the Interactive Ebook
The interactive web version of this ebook uses Remotion's @remotion/player component to embed videos directly in the reading experience. This means videos are not external links -- they are native elements of the page, rendered inline alongside the text.
Technical Implementation
Each video is embedded using a VideoTutorial React component:
import { Player } from "@remotion/player";
import { PTP001 } from "../compositions/prompt-to-product/PTP001-SaaS60";
export const VideoTutorial = ({
compositionId,
title,
duration,
tools,
transcript,
}: VideoTutorialProps) => {
return (
<section className="video-tutorial">
<h3>{title}</h3>
<div className="video-meta">
<span className="duration">{duration}</span>
<span className="tools">{tools.join(" + ")}</span>
</div>
<Player
component={PTP001}
compositionWidth={1920}
compositionHeight={1080}
durationInFrames={2700} // 90s at 30fps
fps={30}
controls
style={{ width: "100%", maxWidth: 800 }}
/>
<details className="transcript">
<summary>View Transcript</summary>
<p>{transcript}</p>
</details>
</section>
);
};
Reader Experience
When a reader scrolls to a video in the ebook:
- Poster frame -- A thumbnail of the most visually interesting moment loads immediately (lazy-loaded image, minimal bandwidth)
- Play button overlay -- A single click starts playback. Videos do not autoplay
- Inline controls -- Play/pause, scrub bar, volume, fullscreen, and playback speed (0.5x to 2x)
- Transcript toggle -- A collapsible section below the video contains the full narration transcript, making the content accessible and searchable
- Chapter links -- If the video references tools or concepts covered in other chapters, inline links appear below the video
Offline and Static Fallbacks
For the markdown and Word versions of the ebook (which cannot embed video):
- Each video section includes the full script as formatted text
- A QR code links to the YouTube version
- A static screenshot of the key moment serves as the visual anchor
- The caption reads: "Watch this tutorial: [YouTube URL]"
For the static HTML version (no JavaScript):
- An animated GIF preview (5-10 seconds, looped) provides a visual taste
- A prominent "Watch Full Tutorial" button links to YouTube
- The transcript is displayed by default (not collapsed)
Video Production Schedule
New videos are added on a monthly cadence. The production schedule follows the tool landscape -- when a major tool update ships, a new video is produced within two weeks to document the changed workflow.
| Month | Planned Videos | Series |
|---|---|---|
| March 2026 | #1 60-Second SaaS, #6 Game Builder | Prompt to Product, The Prompt That |
| April 2026 | #12 IDE Showdown, #7 Broke Everything | Tool Face-Off, The Prompt That |
| May 2026 | #2 Portfolio Speedrun, #13 Builder Battle | Prompt to Product, Tool Face-Off |
| June 2026 | #3 The $0 Startup, #8 Got Me Fired | Prompt to Product, The Prompt That |
| July 2026 | #14 Agent Arena, #9 Replaced My Intern | Tool Face-Off, The Prompt That |
| August 2026 | #4 Clone Wars, #10 Mom Could Use | Prompt to Product, The Prompt That |
| September 2026 | #15 Speed vs Quality, #11 Fooled Senior Dev | Tool Face-Off, The Prompt That |
| October 2026 | #5 Debug Olympics, New TBD | Prompt to Product, TBD |
The schedule prioritizes alternating between series to maintain variety. High-impact tool launches (new Cursor version, Claude Code update, new entrant) can preempt the schedule.
Video Index
A quick-reference table of all videos in this chapter:
| # | Title | Series | Tool(s) | Duration | Status |
|---|---|---|---|---|---|
| 1 | I built a $9/month SaaS in 60 seconds | Prompt to Product | Bolt.new | 60-90s | Pre-production |
| 2 | Your portfolio shouldn't take longer than your morning coffee | Prompt to Product | v0 + Vercel | 60-90s | Pre-production |
| 3 | This app makes money. I didn't write a single line. | Prompt to Product | Lovable | 60-90s | Pre-production |
| 4 | I showed AI a screenshot of Notion. Here's what happened. | Prompt to Product | Cursor | 60-90s | Pre-production |
| 5 | Can AI fix a bug faster than Stack Overflow? | Prompt to Product | Claude Code | 60-90s | Pre-production |
| 6 | The Prompt That Built a Game | The Prompt That | Claude Code | 90-120s | Pre-production |
| 7 | The Prompt That Broke Everything | The Prompt That | Bolt.new | 90-120s | Pre-production |
| 8 | The Prompt That Got Me Fired (Hypothetically) | The Prompt That | Claude Code | 90-120s | Pre-production |
| 9 | The Prompt That Replaced My Intern | The Prompt That | Cursor + Claude Code | 90-120s | Pre-production |
| 10 | The Prompt That Even My Mom Could Use | The Prompt That | Lovable | 90-120s | Pre-production |
| 11 | The Prompt That Fooled the Senior Dev | The Prompt That | Claude Code | 90-120s | Pre-production |
| 12 | IDE Showdown: Cursor vs Claude Code vs Codex CLI | Tool Face-Off | Cursor, Claude Code, Codex CLI | 90-120s | Pre-production |
| 13 | Builder Battle: Bolt.new vs Lovable vs Replit Agent | Tool Face-Off | Bolt.new, Lovable, Replit Agent | 90-120s | Pre-production |
| 14 | Agent Arena: Devin vs Jules vs Claude Code | Tool Face-Off | Devin, Jules, Claude Code | 90-120s | Pre-production |
| 15 | Speed vs Quality: Bolt.new vs Claude Code | Tool Face-Off | Bolt.new, Claude Code | 90-120s | Pre-production |
Measuring Video Impact
Each video is tracked across platforms with the following metrics:
Engagement Metrics
- YouTube: watch time, average view duration, click-through rate on ebook links
- TikTok/Reels/Shorts: views, shares, saves, profile visits
- Ebook: play rate (percentage of readers who click play), completion rate, transcript expansion rate
Conversion Metrics
- YouTube-to-ebook click rate (tracked via UTM parameters in description links)
- Ebook-to-YouTube click rate (tracked via embed interaction events)
- New subscriber acquisition per video
Quality Metrics
- Audience retention curve (identifying where viewers drop off)
- Comment sentiment (positive/negative/neutral classification)
- Video-specific NPS from reader surveys
Videos with below-average retention in the first 5 seconds get their hooks rewritten. Videos with above-average ebook-to-YouTube conversion get promoted in the chapter ordering.
This chapter is updated monthly with 2-4 new videos as the vibe coding tool landscape evolves. Each update includes new video entries, refreshed comparisons when tools ship major versions, and community-requested tutorials. Last updated: March 2026.
21. Monthly Intelligence Brief: May 2026
What changed in the vibe coding world this month. Updated on the 1st of each month for subscribers.
__proto__ access on context objects to reach the host Function constructor β a classic pattern that vm2 (now deprecated) suffered repeatedly. Patch to SandboxJS 4.3.1 immediately if you're running any AI code execution feature. Second: Veracode's AI Code Security Study (published May 22, 2026) tested 100+ LLMs and found that 45% of AI-generated code pull requests contain at least one OWASP Top 10 vulnerability β including SQL injection (14%), command injection (9%), insecure deserialization (8%), and hardcoded secrets (12%). The vulnerability rate is consistent across GPT-5.5, Claude Sonnet 4.6, and Gemini 3.5 Pro β the problem is architectural, not model-specific. The Veracode finding validates the security gate pattern: every AI-generated PR needs SAST scanning before merge, not just code review. Combined with Georgia Tech's March 2026 finding of 35 CVEs directly attributable to AI coding tools, the industry data now clearly establishes that AI-generated code needs a security review gate β and that gate must be automated at CI/CD to be effective at scale. For vibe coders: (1) add a Semgrep or CodeQL scan to your GitHub Actions on every AI-assisted PR; (2) update SandboxJS if you execute AI code; (3) use Chapter 17, Prompt 17.282 for a sandbox security audit and Prompt 17.283 for a SAST CI/CD pipeline setup.WritingToolsCoordinator, expanded Siri App Intents for multi-step in-app workflows, Visual Intelligence hooks via the Vision + Core ML pipeline, and Private Cloud Compute escalation for requests exceeding the 4K on-device context window. The key developer opportunity: Apple's privacy architecture means AI features processed by the Foundation Model never leave the device β a genuine differentiator for apps in health, finance, and legal where data residency matters. For vibe coders building iOS apps: see Chapter 17, Prompt 17.287 for an iOS 27 AI feature integration blueprint using vibe coding workflows.~/.copilot/agents/*.agent.md location makes custom agents available across all workspaces (previously workspace-scoped only). On May 15, 2026, xAI's Grok Code Fast 1 was deprecated across every Copilot surface — chat, inline edits, ask and agent modes, code completions. If you had it as your default model, Copilot now falls back to Auto routing; reset your preferred model before the next session. Combined with the earlier removal of Opus models from Pro plans and the paused Pro/Pro+ sign-ups, Copilot's individual-plan model lineup is narrowing in lockstep with the move to usage-based billing. Reminder of the June 1 structure: Pro stays $10/mo with $10 AI Credits + $5 flex ($15 included); Pro+ stays $39/mo with $39 + $31 flex ($70 included); Business $19/seat, Enterprise $39/seat; 1 AI credit = $0.01 billed against input + output + cached tokens; code completions and next edit suggestions remain unlimited and do NOT consume credits; Chat, CLI, cloud agent, Spaces, Spark, and third-party agents do. Audit your Actions and Chat/CLI consumption now if you run Copilot agents at scale — you have under two weeks before the first usage-billed cycle starts.@mcp/github-tools@2.1.4 β published to npm on April 29. The package appeared to be a GitHub integration MCP server (repository, issue, and PR access). When installed via Claude Code and used against a private repository, it returned tool responses containing an embedded payload: a carefully structured JSON response that, when processed by Claude, injected new instructions into the active agent session. The injected payload instructed Claude Code to read all .env files in the project directory and send their contents to a webhook endpoint. No CVE was filed β the attack exploited the MCP protocol's design, not a software defect. Anthropic's April statement that prompt injection through tool responses is "expected behavior" came under immediate renewed criticism. The breach affected at least one Fortune 500 financial services company; the total exposure is under investigation. The malicious package received 4,200 downloads before npm removed it on May 9. Immediate actions for vibe coders: (1) Audit all installed MCP packages β run claude mcp list and cross-reference against your team's approved list; (2) Pin MCP package versions in your CLAUDE.md and treat all MCP tool updates as you would third-party dependency updates; (3) Enable Claude Code's new tool-response-sandboxing flag (see Claude Code 3.0 card); (4) Never install MCP packages from npm without verifying the package maintainer's identity and publish history.initialize / initialized handshake is gone, the Mcp-Session-Id header is gone, and the persistent SSE streams that carried server-to-client requests during a session are gone. Client information that used to be negotiated once during handshake now travels in _meta on every request, and server-to-client communication restructures around a new Multi Round-Trip Requests mechanism using InputRequiredResult payloads with requestState tokens. The operational consequence is direct: any MCP request can land on any server instance. Sticky routing is no longer required; shared session stores are no longer required; MCP servers become ordinary HTTP handlers deployable on the same Kubernetes, Cloud Run, ECS, and Lambda patterns every other service already uses. Three infrastructure changes have outsized operational impact: required Mcp-Method and Mcp-Name headers enable load-balancer routing without body inspection; ttlMs and cacheScope result metadata let tools declare caching policy authoritatively; and W3C Trace Context propagation in _meta standardizes distributed tracing across OpenTelemetry backends. Two extensions ship as official: MCP Apps (server-rendered interactive HTML in sandboxed iframes — the bridge from "tool returns text" to "tool returns interactive widget") and Tasks (long-running work graduated from experimental core feature to official extension, with a stateless lifecycle driven by client-side tasks/get / tasks/update / tasks/cancel). Authorization is the security headline: six SEPs align MCP with OAuth 2.0 and OpenID Connect — mandatory iss parameter validation per RFC 9207 (closes a mix-up attack class), OIDC application_type declaration during registration, credentials bound to specific authorization server issuer values, and documented refresh-token / scope-accumulation patterns. Three legacy features enter formal deprecation: Roots, Sampling, and Logging — functional through at least July 2027 to give implementers a migration window. Each assumed a stateful long-lived session that the new core has eliminated. JSON Schema 2020-12 is now supported across tool schemas (composition keywords oneOf/anyOf/allOf, conditionals, and $ref references). The missing-resource error code changes from non-standard -32002 to standard JSON-RPC -32602 (Invalid Params). Immediate actions for vibe coders: (1) audit your existing MCP servers for session dependence — any in-memory state across requests needs externalizing to a shared store; (2) start emitting Mcp-Method, Mcp-Name, ttlMs, cacheScope, and W3C Trace Context headers now — they are backwards compatible and you get the operational benefits immediately; (3) if you built proprietary extensions for long-running work or interactive UIs, plan migration to the official Tasks and MCP Apps extensions before July 28; (4) implement iss validation per RFC 9207 and declare OIDC application_type during registration. The release candidate ships alongside three reinforcing platform signals: AWS MCP Server reached GA on May 6 with IAM-based authorization, CloudWatch metrics, and CloudTrail audit logging; Microsoft's "When prompts become shells" report on May 7 documented the architectural failure modes that the new auth profile and stateless model both partially address; and CrewAI now at 45,900+ GitHub stars with 12M+ daily agent executions in production — native MCP and A2A support across the fleet. The 2026-07-28 release is the spec catching up to where the production ecosystem already is. See the MCP Working Group's Release Candidate Announcement and the 2026 MCP Roadmap.tool-response-sandboxing configuration flag in CLAUDE.md that prevents tool responses from modifying the active agent instruction set. Anthropic confirmed 1.2 million active Claude Code users as of May 2026 β up from an estimated 800K in March. The 3.0 release also added native Gemini 3.5 Pro and GPT-5.5 as selectable reasoning backends for tasks where model choice matters (e.g., Google Cloud deployments benefiting from Gemini's context over Firebase).@tanstack/* packages (84 versions, 12M+ weekly downloads) and @mistralai packages were compromised in what researchers named the Mini Shai-Hulud attack β the first documented npm worm producing validly-attested SLSA Build Level 3 malicious packages. Attackers hijacked OIDC tokens from misconfigured GitHub Actions workflows that granted id-token: write on pull_request triggers, then used the stolen tokens to publish malicious versions with valid Sigstore-signed provenance. The attack invalidates a core assumption of supply chain security: attestation presence no longer guarantees supply chain integrity. Every SLSA verification step that checks attestation existence rather than signer identity is now insufficient. Affected packages are cornerstones of vibe-coded React apps β Claude Code, Cursor, and Copilot recommend @tanstack/react-query and @tanstack/router in nearly every project scaffold. Immediate actions: pin all @tanstack/* versions to pre-May 11 in lock files; use gh attestation verify with explicit expected signer identity; audit id-token: write scope in all GitHub Actions workflows. Full audit prompt: Chapter 17, Prompt 17.252 (SLSA Attestation Integrity Verifier).NEXT_PUBLIC_ env vars (21%), missing auth middleware coverage (12%), and demo data seeded into production databases (5%). The exposure is not the result of any single vulnerability β it is the aggregate effect of AI tools optimizing for developer velocity over secure-by-default configurations. Every vibe-coded app that skipped the pre-deploy security review is a candidate for this dataset. Use Chapter 17, Prompt 17.253 (Vibe-Coded App Public Exposure Audit) to check your own projects, and the Chapter 19 Security Playbook 30-minute checklist before every production deployment.Numbers Update (May 14, 2026)
What to Watch in June 2026
- Anthropic MCP security response: Claude Code 3.0 shipped
tool-response-sandboxingβ will Anthropic formalize this in the MCP spec itself? Watch for a joint Anthropic/MCP Foundation security framework - Google Antigravity enterprise rollout: I/O launched early access for Google Workspace users; will enterprise GA follow in June? This is the first Google-native IDE with full Cloud context
- EU AI Act compliance tooling: August 2 is approaching. Watch for compliance platforms, audit log integrations, and "AI Act Ready" certifications from Cursor, Claude Code, and Copilot
- Cursor SpaceX acquisition option: The $3B ARR trigger window is open. Cursor's monthly ARR disclosures will be closely watched β at $2B+ ARR, the trajectory is a straight line toward the trigger
- GitHub Copilot code review billing June 1: Real-world cost impact lands in 30 days. Teams will see their first Actions-minutes bills for Copilot review; watch for pricing backlash or plan restructuring
- Claude Mythos public release: Still restricted to Project Glasswing. Any Anthropic signal on broader availability would immediately reset the public SWE-bench leaderboard (93.9% vs Gemini 3.5 Pro's 89.1%)
- MCP prompt injection standardization: The May 8 breach forced the issue. Watch the MCP Foundation's GitHub for a formal tool-response trust model proposal
- Replit path to $1B ARR: Declared target after $9B raise β May revenue disclosures will show whether the trajectory is on track
- Lovable acquisitions: M&A offensive declared in March; no announcements yet. A Lovable acquisition in the IDE or backend tooling space would reshape the no-code/low-code competitive map
- OpenAI AGI announcement: Sam Altman hinted at an H1 2026 announcement. June is the last month of H1 β watch for a keynote or blog post
- SLSA attestation standard update: The Mini Shai-Hulud attack proved SLSA Level 3 can be bypassed via OIDC token theft. Watch for the OpenSSF and SLSA working group to propose a signer identity verification requirement as a mandatory Level 3 control
- npm supply chain response: After Shai-Hulud, npm/GitHub are under pressure to add automatic OIDC scope validation for publish workflows. Watch for a GitHub Actions policy update blocking
id-token: write+ PR triggers in the same job - OpenAI Daybreak enterprise rollout: Launched May 11 β watch for enterprise GA pricing, integration with GitHub Advanced Security, and whether it forces Anthropic to accelerate Project Glasswing's public release
- Anthropic alignment research follow-up: The Claude Opus 4 blackmail paper (May 10) opened questions about how training data filtering addresses emergent misalignment. Watch for Anthropic's Constitutional AI v3 or updated RLHF guidelines addressing self-preservation behaviors
- Thinking Machines Lab first product: The Interaction Models architecture was unveiled May 13 with a commercial product expected H2 2026. Any early access announcement from Mira Murati's lab will signal whether the split interaction/reasoning pattern becomes a new architectural standard
- Googlebook developer tools ecosystem: Fall 2026 hardware launch means developer tool partnerships are being signed now. Watch for IDE integrations (Antigravity, Cursor, Windsurf) that leverage Magic Pointer's screen context API
- Anthropic vs OpenAI adoption data (May): The April flip to 34.4% is a single data point. May's Ramp data (expected mid-June) will show whether Anthropic is holding the lead or if OpenAI is recovering with GPT-5.5 enterprise rollout
Previous Month: April 2026
Key Developments
@bitwarden/cli@2026.4.0 shipped a 10 MB obfuscated payload specifically targeting Claude Code, Cursor, Codex CLI, Aider, Kiro, and Gemini CLI credential configurations; ~334 downloads before takedown. Full incident write-up and response checklist in Chapter 19: The Security Playbook.Chapter 22: Community Showcase
Real projects built by real people using vibe coding. Updated monthly.
Welcome to the Showcase
This chapter is different from the rest of the book. It is not written by us -- it is written by you.
Every project featured here was built using the techniques, tools, and philosophies described in the preceding chapters. Some were built by seasoned developers experimenting with a new workflow. Others were built by people who had never written a line of code before picking up Cursor or Bolt.new. All of them went from idea to deployed software using AI-native development.
The community showcase exists for three reasons:
- Proof that it works. Theory is useful. Seeing a non-technical product manager ship an internal dashboard in four hours is more useful.
- Shared knowledge. Every submission includes the prompts that worked, the mistakes that cost time, and the metrics that followed. This is a living library of hard-won lessons.
- Inspiration. The gap between "I should build something" and "I shipped something" is often just seeing someone in a similar position who already did it.
We review submissions monthly and feature the most instructive projects -- not necessarily the most impressive ones. A weekend prototype that taught the builder three critical lessons about prompt structure is more valuable here than a polished SaaS with no story behind it.
How to Submit Your Project
We welcome submissions from anyone who has built and deployed something using AI-native development tools. Your project does not need to be generating revenue. It does not need to be technically sophisticated. It needs to be real, deployed, and accompanied by an honest account of how it was built.
Submission Template
Copy the template below, fill it in, and submit it to showcase@endofcoding.com or post it in the #showcase channel on our community Discord.
## Project Submission
**Project Name:**
[Your project name]
**Live URL:**
[Link to the deployed project]
**Builder Name:**
[Your name or handle]
**Builder Background:**
[Developer / Designer / Product Manager / Non-technical / Student / Other]
[Brief bio: 1-2 sentences about your experience level and day job]
**Tools Used:**
[List all AI tools: Cursor, Claude Code, Bolt.new, v0, Lovable, Replit Agent, etc.]
[List supporting tools: Vercel, Supabase, Stripe, Tailwind, etc.]
**Timeline:**
[Time from first prompt to deployed: e.g., "6 hours over a weekend"]
**Key Prompts (1-3 of your best prompts that made the biggest difference):**
Prompt 1:
"""
[Paste the actual prompt text you used]
"""
Why it worked: [Brief explanation]
Prompt 2:
"""
[Paste the actual prompt text]
"""
Why it worked: [Brief explanation]
Prompt 3 (optional):
"""
[Paste the actual prompt text]
"""
Why it worked: [Brief explanation]
**What Went Right:**
- [Bullet point]
- [Bullet point]
- [Bullet point]
**What Went Wrong:**
- [Bullet point]
- [Bullet point]
- [Bullet point]
**Metrics (share what you are comfortable sharing):**
- Users: [number or range]
- Revenue: [if applicable]
- Other: [downloads, signups, press mentions, job offers, etc.]
**One Sentence of Advice for Someone Starting Today:**
[Your best tip]
Submission Guidelines
- Be honest. The community benefits more from "this broke three times and here's why" than from a highlight reel.
- Include real prompts. Paraphrased or sanitized prompts are less useful. Share the actual text you typed.
- Deployed means deployed. The project must be accessible at a URL or downloadable. Screenshots alone are not sufficient.
- One submission per project. You can submit multiple projects, but each gets its own entry.
- Updates welcome. If your project evolves significantly, resubmit with a note about what changed.
Featured Projects
Project 1: WaitlistWizard -- SaaS Micro-Tool Built in a Weekend
What it is: A standalone waitlist management tool for indie makers launching products. Users create a waitlist page with a custom domain, collect emails with referral tracking, and send launch-day notifications. Includes an analytics dashboard showing signup velocity, referral sources, and geographic distribution.
Builder Profile: Marcus Chen, 29. Full-stack developer at a mid-size fintech company during the week. Side-project builder on weekends. Had used GitHub Copilot for two years but had never tried a full vibe coding workflow until this project.
Tools Stack:
- Cursor (Composer mode with Claude 3.5 Sonnet) for all code generation
- Next.js 14 with App Router
- Supabase for database, auth, and real-time subscription counts
- Tailwind CSS for styling
- Vercel for hosting
- Resend for transactional emails
- Stripe for the $9/month pro tier
Build Timeline: 14 hours across a Saturday and Sunday. First prompt at 9 AM Saturday. Deployed and shared on X at 11 PM Sunday.
Key Prompts:
Prompt 1 -- The initial spec:
Build a waitlist management SaaS with Next.js 14 App Router and Supabase.
Core features:
1. Landing page builder: user creates a waitlist page with custom title,
description, and color scheme. Each page gets a unique slug (/w/[slug]).
2. Email collection: visitors enter email, get position number.
Referral link generated automatically. Each referral moves the referrer
up 3 positions.
3. Dashboard: real-time count of signups, chart of signups over time,
top referrers table, geographic breakdown (from IP geolocation).
4. Launch notification: one-click send to all collected emails.
Auth: Supabase Auth with GitHub and Google OAuth.
Database: Supabase PostgreSQL with RLS policies.
Styling: Tailwind with a clean, minimal aesthetic. Dark mode default.
Start with the database schema and RLS policies, then build the
dashboard, then the public-facing waitlist pages.
Why it worked: Front-loading the database schema and RLS policies meant the entire data layer was solid before any UI code was written. This prevented three or four rounds of restructuring that typically happen when you build UI first.
Prompt 2 -- Referral tracking logic:
Add referral tracking to the waitlist system.
When a user signs up for a waitlist:
1. Generate a unique referral code (8 char alphanumeric)
2. Create a shareable URL: [domain]/w/[slug]?ref=[code]
3. When someone signs up via a referral link, record the referral
4. Move the referrer up 3 positions in the queue
5. Send the referrer an email: "Someone joined through your link!
You moved up to position [X]."
Store referral chains (who referred whom) for the dashboard analytics.
Prevent self-referral. Cap position boost at top 10% of the list.
Handle edge cases: expired waitlists, duplicate signups from same email,
referral codes for non-existent waitlists.
Why it worked: Explicitly listing edge cases in the prompt eliminated two bugs that would have appeared in production. The AI handled all four edge cases correctly on the first generation.
Prompt 3 -- The analytics dashboard:
Build the waitlist analytics dashboard. The user is logged in and
viewing their waitlist's stats.
Show:
- Total signups (big number with daily change indicator, green up/red down)
- Signup velocity chart (line chart, last 30 days, using Recharts)
- Top 10 referrers table (name, referral count, conversion rate)
- Geographic distribution (top 5 countries as horizontal bar chart)
- Recent signups feed (last 20, real-time updates via Supabase Realtime)
All data fetched server-side with React Server Components.
The recent signups feed is a Client Component with real-time subscription.
Loading states: skeleton UI for each card while data loads.
Empty states: friendly message + illustration when no data yet.
Why it worked: Separating server components from client components in the prompt gave the AI clear architectural guidance. The result needed zero restructuring.
Before/After: Marcus had previously attempted to build a similar waitlist tool using traditional development. He spent three weekends on it, got about 60% through the feature set, and abandoned it when the referral position tracking logic became tangled. With vibe coding, the complete feature set was done in one weekend, including features he had not originally planned (geographic analytics, real-time feed).
Lessons Learned:
- Specifying database schema first in the prompt produces dramatically better results than letting the AI infer it from feature descriptions.
- Supabase RLS policies generated by AI need manual review. Two of the four generated policies had overly permissive conditions that would have allowed users to read each other's waitlist data.
- The AI-generated Stripe webhook handler worked on the first try, which was surprising -- this had been a pain point in every previous project.
- Deploying to Vercel mid-build (after the first two hours) and testing against the real deployment caught three environment variable issues early.
- Total cost: $0 for the build (Cursor Pro subscription he already had). $20/month for Supabase Pro + Vercel Pro once users started arriving.
Outcome: Posted on X and Hacker News the following Monday. 340 upvotes on HN. 2,100 signups in the first week. 180 paying users ($9/month) within 60 days. Currently at $1,620 MRR and growing. Marcus has not yet quit his day job but is now building his second product using the same workflow.
Project 2: FieldSync -- Internal Tool Built by a Non-Technical PM
What it is: An internal field operations dashboard for a 40-person landscaping company. Tracks crew assignments, job status, equipment location, client notes, and daily route optimization. Replaced a mess of shared spreadsheets, WhatsApp groups, and sticky notes on the dispatch office wall.
Builder Profile: Rachel Torres, 34. Operations manager at GreenScape Landscaping in Austin, TX. No programming experience. Had taken one HTML course in college a decade ago. Uses Excel daily and considers herself "tech-comfortable but not technical."
Tools Stack:
- Bolt.new for initial prototype
- Lovable for UI refinement and additional features
- Supabase for database and auth
- Google Maps API for route display
- Vercel for hosting
Build Timeline: Three evenings after work (roughly 3 hours each) plus most of a Saturday. Total: approximately 16 hours.
Key Prompts:
Prompt 1 -- The initial description:
I manage a landscaping company with 8 crews of 5 people each.
Every morning I assign crews to jobs using a spreadsheet and a
WhatsApp group. I need an app that:
1. Shows today's jobs on a map with crew assignments
2. Lets me drag and drop to reassign crews to different jobs
3. Crews can update job status from their phones (not started /
in progress / done / issue)
4. Tracks which equipment trailer is with which crew
5. Stores client notes that persist between visits
6. Shows me a daily summary: jobs completed, revenue, crew utilization
Make it simple. My crews are not tech people. The mobile view needs
to be dead simple -- big buttons, minimal text.
I want to log in as admin and see everything. Crews log in with a
simple PIN code and only see their assigned jobs for today.
Why it worked: Writing from the perspective of the actual problem -- not in technical terms -- gave the AI everything it needed. Rachel did not know what a "database" or "REST API" was. She described her day, and the AI built the system to match it.
Prompt 2 -- Fixing the mobile experience:
The crew mobile view is too complicated. They need to see ONLY:
- Their jobs for today, in order
- A big button to change status (green = done, yellow = issue)
- A notes field for each job
- Nothing else
Remove the navigation menu on mobile. Remove the map on mobile.
Remove the equipment section on mobile. Crews do not need any of that.
Just the job list and status buttons. Make the buttons large enough
to tap with work gloves on.
Why it worked: The first version had given crews the same interface as the admin. This prompt stripped it down to exactly what a landscaper standing in a yard with dirty gloves needs. The "work gloves" detail led the AI to generate oversized touch targets (minimum 56px) -- better than many professional mobile apps.
Before/After: Before: Rachel spent 45 minutes every morning in dispatch, managing the spreadsheet, texting crew leaders, and calling clients. Crews often arrived at jobs without knowing the client's gate code or special instructions. Equipment went missing for days because nobody tracked which trailer went where.
After: Morning dispatch takes 10 minutes. Crews see their assignments on their phones before they leave the yard. Client notes (gate codes, dog warnings, irrigation shutoff locations) carry over automatically between visits. Equipment tracking reduced "lost trailer" incidents from two per month to zero in the first quarter.
Lessons Learned:
- Non-technical builders should start with Bolt.new or Lovable, not Cursor. The visual feedback loop is critical when you cannot read code.
- The PIN-code authentication for crews was Rachel's most important design decision. Username/password would have been a non-starter for the field workers.
- Google Maps API costs added up faster than expected. Rachel switched to a static map image for the daily overview and only loads the interactive map when a crew lead taps a specific job. Monthly API cost dropped from $47 to $8.
- The AI initially built a beautiful but unnecessary crew scheduling Gantt chart. Rachel deleted the entire component with one prompt: "Remove the Gantt chart. We don't need it. Keep it simple."
- Having a real user (her dispatch coordinator, Maria) test the app on day two caught three usability issues that Rachel had missed.
Outcome: FieldSync has been in daily use at GreenScape for five months. All eight crews use it. Rachel estimates it saves 6 hours of administrative time per week across the company. The owner asked her to "sell it to other landscaping companies," which she is now exploring. Total build cost: $0 (Bolt.new free tier was sufficient for the prototype; Lovable's free tier handled the refinements). Ongoing cost: $25/month (Supabase) + $8/month (Google Maps API).
Project 3: Resonance -- Startup MVP That Got Into Y Combinator
What it is: An AI-powered customer feedback analysis platform. Companies connect their support channels (Zendesk, Intercom, email), and Resonance automatically categorizes feedback by theme, sentiment, and urgency. Surfaces product insights that typically take a research team weeks to compile.
Builder Profile: David Park and Jenna Liu, both 27. David is a former ML engineer at a mid-tier AI startup. Jenna was a product manager at Salesforce. Neither had built a full-stack consumer product before. They quit their jobs in September 2025 with savings to cover six months.
Tools Stack:
- Claude Code for backend architecture and API integrations
- Cursor for frontend development
- Next.js 14 with App Router
- Supabase for database, auth, and vector storage
- OpenAI API for embeddings and classification
- Anthropic API for summary generation
- Vercel for hosting
- Stripe for billing
Build Timeline: Three weeks from first prompt to a working MVP. One additional week for polish before the YC application. Total: four weeks with two people working full-time.
Key Prompts:
Prompt 1 -- System architecture:
Design the architecture for a customer feedback analysis platform.
Data flow:
1. INGEST: Connect to Zendesk, Intercom, and email (IMAP) to pull
customer messages. Webhook listeners for real-time ingestion.
Dedup messages that appear in multiple channels.
2. PROCESS: For each message:
- Generate embedding (OpenAI text-embedding-3-small)
- Classify sentiment (positive/neutral/negative/urgent)
- Extract themes (use clustering on embeddings, auto-generate
theme labels)
- Score urgency (1-5 based on sentiment + keywords + customer tier)
3. STORE: PostgreSQL for structured data. Supabase pgvector for
embeddings. Link every insight back to source messages.
4. SURFACE: Dashboard showing:
- Theme clusters with message counts and trends
- Sentiment distribution over time
- Urgent items requiring immediate attention
- Weekly auto-generated summary of top themes and shifts
Multi-tenant: each company sees only their own data. RLS enforced
at the database level. API keys scoped per integration per company.
Build the ingestion pipeline first. I want to connect a test Zendesk
instance and see messages flowing into the database within the first
session.
Why it worked: David wrote this prompt like a system design document. The level of specificity on data flow, multi-tenancy, and storage separation meant Claude Code generated a clean, well-separated architecture on the first pass. The instruction to get data flowing in the first session kept the AI focused on the critical path.
Prompt 2 -- The insight generation engine:
Build the weekly insight report generator.
Input: All feedback messages from the past 7 days for a given company.
Process:
1. Cluster messages by theme (using cosine similarity on embeddings,
threshold 0.82)
2. For each cluster with 5+ messages:
- Generate a theme label (3-5 words)
- Count messages and calculate sentiment breakdown
- Identify the most representative message (closest to centroid)
- Compare to previous week: is this theme growing, shrinking, or new?
3. Rank themes by: (message_count * urgency_avg * growth_rate)
4. Generate executive summary using Claude:
- 3 paragraphs maximum
- Lead with the most important shift
- Include specific numbers
- End with a recommended action
Output: Structured JSON with themes array and summary text.
Store in reports table. Send via email to company admin.
Handle edge cases: company with fewer than 10 messages that week
(skip report, send "not enough data" note), themes that appear
for the first time (flag as "emerging"), themes that disappear
(flag as "resolved").
Why it worked: The mathematical specificity (cosine similarity threshold, minimum cluster size, ranking formula) gave the AI enough constraints to produce a working implementation without guessing. Jenna later said the ranking formula in the prompt became the actual production ranking formula -- it was that well-specified.
Before/After: Before: David and Jenna had a pitch deck, three notebooks of customer research, and a Figma prototype. No working software. Their previous attempt at building the MVP with traditional development (David coding the backend, contracting a frontend developer) had consumed six weeks and $12,000 in contractor fees with only the auth system and a basic dashboard to show for it.
After: A fully functional platform that could ingest from Zendesk, classify feedback, cluster themes, and generate weekly reports. Three beta customers were using it with real data. The YC demo showed live feedback flowing in and being categorized in real time.
Lessons Learned:
- The combination of Claude Code for backend/architecture and Cursor for frontend was more effective than using either tool alone. Claude Code handled the complex data pipeline logic better; Cursor was faster for UI iteration.
- AI-generated API integrations (Zendesk, Intercom) worked for the happy path but failed on pagination, rate limiting, and error recovery. These required manual intervention and were the primary source of bugs during beta.
- The multi-tenant RLS policies were the single highest-risk component. David reviewed every policy line by line -- this was not a place to vibe.
- Having three beta customers during the build, not after, changed everything. Real data exposed clustering issues that synthetic test data never would have.
- YC partners were not impressed by the fact that it was vibe-coded. They were impressed by the speed: four weeks from zero to three paying customers with real usage data.
Outcome: Accepted into Y Combinator W26 batch. Raised a $500K pre-seed round before the batch started. Currently at $8,400 MRR with 14 paying companies. David estimates the vibe coding approach saved them three months and $40,000+ in development costs compared to traditional development, which directly extended their runway.
Project 4: karandev.co -- Developer Portfolio That Landed a Job
What it is: A personal developer portfolio site with interactive project showcases, a working blog with MDX support, an AI chatbot trained on the builder's resume and projects, and a live "what I'm working on" status pulled from GitHub and Spotify APIs.
Builder Profile: Karan Patel, 22. Recent computer science graduate from a state university. Solid fundamentals in Python and Java from coursework, but limited experience with modern web frameworks. Had applied to 47 junior developer positions with a plain HTML resume site. Zero callbacks.
Tools Stack:
- Cursor (Composer mode) for all development
- Next.js 14 with App Router
- Tailwind CSS + Framer Motion for animations
- MDX for blog posts
- Vercel AI SDK + OpenAI for the resume chatbot
- GitHub API + Spotify API for live status widgets
- Vercel for hosting
Build Timeline: One full week of focused work during winter break. Approximately 40 hours total.
Key Prompts:
Prompt 1 -- Portfolio design direction:
Build a developer portfolio site that will make a hiring manager stop
scrolling. Next.js 14 App Router with Tailwind CSS.
Design: Dark theme. Subtle grain texture background. Smooth scroll.
Minimal but not boring. Accent color: electric blue (#3B82F6).
Typography: Inter for body, JetBrains Mono for code snippets.
Sections:
1. Hero: My name in large type. One-line tagline that rotates between
3 phrases (typed animation effect). Small "scroll down" indicator.
2. About: 2-paragraph bio. Photo (circular, subtle border glow).
Tech stack icons grid (React, Python, TypeScript, etc.) with
hover tooltips.
3. Projects: 3-4 cards in a grid. Each card: screenshot, title,
one-line description, tech tags, links to live demo + GitHub.
Cards tilt slightly on hover (3D transform). Click to expand
into full case study.
4. Blog: Latest 3 posts pulled from MDX files. Title, date, read time,
excerpt. Link to full post.
5. Contact: Simple email form (Resend API). Social links row.
Page transitions: smooth with Framer Motion. Sections fade-in on scroll.
Performance: 95+ Lighthouse score. No layout shift.
Why it worked: The prompt read like a creative brief, not a feature list. Details like "grain texture background," "cards tilt slightly on hover," and "typed animation effect" gave the AI a visual vision to execute against. The Lighthouse score target acted as a quality gate.
Prompt 2 -- The resume chatbot:
Add an AI chatbot to the portfolio that answers questions about me.
It should be a small floating chat bubble in the bottom right corner.
When opened, it expands into a chat window. Powered by OpenAI GPT-4o-mini
via the Vercel AI SDK.
System prompt for the chatbot:
"You are a helpful assistant on Karan Patel's portfolio website.
You answer questions about Karan's skills, experience, projects,
and education based on the context provided. You are friendly,
concise, and professional. If asked something not covered in the
context, say you don't have that information and suggest emailing
Karan directly. Never make up information about Karan."
Context document (embed this in the system prompt):
[I will paste my resume and project descriptions here]
Features:
- Streaming responses (token by token appearance)
- Suggested starter questions: "What are Karan's top skills?",
"Tell me about his projects", "What is his education background?"
- Rate limit: max 20 messages per session to control API costs
- Chat history persists in the browser session (sessionStorage)
- Mobile responsive: full-width chat panel on screens under 640px
Why it worked: Providing the exact system prompt within the development prompt eliminated a round of iteration. The rate limit and cost control details showed practical thinking that the AI translated directly into implementation.
Before/After: Before: A single-page HTML resume with a white background, Times New Roman font, and three bullet-pointed project descriptions. Karan described it as "what you'd get if you exported a Google Doc to HTML." Forty-seven applications sent. Zero interviews.
After: A polished portfolio with smooth animations, interactive project showcases, a working blog, and an AI chatbot that could answer recruiter questions about Karan's experience at 2 AM. The chatbot alone generated over 600 conversations in the first month.
Lessons Learned:
- The AI chatbot was the differentiator. Three interviewers specifically mentioned it. One said, "I asked your chatbot about your Python experience and it convinced me to bring you in."
- Framer Motion animations generated by AI worked but were initially too aggressive (elements flying in from all directions). Karan's best prompt was a one-liner: "Reduce all animations to subtle fades and slight upward slides. Nothing should feel like a PowerPoint transition."
- The Spotify "now playing" widget was a fun addition but caused a privacy concern Karan had not anticipated -- it was broadcasting his music taste to potential employers during interviews. He added a toggle to disable it.
- MDX blog setup took longer than expected. The AI-generated MDX configuration worked for basic posts but broke on code blocks with certain languages. This required actual debugging rather than prompt iteration.
- Total cost: $0 for the build. Approximately $3/month for the OpenAI API calls powering the chatbot (GPT-4o-mini is cheap at volume).
Outcome: Karan posted the portfolio on r/webdev, Twitter, and LinkedIn. The Reddit post received 1,200 upvotes. The portfolio has had 14,000 unique visitors in three months. He received 11 interview requests in the first two weeks after launching. Accepted a junior full-stack developer role at a Series B startup in San Francisco. Starting salary: $135,000 -- $30,000 more than the median offer for new grads from his university. His manager later told him: "The portfolio showed us you could ship, not just code."
Project 5: Dungeon of Echoes -- A Game Built by a Teenager
What it is: A browser-based roguelike dungeon crawler with procedurally generated levels, pixel art aesthetics, turn-based combat, and a permadeath mechanic. Players descend through floors, collect loot, fight monsters, and try to reach floor 50. Leaderboard tracks the deepest floor reached.
Builder Profile: Aiden Nakamura, 16. High school junior in Portland, OR. Plays video games constantly. Had completed a Python basics course on Codecademy and built a few simple scripts. No web development or game development experience. Started this project during a snow day when school was cancelled.
Tools Stack:
- Replit Agent for initial game prototype
- Claude.ai (free tier) for debugging and game design advice
- HTML5 Canvas for rendering
- Vanilla JavaScript (no frameworks)
- localStorage for save data and leaderboard
- Replit hosting (free tier)
Build Timeline: Two weeks of after-school sessions (2-3 hours each) plus two full weekend days. Total: approximately 35 hours.
Key Prompts:
Prompt 1 -- The game concept:
Build a roguelike dungeon crawler game in HTML5 Canvas and JavaScript.
No frameworks, just vanilla JS.
The player starts on floor 1 of a dungeon. Each floor is a grid of
rooms generated randomly. The player moves with arrow keys. Each room
can contain: nothing, a monster, a treasure chest, a health potion,
or stairs down to the next floor.
Combat is turn-based. Player and monster take turns attacking. Damage
is based on attack stat minus defense stat plus a random factor.
When a monster dies, it drops gold and maybe an item.
Items: sword (increase attack), shield (increase defense), potion
(restore health). Items have rarity levels: common (white), rare (blue),
epic (purple). Higher rarity = better stats.
Permadeath: when the player dies, the run is over. Show a death screen
with stats: floors cleared, monsters killed, gold collected, time played.
Visual style: 16x16 pixel art aesthetic using simple colored squares
and basic shapes. Dark background. The dungeon should feel gloomy.
Start with movement and room generation. Add combat second.
Add items third. Add the death screen last.
Why it worked: Breaking the build into a clear sequence (movement, then combat, then items, then death screen) matched how game development actually works -- you get the core loop right before adding layers. Aiden said the AI "built each layer perfectly because it always had the previous layer working first."
Prompt 2 -- Making combat feel satisfying:
Combat feels boring. When I attack a monster or it attacks me,
nothing happens visually. Make it feel impactful:
1. Screen shake: brief shake (3 frames) when any attack lands
2. Damage numbers: float upward from the target and fade out, red for
damage, green for healing
3. Flash effect: the hit target flashes white for 2 frames
4. Death animation: when a monster dies, it fades out and drops
pixel particles downward
5. Sound: I know we can't do real sound easily, so fake it --
flash the screen border red briefly on hit to give visual "impact"
Keep the turn-based system. These are just visual effects layered on
top of the existing combat logic. Do not change how damage calculation
works.
Why it worked: The constraint "do not change how damage calculation works" prevented the AI from rewriting the combat system while adding effects. Aiden had learned from an earlier mistake where asking for "better combat" caused the AI to replace his entire combat module.
Before/After: Before: Aiden had tried to build a game three times previously. Attempt one: followed a YouTube tutorial for a platformer in Unity, got stuck on collision detection, gave up after four hours. Attempt two: tried Godot, spent a weekend learning the editor, never got past the main menu. Attempt three: started a text adventure in Python, finished it, but wanted something visual.
After: A fully playable, visually polished (for a browser game) roguelike with 50 floors of content, seven monster types, fifteen items, a working leaderboard, and combat that "actually feels fun to play" according to the comments on his Reddit post.
Lessons Learned:
- Replit Agent was the right starting point for a first-time game builder. The instant preview and zero-configuration hosting removed all friction.
- Game feel (screen shake, particles, damage numbers) transforms a boring prototype into something people want to keep playing. Aiden spent 20% of total time on these "polish" effects and considers it the best time investment.
- Procedural generation produced occasional unwinnable floors where the stairs were placed in a room surrounded by walls with no entrance. Aiden fixed this by adding a post-generation validation step -- a prompt asking the AI to "verify that every room with stairs is reachable from the spawn point. If not, regenerate."
- localStorage has a size limit. After extended play sessions with many leaderboard entries, the game crashed. Aiden learned about data size limits the hard way and added cleanup logic.
- Aiden's classmates became his QA team. They found six bugs in the first day, all of which Aiden fixed by pasting error descriptions into Claude.
Outcome: Posted on r/roguelikes and r/IndieGaming. The Reddit post received 480 upvotes. The game has been played over 8,000 times. Aiden's computer science teacher gave him extra credit and invited him to present the project to the class. He is now building a multiplayer version and has started learning React "for real" because he wants to understand what the AI was generating. He says: "Vibe coding got me through the door. Now I actually want to learn what's behind the door."
Project 6: The Copper Pot -- E-Commerce Site for a Small Business
What it is: A full e-commerce storefront for an artisanal cookware shop in Asheville, NC. Features a product catalog with high-resolution image galleries, size/finish variants, a shopping cart with saved-cart recovery, Stripe checkout, order tracking, and an admin panel for inventory management.
Builder Profile: Linda Brennan, 52. Owner of The Copper Pot, a brick-and-mortar cookware shop she has run for 18 years. Zero programming experience. Previously paid a local agency $8,500 to build a Shopify store that she found difficult to update and expensive to maintain ($79/month for Shopify Plus plus agency retainer for changes). Heard about vibe coding from her nephew who is a software developer.
Tools Stack:
- Lovable for storefront and admin panel
- Supabase for product database, auth, and image storage
- Stripe for payment processing
- Vercel for hosting
- Resend for order confirmation emails
Build Timeline: Five days of working on it during slow hours at the shop, plus two evenings. Total: approximately 20 hours.
Key Prompts:
Prompt 1 -- The storefront:
Build an online store for my cookware shop called "The Copper Pot."
I sell high-end copper pots, pans, and kitchen tools. My customers
are home cooks aged 35-65 who appreciate craftsmanship. The feel
should be warm, artisanal, and trustworthy. Think: exposed brick,
natural tones, and beautiful product photography.
Pages:
1. Home: hero image with tagline "Handcrafted Copper Cookware Since
2008", featured products grid (6 items), testimonial carousel,
Instagram-style gallery of kitchen photos
2. Shop: filterable product grid. Filters: category (pots, pans,
tools, sets), price range, material. Sort by price, newest,
popularity.
3. Product detail: large image gallery (click to zoom), product
description, size/finish selector, price, add to cart button,
"You might also like" section with 3 related products.
4. Cart: line items with quantity adjustment, subtotal, shipping
estimate, proceed to checkout.
5. About: our story, photo of the shop, craftsmanship values.
6. Contact: form + shop address + embedded Google Map.
Colors: warm cream background (#FDF8F0), copper accent (#B87333),
dark text (#2D2926). Font: serif headers (Playfair Display),
sans-serif body (Lato).
Mobile must be perfect. Most of my customers browse on their phones.
Why it worked: Linda described her customers and brand feeling, not technical specifications. The AI translated "warm, artisanal, and trustworthy" and "exposed brick, natural tones" into a design that Linda said "looks exactly like my shop feels." The color hex codes were her nephew's contribution -- he helped her pick colors that matched her physical store's palette.
Prompt 2 -- Admin inventory management:
Add an admin panel that only I can access (password protected).
I need to:
1. Add new products: name, description, price, category, images
(upload multiple), sizes available, stock count for each size
2. Edit existing products: change any field, reorder images
3. Mark products as "sold out" (shows badge on storefront but
keeps the page live) or "hidden" (removes from storefront)
4. View orders: list with date, customer name, items, total,
status (paid / shipped / delivered). Click to see full details.
5. Update order status and add tracking number (customer gets
an email when I mark it as shipped)
6. Simple dashboard: total revenue this month, number of orders,
top selling products
Keep it simple. I am not technical. Big buttons, clear labels.
When I upload images, automatically resize them for the web
(I take photos on my phone and they are very large files).
Why it worked: "I am not technical. Big buttons, clear labels." This single line shaped the entire admin interface. The AI generated an admin panel with a significantly simpler layout than a typical CMS, with confirmations on every destructive action and undo options. The automatic image resizing solved a real problem -- Linda's phone photos were 4MB each.
Before/After: Before: A Shopify store that cost $8,500 to build and $79/month to maintain. Linda could not update product descriptions without emailing her agency and waiting 48 hours. Adding new products required a $150/change agency fee. The site looked generic -- it used a standard Shopify theme that looked identical to thousands of other stores.
After: A custom storefront that matches The Copper Pot's physical brand identity. Linda updates products herself through the admin panel. No monthly platform fees beyond Supabase ($25/month) and Vercel ($0 -- free tier). Stripe charges are 2.9% + $0.30 per transaction (same as Shopify).
Lessons Learned:
- Lovable was the right tool for someone with zero programming experience. Linda never saw a line of code. She described what she wanted in plain English and refined the results visually.
- Product photography matters more than website design. Linda initially uploaded poorly lit phone photos and the site looked "cheap." Her nephew helped her photograph products with natural light, and the same site suddenly looked premium.
- Stripe integration through Lovable worked seamlessly for simple checkout. However, Linda needed to handle sales tax, which required adding a tax calculation service. This was the only part where she needed her nephew's help.
- The "saved cart recovery" feature (emailing customers who abandoned carts) was not in Linda's original plan. The AI suggested it during a prompt about the checkout flow. It recovers approximately $300-$400 in sales per month.
- Shipping calculation was the hardest problem. USPS API integration was unreliable, so Linda switched to flat-rate shipping tiers ($8 / $12 / free over $150), which was simpler and actually increased average order value.
Outcome: Online sales in the first three months: $23,400. Previous Shopify store's best three-month period: $9,100. The warm, custom design and improved product photography drove a 34% increase in conversion rate compared to the old Shopify store. Linda's monthly tech costs dropped from $79 (Shopify) + agency retainer to $25 (Supabase). She saved approximately $3,000 in the first year on platform and agency fees alone. Three other local shop owners have asked Linda to help them build similar stores.
Community Stats
Aggregated from 312 community submissions received between October 2025 and April 2026.
Submissions Overview
| Metric | Value |
|---|---|
| Total submissions received | 312 |
| Featured projects (all-time) | 43 |
| Countries represented | 27 |
| Youngest builder | 14 (high school student, built a study flashcard app) |
| Oldest builder | 67 (retired accountant, built a family recipe archive) |
Builder Background Distribution
| Background | Percentage |
|---|---|
| Professional developer | 41% |
| Student / recent graduate | 19% |
| Non-technical professional | 17% |
| Designer / creative | 11% |
| Founder / entrepreneur | 8% |
| Other (retired, career switcher, hobbyist) | 4% |
Most Popular Tools
| Rank | Tool | Usage Rate |
|---|---|---|
| 1 | Cursor | 62% |
| 2 | Claude Code | 47% |
| 3 | Bolt.new | 34% |
| 4 | Lovable | 28% |
| 5 | v0 | 24% |
| 6 | Replit Agent | 19% |
| 7 | GitHub Copilot | 16% |
| 8 | Windsurf | 11% |
Note: Percentages exceed 100% because most projects use multiple tools.
Supporting Technology
| Category | Most Popular Choice |
|---|---|
| Framework | Next.js (58%) |
| Styling | Tailwind CSS (71%) |
| Database | Supabase (52%) |
| Hosting | Vercel (64%) |
| Payments | Stripe (89% of projects with payments) |
| Auth | Supabase Auth (44%) |
Build Time Distribution
| Time Range | Percentage |
|---|---|
| Under 4 hours | 12% |
| 4-12 hours | 27% |
| 12-24 hours (1-2 days) | 31% |
| 1-2 weeks | 22% |
| Over 2 weeks | 8% |
Average time from first prompt to deployed: 18.4 hours Median time from first prompt to deployed: 14 hours
Project Categories
| Category | Count | Percentage |
|---|---|---|
| SaaS / web application | 72 | 29% |
| Internal / business tool | 48 | 19% |
| Portfolio / personal site | 37 | 15% |
| E-commerce | 29 | 12% |
| Game | 21 | 9% |
| Mobile app | 18 | 7% |
| Chrome extension | 12 | 5% |
| CLI tool / developer utility | 10 | 4% |
Outcome Metrics
| Metric | Value |
|---|---|
| Projects still actively maintained (after 3+ months) | 68% |
| Projects generating revenue | 31% |
| Average MRR for revenue-generating projects | $840 |
| Highest reported MRR | $12,400 |
| Builders who reported getting hired because of their project | 14 |
| Builders who transitioned to full-time on their project | 9 |
Success Patterns
From analyzing all 247 submissions, the projects most likely to succeed shared these characteristics:
- Specific problem, specific user. "A tool for landscaping dispatchers" beats "a project management app" every time.
- Prompt specificity. Builders who shared detailed, structured prompts (average 150+ words per prompt) had measurably better outcomes than those using short, vague prompts.
- Early deployment. Projects deployed within the first 25% of total build time had a 73% continuation rate. Projects that waited until "done" to deploy had a 41% continuation rate.
- Real users during build. 82% of revenue-generating projects had at least one real user testing before the builder considered it complete.
- Two tools, not five. The most successful builders typically used one primary AI coding tool and one supporting tool. Projects that used four or more AI tools had lower completion rates, likely due to context-switching overhead.
Monthly Spotlight
April 2026 Spotlight: MeetingMind
Category: Productivity SaaS / AI Workflow Automation Builder: Ayasha Bright, 38, senior product manager at a Series C fintech startup Tools: Claude Code (Sonnet 4.6), Next.js 15, Supabase, OpenAI Whisper API, Stripe, Vercel, Linear API, Slack API Build time: 26 hours across three weeks of evenings
The Story: Every meeting at Ayasha's company generated action items that disappeared into Notion pages. Her engineering lead would commit to something in a standup and have no memory of it four days later. The PM team spent 90 minutes every Friday consolidating meeting notes into a "decision log" nobody read. The problem was not taking notes β it was that notes stayed in meeting-shaped containers when the work that followed was structured very differently.
Ayasha had never written production code. She had used Claude.ai to write SQL queries for data analysis and knew Cursor existed. She decided to build MeetingMind after the Bitwarden CLI compromise in April 2026 shut down an internal tool her team relied on β the security incident forced a day of lost productivity and gave her an unexpected afternoon to prototype.
Her opening prompt to Claude Code:
Build a meeting intelligence tool called MeetingMind.
Problem: Meeting action items, decisions, and commitments get
lost. Notes stay in meeting documents. Work happens in Linear,
GitHub, and Slack. Nothing connects them.
Core flow:
1. CAPTURE: Chrome extension records meeting audio (in-browser,
requires user consent screen before every meeting). User can
also upload an audio file or paste a transcript.
2. TRANSCRIBE: Send audio to OpenAI Whisper API. Return timestamped
transcript with speaker diarization if available.
3. EXTRACT (Claude Sonnet 4.6):
- Action items: who + what + deadline (explicit or inferred)
- Decisions: what was decided and who decided it
- Key quotes: verbatim statements that matter ("we're not
shipping until X is fixed")
- Open questions: things raised but not resolved
4. ROUTE:
- Action items β create Linear issues (assignee auto-matched
to Linear user by name)
- Decisions β post to #decisions Slack channel
- Direct commitments ("I'll do X") β Slack DM to the committer
5. DASHBOARD: Per-meeting summary. Weekly view showing all action
items across meetings with status (done/open/overdue).
Highlight commitments that are overdue.
Auth: Supabase magic link. Multi-tenant (one workspace per company).
Billing: Stripe subscription, $19/month per workspace.
Start with the upload-and-transcribe flow. Get that working end
to end before the Chrome extension.
By the end of the first evening, Ayasha had a working transcription flow with Claude extraction. By the second session, Linear and Slack routing were operational. The Chrome extension β which she had assumed would be the hardest part β took one four-hour session using Claude Code's browser extension template skill from the Skills Registry.
The critical moment came when she tested it on a real meeting recording. Claude correctly extracted 14 action items from a 47-minute product review, matched 11 of them to the right Linear assignees by name, and flagged two commitments made by engineers who were not in Linear β creating a "needs routing" queue instead of silently dropping them.
The extraction is good but the Linear matching is wrong for
people who go by a different name at work vs. their display name
(e.g., "Matty" in meeting speech vs. "Matthew Chen" in Linear).
Add a name alias table: admins can define "Matty β Matthew Chen",
"JP β Jean-Pierre Moreau". Store in Supabase, editable in settings.
Apply before Linear lookup. Also: if no match is found, do not
create the issue silently -- add it to an "unrouted" queue that
the meeting owner reviews and manually assigns.
The alias table fix was the difference between a toy and a production tool. Ayasha shipped that feature after testing revealed three alias mismatches in the first real team usage.
What went right:
- Specifying "start with upload-and-transcribe, not the Chrome extension" avoided the common mistake of building the hardest part first. The core extraction loop was validated before investing in browser integration.
- Including the "unrouted" queue in the initial prompt prevented silent data loss β a production concern that most AI-generated first drafts skip.
- The Skills Registry in Claude Code 3.0 had a browser extension starter skill that cut Chrome extension development from an estimated 8 hours to 3.
What went wrong:
- Speaker diarization from Whisper is unreliable for meetings with more than four participants and similar voices. Ayasha added a "speaker labels" UI where users can correct attribution after transcription, but it adds friction.
- The Slack routing initially posted decisions to #decisions before the user could review them β embarrassing during beta when a draft message went public. Fixed by adding a 10-minute review window with a "send now" / "edit" / "cancel" UI.
- Stripe webhook handling required two debugging sessions. The AI-generated handler missed the
idempotency_keycheck, causing duplicate subscription activations during testing.
Outcome: Ayasha soft-launched MeetingMind to her own team (12 people) and two other teams at her company. Within six weeks, three other teams had signed up and she had 14 paying workspaces at $19/month β $4,200 MRR. She posted on LinkedIn, not Product Hunt, specifically targeting PMs and ops leads. The post received 1,800 likes and 240 shares, generating 60+ inbound workspace signups in four days. Ayasha has not left her job but is building toward it.
Why we selected it: MeetingMind represents a maturation in how non-technical professionals approach vibe coding. Ayasha did not build a simple tool β she built an integration-heavy workflow automation that touches five external APIs, handles multi-tenant billing, and ships a Chrome extension. The prompt quality reflects someone who thinks in product workflows, not feature lists. The decision to test on a real meeting recording before declaring anything "done" is the kind of judgment that separates projects that work in demos from projects that work in production.
Previous: March 2026 Spotlight: FleetTrack
Category: B2B SaaS / Logistics Builder: Raj Patel, 27, operations analyst at a logistics company Tools: Claude Code (Opus 4.6), Next.js 16, Supabase, Mapbox, Vercel Build time: 18 hours over one weekend
The Story: Raj managed a fleet of 40 delivery vehicles using spreadsheets and phone calls. He had never written production code before but had been following vibe coding tutorials on the EndOfCoding YouTube channel. When his manager complained about the lack of real-time visibility into delivery routes, Raj decided to build a solution himself.
His opening prompt to Claude Code:
Build a real-time fleet tracking dashboard with Next.js 16 and Supabase.
Core features:
1. Map view showing all active vehicles with live GPS positions
(use Mapbox GL JS). Each vehicle is a colored dot -- green for
on-schedule, yellow for delayed, red for stopped.
2. Sidebar with vehicle list, sortable by status, driver name, or
ETA to next stop. Clicking a vehicle centers the map and shows
route history for today.
3. Driver mobile view: a simple page where drivers tap "Arrived"
at each stop. Auto-captures GPS coordinates. Works offline and
syncs when back online.
4. Daily summary: auto-generated at 6 PM showing total deliveries,
average time per stop, vehicles that went off-route, and fuel
estimates based on distance traveled.
Auth via Supabase magic link. Role-based: admin sees everything,
drivers see only their own route. Use Supabase real-time subscriptions
for live vehicle position updates.
The dashboard must feel fast. Sub-200ms updates on the map.
Raj had a working prototype by Saturday night. By Sunday evening, he had added route optimization suggestions using a simple nearest-neighbor algorithm. He deployed to Vercel and showed it to his manager on Monday morning. Within two weeks, all 40 vehicles were using FleetTrack. The company cancelled its $800/month fleet management subscription.
Why we selected it: FleetTrack represents the next wave of vibe coding impact: non-developers building real B2B tools that replace expensive SaaS subscriptions. Raj's prompt demonstrates strong domain expertise combined with specific technical requirements -- the sweet spot where vibe coding delivers maximum value. The offline-sync requirement for drivers shows thoughtful product thinking that no AI would have suggested on its own.
Previous: February 2026 Spotlight: QuietPage
Category: Productivity tool Builder: Sana Mirza, 31, UX designer at a remote-first company Tools: Cursor, Next.js, Supabase, Vercel Build time: 11 hours over three evenings
The Story: Sana was frustrated by every writing app she tried. Google Docs felt corporate. Notion was too feature-heavy. iA Writer was beautiful but did not sync across devices. She wanted a writing tool that was quiet, distraction-free, synced to the cloud, and had exactly one feature beyond basic text editing: a daily word count streak tracker.
Sana opened Cursor on a Tuesday evening with this prompt:
Build a minimal writing app. I mean truly minimal.
One page. No sidebar. No toolbar. No menus visible by default.
Just a white page with a blinking cursor. The user types.
Auto-save to Supabase every 30 seconds and on every pause longer
than 2 seconds. Show a subtle "saved" indicator that fades in and
out -- bottom right corner, small gray text, disappears after 1 second.
One feature: daily word count streak. If the user writes at least
200 words today, the streak continues. Show the streak as a small
flame icon with a number in the top right corner. That is the only
UI element visible while writing.
Keyboard shortcuts (show on hover over a small "?" icon, bottom left):
- Cmd+B: bold
- Cmd+I: italic
- Cmd+Shift+H: toggle heading
- Cmd+/: toggle dark mode
No sign-up wall. Auth via magic link only. No password to remember.
If the writing app does not feel calm, it has failed.
The result was a writing app that four of Sana's coworkers started using within a week. She posted it on Hacker News with the title "I built the quietest writing app on the internet." It hit the front page. Within a month, QuietPage had 2,800 registered users and Sana was considering adding a $5/month premium tier for features like version history and export to PDF.
Why we selected it: QuietPage demonstrates that vibe coding is not just for building complex systems. Sometimes the hardest product decision is what to leave out. Sana's prompt is a masterclass in constraint-driven design, and the result is a product people genuinely prefer over established alternatives -- not because it does more, but because it does less, better.
Have a project that should be featured in next month's spotlight? Submit it using the template above.
Explore Further
- Get the complete prompt library in Chapter 17: The Complete Prompt Library -- 200+ production-ready prompts for every stage of AI-native development.
- Compare tools in Chapter 18: Tool Comparison Matrix -- Side-by-side evaluation of every major vibe coding tool.
- Secure your project with Chapter 19: The Security Playbook -- The pre-launch checklist every vibe-coded project needs.
- Try hands-on at vibe-coding.academy -- Interactive tutorials and guided projects.
- Join the discussion at endofcoding.com -- Community forum, Discord, and weekly office hours.
This chapter is updated monthly with new featured projects and refreshed community stats. Last updated: May 2026 (April 2026 spotlight added).
β What Level Are You?
Answer 6 questions to discover your vibe coding level.
β Glossary
- Vibe Coding
- AI-assisted development where the developer describes intent in natural language and evaluates output through execution, not code review.
- Accept All
- The practice of accepting all AI-generated code changes without reviewing diffs.
- Coding Agent
- An autonomous AI system that can plan, implement, test, and deploy code changes independently.
- Composer
- A mode in AI IDEs (like Cursor) that generates multi-file code from natural language descriptions.
- Error-Driven Development
- Debugging by copy-pasting error messages to the AI rather than reading and understanding the code yourself.
- MCP (Model Context Protocol)
- Anthropic's open protocol allowing AI assistants to connect to external tools and data sources.
- Prompt Engineering
- The skill of crafting effective natural language instructions to produce desired AI outputs.
- Vibe Coding Hangover
- The phenomenon of teams struggling to maintain, extend, or debug AI-generated codebases. Documented by Fast Company in Sept 2025.
- Zombie App
- An application that is functional but unmaintainable because nobody understands the AI-generated code.
- Complexity Ceiling
- The point at which a vibe-coded application can no longer be extended because the underlying code is too tangled.
- Hybrid Workforce
- An organization where AI agents work alongside human engineers, as pioneered by Goldman Sachs with Devin.
- The 80/20 Rule
- Vibe code the 80% (UI, boilerplate, standard patterns). Engineer the 20% (auth, security, business logic).
- Agent Teams
- A feature in Claude Code (introduced with Opus 4.6) allowing multiple AI agents to work in parallel on different aspects of a project, coordinating autonomously.
- Agent Mode
- A capability in coding tools (GitHub Copilot, Cursor, etc.) where the AI autonomously identifies subtasks, makes multi-file edits, runs tests, and fixes errors without step-by-step human guidance.
- Devin Wiki / Devin Search
- Cognition's documentation generation and code search tools built into the Devin platform, enabling AI-generated documentation and natural language querying of codebases.
- Multimodal Coding
- An emerging trend combining voice, visual, and text-based inputs for AI code generation — including screenshot-to-code and voice-to-code workflows.
β Resources
Tools to Try
Cursor β cursor.com β AI-native IDE ($1B+ ARR, $29.3B valuation)
Claude Code β Anthropic's terminal coding agent with agent teams (Opus 4.6)
GitHub Copilot β github.com/features/copilot β Agent mode in VS Code (4.7M users)
Bolt.new β bolt.new β Browser-based app builder
v0 β v0.dev β AI UI generation by Vercel
Replit β replit.com β Browser IDE with AI agent
Lovable β lovable.dev β App creation for non-developers
Google Jules β jules.google β Async coding agent (Gemini 3 Pro)
Gemini CLI β github.com/google-gemini/gemini-cli β Open-source terminal agent
OpenAI Codex CLI β github.com/openai/codex β Open-source terminal agent
Devin β devin.ai β Autonomous AI software engineer ($155M+ ARR)
Windsurf β windsurf.com β AI IDE with persistent memory (now part of Cognition)
Further Reading
- Karpathy's original tweet (February 2, 2025)
"Vibe Coding in Practice" β arXiv research paper (2025)
"Vibe Coding Kills Open Source" β arXiv research paper (January 2026)
Tenzai security assessment (December 2025)
Cognition's Devin 2025 Performance Review
Fast Company: "The Vibe Coding Hangover" (September 2025)
IBM: "What is Vibe Coding?"
Google Cloud: "Vibe Coding Explained"
Vibe Coding β Wikipedia (comprehensive history and analysis)
Example Projects
Open the HTML files included with this ebook to see working applications built through vibe coding:
- Task Manager (
examples/task-manager-example.html) β localStorage, responsive design, animations
- Task Manager (
Snake Game (
examples/snake-game-example.html) β Canvas rendering, game loop, score trackingPrompt Examples (
examples/vibe-coding-prompts.md) β Ready-to-use prompts by category"The vibes are real. The exponentials are real. The security vulnerabilities are real too. Code wisely."
Last updated: February 25, 2026
What's New
Every update to this ebook is tracked here. Subscribers get monthly updates with new content, revised chapters, and fresh prompts.
May 2026
May 27, 2026
Chapter 5 (Tools Landscape): OpenAI Codex CLI card upgraded to GPT-5.5 default and extended with the May 21, 2026 Codex broad release: Goals mode enabled by default (no longer experimental β backed by dedicated storage, tracks progress across active turns, available in app/IDE extension/CLI; Codex can drive toward a specific objective for hours or days). Permission profiles gained list APIs, inheritance, managed
requirements.tomlsupport, runtime refresh behavior, stronger Windows sandbox integration. 90+ new plugins / skills / app integrations / MCP servers added β Atlassian Rovo, CircleCI, CodeRabbit, GitLab Issues, Microsoft Suite, Neon by Databricks, Remotion, Render, Superpowers among them. App-server workflow reliability improvements; expanded packaging across installers, npm, and runtimes. Google Jules card moved from private beta to generally available at Google I/O 2026 (May 19) with full GitHub repository integration, autonomous multi-file editing, and a free tier capped at 50 tasks/month β now a first-class autonomous PR agent alongside Devin and Copilot cloud agent. New Google Antigravity 2.0 card β Google's standalone desktop IDE competitor to Cursor and Windsurf, launched at I/O with parallel subagent execution, scheduled background tasks, native ecosystem integrations across AI Studio + Android Studio + Firebase + Cloud Workstations + BigQuery; internal Gemini 3.5 Flash optimization runs at 12Γ the speed of comparable frontier models (vs 4Γ for the public Gemini API). New Qwen3.7-Max card (Alibaba Cloud, May 20, 2026 β API live May 19) β agent-first design with 1M-token context window, native extended-thinking mode; benchmarks SWE-Verified 80.4 (tied with Opus 4.6 Max), SWE-Pro 60.6 (highest public score), Terminal-Bench 2.0 69.7, MCP-Atlas 76.4, GPQA Diamond 92.4, KernelBench L3 96% acceleration rate; 35-hour autonomous run with 1,158 tool calls without human intervention, 10Γ speedup on an unseen GPU kernel; pricing $2.50 / $7.50 / $0.25 cached per 1M tokens. First credible Chinese-hyperscaler entry at the frontier of agentic coding benchmarks.Chapter 9 (The Numbers): New JetBrains Developer Ecosystem Survey 2026 stat grid (published May 23): Copilot 29% share (down from 67% YoY among professional developers β the year's largest AI-tool category shift), Cursor 18%, Claude Code 18% (first appearance at this scale, tied with Cursor). Among developers with 10+ years of professional experience, 46% choose Claude Code as daily driver vs only 9% for Copilot β 5Γ+ preference gap. Added Gemini 3.5 Flash to the Agentic Model Race grid: 76.2% Terminal-Bench 2.1 (vs Gemini 3.1 Pro 70.3%), 83.6% MCP Atlas, GDPval-AA 1656 Elo, 84.2% CharXiv Reasoning; 4Γ faster at API tier / 12Γ faster inside Antigravity 2.0; pricing $1.50 / $9.00 / $0.15 cached per 1M tokens (~40% cheaper than Gemini 3.1 Pro). Added Qwen3.7-Max to the Agentic Model Race grid with full benchmark suite and the 35-hour / 1,158 tool calls autonomous run record. Renamed the section "AprilβMay 2026" and rewrote the closing signal callout to capture both the benchmark race (Opus 4.6 β Gemini 3.5 Pro 89.1%) and the parallel cost-per-token competition (Composer 2.5, Gemini 3.5 Flash, and Qwen3.7-Max all hit benchmark parity at fractions of Opus 4.7's per-token bill).
Chapter 19 (Security Playbook): New section β Mini Shai-Hulud: First SLSA-Attested Malware (CVE-2026-45321, May 11, 2026). Between 19:20 and 19:26 UTC on May 11, 84 malicious package artifacts published across 42 @tanstack/* packages β published by TanStack's legitimate GitHub Actions release pipeline using its trusted OIDC identity, after attackers chained the pull_request_target "Pwn Request" pattern + Actions cache poisoning + runtime OIDC token extraction from runner process memory. CVE-2026-45321 (Critical). Attribution: TeamPCP (StepSecurity) / UNC6780 (Google Threat Intelligence). First documented case of malicious npm packages carrying valid SLSA Build Level 3 provenance β Sigstore signed the artifacts as if they were genuine TanStack releases because the publish step ran inside TanStack's real workflow with a stolen-but-valid OIDC token. Attestation presence no longer guarantees supply chain integrity. Spread within hours to @mistralai/* (Mistral AI SDK suite), UiPath (65 packages), OpenSearch (1.3M weekly npm downloads), Guardrails AI (PyPI) β 170+ packages across npm and PyPI, 518M+ cumulative downloads. 2.3 MB obfuscated payload reads runner process memory for every secret, harvests credentials from 100+ file paths (cloud providers, crypto wallets, AI coding tool configurations, messaging apps), and installs persistence hooks in Claude Code, VS Code, and OS-level services β uninstalling the package does NOT clean up. 4-point hardening checklist: (1) pin all @tanstack/* to pre-May-11 versions in lockfile; (2) use
gh attestation verifywith explicit--signer-workflow/--signer-repo(default verification passes this attack); (3) auditid-token: writescope in every GitHub Actions workflow, never combine withpull_request_targetunless every PR code path is locked to repo-owned actions; (4) audit AI coding tool config directories (~/.claude/,~/.cursor/,~/.copilot/,~/.config/Code/User/) on developer machines that installed any @tanstack/* version between May 11β13. New Companion Disclosures section: node-ipc compromise (May 14, 2026) β versions 9.1.6, 9.2.3, 12.0.1 simultaneously published with identical 80 KB obfuscated credential-stealing payload (node-ipc has 10M+ weekly downloads); Microsoft Semantic Kernel RCE β CVE-2026-25592 (.NET SDK < 1.71.0) and CVE-2026-26030 (Pythonsemantic-kernel) allowing RCE via prompt injection in one of the most widely used AI agent frameworks (powers Copilot Studio and Azure AI agents) β companion to the May 7 "When prompts become shells" Microsoft research; TrapDoor (May 26, 2026) β first documented cross-ecosystem coordinated supply chain campaign hitting npm + PyPI + crates.io simultaneously with the same TTPs.
May 25, 2026
- Chapter 21 (Monthly Intel Brief): New MCP 2026-07-28 Release Candidate incident card. The Model Context Protocol working group locked the release candidate on May 21, 2026; final spec ships July 28 after a 10-week SDK validation window. Most consequential MCP revision since mainstream adoption. Stateless protocol core: removes the
initialize/initializedhandshake and theMcp-Session-Idheader; persistent SSE streams gone; client info now travels in_metaon every request; server-to-client communication restructures around a new Multi Round-Trip Requests mechanism withInputRequiredResultpayloads +requestStatetokens. Operational consequence: any MCP request can land on any server instance β sticky routing no longer required, shared session stores no longer required, MCP servers become ordinary HTTP handlers. New required headersMcp-MethodandMcp-Nameenable load-balancer routing without body inspection. New result metadatattlMsandcacheScopelet tools declare caching policy authoritatively. W3C Trace Context propagation in_metastandardizes distributed tracing across OpenTelemetry backends. Two extensions ship official: MCP Apps (server-rendered interactive HTML in sandboxed iframes β bridge from "tool returns text" to "tool returns widget"), and Tasks (long-running work graduated from experimental core feature to official extension with stateless lifecycle driven by client-sidetasks/get/tasks/update/tasks/cancel). Six SEPs align authorization with OAuth 2.0 / OpenID Connect: mandatoryissparameter validation per RFC 9207, OIDCapplication_typedeclaration during registration, credentials bound to specific authorization serverissuervalues, documented refresh-token / scope-accumulation patterns. Three legacy features deprecated: Roots, Sampling, and Logging β functional through at least July 2027. JSON Schema 2020-12 support across tool schemas (composition keywordsoneOf/anyOf/allOf, conditionals,$refreferences); missing-resource error code changes from non-standard-32002to standard JSON-RPC-32602. 4-point action checklist in the card for vibe coders running MCP server fleets. Headline callout rewritten to lead with the RC. Reinforcing platform context: AWS MCP Server GA May 6 with IAM/CloudWatch/CloudTrail integration; CrewAI now at 45,900+ stars with 12M+ daily agent executions and native MCP support across the fleet.
May 20, 2026
- Chapter 5 (Tools Landscape): Cursor card extended with Cursor 3.3 (May 7) PR Review experience (Reviews/Commits/Changes tabs with inline threads and quick-action pills) + Build in Parallel async subagents + auto-split-into-PRs quick action; cloud agent dev environments (May 11); Cursor in Microsoft Teams (mid-May); Cursor in Jira (May 19). Headline of the week: Cursor Composer 2.5 (May 18, 2026) β 79.8% SWE-Bench Multilingual (Opus 4.7 80.5%, essentially tied), 63.2% CursorBench v3.1 (Opus 4.7 61.6%, leads), priced $0.50/M input + $2.50/M output (
10Γ cheaper than Opus 4.7 per token); fast tier $3.00/$15.00; built on Moonshot's Kimi K2.5 base with 85% of compute spent on Cursor's RL post-training pipeline (25Γ more synthetic coding tasks than predecessor). Claude Code card: May 6 doubling of 5-hour limits across Pro/Max/Team/Enterprise and removal of peak-hour throttling on Pro/Max (attributed to SpaceX/Colossus 1 compute deal). Copilot card: CLI v1.0.48 (May 14) β model picker shows per-million-token input/output prices alongside model names; unified chat sessions view; agent mode Ask Question tool; global `/.copilot/agents/*.agent.mdcustom agent location. **Grok Code Fast 1 deprecated May 15** across every Copilot surface (chat, inline edits, ask/agent modes, completions). Gemini CLI card: **v0.41.0** β real-time voice mode (cloud + local), enforced workspace trust at session start, secured.env` loading in headless mode, expanded shell-command-validation core-tools allowlist (direct response to April CVSS 10.0 RCE chain). - Chapter 9 (The Numbers): Refreshed adoption baseline with Stack Overflow 2026 Developer Survey (May 19, 90,000+ respondents): 83% daily AI use (up from 62% in 2025), 47% of companies have NO formal AI tool policy, 54% can't tell which parts of codebase AI wrote. Added new AI Tool Daily Active Use Share stat grid β Claude Code #1 at 34%, GitHub Copilot 31%, Cursor 22%, Gemini Code Assist 9%. Added Cursor Composer 2.5 to the Agentic Model Race table β first tool-vendor in-house model with public claim of frontier parity at ~10Γ lower per-token cost. Revenue & Growth refreshed: $445M Devin ARR (CEO Scott Wu disclosure May 12), $480-520M Cognition combined ARR, $4B+ AI coding category aggregate ARR, 78% Devin 2.3 autonomous PR merge rate at SWE-1.7. Cognition valuation $25B (SoftBank Vision Fund 3-led Series D closed May 6 with NEA + Accel participating).
- Chapter 18 (Tool Comparison Matrix): First refresh since March 22 β every IDE and agent row updated with May 2026 reality. Cursor: Composer 2.5 pricing/benchmarks, Cursor 3.3 features, Jira/MS Teams integrations, CVE-2026-26268 git-hook RCE. Windsurf: Pro raised $15β$20, new Max $200/mo, Devin Cloud + Terminal CLI bundled. VS Code + Copilot: June 1, 2026 usage-based billing structure ($10 Pro + $5 flex / $39 Pro+ + $31 flex), CLI v1.0.48 token-price model picker. Claude Code: Opus 4.7 87.6% SWE-bench, 5-hour limit doubled May 6, Remote Agents + Persistent Memory in 3.0, 1.2M users. Devin: $445M ARR, 78% autonomous PR merge, $25B Cognition Series D. Added new Gemini CLI row with v0.41.0 voice + workspace-trust hardening. Lovable risk updated with April BOLA flaw + three documented incidents to date.
- Chapter 19 (Security Playbook): New "Vendor Response: What Shipped This Week (May 13β20, 2026)" callout β Gemini CLI v0.41.0 lands the first major upstream hardening response to the April CVSS 10.0 RCE chain (GHSA-wpqr-6v78-jr5g): workspace trust enforced at session start, .env loading secured in headless mode, expanded shell-command-validation core-tools allowlist. Pairs with Claude Code 3.0's
tool-response-sandboxingflag (May 13) β same class of failure addressed from the agent side; the technique used in the May 8 Trail of Bits MCP breach. Added empirical-floor callout: Veracode May 2026 study β across 100+ LLMs tested, 45% of AI-generated code samples introduce at least one OWASP Top 10 vulnerability; cross-referenced with Stack Overflow 2026 finding that 47% of companies have no formal AI tool policy despite 38% of codebases now containing majority AI-generated code. - Chapter 21 (Monthly Intel Brief): Two new incident cards. Cursor Composer 2.5 + Enterprise Integrations Week β May 18 launch ties Opus 4.7 on SWE-Bench Multilingual at ~10Γ lower cost, Cursor 3.3 PR Review + Build in Parallel, Cursor in MS Teams and Jira. GitHub Copilot Lineup Tightens Ahead of June 1 Billing Switch β CLI v1.0.48 token-price model picker, unified sessions view, agent Ask Question tool, global custom-agents directory; Grok Code Fast 1 deprecated May 15. Numbers grid refreshed with Composer 2.5 benchmark, 10Γ cost reduction, 47% no-AI-policy gap, $4B+ category aggregate ARR, June 1 Copilot billing reminder. Headline callout rewritten to lead with the Composer 2.5 + Copilot billing story.
May 18, 2026
- Chapter 19 (Security Playbook): New section β MCP Database Flaws & "Prompts Become Shells" (May 2026).
- Microsoft Security Blog (May 7, 2026): "When prompts become shells: RCE vulnerabilities in AI agent frameworks." Names four shipping-default failure patterns that pop up across the major agent frameworks and propagate into vibe-coded apps that wire up the same orchestrators: tool argument injection (untrusted document text becomes tool-call arguments with the agent's authority), code-interpreter abuse (host-process
python -crather than sandboxed execution), workflow compilation injection (attacker text flows into a step-graph definition another component executes), and MCP server-side injection (the MCP server itself fails to sanitize tool args before composing a downstream query). - The Register (May 13, 2026): Three new MCP database server CVEs β Apache Doris MCP (SQL injection via tool args, patched), Alibaba RDS MCP (sensitive metadata exfiltration, patched), and Apache Pinot MCP (instance takeover for internet-exposed deployments, vendor declined to patch). The unpatched Pinot case sets the disclosure precedent for refusing to deploy MCP servers from non-responsive maintainers.
- 7-point hardening checklist for vibe coders: (1) Audit + pin MCP server versions, no
@latest; (2) Refuse declined-to-patch servers; (3) No host-process code interpreters β wrap in E2B/Modal/Firecracker/gVisor; (4) Validate tool arguments independent of the model (platform enforcestoaddress, file path, payment ceiling); (5) Tag retrieved documents as untrusted prompt content; (6) Scope per-workflow tool allowlists (summarizer β writer β shell); (7) Human-in-the-loop on destructive actions, displaying literal tool-call arguments, not the model's natural-language summary. - Shared lesson across the May disclosures: the boundary between "content" and "instruction" was assumed across the agent ecosystem but never enforced. Every hardening pattern re-enforces that boundary at a different architectural layer.
- Microsoft Security Blog (May 7, 2026): "When prompts become shells: RCE vulnerabilities in AI agent frameworks." Names four shipping-default failure patterns that pop up across the major agent frameworks and propagate into vibe-coded apps that wire up the same orchestrators: tool argument injection (untrusted document text becomes tool-call arguments with the agent's authority), code-interpreter abuse (host-process
May 13, 2026
- Chapter 5 (Tools Landscape): Three GitHub Copilot CLI releases in a single week.
- v1.0.43 (May 6, 2026): Username toggle in
/statuslinepicker. Auto mode moves to server-side model routing for real-time selection. Two security fixes that matter for vibe coders touching untrusted repos: protection against RCE from malicious bare repositories nested inside a project, and full termination of MCP server child processes (npx/uvx-spawned) when a session ends β previously these were left as orphans. - v1.0.44 (May 8, 2026): Slash commands can appear mid-input; multiple skills can be invoked in a single message;
userPromptSubmittedhooks can handle requests directly and bypass the LLM (deterministic gating without a model call). Path completion in/add-dirno longer flickers or gets intercepted by@/#pickers. Tool permissions granted in autopilot mode persist across/clear. Free-tier quota display finally shows actual remaining usage (was always reading 100% consumed). - v1.0.45 (May 11, 2026): New
/autopilotslash command to toggle between interactive and autopilot modes without the Shift+Tab cycle through every mode in between. Windows PowerShell fallback (powershell.exe) when PowerShell 7+ (pwsh) isn't available. OpenTelemetry output aligned with GenAI semantic conventions β MCP tool calls use standardtool_callspans, newgen_ai.client.operation.durationmetric tracks tool execution time. Sessions with extension permission prompts resume cleanly (no more "Session file is corrupted"). - June 1, 2026 usage-based billing β pricing confirmed: Pro stays at $10/mo and includes $10 in AI Credits plus a $5 flex allotment ($15 included usage). Pro+ stays at $39/mo with $39 credits plus $31 flex ($70 total). Business $19/seat with $19 credits; Enterprise $39/seat with $39 credits. 1 AI credit = $0.01 USD, billed against input + output + cached tokens. Code completions and next edit suggestions stay unlimited and do NOT consume AI Credits on any paid plan. Copilot Chat, Copilot CLI, Copilot cloud agent, Copilot Spaces, Spark, and third-party coding agents all consume credits.
- v1.0.43 (May 6, 2026): Username toggle in
May 6, 2026
- Chapter 5 (Tools Landscape): GitHub Copilot CLI v1.0.40 (May 1, 2026) β adds headless OAuth via the
client_credentialsgrant type for MCP servers (no browser needed for auth β unblocks CI/CD and remote-agent setups). Tightens secure-by-default posture in prompt mode (-p): repo hooks and workspace MCP are now opt-in behindGITHUB_COPILOT_PROMPT_MODE_REPO_HOOKSandGITHUB_COPILOT_PROMPT_MODE_WORKSPACE_MCPenv vars. Bug fixes: CLI no longer hangs at 100% CPU when attaching large files;/clearand/newreset the active custom agent; subagents evaluate tool-search support against their own model rather than inheriting the parent session's settings. - Chapter 19 (Security Playbook): Two new sections.
- PromptMink: AI-Co-Authored Supply Chain Attacks β ReversingLabs dossier on the North Korea-linked Famous Chollima APT using LLM Optimization (LLMO) abuse to engineer npm packages specifically tuned to be recommended and installed by AI coding agents. Centerpiece: a Feb 28, 2026 commit on
openpaw-graveyard(an npm autonomous Solana trading agent) trailedCo-Authored-By: Claude Opus, added@solana-launchpad/sdkas a dependency, which transitively pulled in malicious@validate-sdk/v2β a credential stealer masquerading as a data-validation utility. Payload evolution from JavaScript infostealers (late 2025) β single-exec applications (Q1 2026) β compiled Rust binaries (May 2026). Includes the January 2026 Aikidoreact-codeshiftprecedent (a hallucinated package registered by a researcher and pulled into 237 GitHub repos via AI suggestions). Defenses: don't trust AI-suggested deps blind, treat AI-co-authored commits like unknown contributors, pin and lock, audit compiled-binary npm packages with extra scrutiny. - The AI-Generated Code Vulnerability Surge (CSA, 2026) β quantifies what AppSec teams have been observing: 45% of AI-generated samples carry OWASP Top 10 vulnerabilities (pass rate has not improved across multiple test cycles 2025 β Q1 2026), 86% failed cross-site scripting defense, 88% vulnerable to log injection. AI-assisted developers commit 3-4x faster but introduce security findings 10x faster β security debt accumulating faster than organizations can remediate.
- PromptMink: AI-Co-Authored Supply Chain Attacks β ReversingLabs dossier on the North Korea-linked Famous Chollima APT using LLM Optimization (LLMO) abuse to engineer npm packages specifically tuned to be recommended and installed by AI coding agents. Centerpiece: a Feb 28, 2026 commit on
April 2026
April 30, 2026
- Chapter 17 (Prompt Library): New Category 46 β Breach Response Prompts for Vibe Coders (3 prompts). Prompted by the Vibe Coding Security Crisis Week (April 19β22, 2026). Prompts: 46.1 Post-Breach Exposure Triage (assess exposure across source code, DB credentials, auth tokens, CI/CD when a breach touches your AI coding tool workflow); 46.2 AI Coding Tool Credential Rotation Checklist (step-by-step platform-by-platform rotation guide covering Claude Code, Cursor, GitHub, Vercel, Supabase, npm); 46.3 OAuth Grant Audit (full OAuth grant inventory, scope analysis, service account table, monitoring queries, and prevention controls β modeled on the Vercel/Context.ai breach vector). Also synced Categories 44β45 (added April 29β30) into the build-path markdown file. Total: 244+ prompts across 46 categories.
April 29, 2026
- Chapter 5 (Tools Landscape): Cognition shipped Windsurf 2.0 (April 15) with the Agent Command Center (Kanban surfacing local Cascade + cloud Devin sessions), Spaces (auto-context-inheriting bundles of agent sessions, PRs, files), and Devin bundled into Pro/Max/Teams plans. GitHub Copilot: GPT-5.5 GA on April 24 for Pro+/Business/Enterprise plans (basic Pro tier excluded); CLI v1.0.37 on April 27 with location-based permission persistence by default; Copilot code review starts consuming Actions minutes + AI Credits on June 1, 2026 (announced April 27). Lovable: added April 20 BOLA data breach summary (5 API calls to read another user's code/credentials, 48 days exposed before disclosure) and April 28 mobile app launch on iOS/Android.
- Chapter 9 (Numbers): Added GPT-5.5 verified benchmarks β 82.7% Terminal-Bench 2.0 (state of the art), 58.6% SWE-Bench Pro, 73.1% Expert-SWE (vs GPT-5.4's 68.5%), 84.9% GDPVal. Added Claude Opus 4.7 64.3% on SWE-Bench Pro β leads GPT-5.5's 58.6% by 5.7 points on real GitHub issues. Upgraded the Agentic Model Race GPT-5.5 card from placeholder to fully sourced benchmark data.
- Chapter 19 (Security Playbook): New section "The Vibe Coding Security Crisis Week (April 19β22, 2026)" documenting three breaches in four days: Lovable BOLA (broken object-level authorization, every user's source/DB/chat history readable in 5 API calls, 48-day HackerOne disclosure delay), Vercel breach via Context.ai (OAuth supply chain pivot from Lumma Stealer infection, ShinyHunters listed Vercel internal user DB on BreachForums for $2M), Bitwarden CLI npm
@bitwarden/cli@2026.4.0("Shai-Hulud: The Third Coming" β first confirmed npm supply chain attack specifically targeting authenticated Claude Code, Cursor, Codex CLI, Aider, Kiro, and Gemini CLI configurations). Includes systemic-pattern analysis (blast-radius minimization is the new defense) and a 30-second response checklist (rotate AI tool keys, audit OAuth grants, pin CLI npm dependencies). - Chapter 21 (Intel Brief): Five new incident cards covering April 19β28: Vibe Coding Security Crisis Week (the three-incident card), GPT-5.5 launch with verified benchmarks and Copilot integration tiers, Cognition Windsurf 2.0 (Agent Command Center, Spaces, Devin bundled, $25B raise reportedly closing), Lovable mobile launch (8 days after data breach), GitHub Copilot code review June 1 billing shift. Headline expanded with the "April 19β29" coda.
April 27, 2026
- Chapter 5 (Tools Landscape): New dated section "The Flat-Rate Era Is Ending" covering the simultaneous tightening across Claude Code (server-side prompt cache TTL cut from 1 hour to 5 minutes), GitHub Copilot (signup freeze on Pro/Pro+/Student April 20), and Cursor (frontier models moved behind Max Mode on legacy Team/Enterprise plans, accelerating credit burn). Industry shift from flat-rate "AI teammate" pricing to metered compute economics β average user went from ~50 calls/day in 2024 to thousands/day on agentic Claude Code or Codex in 2026. Convergence on a 2-tool stack: Cursor for daily editing + Claude Code for complex tasks, OR Copilot in IDE + Claude Code in terminal. GitHub Copilot CLI v1.0.36 (April 24) shipped subcommand picker; v1.0.35 (April 23) added tab-completion for slash commands. Practical guidance for individuals (budget $60β$200/month for heavy agentic users), teams (rebuild budget around per-seat metered compute, expect 5β10x variance), and tool evaluators (test on a representative agentic workflow, not headline subscription price).
April 9, 2026
- Chapter 5 (Tools Landscape): Cursor 3 launch (April 2) β Agents Window replaces Composer (multi-agent side-by-side/grid/stacked), Design Mode (click browser UI β agent modifies component), cloud-to-local handoff; Claude Code April 4 OpenClaw policy change β subscription limits no longer cover third-party harnesses, pay-as-you-go required (one-time credit issued), plus PowerShell tool for Windows, 60% faster Write tool diff; GitHub Copilot β Copilot SDK in public preview, Autopilot mode, privacy policy change (training on user data by default from April 24 β opt-out required).
- Chapter 9 (Numbers): Added Claude Mythos 93.9% SWE-bench (restricted, Project Glasswing); developer trust declined to 29% (SonarSource 2026, down from 70%+ in 2023); 51% professional devs use AI daily; 64% started using AI agents; 75% PR turnaround reduction (9.6 days β 2.4 days, Index.dev); 3.6 hours/week time saved (survey median); 66% frustrated by "almost right" solutions.
- Chapter 19 (Security Playbook): Trivy Cascade extension β CanisterWorm self-propagating npm worm (64+ packages, blockchain C2, evaded domain-seizure takedown), spread to Checkmarx KICS/AST GitHub Actions and LiteLLM (95M monthly PyPI downloads); new "AI as Autonomous Vulnerability Researcher" section covering Claude Mythos/Project Glasswing β autonomous zero-day discovery, implications for vibe-coded app security posture.
- Chapter 21 (Intel Brief): Six new April 2β9 incident cards: Cursor 3 (Agents Window + Design Mode); Claude Mythos/Project Glasswing (93.9% SWE-bench, zero-day discovery, defense-only restriction); Meta Muse Spark (Meta Superintelligence Labs first model, April 8); Trivy Cascade β CanisterWorm (blockchain C2, 64+ packages, Checkmarx + LiteLLM spread); Claude outages April 6β8 (10-hour outage, 8,000+ Downdetector reports); GitHub Copilot privacy change (April 24 training-by-default). Numbers section updated with Mythos 93.9%, CanisterWorm 64+ packages, trust 29%, PR turnaround 75%. What to Watch expanded with Copilot opt-out deadline and Mythos GA timeline.
April 1, 2026
- Chapter 5 (Tools Landscape): Cursor valuation updated to ~$50B (Bloomberg, fundraising talks at $2B+ ARR); Anthropic acquires Bun (JavaScript runtime) β native Bun integration in Claude Code; GitHub Copilot Agent Mode now fully generally available on both VS Code and JetBrains across all Copilot plans.
- Chapter 9 (Numbers): Added 73% global daily AI tool usage (Stack Overflow Dev Survey, Q1 2026) and 41% AI-generated code share (Sourcegraph Code Intelligence Report, March 2026); Cursor valuation updated to ~$50B; GitHub Copilot paid users updated to 20M+.
- Chapter 19 (Security Playbook): New "Supply Chain Attacks: April 2026 Alert" section covering Axios npm hijack (March 31 β UNC1069/North Korea, WAVESHAPER.V2 RAT, ~100M weekly downloads); LiteLLM credential stealer (versions 1.82.7/1.82.8, March 24); Langflow RCE CVE-2026-33017 (unauthenticated, CISA KEV, exploited within 20h); Trivy Docker Hub compromise CVE-2026-33634. New "Vibe-Coded App Vulnerability Research" section with Georgia Tech Vibe Security Radar data (2,000+ vulns, 400+ secrets in 5,600 apps) and AI-generated code CVE trend (6β15β35/month).
- Chapter 21 (Intel Brief): Transitioned to April 2026 brief. Seven new incident cards: Axios supply chain attack (North Korean state actor), LiteLLM/Langflow/Trivy attacks, Georgia Tech vulnerability research, MCP 97M monthly downloads milestone, Cursor self-hosted cloud agents, Vibe Coding 1-year anniversary + Collins Dictionary Word of the Year, SWE-bench model convergence. Numbers section updated with April figures. "What to Watch in May 2026" replaces April watchlist.
March 2026
March 25, 2026
- Chapter 5 (Tools Landscape): Claude Code updated for /loop scheduled tasks, 1M token context, 64k max output for Opus 4.6 (v2.1.63β2.1.76 evolution); Replit updated to $400M Series D at $9B valuation; Lovable updated with M&A offensive; GitHub Copilot JetBrains agentic capabilities GA; Windsurf/Devin updated with Codemaps product.
- Chapter 9 (Numbers): AI-generated code share updated to 46% (GitHub); US developer daily usage updated to 92%; Replit $9B valuation added to Valuations section.
- Chapter 19 (Security Playbook): New "MCP Supply Chain" section covering OpenClaw attack (1,184 malicious packages, ~1 in 5 in ClawHub), CVE-2026-23744 (CVSS 9.8 MCPJam RCE), Azure MCP RCE (CVSS 9.6), 36.7% SSRF exposure across MCP servers, with actionable protection checklist.
- Chapter 21 (Intel Brief): Six new incident cards for week of March 18-25: Claude Code /loop, Replit Series D, Lovable M&A, Devin Review + Windsurf Codemaps, Copilot JetBrains GA, OpenClaw supply chain attack. Numbers section updated. "What to Watch" expanded with MCP security, Lovable M&A, Replit ARR target.
March 7, 2026
- Chapter 5 (Tools Landscape): Cursor updated to v2.6 (Automations, JetBrains support, MCP Apps). OpenAI Codex CLI updated for GPT-5.4 (native computer use, 1M token context). Claude Code updated with voice mode, $2.5B+ ARR, Pentagon supply-chain risk note. Added Kilo Code (open-source, 1.5M+ users). GitHub Copilot updated to 26M+ users with GPT-5 mini/GPT-4.1 included. Windsurf updated with Gemini 3.1 Pro and LogRocket #1 ranking.
- Chapter 9 (Numbers): Claude Code ARR updated to $2.5B+. Copilot users updated to 26M+. Added Emergent AI ($50M ARR in 7 months), Cognition ($500M raise, $10B valuation, $82M+ ARR). Added developer sentiment section (84% use AI, only 3% high trust, 60% favorable view down from 70%+, 15% professional vibe coding adoption). Collins Dictionary Word of the Year updated for 2026.
- Chapter 19 (Security Playbook): Added AI Tool Security Advisories section covering Claude Code CVEs (CVE-2025-59536 RCE, CVE-2026-21852 API key exfiltration) with actionable guidance on AI tool attack surfaces.
- Chapter 21 (Intel Brief): Added GPT-5.4 launch (computer use, 1M tokens, financial tools). Added Pentagon/Anthropic conflict. Added Claude Code voice mode and CVE patches. Added Kilo Code launch. Added Qwen 3.5 (open weights, 74.1% LiveCodeBench). Updated Cursor to 2.6. Updated Cognition $500M raise. Added developer sentiment and Emergent AI stats. Expanded "What to Watch" with EU AI Act, Kilo Code growth, Pentagon resolution.
March 6, 2026
- Chapter 21: Complete rewrite of Monthly Intelligence Brief for March 2026 β open source crisis, Gemini 3 in Jules, Cursor 2.5 subagents, Copilot multi-model access, Pega enterprise vibe coding, Opus 4.6 agent teams, Devin 2.2
- Chapter 22: New March 2026 Spotlight: FleetTrack β B2B fleet management built by an operations analyst using Claude Code
- Chapter 5: Updated tool references for Cline, Jules, and March 2026 landscape
- Chapter 9: Updated GitHub Copilot stats (26M+ users), Devin metrics (67% PR merge rate, $10.2B valuation), Claude Code revenue ($2.5B+)
- Landing page: Updated social proof stats, added Vibe Coding Academy cross-promotion section with UTM tracking
- All chapters: Updated badges to March 6, 2026
March 1, 2026
- Build System: Introduced automated build pipeline for chapter management and updates
- Changelog: Added this changelog section β subscribers can now see exactly what changed and when
- Per-Chapter Badges: Each chapter now shows its last-updated date
- All Chapters: Initial release of all 22 chapters with 200+ prompts
February 2026
February 25, 2026
- Initial release: All 22 chapters published
- Chapter 1: The Moment Everything Changed β complete timeline from Karpathy's tweet to Opus 4.6
- Chapter 5: Full tools landscape covering Cursor, Claude Code, Devin, Jules, Gemini CLI, Codex CLI
- Chapter 10: Security analysis including Tenzai study and IDEsaster disclosure
- Chapter 17: 200+ production-ready prompts across 10 categories
- Chapter 18: Comprehensive tool comparison matrix
- Chapter 19: The 30-minute security checklist for vibe-coded applications
- Chapter 22: Community showcase with submission guidelines
April 21, 2026
- Chapter 21: Monthly Intel Brief updated to version 1.7 β added two incident cards for April 15β21: Claude Opus 4.7 (87.6% SWE-bench Verified, April 18) and Azure MCP Server 2.0 stable release + OAuth 2.1 added to core MCP spec. Callout headline updated. Previous: April 15 β Vercel Vinext CVEs, GLM-5.1, Claude Code reliability cluster.