
MLA 022 Vibe Coding

Feb 09, 2025 (updated Feb 21, 2026)


Andrej Karpathy coined "vibe coding" in February 2025 - a year later, 41% of all code is AI-generated, agents run multi-hour tasks autonomously, and the developer role has shifted from writing code to orchestrating systems.


Show Notes


In February 2025, Andrej Karpathy posted a tweet describing how he'd stopped reading diffs, hit "Accept All" on every suggestion, and just copy-pasted error messages back into the chat. He called it "vibe coding" - fully giving in to the vibes and forgetting the code even exists. The post got 4.5 million views. By late 2025, Collins Dictionary named it Word of the Year.

But this wasn't a sudden invention. It was the culmination of a four-year arc that started with GitHub Copilot's line-by-line autocomplete in 2021 and accelerated through GPT-4, 192K+ token context windows, reasoning models, and tool-use architectures. The result: AI shifted from suggesting the next line to autonomously planning, editing, testing, and committing across entire codebases.

The tool landscape has stratified fast

The ecosystem now breaks into three categories:

Terminal-native agents like Claude Code and Gemini CLI give power users direct environment access, scriptability, and Unix-style composability. Claude Code runs on models up to Claude Opus 4.5, supports 200K tokens (1M in beta), and spawns subagents for parallel work. Gemini CLI counters with a 1M-token context window and the most generous free tier in the space - 60 requests/minute, 1,000/day.

IDE-integrated agents like Cursor and Windsurf meet developers where they already work. Cursor hit $1B+ annualized revenue and a $29.3B valuation by going agent-first - its 2.0 release runs up to 8 parallel agents via git worktrees. Windsurf was acquired by Cognition (Devin AI) after a planned $3B sale to OpenAI fell through.

Cloud-based agents like OpenAI Codex take a different approach entirely - each task spins up an isolated sandbox with your repo, enabling true parallel execution. GPT-5.1-Codex-Max was the first model natively trained for multi-context operation, capable of 24+ hours of independent work.

Open-source pioneers still matter too. Aider (39K GitHub stars) introduced RepoMap for structural code context and now writes 50-88% of its own code. Cline (56K stars) established the human-in-the-loop approval pattern. GPT-Engineer evolved into Lovable, now a $6.6B unicorn.

Three pillars define the emerging stack

MCP (Model Context Protocol) solves the integration problem. Released by Anthropic in November 2024 and now hosted by the Linux Foundation, it's the "USB-C for AI" - a standard protocol replacing N×M custom integrations with N+M implementations. It has 97M monthly SDK downloads and clients across Claude, Cursor, Windsurf, Zed, and VS Code.

Skills turn prompt engineering into reusable packages. They're markdown files that extend agent capabilities through instruction injection - structured recipes telling an agent how to perform specific tasks. They can be shared, version-controlled, and scoped from global to project-level.

Harnesses are the real differentiator. Two agents running the same model differ entirely based on harness quality - the infrastructure governing context bridging, progress tracking, and environment management across sessions. The recommended pattern uses a two-agent architecture: an initializer sets up the environment, and a coding agent makes incremental progress one feature at a time.

Context engineering is the new critical skill

The practical constraint isn't model intelligence - it's what fits in the attention window. The discipline of context engineering has three strategies: reduce (compact older tool calls), offload (save results to filesystem), and isolate (spawn sub-agents for token-heavy subtasks). KV-cache optimization alone delivers 10x cost reduction on repeated context.

What's next

In March 2025, Dario Amodei predicted that AI would be writing 90% of code within 3-6 months. Gartner projects 40% of enterprise apps will use AI agents by end of 2026. The near-term trajectory includes repository intelligence (AI understanding code relationships and history, not just lines), production MCP deployments, and agent monitoring with ROI measurement.

The practical takeaway: developers are becoming AI conductors - using agents for boilerplate and rapid prototyping while applying judgment for architecture, direction, and safety. Reviewing AI-generated code effectively requires deeper understanding, not less. The teams winning are those treating infrastructure as lightweight scaffolding around rapidly evolving model capabilities, and expecting to re-architect as models improve monthly.

Transcript

Vibe Coding

Vibe coding has fundamentally changed how developers interact with AI-assisted tools, shifting from autocomplete suggestions to full agentic workflows where natural language drives software creation. Coined by Andrej Karpathy in early February 2025, the term captures a paradigm where developers "fully give in to the vibes, embrace exponentials, and forget that the code even exists." This represents the culmination of four years of evolution - from GitHub Copilot's 2021 line completions to today's autonomous agents capable of multi-hour tasks across entire codebases. The implications are profound: 41% of global code is now AI-generated, Y Combinator reports 25% of startups have 95%+ AI-generated codebases, and tools like Cursor generate nearly a billion lines of code daily.


Karpathy's tweet that named a movement

The term emerged from a viral X (Twitter) post by Andrej Karpathy - OpenAI co-founder and former Tesla AI director - describing his experience using Cursor Composer with Claude Sonnet and SuperWhisper voice transcription. His full definition:

"There's a new kind of coding I call 'vibe coding', where you fully give in to the vibes, embrace exponentials, and forget that the code even exists... I ask for the dumbest things like 'decrease the padding on the sidebar by half' because I'm too lazy to find it. I 'Accept All' always, I don't read the diffs anymore. When I get error messages I just copy paste them in with no comment, usually that fixes it."

Karpathy explicitly noted this approach is "not too bad for throwaway weekend projects, but still quite amusing" - a caveat often lost in subsequent coverage. The tweet garnered 4.5 million views and coverage in The New York Times, Ars Technica, and The Guardian within weeks. By late 2025, Collins Dictionary named "vibe coding" its Word of the Year.

The conceptual foundation traces back to Karpathy's "Software 2.0" essay (2017), which argued neural network weights constitute a new form of code learned from data rather than written by humans. His later "Software 3.0" framing (2025) positioned LLMs as systems programmable via natural language - prompts as the new source code. Vibe coding puts this vision into practice: English replaces Python as the primary interface for specifying intent.


GitHub Copilot set the stage over four years

GitHub Copilot's evolution illustrates the technological progression that enabled vibe coding. Released as a technical preview in June 2021 and reaching general availability in June 2022 at $10/month, early Copilot offered line-by-line completions powered by OpenAI Codex (a GPT-3 descendant trained on 159GB of Python from 54 million GitHub repositories). At launch, approximately 40% of Python code in enabled files was AI-generated, with first-suggestion accuracy around 43%.

The transformation accelerated through several key milestones. GPT-4 integration arrived in November 2023, followed by multi-model support in 2024 allowing users to choose between OpenAI, Claude, and Gemini models. The critical shift came on February 6, 2025 (within days of Karpathy's tweet) with Agent Mode - enabling autonomous multi-file editing with self-correction. By May 2025, GitHub launched its Coding Agent: fully autonomous background workers that spin up cloud environments via GitHub Actions, create draft PRs, and commit asynchronously.

The architectural evolution from "enhanced IntelliSense" to "autonomous agent" required several technical advances: context windows expanding from 4K to 192K+ tokens, reasoning models like o1 and o3 with chain-of-thought capabilities, tool use enabling terminal command execution, and agentic architectures implementing plan-act-observe-iterate loops.
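The plan-act-observe-iterate loop mentioned above can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: fake_model stands in for an LLM call, and the single run_tests tool is hypothetical.

```python
from typing import Callable


def run_agent(model: Callable[[list[str]], str],
              tools: dict[str, Callable[[], str]],
              max_steps: int = 5) -> list[str]:
    """Ask the model for an action, run it, and feed the result back in."""
    history: list[str] = []
    for _ in range(max_steps):
        action = model(history)            # plan: the model picks the next action
        if action == "done":
            break
        observation = tools[action]()      # act: execute the chosen tool
        history.append(f"{action} -> {observation}")  # observe: record the outcome
    return history


# Hypothetical stand-ins, for illustration only.
def fake_model(history: list[str]) -> str:
    # Keep running tests until the last observation says they passed.
    return "done" if history and history[-1].endswith("passed") else "run_tests"


attempts = iter(["failed", "passed"])
trace = run_agent(fake_model, {"run_tests": lambda: next(attempts)})
print(trace)
```

The max_steps cap is the important design choice here: without it, a model that never emits "done" would loop forever.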


Early tools that pioneered agentic workflows

Before major commercial players dominated headlines, open-source tools established the foundational patterns now standard across the industry.

Aider, created by Paul Gauthier (former Groupon CTO, VP Engineering at Geomagical Labs acquired by IKEA), launched in mid-2023 as a terminal-based AI pair programmer. Its core innovation is RepoMap - using tree-sitter to parse source code ASTs across 100+ languages, providing LLMs with structural context about code relationships without overwhelming context windows. Aider pioneered multiple edit formats (diff, whole-file, unified diff) and introduced Architect Mode, separating planning from execution using a two-model approach. With 39K GitHub stars and 3.9M+ pip installs, Aider now writes 50-88% of its own code in recent releases while maintaining top scores on SWE-Bench.

Cline (originally "Claude Dev"), created by Saoud Rizwan in July 2024, took a different approach as a VS Code extension emphasizing human-in-the-loop approval. Every file change and terminal command requires explicit user consent. The tool features browser automation via Claude's Computer Use capability, checkpoint systems for safe experimentation, and was among the first to implement MCP (Model Context Protocol) support for extensibility. With 56K GitHub stars and enterprise offerings via Cline Bot Inc., it established the approval-gate pattern now common across tools.

Roo Code forked from Cline to add power-user features: custom interaction modes (Code, Architect, Ask, Debug), configuration profiles for different project types, and sophisticated context condensing for large projects. Its Roo Mode Gallery enables sharing custom modes, while Memory Bank integration provides cross-session context.

Other significant early entrants include GPT-Engineer (Anton Osika, June 2023), which evolved into Lovable, now a $6.6B unicorn; Continue (YC S23), offering model-agnostic IDE extensions with agent mode, chat, edit, and autocomplete features; and Smol Developer (Swyx, May 2023), influential for its minimal ~200-line implementation demonstrating "human-centric whole program synthesis."


Claude Code delivers terminal-native agentic development

Claude Code, Anthropic's flagship agentic tool, launched as a research preview on February 24, 2025 alongside Claude 3.7 Sonnet, reaching general availability in May 2025. It's designed as a "low-level, unopinionated" power tool following Unix philosophy - composable, scriptable, and intentionally flexible. Example: tail -f app.log | claude -p "Slack me if you see any anomalies".

The architecture centers on direct environment access: Claude Code inherits your bash environment, gaining access to all installed CLI tools. It reads and writes files directly, executes shell commands (with permission controls), and maintains conversation context across interactions. The tool supports multiple Claude models - Claude Opus 4.5 for complex tasks, Claude Sonnet 4.5 for balanced performance (maintaining 30+ hour focus), and Haiku 4.5 for lightweight tasks - with mid-session switching via /model.

Context management employs sophisticated engineering: 200K tokens standard (1M in beta), automatic compaction triggered at ~75% utilization, CLAUDE.md files for project-specific context loaded on session start, just-in-time context loading using lightweight identifiers rather than front-loading, and subagents spawning separate Claude instances with isolated context windows for parallelized work. The /clear and /compact commands provide manual control.
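Threshold-triggered compaction can be pictured roughly as follows. This is a sketch under loud assumptions: tokens are approximated by whitespace-separated words, the summary is a placeholder string rather than a model-written summary, and the function name and keep_recent parameter are invented for illustration. Only the ~75% trigger comes from the description above.

```python
def maybe_compact(messages: list[str], window_tokens: int,
                  threshold: float = 0.75, keep_recent: int = 2) -> list[str]:
    """Collapse older messages into one stub once usage crosses the threshold."""
    def count_tokens(text: str) -> int:
        return len(text.split())  # crude word count, not a real tokenizer

    used = sum(count_tokens(m) for m in messages)
    if used < threshold * window_tokens or len(messages) <= keep_recent:
        return messages  # still under budget, or nothing old enough to fold
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = f"[summary of {len(older)} earlier messages]"
    return [summary] + recent


history = ["read file a.py " * 10, "ran tests " * 10, "edit b.py", "tests pass"]
compacted = maybe_compact(history, window_tokens=60)
print(compacted)
```

With a 60-token window the 54-token history crosses the 45-token trigger, so the two oldest messages are folded into one stub while the two most recent stay verbatim.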

Operation modes include standard interactive mode, Auto Mode (Shift+Tab) executing without per-action prompts, Plan Mode (Shift+Tab twice) creating plans before coding, headless mode (-p flag) for CI/automation, and background tasks (Ctrl+B) running subagents asynchronously. Integration spans VS Code extensions, browser interface at claude.ai/code, GitHub (@claude in issues/PRs), Slack, and enterprise platforms (AWS Bedrock, Vertex AI, Azure Foundry).

Key differentiators include extended thinking with graduated reasoning budgets ("think" < "think hard" < "think harder" < "ultrathink"), a permission-by-default safety model with allowlist systems supporting wildcard patterns like Bash(npm *), and the Skills system (October 2025) packaging domain expertise loaded automatically by semantic matching. The tool has a public GitHub repository and supports community plugins.
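To make the allowlist idea concrete, here is a small sketch of wildcard matching for rules shaped like Bash(npm *). It uses Python's standard fnmatch globbing as a stand-in. Claude Code's actual matching rules are not described here and may differ, and the second rule is a made-up example.

```python
from fnmatch import fnmatchcase

# Example rules: the first pattern appears in the text, the second is hypothetical.
ALLOWLIST = ["Bash(npm *)", "Bash(git status)"]


def is_allowed(tool: str, command: str, allowlist: list[str] = ALLOWLIST) -> bool:
    """Return True if 'Tool(command)' matches any allowlist pattern."""
    request = f"{tool}({command})"
    return any(fnmatchcase(request, pattern) for pattern in allowlist)


print(is_allowed("Bash", "npm test"))   # matches Bash(npm *)
print(is_allowed("Bash", "rm -rf /"))   # no rule matches
```

A plain glob like this would also accept a chained command such as "npm test; rm -rf /", so a real permission system needs stricter parsing than this sketch.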

Pricing offers flexibility: Pro at $20/month provides basic access with Sonnet 4, Max tiers at $100-200/month unlock Opus 4.5 with 5-20× usage, while API pricing runs $3-15/million tokens depending on model. Real-world costs average $6 per developer per day, with 90th percentile under $12. Prompt caching offers 90% savings on repeated context.

User feedback highlights superior tool-calling accuracy (Claude is "post-trained with the same tools it uses"), excellent TDD workflows, and codebase Q&A as a "core onboarding workflow at Anthropic." Google engineer Jaana Dogan reported Claude Code "recreated a year of my team's work in one hour." However, weaknesses include context degradation in long sessions, rate limits frustrating power users, learning curve for terminal workflows, and tendency toward over-engineering requiring explicit "minimal" prompting.


OpenAI Codex operates as a cloud-based parallel agent

OpenAI's new Codex (distinct from the deprecated API model) launched in May 2025 as a cloud-based software engineering agent. Each task runs in an isolated sandbox preloaded with your repository, enabling multiple tasks simultaneously - a fundamentally different architecture from terminal-based tools.

The underlying models have evolved rapidly: codex-1 (a version of o3 optimized for software engineering), GPT-5-Codex (September 2025), GPT-5.1-Codex-Max (first model natively trained for multi-context operation via "compaction," capable of 24+ hours of independent work), and GPT-5.2-Codex (December 2025) with enhanced long-horizon capabilities and cybersecurity features. Tasks typically take 1-30 minutes, with real-time progress monitoring and verifiable citations from terminal logs and test outputs.

Integration spans the ChatGPT sidebar (Code button for tasks, Ask for questions), Codex CLI for terminal users, VS Code extension, and direct GitHub integration for PR proposals. The tool supports AGENTS.md for repository-specific guidance and MCP for third-party tools. A December 2025 Skills System follows the open Agent Skills standard created by Anthropic and adopted by Microsoft and Cursor.

Codex is included with existing ChatGPT tiers (Plus, Pro, Business, Enterprise) with no separate subscription - a significant pricing advantage over competitors requiring dedicated plans.


Gemini CLI offers the most generous free tier

Google's Gemini CLI, announced June 25, 2025, takes an open-source approach (Apache 2.0) with an architecture using a ReAct loop and built-in tools for filesystem, shell, web fetch, search, memory, and todos. Running on Gemini 2.5 Pro with a 1 million token context window (Gemini 3 Pro rolling out January 2026), it integrates deeply with Google's ecosystem - Cloud Shell, Vertex AI, and VS Code via Gemini Code Assist.

The standout feature is an industry-leading free tier: 60 requests per minute, 1,000 requests per day with personal Google accounts. This makes it the most accessible option for evaluation and personal projects. Paid options include Google AI Studio API keys, Vertex AI for enterprise, and Gemini Code Assist licenses.

Technical capabilities include GEMINI.md context files, headless mode for automation, MCP server integration, and multi-modal support generating videos with Veo and images with Imagen. Being fully open-source enables community inspection and contribution.


Cursor built the dominant agent-first IDE

Cursor, created by Anysphere (founded 2022 by MIT undergraduates), has achieved remarkable scale: $1B+ annualized revenue, 1M+ daily active users, and a $29.3B valuation after raising ~$3.5 billion total. Half of Fortune 500 companies plus OpenAI, Stripe, Spotify, and Midjourney use it.

As a VS Code fork, Cursor maintains extension and keybinding compatibility while adding AI-native capabilities. Cursor 2.0 (October 29, 2025) centered the experience on agents with the proprietary Composer model - trained via reinforcement learning in real codebases, claiming 4x speed over similarly intelligent models with optimization for low latency in agentic loops.

Agent Mode enables autonomous multi-step coding with planning, file edits, and terminal commands - up to 8 agents in parallel via git worktrees. Additional features include Tab completions with a proprietary model, codebase indexing, Plan Mode supporting Mermaid diagrams, an embedded browser tool for DOM inspection and screenshot capture, and Background Agents with 99.9% reliability. The tool supports model mixing - plan with one model, build with another - across OpenAI, Anthropic, Google, and xAI.
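Git worktrees give each agent its own checkout and branch of the same repository, so parallel edits do not collide. The sketch below only builds the git commands one might run to set that up. It is not Cursor's implementation, and the branch and directory naming scheme is an assumption.

```python
def worktree_commands(repo: str, tasks: list[str],
                      max_agents: int = 8) -> list[list[str]]:
    """Build one `git worktree add` command per task, capped at max_agents."""
    commands = []
    for i, task in enumerate(tasks[:max_agents]):
        branch = f"agent-{i}-{task}"     # hypothetical branch naming scheme
        path = f"{repo}-agent-{i}"       # sibling directory for this agent's checkout
        commands.append(["git", "-C", repo, "worktree", "add", "-b", branch, path])
    return commands


cmds = worktree_commands("/work/app", ["fix-login", "add-tests"])
for cmd in cmds:
    print(" ".join(cmd))
```

Each command could be passed to subprocess.run, after which every agent works in its own directory and its branch can be reviewed or merged independently.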

Pricing has evolved to a credit-based system causing some user controversy: Free (limited), Pro at $20/month with a $20 credit pool, Pro+ at $60/month (3x usage), Ultra at $200/month (20x usage), Teams at $40/user, and Enterprise with custom pricing. The credit system depletes based on underlying API costs, making costs less predictable than flat subscription competitors.


Windsurf, Augment, and Copilot round out the landscape

Windsurf (originally Codeium) was acquired by Cognition (makers of Devin AI) in July 2025, after OpenAI's planned $3 billion acquisition of the company fell through. Named a Leader in Gartner's 2025 Magic Quadrant for AI Code Assistants, it features the Cascade AI system with Write, Chat, and Turbo modes, plus in-house SWE-1 models optimized for software engineering. Pricing runs $15-60/month with a credit system, and SWE-1 Lite is free.

Augment Code focuses on large codebases, indexing 400,000+ files with a Context Engine processing 200,000+ tokens. It maintains live understanding across code, dependencies, architecture, and history - optimal for enterprise multi-service architectures. AI Code Review benchmarked highest precision/recall across seven tools. Pricing runs $20-50/month for individuals with enterprise options.

GitHub Copilot continues evolving with Agent Mode (GA in VS Code), Coding Agent for autonomous background work via GitHub Actions, and specialized CLI agents (Explore, Task). The deepest platform integration - Issues, PRs, Actions - plus mature enterprise controls make it the default for organizations already invested in GitHub. Pricing ranges from $10-39/month with premium request systems for advanced models.


MCP standardizes how agents connect to tools

Model Context Protocol, released by Anthropic in November 2024 and now hosted by the Linux Foundation under the Agentic AI Foundation, provides a standardized way for AI applications to connect to external data sources and tools. Think "USB-C for AI" - it solves the N×M integration problem by reducing N applications connecting to M tools from N×M custom integrations to N+M implementations.
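The N×M versus N+M arithmetic is easy to check with numbers. The counts below (7 applications, 20 tools) are arbitrary illustrative values, not figures from the episode.

```python
def custom_integrations(n_apps: int, m_tools: int) -> int:
    """Every application needs its own connector to every tool."""
    return n_apps * m_tools


def protocol_implementations(n_apps: int, m_tools: int) -> int:
    """Each application and each tool implements the shared protocol once."""
    return n_apps + m_tools


n, m = 7, 20  # illustrative counts only
print(custom_integrations(n, m), protocol_implementations(n, m))
```

The gap widens as either side grows, which is why a shared protocol pays off most in a crowded ecosystem.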

The client-server architecture features three core primitives: Tools (model-controlled functions like gdrive.getDocument), Resources (app-controlled data sources), and Prompts (user-controlled pre-crafted instructions). Transport supports both local (STDIO) and remote (HTTP/SSE) communication.
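On the wire, MCP messages are JSON-RPC 2.0, and invoking a Tool is a tools/call request. The sketch below builds such a request by hand with the standard library rather than an MCP SDK. The tool name is taken from the example above, while the documentId argument is a hypothetical parameter.

```python
import json


def tool_call_request(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 request asking an MCP server to run a tool."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# "documentId" is an assumed argument name, for illustration only.
request = tool_call_request(1, "gdrive.getDocument", {"documentId": "abc123"})
print(request)
```

Over the STDIO transport a client would write this string to the server process's standard input. In practice an SDK handles the framing and the response.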

MCP clients now include Claude Desktop, Claude Code, Cursor, Windsurf, Zed, Sourcegraph Cody, and VS Code extensions. Pre-built servers cover Google Drive, Slack, GitHub, Git, Postgres, Puppeteer, and hundreds of community implementations. SDKs exist for Python, TypeScript, C#, and other major languages. With 97M monthly SDK downloads, MCP has become the de facto standard - though competing protocols from IBM (ACP) and Google (A2A) exist.

For context efficiency, Claude Code implements MCP Tool Search: when tool definitions exceed 10% of context, tools are deferred and discovered on-demand rather than preloaded, significantly reducing context bloat.
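The deferral rule can be sketched as a simple budget check. This is a rough model of the behavior described above, not Claude Code's code: the function names, the token counts, and the substring-based search are all assumptions for illustration.

```python
def plan_tool_loading(tool_tokens: dict[str, int], context_window: int,
                      budget_fraction: float = 0.10) -> dict[str, list[str]]:
    """Preload every tool definition if they fit the budget, else defer them all."""
    total = sum(tool_tokens.values())
    if total <= budget_fraction * context_window:
        return {"preload": sorted(tool_tokens), "deferred": []}
    return {"preload": [], "deferred": sorted(tool_tokens)}


def search_tools(query: str, descriptions: dict[str, str]) -> list[str]:
    """Toy on-demand discovery: case-insensitive substring match on descriptions."""
    return [name for name, text in descriptions.items() if query.lower() in text.lower()]


# Made-up definition sizes, in tokens.
sizes = {"github": 12_000, "slack": 6_000, "postgres": 5_000}
plan = plan_tool_loading(sizes, context_window=200_000)
print(plan)
```

Here the definitions total 23,000 tokens against a 20,000-token budget, so all three are deferred and would be looked up only when a task calls for them.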


Skills turn prompt engineering into reusable packages

Skills are prompt-based meta-tools implemented as markdown files that extend agent capabilities through instruction injection. Unlike function calling, skills operate through prompt expansion and context modification - essentially structured SKILL.md files telling Claude how to perform specific tasks.

A skill definition includes YAML frontmatter (name, description, invocation settings) and markdown instructions. Skills can bundle scripts, reference documentation, and templates in a directory structure. Storage spans global (~/.claude/skills/), project-level (.claude/skills/), and plugin-provided locations.

The selection mechanism uses pure LLM reasoning - no regex or keyword matching. The system formats available skills into text, embeds this in the Skill tool's prompt, and lets Claude decide which skill to invoke. Key frontmatter options include disable-model-invocation: true (user-only invocation for side-effect workflows like /deploy) and user-invocable: false (background knowledge only Claude can access).
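To show what a skill file looks like in practice, the sketch below parses a small SKILL.md into its frontmatter and instructions. The deploy skill's contents are invented for illustration, and the parser only handles flat key: value lines, so a real loader would use a proper YAML parser.

```python
# A hypothetical skill file; only the frontmatter keys come from the text.
SKILL_MD = """\
---
name: deploy
description: Deploy the current branch to staging
disable-model-invocation: true
---
Run the test suite, then push the build to staging.
"""


def parse_skill(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md string into (frontmatter dict, markdown body)."""
    _, header, body = text.split("---\n", 2)
    meta = {}
    for line in header.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta, body.strip()


def model_may_invoke(meta: dict[str, str]) -> bool:
    """Skills flagged disable-model-invocation are reserved for explicit user calls."""
    return meta.get("disable-model-invocation", "false") != "true"


meta, body = parse_skill(SKILL_MD)
print(meta["name"], model_may_invoke(meta))
```

Because this skill sets disable-model-invocation: true, the model would never choose it on its own, and only an explicit /deploy from the user would run it.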

Practical skills include frontend-design (production UIs avoiding "AI slop" aesthetics), fix-issue (GitHub issue → PR workflow), planning-with-files (Manus-style persistent markdown tracking), and research (parallel subagent spawning for documentation exploration).


Harnesses govern how agents operate reliably

An agent harness is the infrastructure wrapping around an AI model to manage long-running tasks - not the agent itself, but the software system governing reliability, efficiency, and steerability. Phil Schmid's analogy: the model is the CPU, the framework is the operating system, and the harness is the application layer with batteries included.

The core challenge: agents work in discrete sessions with no memory between them. Without a harness, each session "arrives with no memory of what happened on the previous shift." Harnesses solve context bridging, progress tracking, environment cleanup, and tool orchestration.

Anthropic's recommended architecture uses a two-agent solution: an Initializer Agent sets up the environment on first run (creating init.sh, establishing progress logs, generating comprehensive feature lists, initial git commit), while the Coding Agent makes incremental progress each session (reading progress files first, working on ONE feature at a time, testing with browser automation, committing with descriptive messages, updating progress before ending).
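The two-agent pattern boils down to state that lives on disk rather than in the model's context. The sketch below is a toy version under stated assumptions: the progress.json file name and its todo/done layout are hypothetical, and the "coding" step is a placeholder where a real agent would implement, test, and commit.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def initializer(workdir: Path, features: list[str]) -> None:
    """First run: write the feature list that later sessions will read."""
    progress = {"todo": features, "done": []}
    (workdir / "progress.json").write_text(json.dumps(progress))


def coding_session(workdir: Path) -> Optional[str]:
    """One session: read progress, finish exactly one feature, save progress."""
    path = workdir / "progress.json"
    progress = json.loads(path.read_text())
    if not progress["todo"]:
        return None  # nothing left to do
    feature = progress["todo"].pop(0)
    # A real agent would implement, test, and commit the feature here.
    progress["done"].append(feature)
    path.write_text(json.dumps(progress))
    return feature


with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    initializer(workdir, ["login", "signup"])
    completed = [coding_session(workdir) for _ in range(3)]
print(completed)
```

Each call to coding_session starts with no in-memory state, which mirrors a fresh agent session: everything it knows about prior work comes from the progress file.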

Context engineering strategies include: Reduce (compact older tool calls, trajectory summarization), Offload (save results to filesystem, use atomic tools), and Isolate (multi-agent architectures for token-heavy subtasks). KV-cache optimization keeps prompt prefixes stable for 10x cost reduction ($0.30/MTok cached vs $3/MTok uncached on Claude).
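The 10x figure follows directly from the two per-token rates quoted above. The sketch below works through one request. The 100,000-token prefix and 2,000 new tokens are made-up sizes, and the model is simplified: it counts input tokens only and ignores any surcharge for writing to the cache.

```python
CACHED_PER_MTOK = 0.30     # USD per million tokens, from the text
UNCACHED_PER_MTOK = 3.00   # USD per million tokens, from the text


def prompt_cost(prefix_tokens: int, new_tokens: int, cache_hit: bool) -> float:
    """Input cost in USD for a stable prefix plus freshly appended tokens."""
    prefix_rate = CACHED_PER_MTOK if cache_hit else UNCACHED_PER_MTOK
    return (prefix_tokens * prefix_rate + new_tokens * UNCACHED_PER_MTOK) / 1_000_000


cold = prompt_cost(100_000, 2_000, cache_hit=False)
warm = prompt_cost(100_000, 2_000, cache_hit=True)
print(f"uncached ${cold:.3f}, cached ${warm:.3f}")
```

The saving applies only to the part of the prompt that stays byte-for-byte identical between calls, which is why harnesses keep the prefix stable and append new material at the end.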


The near horizon promises repository intelligence and production MCP

Industry predictions coalesce around several near-term developments. In March 2025, Dario Amodei (Anthropic CEO) predicted that AI would be writing 90% of code within 3-6 months, and essentially all code within a year. Gartner projects 40% of enterprise apps will use AI agents by end of 2026, up from under 5% in 2025. METR benchmarks suggest the strongest models will hit 50% reliability on 20-hour software tasks if current trends hold.

Capabilities arriving in the next 6-12 months include agent-first systems of record (AI taking on CRM, support, intake roles), repository intelligence (AI understanding code relationships and history, not just lines), production MCP deployments moving from demos to daily practice, enhanced voice-first interactions, agent monitoring tools with synthetic user generation and ROI measurement, and agentic commerce via Stripe's API.

Longer-term speculation (2-5 years) encompasses multi-agent systems at scale with specialized agents working in concert, physical AI integration (Hyundai targeting 30,000 humanoid robots by 2028), post-context-window architectures with true long-term memory, and developer role evolution from "writing code" to "orchestrating systems."


Conclusion: The orchestrator role emerges

Vibe coding has crystallized from Karpathy's weekend-project meme into a genuine paradigm shift in software development. The hybrid reality emerging by January 2026 positions developers as AI conductors - using AI for boilerplate and rapid prototyping while applying judgment for direction and safety. This role elevates rather than obsoletes traditional skills: effective review of AI-generated code requires deeper understanding, not less.

The technical ecosystem has matured rapidly around three pillars: MCP as the standardized connection layer (97M monthly SDK downloads), Skills as reusable prompt programs that can be shared across platforms, and harnesses as the real product differentiator - two agents with identical models differ entirely based on harness quality. Context engineering has emerged as the critical discipline, optimizing what enters the model's attention budget at each step.

For experienced developers, the practical path forward combines terminal-native tools like Claude Code or Aider for maximum control with IDE-based options like Cursor when visual feedback accelerates iteration. Build for change - expect to re-architect as models improve monthly - and invest in persistent progress tracking that survives across sessions. The cost of rebuilding is dramatically lower now, and the teams winning are those treating infrastructure as lightweight scaffolding around rapidly evolving model capabilities.