AI Engineering Insights

Moore's Law for AI Agents: 70-Day Doubling in Code

Scott Wu (Cognition CEO) presents a framework for understanding the exponential growth of AI coding agents, with capabilities doubling roughly every 70 days. This report traces the evolution from tab completion to full AI engineer and explains why Wu expects another 16-64x improvement in the next year.

The amount of work that an AI agent can do in code goes up somewhere between 16 and 64x every year, at least for the last couple of years that we've seen. The doubling time is about every 70 days.

— Scott Wu, CEO of Cognition (00:01:24)

  • 70 days: AI capability doubling cycle in code
  • 16-64x: improvement per year
  • 18 months: from tab completion to full AI engineer

The 4-Tier Evolution of AI Coding Capabilities

In just 18 months, AI coding capabilities evolved from simple autocomplete to autonomous software engineers. Here's the timeline Scott Wu presented:

1. Tab Completion (18 months ago)

The only product experience with product-market fit (PMF) in code was single-line completion. Tools like GitHub Copilot dominated, predicting the next line based on context.

"18 months ago, I would say the only really the only product experience that had PMF in code was just tab completion... It was just like here's what I have so far. Predict the next line for me."

Watch (00:02:15)
2. Repetitive Migrations (Summer 2024)

AI agents excelled at large-scale, repetitive tasks: JavaScript to TypeScript conversions, version upgrades, migrations touching 10,000+ files. The work that was easiest for AI was, conveniently, also the most tedious for humans (a sketch of the pattern follows below).

"10,000 file migrations were the easiest thing for AI, which was cool actually because it was also the most annoying thing for humans to do."

Watch (00:04:10)
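Why were these the easiest wins? The task is identical for every file and success is mechanically checkable. Below is a minimal sketch of what such a batch driver could look like; run_agent_on_file() is a hypothetical stand-in for a real agent integration, not anything Cognition has published.

```python
# Hypothetical sketch of a batch migration driver: the same well-defined task
# applied to thousands of files, with a mechanical success check after each one.
# run_agent_on_file() is an assumed stand-in for a real agent call.

from pathlib import Path

def migrate_js_to_ts(repo_root: str, run_agent_on_file) -> list[Path]:
    """Ask an agent to convert each .js file to .ts; return the files that failed."""
    failures = []
    for js_file in Path(repo_root).rglob("*.js"):
        ts_file = js_file.with_suffix(".ts")
        run_agent_on_file(js_file, instruction="Convert this file to TypeScript")
        if not ts_file.exists():          # mechanical check: did the .ts file appear?
            failures.append(js_file)
    return failures
```

Because each file is an independent, verifiable unit, failures are cheap to retry, which is exactly the shape of work agents handled first.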
3. Isolated Bugs & Features (Fall 2024)

AI reached "intern-level" capability for isolated tasks—fixing bugs, implementing features that spanned multiple files but were well-defined. This is where Devin 2.0's "Playbooks" system became critical for reliable instruction following.

Key capability: Understanding multi-file changes, knowing when to ask for help, and maintaining confidence estimation.

Watch (00:13:20)
4. Full AI Engineer (Current, Winter 2024)

The turning point: "when you could just tag Devin in Slack and say 'we've got this bug, please take a look' or 'could you build this thing'—and it would just do it." Autonomous execution of complex multi-hour tasks with minimal supervision.

"Often you want to be able to have points where you closely monitor Devin for 10% of the task, 20% of the task and then have it do work on its own for the other 80-90%."

Watch (00:12:42)

Understanding "Moore's Law for AI Agents"

The Framework

Scott Wu introduces a simple but powerful metric: measure AI capability by "how much uninterrupted work can the agent do before human intervention is needed?"

The Math:

  • Doubling every 70 days ≈ 5 doublings per year (365 / 70 ≈ 5.2)
  • 2⁵ = 32x, the midpoint of the quoted range (2⁴ = 16x, 2⁶ = 64x)
  • Consistent for "at least the last couple years"
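As a quick sanity check of that arithmetic, here is a minimal sketch; the 70-day doubling period and the 16-64x range are Wu's numbers, and the projection is plain exponent math.

```python
# Project capability growth from a fixed doubling period.
# The 70-day doubling time is Scott Wu's claim; the rest is 2^(days/70).

def capability_multiplier(days: float, doubling_period_days: float = 70) -> float:
    """Work multiplier after `days`, given a fixed doubling period."""
    return 2 ** (days / doubling_period_days)

if __name__ == "__main__":
    print(f"Doublings per year: {365 / 70:.1f}")                     # ~5.2
    print(f"1-year multiplier: {capability_multiplier(365):.0f}x")   # ~37x, inside 16-64x
    print(f"Range check: 4 doublings = {2**4}x, 6 doublings = {2**6}x")  # 16x, 64x
```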

Why Code is Faster

"The doubling time is about every seven months which already is pretty crazy actually. But in code it's actually even faster. It's every 70 days."

Comparison:

  • General AI: ~7-month doubling cycle
  • Code-specific AI: ~70-day cycle (roughly 3x faster)
  • Code has structural advantages: automated tests, clear objectives
Watch (00:01:24)

The Bold Prediction: Next 12 Months

Scott doesn't see this slowing down. In fact, he predicts another 16-64x improvement over the next year.

"We're going to see another 16 to 64x over the next 12 months as well."

Implications: If 70-day doubling continues, by December 2025 we could see AI agents handling tasks roughly 32x larger than what is possible today. Projects requiring weeks of autonomous work might shrink to hours.

Watch (00:15:55)

The Shifting Bottleneck Problem

One of Scott's most important insights: each capability jump creates entirely new bottlenecks. What matters changes every 2-3 months.

Every time you get to the next tier, the bottleneck that you're running into or the most important capability or the right way you should be interfacing with it... actually change at each point.
Watch Scott explain (00:02:53)

  • Tier 1, Text Prediction: accurately predicting next tokens given limited context. Status: Solved.
  • Tier 2, Scale & Repetition: maintaining consistency across thousands of files. Status: Solved.
  • Tier 3, Instruction Following: reliable execution of complex multi-step tasks. Status: Largely solved (via Playbooks).
  • Tier 4, Knowledge & Memory: learning from feedback across tasks and organizational context. Status: Active focus.
  • Tier 5, Testing & Validation: self-testing, interpreting results, knowing what to test. Status: Active focus.
  • Tier 6, Project-Level Orchestration: coordinating entire projects ("what goes after that?"). Status: Next frontier.

Technical Architecture: How Devin 2.0 Works

The Playbooks System

Reliable instruction following for complex multi-step tasks. Playbooks encode procedural knowledge that ensures consistent execution.

Critical for Tier 3 capabilities—moving from single-file changes to multi-file coordination.
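Cognition has not published the Playbooks format, so the following is a purely illustrative sketch: procedural knowledge modeled as ordered steps, each paired with a verification the agent must satisfy before moving on. All names here are hypothetical.

```python
# Hypothetical sketch of a "playbook": procedural knowledge as ordered steps,
# each with an instruction and a check to confirm before moving on.
# Illustrative only; Devin's actual Playbooks format is not public.

from dataclasses import dataclass, field

@dataclass
class Step:
    instruction: str      # what the agent should do
    verification: str     # how to confirm the step succeeded

@dataclass
class Playbook:
    name: str
    steps: list[Step] = field(default_factory=list)

migration_playbook = Playbook(
    name="js-to-ts-migration",
    steps=[
        Step("Rename the .js file to .ts", "file exists with .ts extension"),
        Step("Add type annotations to exported functions", "tsc reports no implicit-any errors"),
        Step("Run the project's test suite", "all tests pass"),
    ],
)
```

Encoding the procedure as explicit, checkable steps is one plausible way to get the consistent execution the section describes.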

Knowledge & Memory

Learning from human feedback across tasks. Devin improves over time by remembering corrections and adapting to organizational patterns.

Addresses the goal that Devin on its 30th day should be dramatically better than on day one.
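A minimal sketch of the idea, assuming a toy keyword-overlap store (a real system would use embedding search and much richer organizational context); every name here is hypothetical.

```python
# Hypothetical sketch: store human corrections and surface relevant ones before
# the next task. A real system would use embedding search, not keyword overlap.

class FeedbackMemory:
    def __init__(self):
        self.corrections: list[tuple[str, str]] = []  # (task description, correction)

    def record(self, task: str, correction: str) -> None:
        """Remember a human correction tied to the task it came from."""
        self.corrections.append((task, correction))

    def relevant(self, new_task: str) -> list[str]:
        """Return past corrections whose task shares words with the new task."""
        words = set(new_task.lower().split())
        return [c for t, c in self.corrections if words & set(t.lower().split())]

memory = FeedbackMemory()
memory.record("add login endpoint", "always use the org's auth middleware")
print(memory.relevant("add logout endpoint"))  # ["always use the org's auth middleware"]
```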

Confidence Estimation

Agents must know when they understand a task well enough to execute autonomously vs when to ask for help.

"Rather than just going off and doing things immediately, you have to be able to say, okay, I'm quite sure that this is the task and I'm going to go execute it now versus I don't understand what's going on. Human, please give me help."

Watch (00:13:44)
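As an illustrative sketch only, not Devin's actual mechanism: gate autonomous execution on a self-assessed confidence score, with an ask-the-human fallback. The 0.8 threshold and estimate_confidence() are assumptions.

```python
# Hypothetical sketch of confidence gating: execute autonomously only when the
# agent's self-assessed understanding clears a threshold; otherwise ask a human.
# The threshold and estimate_confidence() are illustrative assumptions.

CONFIDENCE_THRESHOLD = 0.8

def handle_task(task: str, estimate_confidence, execute, ask_human) -> str:
    confidence = estimate_confidence(task)
    if confidence >= CONFIDENCE_THRESHOLD:
        return execute(task)           # "I'm quite sure; executing now."
    return ask_human(f"I'm unsure about '{task}' (confidence {confidence:.2f}). "
                     "Can you clarify the expected behavior?")
```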

Self-Testing & Iteration

The era where testing "gets really really important." Agents need iterative loops to validate their own work before delivering PRs.

"It's just a much higher context problem to solve... is this testing itself?"

Watch (00:14:05)
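A minimal sketch of that iterate-until-green loop, assuming the project uses pytest and a hypothetical agent_fix() integration.

```python
# Hypothetical sketch: let the agent iterate against the test suite before
# delivering a PR. Assumes pytest; agent_fix() stands in for a real integration.
import subprocess

def run_tests() -> tuple[bool, str]:
    """Run the project's test suite; return (passed, combined output)."""
    result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr

def self_test_loop(agent_fix, max_iterations: int = 5) -> bool:
    """Give the agent up to max_iterations attempts to get the suite green."""
    for _ in range(max_iterations):
        passed, output = run_tests()
        if passed:
            return True        # ready to deliver the PR
        agent_fix(output)      # feed the failures back to the agent
    return False               # still red: escalate to a human
```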

Human-AI Collaboration: The 10-20% Supervision Rule

Often you want to be able to have points where you closely monitor Devin for 10% of the task, 20% of the task and then have it do work on its own for the other 80-90%.

This is Wu's recommended collaboration pattern for AI agents in 2024/2025: close supervision at key decision points, autonomy for execution. Not "hands-off" but "selective oversight."

Watch (00:12:42)

  • Key decision points (monitor at 10-20%): task understanding, architecture decisions, approach selection
  • Autonomous execution (80-90%): implementation, testing, refinement
  • Final review: human validates the final output and provides feedback for learning
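Put together, the pattern might look like the hypothetical sketch below: a pipeline that pauses for human approval only at the phases flagged as decision points. The phase and approval interfaces are assumptions, not Devin's API.

```python
# Hypothetical sketch of selective oversight: require human approval at the few
# decision points, then let the agent run the long execution phases on its own.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Phase:
    name: str
    run: Callable[[], str]
    needs_approval: bool   # True for the ~10-20% that humans should monitor

def run_with_oversight(phases: list[Phase], approve: Callable[[str], bool]) -> None:
    for phase in phases:
        summary = phase.run()
        if phase.needs_approval and not approve(f"{phase.name}: {summary}"):
            raise RuntimeError(f"Human rejected phase '{phase.name}'")

phases = [
    Phase("choose approach", run=lambda: "use a queue-backed worker", needs_approval=True),
    Phase("implement", run=lambda: "42 files changed, tests green", needs_approval=False),
    Phase("final review", run=lambda: "PR ready", needs_approval=True),
]
run_with_oversight(phases, approve=lambda s: input(f"Approve? {s} [y/n] ") == "y")
```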

What's Next: Beyond Single Tasks

The Next Frontier: Project-Level Orchestration

Scott teases what comes after Tier 4 (full AI engineer): moving from single autonomous tasks to entire project execution.

"Now what we're thinking about is hey maybe if instead of doing it just one task it's you know how how do we think about tackling an entire project right and after we do a project you know what what goes after that"

Implication: The shift from "build this feature" to "own this project." AI agents that can coordinate multiple features, manage dependencies, and understand project-level objectives.

Watch (00:14:40)
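To make project-level orchestration concrete, here is a hypothetical sketch: the project as a DAG of tasks dispatched in dependency order. This illustrates the shape of the problem, not Cognition's design.

```python
# Hypothetical sketch: a project as a DAG of tasks, dispatched to agents in
# dependency order. Illustrative only; not Cognition's actual architecture.

from graphlib import TopologicalSorter  # stdlib since Python 3.9

project = {
    "design schema": set(),
    "build API": {"design schema"},
    "build UI": {"design schema"},
    "integration tests": {"build API", "build UI"},
}

def orchestrate(tasks: dict[str, set[str]], dispatch) -> None:
    """Run each task once all of its dependencies have completed."""
    for task in TopologicalSorter(tasks).static_order():
        dispatch(task)   # e.g. hand the task to an agent session

orchestrate(project, dispatch=print)
# design schema, then build API / build UI (either order), then integration tests
```

The hard open problems live inside dispatch(): choosing agents, budgeting supervision, and re-planning when a task fails.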

Short-Term (Next 6 Months)

  • Improved self-testing and validation
  • Better knowledge/memory systems
  • Enhanced confidence estimation
  • More sophisticated debugging

Medium-Term (6-12 Months)

  • Project-level task orchestration
  • Multi-repo understanding
  • Architectural decision-making
  • "16-64x" capabilities realized

Key Takeaways for Engineering Leaders

1. Exponential Growth is Real

The 70-day doubling cycle has been consistent for "at least the last couple years." Plan for 16-64x capability improvements annually.

Action: Update roadmap quarterly

2. Bottlenecks Shift Every 2-3 Months

What limits AI capabilities changes rapidly. Don't over-optimize for today's bottleneck—focus on adaptable infrastructure.

Action: Build flexible systems

3. Validation is the Limiter

"The limiter is not the capability of the coding agent. The limit is your organization's validation criteria." Invest in tests, linting, types.

Action: Strengthen validation first

4. 10-20% Supervision Model

Optimal human-AI collaboration: close oversight at decision points, autonomy for execution. Not hands-off, but selective.

Action: Train teams on supervision patterns

5. From Tab Completion to Teammate

18 months from autocomplete to autonomous engineer. The shift from "tool" to "teammate" happened faster than anyone predicted.

Action: Reorganize teams around AI collaboration

6. Testing is the New Bottleneck

"The era where testing and this asynchronous testing gets really really important." Self-testing capabilities are now critical.

Action: Prioritize automated testing infrastructure

Source Video

Devin 2.0 and the Future of SWE

Scott Wu, CEO of Cognition • AI Engineer Summit

Video ID: MI83buT_23o
Duration: ~16 minutes
Watch on YouTube

Research Note: All quotes in this report are timestamped and link to exact moments in the video for validation. This analysis was conducted using multi-agent transcript analysis with fact-checking against external sources.

Disclaimer: "Moore's Law for AI Agents" is Scott Wu's framework, not an industry-standard metric. Quantitative claims (70-day doubling, 16-64x growth) represent Cognition's internal observations and should be attributed to the speaker rather than presented as objective facts.