Taxonomy for Next-Gen Reasoning: Why AI Gains Aren't Free
Nathan Lambert • Allen Institute for AI / Interconnects.ai
A four-pillar framework for understanding AI reasoning progress: Skills (achieved), Calibration (critical), Strategy (emerging), Abstraction (frontier). The talk highlights the 10-100x token waste caused by overthinking and the growth of post-training RL compute from 1% to 10%+ of total training budgets.
"Gains aren't free."Nathan Lambert (00:03:57)
Executive Summary
The era of "free lunch" AI improvements is over. Nathan Lambert presents a comprehensive taxonomy for understanding AI reasoning progress through four pillars: Skills, Calibration, Strategy, and Abstraction. While skills like math and coding have been largely achieved through reinforcement learning, the remaining three pillars require deliberate, focused effort.
The talk reveals two critical inefficiencies in current AI systems: a 10-100x token waste problem from overthinking (models using excessive tokens for simple tasks) and a massive shift in post-training RL compute from 1% to 10%+ of total training budget. Drawing from OpenAI's o1 development (12-18 months of reasoning trace collection) and DeepSeek's compute scaling (0.18% → 10-20% post-training), Lambert identifies the infrastructure and data requirements for next-generation reasoning.
The framework predicts that planning data will be more accessible than reasoning traces, suggesting a path forward for organizations without massive resources. The core takeaway: AI progress requires deliberate investment across all four pillars, not just model scale.
"Gains Aren't Free": The End of Easy Progress
Why AI reasoning improvements require deliberate effort across multiple dimensions
The core message I want to convey is that gains aren't free. And I think this is like a fundamental shift from what we've seen in the last year and a half where basically every time we scaled a model we got these improvements.
— Nathan Lambert (00:03:57)
Past: Scaling = Free Gains
For 18 months, simply scaling models delivered automatic improvements. More compute, more data, better performance—no specialized work required.
Future: Deliberate Effort
AI reasoning progress now requires targeted investment across Skills, Calibration, Strategy, and Abstraction. Scale alone is no longer sufficient.
The Four-Pillar Taxonomy
- Skills: math, code, and tool use through RL with verifiable rewards (achieved)
- Calibration: matching output tokens to task difficulty, solving overthinking (critical)
- Strategy: going in the right direction, knowing when to change course (emerging)
- Abstraction: breaking down problems into tractable subtasks (frontier)
Skills: The Achieved Foundation
Math, coding, and tool use—solved through reinforcement learning with verifiable rewards
Skills = Verifiable Rewards
The key to achieving skills is reinforcement learning with clear, objective feedback signals
What Makes Skills Solvable
Skills have clear success criteria: math problems have correct answers, code executes successfully, tools return expected outputs. This verifiability enables effective RL training.
Why Skills Are "Solved"
The RL + verifiable rewards pattern is well-understood and repeatedly proven. Models can reliably acquire these skills through established training techniques. This is the foundation upon which the other three pillars build.
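To make the verifiable-reward pattern concrete, here is a minimal Python sketch; the function names and the exact-match and exit-code checks are illustrative assumptions, not any lab's actual training code.

```python
# Minimal sketch of "verifiable rewards" (illustrative; not any lab's
# actual implementation). The signal is objective and binary: the math
# answer matches ground truth, or the generated code passes its tests.
import subprocess
import sys
import tempfile


def math_reward(model_answer: str, ground_truth: str) -> float:
    """1.0 only if the final answer matches the known correct answer."""
    return 1.0 if model_answer.strip() == ground_truth.strip() else 0.0


def code_reward(generated_code: str, test_code: str) -> float:
    """1.0 only if the generated code passes the supplied unit tests."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(generated_code + "\n\n" + test_code)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, path], capture_output=True, timeout=30
        )
    except subprocess.TimeoutExpired:
        return 0.0  # hung code is a failure, not a reward
    return 1.0 if result.returncode == 0 else 0.0
```

Binary, externally checkable signals like these are what make skills tractable for RL: there is no ambiguity for a reward model to get wrong.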
Calibration: Solving the Overthinking Crisis
The 10-100x token waste problem: models using excessive computation for simple tasks
Calibration is going to be crucial, which is like these models overthink like crazy.
— Nathan Lambert (00:04:48)
The Overthinking Problem
Reasoning models don't know when to stop thinking. They apply massive computational resources to trivial problems, wasting tokens and increasing latency without improving outcomes.
Models use 10 to 100 times more tokens than necessary for simple tasks because they can't calibrate their reasoning effort to task difficulty.
What Calibration Means
Matching reasoning effort (output tokens) to task difficulty
The Calibration Goal
A well-calibrated model should use only the tokens a task actually requires: a handful for simple questions, and extended reasoning only for genuinely hard problems.
Why This Is Hard
Current reasoning models are trained to "think more = better." The training signal encourages extensive reasoning without teaching models when less thinking is appropriate. Calibration requires new training techniques that reward efficiency.
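One common way to build such a signal (a sketch of a general idea from the RL literature, not necessarily the technique Lambert has in mind) is to discount the verifiable reward when a response overshoots a per-task token budget:

```python
def calibrated_reward(
    correct: bool,
    tokens_used: int,
    token_budget: int,
    length_penalty: float = 0.2,
) -> float:
    """Verifiable reward discounted by how far the response overshoots
    a per-task token budget. Wrong answers earn nothing regardless of
    how much 'thinking' they spent."""
    if not correct:
        return 0.0
    overshoot = max(0, tokens_used - token_budget) / token_budget
    return max(0.0, 1.0 - length_penalty * overshoot)


# A correct answer within budget keeps full reward ...
assert calibrated_reward(True, 400, 500) == 1.0
# ... while a 10x overshoot (the overthinking regime) earns zero.
assert calibrated_reward(True, 5000, 500) == 0.0
```

The design choice is that efficiency only modulates reward for correct answers, so the model is never encouraged to be briefly wrong.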
Strategy: Knowing When to Change Course
Going in the right direction—and recognizing when you're not
Strategy vs. Skills
Skills are execution; strategy is direction and course correction
Skills (Execution)
- Solving equations correctly
- Writing bug-free code
- Using tools effectively
Strategy (Direction)
- Choosing the right approach
- Recognizing dead ends
- Pivoting when stuck
The Strategy Challenge
Strategy requires meta-cognition: the ability to step back, assess progress, and make deliberate decisions about direction. Current models can execute well but struggle with "is this approach working?" assessments. This is an emerging area of research.
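A hypothetical control loop makes the split concrete: everything the invented Approach interface does inside step() is a skill, while the progress check and the decision to pivot are strategy.

```python
# Hypothetical control loop separating strategy from skills. The
# Approach interface (start/step and the solved/making_progress flags)
# is invented for illustration only.

def solve_with_strategy(problem, approaches, max_steps=5):
    for approach in approaches:            # strategy: pick a direction
        state = approach.start(problem)
        for _ in range(max_steps):
            state = approach.step(state)   # skill: execute one step
            if state.solved:
                return state.solution
            if not state.making_progress:  # strategy: "is this working?"
                break                      # dead end, pivot to next approach
    return None                            # every approach exhausted
```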
Abstraction: The Final Frontier
Breaking down complex problems into tractable subtasks
What Is Abstraction?
The ability to decompose problems and build hierarchical solutions
The Abstraction Hierarchy
Complex goals decompose into subgoals, and subgoals into directly solvable subtasks; solutions to the pieces then compose back into a solution for the whole.
Why Abstraction Is Hard
Abstraction requires models to understand problem structure at multiple levels simultaneously. They must identify natural breakpoints, determine what can be solved independently, and recognize how solutions combine. This is the frontier of AI reasoning research.
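A recursive sketch of the idea; the four callbacks are placeholders of our own, and only the structure mirrors the description above:

```python
# Illustrative sketch of hierarchical problem decomposition. The four
# callbacks are placeholders we supply; only the recursive structure
# (split until directly solvable, then compose upward) is the point.

def solve(problem, is_tractable, solve_directly, decompose, combine):
    if is_tractable(problem):
        return solve_directly(problem)      # base case: apply a skill
    subproblems = decompose(problem)        # find natural breakpoints
    subsolutions = [
        solve(sub, is_tractable, solve_directly, decompose, combine)
        for sub in subproblems              # solve each piece independently
    ]
    return combine(problem, subsolutions)   # recompose the full answer
```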
The Post-Training RL Compute Revolution
From 1% to 10%+: The massive infrastructure investment powering next-gen reasoning
DeepSeek Compute Shift
DeepSeek increased post-training RL compute by 50-100x, demonstrating the massive investment required for reasoning capabilities.
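A quick sanity check on that multiplier, using the percentages quoted in the talk (the arithmetic is ours):

```python
# Post-training share of total compute, per the figures in the talk.
before = 0.18                        # %, pre-reasoning era
after_low, after_high = 10.0, 20.0   # %, reasoning era

print(f"{after_low / before:.0f}x")    # ~56x
print(f"{after_high / before:.0f}x")   # ~111x, i.e. roughly 50-100x
```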
OpenAI's o1 Timeline
OpenAI spent 12-18 months collecting reasoning traces for o1. This data collection is a major bottleneck in reasoning model development.
The core of it is really stable infrastructure and data.
— Nathan Lambert (00:17:15)
What Post-Training RL Requires
Reasoning Traces
Extensive data showing how models think through problems
Stable Infrastructure
RL systems that run reliably at massive scale
Massive Compute
10%+ of total training budget dedicated to post-training
Reward Models
Accurate evaluation of reasoning quality
Planning Data More Accessible Than Reasoning Traces
A path forward for organizations without OpenAI's resources
The Key Insight
While OpenAI spent 12-18 months collecting reasoning traces for o1, planning data—information about how to approach and structure problems—is far more accessible. This creates a viable path for organizations to improve reasoning capabilities without massive data collection efforts.
Planning Data Examples
- Problem decomposition patterns
- Task hierarchies and dependencies
- Algorithm design templates
- Solution strategy libraries
Reasoning Traces (Hard)
Require recording every step of model thinking. Expensive, time-consuming, and requires specialized infrastructure.
Planning Data (Accessible)
Structured knowledge about problem-solving approaches. Available in textbooks, papers, codebases, and documentation.
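To make this concrete, here is one hypothetical record format for planning data; the fields are our assumption about what such a dataset might capture, not a published schema:

```python
from dataclasses import dataclass, field


@dataclass
class PlanningExample:
    """One hypothetical planning-data record: how to structure a
    problem, without a full token-level reasoning trace."""
    problem: str                     # task statement
    decomposition: list[str]         # ordered subtasks
    dependencies: dict[str, list[str]] = field(default_factory=dict)
    strategy: str = ""               # e.g. "divide and conquer"
    source: str = ""                 # textbook, paper, codebase, docs


example = PlanningExample(
    problem="Build a URL shortener",
    decomposition=["design schema", "hash function", "redirect handler"],
    dependencies={"redirect handler": ["design schema"]},
    strategy="top-down decomposition",
    source="codebase README",
)
```

Records like this can be mined from existing written material, which is exactly why planning data is cheaper to collect than token-level reasoning traces.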
Top Quotes from the Talk
Verbatim insights on AI reasoning, calibration, and infrastructure
"Gains aren't free."
"Calibration is going to be crucial, which is like these models overthink like crazy."
"The core of it is really stable infrastructure and data."
"Scaling RL is a very real thing."
Actionable Insights
Practical guidance for AI engineering in the reasoning era
Gains Require Deliberate Effort
The era of free improvements is over
- Scaling alone no longer guarantees progress
- Must invest across all four pillars: Skills, Calibration, Strategy, Abstraction
- Reasoning capabilities require targeted work, not just bigger models
Calibration Is Critical
Solve the overthinking problem
- Models waste 10-100x tokens through overthinking
- Must train models to match effort to task difficulty
- Calibration reduces cost and latency without sacrificing quality
Post-Training RL Is Expensive
Budget 10%+ for reasoning capabilities
- Post-training compute growing from 1% to 10%+ of total budget
- DeepSeek: 0.18% → 10-20% (50-100x increase)
- OpenAI spent 12-18 months collecting reasoning traces for o1
Planning Data Is Accessible
A viable path for resource-constrained teams
- Planning data more available than reasoning traces
- Leverage existing knowledge: textbooks, papers, codebases
- Focus on problem decomposition patterns and solution strategies
Skills Are Solved
Focus on the remaining three pillars
- Math, code, tool use achieved through RL + verifiable rewards
- Well-understood training pattern with clear success criteria
- Build on this foundation for Calibration, Strategy, Abstraction
Infrastructure Matters
Stable RL systems are essential
- Post-training RL requires robust, scalable infrastructure
- Data pipelines for reasoning traces are complex
- Invest in infrastructure before attempting reasoning model training
About the Talk
Speaker: Nathan Lambert, Allen Institute for AI / Interconnects.ai
Duration: ~19 minutes
Event: AI Engineer Conference
