Taxonomy for Next-Gen Reasoning: Why AI Gains Aren't Free
Nathan Lambert • Allen Institute for AI / Interconnects.ai
A four-pillar framework for understanding AI reasoning progress: Skills (achieved), Calibration (critical), Strategy (emerging), Abstraction (frontier). The talk highlights the 10-100x token waste caused by overthinking and the growth of post-training RL compute from 1% to 10%+ of total training budgets.
"Gains aren't free."Nathan Lambert (00:03:57)
Executive Summary
The era of "free lunch" AI improvements is over. Nathan Lambert presents a comprehensive taxonomy for understanding AI reasoning progress through four pillars: Skills, Calibration, Strategy, and Abstraction. While skills like math and coding have been largely achieved through reinforcement learning, the remaining three pillars require deliberate, focused effort.
The talk reveals two critical inefficiencies in current AI systems: a 10-100x token waste problem from overthinking (models using excessive tokens for simple tasks) and a massive shift in post-training RL compute from 1% to 10%+ of total training budget. Drawing from OpenAI's o1 development (12-18 months of reasoning trace collection) and DeepSeek's compute scaling (0.18% → 10-20% post-training), Lambert identifies the infrastructure and data requirements for next-generation reasoning.
The framework predicts that planning data will be more accessible than reasoning traces, suggesting a path forward for organizations without massive resources. The core takeaway: AI progress requires deliberate investment across all four pillars, not just model scale.
"Gains Aren't Free": The End of Easy Progress
Why AI reasoning improvements require deliberate effort across multiple dimensions
The core message I want to convey is that gains aren't free. And I think this is like a fundamental shift from what we've seen in the last year and a half where basically every time we scaled a model we got these improvements.
— Nathan Lambert (00:03:57)
Past: Scaling = Free Gains
For 18 months, simply scaling models delivered automatic improvements. More compute, more data, better performance—no specialized work required.
Future: Deliberate Effort
AI reasoning progress now requires targeted investment across Skills, Calibration, Strategy, and Abstraction. Scale alone is no longer sufficient.
The Four-Pillar Taxonomy
- Skills: math, code, and tool use through RL with verifiable rewards (achieved)
- Calibration: matching output tokens to task difficulty, solving overthinking (critical)
- Strategy: going in the right direction, knowing when to change course (emerging)
- Abstraction: breaking down problems into tractable subtasks (frontier)
Skills: The Achieved Foundation
Math, coding, and tool use—solved through reinforcement learning with verifiable rewards
Skills = Verifiable Rewards
The key to achieving skills is reinforcement learning with clear, objective feedback signals
What Makes Skills Solvable
Skills have clear success criteria: math problems have correct answers, code executes successfully, tools return expected outputs. This verifiability enables effective RL training.
Why Skills Are "Solved"
The RL + verifiable rewards pattern is well-understood and repeatedly proven. Models can reliably acquire these skills through established training techniques. This is the foundation upon which the other three pillars build.
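To make the verifiable-reward pattern concrete, here is a minimal Python sketch; the function names and the exact-match and exit-code checks are illustrative assumptions, not any lab's actual training code.

```python
# Minimal sketch of "verifiable rewards" (illustrative; not any lab's
# actual implementation). The signal is objective and binary: the math
# answer matches ground truth, or the generated code passes its tests.
import subprocess
import sys
import tempfile


def math_reward(model_answer: str, ground_truth: str) -> float:
    """1.0 only if the final answer matches the known correct answer."""
    return 1.0 if model_answer.strip() == ground_truth.strip() else 0.0


def code_reward(generated_code: str, test_code: str) -> float:
    """1.0 only if the generated code passes the supplied unit tests."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(generated_code + "\n\n" + test_code)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, path], capture_output=True, timeout=30
        )
    except subprocess.TimeoutExpired:
        return 0.0  # hung code is a failure, not a reward
    return 1.0 if result.returncode == 0 else 0.0
```

Binary, externally checkable signals like these are what make skills tractable for RL: there is no ambiguity for a reward model to get wrong.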
Calibration: Solving the Overthinking Crisis
The 10-100x token waste problem: models using excessive computation for simple tasks
Calibration is going to be crucial, which is like these models overthink like crazy.
— Nathan Lambert (00:04:48)
The Overthinking Problem
Reasoning models don't know when to stop thinking. They apply massive computational resources to trivial problems, wasting tokens and increasing latency without improving outcomes.
Models use 10 to 100 times more tokens than necessary for simple tasks because they can't calibrate their reasoning effort to task difficulty.
What Calibration Means
Matching reasoning effort (output tokens) to task difficulty
The Calibration Goal
A well-calibrated model should use only the tokens a task actually requires: a handful for simple questions, and extended reasoning only for genuinely hard problems.
Why This Is Hard
Current reasoning models are trained to "think more = better." The training signal encourages extensive reasoning without teaching models when less thinking is appropriate. Calibration requires new training techniques that reward efficiency.
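One common way to build such a signal (a sketch of a general idea from the RL literature, not necessarily the technique Lambert has in mind) is to discount the verifiable reward when a response overshoots a per-task token budget:

```python
def calibrated_reward(
    correct: bool,
    tokens_used: int,
    token_budget: int,
    length_penalty: float = 0.2,
) -> float:
    """Verifiable reward discounted by how far the response overshoots
    a per-task token budget. Wrong answers earn nothing regardless of
    how much 'thinking' they spent."""
    if not correct:
        return 0.0
    overshoot = max(0, tokens_used - token_budget) / token_budget
    return max(0.0, 1.0 - length_penalty * overshoot)


# A correct answer within budget keeps full reward ...
assert calibrated_reward(True, 400, 500) == 1.0
# ... while a 10x overshoot (the overthinking regime) earns zero.
assert calibrated_reward(True, 5000, 500) == 0.0
```

The design choice is that efficiency only modulates reward for correct answers, so the model is never encouraged to be briefly wrong.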
Strategy: Knowing When to Change Course
Going in the right direction—and recognizing when you're not
Strategy vs. Skills
Skills are execution; strategy is direction and course correction
Skills (Execution)
- Solving equations correctly
- Writing bug-free code
- Using tools effectively
Strategy (Direction)
- Choosing the right approach
- Recognizing dead ends
- Pivoting when stuck
The Strategy Challenge
Strategy requires meta-cognition: the ability to step back, assess progress, and make deliberate decisions about direction. Current models can execute well but struggle with "is this approach working?" assessments. This is an emerging area of research.
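A hypothetical control loop makes the split concrete: everything the invented Approach interface does inside step() is a skill, while the progress check and the decision to pivot are strategy.

```python
# Hypothetical control loop separating strategy from skills. The
# Approach interface (start/step and the solved/making_progress flags)
# is invented for illustration only.

def solve_with_strategy(problem, approaches, max_steps=5):
    for approach in approaches:            # strategy: pick a direction
        state = approach.start(problem)
        for _ in range(max_steps):
            state = approach.step(state)   # skill: execute one step
            if state.solved:
                return state.solution
            if not state.making_progress:  # strategy: "is this working?"
                break                      # dead end, pivot to next approach
    return None                            # every approach exhausted
```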
Abstraction: The Final Frontier
Breaking down complex problems into tractable subtasks
What Is Abstraction?
The ability to decompose problems and build hierarchical solutions
The Abstraction Hierarchy
Complex goals decompose into subgoals, and subgoals into directly solvable subtasks; solutions to the pieces then compose back into a solution for the whole.
Why Abstraction Is Hard
Abstraction requires models to understand problem structure at multiple levels simultaneously. They must identify natural breakpoints, determine what can be solved independently, and recognize how solutions combine. This is the frontier of AI reasoning research.
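A recursive sketch of the idea; the four callbacks are placeholders of our own, and only the structure mirrors the description above:

```python
# Illustrative sketch of hierarchical problem decomposition. The four
# callbacks are placeholders we supply; only the recursive structure
# (split until directly solvable, then compose upward) is the point.

def solve(problem, is_tractable, solve_directly, decompose, combine):
    if is_tractable(problem):
        return solve_directly(problem)      # base case: apply a skill
    subproblems = decompose(problem)        # find natural breakpoints
    subsolutions = [
        solve(sub, is_tractable, solve_directly, decompose, combine)
        for sub in subproblems              # solve each piece independently
    ]
    return combine(problem, subsolutions)   # recompose the full answer
```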
The Post-Training RL Compute Revolution
From 1% to 10%+: The massive infrastructure investment powering next-gen reasoning
DeepSeek Compute Shift
DeepSeek increased post-training RL compute by 50-100x, demonstrating the massive investment required for reasoning capabilities.
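A quick sanity check on that multiplier, using the percentages quoted in the talk (the arithmetic is ours):

```python
# Post-training share of total compute, per the figures in the talk.
before = 0.18                        # %, pre-reasoning era
after_low, after_high = 10.0, 20.0   # %, reasoning era

print(f"{after_low / before:.0f}x")    # ~56x
print(f"{after_high / before:.0f}x")   # ~111x, i.e. roughly 50-100x
```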
OpenAI's o1 Timeline
OpenAI spent 12-18 months collecting reasoning traces for o1. This data collection is a major bottleneck in reasoning model development.
The core of it is really stable infrastructure and data.
— Nathan Lambert (00:17:15)
What Post-Training RL Requires
Reasoning Traces
Extensive data showing how models think through problems
Stable Infrastructure
RL systems that run reliably at massive scale
Massive Compute
10%+ of total training budget dedicated to post-training
Reward Models
Accurate evaluation of reasoning quality
Planning Data More Accessible Than Reasoning Traces
A path forward for organizations without OpenAI's resources
The Key Insight
While OpenAI spent 12-18 months collecting reasoning traces for o1, planning data—information about how to approach and structure problems—is far more accessible. This creates a viable path for organizations to improve reasoning capabilities without massive data collection efforts.
Planning Data Examples
- Problem decomposition patterns
- Task hierarchies and dependencies
- Algorithm design templates
- Solution strategy libraries
Reasoning Traces (Hard)
Require recording every step of model thinking. Expensive, time-consuming, and requires specialized infrastructure.
Planning Data (Accessible)
Structured knowledge about problem-solving approaches. Available in textbooks, papers, codebases, and documentation.
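To make this concrete, here is one hypothetical record format for planning data; the fields are our assumption about what such a dataset might capture, not a published schema:

```python
from dataclasses import dataclass, field


@dataclass
class PlanningExample:
    """One hypothetical planning-data record: how to structure a
    problem, without a full token-level reasoning trace."""
    problem: str                     # task statement
    decomposition: list[str]         # ordered subtasks
    dependencies: dict[str, list[str]] = field(default_factory=dict)
    strategy: str = ""               # e.g. "divide and conquer"
    source: str = ""                 # textbook, paper, codebase, docs


example = PlanningExample(
    problem="Build a URL shortener",
    decomposition=["design schema", "hash function", "redirect handler"],
    dependencies={"redirect handler": ["design schema"]},
    strategy="top-down decomposition",
    source="codebase README",
)
```

Records like this can be mined from existing written material, which is exactly why planning data is cheaper to collect than token-level reasoning traces.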
Top Quotes from the Talk
Verbatim insights on AI reasoning, calibration, and infrastructure
"Gains aren't free."
"Calibration is going to be crucial, which is like these models overthink like crazy."
"The core of it is really stable infrastructure and data."
"Scaling RL is a very real thing."
Actionable Insights
Practical guidance for AI engineering in the reasoning era
Gains Require Deliberate Effort
The era of free improvements is over
- Scaling alone no longer guarantees progress
- Must invest across all four pillars: Skills, Calibration, Strategy, Abstraction
- Reasoning capabilities require targeted work, not just bigger models
Calibration Is Critical
Solve the overthinking problem
- Models waste 10-100x tokens through overthinking
- Must train models to match effort to task difficulty
- Calibration reduces cost and latency without sacrificing quality
Post-Training RL Is Expensive
Budget 10%+ for reasoning capabilities
- Post-training compute growing from 1% to 10%+ of total budget
- DeepSeek: 0.18% → 10-20% (50-100x increase)
- OpenAI spent 12-18 months collecting reasoning traces for o1
Planning Data Is Accessible
A viable path for resource-constrained teams
- Planning data more available than reasoning traces
- Leverage existing knowledge: textbooks, papers, codebases
- Focus on problem decomposition patterns and solution strategies
Skills Are Solved
Focus on the remaining three pillars
- Math, code, tool use achieved through RL + verifiable rewards
- Well-understood training pattern with clear success criteria
- Build on this foundation for Calibration, Strategy, Abstraction
Infrastructure Matters
Stable RL systems are essential
- Post-training RL requires robust, scalable infrastructure
- Data pipelines for reasoning traces are complex
- Invest in infrastructure before attempting reasoning model training
About the Talk
Speaker: Nathan Lambert, Allen Institute for AI / Interconnects.ai
Duration: ~19 minutes
Event: AI Engineer Conference
