Code World Model: Building World Models for Computation
Jacob Kahn from Meta FAIR presents Code World Model (CWM), a 32B parameter model that explicitly models program execution dynamics rather than just syntax. Learn how execution tracing, asynchronous RL with mid-trajectory updates, and "neural debugging" let a model simulate code execution without running it, offering a practical approximation to the halting problem.
Jacob Kahn
Meta FAIR (Fundamental AI Research)
Our primary goal is to build models that reason, plan and make decisions. And we start with code because it's an interesting sandbox in which to think about reasoning, right? It's constrained.
— Jacob Kahn, Meta FAIR (00:00:44)
Syntax vs. Execution: The Fundamental Shift
Traditional code models treat code as syntax - tokens in an editor. Jacob Kahn argues this view is incomplete: we should instead model what code does when it executes.
Is code literally the syntax in your editor or is it something else? And if you think about it, all a model sees that is operating on code is just syntax, right?... But what if we instead modeled execution more explicitly? Watch (00:02:07)
Traditional: Syntax Modeling
Code = tokens, ASTs, syntax trees. Models predict next token based on patterns in code structure. No understanding of what code actually does.
CWM: Execution Modeling
Code = state transitions, variable changes, control flow. Models predict program states through execution traces. Understands runtime behavior.
Execution Tracing: The Heart of CWM
CWM represents code execution as a trace - a sequence of program states with frame separators, local variables, and their values. This explicit representation lets models simulate execution without running code.
We'll have some frame separator which will denote distinct lines of execution. And we'll actually explicitly have local variables... And this is something we could essentially feed to a model. Watch (00:02:44)
# Example Execution Trace Format
---
x = 5
{x: 5}
---
y = x * 2
{x: 5, y: 10}
---
z = y + 3
{x: 5, y: 10, z: 13}
Frame Separators
Distinct markers delineating each line of execution, creating clear boundaries between program states.
Local Variables
Explicit tracking of variable names and memory contents at each execution step.
Line-by-Line Tracking
Complete execution history from start to finish, capturing all state changes.
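Traces like the one above can be harvested from real program runs. The exact trace format CWM is trained on is defined in its technical report; the sketch below only illustrates, using Python's standard sys.settrace hook, how line-by-line snapshots of local variables like those shown above can be collected.
# Illustrative sketch: harvesting a line-by-line trace with sys.settrace
import sys

trace = []

def tracer(frame, event, arg):
    if event == "line":
        # Snapshot the local variables as this line is reached
        # (i.e., after all previous lines in the frame have executed).
        trace.append((frame.f_lineno, dict(frame.f_locals)))
    return tracer

def program():
    x = 5
    y = x * 2
    z = y + 3
    return z

sys.settrace(tracer)   # enable tracing for subsequently called frames
program()
sys.settrace(None)     # disable tracing

for lineno, local_vars in trace:
    print("---")         # frame separator
    print(local_vars)    # {}, {'x': 5}, {'x': 5, 'y': 10}, {'x': 5, 'y': 10, 'z': 13}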
World Models: Simulate Before You Execute
In traditional agentic loops, models think, act, observe feedback, and repeat. World models change this loop: simulate actions internally before executing them.
With a world model, maybe we can actually simulate. We can imagine that action. We can get feedback in our imagined environment. So we could actually generate execution traces about a program without executing it. Watch (00:04:30)
Without World Model
- 1. Agent thinks about action
- 2. Agent executes action
- 3. Environment provides feedback
- 4. Agent learns from result
- 5. Repeat (expensive!)
With World Model
- 1. Agent imagines action
- 2. World model simulates outcome
- 3. Agent evaluates imagined result
- 4. Select best action to execute
- 5. Execute once (efficient!)
The Efficiency Gain
Instead of executing code hundreds of times to find the right approach, CWM can simulate execution internally. For expensive operations (compiling large codebases, running distributed systems, database queries), simulation can be dramatically cheaper than actual execution.
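A minimal sketch of this simulate-then-act loop is below. The propose, simulate, score, and execute callables are hypothetical stand-ins, not CWM's actual API; the point is the control flow: imagine several actions, evaluate the imagined outcomes, and execute only the winner.
# Sketch of the simulate-then-act loop (hypothetical stand-ins, not CWM's API)
def act_with_world_model(state, propose, simulate, score, execute, n_candidates=4):
    """Imagine several actions, keep the best imagined outcome, execute only once."""
    candidates = propose(state, n_candidates)                         # 1. imagine candidate actions
    imagined = [(score(simulate(state, a)), a) for a in candidates]   # 2-3. simulate + evaluate in "imagination"
    _, best = max(imagined, key=lambda pair: pair[0])                 # 4. select the best action
    return execute(best)                                              # 5. execute once, for real

# Toy usage: actions are tiny expressions, the "world model" predicts their outcome.
result = act_with_world_model(
    state={"x": 5},
    propose=lambda s, n: ["x * 2", "x + 100", "x - 1"][:n],
    simulate=lambda s, a: eval(a, {}, dict(s)),        # stand-in for predicting an execution trace
    score=lambda outcome: -abs(outcome - 10),          # prefer outcomes close to 10
    execute=lambda a: f"executed {a!r} once in the real environment",
)
print(result)   # executed 'x * 2' once in the real environment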
Model Architecture & Training Pipeline
CWM is a 32 billion parameter dense transformer. What makes it unique is the training pipeline - particularly the asynchronous RL setup with mid-trajectory model updates.
Model Specs
- • Architecture: Dense transformer
- • Parameters: 32 billion
- • Training tokens: 200+ billion
- • Release: research-focused, publicly available
- • Specialty: Long context for reasoning
Training Stages
- 1. Pre-training (few trillion tokens)
- 2. Domain-specific mid-training
- 3. Long context mid-training
- 4. Instruction following fine-tuning
- 5. Joint RL + agentic reasoning
The "Punches Above Its Weight" Moment
"We process about 200 and some billion tokens. And this scale works really well. It produces a strong model, a strong open model. It's a pretty small model. It punches above its weight."Watch (00:17:35)
Bash-First Philosophy
CWM is a "very bash-oriented model." Instead of specialized tools for every operation, it learns to use the terminal effectively.
This approach provides more flexibility than tool-specific models. With fewer tools but deeper bash mastery, CWM can compose operations in ways specialized tools can't anticipate.
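As a sketch of what "bash-first" means in practice (this is not CWM's actual agent harness, just the general idea of routing every agent action through a single terminal tool):
# One general-purpose bash tool instead of many specialized tools (illustrative only)
import subprocess

def run_bash(command: str, timeout: int = 60) -> str:
    """Run a shell command proposed by the agent and return its combined output."""
    proc = subprocess.run(command, shell=True, capture_output=True,
                          text=True, timeout=timeout)
    return proc.stdout + proc.stderr

# The model composes terminal commands rather than calling dedicated tools:
print(run_bash("grep -rn 'def process' . | head -5"))   # code search
print(run_bash("sed -n '1,20p' README.md"))             # reading a file
print(run_bash("python -m pytest -q || true"))          # running tests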
Technical Breakthrough: Asynchronous RL with Mid-Trajectory Updates
This is the most innovative part of CWM's training. The team tackles the throughput bottleneck of on-policy RL with a producer-consumer pipeline and mid-trajectory model updating - updating the model while it is actively interacting with environments.
So one interesting feature of this which is increasingly common is that we're actually updating models mid trajectory... So I have a model which we're sampling from. It's interacting with the environment. And I might actually update that model while it's interacting with the environment. Watch (00:15:45)
Traditional RL Pipeline
- 1. Generate trajectories with Model v1
- 2. Complete all trajectories
- 3. Train Model v2 on complete data
- 4. Deploy Model v2
- Problem: Waiting wastes compute
CWM Asynchronous Pipeline
- 1. Generate trajectories continuously
- 2. Queue checkpoints mid-trajectory
- 3. Update model on-the-fly
- 4. Resume with improved model
- Result: Maximum throughput
The Producer-Consumer Solution
- • Samplers generate trajectories continuously (producers)
- • Trainers consume queued trajectories (consumers)
- • Eager checkpointing saves model state mid-execution
- • Mid-trajectory updates apply improvements without restarting
- • No idle time - models always training or sampling
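The toy sketch below illustrates the producer-consumer pattern and mid-trajectory updates described above: samplers keep producing trajectory steps while a trainer publishes new "weights" (here just a version counter), so a trajectory can finish under a newer model than the one it started with. It is a schematic illustration, not CWM's actual training infrastructure.
# Toy producer-consumer sketch: samplers produce steps, a trainer updates shared "weights"
import queue
import threading
import time

step_queue = queue.Queue()   # trajectory steps waiting to be consumed for training
model_version = 0            # stand-in for model weights
version_lock = threading.Lock()

def sampler(trajectory_id, n_steps=5):
    """Producer: rolls out one trajectory, re-reading the (possibly updated) model each step."""
    for step in range(n_steps):
        with version_lock:
            version_used = model_version   # later steps may see newer weights (mid-trajectory update)
        step_queue.put((trajectory_id, step, version_used))
        time.sleep(0.01)                   # pretend the environment interaction takes time

def trainer(n_updates=3):
    """Consumer stand-in: a real trainer would consume queued steps; here it only publishes new versions."""
    global model_version
    for _ in range(n_updates):
        time.sleep(0.02)                   # pretend a gradient update takes time
        with version_lock:
            model_version += 1             # new weights become visible to in-flight trajectories

threads = [threading.Thread(target=sampler, args=(i,)) for i in range(2)]
threads.append(threading.Thread(target=trainer))
for t in threads:
    t.start()
for t in threads:
    t.join()

while not step_queue.empty():
    traj, step, version = step_queue.get()
    print(f"trajectory {traj}, step {step}: sampled under model v{version}")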
Neural Debugger: Debugging by Simulation
CWM's execution tracing capabilities enable a new paradigm: neural debugging. Instead of setting breakpoints and stepping through code, you ask CWM to simulate execution and show you what happens.
CWM traces code really well, right?... I can actually give it a function and then it can go and trace line by line that function with very very high accuracy. It can show me the values of local variables at certain points again with a lot of precision. Watch (00:19:05)
Traditional Debugger
- • Set breakpoints in code
- • Execute code in real environment
- • Inspect variables at breakpoints
- • Step through line-by-line
- • Slow for complex systems
Neural Debugger (CWM)
- • Describe what you want to understand
- • CWM simulates execution internally
- • Shows variable states at each line
- • No actual execution required
- • Fast even when real execution is expensive
Example: Debugging with Question Marks
# Traditional debugging
def process(data):
    result = transform(data)
    # What's the value of 'result' here? Set a breakpoint and run to find out.
    return optimize(result)

# Neural debugging with CWM
def process(data):
    result = transform(data)   # CWM shows the value of 'result' inline - no execution needed
    return optimize(result)
Composing Code Side-by-Side
"I can think about a neural debugger on top of a model. Traditionally, right, I have a piece of code. I don't know what I want to write. I put some question marks... With CWM, I can express those things very naturally in line with code."Watch (00:20:25)
Approximating the Impossible: The Halting Problem
The halting problem is a fundamental computer science problem: determining whether a program will finish running or continue forever. Alan Turing proved this is mathematically undecidable. But CWM offers something remarkable: a practical approximation through simulation.
The halting problem we know is this very fundamental problem where we don't know if a program is going to halt, to stop executing, to terminate.
On the classical computer science problem
Watch (00:22:45)
Theoretical Impossibility
To determine if a program halts, you'd need to simulate its entire execution. If it doesn't halt, that simulation takes forever.
"In order to know if a program halts, we would have to simulate the entire execution of the program which if it didn't halt would take forever."
CWM's Practical Approach
Instead of a perfect decision procedure, CWM simulates execution and predicts program behavior from learned patterns.
"Can I concretely reason about program execution dynamics in this sense?"
Real-World Impact
"I could use this to debug a huge distributed system where executing code is very very expensive or even an expensive function on a single machine... The ability to have an implicit world model internally where I'm simulating what's happening with a piece of code... gives me the ability to reason about it without executing otherwise expensive things."Watch (00:25:15)
Why This Matters
While we can't solve the halting problem perfectly, we can approximate it well enough for practical use. Debugging distributed systems, analyzing performance bottlenecks, and understanding complex code dynamics - all without executing a single line of code. This turns analyses that were previously impractical, and questions that are formally undecidable, into ones we can usefully approximate.
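As a toy illustration of the kind of question involved (an example of ours, not one from the talk): whether the loop below halts depends on its input, and for Collatz-style iterations termination is not known in general. A CWM-style model can simulate the execution and predict likely behavior, which is an approximation, not a proof.
# Toy example: termination depends on the input; prediction by simulation is an approximation
def mystery(n: int) -> int:
    steps = 0
    while n != 1:                                  # Collatz-style loop
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

print(mystery(6))   # halts after 8 steps; an execution model can predict this trace
# mystery(0) would never halt - no general procedure can decide this for every program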
Key Takeaways for AI Engineers
1. Execution > Syntax
Modeling what code does (execution dynamics) provides richer understanding than modeling what code looks like (syntax). CWM's execution traces capture program semantics, not just structure.
2. World Models Enable Efficiency
Simulating actions before executing them dramatically improves agentic efficiency. Generate execution traces without running code - invaluable for expensive operations.
3. Small Models Can Be Powerful
A 32B model with focused training can compete with much larger models on code tasks. "Punches above its weight" - quality training data matters as much as raw scale.
4. Bash as Universal Tool
Fewer specialized tools + deeper terminal mastery = more flexibility. Bash-oriented models can compose operations in ways specialized tools can't anticipate.
5. Mid-Trajectory Updates Scale
Asynchronous RL with eager checkpointing keeps samplers and trainers busy, substantially raising throughput. Update models during active execution instead of waiting for trajectory completion.
6. Neural Debugging is Possible
Models can trace code execution with high precision, enabling new debugging paradigms. Simulate execution without running code - invaluable for distributed systems.
7. Approximate Impossible Problems
While theoretical limits exist, practical approximations of undecidable problems (like halting) are achievable through pattern recognition and simulation.
8. Open Research Benefits Community
CWM is publicly available on Hugging Face with code and technical report. Open research accelerates field-wide progress and enables practical applications.
Technical Implementation Details
Data Collection Pipeline
- • GitHub data at massive scale
- • PR mutations and predicted changes
- • Running CI/tests on passing repos
- • Generating execution traces
- • Repository-level context
LLM Integration
- • Coupling execution traces with autoregressive LLMs
- • Chain-of-thought style reasoning
- • State and action-to-state functions
- • Predicting next execution states
- • End-to-end differentiable
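The "state and action-to-state" framing above can be written as a next-state function s_{t+1} = f(s_t, a_t) that an LLM approximates by predicting the next execution state from a textual trace. The sketch below is a hypothetical interface; llm_predict and the prompt format are stand-ins, not CWM's actual trace format.
# Hypothetical world-model interface: predict the next execution state from (state, action)
from typing import Callable

State = dict    # e.g. local variables at the current execution step
Action = str    # e.g. the next line of code to "execute"

def world_model_step(state: State, action: Action,
                     llm_predict: Callable[[str], State]) -> State:
    """Predict s_{t+1} = f(s_t, a_t) by prompting the model with an execution trace."""
    prompt = f"locals: {state}\nline: {action}\nnext locals:"
    return llm_predict(prompt)

# Toy stand-in predictor that actually executes the line instead of learning it, just to show the shape:
def toy_predictor(prompt: str) -> State:
    state_line, code_line = prompt.splitlines()[0], prompt.splitlines()[1]
    local_vars = eval(state_line.removeprefix("locals: "))
    exec(code_line.removeprefix("line: "), {}, local_vars)
    return local_vars

print(world_model_step({"x": 5}, "y = x * 2", toy_predictor))   # {'x': 5, 'y': 10}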
Agent Environment
- • Bash-oriented model
- • GitHub issue resolution workflow
- • End-to-end bash-based learning
- • Terminal as primary interface
- • Fewer, more flexible tools
Bootstrapping with SFT
- • Supervised fine-tuning before RL
- • Rejection sampling from failed traces
- • Learning from failure modes
- • Explicit grab functions for navigation
- • Quality-focused data curation
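A hedged sketch of the rejection-sampling bootstrap mentioned above, assuming the standard recipe (the talk's exact procedure may differ): sample several candidate trajectories per prompt from the current model, keep only those a verifier accepts (for example, the repository's tests), and fine-tune on the accepted ones.
# Generic rejection-sampling bootstrap for SFT (assumed recipe; callables are stand-ins)
def rejection_sample_for_sft(prompts, sample, verify, n_samples=8):
    """Return (prompt, trajectory) pairs whose trajectories pass verification."""
    accepted = []
    for prompt in prompts:
        for _ in range(n_samples):
            trajectory = sample(prompt)        # roll out the current model
            if verify(prompt, trajectory):     # e.g. run the tests the trajectory claims to fix
                accepted.append((prompt, trajectory))
    return accepted                            # becomes the SFT dataset

# Toy usage with stand-in callables:
data = rejection_sample_for_sft(
    prompts=["fix the off-by-one bug"],
    sample=lambda p: f"candidate patch for: {p}",
    verify=lambda p, t: "patch" in t,
    n_samples=2,
)
print(len(data))   # 2 accepted samples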
Availability & Resources
- • Model available for download and inference
- • Complete inference code provided
- • Comprehensive research documentation (technical report)
Source Video
Code World Model: Building World Models for Computation
Jacob Kahn • Meta FAIR (Fundamental AI Research) • AI Engineer Conference
Research Note: All quotes in this report are timestamped and link to exact moments in the video for validation.
Technologies & Concepts: World Models, Execution Tracing, Asynchronous RL, Mid-Trajectory Updates, Neural Debugger, Halting Problem, Meta FAIR, Hugging Face, Transformer Architecture, Reinforcement Learning, Bash-Oriented Agents, Code Simulation