Building Cursor Composer
4x token efficiency, parallel tool calling breakthrough, and the infrastructure challenges of applying reinforcement learning to coding agents.
"Cursor Composer is a model designed for real world software engineering and it tries to be both fast and smart. It's about four times more efficient at token generation than models at a similar level of intelligence."Lee Robinson, Cursor (00:00:44)
Token efficiency: 4x vs. models of similar intelligence
Training speedup: ~3.5x for the MoE layer on Nvidia Blackwell
Tokens per rollout: 100K-1M
Tool calls: hundreds per rollout
Executive Summary
Cursor's journey into model development
Lee Robinson presents Cursor Composer, Cursor's first agent model designed specifically for real-world software engineering. Building on their success with Tab—their low-latency autocomplete model—the Cursor team ventured into agent models to achieve both speed and intelligence. The result is a model that delivers 4x better token efficiency than models at similar intelligence levels, while introducing breakthrough features like parallel tool calling.
The talk provides an unvarnished look at the substantial infrastructure challenges of applying reinforcement learning to coding agents. From training-inference parity issues to rollout complexity at scale (100K-1M tokens, hundreds of tool calls), Robinson demonstrates how Cursor's vertical integration strategy—owning the IDE, model, and cloud infrastructure—enables unique co-design opportunities and infrastructure reuse. The presentation concludes with research findings on semantic search, showing dramatic performance improvements when models train in the same environment they use at inference time.
Product Strategy
Why Cursor entered the model space
"Obviously cursor has an IDE. Why are we getting into the model space?"
Strategic question about entering the model space
Watch (00:01:12)"We wanted to take that same approach for a very low latency model and apply it to coding with agents."
Connecting autocomplete success to agent models
Watch (00:01:22)"We needed it to be smart and fast. Definitely needed to be smart. Not really smart enough yet to be a daily driver for a lot of their coding."
Early feedback on the model
Watch (00:01:49)
From Autocomplete to Agents
Cursor's successful low-latency autocomplete model (Tab) provided the foundation for their agent model approach. The same optimization principles—speed, efficiency, tight IDE integration—were applied to agents, but with substantially greater complexity.
Breakthrough Moment
Parallel tool calling changed everything
"One big change here that helped actually push this towards a level where we had a checkpoint where people would use it was being able to call tools in parallel."
The breakthrough feature
Watch (00:02:20)"This has been a quite different programming experience for me... firing off an agent and waiting, let's call it 20 minutes for it to complete where you can kind of context switch away. This really does help keep you in the flow."
Flow state vs traditional agent workflows
Watch (00:03:09)
Before: Serial Execution
Agents called tools one at a time, waiting for each response before proceeding. Slow, inefficient, broke developer flow state.
After: Parallel Execution
Agents autonomously decide when to call multiple tools simultaneously. Fast, efficient, maintains developer flow state.
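To make the contrast concrete, here is a minimal sketch using Python asyncio. The tool functions, their names, and their arguments are illustrative stand-ins, not Cursor's actual agent harness.

```python
import asyncio

# Hypothetical stand-ins for agent tools; the real harness wires these to the IDE.
async def read_file(path: str) -> str:
    await asyncio.sleep(1)  # simulate I/O latency
    return f"<contents of {path}>"

async def semantic_search(query: str) -> list[str]:
    await asyncio.sleep(1)
    return [f"match for {query!r}"]

async def serial_agent_step():
    # Before: one tool call at a time, each waiting on the previous response (~2s total).
    file_text = await read_file("src/app.py")
    hits = await semantic_search("auth middleware")
    return file_text, hits

async def parallel_agent_step():
    # After: the model emits independent calls in one turn and they run concurrently (~1s total).
    return await asyncio.gather(
        read_file("src/app.py"),
        semantic_search("auth middleware"),
    )

if __name__ == "__main__":
    asyncio.run(parallel_agent_step())
```

The key point is that the model itself decides which calls are independent; the harness simply executes whatever batch it emits.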
Flow State Design
Fast interaction keeps developers in the zone
The 20-Minute Problem
Traditional agents require firing off a task and waiting 20 minutes for completion, forcing context switches. Composer enables fast, iterative interaction that keeps developers in flow state—similar to autocomplete but for multi-step tasks.
Traditional Agent Workflow
20 min
Average wait time with context switching
Composer Workflow
Fast
Iterative interaction maintains flow state
Three Infrastructure Challenges
Applying RL to coding agents at scale
"The challenges come from when you take the simple idea and then you try to scale it up to a very large amount."
Scale challenges in RL
Watch (00:04:40)"Models are going to use hundreds of thousands to millions of tokens. They're going to make hundreds of different tool calls."
Rollout scale
Watch (00:05:12)"All of the solutions coincidentally are actually infrastructure problems."
Infrastructure nature of ML challenges
Watch (00:05:54)
Training/Inference Parity
Training uses a large MoE model across thousands of GPUs with bursty compute, while inference serves standard production requests. Solution: Custom kernels and low-precision training.
Rollout Complexity
Each rollout involves 100K-1M tokens and hundreds of tool calls with highly variable completion times. Solution: Load balancing across threads and processes (see the sketch after this list).
Environment Consistency
Training must match the production environment for best results, even though training compute is bursty while production runs at a steady state. Solution: Cloud VM orchestration and environment servers.
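A rough sketch of the rollout load-balancing idea: submit many rollouts to a worker pool and harvest results as they complete, so one long rollout does not stall the batch. The function names and timings here are assumptions for illustration, not Cursor's scheduler.

```python
import random
import time
from concurrent.futures import ProcessPoolExecutor, as_completed

def run_rollout(task_id: int) -> dict:
    # Stand-in for one agent rollout: in reality 100K-1M tokens and hundreds of
    # tool calls, so wall-clock time varies wildly from rollout to rollout.
    duration = random.uniform(0.1, 2.0)
    time.sleep(duration)
    return {"task": task_id, "seconds": round(duration, 2)}

def collect_rollouts(num_tasks: int, num_workers: int) -> list[dict]:
    # Submit more tasks than workers and consume completions as they arrive,
    # keeping every worker busy instead of waiting on the slowest rollout.
    results = []
    with ProcessPoolExecutor(max_workers=num_workers) as pool:
        futures = [pool.submit(run_rollout, i) for i in range(num_tasks)]
        for future in as_completed(futures):
            results.append(future.result())
    return results

if __name__ == "__main__":
    print(len(collect_rollouts(num_tasks=32, num_workers=8)))
```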
Infrastructure as ML
"All of the solutions coincidentally are actually infrastructure problems." The challenges of applying RL to coding agents manifest as infrastructure challenges: load balancing, VM orchestration, custom kernels, training-inference parity. The solutions require infrastructure engineering, not just model improvements.
Technical Solutions
Custom kernels, load balancing, and environment consistency
"TL;DR here is we found for the mixture of experts layer was about three and a half times faster... on Nvidia Blackwell chips."
Hardware optimization results
Watch (00:07:13)"We developed a library of custom kernels that allowed for very low precision training."
Custom kernels approach
Watch (00:06:50)"We're trying to mirror the cursor production environment as close as we possibly can."
Training-inference consistency
Watch (00:03:57)
Three-Server Architecture
Training Server
PyTorch-based model updates
Inference Server
Ray-based rollouts and tool calling
Environment Servers
Simulate the Cursor production environment
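One way to picture the split is a toy Ray-based sketch: an environment actor standing in for the production-like environment servers, rollout workers that drive tool calls, and a placeholder training step for the PyTorch side. All class and function names here are assumptions, not Cursor's actual services.

```python
import ray

ray.init()

@ray.remote
class EnvironmentServer:
    """Stand-in for an environment server mimicking the Cursor production environment."""
    def execute_tool(self, name: str, args: dict) -> str:
        return f"result of {name}({args})"

@ray.remote
class RolloutWorker:
    """Runs agent rollouts against an environment server and returns trajectories."""
    def __init__(self, env):
        self.env = env

    def rollout(self, prompt: str) -> list:
        trajectory = [("prompt", prompt)]
        # A real rollout would loop for hundreds of model-driven tool calls.
        result = ray.get(self.env.execute_tool.remote("read_file", {"path": "main.py"}))
        trajectory.append(("tool_result", result))
        return trajectory

def train_step(trajectories: list) -> None:
    # Placeholder for the PyTorch training server consuming finished rollouts.
    print(f"updating policy on {len(trajectories)} trajectories")

env = EnvironmentServer.remote()
workers = [RolloutWorker.remote(env) for _ in range(4)]
train_step(ray.get([w.rollout.remote("fix the failing test") for w in workers]))
```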
Key Tools
Read/Edit Files
File system operations
Semantic Search
Codebase understanding
Terminal/Shell
Command execution
Lints
Code quality checks
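For illustration, the four tool families above might be declared to the model in a function-calling style like the following. The names and parameters are assumptions, not Cursor's actual tool schema.

```python
# Illustrative tool declarations; names and parameters are hypothetical.
TOOLS = [
    {"name": "read_file", "description": "Read a file from the workspace.",
     "parameters": {"path": "string"}},
    {"name": "edit_file", "description": "Apply an edit to a file in the workspace.",
     "parameters": {"path": "string", "patch": "string"}},
    {"name": "semantic_search", "description": "Search the codebase by meaning, not just exact text.",
     "parameters": {"query": "string"}},
    {"name": "run_terminal", "description": "Execute a shell command and capture its output.",
     "parameters": {"command": "string"}},
    {"name": "run_lints", "description": "Run configured linters and return diagnostics.",
     "parameters": {"paths": "list[string]"}},
]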
Optimizations
Custom Kernels
Low-precision training (sketched below)
Load Balancing
Across threads/processes
VM Orchestration
Environment consistency
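The custom kernel work is Cursor-specific, but the low-precision idea can be illustrated with stock PyTorch mixed precision: run the forward and backward passes in a reduced-precision dtype while keeping master weights in full precision. This is a generic sketch (it assumes a CUDA device), not Cursor's Blackwell kernels.

```python
import torch

# Toy dense network; the point is only the reduced-precision forward/backward.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(8, 1024, device="cuda")
target = torch.randn(8, 1024, device="cuda")

optimizer.zero_grad()
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    # Matmuls run in bfloat16; parameters and optimizer state stay in float32.
    loss = torch.nn.functional.mse_loss(model(x), target)
loss.backward()
optimizer.step()
```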
Vertical Integration Advantage
Co-design when you own the full stack
"One thing that's nice about having both the coding agent, the IDE, as well as what we're doing with the model research is we can kind of co-design these things together."
Co-design advantage
Watch (00:08:24)"This is the perfect infrastructure for RL and our use in training."
Infrastructure reuse
Watch (00:08:52)
Co-Design Benefits
IDE and model evolve together. Product features inform model training, model capabilities unlock new IDE features.
Infrastructure Reuse
Cloud agents VM infrastructure built for product also powers RL training. Dual-purpose investment.
Semantic Search Impact
Tool mastery through training-inference consistency
"Semantic search not only helped basically every single model inside of the cursor agent harness, but it was particularly helpful with composer."
Research findings
Watch (00:10:12)"The model kind of becomes a power user of this tool which is really effective."
Tool mastery through training
Watch (00:10:27)"Like we trained composer in the exact same environment that we're using at inference time."
Environment consistency impact
Watch (00:10:19)
Training-Inference Consistency
Semantic search helped every model in Cursor's agent harness, but was particularly effective with Composer. The reason: Composer was trained in the exact same environment used at inference time. The model "becomes a power user of this tool" through consistent exposure during training. This demonstrates a broader principle: training-inference consistency dramatically improves tool usage.
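To make "semantic search" concrete, here is a toy embedding-based code search. The embed() function is a crude stand-in (hashed character trigrams); Cursor's actual indexing and embedding models are not described in the talk.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Stand-in embedding: hash character trigrams into a fixed-size vector.
    vec = np.zeros(256)
    for i in range(len(text) - 2):
        vec[hash(text[i:i + 3]) % 256] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def semantic_search(query: str, files: dict, top_k: int = 3):
    # Rank files by cosine similarity to the query embedding.
    q = embed(query)
    scored = [(float(embed(body) @ q), path) for path, body in files.items()]
    return sorted(scored, reverse=True)[:top_k]

files = {
    "auth/middleware.py": "def check_token(request): ...",
    "db/models.py": "class User(Base): ...",
    "api/routes.py": "def login(request): return check_token(request)",
}
print(semantic_search("where is authentication handled", files))
```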
Key Takeaways
Practical insights for AI engineers
Parallel Tool Calling Breakthrough
The feature that made Composer viable as a daily driver. Enables autonomous agent decisions about serial vs parallel execution.
4x Token Efficiency
Composer achieves 4x better token efficiency than models at similar intelligence levels through custom training and optimization.
Infrastructure as ML
The hard problems in applying RL to coding agents manifest as infrastructure problems. The solutions require infra work, not just model improvements.
Training-Inference Consistency
Training in the same environment used at inference time dramatically improves performance. The model becomes a power user of available tools.
Source Video
Building Cursor Composer
Lee Robinson • VP of Engineering, Cursor
Research Note: All quotes in this report are timestamped and link to exact moments in the video for validation. This analysis covers Cursor Composer's product strategy, parallel tool calling breakthrough, RL infrastructure challenges (training-inference parity, rollout complexity, environment consistency), technical solutions (custom kernels, load balancing, VM orchestration), vertical integration advantages, and semantic search research findings.
Key Concepts: Cursor Composer, Lee Robinson, reinforcement learning, parallel tool calling, semantic search, Mixture of Experts (MoE), custom kernels, training-inference consistency, vertical integration, infrastructure challenges, Nvidia Blackwell, low-precision training, RL infrastructure
Related Companies
Key players in AI coding and agents
Cursor
AI Code Editor
OpenAI
GPT Models
Anthropic
Claude
Nvidia
Blackwell Chips