OpenAI Engineering

Future-Proof Coding Agents

How to Build AI That Writes Code and Survives Rapid Model Evolution

Software engineering can be seen as a universal medium for problem solving. Coding is one of the most active frontiers in applied AI, and it's really a signal on how close we are to AGI.

— Bill Chen • 01:37

17 min

Talk Duration

Packed with insights

3 Parts

Agent Anatomy

UI + Model + Harness

Codeex

OpenAI's Agent

SDK available

Trillions

Tokens/Week

Fastest growing model

Executive Summary

Building coding agents that survive rapid model evolution requires understanding the critical "harness" layer—the interface between models and users that manages prompts, tools, context, and execution. While models like GPT-5.1 get all the attention, the harness is where the real engineering value lies.

Bill Chen and Brian Fioca from OpenAI's startups team reveal that coding agents have three components: a user interface, a model, and a harness. The harness is "really the interface layer to the model"—a collection of prompts and tools combined in a core agent loop that provides input and outputs from the model.

Codeex, OpenAI's coding agent, serves "dozens of trillions of tokens per week" and has doubled in usage since Dev Day. It's available as a VS Code plugin, CLI, cloud service, and even via ChatGPT on your phone. The future involves agents that can "safely write its own tools to solve new problems that it encounters"—a profound capability that transforms software development.

5 Key Themes

Anatomy of an Agent

Three parts: UI, Model, and Harness. The harness is the critical middle layer.

"It's made out of three parts. It's a user interface. It has a model. It's a harness."

— Bill Chen • 02:15

Harness Challenges

AV is one. Your custom tool might not be something the model is used to using.

"Your brand new innovative custom tool that you're giving to your agent might not actually be something the model is using is used to using."

— Brian Fioca • 04:40

Codeex Architecture

Bundles complex features: parallel tool calls, thread merging, sandboxing, compaction.

"It's way harder than you think. You have to manage parallel tool calls like thread merging."

— Bill Chen • 09:16

Intelligence + Habit

Models are trained with specific behaviors. Don't overprompt—let the model do what it's used to.

"If you don't instruct the model in ways that it's familiar with, you can have problems."

— Brian Fioca • 06:57

Future Predictions

Models will work on longer-horizon tasks unsupervised. The trust ceiling will keep rising.

"They'll be able to get to work on much longer horizon tasks unsupervised."

— Bill Chen • 16:06

The Harness: Why It's Harder Than You Think

Why Build a Harness?

"The ground keeps shifting really under the harness on the coding agents." Every time a new model is released, teams have to rebuild the agent on top of the model. The harness is "the interface layer to the model"—the surface area the model uses to talk to users and the code and perform actions with tools.

Challenge 1: Tool Adoption

"Your brand new innovative custom tool that you're giving to your agent might not actually be something the model is used to using. It may not have ever seen that tool before in training."

— 04:40

Challenge 2: Context Management

"Managing the context window and compaction can be really challenging. We just launched Codeex Max that does that out of the box for you. It's really hard to do."

— 05:30

Challenge 3: API Evolution

"The APIs keep changing, right? So we have completions, we have responses, we have whatever else is coming in the future."

— 05:45

Challenge 4: Complexity

"Parallel tool calls like thread merging and all of the things involved in that. Think about all the security considerations you have with sandboxing, prompt forwarding, permissions."

— 09:23

Top 15 Quotes from the Talk

AGI Signal

"Coding is one of the most active frontiers in applied AI. And it's really a signal on how close we are to AGI."

Bill Chen

OpenAI Applied AI Startups