Future-Proof Coding Agents
How to Build AI That Writes Code and Survives Rapid Model Evolution
Software engineering can be seen as a universal medium for problem solving. Coding is one of the most active frontiers in applied AI, and it's really a signal on how close we are to AGI.
— Bill Chen • 01:37
Talk Duration
Packed with insights
Agent Anatomy
UI + Model + Harness
OpenAI's Agent
SDK available
Tokens/Week
Fastest growing model
Executive Summary
Building coding agents that survive rapid model evolution requires understanding the critical "harness" layer—the interface between models and users that manages prompts, tools, context, and execution. While models like GPT-5.1 get all the attention, the harness is where the real engineering value lies.
Bill Chen and Brian Fioca from OpenAI's startups team reveal that coding agents have three components: a user interface, a model, and a harness. The harness is "really the interface layer to the model"—a collection of prompts and tools combined in a core agent loop that provides input and outputs from the model.
Codeex, OpenAI's coding agent, serves "dozens of trillions of tokens per week" and has doubled in usage since Dev Day. It's available as a VS Code plugin, CLI, cloud service, and even via ChatGPT on your phone. The future involves agents that can "safely write its own tools to solve new problems that it encounters"—a profound capability that transforms software development.
5 Key Themes
Anatomy of an Agent
Three parts: UI, Model, and Harness. The harness is the critical middle layer.
"It's made out of three parts. It's a user interface. It has a model. It's a harness."
— Bill Chen • 02:15
Harness Challenges
AV is one. Your custom tool might not be something the model is used to using.
"Your brand new innovative custom tool that you're giving to your agent might not actually be something the model is using is used to using."
— Brian Fioca • 04:40
Codeex Architecture
Bundles complex features: parallel tool calls, thread merging, sandboxing, compaction.
"It's way harder than you think. You have to manage parallel tool calls like thread merging."
— Bill Chen • 09:16
Intelligence + Habit
Models are trained with specific behaviors. Don't overprompt—let the model do what it's used to.
"If you don't instruct the model in ways that it's familiar with, you can have problems."
— Brian Fioca • 06:57
Future Predictions
Models will work on longer-horizon tasks unsupervised. The trust ceiling will keep rising.
"They'll be able to get to work on much longer horizon tasks unsupervised."
— Bill Chen • 16:06
The Harness: Why It's Harder Than You Think
Why Build a Harness?
"The ground keeps shifting really under the harness on the coding agents." Every time a new model is released, teams have to rebuild the agent on top of the model. The harness is "the interface layer to the model"—the surface area the model uses to talk to users and the code and perform actions with tools.
Challenge 1: Tool Adoption
"Your brand new innovative custom tool that you're giving to your agent might not actually be something the model is used to using. It may not have ever seen that tool before in training."
— 04:40
Challenge 2: Context Management
"Managing the context window and compaction can be really challenging. We just launched Codeex Max that does that out of the box for you. It's really hard to do."
— 05:30
Challenge 3: API Evolution
"The APIs keep changing, right? So we have completions, we have responses, we have whatever else is coming in the future."
— 05:45
Challenge 4: Complexity
"Parallel tool calls like thread merging and all of the things involved in that. Think about all the security considerations you have with sandboxing, prompt forwarding, permissions."
— 09:23
Top 15 Quotes from the Talk
"Coding is one of the most active frontiers in applied AI. And it's really a signal on how close we are to AGI."
"It's made out of three parts. It's a user interface. It has a model. It's a harness."
"The harness is a little bit more of an interesting part. This is the part that directly interacts with the model in the most reductive way."
"Your brand new innovative custom tool that you're giving to your agent might not actually be something the model is used to using."
"Intelligence plus habit. What is the model good at? What languages does it know really well? And then what habits did it learn to use to solve those problems?"
"If you don't instruct the model in ways that it's familiar with, you can have problems."
"If you let the model just do the behaviors that it's used to and don't overprompt it, it'll actually perform really better."
"I was literally like, 'Hey, like I like the solution, but it took you a long time to get there. What can I do differently in your instructions to help you get there faster next time?'"
"And literally it said, 'Uh, you're telling me to go look at everything and I don't really need to. So that's what's taking forever.'"
"You can actually see the advantages of building both the model and the harness together because you just like know all of that while you're building it."
"So we built Codeex to be an agent for everywhere that you code. It's a VS Code plugin. It's a CLI. You can call it in the cloud from the VS Code plugin or from ChatGPT from your phone."
"It does not have to be a coding task and if it can be accomplished by running tools from command line you can use Codeex."
"We've bundled all of these features together for you in an agent that can safely write its own tools to solve new problems that it encounters."
"It's the fastest growing model in usage now serving dozens of trillions of tokens per week which has actually doubled since Dev Day."
"New models will raise the trust ceiling. I trust these models now to do some way harder work than I would have 6 months ago."
Codeex: OpenAI's Reference Implementation
What Codeex Does
"It does not have to be a coding task and if it can be accomplished by running tools from command line you can use Codeex."
— 11:20
Available Interfaces
VS Code Plugin
CLI
Cloud API
ChatGPT Mobile
Using Codeex to Build Your Own Agents
"You can use Codeex the agent inside of your own agent." This creates a powerful pattern where Codeex becomes a tool that your custom agent can call.
SDK Integration
TypeScript library, Python exec, GitHub Actions
CI/CD Pipeline
Auto-merge conflicts on PRs
MCP Connectors
Plug into your product's APIs
Tool Creation
"Give a tool to your chatbot that can make other tools that it doesn't have"
"You can actually build out enterprise software that does it that writes its own plug-in connectors to the API level for each customer on the spot. That's something that a professional services team used to have to do."
— 14:11
Emerging Patterns from Production Use
Pattern 1: Harness as the New Abstraction Layer
"The benefits of this is quite obvious. You no longer have to care about prioritize optimizing the prompt and tools with every model upgrade."
The "Wrapper" Question: "Does that mean you're just building a wrapper?"
"I disagree with that take. Building wrappers on top of models I think is really reductive on the whole value prop of the infrastructure layer."
— 12:02
Pattern 2: Custom Alignment for Performance
Cursor worked closely with OpenAI to "get the best performance out of the Codeex. They did so by aligning their tools to be in distribution with how the model is trained and they did so by aligning their harness with our open-source implementation of Codeex CLI."
Tool Alignment
Match model training
Harness Integration
Open-source CLI patterns
Result
Best performance
— 15:03
What the Future Holds for Codeex
Model Evolution
- "The models will get better"
- Work on "much longer horizon tasks unsupervised"
- "New models will raise the trust ceiling"
— 16:01
Application Challenges
- "Sprawling code bases and non-standard libraries"
- "Knowing how to work in closed source environments"
- "Matching existing templates and practices"
— 16:21
SDK Evolution
"Imagine that the SDK will evolve to better support these model capabilities, letting the model learn as it goes and not repeat mistakes and generally provide more surface area for an agent that writes code and uses a terminal to solve whatever problems it encounters."
Learn as it goes
Not repeat mistakes
More surface area
— 16:52
What We Learned
For Builders
- Harnesses are really complicated and take a lot of work to maintain, especially with all the new models coming out
- Don't overprompt — let the model do the behaviors it's used to
- Use Codeex off the shelf or look at the source code if you want to customize
The OpenAI Approach
"So we've built one for you inside of Codeex that you can use off the shelf or look at the source if you want to and you can use it to build new things outside of coding and let us do all of the work making sure that you have the most capable computer agent."
— 17:07
Meet the Speakers
Bill Chen
Applied AI Startups Team, OpenAI
Works on the applied AI startups team at OpenAI, specifically focusing on building coding agents. Leads the development of Codeex, OpenAI's comprehensive coding agent solution.
"Software engineering can be seen as a universal medium for problem solving."
Brian Fioca
Startups Team, OpenAI
Works with Bill on the OpenAI startups team. Deep technical expertise in model training behaviors and prompt engineering. Former VC with unique perspective on infrastructure value props.
"Developing a feel for these habits is how you become a good prompt engineer."
Key Timestamps
Introduction
Today we'll be talking about how to build coding agents
AGI Signal
Coding as a signal of how close we are to AGI
Agent Anatomy
Three parts: UI, Model, and Harness
The Harness
Interface layer that directly interacts with the model
Tool Adoption
Your custom tool might not be something the model uses
Context Management
Codeex Max handles compaction out of the box
Intelligence + Habit
Models have trained behaviors and patterns
Prompt Engineering
If you don't instruct in familiar ways, problems occur
Don't Overprompt
Let the model do behaviors it's used to
Model Feedback
"You're telling me to look at everything and I don't need to"
Vertical Integration
Advantages of building model + harness together
Codeex Intro
Agent for everywhere that you code
Harness Complexity
Parallel tool calls, thread merging, sandboxing
Self-Improving
Agent that can safely write its own tools
Beyond Coding
Any task accomplishable from command line
Agent in Agent
Use Codeex inside your own agent
Harness as Abstraction
No longer optimize prompts with every model upgrade
Wrapper Question
"Does that mean you're just building a wrapper?"
Enterprise Use
Software that writes its own plug-in connectors
Cursor Alignment
Aligning tools and harness with model training
Growth Metric
Dozens of trillions of tokens per week, doubled since Dev Day
Future: Better Models
Work on longer horizon tasks unsupervised
Future: Trust Ceiling
Trust models for harder work than 6 months ago
Future: Challenges
Sprawling code bases, closed source, existing templates
SDK Evolution
Let model learn, not repeat mistakes, more surface area
Summary
Use Codeex off the shelf or customize from source
Source Video
Future-Proof Coding Agents
Bill Chen & Brian Fioca • AI Engineer Summit
Research Note: All quotes in this report are timestamped and link to exact moments in the video for validation. This analysis was conducted by reading the complete VTT transcript (3,328 lines) and extracting key insights about building coding agents that survive rapid model evolution.