OpenAI logoOpenAI Engineering

Future-Proof Coding Agents

How to Build AI That Writes Code and Survives Rapid Model Evolution

Software engineering can be seen as a universal medium for problem solving. Coding is one of the most active frontiers in applied AI, and it's really a signal on how close we are to AGI.

— Bill Chen • 01:37

17 min

Talk Duration

Packed with insights

3 Parts

Agent Anatomy

UI + Model + Harness

Codeex

OpenAI's Agent

SDK available

Trillions

Tokens/Week

Fastest growing model

Executive Summary

Building coding agents that survive rapid model evolution requires understanding the critical "harness" layer—the interface between models and users that manages prompts, tools, context, and execution. While models like GPT-5.1 get all the attention, the harness is where the real engineering value lies.

Bill Chen and Brian Fioca from OpenAI's startups team reveal that coding agents have three components: a user interface, a model, and a harness. The harness is "really the interface layer to the model"—a collection of prompts and tools combined in a core agent loop that provides input and outputs from the model.

Codeex, OpenAI's coding agent, serves "dozens of trillions of tokens per week" and has doubled in usage since Dev Day. It's available as a VS Code plugin, CLI, cloud service, and even via ChatGPT on your phone. The future involves agents that can "safely write its own tools to solve new problems that it encounters"—a profound capability that transforms software development.

5 Key Themes

Anatomy of an Agent

Three parts: UI, Model, and Harness. The harness is the critical middle layer.

"It's made out of three parts. It's a user interface. It has a model. It's a harness."

— Bill Chen • 02:15

Harness Challenges

AV is one. Your custom tool might not be something the model is used to using.

"Your brand new innovative custom tool that you're giving to your agent might not actually be something the model is using is used to using."

— Brian Fioca • 04:40

Codeex Architecture

Bundles complex features: parallel tool calls, thread merging, sandboxing, compaction.

"It's way harder than you think. You have to manage parallel tool calls like thread merging."

— Bill Chen • 09:16

Intelligence + Habit

Models are trained with specific behaviors. Don't overprompt—let the model do what it's used to.

"If you don't instruct the model in ways that it's familiar with, you can have problems."

— Brian Fioca • 06:57

Future Predictions

Models will work on longer-horizon tasks unsupervised. The trust ceiling will keep rising.

"They'll be able to get to work on much longer horizon tasks unsupervised."

— Bill Chen • 16:06

The Harness: Why It's Harder Than You Think

Why Build a Harness?

"The ground keeps shifting really under the harness on the coding agents." Every time a new model is released, teams have to rebuild the agent on top of the model. The harness is "the interface layer to the model"—the surface area the model uses to talk to users and the code and perform actions with tools.

Challenge 1: Tool Adoption

"Your brand new innovative custom tool that you're giving to your agent might not actually be something the model is used to using. It may not have ever seen that tool before in training."

04:40

Challenge 2: Context Management

"Managing the context window and compaction can be really challenging. We just launched Codeex Max that does that out of the box for you. It's really hard to do."

05:30

Challenge 3: API Evolution

"The APIs keep changing, right? So we have completions, we have responses, we have whatever else is coming in the future."

05:45

Challenge 4: Complexity

"Parallel tool calls like thread merging and all of the things involved in that. Think about all the security considerations you have with sandboxing, prompt forwarding, permissions."

09:23

Top 15 Quotes from the Talk

AGI Signal
"Coding is one of the most active frontiers in applied AI. And it's really a signal on how close we are to AGI."

Bill Chen

OpenAI Applied AI Startups

01:37
Agent Anatomy
"It's made out of three parts. It's a user interface. It has a model. It's a harness."

Bill Chen

OpenAI Applied AI Startups

02:15
Harness Layer
"The harness is a little bit more of an interesting part. This is the part that directly interacts with the model in the most reductive way."

Bill Chen

OpenAI Applied AI Startups

03:01
Tool Adoption
"Your brand new innovative custom tool that you're giving to your agent might not actually be something the model is used to using."

Brian Fioca

OpenAI Startups Team

04:40
Model Training
"Intelligence plus habit. What is the model good at? What languages does it know really well? And then what habits did it learn to use to solve those problems?"

Brian Fioca

OpenAI Startups Team

06:16
Prompt Engineering
"If you don't instruct the model in ways that it's familiar with, you can have problems."

Brian Fioca

OpenAI Startups Team

06:57
Anti-Pattern
"If you let the model just do the behaviors that it's used to and don't overprompt it, it'll actually perform really better."

Brian Fioca

OpenAI Startups Team

07:47
Optimization
"I was literally like, 'Hey, like I like the solution, but it took you a long time to get there. What can I do differently in your instructions to help you get there faster next time?'"

Brian Fioca

OpenAI Startups Team

07:58
Model Feedback
"And literally it said, 'Uh, you're telling me to go look at everything and I don't really need to. So that's what's taking forever.'"

Brian Fioca

OpenAI Startups Team

08:03
Vertical Integration
"You can actually see the advantages of building both the model and the harness together because you just like know all of that while you're building it."

Brian Fioca

OpenAI Startups Team

08:17
Omnipresence
"So we built Codeex to be an agent for everywhere that you code. It's a VS Code plugin. It's a CLI. You can call it in the cloud from the VS Code plugin or from ChatGPT from your phone."

Bill Chen

OpenAI Applied AI Startups

08:33
Beyond Coding
"It does not have to be a coding task and if it can be accomplished by running tools from command line you can use Codeex."

Bill Chen

OpenAI Applied AI Startups

11:20
Self-Improving
"We've bundled all of these features together for you in an agent that can safely write its own tools to solve new problems that it encounters."

Bill Chen

OpenAI Applied AI Startups

10:13
Growth Metric
"It's the fastest growing model in usage now serving dozens of trillions of tokens per week which has actually doubled since Dev Day."

Bill Chen

OpenAI Applied AI Startups

15:47
Trust Evolution
"New models will raise the trust ceiling. I trust these models now to do some way harder work than I would have 6 months ago."

Bill Chen

OpenAI Applied AI Startups

16:12

Codeex: OpenAI's Reference Implementation

What Codeex Does

Turn specs into runnable code
Navigate repos to edit files
Run commands, execute tasks
Call from Slack or GitHub PR review
Organize photos, analyze CSV files
Any command-line task

"It does not have to be a coding task and if it can be accomplished by running tools from command line you can use Codeex."

11:20

Available Interfaces

VS Code Plugin

CLI

Cloud API

ChatGPT Mobile

Using Codeex to Build Your Own Agents

"You can use Codeex the agent inside of your own agent." This creates a powerful pattern where Codeex becomes a tool that your custom agent can call.

1

SDK Integration

TypeScript library, Python exec, GitHub Actions

2

CI/CD Pipeline

Auto-merge conflicts on PRs

3

MCP Connectors

Plug into your product's APIs

4

Tool Creation

"Give a tool to your chatbot that can make other tools that it doesn't have"

"You can actually build out enterprise software that does it that writes its own plug-in connectors to the API level for each customer on the spot. That's something that a professional services team used to have to do."

14:11

Emerging Patterns from Production Use

Pattern 1: Harness as the New Abstraction Layer

"The benefits of this is quite obvious. You no longer have to care about prioritize optimizing the prompt and tools with every model upgrade."

The "Wrapper" Question: "Does that mean you're just building a wrapper?"

"I disagree with that take. Building wrappers on top of models I think is really reductive on the whole value prop of the infrastructure layer."

12:02

Pattern 2: Custom Alignment for Performance

Cursor worked closely with OpenAI to "get the best performance out of the Codeex. They did so by aligning their tools to be in distribution with how the model is trained and they did so by aligning their harness with our open-source implementation of Codeex CLI."

Tool Alignment

Match model training

Harness Integration

Open-source CLI patterns

Result

Best performance

15:03

What the Future Holds for Codeex

Model Evolution

  • "The models will get better"
  • Work on "much longer horizon tasks unsupervised"
  • "New models will raise the trust ceiling"

16:01

Application Challenges

  • "Sprawling code bases and non-standard libraries"
  • "Knowing how to work in closed source environments"
  • "Matching existing templates and practices"

16:21

SDK Evolution

"Imagine that the SDK will evolve to better support these model capabilities, letting the model learn as it goes and not repeat mistakes and generally provide more surface area for an agent that writes code and uses a terminal to solve whatever problems it encounters."

Learn as it goes

Not repeat mistakes

More surface area

16:52

What We Learned

For Builders

  • Harnesses are really complicated and take a lot of work to maintain, especially with all the new models coming out
  • Don't overprompt — let the model do the behaviors it's used to
  • Use Codeex off the shelf or look at the source code if you want to customize

The OpenAI Approach

"So we've built one for you inside of Codeex that you can use off the shelf or look at the source if you want to and you can use it to build new things outside of coding and let us do all of the work making sure that you have the most capable computer agent."

17:07

Meet the Speakers

OpenAI logo

Bill Chen

Applied AI Startups Team, OpenAI

Works on the applied AI startups team at OpenAI, specifically focusing on building coding agents. Leads the development of Codeex, OpenAI's comprehensive coding agent solution.

"Software engineering can be seen as a universal medium for problem solving."

OpenAI logo

Brian Fioca

Startups Team, OpenAI

Works with Bill on the OpenAI startups team. Deep technical expertise in model training behaviors and prompt engineering. Former VC with unique perspective on infrastructure value props.

"Developing a feel for these habits is how you become a good prompt engineer."

Key Timestamps

00:21

Introduction

Today we'll be talking about how to build coding agents

01:37

AGI Signal

Coding as a signal of how close we are to AGI

02:15

Agent Anatomy

Three parts: UI, Model, and Harness

03:01

The Harness

Interface layer that directly interacts with the model

04:40

Tool Adoption

Your custom tool might not be something the model uses

05:30

Context Management

Codeex Max handles compaction out of the box

06:16

Intelligence + Habit

Models have trained behaviors and patterns

06:57

Prompt Engineering

If you don't instruct in familiar ways, problems occur

07:47

Don't Overprompt

Let the model do behaviors it's used to

08:03

Model Feedback

"You're telling me to look at everything and I don't need to"

08:17

Vertical Integration

Advantages of building model + harness together

08:33

Codeex Intro

Agent for everywhere that you code

09:16

Harness Complexity

Parallel tool calls, thread merging, sandboxing

10:13

Self-Improving

Agent that can safely write its own tools

11:20

Beyond Coding

Any task accomplishable from command line

11:57

Agent in Agent

Use Codeex inside your own agent

12:02

Harness as Abstraction

No longer optimize prompts with every model upgrade

12:21

Wrapper Question

"Does that mean you're just building a wrapper?"

14:11

Enterprise Use

Software that writes its own plug-in connectors

15:03

Cursor Alignment

Aligning tools and harness with model training

15:47

Growth Metric

Dozens of trillions of tokens per week, doubled since Dev Day

16:01

Future: Better Models

Work on longer horizon tasks unsupervised

16:12

Future: Trust Ceiling

Trust models for harder work than 6 months ago

16:21

Future: Challenges

Sprawling code bases, closed source, existing templates

16:52

SDK Evolution

Let model learn, not repeat mistakes, more surface area

17:07

Summary

Use Codeex off the shelf or customize from source

Source Video

Future-Proof Coding Agents

Bill Chen & Brian Fioca • AI Engineer Summit

Video ID: wVl6ZjELpBkDuration: ~17 minutes
coding agents
OpenAI
Codeex
agent architecture
prompt engineering
model evolution
Watch on YouTube

Research Note: All quotes in this report are timestamped and link to exact moments in the video for validation. This analysis was conducted by reading the complete VTT transcript (3,328 lines) and extracting key insights about building coding agents that survive rapid model evolution.

Research sourced from AI Engineer Summit transcript. Full VTT file analyzed with complete transcript reading and key insight extraction. Focus on practical patterns for building production-ready coding agents.