Robotics Meets AI

Agents are Robots Too

What Self-Driving Taught Me About Building Agents

Jesse Hu draws parallels between self-driving cars and AI agents, introducing "Agentics"—applying robotics principles to agent development. Learn why the model is only 1% of the work, how to design closed-loop systems, and what robotics can teach us about building better agents.

When you get into real world applications, the model is only doing 1% of the work and 99% of the work goes into other things.
Jesse Hu, Abundant (01:29)

  • 1%: Model work
  • 99%: Everything else
  • Agentics: New discipline
  • MDP: Framework

Why Robotics Matters for Agent Developers

Jesse Hu, a lifelong ML engineer who worked at YouTube and Google on the two-tower embedding model, early BERT work, and mixture of experts, now applies his robotics background to building agentic coding models at Abundant. His central thesis: AI agents and robots face the same fundamental challenges. By studying decades of robotics research and self-driving development, agent engineers can accelerate progress and avoid repeating costly mistakes.

"I've been a lifelong ML engineer and I've worked at places like YouTube and Google where I worked on the two tower embedding model as well as some early work on BERT and mixture of experts."

Speaker background: ML engineer at YouTube/Google, now at Abundant building datasets for agentic coding models

Watch (00:17)
"I've given different variants of this talk in person for different events, but this is the first one that I've done for coding agents."

This talk adapts robotics/self-driving lessons specifically for AI agent developers

Watch (00:04)

The Core Insight

The term "Agentics" refers to applying robotics principles, abstractions, and core concepts to agent development. This moves agent development from "something we hack on" to a dedicated scientific practice with established methodologies. Robotics has spent decades solving problems that agent developers are encountering for the first time—the closed-loop control, action space design, simulation, and failure mode analysis that robotics teams have mastered can directly inform agent development.

The 1% vs 99% Problem

In both robotics and agents, the machine learning model is only the tip of the iceberg. The real engineering challenge lies in the 99% of the system that surrounds the model.

Model Work vs Everything Else


Robotics 99%

Hardware, sensors, actuators, integration, offline stack (simulation, training)

Agents 99%

APIs, MCPs, terminal, browser, VM, persistent file systems, offline stack (data, training)

"The model is only doing 1% of the work and 99% of the work goes into other things."

In both robotics and agents, the ML model is tiny compared to the full system stack

Watch (01:29)
"And I think this is something that is very analogous to self-driving cars. When you get into real world applications, it's like hardware, sensors, actuators, integration, and the offline stack like simulation and training."

Robotics breakdown of the 99%: hardware, sensors, integration, simulation

Watch (01:39)
"For agents, it's like APIs, MCPs, terminal, browser, VM, persistent file systems, all this kind of stuff. And then the offline stack which is like data and model training."

Agent equivalent of the 99%: APIs, tools, interfaces, and the offline stack

Watch (02:05)

Winning Teams Have the Best Offline Stack

In robotics competitions and self-driving development, the teams that win aren't necessarily those with the best models—they're the teams with the best offline stacks: simulation environments, data pipelines, evaluation frameworks, and training infrastructure. The same applies to agents. Invest in your simulation, evaluation, and data infrastructure—it's where real competitive advantage lies.

Embodiment: Robots vs Agents

Both robots and agents have a "body" that interacts with the world. Understanding these parallels helps transfer insights from robotics to agent development (a minimal shared interface is sketched below).

Hardware & Sensors

🤖 Robotics: Cameras, LiDAR, radar, GPS - perception systems

🤖 Agents: APIs, web scrapers, database queries - information gathering

💡 Both need rich perception of the environment

Actuators & Tools

🤖 Robotics: Motors, servos, steering - physical action

🤖 Agents: MCPs, terminal commands, function calls - digital action

💡 Action primitives define what the system can do

Fleet Management

🤖 Robotics: Coordinating multiple vehicles

🤖 Agents: Multi-agent orchestration

💡 Systems-level coordination across many autonomous units

Offline Stack

🤖 Robotics: Simulation environments for testing

🤖 Agents: Evaluation frameworks, sandboxed environments

💡 Testing without real-world consequences
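
To make the embodiment parallel concrete, here is a minimal Python sketch of a shared perceive/act interface. The class and method names are illustrative, not from the talk: sensors map to a `perceive` call and actuators to an `act` call, for robots and agents alike.

```python
from abc import ABC, abstractmethod
from typing import Any


class Embodiment(ABC):
    """A body that perceives the world and acts on it (robot or agent)."""

    @abstractmethod
    def perceive(self) -> dict[str, Any]:
        """Gather observations: camera frames for a robot, API results for an agent."""

    @abstractmethod
    def act(self, action: dict[str, Any]) -> None:
        """Execute one action primitive: a motor command or a tool call."""


class BrowserAgent(Embodiment):
    """Digital embodiment: the 'sensors' are scrapers, the 'actuators' are tools."""

    def perceive(self) -> dict[str, Any]:
        # Stand-in for reading the DOM, querying an API, or listing open tabs.
        return {"page_html": "<html>...</html>", "open_tabs": 3}

    def act(self, action: dict[str, Any]) -> None:
        # Stand-in for dispatching a tool call such as {"tool": "click", "selector": "#submit"}.
        print(f"executing {action}")
```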

Closed-Loop vs Open-Loop Systems

Robotics relies on closed-loop feedback: act, measure, recalibrate. Most agents today use open-loop, turn-based interaction. This implicit design decision has significant implications for agent reliability.

🔄 Robotics: Closed-Loop

  • Turn wheel → measure actual turn → recalibrate
  • Continuous sampling from environment
  • Real-time feedback on every action
  • Can immediately respond to changes

💬 Agents: Turn-Based (Open-Loop)

  • Execute tool → wait for response
  • Discrete turns in conversation
  • No real-time feedback during execution
  • Can't respond to pop-ups or long-running processes immediately
"In robotics, you turn the wheel, you actually measure, you actually look at like, did the wheel actually turn five degrees? And then you recalibrate if it's off."

Robotics relies on real-time feedback loops between action and measurement

Watch (03:30)
"We don't do this thing that's natural robotics where we keep sampling from the world and we keep interacting in real time."

Agents typically use turn-based interaction instead of continuous sampling

Watch (05:13)
"We've kind of done this implicitly. So in agents we often have a conversation. So we wait to take our turn."

Turn-based conversation is an implicit design choice with implications

Watch (04:22)

The Time Discretization Trade-off

Agent developers have implicitly chosen turn-based interaction (wait for our turn). This is easier to reason about but limits real-time responsiveness. Robotics keeps sampling from the world continuously. For agents, consider whether your use case requires continuous interaction—browser agents dealing with pop-ups, terminal agents watching process output, or agents monitoring real-time data streams may need closed-loop designs.
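
As a rough illustration of the difference, here is a minimal sketch in Python. The `read_sensor`, `issue_command`, and `call_tool` callables are hypothetical stand-ins, not APIs from the talk.

```python
import time


def closed_loop_turn(target_deg, read_sensor, issue_command, tol=0.5):
    """Robotics-style: act, measure the actual result, recalibrate until on target."""
    while True:
        actual = read_sensor()        # measure: did the wheel actually turn?
        error = target_deg - actual
        if abs(error) <= tol:
            return actual             # on target, stop correcting
        issue_command(error)          # recalibrate by the measured error
        time.sleep(0.05)              # keep sampling the world (~20 Hz)


def turn_based_turn(target_deg, call_tool):
    """Agent-style: one discrete tool call, no feedback until the turn completes."""
    return call_tool("turn_wheel", degrees=target_deg)
```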

Action Spaces: Beyond Tool Calls

How your agent interacts with the world is a design choice with trade-offs. Tool calls and MCPs are just the beginning; the sketch below contrasts three designs.

Discrete Tools

MCPs, function calls

Easy to reason about, limited flexibility

Character-Level I/O

Terminus agent, tmux streams

Finer control, more complex state

Continuous

Velocities, acceleration (Dreamer)

Robotics-style, 20 FPS interaction
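
As a sketch of these trade-offs, the types below model all three designs; the names are illustrative, not taken from Terminus or Dreamer.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    """Discrete tools: MCPs and function calls. Easy to reason about."""
    name: str
    args: dict


@dataclass
class KeyStrokes:
    """Character-level I/O, Terminus-style: raw characters to a terminal stream."""
    chars: str  # e.g. "ls -la\n", sent character by character


@dataclass
class ControlVector:
    """Continuous control, robotics/Dreamer-style, sampled at a fixed rate."""
    velocity: float
    steering: float


Action = ToolCall | KeyStrokes | ControlVector  # the action space is a design choice
```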

"The question is what trade-offs are we making and what implicit or explicit design decisions have we made."

Every action space design involves trade-offs

Watch (05:49)
"There's an agent called Terminus which is uh from TerminalBench. And instead of using tools, they're doing like TX streams."

Character-level I/O as an alternative to discrete tool calls

Watch (06:41)
"So they're basically interacting at the character level rather than at the tool level."

Character-level interaction enables finer-grained control

Watch (06:51)

Stateful vs Stateless Agents

Agents are evolving from stateless "spawn from nothing" systems to stateful systems with persistent memory and environment context.

"Similarly, we're going from these stateless agents to more stateful agents."

Agents are evolving from stateless to stateful systems

Watch (07:31)
"Before, you kind of spawn from nothing like in a video game. Now you have VMs with persistent file stores."

VMs with persistent storage give agents 'memory' and 'state'

Watch (07:41)
"You have to think about what's the entire space? What Slack messages are active? What's the state of the world?"

Stateful agents require reasoning about the entire environment state

Watch (08:03)

The State Explosion Challenge

Stateful agents must reason about the entire environment state: active Slack messages, file system contents, running processes, browser tabs, and more. This state explosion makes planning and decision-making exponentially harder. Robotics faces the same challenge—the "state" of the real world includes every object, vehicle, pedestrian, and environmental condition. Successful systems use hierarchical abstraction and attention mechanisms to focus on relevant state.
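
As a toy illustration of focusing on relevant state, the sketch below filters a large environment snapshot down to a task-relevant slice; the state keys and the relevance rule are assumptions for the example.

```python
def relevant_state(full_state: dict, task_keywords: set[str]) -> dict:
    """Keep only the slice of the environment that mentions the task at hand."""
    return {
        key: value
        for key, value in full_state.items()
        if any(word in str(value).lower() for word in task_keywords)
    }


# Example: a big world state collapses to what matters for a billing task.
world = {
    "slack_msg_1": "Reminder: billing bug in invoice service",
    "slack_msg_2": "Lunch at noon?",
    "open_file": "invoice_service.py",
    "running_proc": "webpack --watch",
}
print(relevant_state(world, {"billing", "invoice"}))
# -> keeps slack_msg_1 and open_file, drops the rest
```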

Distribution Shift & Cascading Consequences

Unlike pure prediction or classification, actions have consequences. When agents act, they change the environment, creating new states that may be outside their training distribution (a simple confidence guard is sketched below).

⚠️ Distribution Shift

Browser agents get confused by pop-ups never seen in training

Imitation learning fails at the edges of the training distribution

🔗 Cascading Issues

Great plans fail when implemented - execution gap

Real-world is messy: actions have unintended consequences
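
One simple mitigation, sketched below under assumptions the talk doesn't spell out: if the policy's confidence on the current observation is low, treat it as likely distribution shift and escalate rather than compounding the error. `policy` and `escalate` are hypothetical hooks.

```python
def act_with_fallback(observation, policy, escalate, confidence_floor=0.7):
    """Act only when the policy is confident; otherwise hand off.

    `policy(observation)` returns (action, confidence); `escalate` defers
    to a human or a recovery routine. Both are placeholders.
    """
    action, confidence = policy(observation)
    if confidence < confidence_floor:
        # Probably outside the training distribution (e.g. an unseen pop-up):
        # stop before the next action cascades into more damage.
        return escalate(observation)
    return action
```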

"And you can start to see this in agents such as browser agents. When you see a pop-up that never happened in training because humans actually interact with pop-ups quite naturally, it gets confused and it gets really confused."

Distribution shift: agents fail on scenarios not seen in training

Watch (08:45)
"Actions have consequences in a very messy real world."

Unlike pure prediction, actions change the environment in unpredictable ways

Watch (10:01)
"We're dealing with a whole new paradigm in which you predict, you act, and then you deal with the consequences of that action and then re-evaluate everything you've done before."

The action loop: predict → act → handle consequences → re-evaluate

Watch (09:36)
"In self-driving from 2017 to 2020, we thought just make boxes and drive around them. That assumption wasn't true."

Self-driving focused on perception, but action models were equally important

Watch (11:25)
"Same thing with agents. You can have a great plan, but when you actually implement it, you realize there's all these cascading issues."

Great plans fail when implemented - the execution gap

Watch (11:56)

Simulation & Counterfactuals

Robotics learned that simulation is essential. It enables exploring multiple possible paths ("counterfactuals") without real-world consequences. The same applies to agents.

🌐 Simulation Benefits

  • Explore all possible paths, not just one
  • Test failure modes safely
  • Represent real-world complexity in starting state
  • Play out counterfactuals: "what if" scenarios (sketched below)
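
A minimal counterfactual loop might look like the sketch below, assuming hypothetical `step(state, action)` and `score(state)` hooks into your simulator.

```python
import copy


def explore_counterfactuals(start_state, candidate_actions, step, score):
    """Play out every branch from the same starting state, not just one path."""
    outcomes = []
    for action in candidate_actions:
        branch = copy.deepcopy(start_state)  # fork the world: no real consequences
        branch = step(branch, action)        # "what if" we took this action?
        outcomes.append((score(branch), action))
    return max(outcomes, key=lambda t: t[0])[1]  # keep the best-scoring action
```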

🎯 MDP Framework

  • State: Environment representation
  • Reward: Objective function
  • Action Primitives: What can the agent do?
  • Useful communication primitives between teams (see the sketch below)
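
A minimal sketch of that shared vocabulary, using Python typing (the names are illustrative):

```python
from typing import Any, Protocol

State = dict[str, Any]   # environment representation: files, tabs, messages
Action = dict[str, Any]  # one action primitive: tool call, keystroke, command


class AgentMDP(Protocol):
    """State, reward, action primitives: a common language between teams."""

    def actions(self, state: State) -> list[Action]: ...
    def transition(self, state: State, action: Action) -> State: ...
    def reward(self, state: State, action: Action, next_state: State) -> float: ...
```
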
"You can play through the real world not just in a single path but all the paths that you could possibly take as your agent changes."

Simulation enables exploring counterfactuals - all possible paths

Watch (10:03)
"You need to be able to represent the complexities and messiness of the real world in your starting state."

Simulation must capture real-world complexity, not simplified scenarios

Watch (10:13)
"State, reward, action primitives. These are very useful primitives for communication between people."

Markov Decision Process (MDP) framework provides common language

Watch (11:04)
"Moving from chat models to agent models."

Agents require different thinking than chat models

Watch (11:16)

Development Process: Hill Climbing & Logs

Agent development follows a hill-climbing process: make changes, test against a nebulous metric, and hope you improve (the basic loop is sketched below). Unlike traditional software, where features ship and work reliably, ML systems require iteration without guaranteed forward progress.

📜 Traditional Software

Feature → Production

Guaranteed forward progress

🎲 ML / Agents

Nebulous metric → Guess and check → Hope ↑

Iterative hill climbing
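
The loop itself is simple; the sketch below assumes hypothetical `mutate` (propose a tweak) and `evaluate` (run your eval suite) functions.

```python
def hill_climb(config, mutate, evaluate, iterations=50):
    """Guess and check against a metric: keep a change only if the score goes up."""
    best_score = evaluate(config)
    for _ in range(iterations):
        candidate = mutate(config)  # tweak a prompt, a tool, or a model choice
        score = evaluate(candidate)
        if score > best_score:      # no guaranteed forward progress: keep wins only
            config, best_score = candidate, score
    return config, best_score
```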

"In both cases in self-driving when it comes to robotics and in code when it comes to digital agents we're actually very lucky in both."

Both domains benefit from predefined human-machine interfaces

Watch (12:22)
"When you're exploring new domains, you should ask like, is there a predefined human interface?"

Domains with existing interfaces are easier for agent deployment

Watch (13:07)
"It's basically this iterative process of building or iterating on a complex system such as an LM or an agent when you don't always make forward progress."

Hill climbing: making progress on complex systems without guaranteed forward movement

Watch (13:44)
"The old way is like you ship a feature and you know it's going to work and you can move forward."

Traditional software: feature → production (guaranteed progress)

Watch (14:13)
"The new way is you pick a nebulous metric and you guess and check and you hope you go up."

ML/agents: nebulous metric → guess and check → hope for improvement

Watch (14:21)

Logs Become Critical

In agent development, detailed logs are more important than benchmark scores. A 70% benchmark score tells you very little. But breaking down failures by category, environment, failure mode, and then triaging individual failures gives you actionable insights on how to improve. Build logging infrastructure that captures full execution traces—not just metrics.
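
A minimal triage pass over failure logs might look like this sketch; the log schema shown is an assumption, not from the talk.

```python
from collections import Counter


def triage(failure_logs):
    """Turn a flat pass rate into actionable failure buckets.

    Assumes each log is a dict like
    {"category": "browser", "failure_mode": "popup", "trace": [...]}.
    """
    by_category = Counter(log["category"] for log in failure_logs)
    by_mode = Counter(log["failure_mode"] for log in failure_logs)
    print("failures by category:", by_category.most_common())
    print("failures by mode:", by_mode.most_common())
    # Then go read individual traces in the biggest bucket:
    worst_mode = by_mode.most_common(1)[0][0]
    return [log for log in failure_logs if log["failure_mode"] == worst_mode]
```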

"The logs actually become a much more important part of the process than they are today. You can get a lot more insights than just your numbers."

Detailed logs enable deeper insights than simple benchmark scores

Watch (14:45)
"70% on a benchmark tells you very little. But if you break it down by categories, by cities, by different failure modes, and then you go triage individual failures, you get a lot more insights on how to improve."

Break down failures systematically to understand improvement paths

Watch (15:02)

The Current State of Agentics

Where are we today? Great demos and predictive models, but end-to-end work completion remains elusive. The reasons are the same challenges robotics faced: actions have consequences, and the real world is messy and complex.

Great Demos

Impressive one-off demonstrations

Great Models

Strong predictive capabilities

Not There Yet

End-to-end work completion

"Great demos, great predictive models, but not nearly there on end-to-end work completion."

Current state of agents: impressive demos, limited real-world completion

Watch (15:27)
"Actions have consequences, real world complexity. These are things that we learned in robotics and self-driving."

Robotics learned these lessons the hard way - agents can benefit from that experience

Watch (15:53)

What is Agentics?

"Agentics: Applying robotics principles, abstractions, and core concepts to agent development to move from 'something we hack on' to 'dedicated real science and really becomes a practice.'"

Agentics as a discipline: systematic approach to agent development

Watch (16:20)

Key Takeaways for Agent Engineers

1. The Model is Only 1%

Infrastructure

  • 99% of the work is APIs, MCPs, tools, interfaces, and the offline stack
  • Winning teams have the best simulation and evaluation infrastructure
  • Invest in your offline stack—data, training, and simulation environments

2. Design Closed-Loop Systems

Feedback

  • Robotics uses continuous sampling and real-time feedback
  • Agents typically use turn-based interaction—an implicit design choice
  • Consider whether your use case requires closed-loop, real-time interaction

3. Action Space Design Matters

Primitives

  • Tool calls are just one option—consider character-level I/O or continuous actions
  • Every action space design involves trade-offs
  • Choose action primitives that match your domain

4. Prepare for Distribution Shift

Robustness

  • Agents fail on scenarios outside their training distribution
  • Actions have consequences in a messy real world
  • Browser agents get confused by unseen pop-ups and edge cases

5. Invest in Simulation

Counterfactuals

  • Simulation enables exploring all possible paths, not just one
  • Play out counterfactuals—what if scenarios
  • Represent real-world complexity in your starting state

6. Make Logs Central

Observability

  • Detailed logs matter more than benchmark scores
  • Break down failures by category and triage individual cases
  • Build logging infrastructure that captures full execution traces

7. Use the MDP Framework

Communication

  • State, reward, action primitives are useful communication tools
  • Moving from chat models to agent models requires new thinking
  • MDP abstractions help teams reason about agent behavior

8. Embrace Hill Climbing

Process

  • Agent development is iterative—nebulous metrics, guess and check, hope for improvement
  • Unlike traditional software, there's no guaranteed forward progress
  • Learning → Simulation → Deploy with confidence → Real-world logs feed back

Recommended Reading

Jesse recommends studying these robotics and reinforcement learning concepts to deepen your understanding of agent development:

Control Theory

Open-loop and closed-loop control systems

MDPs

Markov Decision Processes and planning

Observability

Fully vs partially observable environments

Distribution Shift

DAgger (Dataset Aggregation) and imitation learning

Offline RL

Reinforcement learning from offline datasets

Robotics Literature

Recent papers on manipulation and navigation

Source Video

Agents are Robots Too: What Self-Driving Taught Me About Building Agents

Jesse Hu • Abundant • AI Engineer Summit

Video ID: qqXdLf3wy1E • Duration: ~17 minutes
Watch on YouTube

Research Note: All quotes in this report are timestamped and link to exact moments in the video for validation. This analysis covers Jesse Hu's insights on applying robotics principles ("Agentics") to AI agent development, including the 1% vs 99% problem, closed-loop systems, action space design, distribution shift, simulation, and hill climbing development processes.

Key Concepts: Agentics, embodiment, closed-loop control, action spaces, stateful agents, distribution shift, cascading consequences, simulation, counterfactuals, MDP framework, hill climbing, offline stack, self-driving parallels