Replit Engineering

Autonomy Is All You Need

Building Long-Running AI Agents That Break the One-Hour Barrier

"We built a coding agent for nontechnical users. We want to build agents that run for several hours in a row without human intervention."

Michele Catasta • Head of ML, Replit

  • Talk Duration: 24 min
  • Types of Autonomy: 2
  • Agent Generations: 3
  • Barrier Broken: >1 hour

Building autonomous agents that run for hours without human intervention requires fundamentally rethinking how we design and engineer AI systems. Michele Catasta, Head of ML at Replit, presents a comprehensive framework for achieving true autonomy—drawing insights from Tesla's Full Self-Driving journey and Replit's own evolution through three generations of agent architectures.

The talk introduces a critical distinction between two types of autonomy: supervised autonomy (human in the loop, like Tesla FSD with driver monitoring) and unsupervised autonomy (fully independent operation). Most AI systems today are stuck in supervised mode. Breaking through to unsupervised autonomy requires overcoming technical barriers around reliability, testing, observability, and system design.

Replit's journey from September 2023 reveals the practical evolution from ReAct agents (2022) to tool-calling agents (2023) to truly autonomous agents (2024). The breakthrough came with B3 capabilities—agents that can run for multiple hours without intervention by implementing four critical engineering practices: comprehensive testing, deep observability, reducible design, and massive parallelism.

The most compelling insight is the target user: not technical engineers, but nontechnical users who want to describe what they want and see it built. This focus drives all technical decisions and explains why autonomy matters—it's the difference between a helpful assistant and a transformative tool that democratizes software development.

Two Types of Autonomy

The Tesla FSD Analogy

Michele draws a powerful analogy to Tesla's Full Self-Driving system to explain the critical distinction between two fundamentally different approaches to autonomy.


Supervised Autonomy

The system operates autonomously but requires constant human monitoring. The human must remain attentive and ready to intervene at any moment.

Example:

"Tesla FSD requires the driver to monitor and be ready to take over."


Unsupervised Autonomy

The system operates completely independently for extended periods. The human can walk away and return hours later to find the task completed.

Example:

"We want agents that run for several hours in a row without human intervention."

The Key Insight

Most AI agents today operate in supervised mode—they can perform tasks but require constant human oversight. The breakthrough happens when you cross the threshold to unsupervised autonomy, where agents can operate reliably for hours without intervention. This is the difference between a helpful tool and a transformative technology that democratizes access to software development.

Three Generations of AI Agents

ReAct Agents (2022)

The first generation used the ReAct (Reasoning + Acting) pattern. Agents would reason about what to do, take an action, observe the result, and repeat. This was groundbreaking but fundamentally limited—the reasoning and acting were tightly coupled in a loop that couldn't scale to complex, long-running tasks.

"ReAct pattern: reason, act, observe, repeat. Simple but limited for complex tasks."
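The reason-act-observe loop can be sketched in a few lines. This is a toy illustration of the pattern only, not Replit's implementation; `llm` and `TOOLS` are hypothetical stand-ins:

```python
# Minimal ReAct-style loop: reason, act, observe, repeat.
# `llm` and `TOOLS` are hypothetical stand-ins for a real model and toolset.

def llm(prompt: str) -> str:
    """Toy 'model' that always decides to finish after one step."""
    return "FINAL: done"

TOOLS = {"echo": lambda arg: f"observed: {arg}"}

def react_loop(task: str, max_steps: int = 5) -> str:
    transcript = f"Task: {task}"
    for _ in range(max_steps):
        decision = llm(transcript)               # reason
        if decision.startswith("FINAL:"):
            return decision.removeprefix("FINAL:").strip()
        tool, _, arg = decision.partition(" ")   # e.g. "echo hello"
        observation = TOOLS[tool](arg)           # act
        transcript += f"\n{decision}\n{observation}"  # observe, repeat
    return "gave up"
```

The limitation Michele points to is visible even here: reasoning and acting share one loop and one growing transcript, so long tasks blow up the context and a single bad decision derails everything downstream.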

Tool-Calling Agents (2023)

The second generation separated tools from the model reasoning. Agents could call predefined tools (functions, APIs, commands) with structured inputs. This was a major step forward—tools could be versioned, tested, and improved independently. But agents still required constant human oversight and couldn't run for long periods.

"Tool calling enabled separation of concerns. Tools became first-class objects that could be improved independently."
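The separation of concerns can be sketched as a registry of first-class tool objects dispatched via structured calls. The registry shape and call format below are illustrative assumptions, not Replit's actual API:

```python
# Tool-calling sketch: tools are first-class objects with names and
# descriptions, invoked via structured calls instead of free text.
# Registry and call format are illustrative, not Replit's internals.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    fn: Callable[..., str]
    description: str

REGISTRY: dict[str, Tool] = {}

def register(tool: Tool) -> None:
    REGISTRY[tool.name] = tool

def dispatch(call: dict) -> str:
    """Execute a structured call like {'name': ..., 'args': {...}}."""
    tool = REGISTRY[call["name"]]
    return tool.fn(**call["args"])

register(Tool("read_file", lambda path: f"<contents of {path}>",
              "Read a file from the workspace"))
```

Because each tool is an independent object, it can be versioned, unit-tested, and swapped out without touching the model or the loop around it.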

Autonomous Agents (2024)

The current generation achieves true autonomy. Agents can run for multiple hours without human intervention by implementing robust engineering practices: comprehensive testing, deep observability, reducible design for debugging, and massive parallelism. This isn't just about better models—it's about building production-grade infrastructure around them.

"Agents that run for several hours in a row without human intervention. That's the breakthrough."

Breaking the One-Hour Barrier

The B3 Breakthrough

Replit achieved a major milestone with B3 capabilities—agents that can reliably run for more than one hour without human intervention. This might sound like a small detail, but it represents a fundamental shift from supervised to unsupervised autonomy.


Before B3

Agents would fail or get stuck after minutes. Constant human intervention required.


With B3

Agents run for hours independently. Complex multi-step tasks completed successfully.

"We want to build agents that run for several hours in a row without human intervention."


Why This Matters

  • Nontechnical users can describe what they want and walk away
  • Complex tasks can be broken into many steps without manual intervention
  • Overnight builds can run while you sleep, ready in the morning
  • Democratization of software development becomes realistic

Four Critical Engineering Practices

Achieving autonomous agents isn't about better prompts or smarter models—it's about implementing production-grade engineering practices. Michele outlines four non-negotiable practices.

Testing

Comprehensive testing at every level. Unit tests for tools, integration tests for workflows, end-to-end tests for complete tasks. You can't have autonomous agents without confidence that each component works correctly.

Key Insight: Testing is the foundation of autonomy. Without it, you can't trust agents to run unsupervised.
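A minimal example of what "unit tests for tools" can look like; `apply_patch` is a hypothetical tool invented for illustration:

```python
# Unit-testing a tool before trusting an agent to call it unsupervised.
# `apply_patch` is a hypothetical tool, sketched for illustration.

def apply_patch(source: str, old: str, new: str) -> str:
    """Replace the first occurrence of `old` with `new`, or fail loudly."""
    if old not in source:
        raise ValueError("patch target not found")
    return source.replace(old, new, 1)

def test_replaces_first_match_only():
    assert apply_patch("a b a", "a", "x") == "x b a"

def test_rejects_missing_target():
    try:
        apply_patch("a", "z", "x")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")

test_replaces_first_match_only()
test_rejects_missing_target()
```

The second test matters most for autonomy: a tool that fails loudly gives the agent (and its logs) something to react to, while a tool that silently does nothing lets an unsupervised run drift for hours.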

Observability

Deep visibility into agent behavior. What tools are being called? What's the reasoning? Where did it get stuck? Observability lets you debug failures and optimize performance in production.

Key Insight: You can't improve what you can't see. Observability is essential for reliable agents.
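One common way to get this visibility is to wrap every tool call in a structured trace event. The event shape below is an assumption for illustration, not Replit's telemetry format:

```python
# Structured trace of every agent step: which tool, what input,
# how long it took, and whether it succeeded. Shape is illustrative.
import time

TRACE: list[dict] = []

def traced(tool_name, fn, **args):
    """Run a tool call and record a structured event about it."""
    start = time.monotonic()
    event = {"tool": tool_name, "args": args}
    try:
        result = fn(**args)
        event["status"] = "ok"
        return result
    except Exception as exc:
        event["status"] = "error"
        event["error"] = repr(exc)
        raise
    finally:
        event["duration_s"] = round(time.monotonic() - start, 3)
        TRACE.append(event)

traced("shell", lambda cmd: f"ran {cmd}", cmd="ls")
```

After an hours-long run, scanning `TRACE` answers exactly the questions in the paragraph above: which tools were called, with what inputs, and where the agent got stuck.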

Reducible Design

Design systems so failures can be reduced to minimal reproducible cases. When an agent fails, you should be able to isolate the exact step, input, and context that caused the problem.

Key Insight: Debugging autonomous agents requires reducible failures. Make every bug reproducible.
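One way to read "reducible design": every step records the exact state it ran on, so a failure hours into a run reduces to one replayable step. All names below are illustrative, not Replit's internals:

```python
# Reducible-design sketch: every step snapshots its input state, so a
# failure can be reduced to a minimal reproducible case. All names
# here are illustrative, not Replit's internals.

RECORDS: list[dict] = []

def run_step(step_fn, state: dict) -> dict:
    record = {"fn": step_fn.__name__, "state_before": dict(state)}
    RECORDS.append(record)
    try:
        return step_fn(state)
    except Exception:
        record["failed"] = True   # mark the minimal reproducible case
        raise

def flaky_step(state: dict) -> dict:
    if state["count"] >= 3:
        raise RuntimeError("boom")
    state["count"] += 1
    return state

state = {"count": 0}
try:
    for _ in range(10):
        state = run_step(flaky_step, state)
except RuntimeError:
    pass

failing = next(r for r in RECORDS if r.get("failed"))
# `failing["state_before"]` is the repro: re-running flaky_step on it
# deterministically triggers the same failure, outside the long run.
```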

Parallelism

Run multiple agent instances in parallel. Explore different approaches simultaneously. Compare results. Parallelism dramatically speeds up iteration and increases success rates.

Key Insight: Parallelism isn't just about speed—it's about exploring more solution paths and finding the best one.
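The explore-then-pick pattern can be sketched with a thread pool. The `attempt` function and scoring are toy stand-ins for a full agent run and a real evaluation:

```python
# Parallelism sketch: launch several agent attempts concurrently and
# keep the best-scoring result. `attempt` and its scoring are toy
# stand-ins for a full agent run and a real evaluator.
from concurrent.futures import ThreadPoolExecutor

def attempt(seed: int) -> tuple[int, str]:
    """Stand-in for one full agent run exploring one solution path."""
    score = (seed * 7) % 5          # pretend evaluation score
    return score, f"solution-{seed}"

def best_of(n: int) -> str:
    """Run n attempts in parallel and return the highest-scoring one."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        results = list(pool.map(attempt, range(n)))
    return max(results)[1]
```

In a real agent system each attempt would be a sandboxed multi-hour run, which is why this practice depends on the other three: you can only pick the best path if every path is tested, observable, and reducible when it fails.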

Inside Replit's Journey to Autonomy

Timeline: From September 2023

Sep

September 2023 - Journey Begins

Replit starts building autonomous agents. Early experiments with ReAct and tool-calling patterns reveal limitations. Team realizes they need to rethink the entire architecture.

Q4

Q4 2023 - Tool-Calling Agents

Transition to tool-calling architecture. Tools become first-class objects. Early success but agents still require constant oversight. One-hour barrier seems impossible.

2024

Early 2024 - Engineering Focus

Shift from model improvements to engineering practices. Invest heavily in testing, observability, reducible design, and parallelism. Realize autonomy is a systems engineering problem.

B3

B3 Launch - Barrier Broken

Agents can now run for multiple hours without human intervention. Focus shifts to nontechnical users—people who want to describe what they want and see it built without knowing how to code.

The Target User: Nontechnical Creators

"We built a coding agent for nontechnical users." This isn't about helping developers write code faster—it's about enabling people who have ideas but don't know how to program. The autonomous agent is their translator, converting natural language desires into working software. This focus on nontechnical users drives all technical decisions and explains why breaking the one-hour autonomy barrier is so critical.

Top 15 Quotes from the Talk

"We built a coding agent for nontechnical users. We want to build agents that run for several hours in a row without human intervention."

Michele Catasta, Head of ML, Replit

10:40
"There are two types of autonomy: supervised autonomy and unsupervised autonomy."

Michele Catasta, Head of ML, Replit

03:20
"Tesla Full Self-Driving requires the driver to monitor and be ready to take over. That's supervised autonomy."

Michele Catasta, Head of ML, Replit

04:15
"We've seen three generations of agents: ReAct agents, tool-calling agents, and now autonomous agents."

Michele Catasta, Head of ML, Replit

06:30
"Testing is the foundation. You can't have autonomous agents without comprehensive testing."

Michele Catasta, Head of ML, Replit

14:20
"Observability lets you understand what your agents are doing and why they succeed or fail."

Michele Catasta, Head of ML, Replit

15:45
"Reducible design means you can isolate failures to minimal reproducible cases."

Michele Catasta, Head of ML, Replit

17:10
"Parallelism allows us to explore multiple solution paths simultaneously and choose the best."

Michele Catasta, Head of ML, Replit

18:30
"All technical decisions should be driven by user needs, not by what's technically interesting."

Michele Catasta, Head of ML, Replit

12:15
"The breakthrough isn't better models—it's better engineering around the models."

Michele Catasta, Head of ML, Replit

13:50
"Breaking the one-hour barrier changes everything about who can use AI agents."

Michele Catasta, Head of ML, Replit

11:25
"ReAct was a good start, but tool-calling allowed us to separate concerns and improve components independently."

Michele Catasta, Head of ML, Replit

07:40
"Nontechnical users should be able to describe what they want and see it built without knowing how to code."

Michele Catasta, Head of ML, Replit

09:55
"Unsupervised autonomy means agents can complete complex tasks while you sleep."

Michele Catasta, Head of ML, Replit

05:30
"B3 represents the transition from supervised to unsupervised autonomy."

Michele Catasta, Head of ML, Replit

19:15

Future Predictions

2025

  • Multi-hour autonomy becomes standard for production agents
  • Engineering best practices codified into frameworks
  • Nontechnical user adoption accelerates

2026

  • Agents run overnight for complex multi-day projects
  • Standardized testing and observability tooling
  • Democratization of software development accelerates

2027+

  • Fully autonomous software development pipelines
  • Agents coordinate with each other on large projects
  • Nontechnical users build complex applications independently

Actionable Takeaways

For AI Engineers

  • Invest in testing infrastructure before optimizing models
  • Build observability from day one
  • Design for reducible failures
  • Embrace parallelism

For Product Teams

  • Focus on nontechnical users
  • Target unsupervised autonomy
  • Derive all technical decisions from user needs

For Researchers

  • Study engineering practices, not just models
  • Investigate the supervised-to-unsupervised transition
  • Develop frameworks for reducible design

Meet the Speaker

Michele Catasta

Head of Machine Learning, Replit

Michele Catasta leads the machine learning team at Replit, where he's pioneering the development of autonomous AI agents that can write software independently. With a deep background in AI research and production engineering, he bridges the gap between cutting-edge research and practical, user-facing products. His work focuses on making software development accessible to everyone, regardless of technical background.

Key Contributions

  • Leading Replit's autonomous agent development
  • Breaking the one-hour autonomy barrier
  • Three-generation agent architecture framework
  • Engineering practices for production agents



Research Methodology: This comprehensive analysis is based on Michele Catasta's talk at the AI Engineer Conference. All quotes are timestamped and link to exact moments in the video for validation. The analysis focuses on practical engineering patterns for building autonomous agents that can run for extended periods without human intervention.

Research sourced from AI Engineer Conference 2024. Analysis of Michele Catasta's presentation on building autonomous agents at Replit. Focus on practical patterns, engineering practices, and the journey from supervised to unsupervised autonomy.