12-Factor Agents: Patterns of Reliable LLM Applications
Why production-grade AI agents aren't built with magical frameworks, but with proven software engineering principles.
"Agents are just software. You all can build software."
— Dex Horthy, Founder of HumanLayer
Executive Summary
Dex Horthy delivers a groundbreaking challenge to the AI industry's framework obsession. After interviewing 100+ founders and builders working on production AI systems, he discovered a consistent pattern: agent frameworks get you to 70-80% functionality quickly, but achieving production-grade reliability requires throwing them away and rebuilding from scratch.
The core thesis is simple yet profound: "Agents are just software." They should be built using established software engineering principles rather than magical, opaque frameworks. Horthy introduces 12-Factor Agents — a methodology inspired by Heroku's original 12-Factor App patterns — to standardize production AI systems.
The talk resonated so deeply that it sat on the front page of Hacker News all day, garnered 17,000+ GitHub stars, and gave engineers the vocabulary to discuss agent architecture patterns they were already using but couldn't name.
The 70-80% Problem
The Framework Trap
You spin up a LangChain project, prototype an agent in a weekend, and it works surprisingly well. The CEO is excited. You add more tools. The demo goes great.
Then you hit production. Suddenly your agent is hallucinating API calls, getting stuck in retry loops, and bloating its context window...
Seven Layers Deep
You're seven layers deep in a framework's call stack, trying to reverse engineer why your "intelligent" system decided to deploy to the wrong environment.
The abstraction that made you productive is now opaque, debugging is a nightmare, and you've lost control.
Not Every Problem Needs an Agent
Horthy tried building a DevOps agent to run makefile commands. After 2 hours of prompt engineering, he realized:
"I could have written the bash script to do this in about 90 seconds."
The Core Philosophy
"Agents are just software. You all can build software."
— Dex Horthy (8:45)
LLMs Are Pure Functions
Tokens in → Tokens out. Agent reliability = context engineering.
"LLMs are pure functions, token in, tokens out, and everything everything in making agents good is context engineering."
"The only thing that determines the reliability of your agent is how good of tokens can you get out and the only thing that determines the tokens you get out other than like retraining your own own model and something like that is being really careful about what tokens you put in."
This isn't prompt engineering as mystical art. It's systematic optimization of every token that enters your model's context window.
Most Production Agents Aren't That Agentic
They're well-engineered software with small, deterministic components.
"Most production agents aren't that agentic at all. They were mostly just software."
The most reliable production systems use fundamental software engineering principles that have worked for decades: small components, clear responsibilities, explicit control flow, and deterministic workflows sprinkled with probabilistic reasoning where it adds value.
The 12 Factors
Patterns for building reliable LLM applications, inspired by Heroku's 12-Factor App methodology.
1. Structured Output (JSON Mode)
The foundational capability - turning natural language into structured JSON.
2. Own Your Prompts
Hand-craft every token for production quality. Context engineering is the key.
3. Explicit Tool Calling
Demystify tool use - recognize it as just JSON parsing and code execution.
4. Own Your Control Flow
Explicit management of agent execution loops and decision branching.
5. State Management
Separate execution state from business state. Make agents stateless with external state.
6. Own Your Context Window
Intentional curation of what information is fed to the LLM. Every token matters.
7. Contact Humans Early
Push the decision between tool execution and human intervention to natural language.
8. Multi-Channel Access
Meet users where they are - Email, Slack, Discord, SMS - not just web chat.
9. Micro-Agents
Small focused loops (3-10 steps) embedded in deterministic workflows.
10. Stateless Agents
Agents should be stateless functions with explicit external state management.
11. Deterministic Workflows
Mix deterministic and probabilistic components strategically.
12. Error Handling
Don't blindly append errors - summarize, clear resolved issues, and maintain context quality.
Factor 1: Structured Output
The Most Magical Capability
Everything begins here - LLMs can turn natural language into structured JSON.
"It is turning a sentence like this into JSON that looks like this. Doesn't even matter what you do with that JSON."
This is the foundational capability. All other factors build on this. You don't need a framework to access structured output — you can implement it immediately. Every major LLM provider (OpenAI, Anthropic, Google) supports JSON mode natively.
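As a concrete starting point, here is a minimal sketch using the openai Python SDK's native JSON mode; the model choice, system prompt, and output schema are illustrative assumptions, not from the talk.

import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Ask for JSON mode so the model must return a parseable object
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": "Reply with JSON: {action, date, time, attendees}."},
        {"role": "user", "content": "Schedule a meeting for tomorrow at 2pm with alice@example.com"},
    ],
)

data = json.loads(response.choices[0].message.content)
print(data["action"])  # e.g. "schedule_meeting"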
The Core Loop
# 1. Send a natural-language prompt to the LLM
response = llm.generate("Schedule a meeting for tomorrow")

# 2. Receive a structured JSON response
{
  "action": "schedule_meeting",
  "date": "2024-10-30",
  "time": "14:00",
  "attendees": ["alice@example.com"]
}

# 3. Parse and route to deterministic code
match response["action"]:
    case "schedule_meeting":
        schedule_meeting(response["date"], response["time"])

# 4. Execute the action

That's it. That's the core loop of every agent system, whether it has 1 tool or 100.
Factor 4: Tool Use is Harmful
"I'm going to go ahead and go on a limb here and say tool use is harmful."
— Dex Horthy (6:30)
Tool Use = JSON + Code
There's nothing special about tools. It's just JSON and code.
"There's nothing special about tools. It's just JSON and code."
"What is happening is our LM is putting out JSON. We're going to give that to some deterministic code that's going to do something"
The abstraction obscures what's actually happening. Tool use is simply: LLM outputs JSON → deterministic code executes it. No magic. No "alien entity interacting with environment."
❌ Framework Approach
# Mysterious, magical
agent.useTool(
    "deploy_frontend",
    {"env": "prod"}
)
# What happens inside? Who knows.
# Seven layers of abstraction.

✅ Explicit Approach
# You own it
tool_call = llm_output_json  # already-parsed JSON from the LLM
match tool_call["action"]:
    case "deploy_frontend":
        deploy_front_end(tool_call["params"])
    case "deploy_backend":
        deploy_back_end(tool_call["params"])
# Clear, debuggable, yours.

Why the Abstraction is Harmful
It obscures reality, makes debugging harder, removes control, creates mystery.
- Obscures reality — You think something complex is happening when it's not
- Makes debugging harder — You can't debug a "tool" but you can debug JSON parsing
- Removes control — Framework decisions replace your engineering judgment
- Creates mystery — New engineers think there's something special to learn
Factor 8: Own Your Control Flow
The most powerful section of Horthy's talk walks through a real production deployment agent with 100 tools and 20 steps. How is this manageable? Small, focused agent loops embedded in deterministic workflows.
Deployment Agent Architecture
"100 tools, 20 steps, easy. Um manageable context, clear responsibilities."
— Dex Horthy (9:00)
Why This Works
Each agent loop has 3-10 steps with clear decision points.
- Each agent loop has 3-10 steps (manageable context)
- Clear decision points between deterministic and probabilistic
- Human-in-the-loop at critical junctures
- Each component has single responsibility
- You can pause, resume, debug, and reason about every step (see the loop sketch below)
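To make this concrete, here is a minimal, self-contained sketch of an owned control loop; the step type, stubbed LLM call, and approval handling are illustrative assumptions, not the deployment agent from the talk.

from dataclasses import dataclass, field

@dataclass
class Step:
    action: str
    params: dict = field(default_factory=dict)

def choose_next_step(history: list) -> Step:
    # Stand-in for an LLM call returning structured JSON
    return Step("done") if history else Step("run_tests")

def execute(step: Step) -> str:
    # Deterministic tool execution: just code you own
    return f"ran {step.action}"

def run_agent(max_steps: int = 10) -> list:
    history: list = []
    for _ in range(max_steps):            # explicit loop: you own termination
        step = choose_next_step(history)  # probabilistic decision point
        if step.action == "done":
            break                         # you decide when to stop
        if step.action == "deploy_to_prod":
            history.append("paused: awaiting human approval")
            break                         # human-in-the-loop at a critical juncture
        history.append(execute(step))     # you curate what goes back in
    return history

print(run_agent())  # ['ran run_tests']

Because the loop is plain code, pausing, branching, and summarizing are ordinary control-flow statements rather than framework features.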
Own Your Context Window
The Context Window Anti-Pattern
Your agent calls an API, gets an error. You append that error to the context and have it retry. Then it calls another API, gets another error. You append that too.
After 5 retries, your context window is bloated with error messages, and the model is completely confused.
"Seen like this thing just like kind of spin out and like go crazy and lose context and just get stuck."
The Pattern: Curate Every Token
You will always get tighter, better, higher reliability results by controlling and limiting tokens.
"You will always get like tighter, better, higher reliability results by controlling and limiting the number of tokens you put in that context window"
"If you're not looking at every single token and if you're not optimizing the density and the clarity of the way that you're passing information to an LLM, you might be missing out on upside and quality."
❌ Anti-Pattern
- ✗ Blindly appending errors to context
- ✗ Including full stack traces
- ✗ Never clearing resolved errors
- ✗ Letting context grow unbounded
✅ Best Practices
- ✓ Summarize multiple errors into one clear message
- ✓ Clear pending errors when a valid tool call succeeds
- ✓ Compress long conversations into summaries
- ✓ Be intentional about every single token (see the sketch below)
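A minimal sketch of these practices, assuming the context is kept as a plain list of strings; the summarization rule is deliberately simple and illustrative.

def record_error(context: list[str], error: str, max_errors: int = 3) -> list[str]:
    errors = [line for line in context if line.startswith("ERROR:")]
    rest = [line for line in context if not line.startswith("ERROR:")]
    errors.append("ERROR: " + error.splitlines()[0])  # first line only, no stack trace
    if len(errors) > max_errors:  # compress repeats into one clear message
        errors = [f"ERROR: {len(errors)} similar failures; latest: " + error.splitlines()[0]]
    return rest + errors

def record_success(context: list[str], result: str) -> list[str]:
    # A valid tool call succeeded: clear pending errors, keep the result
    return [line for line in context if not line.startswith("ERROR:")] + [result]

ctx: list[str] = []
for _ in range(5):
    ctx = record_error(ctx, "ConnectionError: api.example.com timed out\n  File ...")
ctx = record_success(ctx, "deploy_frontend: ok")
print(ctx)  # ['deploy_frontend: ok'], resolved errors cleared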
The Micro-Agents Pattern
"The things that people are doing that work really well are micro agents."
— Dex Horthy (7:30)
Small Focused Loops (3-10 Steps)
You still have a mostly deterministic DAG with very small agent loops embedded.
"You still have a mostly deterministic DAG and you have these very small agent loops with like three to 10 steps."
DAG = Directed Acyclic Graph — essentially what your code already is. Every if statement creates a branching path. Every function call creates a node.
The mistake engineers make is thinking agents need to replace this entire graph. They don't. They should enhance specific nodes where probabilistic reasoning adds value.
The Pattern
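Here is a minimal, self-contained sketch of the pattern; the pipeline stages and the stubbed LLM planner are illustrative assumptions, not the talk's code.

def plan_fix(logs: str) -> str:
    # Stand-in for an LLM proposing the next action as structured output
    return "rollback" if "panic" in logs else "retry"

def run_tests() -> str:                # deterministic node in the DAG
    return "ok"

def deploy() -> str:                   # deterministic node in the DAG
    return "panic: bad config"         # simulate a failing deploy

def repair_loop(logs: str, max_steps: int = 5) -> list[str]:
    # The micro-agent: a tiny loop (3-10 steps) with a tight, local context
    attempts: list[str] = []
    for _ in range(max_steps):
        action = plan_fix(logs)        # probabilistic step
        attempts.append(action)
        if action == "rollback":       # deterministic exit condition
            break
    return attempts

# The DAG itself is ordinary control flow: test -> deploy -> repair on failure
if run_tests() == "ok":
    logs = deploy()
    if "panic" in logs:
        print(repair_loop(logs))       # ['rollback']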
Benefits
Each loop is independently testable, failures isolated, debugging straightforward.
- Each loop is independently testable
- Failures are isolated
- Context windows stay small
- Debugging is straightforward
- You can iterate on individual loops without touching the whole system
State Management: Stateless Agents
"Basically agents should be stateless. You should own the state, manage it however you want."
— Dex Horthy (10:00)
"LLMs are stateless functions, which means just make sure you put the right things in the context and you'll get the best results."
— Dex Horthy (10:15)
Stateless Agent Pattern
def agent(input: str, state: State) -> tuple[str, State]:
    """
    Stateless function: given input and state,
    returns output and new state.
    """
    # 1. Build context window from state
    context = build_context(state)
    # 2. Call LLM (pure function)
    response = llm.generate(context)
    # 3. Extract new state
    new_state = extract_state(response, state)
    # 4. Return output + new state
    return response.output, new_state

Execution State
Framework-level state
- Current step
- Next step
- Retry counts
- Loop status
Business State
Application-level state
- Messages that have happened
- Data to display to user
- Things waiting on approval
- User interactions
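Below is a minimal sketch of this split, keeping execution state and business state as separate serializable records; the field names and JSON persistence are illustrative assumptions.

from dataclasses import dataclass, field, asdict
import json

@dataclass
class ExecutionState:                  # framework-level: where the loop is
    current_step: int = 0
    retry_count: int = 0
    status: str = "running"

@dataclass
class BusinessState:                   # application-level: what the user cares about
    messages: list = field(default_factory=list)
    pending_approvals: list = field(default_factory=list)

@dataclass
class AgentState:
    execution: ExecutionState = field(default_factory=ExecutionState)
    business: BusinessState = field(default_factory=BusinessState)

state = AgentState()
state.business.messages.append("deploy requested")
state.execution.current_step = 3

# Persist anywhere (DB, queue, file), then crash and resume where you left off
saved = json.dumps(asdict(state))
loaded = json.loads(saved)
restored = AgentState(
    execution=ExecutionState(**loaded["execution"]),
    business=BusinessState(**loaded["business"]),
)
print(restored.execution.current_step)  # 3: resume from here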
Why This Matters
Scalability, reliability, debugging, flexibility.
- Scalability — Stateless functions scale horizontally
- Reliability — State is persisted, not lost on crashes
- Debugging — You can inspect and replay any state
- Flexibility — Pause/resume, branch, retry any workflow
Meet Users Where They Are
"People don't want to have seven tabs open of different chat GPT style agents. Just let people email with the agents you're building. Let them slack with the agents you're building. Discord, SMS, whatever it is."
— Dex Horthy (11:40)
One of Horthy's most practical insights: stop building web chat interfaces. Your users already have communication preferences. Some live in Slack. Some live in email. Some want to DM your agent on Discord.
"We see this taking off all over the place."
The Architecture
Your agent shouldn't care about the communication channel. Put it behind a REST API or MCP server, and let adapters handle channel-specific logic.
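A minimal sketch of that architecture, with stubbed senders standing in for real email and Slack integrations; the function names are illustrative assumptions.

def agent_api(message: str) -> str:
    # The channel-agnostic agent endpoint: it never knows which channel called it
    return "agent reply to: " + message

def send_email(to: str, body: str) -> None:
    print(f"[email to {to}] {body}")

def send_slack(channel: str, body: str) -> None:
    print(f"[slack #{channel}] {body}")

# Adapters own the channel-specific logic: addressing, threading, formatting
def email_adapter(sender: str, subject: str, body: str) -> None:
    send_email(sender, agent_api(subject + "\n" + body))

def slack_adapter(channel: str, text: str) -> None:
    send_slack(channel, agent_api(text))

email_adapter("alice@example.com", "Deploy?", "Please deploy the frontend to prod")
slack_adapter("deployments", "status of the backend deploy?")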
Community Validation
Why It Resonated
Captured patterns people were already using
- Captured patterns people were already using
- Gave names to unnamed practices
- Provided vocabulary for discussing agent architecture
- Anti-framework stance appealed to experienced engineers
- Validated frustrations everyone was feeling
Key Takeaways for AI Engineers
- Start with frameworks for learning, not production: be prepared to throw them away when you hit the 70-80% ceiling
- Own your prompts: hand-craft every token for production quality
- Treat tool calling as explicit JSON parsing plus code execution, not magic
- Build micro-agents (3-10 steps) embedded in deterministic workflows
- Separate execution state from business state: make agents stateless
- Own your context window: curate every token aggressively
- Meet users where they are (email, Slack, Discord, SMS), not just web chat
- Not every problem needs an agent: sometimes a bash script is better
- If you own your control flow, you can do fun things like break, switch, summarize, and branch
- LLMs are pure functions (tokens in, tokens out), and context engineering is everything
The Anti-Framework Stance
"I am not here to bash frameworks. I think it should be the opposite."
— Dex Horthy (12:30)
Throughout the talk, Horthy is careful to position this not as framework-bashing, but as a wishlist — a list of feature requests for what frameworks should become.
What Frameworks Should Do Instead
Take away 'other hard parts' so we can focus on 'hard AI parts'.
"A lot of frameworks try to take away the hard AI parts of the problem so that you can just kind of drop it in and go. And, uh, I think it should be the opposite."
"I think the tools that we get should take away the other hard parts so that we can spend all our time focusing on the hard AI parts, on getting the prompts right, on getting the flow right, on getting the tokens right."
❌ Bootstrap Approach
Complete framework, drop-in solution, hard to customize, you're stuck with it.
Frameworks that take away the hard AI parts leave you at 70-80% reliability.
✅ Shadcn Approach
Scaffolded out, then you own the code, can customize anything.
Tools that handle infrastructure (state, APIs) but leave prompts, flow, tokens to you.
Find the Bleeding Edge
"Find the bleeding edge. Find ways to do things better than everybody else by really curating what you put in the model and how you control what comes out."
— Dex Horthy (14:00)
"If you can figure out how to get it right reliably anyways, because you've engineered reliability into your system, then you will have created something magical, and you will have created something that's better than what everybody else is building."
— Dex Horthy (14:15)
The Vision
Engineering reliability at the boundary of what models can do.
Horthy's vision isn't about avoiding complexity — it's about engineering reliability at the boundary of what models can do.
The companies pushing the frontier aren't using better models. They're not using secret frameworks. They're engineering better context.
They're optimizing every token. They're testing variations systematically. They're building small, focused loops with clear responsibilities. They're treating agents as software, not magic.
About the Speaker
Dex Horthy
Founder of HumanLayer
Dex Horthy is the founder of HumanLayer and creator of the 12-Factor Agents methodology. With extensive experience building production AI systems, he identified patterns that separate successful agent implementations from framework traps.
His interviews with 100+ founders and builders shipping production AI systems surfaced the consistent patterns that became the foundation of the methodology.
Source Video
12-Factor Agents: Patterns of Reliable LLM Applications
Dex Horthy, Founder of HumanLayer • AI Engineer Summit 2024
Research Note: All quotes in this report are timestamped and link to exact moments in the video for validation. This analysis was conducted using multi-agent transcript analysis with dedicated agents for transcript analysis, highlight extraction, fact-checking, and UI/UX design.