12-Factor Agents: Patterns of Reliable LLM Applications
Why production-grade AI agents aren't built with magical frameworks, but with proven software engineering principles.
"Agents are just software. You all can build software."
— Dex Horthy, Founder of HumanLayer
Executive Summary
Dex Horthy delivers a groundbreaking challenge to the AI industry's framework obsession. After interviewing 100+ founders and builders working on production AI systems, he discovered a consistent pattern: agent frameworks get you to 70-80% functionality quickly, but achieving production-grade reliability requires throwing them away and rebuilding from scratch.
The core thesis is simple yet profound: "Agents are just software." They should be built using established software engineering principles rather than magical, opaque frameworks. Horthy introduces 12-Factor Agents — a methodology inspired by Heroku's original 12-Factor App patterns — to standardize production AI systems.
The talk resonated so deeply that it sat on the front page of Hacker News all day, garnered 17,000+ GitHub stars, and gave engineers the vocabulary to discuss agent architecture patterns they were already using but couldn't name.
The 70-80% Problem
The Framework Trap
You spin up a LangChain project, prototype an agent in a weekend, and it works surprisingly well. The CEO is excited. You add more tools. The demo goes great.
Then you hit production. Suddenly your agent is hallucinating API calls, getting stuck in retry loops, and bloating its context window...
Seven Layers Deep
You're seven layers deep in a framework's call stack, trying to reverse engineer why your "intelligent" system decided to deploy to the wrong environment.
The abstraction that made you productive is now opaque, debugging is a nightmare, and you've lost control.
Not Every Problem Needs an Agent
Horthy tried building a DevOps agent to run makefile commands. After 2 hours of prompt engineering, he realized:
"I could have written the bash script to do this in about 90 seconds."
The Core Philosophy
"Agents are just software. You all can build software."
— Dex Horthy (8:45)
LLMs Are Pure Functions
Tokens in → Tokens out. Agent reliability = context engineering.
"LLMs are pure functions, token in, tokens out, and everything everything in making agents good is context engineering."
"The only thing that determines the reliability of your agent is how good of tokens can you get out and the only thing that determines the tokens you get out other than like retraining your own own model and something like that is being really careful about what tokens you put in."
This isn't prompt engineering as mystical art. It's systematic optimization of every token that enters your model's context window.
Most Production Agents Aren't That Agentic
They're well-engineered software with small, deterministic components.
"Most production agents aren't that agentic at all. They were mostly just software."
The most reliable production systems use fundamental software engineering principles that have worked for decades: small components, clear responsibilities, explicit control flow, and deterministic workflows sprinkled with probabilistic reasoning where it adds value.
The 12 Factors
Patterns for building reliable LLM applications, inspired by Heroku's 12-Factor App methodology.
1. Structured Output (JSON Mode)
The foundational capability - turning natural language into structured JSON.
2. Own Your Prompts
Hand-craft every token for production quality. Context engineering is the key.
3. Explicit Tool Calling
Demystify tool use - recognize it as just JSON parsing and code execution.
4. Own Your Control Flow
Explicit management of agent execution loops and decision branching.
5. State Management
Separate execution state from business state. Make agents stateless with external state.
6. Own Your Context Window
Intentional curation of what information is fed to the LLM. Every token matters.
7. Contact Humans Early
Push the decision between tool execution and human intervention to natural language.
8. Multi-Channel Access
Meet users where they are - Email, Slack, Discord, SMS - not just web chat.
9. Micro-Agents
Small focused loops (3-10 steps) embedded in deterministic workflows.
10. Stateless Agents
Agents should be stateless functions with explicit external state management.
11. Deterministic Workflows
Mix deterministic and probabilistic components strategically.
12. Error Handling
Don't blindly append errors - summarize, clear resolved issues, and maintain context quality.
Factor 1: Structured Output
The Most Magical Capability
Everything begins here - LLMs can turn natural language into structured JSON.
"It is turning a sentence like this into JSON that looks like this. Doesn't even matter what you do with that JSON."
This is the foundational capability. All other factors build on this. You don't need a framework to access structured output — you can implement it immediately. Every major LLM provider (OpenAI, Anthropic, Google) supports JSON mode natively.
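As a concrete starting point, here is a minimal sketch using the openai Python SDK's native JSON mode; the model choice, system prompt, and output schema are illustrative assumptions, not from the talk.

import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Ask for JSON mode so the model must return a parseable object
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": "Reply with JSON: {action, date, time, attendees}."},
        {"role": "user", "content": "Schedule a meeting for tomorrow at 2pm with alice@example.com"},
    ],
)

data = json.loads(response.choices[0].message.content)
print(data["action"])  # e.g. "schedule_meeting"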
The Core Loop
# 1. Send a natural-language prompt to the LLM
response = llm.generate("Schedule a meeting for tomorrow")

# 2. Receive a structured JSON response
{
  "action": "schedule_meeting",
  "date": "2024-10-30",
  "time": "14:00",
  "attendees": ["alice@example.com"]
}

# 3. Parse and route to deterministic code
match response["action"]:
    case "schedule_meeting":
        schedule_meeting(response["date"], response["time"])

# 4. Execute the action

That's it. That's the core loop of every agent system, whether it has 1 tool or 100.
Factor 4: Tool Use is Harmful
"I'm going to go ahead and go on a limb here and say tool use is harmful."
— Dex Horthy (6:30)
Tool Use = JSON + Code
There's nothing special about tools. It's just JSON and code.
"There's nothing special about tools. It's just JSON and code."
"What is happening is our LM is putting out JSON. We're going to give that to some deterministic code that's going to do something"
The abstraction obscures what's actually happening. Tool use is simply: LLM outputs JSON → deterministic code executes it. No magic. No "alien entity interacting with environment."
❌ Framework Approach
# Mysterious, magical
agent.useTool(
    "deploy_frontend",
    {"env": "prod"}
)
# What happens inside? Who knows.
# Seven layers of abstraction.

✅ Explicit Approach
# You own it
tool_call = llm_output_json  # already-parsed JSON from the LLM
match tool_call["action"]:
    case "deploy_frontend":
        deploy_front_end(tool_call["params"])
    case "deploy_backend":
        deploy_back_end(tool_call["params"])
# Clear, debuggable, yours.

Why the Abstraction is Harmful
It obscures reality, makes debugging harder, removes control, creates mystery.
- Obscures reality — You think something complex is happening when it's not
- Makes debugging harder — You can't debug a "tool" but you can debug JSON parsing
- Removes control — Framework decisions replace your engineering judgment
- Creates mystery — New engineers think there's something special to learn
Factor 8: Own Your Control Flow
The most powerful section of Horthy's talk walks through a real production deployment agent with 100 tools and 20 steps. How is this manageable? Small, focused agent loops embedded in deterministic workflows.
Deployment Agent Architecture
"100 tools, 20 steps, easy. Um manageable context, clear responsibilities."
— Dex Horthy (9:00)
Why This Works
Each agent loop has 3-10 steps with clear decision points.
- Each agent loop has 3-10 steps (manageable context)
- Clear decision points between deterministic and probabilistic
- Human-in-the-loop at critical junctures
- Each component has single responsibility
- You can pause, resume, debug, and reason about every step (see the loop sketch below)
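To make this concrete, here is a minimal, self-contained sketch of an owned control loop; the step type, stubbed LLM call, and approval handling are illustrative assumptions, not the deployment agent from the talk.

from dataclasses import dataclass, field

@dataclass
class Step:
    action: str
    params: dict = field(default_factory=dict)

def choose_next_step(history: list) -> Step:
    # Stand-in for an LLM call returning structured JSON
    return Step("done") if history else Step("run_tests")

def execute(step: Step) -> str:
    # Deterministic tool execution: just code you own
    return f"ran {step.action}"

def run_agent(max_steps: int = 10) -> list:
    history: list = []
    for _ in range(max_steps):            # explicit loop: you own termination
        step = choose_next_step(history)  # probabilistic decision point
        if step.action == "done":
            break                         # you decide when to stop
        if step.action == "deploy_to_prod":
            history.append("paused: awaiting human approval")
            break                         # human-in-the-loop at a critical juncture
        history.append(execute(step))     # you curate what goes back in
    return history

print(run_agent())  # ['ran run_tests']

Because the loop is plain code, pausing, branching, and summarizing are ordinary control-flow statements rather than framework features.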
Own Your Context Window
The Context Window Anti-Pattern
Your agent calls an API, gets an error. You append that error to the context and have it retry. Then it calls another API, gets another error. You append that too.
After 5 retries, your context window is bloated with error messages, and the model is completely confused.
"Seen like this thing just like kind of spin out and like go crazy and lose context and just get stuck."
The Pattern: Curate Every Token
You will always get tighter, better, higher reliability results by controlling and limiting tokens.
"You will always get like tighter, better, higher reliability results by controlling and limiting the number of tokens you put in that context window"
"If you're not looking at every single token and if you're not optimizing the density and the clarity of the way that you're passing information to an LLM, you might be missing out on upside and quality."
❌ Anti-Pattern
- ✗ Blindly appending errors to context
- ✗ Including full stack traces
- ✗ Never clearing resolved errors
- ✗ Letting context grow unbounded
✅ Best Practices
- ✓ Summarize multiple errors into one clear message
- ✓ Clear pending errors when a valid tool call succeeds
- ✓ Compress long conversations into summaries
- ✓ Be intentional about every single token (see the sketch below)
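A minimal sketch of these practices, assuming the context is kept as a plain list of strings; the summarization rule is deliberately simple and illustrative.

def record_error(context: list[str], error: str, max_errors: int = 3) -> list[str]:
    errors = [line for line in context if line.startswith("ERROR:")]
    rest = [line for line in context if not line.startswith("ERROR:")]
    errors.append("ERROR: " + error.splitlines()[0])  # first line only, no stack trace
    if len(errors) > max_errors:  # compress repeats into one clear message
        errors = [f"ERROR: {len(errors)} similar failures; latest: " + error.splitlines()[0]]
    return rest + errors

def record_success(context: list[str], result: str) -> list[str]:
    # A valid tool call succeeded: clear pending errors, keep the result
    return [line for line in context if not line.startswith("ERROR:")] + [result]

ctx: list[str] = []
for _ in range(5):
    ctx = record_error(ctx, "ConnectionError: api.example.com timed out\n  File ...")
ctx = record_success(ctx, "deploy_frontend: ok")
print(ctx)  # ['deploy_frontend: ok'], resolved errors cleared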
The Micro-Agents Pattern
"The things that people are doing that work really well are micro agents."
— Dex Horthy (7:30)
Small Focused Loops (3-10 Steps)
You still have a mostly deterministic DAG with very small agent loops embedded.
"You still have a mostly deterministic DAG and you have these very small agent loops with like three to 10 steps."
DAG = Directed Acyclic Graph — essentially what your code already is. Every if statement creates a branching path. Every function call creates a node.
The mistake engineers make is thinking agents need to replace this entire graph. They don't. They should enhance specific nodes where probabilistic reasoning adds value.
The Pattern
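Here is a minimal, self-contained sketch of the pattern; the pipeline stages and the stubbed LLM planner are illustrative assumptions, not the talk's code.

def plan_fix(logs: str) -> str:
    # Stand-in for an LLM proposing the next action as structured output
    return "rollback" if "panic" in logs else "retry"

def run_tests() -> str:                # deterministic node in the DAG
    return "ok"

def deploy() -> str:                   # deterministic node in the DAG
    return "panic: bad config"         # simulate a failing deploy

def repair_loop(logs: str, max_steps: int = 5) -> list[str]:
    # The micro-agent: a tiny loop (3-10 steps) with a tight, local context
    attempts: list[str] = []
    for _ in range(max_steps):
        action = plan_fix(logs)        # probabilistic step
        attempts.append(action)
        if action == "rollback":       # deterministic exit condition
            break
    return attempts

# The DAG itself is ordinary control flow: test -> deploy -> repair on failure
if run_tests() == "ok":
    logs = deploy()
    if "panic" in logs:
        print(repair_loop(logs))       # ['rollback']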
Benefits
Each loop is independently testable, failures isolated, debugging straightforward.
- Each loop is independently testable
- Failures are isolated
- Context windows stay small
- Debugging is straightforward
- You can iterate on individual loops without touching the whole system
State Management: Stateless Agents
"Basically agents should be stateless. You should own the state, manage it however you want."
— Dex Horthy (10:00)
"LLMs are stateless functions, which means just make sure you put the right things in the context and you'll get the best results."
— Dex Horthy (10:15)
Stateless Agent Pattern
def agent(input: str, state: State) -> tuple[str, State]:
    """
    Stateless function: given input and state,
    returns output and new state.
    """
    # 1. Build context window from state
    context = build_context(state)
    # 2. Call LLM (pure function)
    response = llm.generate(context)
    # 3. Extract new state
    new_state = extract_state(response, state)
    # 4. Return output + new state
    return response.output, new_state

Execution State
Framework-level state
- Current step
- Next step
- Retry counts
- Loop status
Business State
Application-level state
- Messages that have happened
- Data to display to user
- Things waiting on approval
- User interactions
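Below is a minimal sketch of this split, keeping execution state and business state as separate serializable records; the field names and JSON persistence are illustrative assumptions.

from dataclasses import dataclass, field, asdict
import json

@dataclass
class ExecutionState:                  # framework-level: where the loop is
    current_step: int = 0
    retry_count: int = 0
    status: str = "running"

@dataclass
class BusinessState:                   # application-level: what the user cares about
    messages: list = field(default_factory=list)
    pending_approvals: list = field(default_factory=list)

@dataclass
class AgentState:
    execution: ExecutionState = field(default_factory=ExecutionState)
    business: BusinessState = field(default_factory=BusinessState)

state = AgentState()
state.business.messages.append("deploy requested")
state.execution.current_step = 3

# Persist anywhere (DB, queue, file), then crash and resume where you left off
saved = json.dumps(asdict(state))
loaded = json.loads(saved)
restored = AgentState(
    execution=ExecutionState(**loaded["execution"]),
    business=BusinessState(**loaded["business"]),
)
print(restored.execution.current_step)  # 3: resume from here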
Why This Matters
Scalability, reliability, debugging, flexibility.
- Scalability — Stateless functions scale horizontally
- Reliability — State is persisted, not lost on crashes
- Debugging — You can inspect and replay any state
- Flexibility — Pause/resume, branch, retry any workflow
Meet Users Where They Are
"People don't want to have seven tabs open of different chat GPT style agents. Just let people email with the agents you're building. Let them slack with the agents you're building. Discord, SMS, whatever it is."
— Dex Horthy (11:40)
One of Horthy's most practical insights: stop building web chat interfaces. Your users already have communication preferences. Some live in Slack. Some live in email. Some want to DM your agent on Discord.
"We see this taking off all over the place."
The Architecture
Your agent shouldn't care about the communication channel. Put it behind a REST API or MCP server, and let adapters handle channel-specific logic.
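A minimal sketch of that architecture, with stubbed senders standing in for real email and Slack integrations; the function names are illustrative assumptions.

def agent_api(message: str) -> str:
    # The channel-agnostic agent endpoint: it never knows which channel called it
    return "agent reply to: " + message

def send_email(to: str, body: str) -> None:
    print(f"[email to {to}] {body}")

def send_slack(channel: str, body: str) -> None:
    print(f"[slack #{channel}] {body}")

# Adapters own the channel-specific logic: addressing, threading, formatting
def email_adapter(sender: str, subject: str, body: str) -> None:
    send_email(sender, agent_api(subject + "\n" + body))

def slack_adapter(channel: str, text: str) -> None:
    send_slack(channel, agent_api(text))

email_adapter("alice@example.com", "Deploy?", "Please deploy the frontend to prod")
slack_adapter("deployments", "status of the backend deploy?")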
Community Validation
Why It Resonated
Captured patterns people were already using
- Captured patterns people were already using
- Gave names to unnamed practices
- Provided vocabulary for discussing agent architecture
- Anti-framework stance appealed to experienced engineers
- Validated frustrations everyone was feeling
Key Takeaways for AI Engineers
- Start with frameworks for learning, not production: be prepared to throw them away when you hit the 70-80% ceiling
- Own your prompts: hand-craft every token for production quality
- Treat tool calling as explicit JSON parsing plus code execution, not magic
- Build micro-agents (3-10 steps) embedded in deterministic workflows
- Separate execution state from business state: make agents stateless
- Own your context window: curate every token aggressively
- Meet users where they are (email, Slack, Discord, SMS), not just web chat
- Not every problem needs an agent: sometimes a bash script is better
- If you own your control flow, you can do fun things like break, switch, summarize, and branch
- LLMs are pure functions (tokens in, tokens out), and context engineering is everything
The Anti-Framework Stance
"I am not here to bash frameworks. I think it should be the opposite."
— Dex Horthy (12:30)
Throughout the talk, Horthy is careful to position this not as framework-bashing, but as a wishlist — a list of feature requests for what frameworks should become.
What Frameworks Should Do Instead
Take away 'other hard parts' so we can focus on 'hard AI parts'.
"A lot of frameworks try to take away the hard AI parts of the problem so that you can just kind of drop it in and go. And, uh, I think it should be the opposite."
"I think the tools that we get should take away the other hard parts so that we can spend all our time focusing on the hard AI parts, on getting the prompts right, on getting the flow right, on getting the tokens right."
❌ Bootstrap Approach
Complete framework, drop-in solution, hard to customize, you're stuck with it.
Frameworks that take away the hard AI parts leave you at 70-80% reliability.
✅ Shadcn Approach
Scaffolded out, then you own the code, can customize anything.
Tools that handle infrastructure (state, APIs) but leave prompts, flow, tokens to you.
Find the Bleeding Edge
"Find the bleeding edge. Find ways to do things better than everybody else by really curating what you put in the model and how you control what comes out."
— Dex Horthy (14:00)
"If you can figure out how to get it right reliably anyways, because you've engineered reliability into your system, then you will have created something magical, and you will have created something that's better than what everybody else is building."
— Dex Horthy (14:15)
The Vision
Engineering reliability at the boundary of what models can do.
Horthy's vision isn't about avoiding complexity — it's about engineering reliability at the boundary of what models can do.
The companies pushing the frontier aren't using better models. They're not using secret frameworks. They're engineering better context.
They're optimizing every token. They're testing variations systematically. They're building small, focused loops with clear responsibilities. They're treating agents as software, not magic.
About the Speaker
Dex Horthy
Founder of HumanLayer
Dex Horthy is the founder of HumanLayer and creator of the 12-Factor Agents methodology. With extensive experience building production AI systems, he identified patterns that separate successful agent implementations from framework traps.
His interviews with 100+ founders and builders shipping production AI systems surfaced the consistent patterns that became the foundation of the methodology.
Source Video
12-Factor Agents: Patterns of Reliable LLM Applications
Dex Horthy, Founder of HumanLayer • AI Engineer Summit 2024
Research Note: All quotes in this report are timestamped and link to exact moments in the video for validation. This analysis was conducted using multi-agent transcript analysis with dedicated agents for transcript analysis, highlight extraction, fact-checking, and UI/UX design.