Ship Agents that Ship: Building Production AI Agents with Guardrails
Kyle Penfound and Jeremy Adams from Dagger demonstrate building production-ready AI agents live—no slides, just real code, failures, debugging, and eventual success.
"The key insight is that LLMs are great at choosing from a menu of options, not great at free-form coding. Give them well-defined tools."
— Kyle Penfound, Dagger Ecosystem Team • 28:15
Workshop Format
Hands-on live coding
Container-Native
Isolated execution
Production-Ready
Real-world patterns
Why This Workshop Matters
The Problem Agents Face
Most AI agent demos look impressive but break in production. Hallucinations, infinite loops, unclear objectives, and lack of guardrails make them unreliable for real-world use. Teams struggle to move from prototype to production.
Common Agent Failures
- Agents that generate syntactically invalid code
- Infinite loops or decision paralysis
- No sandboxing—agents can affect production systems
- Poor error handling and recovery
- Missing context window optimization
The Workshop Approach
This isn't about agent architecture theory. It's about sitting down and building agents that actually work—showing you the failures, the debugging process, and the patterns that emerge from real-world iteration.
"Guardrails aren'''t about limiting the agent, they'''re about giving the agent a safe playground to operate in."
— Kyle Penfound, Ecosystem Team at Dagger
35:42About Dagger
Container-Native CI/CD for AI Agents
Dagger is a programmable CI/CD platform that combines the power of containers with the consistency of programming. Founded by Solomon Hykes (creator of Docker), Dagger treats LLMs as first-class components—perfect for building AI agents that run in isolated, reproducible environments.
Kyle Penfound
Ecosystem Team at Dagger. Background in DevOps and platform engineering. Focuses on making complex infrastructure accessible to developers.
@kpenfound
Jeremy Adams
Ecosystem Team at Dagger. "I've been at Dagger for a few years." Expertise in infrastructure and container orchestration. Brings architecture-focused perspective to agent development.
@jeremyadamsding
Core Insights from the Workshop
"Containers are the perfect sandbox for AI agents because they'''re isolated, reproducible, and ephemeral."
— Jeremy Adams, Ecosystem Team at Dagger
42:18LLMs Excel at Tool Selection, Not Free-Form Coding
The fundamental insight behind effective agent design: structure enables capability. Instead of letting LLMs generate arbitrary code, give them a well-defined menu of tools to choose from. This is why OpenAI's function calling works so well.
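To make that concrete, here is a minimal sketch of a tool menu in the OpenAI function-calling format. The tool names follow the menu mentioned later in the workshop (read_file, write_file, test); the parameter schemas are illustrative assumptions, not the workshop's own definitions.

```python
# Sketch of a tool menu in the OpenAI function-calling format.
# Tool names follow the workshop's menu (read_file, write_file, test);
# the parameter schemas are illustrative assumptions.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read a file from the repository and return its contents.",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string", "description": "Repo-relative file path"},
                },
                "required": ["path"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "write_file",
            "description": "Create or overwrite a file in the repository.",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string"},
                    "content": {"type": "string"},
                },
                "required": ["path", "content"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "test",
            "description": "Run the project's test suite and return the results.",
            "parameters": {"type": "object", "properties": {}},
        },
    },
]
```

Because the model can only choose from this menu, every action it takes maps to an operation you have already reviewed and documented.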
Guardrails Enable Freedom
Counterintuitively, constraining agents with well-defined tools makes them more capable and reliable. Guardrails aren't limitations—they're safety boundaries that enable confident operation in production environments.
Containers Are Perfect for Agents
Containers provide three critical properties for agent sandboxes: isolation (agent actions can't affect host), reproducibility (same environment every time), and ephemerality (clean slate for each execution).
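A minimal sketch of such a sandbox, assuming the Dagger Python SDK's connection-based client (the SDK has evolved across releases, so treat the exact calls as one possible shape rather than the workshop's code):

```python
# Minimal sketch: run one agent-generated command in an isolated, ephemeral
# container via the Dagger Python SDK. Nothing here touches the host, the
# pinned base image keeps the environment reproducible, and the container
# is discarded after the call.
import sys

import anyio
import dagger


async def run_sandboxed(command: list[str]) -> str:
    async with dagger.Connection(dagger.Config(log_output=sys.stderr)) as client:
        return await (
            client.container()
            .from_("python:3.12-slim")   # reproducible: same base image every run
            .with_exec(command)          # isolated: runs only inside the container
            .stdout()                    # ephemeral: the container is gone afterwards
        )


if __name__ == "__main__":
    print(anyio.run(run_sandboxed, ["python", "-c", "print('hello from the sandbox')"]))
```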
Decomposition is Critical
Break down large tasks into smaller, tool-callable operations. This enables handling tasks of varying complexity and makes debugging easier when something goes wrong.
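One way to picture this, sketched below with hypothetical handler bodies: every tool on the menu maps to one small function, and a dispatcher routes the model's tool call to it, so each step can be logged, tested, and debugged on its own.

```python
# Sketch of decomposition: one small handler per tool, plus a dispatcher.
# Handler names mirror the workshop's tool menu; the bodies are illustrative.
import subprocess
from pathlib import Path


def read_file(path: str) -> str:
    return Path(path).read_text()


def write_file(path: str, content: str) -> str:
    Path(path).write_text(content)
    return f"wrote {len(content)} characters to {path}"


def run_tests() -> str:
    # In production this step would run inside the container sandbox, not on the host.
    result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return result.stdout + result.stderr


HANDLERS = {"read_file": read_file, "write_file": write_file, "test": run_tests}


def dispatch(tool_name: str, arguments: dict) -> str:
    # Unknown tools fail loudly instead of letting the agent improvise.
    if tool_name not in HANDLERS:
        raise ValueError(f"Unknown tool: {tool_name}")
    return HANDLERS[tool_name](**arguments)
```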
"One of the most important things is making sure your agent can fail gracefully. When it doesn'''t know what to do, it should ask for help."
— Jeremy Adams, Ecosystem Team at Dagger
68:15Workshop Demonstrations
The workshop featured four live demonstrations showing the complete journey from setup to a production-ready agent.
GitHub Integration Setup
Setting up authentication, configuring the GitHub repository, and creating test issues for the agent to process.
Key Steps:
- Created GitHub personal access token
- Configured Dagger environment variables
- Set up test repository with issues
- Verified GitHub API connectivity (see the sketch after this list)
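The connectivity check referenced above might look like this sketch, which uses the GitHub REST API via the requests library; the owner and repository names are placeholders.

```python
# Sketch of the connectivity check: read the token from the environment and
# list open issues on the test repository. OWNER and REPO are placeholders.
import os

import requests

GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]  # the personal access token created above
OWNER, REPO = "your-org", "your-test-repo"

resp = requests.get(
    f"https://api.github.com/repos/{OWNER}/{REPO}/issues",
    headers={
        "Authorization": f"Bearer {GITHUB_TOKEN}",
        "Accept": "application/vnd.github+json",
    },
    params={"state": "open"},
    timeout=10,
)
resp.raise_for_status()
for issue in resp.json():
    print(issue["number"], issue["title"])
```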
Agent Reading and Understanding Issues
The agent connects to the GitHub API, reads issue content, parses requirements, and selects appropriate tools.
Agent Decision Process:
- Read issue title and description via GitHub API
- Parse requirements and identify task type
- Select from available tools (read_file, write_file, test)
- Plan multi-step execution strategy (a sketch of the decision step follows this list)
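The decision step can be sketched roughly as follows, assuming the OpenAI Python SDK and a function-calling tool menu like the earlier TOOLS list; the model name, prompts, and ask_human fallback are illustrative rather than the workshop's exact code.

```python
# Sketch of the decision step: give the model the issue text plus the tool
# menu and read back which tool it wants to call. The model and prompts are
# illustrative; `tools` is a function-calling menu like the earlier TOOLS list.
import json

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def plan_next_step(issue_title: str, issue_body: str, tools: list[dict]) -> tuple[str, dict]:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": "You resolve GitHub issues using only the provided tools.",
            },
            {"role": "user", "content": f"Issue: {issue_title}\n\n{issue_body}"},
        ],
        tools=tools,
    )
    message = response.choices[0].message
    if not message.tool_calls:
        # The model answered in prose instead of choosing a tool: escalate.
        return "ask_human", {"question": message.content}
    call = message.tool_calls[0]
    return call.function.name, json.loads(call.function.arguments)
```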
Code Generation and File Creation
The agent generates Python code, writes files to the repository, and creates commits in real time, with live debugging along the way.
Live Demo Moments:
- 00:55:00: Live debugging when the agent couldn't access GitHub
- 01:02:30: Real-time code modification during the demo
- 01:06:45: Handling authentication errors on the fly
Pull Request Creation and Automation
Creating pull requests with descriptions, handling merge conflicts, and running automated tests.
Automated Workflow:
- Create feature branch from main
- Commit generated code changes
- Open pull request with description (see the sketch after this list)
- Run tests and validate changes
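The pull-request step of that workflow might look like this sketch against the GitHub REST API; the repository, branch names, and PR text are placeholders.

```python
# Sketch of opening a pull request for the agent's branch via the GitHub
# REST API. OWNER, REPO, branch names, and PR text are placeholders.
import os

import requests

GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]
OWNER, REPO = "your-org", "your-test-repo"

resp = requests.post(
    f"https://api.github.com/repos/{OWNER}/{REPO}/pulls",
    headers={
        "Authorization": f"Bearer {GITHUB_TOKEN}",
        "Accept": "application/vnd.github+json",
    },
    json={
        "title": "Agent: resolve issue #1",
        "head": "agent/issue-1",  # feature branch the agent pushed
        "base": "main",
        "body": "Automated change generated for issue #1. Tests run in CI.",
    },
    timeout=10,
)
resp.raise_for_status()
print("Opened PR:", resp.json()["html_url"])
```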
Production-Ready Patterns
Design Tool Menus, Not Freedom
Don't give agents unrestricted code execution. Provide a curated menu of tools they can choose from. Build function calling schemas with discrete, well-documented operations.
Containerize Everything
Every agent action should run in a container. This provides safety, reproducibility, and clean state management. Use Docker containers as the execution environment for all agent operations.
Implement Explicit Fallbacks
Design your agent to recognize when it's stuck and trigger a human-in-the-loop workflow. Add confidence thresholds and escalation paths to your agent logic.
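A minimal sketch of such a fallback is below; the AgentStep shape and the 0.7 threshold are assumptions for illustration, not taken from the workshop.

```python
# Sketch of an explicit fallback: if the agent asks for help or its
# self-reported confidence is below a threshold, stop and escalate.
from dataclasses import dataclass
from typing import Callable

CONFIDENCE_THRESHOLD = 0.7  # assumed value; tune per workload


@dataclass
class AgentStep:
    tool_name: str
    arguments: dict
    confidence: float  # e.g. elicited from the model alongside its tool choice


def execute_or_escalate(
    step: AgentStep,
    dispatch: Callable[[str, dict], str],
    notify_human: Callable[[str], None],
) -> str:
    # Escalate when the agent asks for help or is not confident enough.
    if step.tool_name == "ask_human" or step.confidence < CONFIDENCE_THRESHOLD:
        notify_human(
            f"Agent is unsure (confidence={step.confidence:.2f}); "
            f"proposed {step.tool_name} with {step.arguments}"
        )
        return "escalated"
    return dispatch(step.tool_name, step.arguments)
```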
Decompose for Scale
Break complex workflows into small, tool-callable operations. This enables handling tasks of varying complexity and makes systems more maintainable.
Production Agent Checklist
- All agent actions run in isolated containers
- Well-defined tool menu with function calling
- Graceful error handling and human-in-the-loop fallbacks
- Comprehensive testing before accepting agent changes
- Observability and monitoring for agent decisions
- Clear context window management and optimization
Key Takeaways
Practical Agent Development
- Start Simple: Don't over-engineer the initial agent implementation. Begin with a small tool menu and expand as needed.
- Embrace Failure: Expect things to break; design for debugging. The workshop showed real failures and how to fix them.
- Context is King: Every token in the context window matters. Design your prompts and tool descriptions carefully.
- Test Early and Often: Write tests before or alongside agent code. Never accept agent changes without validation.
- Iterate Quickly: Small, fast cycles beat long planning sessions. Run experiments, gather data, and improve.
- Monitor Everything: You can't improve what you don't measure. Log agent decisions, tool usage, and success rates.
- Use Tools Judiciously: More tools ≠ better agent. Curate a focused set of high-quality, well-documented operations.
- Human-in-the-Loop: Know when to let humans intervene. Design agents to ask for help when uncertain.
"If you decompose things down and you can architect things right, it can handle a lot of different sizes."
— Kyle Penfound, Ecosystem Team at Dagger • 77:22
Watch the Full Workshop
Related Resources
Dagger Documentation
Research Notes & Methodology
This highlight page is based on a comprehensive analysis of the workshop transcript from the AI Engineer Summit 2024. The workshop featured live demonstrations, real-time debugging, and practical implementation patterns.
- Full VTT transcript (16,896 lines)
- Complete workshop recording (~80 minutes)
- GitHub repositories and documentation
- Dagger official documentation
- Complete transcript analysis
- Quote extraction and verification
- Fact-checking against official sources
- Cross-reference with documentation
Video: Ship Agents that Ship: A Hands-On Workshop by Kyle Penfound and Jeremy Adams (Dagger)
Event: AI Engineer Summit 2024 • Published: October 29, 2024