Why an AI Agent Taught Itself to Escape a Hole in Minecraft
An AWS engineer builds "Rocky" - a Minecraft bot powered by Claude Haiku and Amazon Bedrock Agents that demonstrates emergent behavior, live at a Serverless conference. Learn the production architecture patterns, the critical "Return of Control" technique, and why Claude Haiku was chosen over more powerful models.
Behavior that we didn't expect and it just works, which is really fascinating.
— AWS Solutions Architect, on Rocky digging itself out of a hole (00:06:16)
- Live demo with real-time AI
- Model chosen for speed, not intelligence
- Return of Control enabled
The Live Demo That Shouldn't Have Worked
At an AWS Serverless Conference, an engineer from Australia decided to break the golden rule of live presentations: never demo with kids, animals, or an LLM. He did all three.
Today I'm going to live demo what you should never do. They say you should never live demo with kids, animals, and an LLM. So I'm going to do that.
Watch opening (00:02:34)
The Challenge
Build an autonomous AI agent that can play Minecraft in real-time, demonstrating natural language understanding, spatial reasoning, and tool use.
The Risk
LLMs are unpredictable. A live demo with 18 minutes of unscripted AI gameplay could easily fail, hallucinate, or produce nonsense.
The Emergent Behavior Moment
During a pre-recorded demo, Rocky the bot was digging a 2x2 hole. But then something unexpected happened—the bot kept digging until it escaped the hole it had created. No one programmed it to do that.
The "Aha" Moment
"Behavior that we didn't expect and it just works, which is really fascinating."
This wasn't programmed behavior. Rocky reasoned through its environment and discovered a solution autonomously—digging a staircase to escape the hole. This is the holy grail of AI: emergent problem-solving that wasn't explicitly trained or coded.
Watch the emergent behavior (00:06:16)
Why This Matters
Most AI systems are brittle—they break when faced with novel situations. Rocky's ability to solve an unanticipated problem demonstrates that well-designed agent systems can generalize beyond their training. This emergence is what separates scripted automation from true intelligence.
Architecture Evolution: From Complex to Simple
The team didn't start with Amazon Bedrock Agents. They built their way there through complexity, failure, and iteration.
The LangChain + Lambda Approach
Initial architecture: LangChain for agent orchestration, AWS Lambda for compute, SageMaker hosting Cohere LLMs. Custom code for every tool, action, and workflow step.
"LangChain became too complex as actions increased. We decided that's not service enough, let's use Agents for Amazon Bedrock."
Watch (00:07:41)
Bedrock Agents + ECS
Final architecture: Amazon Bedrock Agents for orchestration (managed service), ECS containers for Minecraft/Mineflayer (stateful requirements), Claude Haiku for fast inference.
Why Bedrock Agents?
- ✅ Managed agentic workflow orchestration
- ✅ Built-in knowledge bases for RAG
- ✅ Guardrails for safety
- ✅ Chain of Thought tracing
- ✅ Return of Control support
- ✅ Common API across multiple models
Infrastructure Decision: Why Containers Over Lambda?
Minecraft + Mineflayer
Cannot run on Lambda due to state requirements—persistent connection to game server, real-time interaction needed.
Bedrock Agents
Serverless orchestration—manages agent workflows, tool selection, prompt engineering.
The Secret Sauce: Return of Control
This is the most important technical pattern from the entire talk. Return of Control (ROC) is what makes agent outputs structured, parseable, and production-ready.
Very key for this demo it has Return Of Control.
Watch (00:10:02)
Without ROC
Agent returns unstructured natural language: "Sure, I'll build a double-decker couch for you! Let me start by placing some oak planks..."
With ROC
Agent returns structured JSON: {"blocks": [{"x":0,"y":0,"z":0,"type":"oak_planks"}]}
How Return of Control Works
- 1. Agent receives user request (e.g., "build a double-decker couch")
- 2. Agent selects appropriate tool (e.g., build action)
- 3. Agent generates structured output (JSON) for tool parameters
- 4. Control returns to system with parseable output
- 5. System executes action programmatically
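The five steps above can be sketched as a minimal dispatcher. Note this is an illustrative sketch, not the Bedrock Agents API: the tool name, JSON payload shape, and handler registry below are hypothetical stand-ins for whatever structured output your agent is configured to return.

```python
import json

# Hypothetical registry of tools the agent may return control for.
TOOL_HANDLERS = {}

def tool(name):
    """Register a handler for a named tool (illustrative helper)."""
    def register(fn):
        TOOL_HANDLERS[name] = fn
        return fn
    return register

@tool("build")
def handle_build(params):
    # In the real bot this would drive Mineflayer; here we just echo the plan.
    return [(b["x"], b["y"], b["z"], b["type"]) for b in params["blocks"]]

def handle_return_of_control(agent_response: str):
    """Parse the structured JSON an ROC-enabled agent returns, then dispatch it."""
    payload = json.loads(agent_response)
    handler = TOOL_HANDLERS[payload["tool"]]
    return handler(payload["parameters"])

# Example: the agent returned control with a structured build action.
response = '{"tool": "build", "parameters": {"blocks": [{"x": 0, "y": 0, "z": 0, "type": "oak_planks"}]}}'
print(handle_return_of_control(response))
```

Because the agent hands back JSON instead of prose, the dispatch step is a plain parse-and-call rather than fragile text scraping.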
Model Selection: Speed Over Intelligence
Why did they choose Claude Haiku instead of more powerful models like Claude 3 Opus or GPT-4? Because for real-time interaction, latency matters more than raw capability.
Claude in particular Claude Haiku because it's fast.
Watch (00:09:04)
Real-Time Requirements
Minecraft gameplay requires sub-3-second responses to feel natural. Larger models would feel sluggish and break immersion.
The Speed/Capability Trade-off
Haiku sacrifices some reasoning capability for 3-5x faster inference. For gaming, that's the right call.
Bedrock Flexibility
Using Bedrock's common API, they can swap models without code changes. Can test Opus later if needed.
Cost Optimization
Haiku is significantly cheaper per token than larger models. Important for a bot running continuous gameplay.
Decision Framework
Use Fast Models (Haiku):
- • Real-time games and interactive apps
- • High-volume, low-latency use cases
- • Simple tool-calling workflows
Use Smart Models (Opus/GPT-4):
- • Complex reasoning tasks
- • Batch/offline processing
- • Multi-step analysis workflows
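The framework above can be encoded as a trivial routing helper. This is a sketch of the trade-off, not the team's code; the Bedrock model IDs shown are the published identifiers at the time of writing and may have since been superseded.

```python
def choose_model(latency_budget_ms: int, needs_deep_reasoning: bool) -> str:
    """Pick a Bedrock model ID from a latency budget and reasoning requirement.

    Illustrative only: real routing might also weigh cost per token,
    context-window size, and tool-calling reliability.
    """
    # Real-time, simple tool-calling: favor the fast model.
    if latency_budget_ms < 3000 and not needs_deep_reasoning:
        return "anthropic.claude-3-haiku-20240307-v1:0"
    # Batch or complex multi-step reasoning: favor the capable model.
    return "anthropic.claude-3-opus-20240229-v1:0"
```

Because Bedrock exposes a common API across models, swapping the returned ID is the only code change needed to test a different model.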
Prompt Engineering in 3D Space
Building in Minecraft is hard. The bot needs to understand 3D coordinate systems, block types, adjacency rules, and spatial relationships. Without careful prompt engineering, "it goes bananas and builds just nonsense."
Because if we didn't do that it goes bananas and builds just nonsense.
On why strict prompt engineering is required for the build action
Watch (00:16:37)
The Build Prompt Strategy
- • Coordinate System: Define X, Y, Z axes explicitly
- • Block Types: List valid Minecraft blocks (oak_planks, stone, etc.)
- • JSON Schema: Include example of expected output format
- • Adjacency Rules: Specify which blocks can touch
- • Structure Template: Provide examples of valid builds
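A minimal sketch of assembling those five constraints into a system prompt might look like the following. The exact wording, block list, and axis conventions are assumptions for illustration, not the prompt used in the talk.

```python
# Illustrative subset of valid Minecraft block identifiers.
VALID_BLOCKS = ["oak_planks", "stone", "glass"]

def build_system_prompt(valid_blocks):
    """Compose a build prompt covering coordinates, blocks, schema, and adjacency."""
    schema_example = '{"blocks": [{"x": 0, "y": 0, "z": 0, "type": "oak_planks"}]}'
    return "\n".join([
        "You are a Minecraft builder. Coordinate system: X is east, Y is up, Z is south.",
        "Only use these block types: " + ", ".join(valid_blocks) + ".",
        "Every block must be adjacent to the ground or another placed block (no floating blocks).",
        "Respond ONLY with JSON matching this schema, with no prose before or after:",
        schema_example,
    ])

print(build_system_prompt(VALID_BLOCKS))
```

Constraining the output format this tightly is what keeps the model from going "bananas": the LLM has no room to improvise block names or coordinate conventions.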
Natural Language → 3D Structure
Input: "Build a double-decker couch"
↓
LLM Reasoning (spatial planning)
↓
Output: JSON coordinates + block types
The transformation requires the LLM to understand 3D space and output precise coordinates.
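Even with a strict prompt, LLM output should be validated before it reaches the game. A guard like the one below, whose schema mirrors the JSON shown above and whose block whitelist is illustrative, can reject malformed plans before they produce nonsense builds.

```python
import json

# Illustrative whitelist; a real bot would use the full Minecraft block registry.
VALID_BLOCKS = {"oak_planks", "stone", "glass"}

def validate_build(raw: str, valid_blocks=VALID_BLOCKS):
    """Parse the LLM's build JSON and reject unknown blocks or bad coordinates."""
    plan = json.loads(raw)  # raises ValueError if the model emitted non-JSON
    blocks = plan["blocks"]
    for b in blocks:
        if b["type"] not in valid_blocks:
            raise ValueError(f"unknown block type: {b['type']}")
        if not all(isinstance(b[axis], int) for axis in ("x", "y", "z")):
            raise ValueError("coordinates must be integers")
    return blocks
```

Only plans that pass this check would be handed to the Mineflayer layer for placement, so a hallucinated block type fails fast instead of corrupting the build.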
Key Takeaways for AI Engineers
1. Emergent Behavior is Real
Well-designed agent systems can solve unanticipated problems. Rocky digging out of a hole wasn't programmed—it emerged from the interaction of reasoning, tools, and environment.
2. Return of Control is Essential
Production agents need structured outputs, not freeform text. ROC forces parseable JSON that your code can execute programmatically.
3. Speed Beats Intelligence for Real-Time
Claude Haiku was chosen for latency, not capability. For interactive applications, response time is more important than reasoning power.
4. Managed Services Reduce Complexity
LangChain became unwieldy as tools increased. Bedrock Agents provides managed orchestration, tracing, and RAG integration.
5. Stateful Components Need Containers
Don't force serverless where it doesn't fit. Mineflayer requires persistent connections, so ECS containers replace Lambda.
6. Prompt Engineering for Physical Spaces
3D environments require strict constraints: coordinate systems, block types, adjacency rules, and JSON schemas. Without these, agents produce nonsense.
Source Video
Claude plays Minecraft!
AWS Solutions Architect • Serverless Conference
Research Note: All quotes in this report are timestamped and link to exact moments in the video for validation. This analysis was conducted using multi-agent transcript analysis with transcript-analyzer, highlight-extractor, fact-checker, and content-strategist agents.
Technologies Mentioned: Amazon Bedrock, Claude Haiku (Anthropic), Mineflayer, LangChain, AWS ECS, AWS Lambda, Amazon SageMaker, AWS CDK, CloudFormation