AI Engineering Case Study

Why an AI Agent Taught Itself to Escape a Hole in Minecraft

An AWS engineer builds "Rocky" - a Minecraft bot powered by Claude Haiku and Amazon Bedrock Agents that demonstrates emergent behavior, live at a Serverless conference. Learn the production architecture patterns, the critical "Return of Control" technique, and why Claude Haiku was chosen over more powerful models.

Behavior that we didn't expect and it just works, which is really fascinating.

— AWS Solutions Architect, on Rocky digging itself out of a hole (00:06:16)

18 min: Live demo with real-time AI

Claude Haiku: Chosen for speed, not intelligence

ROC: Return of Control enabled

The Live Demo That Shouldn't Have Worked

At an AWS Serverless Conference, an engineer from Australia decided to break the golden rule of live presentations: never demo with kids, animals, or an LLM. He did all three.

Today I'm going to live demo what you should never do. They say you should never live demo with kids, animals, and an LLM. So I'm going to do that.
Watch opening (00:02:34)

The Challenge

Build an autonomous AI agent that can play Minecraft in real-time, demonstrating natural language understanding, spatial reasoning, and tool use.

The Risk

LLMs are unpredictable. A live demo with 18 minutes of unscripted AI gameplay could easily fail, hallucinate, or produce nonsense.

The Emergent Behavior Moment

During a pre-recorded demo, Rocky the bot was digging a 2x2 hole. But then something unexpected happened—the bot kept digging until it escaped the hole it had created. No one programmed it to do that.

The "Aha" Moment

"Behavior that we didn't expect and it just works, which is really fascinating."

This wasn't programmed behavior. Rocky reasoned through its environment and discovered a solution autonomously—digging a staircase to escape the hole. This is the holy grail of AI: emergent problem-solving that wasn't explicitly trained or coded.

Watch the emergent behavior (00:06:16)

Why This Matters

Most AI systems are brittle: they break when faced with novel situations. Rocky's ability to solve an unanticipated problem suggests that well-designed agent systems can handle situations their builders never scripted. That is the difference between scripted automation and an agent that reasons about its environment.

Architecture Evolution: From Complex to Simple

The team didn't start with Amazon Bedrock Agents. They built their way there through complexity, failure, and iteration.

Phase 1 (Failed)

The LangChain + Lambda Approach

Initial architecture: LangChain for agent orchestration, AWS Lambda for compute, SageMaker hosting Cohere LLMs. Custom code for every tool, action, and workflow step.

"LangChain became too complex as actions increased. We decided that's not service enough, let's use Agents for Amazon Bedrock."

Watch (00:07:41)

Phase 2 (Current)

Bedrock Agents + ECS

Final architecture: Amazon Bedrock Agents for orchestration (managed service), ECS containers for Minecraft/Mineflayer (stateful requirements), Claude Haiku for fast inference.

Why Bedrock Agents?

  • ✅ Managed agentic workflow orchestration
  • ✅ Built-in knowledge bases for RAG
  • ✅ Guardrails for safety
  • ✅ Chain of Thought tracing
  • ✅ Return of Control support
  • ✅ Common API across multiple models

Infrastructure Decision: Why Containers Over Lambda?

Minecraft + Mineflayer (ECS Containers)

Cannot run on Lambda due to state requirements: the bot needs a persistent connection to the game server and real-time interaction.

Bedrock Agents (Managed Service)

Serverless orchestration: manages agent workflows, tool selection, and prompt engineering.

Watch infrastructure explanation (00:06:58)

The Secret Sauce: Return of Control

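This is the most important technical pattern from the entire talk. With Return of Control (ROC), the agent does not execute an action itself; it hands the chosen action and its parameters back to the calling application as structured data. That is what makes agent outputs structured, parseable, and production-ready.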

Very key for this demo it has Return Of Control.
Watch (00:10:02)

Without ROC (Unparseable)

Agent returns unstructured natural language: "Sure, I'll build a double-decker couch for you! Let me start by placing some oak planks..."

With ROC (Parseable)

Agent returns structured JSON: {"blocks": [{"x":0,"y":0,"z":0,"type":"oak_planks"}]}
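To make the difference concrete, here is a minimal Python sketch of what "parseable" buys the calling code. It assumes the payload shape shown in the example above; the `BlockPlacement` class and `parse_build_payload` function are hypothetical names introduced for illustration, not part of the talk's code.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockPlacement:
    """One block to place; field names mirror the example JSON above."""
    x: int
    y: int
    z: int
    type: str


def parse_build_payload(raw: str) -> list[BlockPlacement]:
    """Parse the agent's JSON build output; raise ValueError if it is malformed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"agent output is not JSON: {exc}") from exc
    if not isinstance(data, dict) or not isinstance(data.get("blocks"), list):
        raise ValueError("expected an object with a 'blocks' list")
    placements = []
    for i, block in enumerate(data["blocks"]):
        try:
            placements.append(
                BlockPlacement(int(block["x"]), int(block["y"]), int(block["z"]), str(block["type"]))
            )
        except (KeyError, TypeError, ValueError) as exc:
            raise ValueError(f"block {i} is malformed: {exc!r}") from exc
    return placements
```

The freeform reply from the "without ROC" case fails at the first `json.loads` call, so the caller can reject it instead of guessing at intent.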

How Return of Control Works

  1. Agent receives user request (e.g., "build a double-decker couch")
  2. Agent selects appropriate tool (e.g., build action)
  3. Agent generates structured output (JSON) for tool parameters
  4. Control returns to system with parseable output
  5. System executes action programmatically
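Steps 4 and 5 can be sketched as a small dispatcher. This is an illustration under stated assumptions, not the talk's implementation: the event and result field names (`returnControl`, `invocationInputs`, `functionInvocationInput`, `returnControlInvocationResults`) are modeled on the return-control payload of Bedrock's InvokeAgent API as I understand it and should be checked against the current documentation. The `minecraft` action group and `dig` action are hypothetical.

```python
from typing import Callable

# A handler takes the action's parameters and returns a text result.
Handler = Callable[[dict[str, str]], str]


def handle_return_control(event: dict, handlers: dict[str, Handler]) -> dict:
    """Run each requested action locally and build the results to send back."""
    payload = event["returnControl"]
    results = []
    for item in payload["invocationInputs"]:
        call = item["functionInvocationInput"]
        # Parameters arrive as a list of {name, type, value} entries.
        params = {p["name"]: p["value"] for p in call.get("parameters", [])}
        handler = handlers.get(call["function"])
        body = handler(params) if handler else f"unknown action: {call['function']}"
        results.append({
            "functionResult": {
                "actionGroup": call["actionGroup"],
                "function": call["function"],
                "responseBody": {"TEXT": {"body": body}},
            }
        })
    return {
        "invocationId": payload["invocationId"],
        "returnControlInvocationResults": results,
    }


# Hypothetical event: the agent asks the application to dig two blocks down.
event = {
    "returnControl": {
        "invocationId": "demo-1",
        "invocationInputs": [{
            "functionInvocationInput": {
                "actionGroup": "minecraft",
                "function": "dig",
                "parameters": [{"name": "depth", "type": "integer", "value": "2"}],
            }
        }],
    }
}
state = handle_return_control(event, {"dig": lambda p: f"dug {p['depth']} blocks"})
```

In a real deployment the handler would call into the game client, and the returned dictionary would be passed back to the agent on the next invocation so it can continue reasoning with the result.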

Model Selection: Speed Over Intelligence

Why did they choose Claude Haiku instead of more powerful models like Claude 3 Opus or GPT-4? Because latency matters more than capability for real-time interactions.

Claude in particular Claude Haiku because it's fast.
Watch (00:09:04)

Real-Time Requirements

Minecraft gameplay requires sub-3-second responses to feel natural. Larger models would feel sluggish and break immersion.
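One way to check a model against a response-time target like this is to time each call and compare it to a budget. The sketch below is a generic helper, not something shown in the talk; `timed_call` is a hypothetical name, and the 3-second default simply mirrors the figure above.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def timed_call(call: Callable[[], T], budget_s: float = 3.0) -> tuple[T, float, bool]:
    """Run call() and report its result, elapsed seconds, and whether it met the budget."""
    start = time.perf_counter()
    result = call()
    elapsed = time.perf_counter() - start
    return result, elapsed, elapsed <= budget_s
```

Wrapping the model invocation in `timed_call` and logging the misses gives a simple measure of how often a candidate model would feel sluggish in play.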

The Speed/Capability Trade-off

Haiku sacrifices some reasoning capability for 3-5x faster inference. For gaming, that's the right call.

Bedrock Flexibility

Using Bedrock's common API, they can swap models without code changes and test Opus later if needed.

Cost Optimization

Haiku is significantly cheaper per token than larger models, which matters for a bot running continuous gameplay.

Decision Framework

Use Fast Models (Haiku):

  • Real-time games and interactive apps
  • High-volume, low-latency use cases
  • Simple tool-calling workflows

Use Smart Models (Opus/GPT-4):

  • Complex reasoning tasks
  • Batch/offline processing
  • Multi-step analysis workflows
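The framework above can be read as a small routing rule. The sketch below is one possible reading, not a rule stated in the talk: it assumes that a waiting user always takes priority over reasoning depth, and that simple offline work stays on the fast tier for cost reasons. `Workload`, `pick_tier`, and the tier names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    interactive: bool        # a user is waiting on each response
    complex_reasoning: bool  # multi-step analysis rather than simple tool calls


# Placeholder tier names; map them to real model IDs in configuration.
FAST_TIER = "fast"    # e.g. Claude Haiku
SMART_TIER = "smart"  # e.g. Claude 3 Opus or GPT-4


def pick_tier(w: Workload) -> str:
    """Prefer the fast tier whenever a user is waiting; otherwise follow reasoning needs."""
    if w.interactive:
        return FAST_TIER
    return SMART_TIER if w.complex_reasoning else FAST_TIER
```

Keeping the tier-to-model mapping in configuration rather than in code fits the point about swapping models through a common API.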

Prompt Engineering in 3D Space

Building in Minecraft is hard. The bot needs to understand 3D coordinate systems, block types, adjacency rules, and spatial relationships. Without careful prompt engineering, "it goes bananas and builds just nonsense."

Because if we didn't do that it goes bananas and builds just nonsense.

On why strict prompt engineering is required for the build action

Watch (00:16:37)

The Build Prompt Strategy

  • Coordinate System: Define X, Y, Z axes explicitly
  • Block Types: List valid Minecraft blocks (oak_planks, stone, etc.)
  • JSON Schema: Include example of expected output format
  • Adjacency Rules: Specify which blocks can touch
  • Structure Template: Provide examples of valid builds
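The strategy above could be assembled into a prompt along these lines. This is a sketch only: the talk does not show the actual prompt, so the wording of every section, the adjacency rule, and the names `build_prompt`, `ALLOWED_BLOCKS`, and `EXAMPLE_OUTPUT` are assumptions made for illustration.

```python
import json

ALLOWED_BLOCKS = ["oak_planks", "stone"]  # illustrative subset, not the full block list

# Example of the expected output format, reused from the JSON shown earlier.
EXAMPLE_OUTPUT = {"blocks": [{"x": 0, "y": 0, "z": 0, "type": "oak_planks"}]}


def build_prompt(request: str, allowed_blocks: list[str] = ALLOWED_BLOCKS) -> str:
    """Assemble a build prompt that states every constraint explicitly."""
    sections = [
        "You place blocks in Minecraft.",
        # Coordinate system: in Minecraft, y is the vertical axis.
        "Coordinates: x and z are horizontal, y is vertical (up). "
        "All coordinates are integers relative to the origin (0, 0, 0).",
        "Allowed block types: " + ", ".join(allowed_blocks) + ".",
        # Assumed adjacency rule: no floating blocks.
        "Every block except those at y=0 must touch another block in the structure.",
        "Respond with JSON only, in this format:\n" + json.dumps(EXAMPLE_OUTPUT),
        "Request: " + request,
    ]
    return "\n\n".join(sections)
```

Calling `build_prompt("Build a double-decker couch")` yields a single string with one section per constraint, which makes it easy to tighten or remove one rule at a time when the output goes wrong.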

Natural Language → 3D Structure

Input: "Build a double-decker couch"

LLM Reasoning (spatial planning)

Output: JSON coordinates + block types

The transformation requires the LLM to understand 3D space and output precise coordinates.
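Once the model returns coordinates, the calling code still has to turn them into placements in the world. The sketch below assumes the model's coordinates are relative to an origin chosen by the application, and that placing lower layers first is desirable; neither detail is confirmed by the talk. `to_world_placements` is a hypothetical helper.

```python
def to_world_placements(
    blocks: list[dict],
    origin: tuple[int, int, int],
    allowed: set[str],
) -> list[tuple[int, int, int, str]]:
    """Validate block types, shift relative coordinates to world space, and order bottom-up."""
    ox, oy, oz = origin
    placements = []
    for b in blocks:
        if b["type"] not in allowed:
            raise ValueError(f"unsupported block type: {b['type']}")
        placements.append((ox + b["x"], oy + b["y"], oz + b["z"], b["type"]))
    # Place lower layers first so upper blocks have something to rest on.
    return sorted(placements, key=lambda p: p[1])


plan = to_world_placements(
    [
        {"x": 0, "y": 1, "z": 0, "type": "oak_planks"},
        {"x": 1, "y": 0, "z": 0, "type": "stone"},
    ],
    origin=(10, 64, -5),
    allowed={"oak_planks", "stone"},
)
# plan == [(11, 64, -5, "stone"), (10, 65, -5, "oak_planks")]
```

Rejecting unknown block types here gives a second line of defense when the prompt constraints alone are not enough.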

Key Takeaways for AI Engineers

1. Emergent Behavior is Real

Well-designed agent systems can solve unanticipated problems. Rocky digging out of a hole wasn't programmed—it emerged from the interaction of reasoning, tools, and environment.

Holy Grail of AI

2. Return of Control is Essential

Production agents need structured outputs, not freeform text. ROC forces parseable JSON that your code can execute programmatically.

Production Pattern

3. Speed Beats Intelligence for Real-Time

Claude Haiku was chosen for latency, not capability. For interactive applications, response time is more important than reasoning power.

Model Selection

4. Managed Services Reduce Complexity

LangChain became unwieldy as tools increased. Bedrock Agents provides managed orchestration, tracing, and RAG integration.

Architecture Lesson

5. Stateful Components Need Containers

Don't force serverless where it doesn't fit. Mineflayer requires persistent connections, so ECS containers replace Lambda.

Infrastructure Pattern

6. Prompt Engineering for Physical Spaces

3D environments require strict constraints: coordinate systems, block types, adjacency rules, and JSON schemas. Without these, agents produce nonsense.

Prompt Strategy

Source Video

Claude plays Minecraft!

AWS Solutions Architect • Serverless Conference

Video ID: 1B9i7FBsRVQ
Duration: ~18 minutes
Watch on YouTube

Research Note: All quotes in this report are timestamped, link to exact moments in the video, and were verified against the original VTT transcript of the AWS Serverless Conference talk. The analysis used multi-agent transcript analysis with transcript-analyzer, highlight-extractor, fact-checker, and content-strategist agents.

Technologies Mentioned: Amazon Bedrock, Claude Haiku (Anthropic), Mineflayer, LangChain, AWS ECS, AWS Lambda, Amazon SageMaker, AWS CDK, CloudFormation