Claude plays Minecraft! Emergent Behavior in Real-Time AI Agents
An AWS engineer builds Rocky, a Minecraft bot powered by Claude Haiku and Amazon Bedrock Agents. Watch emergent behavior unfold live as the agent digs, builds, and surprises everyone with unexpected problem-solving.
"They say you should never live demo with kids, animals and an LLM, so I'm going to do that."
— AWS Engineer • 2:37
Live Demo: Real-time gameplay
Model Choice: Speed over capabilities
Architecture: Managed workflow
Emergent: Unexpected behavior
Why This Talk Matters
The Real-World Agent Challenge
Most agent demos are carefully rehearsed or heavily edited. This talk does the opposite: a live demonstration where anything can happen. Rocky the Minecraft bot behaves in ways the engineer didn't program—showcasing both the promise and unpredictability of agentic AI systems.
Live Demo Risks
- LLMs are non-deterministic by nature
- Real-time games require fast responses
- Anything can go wrong in front of hundreds
- Emergent behavior can be surprising
Why Minecraft?
Minecraft is the perfect testbed for AI agents: rich action space (build, dig, move), clear goals, visual feedback, chat interface for inputs, and a supportive community for testing.
"This is behavior that we didn't expect and it just just works which is um really fascinating"
— AWS Engineer, on Rocky digging out of a hole without being programmed to • 6:14
Meet Rocky: The Minecraft Bot
Rocky: Gender-Neutral Minecraft Agent
Rocky is a Minecraft bot built with Claude Haiku, Amazon Bedrock Agents, and the Mineflayer framework. The bot responds to chat commands, digs holes, finds players and entities, and can even build structures—all in real-time.
The Speaker
The speaker is an AWS engineer from Australia who does not usually work in Python. Rocky started as a side project to learn agent engineering and to demonstrate what Bedrock Agents can do.
I'm not a big python developer uh I am an engineer but not really with python
Amazon Web Services
Rocky showcases AWS's agentic AI stack: Bedrock for LLM orchestration, ECS for containerized game servers, and CloudFormation/CDK for infrastructure.
Architecture Evolution: From LangChain to Bedrock
"It got really really um complex and then we decided okay so that's not service enough uh let's use uh agents for Amazon Bedrock"
— AWS Engineer, explaining why they abandoned LangChain • 7:21
Failed Approach: LangChain on Lambda
- Started with LangChain, even though the speaker is not a Python developer
- Tried to run on AWS Lambda (serverless)
- Used Cohere LLMs hosted on SageMaker
- Grew "really, really complex" as more tools were added
- State management became a nightmare
Working Solution: ECS + Bedrock Agents
Migrated to Amazon ECS for stateful game servers, switched to Claude Haiku for speed, and adopted Bedrock Agents for managed orchestration. All infrastructure as code with CloudFormation and CDK.
Agentic Workflow Fundamentals
Rocky follows the standard agent pattern: Minecraft chat provides input/unstructured data, Bedrock Agents orchestrates tools, Claude Haiku reasons about actions, and Mineflayer executes them in the game.
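Flow: Minecraft chat → Bedrock Agents (orchestration) → Claude Haiku (reasoning) → Mineflayer (in-game execution)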
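The flow can be sketched as a single pass through three stages. This is a minimal illustration in Python, not Rocky's actual code: `fake_model`, `execute_in_game`, and the dictionary shapes are hypothetical stand-ins for Claude Haiku, Mineflayer, and the messages that Bedrock Agents passes between them.

```python
from typing import Callable


def fake_model(chat_message: str) -> dict:
    """Hypothetical stand-in for the model: turn chat text into an action request."""
    text = chat_message.lower()
    if "jump" in text:
        return {"action": "jump", "params": {}}
    if "dig" in text:
        return {"action": "dig", "params": {"width": 1, "depth": 1}}
    return {"action": "chat", "params": {"text": "Not sure how to do that yet!"}}


def execute_in_game(action: str, params: dict) -> dict:
    """Hypothetical stand-in for the game client (Mineflayer in the talk)."""
    return {"action": action, "status": "ok", "params": params}


def handle_chat(
    chat_message: str,
    model: Callable[[str], dict] = fake_model,
    executor: Callable[[str, dict], dict] = execute_in_game,
) -> dict:
    """One pass of the flow: chat input, then model decision, then game execution."""
    decision = model(chat_message)  # reasoning step
    result = executor(decision["action"], decision["params"])  # execution step
    return result  # handed back to the orchestrator


print(handle_chat("Rocky please dig a small hole"))
```

Passing the model and executor in as arguments keeps the orchestration step separate from both, which is roughly the separation the managed service provides.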
Why Claude Haiku?
The speaker specifically chose Claude Haiku over more capable models because speed matters for real-time gameplay. Latency kills the experience when you're waiting for an agent to decide where to dig.
"Claude in particular Claude Haiku because it's it's fast"
— AWS Engineer • 9:04
Return of Control Pattern
Every action in Rocky's system has defined input parameters (e.g., depth/width for digging), JSON output schema for Mineflayer execution, and Return of Control back to the orchestrator. This feedback loop enables multi-step reasoning.
Available Actions:
- jump: Simple movement
- move_to_position: Navigation
- locate_player: Entity detection
- locate_entity: Find objects/pigs
- hit: Attack action
- dig: Terrain modification (with params)
- build: Construct structures (experimental)
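To make the pattern concrete, here is a minimal sketch of an action registry with declared parameters, plus a function that executes a request and hands a structured result back to the caller. It is an illustration under assumptions, not the talk's code: `ActionSpec`, `ACTIONS`, `validate`, and `run_with_return_of_control` are hypothetical names, and the parameter names (`depth`, `width`, `name`) are guesses based on the description above.

```python
from dataclasses import dataclass, field


@dataclass
class ActionSpec:
    name: str
    description: str
    params: dict = field(default_factory=dict)  # parameter name -> expected Python type


# Hypothetical registry covering a few of the listed actions.
ACTIONS = {
    "jump": ActionSpec("jump", "Simple movement"),
    "dig": ActionSpec("dig", "Terrain modification", {"depth": int, "width": int}),
    "locate_player": ActionSpec("locate_player", "Entity detection", {"name": str}),
}


def validate(request: dict) -> list:
    """Return a list of problems with an action request (empty means valid)."""
    spec = ACTIONS.get(request.get("action"))
    if spec is None:
        return [f"unknown action: {request.get('action')!r}"]
    params = request.get("params", {})
    problems = []
    for key, expected in spec.params.items():
        if key not in params:
            problems.append(f"missing parameter: {key}")
        elif not isinstance(params[key], expected):
            problems.append(f"{key} should be {expected.__name__}")
    return problems


def run_with_return_of_control(request: dict, execute) -> dict:
    """Execute locally, then return the outcome so the orchestrator can plan the next step."""
    problems = validate(request)
    if problems:
        return {"action": request.get("action"), "status": "rejected", "problems": problems}
    outcome = execute(request["action"], request.get("params", {}))
    return {"action": request["action"], "status": "done", "outcome": outcome}
```

Returning a rejection as data, rather than raising, gives the orchestrator something it can reason about on the next turn, which is the point of the feedback loop.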
Live Demo Highlights
The talk featured both prerecorded demos and a live demonstration of Rocky's capabilities, including the experimental "build" feature that turns natural language into 3D structures.
Rocky's Personality Emerges
Rocky is designed as playful and friendly. When finding players, Rocky says 'On my way!' and provides weather updates. This personality isn't hardcoded—it emerges from the system prompt.
System Prompt:
You're a playful, friendly and creative Minecraft Agent called Rocky, and your goal is to entertain players and collaborate with them in a fun gaming experience.
Parameter Inference in Action
When asked to dig a "small hole," Rocky infers 1×1 dimensions. When asked for a "2 by 2 hole," Rocky uses explicit values. This natural language understanding happens automatically through Claude's reasoning.
Demo Moments:
- 14:57 User: "Rocky please dig a small hole"
- Rocky infers 1×1 dimensions from context
- Executes dig action with parameters
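The inference itself happens inside the model, so it cannot be shown as code. What can be sketched is the tool side: taking whatever width and depth the model supplies, filling gaps with a small default, and clamping the result before anything is dug. The function name `resolve_dig_params` and the constants `DEFAULT_SIZE` and `MAX_SIZE` are assumptions for illustration; the talk does not state Rocky's real limits.

```python
# Assumed limits; the talk does not state the real bounds.
DEFAULT_SIZE = 1
MAX_SIZE = 8


def resolve_dig_params(model_params: dict) -> dict:
    """Fill in and clamp the width/depth that the model supplied for a dig action."""
    resolved = {}
    for key in ("width", "depth"):
        value = model_params.get(key, DEFAULT_SIZE)
        try:
            value = int(value)
        except (TypeError, ValueError):
            value = DEFAULT_SIZE  # fall back when the model sends something unusable
        resolved[key] = max(1, min(value, MAX_SIZE))
    return resolved


print(resolve_dig_params({}))                          # nothing explicit, e.g. "a small hole"
print(resolve_dig_params({"width": 2, "depth": 2}))    # explicit, e.g. "a 2 by 2 hole"
print(resolve_dig_params({"width": 500, "depth": "x"}))  # out of range and malformed
```

Clamping on the tool side means a surprising inference from the model costs at most a slightly odd hole, not a crater.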
The Emergent Behavior Moment
Rocky dug itself into a hole and then—without being explicitly programmed to—figured out how to dig its way out. This emergent problem-solving surprised everyone, including the engineer.
"Can come find us out of that hole and dig their way out of the hole uh this is behavior that we didn't expect"
— AWS Engineer • 6:11
Experimental Build Feature
The speaker attempted to spell "Coliseum" but changed to "double decker couch" instead. Rocky successfully built the 3D structure by translating natural language into Mineflayer JSON coordinates.
Build Prompt Engineering:
You are Claude, an expert Minecraft builder created by Anthropic. When given a structure description, output valid JSON.
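Asking for valid JSON in the prompt is only half of the safeguard; the other half is checking what comes back before placing any blocks. The sketch below shows one way to do that. The JSON shape (an array of objects with `x`, `y`, `z`, and `block`), the `ALLOWED_BLOCKS` palette, and the `MAX_BLOCKS` cap are all assumptions, since the talk does not show the format Rocky actually uses.

```python
import json

ALLOWED_BLOCKS = {"stone", "oak_planks", "glass"}  # assumed palette
MAX_BLOCKS = 500  # assumed cap on structure size


def parse_build_plan(raw: str) -> list:
    """Parse model output into block placements, raising ValueError on anything off-spec."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc.msg}") from exc
    if not isinstance(data, list) or not data:
        raise ValueError("expected a non-empty JSON array of blocks")
    if len(data) > MAX_BLOCKS:
        raise ValueError("too many blocks")
    plan = []
    for i, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(f"block {i} is not an object")
        coords = [item.get(axis) for axis in ("x", "y", "z")]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not all(isinstance(c, int) and not isinstance(c, bool) for c in coords):
            raise ValueError(f"block {i} needs integer x, y, z")
        block = item.get("block")
        if not isinstance(block, str) or block not in ALLOWED_BLOCKS:
            raise ValueError(f"block {i} has unknown type {block!r}")
        plan.append({"x": coords[0], "y": coords[1], "z": coords[2], "block": block})
    return plan
```

A rejected plan can be returned to the model as an error message so it can try again, instead of being executed as nonsense in the world.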
"If we didn't do that it goes bananas and builds just nonsense"
— AWS Engineer, on why strict JSON rules are required • 16:17
Human Behavior Insights
Rocky has been demoed at conferences worldwide. The most common request? "Hit the pig." People consistently choose violence when given control of an AI agent—a fascinating commentary on human nature.
"Lots of people have observed do don't know why but hey human behavior is even more fascinating than LLMs"
— AWS Engineer, on people asking Rocky to hit pigs • 5:31
Key Technical Insights
Speed Over Capabilities for Real-Time
Claude Haiku was chosen specifically for latency, not intelligence. In real-time games, every millisecond matters. A faster model beats a smarter one when the user is waiting for a response.
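One way to turn this into a measurable decision is to time each candidate model on a representative prompt and keep only those inside a latency budget. The sketch below is illustrative: `measure_latency` and `pick_model` are hypothetical helpers, the "models" are fake callables that only sleep, and the budget value is arbitrary. It checks speed only; whether a model is capable enough for the task still has to be judged separately.

```python
import statistics
import time


def measure_latency(call, prompt: str, runs: int = 5) -> float:
    """Median wall-clock seconds for call(prompt) over several runs."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call(prompt)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def pick_model(candidates: dict, prompt: str, budget_s: float):
    """Return the name of the fastest candidate within the latency budget, or None."""
    timings = {name: measure_latency(call, prompt) for name, call in candidates.items()}
    within = {name: t for name, t in timings.items() if t <= budget_s}
    return min(within, key=within.get) if within else None


# Fake "models" that only sleep, so the sketch runs without any cloud access.
candidates = {
    "fast-model": lambda prompt: time.sleep(0.01),
    "slow-model": lambda prompt: time.sleep(0.05),
}
print(pick_model(candidates, "dig a small hole", budget_s=0.03))
```

The median is used instead of the mean so that a single slow outlier does not disqualify an otherwise fast model.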
Stateful Components Need Containers
Lambda failed because Minecraft requires persistent state. ECS containers provide the memory and connection continuity that serverless functions can't match.
Managed Services Reduce Complexity
Bedrock Agents eliminated the orchestration complexity that made LangChain unwieldy. RAG, agents, and guardrails all in one place—no need to build custom workflow engines.
Prompt Engineering is Essential
Even Claude needs guardrails. The build feature initially "went bananas" with hallucinated structures until strict JSON rules and examples were added to the system prompt.
Emergent Behavior: Feature, Not Bug
Rocky digging out of a hole wasn't programmed—it emerged from the interaction between the agent's tools, goals, and environment. This is both the promise and the challenge of agentic AI.
Lesson: Design agent systems with enough flexibility to surprise you, but enough constraints to remain safe.
"Think of it more as a sort of a a managed agentic workflow right so you can manage um Rag and you manage agents as well so it's all in one spot"
— AWS Engineer, explaining Amazon Bedrock Agents • 9:31
Key Takeaways
Building Production AI Agents
- Speed Matters: For real-time applications, choose the fastest model that can do the job. Claude Haiku over Opus for games.
- Managed Over Custom: Don't build orchestration from scratch when a managed service such as Bedrock Agents already covers it.
- Stateful Requires Containers: Short-lived serverless functions are a poor fit for persistent connections. Use ECS or Kubernetes for game-like applications.
- Define Tools Explicitly: Every action needs clear input/output schemas. Let the model infer parameters from natural language when the user gives no explicit values.
- Return of Control: Always design feedback loops. The orchestrator needs action results to make decisions.
- Prompt Engineering Never Ends: Even great models need guardrails. Test, iterate, and add constraints as needed.
- Embrace Emergence: Agents will surprise you. Design systems that can learn from unexpected behaviors.
- Human-in-the-Loop: Know when to let humans intervene. Rocky's build feature is experimental for a reason.
- Infrastructure as Code: Use CloudFormation, CDK, or Terraform. Reproducible deployments are non-negotiable.
- Test in Production Carefully: Live demos are risky. Have backups, record everything, and embrace failure when it happens.
"The biggest thing we've seen is people try to hit the pig"
— AWS Engineer, on what users do with Rocky • 5:21
Watch the Full Talk
Research Notes & Methodology
This highlight page is based on a comprehensive analysis of the complete VTT transcript from the AI Engineer Summit 2024. The talk featured a live demonstration of Rocky the Minecraft bot, including emergent behavior, architectural evolution insights, and real-time agent gameplay.
- Full VTT transcript (3,185 lines)
- Complete talk recording (~18 minutes)
- Live demo with real-time gameplay
- AWS Bedrock and Anthropic documentation
- Complete transcript analysis
- Quote extraction with timestamps
- Technical architecture verification
- Cross-reference with official docs
Video: Claude plays Minecraft! by AWS Engineer
Event: AI Engineer Summit 2024 • Published: October 29, 2024