Live Demo • AI Engineer Summit

Claude plays Minecraft! Emergent Behavior in Real-Time AI Agents

An AWS engineer builds Rocky, a Minecraft bot powered by Claude Haiku and Amazon Bedrock Agents. Watch emergent behavior unfold live as the agent digs, builds, and surprises everyone with unexpected problem-solving.

"They say you should never live demo with kids, animals and an LLM, so I'm going to do that."

— AWS Engineer • 2:37

  • 18 min: Live Demo (real-time gameplay)
  • Haiku: Model Choice (speed over capabilities)
  • Bedrock: Architecture (managed workflow)
  • Emergent: Unexpected behavior

Why This Talk Matters

The Real-World Agent Challenge

Most agent demos are carefully rehearsed or heavily edited. This talk does the opposite: a live demonstration where anything can happen. Rocky the Minecraft bot behaves in ways the engineer didn't program—showcasing both the promise and unpredictability of agentic AI systems.

Live Demo Risks

  • LLMs are non-deterministic by nature
  • Real-time games require fast responses
  • Anything can go wrong in front of hundreds
  • Emergent behavior can be surprising

Why Minecraft?

Minecraft is a strong testbed for AI agents: it offers a rich action space (build, dig, move), clear goals, visual feedback, a chat interface for input, and a supportive community for testing.

"This is behavior that we didn't expect, and it just works, which is really fascinating"

AWS Engineer, on Rocky digging out of a hole without being programmed to

6:14

Meet Rocky: The Minecraft Bot

Rocky: Gender-Neutral Minecraft Agent

Rocky is a Minecraft bot built with Claude Haiku, Amazon Bedrock Agents, and the Mineflayer framework. The bot responds to chat commands, digs holes, finds players and entities, and can even build structures—all in real-time.

  • Claude Haiku: Chosen for speed, not capabilities
  • Bedrock Agents: Managed agentic workflow
  • Mineflayer: JavaScript bot framework

The Speaker

The speaker is an AWS engineer from Australia and, by their own account, not a Python developer. They built Rocky as a side project to learn agent engineering and to demonstrate what Bedrock Agents can do.

"I'm not a big Python developer. I am an engineer, but not really with Python."

Amazon Web Services

Rocky showcases AWS's agentic AI stack: Bedrock for LLM orchestration, ECS for containerized game servers, and CloudFormation/CDK for infrastructure.

Architecture Evolution: From LangChain to Bedrock

"It got really, really complex, and then we decided, okay, so that's not service enough, let's use Agents for Amazon Bedrock"

AWS Engineer, explaining why they abandoned LangChain

7:21

Failed Approach: LangChain on Lambda

  • Started with LangChain, despite not being a Python developer
  • Tried to run on AWS Lambda (serverless)
  • Used Cohere LLMs hosted on SageMaker
  • Got "really, really complex" with more tools
  • State management became a nightmare

Working Solution: ECS + Bedrock Agents

The working version moved to Amazon ECS for stateful game servers, switched to Claude Haiku for speed, and adopted Bedrock Agents for managed orchestration. All infrastructure is defined as code with CloudFormation and CDK.

1

Agentic Workflow Fundamentals

Rocky follows the standard agent pattern: Minecraft chat supplies unstructured input, Bedrock Agents orchestrates the tools, Claude Haiku reasons about which action to take, and Mineflayer executes it in the game.

Flow:

Chat Input → Mineflayer → Bedrock Agents → Claude Haiku (Tool Selection) → Action Execution → Return of Control → Response to Minecraft
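
The flow above can be sketched as a small loop. This is a minimal illustration, not the speaker's code: the `Orchestrator` and `GameBot` interfaces, the `handleChat` function, and the step limit are all hypothetical stand-ins, and neither the Bedrock Agents SDK nor Mineflayer is called. A real call to a hosted model would be asynchronous; the sketch stays synchronous to keep the control flow visible.

```typescript
// Hypothetical types for illustration; not the Bedrock Agents or Mineflayer APIs.
type ToolCall = { tool: string; params: Record<string, string | number> };

type AgentReply =
  | { kind: "tool"; call: ToolCall } // the model wants an action executed
  | { kind: "final"; text: string }; // the model is done; reply in chat

interface Orchestrator {
  // Receives the chat message plus the results of actions run so far.
  next(message: string, results: string[]): AgentReply;
}

interface GameBot {
  execute(call: ToolCall): string; // runs the action, returns a short result
  chat(text: string): void; // posts a message to game chat
}

// Runs one chat request to completion and returns the action results.
function handleChat(
  message: string,
  agent: Orchestrator,
  bot: GameBot,
  maxSteps = 5,
): string[] {
  const results: string[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const reply = agent.next(message, results);
    if (reply.kind === "final") {
      bot.chat(reply.text);
      return results;
    }
    // Return of control: the game side executes, then reports back.
    results.push(bot.execute(reply.call));
  }
  bot.chat("I ran out of steps.");
  return results;
}
```

The step limit is a design choice of the sketch: it keeps a confused model from looping forever in front of an audience.
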
2

Why Claude Haiku?

The speaker specifically chose Claude Haiku over more capable models because speed matters for real-time gameplay. Latency kills the experience when you're waiting for an agent to decide where to dig.

"Claude, in particular Claude Haiku, because it's fast"

AWS Engineer

9:04
3

Return of Control Pattern

Every action in Rocky's system has defined input parameters (e.g., depth/width for digging), a JSON output schema that Mineflayer can execute, and a Return of Control step that hands the result back to the orchestrator. This feedback loop enables multi-step reasoning.

Available Actions:

  • jump - Simple movement
  • move_to_position - Navigation
  • locate_player - Entity detection
  • locate_entity - Find objects/pigs
  • hit - Attack action
  • dig - Terrain modification (with params)
  • build - Construct structures (experimental)
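
One way to picture the action list is as a typed union plus a wrapper that always hands a result back to the orchestrator. Only the action names come from the list above; the parameter fields (`x`, `player`, `target`, and so on), the `RockyAction` and `ActionResult` types, and the `runAction` helper are assumptions made for this sketch.

```typescript
// Action names are from the talk summary; every parameter shape is assumed.
type RockyAction =
  | { name: "jump" }
  | { name: "move_to_position"; x: number; y: number; z: number }
  | { name: "locate_player"; player: string }
  | { name: "locate_entity"; entity: string }
  | { name: "hit"; target: string }
  | { name: "dig"; depth: number; width: number }
  | { name: "build"; description: string };

type ActionResult = { action: RockyAction["name"]; ok: boolean; detail: string };

// Human-readable summary; the exhaustive switch fails to compile
// if a new action is added without a matching case.
function summarize(action: RockyAction): string {
  switch (action.name) {
    case "jump": return "jump";
    case "move_to_position": return `move to ${action.x},${action.y},${action.z}`;
    case "locate_player": return `locate player ${action.player}`;
    case "locate_entity": return `locate entity ${action.entity}`;
    case "hit": return `hit ${action.target}`;
    case "dig": return `dig ${action.width}x${action.width}, depth ${action.depth}`;
    case "build": return `build ${action.description}`;
  }
}

// Return of control: success or failure, the orchestrator gets a result.
function runAction(
  action: RockyAction,
  execute: (a: RockyAction) => string,
): ActionResult {
  try {
    return { action: action.name, ok: true, detail: execute(action) };
  } catch (err) {
    const detail = err instanceof Error ? err.message : String(err);
    return { action: action.name, ok: false, detail };
  }
}
```

Reporting failures as data rather than letting them throw gives the model a chance to react to them on the next step.
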

Live Demo Highlights

The talk featured both prerecorded demos and a live demonstration of Rocky's capabilities, including the experimental "build" feature that turns natural language into 3D structures.

1

Rocky's Personality Emerges

Rocky is designed as playful and friendly. When finding players, Rocky says 'On my way!' and provides weather updates. This personality isn't hardcoded—it emerges from the system prompt.

System Prompt:

You're a playful, friendly and creative Minecraft Agent called Rocky, and your goal is to entertain players and collaborate with them in a fun gaming experience.

2

Parameter Inference in Action

When asked to dig a "small hole," Rocky infers 1×1 dimensions. When asked for a "2 by 2 hole," Rocky uses explicit values. This natural language understanding happens automatically through Claude's reasoning.

Demo Moments:

  • 14:57 User: Rocky please dig a small hole
  • Rocky infers 1×1 dimensions from context
  • Executes dig action with parameters
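
The inference itself happens in the model, but the tool definition and the check on what comes back can be sketched in code. Everything here is an assumption made for illustration: the `digTool` shape is not the Bedrock Agents schema format, the description wording is invented, and the `maxSize` bound is arbitrary. The parser accepts numeric strings on the assumption that parameters may arrive as text.

```typescript
// Hypothetical tool definition; the descriptions are what guide the
// model toward 1x1 when a player asks for a "small hole".
const digTool = {
  name: "dig",
  description: "Dig a square hole at the bot's current position.",
  parameters: {
    depth: "Depth in blocks. Use 1 when the player asks for a small hole.",
    width: "Side length in blocks. Use 1 when the player asks for a small hole.",
  },
} as const;

type DigParamName = keyof typeof digTool.parameters; // "depth" | "width"
type DigParams = Record<DigParamName, number>;

// Validates whatever the model supplied before the bot acts on it.
function parseDigParams(raw: Record<string, unknown>, maxSize = 8): DigParams {
  const read = (key: DigParamName): number => {
    const value = Number(raw[key]); // accepts 2 as well as "2"
    if (!Number.isInteger(value) || value < 1 || value > maxSize) {
      throw new Error(`${key} must be an integer from 1 to ${maxSize}`);
    }
    return value;
  };
  return { depth: read("depth"), width: read("width") };
}
```

Under these assumptions, a "small hole" request would reach the parser as `{ depth: "1", width: "1" }` and a "2 by 2 hole" as `{ depth: 2, width: 2 }`; both pass, while a missing or zero value is rejected.
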
3

The Emergent Behavior Moment

Rocky dug itself into a hole and then—without being explicitly programmed to—figured out how to dig its way out. This emergent problem-solving surprised everyone, including the engineer.

"...can come find us out of that hole and dig their way out of the hole. This is behavior that we didn't expect"

AWS Engineer

6:11
4

Experimental Build Feature

The speaker attempted to spell "Coliseum" but changed to "double decker couch" instead. Rocky successfully built the 3D structure by translating natural language into Mineflayer JSON coordinates.

Build Prompt Engineering:

You are Claude, an expert Minecraft builder created by Anthropic. When given a structure description, output valid JSON.

"If we didn't do that it goes bananas and builds just nonsense"

AWS Engineer, on why strict JSON rules are required

16:17
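
A strict parser on the receiving side is one way to keep a build request from turning into nonsense. The sketch below assumes the model is asked for a JSON array of block placements; the `BlockPlacement` shape, the `parseBuildPlan` function, and the `maxBlocks` cap are hypothetical, since the talk summary does not show the actual JSON format.

```typescript
// Assumed output format: a JSON array of { x, y, z, block } objects.
type BlockPlacement = { x: number; y: number; z: number; block: string };

const isInt = (v: unknown): v is number =>
  typeof v === "number" && Number.isInteger(v);

// Accepts only a well-formed plan; anything else is rejected outright.
function parseBuildPlan(modelOutput: string, maxBlocks = 200): BlockPlacement[] {
  let data: unknown;
  try {
    data = JSON.parse(modelOutput);
  } catch {
    throw new Error("model output is not valid JSON");
  }
  if (!Array.isArray(data) || data.length === 0 || data.length > maxBlocks) {
    throw new Error(`expected a JSON array of 1 to ${maxBlocks} blocks`);
  }
  return data.map((item: unknown, index: number): BlockPlacement => {
    if (typeof item !== "object" || item === null) {
      throw new Error(`block ${index} is not an object`);
    }
    const { x, y, z, block } = item as Record<string, unknown>;
    if (!isInt(x) || !isInt(y) || !isInt(z) || typeof block !== "string") {
      throw new Error(`block ${index} needs integer x, y, z and a string block`);
    }
    return { x, y, z, block };
  });
}
```

Rejecting prose-wrapped or partial output, rather than trying to salvage it, keeps a bad generation out of the game world.
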
5

Human Behavior Insights

Rocky has been demoed at conferences worldwide. The most common request? "Hit the pig." People consistently choose violence when given control of an AI agent—a fascinating commentary on human nature.

"Lots of people have observed, don't know why, but hey, human behavior is even more fascinating than LLMs"

AWS Engineer, on people asking Rocky to hit pigs

5:31

Key Technical Insights

Speed Over Capabilities for Real-Time

Claude Haiku was chosen specifically for latency, not intelligence. In real-time games, every millisecond matters. A faster model beats a smarter one when the user is waiting for a response.

Stateful Components Need Containers

Lambda failed because Minecraft requires persistent state. ECS containers provide the memory and connection continuity that serverless functions can't match.
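
The difference can be shown with a toy session object. This is only an illustration of in-memory state: `BotSession` is a hypothetical stand-in for a live game connection, and a per-invocation design could recover state by storing it externally, which the sketch deliberately leaves out.

```typescript
// Hypothetical stand-in for a bot connection; a real one would hold a socket.
class BotSession {
  readonly sessionId: string;
  private position = { x: 0, y: 64, z: 0 };
  private handled = 0;

  constructor(sessionId: string) {
    this.sessionId = sessionId;
  }

  // Each "dig" message lowers the bot by one block.
  onChat(message: string): string {
    this.handled += 1;
    if (message.includes("dig")) this.position.y -= 1;
    return `#${this.handled} at y=${this.position.y}`;
  }
}

// Container style: one long-lived session sees every message.
function runLongLived(messages: string[]): string[] {
  const session = new BotSession("rocky-1");
  return messages.map((m) => session.onChat(m));
}

// Function style: in-memory state is rebuilt for every message.
function runPerInvocation(messages: string[]): string[] {
  return messages.map((m) => new BotSession("rocky-1").onChat(m));
}
```

With two "dig" messages, the long-lived version remembers that the bot is already one block down, while the per-invocation version starts from the surface each time.
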

Managed Services Reduce Complexity

Bedrock Agents eliminated the orchestration complexity that made LangChain unwieldy. RAG, agents, and guardrails all in one place—no need to build custom workflow engines.

Prompt Engineering is Essential

Even Claude needs guardrails. The build feature initially "went bananas" with hallucinated structures until strict JSON rules and examples were added to the system prompt.
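
A prompt of that kind could be assembled along these lines. The source only says that strict JSON rules and examples were added, so the rule wording, the example, and the `buildSystemPrompt` helper below are all hypothetical.

```typescript
type BuildExample = { request: string; json: string };

// Joins a base prompt, numbered rules, and worked examples into one string.
function buildSystemPrompt(
  base: string,
  rules: string[],
  examples: BuildExample[],
): string {
  const ruleLines = rules.map((rule, i) => `${i + 1}. ${rule}`);
  const exampleLines = examples.map(
    (e) => `Request: ${e.request}\nJSON: ${e.json}`,
  );
  return [base, "Rules:", ...ruleLines, "Examples:", ...exampleLines].join("\n");
}

// Illustrative values only; none of this wording is from the talk.
const sketchPrompt = buildSystemPrompt(
  "When given a structure description, output valid JSON.",
  [
    "Output only a JSON array, with no prose.",
    "Every element needs integer x, y, z and a block name.",
  ],
  [{ request: "a single stone block", json: '[{"x":0,"y":0,"z":0,"block":"stone"}]' }],
);
```

Keeping rules and examples as data makes it easy to add a constraint each time the model finds a new way to go off script.
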

Emergent Behavior: Feature, Not Bug

Rocky digging out of a hole wasn't programmed—it emerged from the interaction between the agent's tools, goals, and environment. This is both the promise and the challenge of agentic AI.

Lesson: Design agent systems with enough flexibility to surprise you, but enough constraints to remain safe.

"Think of it more as a sort of managed agentic workflow, right? So you can manage RAG and you manage agents as well, so it's all in one spot"

AWS Engineer, explaining Amazon Bedrock Agents

9:31

Key Takeaways

Building Production AI Agents

  • Speed Matters: For real-time applications, choose the fastest model that can do the job. Claude Haiku over Opus for games.
  • Managed Over Custom: Don't build orchestration from scratch when a managed service such as Bedrock Agents already covers it.
  • Stateful Requires Containers: Short-lived serverless functions are a poor fit for persistent connections. Use ECS or Kubernetes for game-like applications.
  • Define Tools Explicitly: Every action needs clear input/output schemas. Let the model infer parameters from vague requests (a "small hole" becomes 1×1), and pass explicit values through unchanged.
  • Return of Control: Always design feedback loops. The orchestrator needs action results to make decisions.
  • Prompt Engineering Never Ends: Even great models need guardrails. Test, iterate, and add constraints as needed.
  • Embrace Emergence: Agents will surprise you. Design systems that can learn from unexpected behaviors.
  • Human-in-the-Loop: Know when to let humans intervene. Rocky's build feature is experimental for a reason.
  • Infrastructure as Code: Use CloudFormation, CDK, or Terraform. Reproducible deployments are non-negotiable.
  • Test in Production Carefully: Live demos are risky. Have backups, record everything, and embrace failure when it happens.

"The biggest thing we've seen is people try to hit the pig"

AWS Engineer, on what users do with Rocky

5:21

Research Notes & Methodology

This highlight page is based on a comprehensive analysis of the complete VTT transcript from the AI Engineer Summit 2024. The talk featured a live demonstration of Rocky the Minecraft bot, including emergent behavior, architectural evolution insights, and real-time agent gameplay.

Source Material:
  • Full VTT transcript (3,185 lines)
  • Complete talk recording (~18 minutes)
  • Live demo with real-time gameplay
  • AWS Bedrock and Anthropic documentation

Analysis Method:
  • Complete transcript analysis
  • Quote extraction with timestamps
  • Technical architecture verification
  • Cross-reference with official docs

Video: Claude plays Minecraft! by AWS Engineer

Event: AI Engineer Summit 2024 • Published: October 29, 2024