AI Engineer Insights

AI Engineering Highlights

Comprehensive analysis and insights from the AI Engineer Summit. Deep dives into the latest trends, technologies, and thought leadership in AI engineering.

In-Depth Analyses

Comprehensive AI engineering analysis

Expert Speakers

100

Industry leaders and practitioners

Videos Analyzed

461+

Conference talks and presentations

Fact-Checked

100%

Verified and validated insights

Featured Topics

Vibe Coding

Multi-Agent Systems

AI Infrastructure

Developer Experience

Enterprise AI

LLM Applications

All Highlights

95 total

Devin 2.0 and Moore's Law for AI Agents: 70-Day Doubling Cycle

Scott Wu from Cognition presents Moore's Law for AI Agents - capabilities double every 70 days (16-64x annually). From tab completion to autonomous engineers in 18 months. Learn the 5-tier evolution framework: migrations → bug fixes → complex debugging → project autonomy, plus technical infrastructure including Playbooks, Deep Wiki, With Search, and integrations with Linear, Jira, and Slack.

Scott Wu

•AI Engineer Summit•

Jun 15, 2025

Devin-2

Scott-Wu

Cognition

Moores-Law-AI

+19 more
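
As a rough check on the figures in the summary above, a 70-day doubling period works out to about 365/70 ≈ 5.2 doublings per year. The sketch below does that arithmetic. The function name is illustrative, and reading the quoted 16-64x range as four to six doublings per year is an assumption, not something the talk summary states.

```python
def annual_growth_factor(doubling_days: float, days_per_year: float = 365.0) -> float:
    """Growth multiple over one year for a given doubling period (in days)."""
    doublings_per_year = days_per_year / doubling_days
    return 2.0 ** doublings_per_year


# 70-day doubling period, as quoted in the summary
factor_70 = annual_growth_factor(70)

# Assumed reading of the "16-64x annually" range: 4 to 6 doublings per year
low, high = 2 ** 4, 2 ** 6

print(f"70-day doubling: about {factor_70:.0f}x per year")
print(f"4 to 6 doublings per year: {low}x to {high}x")
```

Under these assumptions the 70-day figure lands inside the quoted range, closer to its middle than to either end.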

Small Bets, Big Impact: Building GenBI at Northwestern Mutual

Asaf Bord shares how a 160-year-old Fortune 100 insurance company built GenBI using 6-week sprints, continuous plug-pulling rights, and incremental value delivery. Learn the crawl-walk-run adoption strategy, why 80% of BI work is report routing (not SQL generation), and the honest assessment that executive-ready AI may never arrive.

Asaf Bord

•AI Engineer Summit•

Oct 29, 2024

GenBI

Northwestern-Mutual

enterprise-AI

incremental-delivery

+16 more

7 Habits of Highly Effective GenAI Evaluations: AWS Framework for Production AI

Justin Muller, Principal Applied AI Architect at AWS, reveals the battle-tested 7 Habits framework that transformed document processing from 22% to 92% accuracy in 6 months. Learn why evals are the missing piece to scaling GenAI, the 30-second rule for rapid iteration, and how to build evaluation systems that enable production deployment with real-world case studies and practical implementation guidance.

Justin Muller

•AI Engineer Summit•

Oct 29, 2024

GenAI-evaluations

LLM-evaluation

AWS

prompt-decomposition

+16 more

Form Factors for Your New AI Coworkers: A Design Framework

Craig Wattrus from Flatfile presents a four-form-factor framework for AI coworkers: Invisible, Ambient, Inline, and Conversational. Learn why traditional design processes fail with LLMs and how playful experimentation leads to better AI products through "feeling the material" and character coaching over control.

Craig Wattrus

•AI Engineer Summit•

Oct 29, 2024

AI-coworkers

form-factors

Flatfile

UX-design

+12 more

How Bolt.new Scaled $0-20M ARR in 60 Days with 15 People

Eric Simons shares how Bolt.new went from near-shutdown to $20M+ ARR in 60 days with just 15 people. Learn the Spartan mentality, community strategies, AI-powered support, and team culture that made it possible.

Eric Simons

•AI Engineer Summit•

Oct 29, 2024

Bolt.new

StackBlitz

$20M-ARR

Spartan-mentality

+10 more

Llama 3 at 1,000 tokens/s on SambaNova AI Platform

Full workshop on achieving unprecedented Llama 3 inference speeds of 1,000 tokens/second using SambaNova's Composition of Experts architecture and custom RDU hardware. Includes hands-on RAG implementation with LlamaIndex, ChromaDB, and performance benchmarks (16 chips vs 576, full precision).

Relle, Pedro

•AI Engineer Summit 2024•

Oct 29, 2024

Llama-3

SambaNova

RDU

inference-optimization

+14 more

Why Cisco Ditched RAG for Fine-Tuning in Production AI Agents

Ola Mabadeje from Cisco's Outshift group reveals how they built a 5-agent system for network change management, why fine-tuning beat RAG for knowledge graph queries (drastic token reduction), and how knowledge graphs serve as digital twins for safe testing before production. Complete architecture with real quotes.

Ola Mabadeje

•AI Engineer Conference 2025•

Dec 30, 2025

multi-agent-AI

Cisco

fine-tuning-vs-RAG

knowledge-graphs

+15 more

Ship Agents that Ship: Building Production AI Agents with Guardrails

Kyle Penfound and Jeremy Adams from Dagger demonstrate building production-ready AI agents in a hands-on workshop. Learn guardrails, container-native development, GitHub integration, function calling patterns, and practical production-ready agent architecture.

Kyle Penfound, Jeremy Adams

•AI Engineer Summit 2024•

Oct 29, 2024

AI-agents

Dagger

container-native

agent-guardrails

+12 more

From Arc to Dia: How The Browser Company Built AI-Native Tools

Samir Mody from The Browser Company shares lessons from building Arc and Dia. Learn how they achieved 10x iteration speed, built Jeba for automated prompt optimization, embraced non-engineers as AI developers, and navigated security challenges in AI browsers.

Samir Mody

•AI Engineer Conference 2024•

Oct 29, 2024

The-Browser-Company

Arc-browser

Dia-browser

AI-browsers

+10 more

Why Bolt.new Won and Most DevTools AI Pivots Failed

From Death's Door to $100M ARR: The Three-Step Framework, Anti-Patterns to Avoid, and How to Create Categories Instead of Features

Victoria Melnikova

•AI Engineer Conference 2025•

Dec 30, 2025

Bolt.new

StackBlitz

AI-pivot

startup-turnaround

+10 more

Rust is the Language of AGI: Why AI Prefers Rust Over Python

Michael Yuan explains why Rust is the perfect language for AI-generated code. Learn how the Rust compiler serves as a reward function for AI, the MCP tools that automate code generation, and the path to AGI through verifiable code with 1,000+ developers already using Rust Coder in production.

Michael Yuan

•AI Engineer Conference•

Oct 29, 2024

Rust

AGI

AI-code-generation

MCP

+15 more

Code World Model: Building World Models for Computation

Jacob Kahn from Meta FAIR presents Code World Model (CWM) - a 32B parameter model that explicitly models program execution dynamics rather than just syntax. Learn how execution tracing, asynchronous RL with mid-trajectory updates, and "neural debugging" enable AI to simulate code execution without running it, effectively approximating solutions to the halting problem.

Jacob Kahn

•AI Engineer Conference•

Oct 29, 2024

world-models

execution-tracing

Meta-FAIR

neural-debugging

+18 more

tldraw.computer: The Visual AI Language That Executes Like Code

Steve Ruiz demos tldraw.computer - a visual programming language where LLMs execute through graph-based nodes. See multimodal computing, self-scripting nodes, and AI as collaborator returning structured shapes, not pixels.

Steve Ruiz

•AI Engineer Summit 2024•

Oct 29, 2024

tldraw

visual-programming

AI-execution

multimodal

+11 more

Agent Reinforcement Fine Tuning: OpenAI's Breakthrough in Training AI Agents

Will Hang and Cathy Zhou from OpenAI introduce Agent RFT - the first time models can interact with the outside world during training. Learn how Cognition, Cosine, and Qodo achieved dramatic improvements with as few as 10 examples, the four success principles, and why parallel tool calling reduces latency by cutting 8-10 sequential steps down to 4.

Will Hang, Cathy Zhou

•AI Engineer Summit•

Oct 29, 2024

Agent-RFT

OpenAI

Will-Hang

Cathy-Zhou

+21 more

AI Copilots for Tech Architecture: The Highest-ROI Use Case

Why Architecture Copilots Deliver Higher ROI Than Coding Copilots: Preventing Costly Mistakes, Justifying Nine-Figure Infrastructure Spends, and Enabling Safe Delegation to Developers

Boris Bogatin

•AI Engineer Conference•

Dec 30, 2025

AI-architecture

tech-architecture

architecture-copilot

enterprise-AI

+9 more

Claude plays Minecraft! Emergent AI Behavior & Agent Engineering

AWS engineer demonstrates building Rocky, a Minecraft bot powered by Claude Haiku and Amazon Bedrock Agents. Live demo with emergent behavior, real-time gameplay, and practical lessons on architecture evolution from LangChain to Bedrock.

AWS Engineer

•AI Engineer Summit 2024•

Oct 29, 2024

AI-agents

Claude-Haiku

Amazon-Bedrock

Minecraft

+12 more

AI Kernel Generation: What's Working, What's Not, What's Next

Natalie Serrino from Gimlet Labs on AI-Driven GPU Optimization: 25-70% Speedups, Agentic Synthesis Swarm, Hardware-in-the-Loop Verification, and the Path Forward for PTX Generation and Formal Verification

Natalie Serrino

•AI Engineer Conference•

Dec 30, 2025

AI-kernel-generation

GPU-optimization

Gimlet-Labs

heterogeneous-compute

+10 more

Giving a Voice to AI Agents: Voice AI 2.0, Contextual AI, and <500ms Latency

Scott Stephenson, CEO at Deepgram, explains the evolution from Siri-era Voice AI 1.0 to LLM-powered Voice AI 2.0, the Intelligence Revolution timeline (25-30 years), accuracy improvements (75% to 90%+), latency breakthroughs (2-5s to 100-200ms), and how contextual AI passes conversation context to enable human-like voice interactions with <500ms roundtrip latency.

Scott Stephenson

•AI Engineer Conference•

Oct 29, 2024

Voice-AI-2.0

Deepgram

Scott-Stephenson

contextual-AI

+13 more

Efficient Reinforcement Learning: Asynchronous Pipeline RL & GPU Optimization

Rhythm Garg and Linden Li from Applied Compute present efficient RL systems for enterprise applications. Learn about asynchronous vs synchronous RL, GPU utilization optimization, staleness trade-offs, system-level modeling, and first-principles optimization for end-to-end performance.

Rhythm Garg, Linden Li

•AI Engineer Summit•

Oct 29, 2024

reinforcement-learning

asynchronous-RL

pipeline-RL

+11 more

A Year of Gemini Progress + What's Next: 50x Growth, Universal Assistant, and Agentic AI

Logan Kilpatrick from Google DeepMind recaps a transformative year — 10 years of progress in 12 months, 50x inference growth, Gemini 2.5 Pro final update, organizational evolution, and what's next for universal assistant vision, omnimodal models, agentic AI, and developer platform expansions (Embeddings API, Deep Research API, Veo 3, Imagen 4, AI Studio repositioning).

Logan Kilpatrick

•AI Education Summit•

Oct 29, 2024

Gemini-2.5-Pro

Google-DeepMind

Logan-Kilpatrick

universal-assistant

+17 more

Top Ten Challenges to Reach AGI

Stephen Chin and Andreas Kollegger explore the fundamental obstacles to AGI through science fiction memes—from Memento's memory problem to The Matrix's simulation control. A concise 4-minute lightning talk covering memory limitations, alignment problems, transparency issues, and the ultimate question: do we know what to ask AGI?

Stephen Chin, Andreas Kollegger (ABK)

•AI Engineer World's Fair•

Oct 29, 2024

AGI

science-fiction

AI-safety

alignment-problem

+10 more

Taxonomy for Next-Gen Reasoning: Why AI Gains Aren't Free

Nathan Lambert's Four-Pillar Framework: Skills, Calibration, Strategy, Abstraction—10-100x Token Waste Problem, Post-Training RL Compute Revolution (1% → 10%+)

Nathan Lambert

•AI Engineer Conference•

Dec 30, 2025

Nathan-Lambert

AI-reasoning

post-training-RL

calibration

+12 more

2025: LLMs, Pelicans & Bicycles

•

Jan 1, 2024

Why Agent Engineering

swyx Landmark Keynote: Why 2025 is the Year of Agents - 6 Enabling Factors, Agent Definitions, PMF Use Cases, ChatGPT Growth Analysis, and the Evolution of AI Engineering as a Discipline

swyx (Shawn Wang)

•AI Engineer Summit 2025•

Dec 30, 2025

agent-engineering

swyx

AI-Engineer-Summit

agents-2025

+10 more

Latent Space Paper Club: DeepSeek R1/V3 and Test Time Compute

8B = 235B Distillation Breakthrough, Doubled Reasoning Tokens, and the New Scaling Paradigm from Chinchilla to Inference

Vibhu Sapra

•AI Engineer World's Fair•

Oct 29, 2024

DeepSeek-R1

DeepSeek-V3

test-time-compute

model-distillation

+13 more

Agentic GraphRAG: AI's Logical Edge

Neo4j MCP Tools, GraphRAG Architecture, and Enterprise Case Study with 85% Adoption

Stephen Chin

•AI Engineer Conference•

Dec 30, 2024

GraphRAG

Neo4j

knowledge-graphs

MCP

+11 more

Anchoring Enterprise GenAI with Knowledge Graphs: 75% Faster Onboarding

Pfizer & Neo4j Case Study: How GraphRAG Achieved 3 Months → 3 Weeks with Knowledge Graphs. Real Enterprise Lessons on Technology Transfer, Workforce Knowledge Retention Crisis (20 Years → 3 Years Tenure), and Navigating Organizational Politics.

Jonathan Lowe, Stephen Chin

•AI Engineer Summit•

Oct 29, 2024

GraphRAG

Neo4j

Pfizer

enterprise-ai

+18 more

AI Engineering at Jane Street - Building AI Tools in OCaml

John Crepezzi shares how Jane Street builds custom AI infrastructure when off-the-shelf tools won't work. Learn workspace snapshotting, Code Evaluation Service (CES) running 50-100x faster than builds, Aid sidecar architecture for multi-editor support, and why they have more OCaml code than exists publicly worldwide.

John Crepezzi

•AI Engineer Summit 2024•

Oct 29, 2024

Jane-Street

OCaml

AI-engineering

workspace-snapshotting

+13 more

AI Agents, Meet Test Driven Development

Why TDD is Critical for Reliable AI Agents: L0-L4 Agentic Workflow Framework, Evaluation Loops, and SEO Agent Demo with 60% Performance Improvement

Anita

•AI Engineer Conference•

Dec 30, 2024

test-driven-development

TDD

AI-agents

agentic-workflows

+12 more

2025 is the Year of Evals!

Why AI Evaluation Finally Breaks Through: Three Converging Forces, C-Suite Alignment & Market Validation

John Dickerson

•AI Engineer Conference•

Dec 30, 2024

AI-evaluation

ML-monitoring

agentic-systems

enterprise-AI

+11 more

2026: The Year The IDE Died

Why Vibe Coding Will Transform Software Development

Steve Yegge, Gene Kim

•AI Engineer Summit 2024•

Oct 29, 2024

vibe-coding

ide

future-of-work

+1 more

The Price of Intelligence: AI Agent Pricing in 2025

Comprehensive analysis of AI agent pricing models, cost structures, and the economics of intelligence — outcome-based pricing, prepaid credits, cost optimization strategies, and 2025 predictions from 13+ real company examples

Chz

•AI Engineer Conference 2024•

Dec 30, 2024

ai-agent-pricing

cost-of-intelligence

token-economics

model-costs

+13 more

BlackRock: 8 Months → 2 Days

How to Build Custom Knowledge Apps at Scale - Human-in-the-Loop Design, LLM Strategies, and Why Autonomous Agents Don't Work in Finance

Infant Vasanth, Vaibhav Page

•AI Engineer Summit 2024•

Oct 29, 2024

enterprise

financial-services

document-processing

human-in-the-loop

+5 more

How to Build World-Class AI Products

Sarah Sachs (AI Lead, Notion) and Carlos Esteban (Braintrust) share their evaluation-first approach to building AI products. Learn why Notion spends 90% of time on evaluation and 10% on prompting, with practical guidance on data management, trace-based debugging, user feedback analysis, multi-turn conversation evaluation, and production monitoring with online scoring.

Sarah Sachs, Carlos Esteban

•AI Engineer Conference•

Oct 29, 2024

AI-product-development

Notion

Braintrust

evaluation-framework

+10 more

Five Hard Earned Lessons About Evals: Why Braintrust Ships 2 Weeks After New Model Releases

Ankur Goyal (CEO, Braintrust) shares hard-earned lessons: YAML vs JSON (15% token savings), GPT-4o 10% → Claude 4 Sonnet viable (6x better), Notion's 24-hour model integration, continuous reconciliation, and why great evals must be engineered like any other software system.

Ankur Goyal

•AI Engineer Summit•

Oct 29, 2024

AI-evaluation

Braintrust

Ankur-Goyal

YAML-vs-JSON

+11 more

12-Factor Agents: Building Reliable LLM Applications

Production AI Methodology from HumanLayer - Transform Unreliable Demos into Dependable Systems

Dex Horthy

•AI Engineer Summit 2024•

Nov 15, 2024

12-factor-agents

production-AI

agent-frameworks

software-engineering

+3 more

3 Ingredients for Building Reliable Enterprise Agents

The Mathematical Formula for Agent Success: P(success) × Value - Cost(failure) > Cost(running)

Harrison Chase

•AI Engineer Summit 2024•

Nov 15, 2024

agents

enterprise

reliability

human-in-the-loop

+6 more
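
Read literally, the inequality in the subtitle above says an agent is worth deploying when its expected value, net of the cost of failures, exceeds what it costs to run. A minimal sketch of that check follows. The function name, parameter names, and dollar figures are hypothetical, and `cost_of_failure` is treated as an already-expected cost per task, which is an assumption the subtitle does not spell out.

```python
def agent_is_worth_running(p_success: float, value: float,
                           cost_of_failure: float, cost_of_running: float) -> bool:
    """Literal reading of: P(success) x Value - Cost(failure) > Cost(running)."""
    return p_success * value - cost_of_failure > cost_of_running


# Hypothetical per-task numbers in dollars
reliable = agent_is_worth_running(p_success=0.9, value=100.0,
                                  cost_of_failure=20.0, cost_of_running=5.0)
flaky = agent_is_worth_running(p_success=0.3, value=100.0,
                               cost_of_failure=40.0, cost_of_running=5.0)

print(reliable)  # left side is about 70, which exceeds 5
print(flaky)     # left side is about -10, which does not exceed 5
```

The sketch only makes the trade-off explicit: raising the success probability, raising the value per task, or lowering the cost of failure each moves the left side up.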

Agents are Robots Too

What Self-Driving Taught Me About Building Agents: Agentics, 1% vs 99% Problem, and Closed-Loop Systems

Jesse Hu

•AI Engineer Summit•

Jan 1, 2024

robotics

self-driving

agents

Agentics

+8 more

AI Code Quality: Hype vs Reality

•

Jan 1, 2024

AI Consulting in Practice

•

Jan 1, 2024

AI Music Generation: From Prompt to Production

Hands-on workshop exploring AI music generation tools (Udio, Suno, Stable Audio), voice cloning (RVC), stem separation (Wave, UVR5), and the RIAA legal battle. Learn practical workflows for generating professional-quality music from text prompts.

Phlo Young

•AI Engineer World's Fair•

Dec 30, 2024

AI-music

Udio

Suno

Stable-Audio

+17 more

AI + Security & Safety: Why Your Agent Can't Go to Production

The Single-Process Security Flaw, Real-World Production Blocker, and Three-Layer Defense Framework from Apache Ranger's Creator - Don Bosco Durai, Priv

Don Bosco Durai

•AI Engineer Summit•

Dec 30, 2024

AI-security

agent-security

zero-trust

enterprise-AI

+11 more

Vibes Won't Cut It

Production Reality vs. AI Hype in Software Engineering - Why Professional Engineers Are Skeptical and What Actually Works

Chris Kelly

•AI Engineer Conference•

Oct 29, 2024

vibe-coding

production-engineering

AI-coding-skepticism

software-engineering

+7 more

MongoDB Atlas Vector Search: RAG Without the Complexity

Unified Platform: HNSW Algorithm, Search Nodes for Independent Scaling, Framework Integrations (LangChain, LlamaIndex), and Production-Ready RAG with 4,096 Dimensions Support

Ben Flast

•AI Engineer Conference•

Dec 30, 2024

MongoDB

RAG

vector-search

HNSW

+12 more

Building in the Gemini Era

Google DeepMind's Vision for AI-Assisted Development

Kat Kampf, Ammaar Reshi

•AI Engineer Summit•

Jan 1, 2024

gemini-3-pro

ai-studio

vibe-coding

image-generation

+3 more

#define AI Engineer: Technical Humility & Research-Engineering Symbiosis

Greg Brockman (OpenAI President) & Jensen Huang (NVIDIA CEO) on the evolution from AlexNet to 100K GPU clusters, why technical humility matters, and the future of domain-specific AI agents. "If you don't have the idea, you're dead in the water. But if you don't have the engineering, that idea is not going to live and see the light of day." Learn about the 3-phase evolution of AI engineering at OpenAI, the cultural divide between engineers and researchers, and predictions for AGI-era development workflows.

Greg Brockman, Jensen Huang

•AI Engineer Summit•

Nov 15, 2024

Greg-Brockman

Jensen-Huang

OpenAI

NVIDIA

+12 more

The Next Unicorns: 7 Top AI Startups from HF0 Residency

Real Revenue, Validated Models: 25M Users, $100M ARR Across Portfolio. Meet Krea, OpenHome, Koframe, Federous AI, Upside, OpenAudio, Glow, Favored, and OpenRouter.

Diego Rodriguez, Sua, Josh, Eugene, Jonas, David Vorick, David, Alex Atala

•HF0 Residency Demo Day•

Dec 30, 2024

HF0

AI-startups

unicorns

venture-capital

+13 more

The AI Developer Experience Doesn't Have to Suck

Why and How Modal Rebuilt Cloud Infrastructure from Scratch

Erik Bernhardsson

•AI Engineer Summit•

Jan 1, 2024

serverless-gpu

container-cold-start

memory-snapshotting

modal

+2 more

AI Native Company

•

Jan 1, 2024

AMP Code: Next Generation AI Coding

•

Jan 1, 2024

What Data from 20M Pull Requests Reveal About AI Transformation

Jellyfish Research: 2x Throughput, 24% Faster Cycles, and the Architecture Correlation That Determines AI Success (4x vs 0x Gains)

Nicholas Arcolano

•AI Engineer Conference (2024)•

Dec 30, 2024

AI-transformation

Jellyfish

pull-request-analytics

GitHub-Copilot

+10 more

Shipping AI That Works: An Evaluation Framework for PMs

LLM-as-Judge Methodology with 4 Components: Role, Task, Context, Goal. Why Even OpenAI and Anthropic CPOs Say Models Hallucinate. Transition from Vibe Coding to Thrive Coding in Production.

Aman Khan

•AI Engineer Conference•

Oct 29, 2024

AI-evaluation

LLM-as-judge

vibe-coding

thrive-coding

+14 more
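
The four components named in the summary above (Role, Task, Context, Goal) can be laid out as sections of a judge prompt. The sketch below shows one possible layout. The function name, the section formatting, and the example text are all hypothetical, and no particular model API is assumed.

```python
def build_judge_prompt(role: str, task: str, context: str, goal: str) -> str:
    """Assemble an LLM-as-judge prompt from the four named components."""
    sections = [("Role", role), ("Task", task), ("Context", context), ("Goal", goal)]
    return "\n\n".join(f"{name}:\n{text.strip()}" for name, text in sections)


# Hypothetical example content for each component
prompt = build_judge_prompt(
    role="You are a strict reviewer of customer-support answers.",
    task="Decide whether the answer is supported by the provided context.",
    context="Question: ...\nRetrieved passage: ...\nAnswer: ...",
    goal="Reply with exactly one label: 'grounded' or 'hallucinated'.",
)

print(prompt)
```

Keeping the four parts as separate arguments makes it easy to vary one of them, for example the goal's label set, while holding the others fixed during evaluation.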

Architecting Agent Memory: Principles, Patterns, and Best Practices

MongoDB's Guide to Building Stateful AI Agents: Memory Management Lifecycle, Four Memory Types, and Voyage AI Integration

Richmond Alake

•AI Engineer Conference•

Dec 30, 2024

agent-memory

MongoDB

vector-search

RAG

+10 more

Autonomy Is All You Need

How Replit Broke the One-Hour Autonomy Barrier for Non-Technical Users

Michele Catasta

•AI Engineer Summit•

Oct 29, 2024

autonomy

multi-hour-agents

Replit

+5 more

Claude Code Evolution

•

Jan 1, 2024

Claude plays Minecraft! When AI Spontaneously Exhibits Unexpected Behavior

AWS Solutions Architect's live demo of Rocky, a Minecraft bot powered by Claude Haiku and AWS Bedrock, showcasing emergent AI behaviors including autonomous escape from a hole, 3D spatial reasoning, and the critical Return of Control pattern for production agents

AWS Solutions Architect

•AI Engineer Summit•

Oct 29, 2024

emergent-behavior

Claude-Haiku

AWS-Bedrock

Minecraft

+11 more

Compilers Age of LLMs

•

Jan 1, 2024

Continual System Prompt Learning for Code Agents

5-15% Improvement with Only 150 Examples - A Practical Alternative to Reinforcement Learning

Aparna Dhinakaran

•AI Engineer Summit 2024•

Dec 29, 2024

system-prompt-learning

code-agents

LLM-optimization

SWE-bench

+4 more

Building Cursor Composer: Fast, Smart, and Parallel

Lee Robinson reveals how Cursor built their first agent model with 4x token efficiency, parallel tool calling breakthrough, and RL infrastructure secrets. Learn about the 3.5x Blackwell speedup, semantic search impact, and vertical integration advantages.

Lee Robinson

•AI Engineer Summit 2024•

Oct 29, 2024

Cursor

Cursor-Composer

Lee-Robinson

parallel-tool-calling

+15 more

Developing Taste in Coding Agents

Meta Neuro-Symbolic RL: 10x PR Increase with Acquired Taste

Ahmad Awais

•AI Engineer Summit•

Dec 29, 2024

meta-neuro-symbolic-rl

taste-models

acquired-taste

reinforcement-learning

+3 more

Devin 2.0 and Moore's Law for AI Agents

Scott Wu's Framework: 70-Day Doubling Cycle - From Tab Completion to Autonomous Engineers in 18 Months. Deep Wiki, Automated Testing, Backlog Processing, and the Future of Software Engineering

Scott Wu

•AI Engineer Summit•

Oct 29, 2024

Devin-2

Scott-Wu

Cognition-AI

Moore's-Law-AI

+13 more

The DevOps Engineer Who Never Sleeps: AI Agents at Datadog

Diamond Bishop from Datadog shares what they learned building AI agents that automate on-call duties, handle incident response, and transform DevOps. Covers evaluation strategies, team composition, LLM observability, and predictions about agents surpassing humans as SaaS users.

Diamond Bishop

•AI Engineer Summit 2024•

Oct 29, 2024

AI-agents

Datadog

on-call-automation

LLM-observability

+13 more

Don't Build Agents, Build Skills

•

Jan 1, 2024

Enterprise Deep Research

•

Jan 1, 2024

Finetuning 500M Agents

•

Jan 1, 2024

Future-Proof Coding Agents

OpenAI Guide to Building AI That Writes Code and Survives Model Evolution

Bill Chen, Brian Fioca

•AI Engineer Summit•

Jan 1, 2024

coding-agents

openai-codex

ai-harness

model-evolution

+2 more

Good Design Hasn't Changed With AI

•

Jan 1, 2024

Hard-Won Lessons: Cline

•

Jan 1, 2024

How Claude Code Works

Architecture Deep Dive: Flexible Loops, Skills System & Comparison with Cursor, AMP, and OpenAI Codex

Jared Zoneraich

•AI Engineer Summit•

Nov 15, 2024

claude-code

agent-architecture

skills-system

flexible-loops

+4 more

How to Look at Your Data

A Practical Guide to Evaluating RAG Systems: Fast Evals, Cluster Analysis, and Data-Driven Decision Making

Jeff Huber, Jason Liu

•AI Engineer Summit 2025•

Dec 30, 2025

RAG-evaluation

fast-evals

cluster-analysis

data-driven-AI

+7 more

Your Personal Open-Source Humanoid Robot for $8,999

How K-Scale Labs Built a $9k Open-Source Humanoid in 5 Months: Sim-to-Real RL, Python SDK, and Democratizing Robotics

JX Mo

•AI Engineer Summit•

Jan 1, 2024

humanoid-robots

open-source

robotics

+4 more

The Infinite Software Crisis: When AI Generates Faster Than We Understand

Jake Nations from Netflix argues that while AI has dramatically accelerated code generation, it has created a dangerous gap between what we can produce and what we can understand. He presents "context compression" as a three-phase solution (Research, Planning, Implementation) to maintain control over complex systems.

Jake Nations

•AI Engineer Summit 2024•

Oct 29, 2024

software-complexity

context-compression

Netflix

AI-code-generation

+11 more

Luminal Python Compiler

•

Jan 1, 2024

How Search Conquered Compiler Complexity

Luminal AI Automatically Rediscovered Flash Attention Using Search-Based Compilation—12 Primitives, E-Graphs, and 3M → 5K Lines of Code

Joe Fioti

•AI Engineer Summit 2024•

Oct 29, 2024

search-based-compilation

deep-learning-compilers

Flash-Attention

e-graphs

+8 more

Making Codebases Agent Ready

Organizational Readiness for Autonomous AI Development

Eno Reyes

•AI Engineer Summit 2024•

Jan 1, 2024

ai-agents

codebase-readiness

verification

autonomous-development

+1 more

Production Software Keeps Breaking and It Will Only Get Worse

Why AI Writes Code Faster But Debugging Gets Harder - Three-Part Framework: Causal ML + LLMs + Swarms of Agents. DigitalOcean Case Study: 40% MTTR Reduction

Anish Agarwal, Matt

•AI Engineer Conference•

Dec 30, 2024

production-reliability

AI-debugging

causal-ML

swarm-intelligence

+11 more

Paying Engineers Like Salespeople

•

Jan 1, 2024

Poolside's Path to AGI

Reinforcement Learning, Defense & Vertical Integration

Jason Warner, Eiso Kant

•AI Engineer Summit•

Jan 1, 2024

agi

reinforcement-learning

defense-ai

vertical-integration

+1 more

Reliable Enterprise Agents

•

Jan 1, 2024

Serving Voice AI at $1/hr: Open-source, LoRAs, Latency, Load Balancing

Neil Dwyer reveals how to achieve $1/hr voice AI costs using Orpheus model, LoRA fine-tuning, vLLM with FP8, and consistent hash load balancing

Neil Dwyer

•AI Engineer Summit•

Oct 29, 2024

voice-AI

Orpheus

LoRA

vLLM

+8 more
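
The summary above mentions consistent hash load balancing. The sketch below shows the generic technique, not the speaker's implementation. The server names and `lora-N` keys are hypothetical, and routing by LoRA adapter ID so that an adapter keeps landing on the same replica is an assumed use case.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=100):
        # Each physical node is placed on the ring at `vnodes` hashed positions.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        # First 8 bytes of SHA-256 give a stable 64-bit ring position.
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def node_for(self, key: str) -> str:
        # Route to the first virtual node clockwise from the key's position.
        idx = bisect.bisect_right(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


servers = ["gpu-0", "gpu-1", "gpu-2"]          # hypothetical replicas
keys = [f"lora-{i}" for i in range(1000)]      # hypothetical adapter IDs

ring = HashRing(servers)
before = {k: ring.node_for(k) for k in keys}

# Simulate losing one replica and rebuilding the ring without it.
smaller = HashRing(["gpu-0", "gpu-1"])
after = {k: smaller.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]

print(f"{len(moved)} of {len(keys)} keys moved after removing gpu-2")
```

The property that matters for load balancing is that only keys previously assigned to the removed server are remapped. Keys on the surviving servers stay where they were.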

Skills vs Agents

•

Jan 1, 2024

The Unbearable Lightness of Agent Optimization

Why Your Agent Optimization is Failing (And How to Fix It)

Alberto Romero

•AI Engineer Summit•

Nov 15, 2024

agent-optimization

meta-ac

weak-reflector

production-ai

Unlocking AI Powered DevOps Within Your Organization

Practical Patterns from GitHub: Realistic Metrics (30% Average), IDE Integration Over Chat Tools, Human-in-the-Loop Autonomous Agents

Jon Peck

•AI Engineer Summit•

Dec 30, 2024

DevOps

AI-in-DevOps

GitHub-Copilot

IDE-integration

+5 more

Coding Evals: From Code Snippets to Codebases

How AI Code Evaluation Evolved from Single Functions to Hour-Long Challenges—and Why 30% is Reward Hacking

Naman Jain

•AI Engineer Summit•

Dec 30, 2024

coding-evals

reward-hacking

data-contamination

CodeBench

+5 more

ChatGPT is poorly designed. So I fixed it

Multimodal Voice + Text Integration Using GPT-4o Realtime API - "Shipping the Org Chart" Anti-Pattern and the FaceTime + iMessage Solution

•AI Engineer World's Fair•

Dec 30, 2024

UX-design

ChatGPT

GPT-4o-Realtime

multimodal-AI

+6 more

Building Conversational AI Agents

Multilingual Architecture with ElevenLabs: 31 Languages Now, 99 Coming in V3, $5M Voice Marketplace, Production-Grade Low-Latency Pipeline

Thor Schaeff

•API Day Singapore•

Dec 30, 2024

conversational-AI

multilingual-AI

voice-technology

ElevenLabs

+12 more

Code World Model: Building World Models for Computation

Jacob Kahn from FAIR Meta presents a revolutionary framework for understanding code through execution modeling. Learn how 32B parameter models trained on execution traces enable semantic understanding, neural debugging, and approximation of undecidable problems like the halting problem—all through bash-first async RL with mid-trajectory model updates.

Jacob Kahn

•AI Engineer Summit•

Oct 29, 2024

world-models

code-execution

FAIR-Meta

Jacob-Kahn

+12 more

The Cure for the Vibe Coding Hangover

Systematic Framework for Reliable AI-Augmented Development: 5-Step Planning Phase, Multi-Sensory Feedback Loop, Binary Dependencies, and Circular Resolution Strategies That Transform AI Agents from Erratic Novices into Predictable Implementation Partners

Corey J. Gallon

•AI Engineer Conference•

Dec 30, 2024

vibe-coding

AI-agents

software-architecture

dependency-management

+13 more

War on Slop

•

Jan 1, 2024

Enterprise Ready MCP: The Complete Guide

From Localhost to Production: Taking Model Context Protocol Servers from Demo to Enterprise Deployments - Security Challenges, Compliance Requirements, and Implementation Realities

Tobin South

•AI Engineer Summit•

Oct 29, 2024

MCP

Model-Context-Protocol

enterprise-AI

AI-security

+9 more

LinkedIn 360Brew: One Model to Replace All Recommendation Systems

How LinkedIn replaced dozens of specialized models with a single LLM—achieved 7x latency reduction and 30x throughput improvement through promptification and model distillation

Hamed, Maziar

•AI Engineer Summit•

Dec 30, 2024

360Brew

LLM

recommendation-systems

+12 more

Best Practices for Evaluating LLM Applications with llmeval

Niklas Nielsen from Log10 introduces llmeval - a command-line tool for reliable LLM evaluation built on Meta's Hydra. Learn about flexible test criteria for fuzzy model outputs, Python-based metrics, model-based evaluation with its pitfalls (self-preference bias, score inflation), and the innovative "Auto-John" concept for scaling human feedback through AI personas.

Niklas Nielsen

•AI Engineer Conference•

Oct 29, 2024

llmeval

Log10

Niklas-Nielsen

LLM-evaluation

+14 more

Netflix Foundation Model: One Model to Rule All Recommendations

How Netflix proved scaling laws apply to recommendation systems—applying LLM techniques like multi-token prediction, long-context training, semantic embeddings, and rich multi-task objectives to achieve infrastructure consolidation and quality improvements

Yesu Feng

•AI Engineer Summit•

Dec 30, 2025

Netflix

recommendation-systems

foundation-model

LLM-techniques

+12 more

Leadership in AI Assisted Engineering

Justin Reock shares aggregated data from 140,000 engineers revealing extreme variability in AI impact (+20% to -20%) and provides a leadership framework for successful AI adoption. Learn why writing code has never been the bottleneck, how Theory of Constraints applies to AI, and the 7 leadership principles that separate +20% outcomes from -20% outcomes.

Justin Reock

•AI Engineer Summit•

Oct 29, 2024

AI-leadership

Atlassian

software-engineering-productivity

+11 more

Government Agents: AI Agents Meet Tough Regulations

Mark Myshatyn from Los Alamos National Lab reveals how one of the most secure government organizations is deploying AI agents that design fusion capsules and execute code on HPC systems. Learn about 1000+ security controls, FedRAMP compliance, OpenAI models on classified networks, Venado supercomputer (2500+ GraceHopper nodes), and four architecture principles for government-ready AI. "We are not a t-shirt company...People can die if we do this wrong."

Mark Myshatyn

•AI Engineer Conference•

Oct 29, 2024

government-AI

Los-Alamos-National-Lab

AI-regulations

FedRAMP

+17 more
