AI Engineering Highlights
Comprehensive analysis and insights from the AI Engineer Summit. Deep dives into the latest trends, technologies, and thought leadership in AI engineering.
In-Depth Analyses
95
Comprehensive AI engineering analysis
Expert Speakers
100
Industry leaders and practitioners
Videos Analyzed
461+
Conference talks and presentations
Fact-Checked
100%
Verified and validated insights
All Highlights
Devin 2.0 and Moore's Law for AI Agents: 70-Day Doubling Cycle
Scott Wu from Cognition presents Moore's Law for AI Agents - capabilities double every 70 days (16-64x annually). From tab completion to autonomous engineers in 18 months. Learn the 5-tier evolution framework: migrations → bug fixes → complex debugging → project autonomy, plus technical infrastructure including Playbooks, Deep Wiki, With Search, and integrations with Linear, Jira, and Slack.
Small Bets, Big Impact: Building GenBI at Northwestern Mutual
Asaf Bord shares how a 160-year-old Fortune 100 insurance company built GenBI using 6-week sprints, continuous plug-pulling rights, and incremental value delivery. Learn the crawl-walk-run adoption strategy, why 80% of BI work is report routing (not SQL generation), and the honest assessment that executive-ready AI may never arrive.
7 Habits of Highly Effective GenAI Evaluations: AWS Framework for Production AI
Justin Muller, Principal Applied AI Architect at AWS, reveals the battle-tested 7 Habits framework that transformed document processing from 22% to 92% accuracy in 6 months. Learn why evals are the missing piece to scaling GenAI, the 30-second rule for rapid iteration, and how to build evaluation systems that enable production deployment with real-world case studies and practical implementation guidance.
Form Factors for Your New AI Coworkers: A Design Framework
Craig Wattrus from Flatfile presents a four-form-factor framework for AI coworkers: Invisible, Ambient, Inline, and Conversational. Learn why traditional design processes fail with LLMs and how playful experimentation leads to better AI products through "feeling the material" and character coaching over control.
How Bolt.new Scaled $0-20M ARR in 60 Days with 15 People
Eric Simons shares how Bolt.new went from near-shutdown to $20M+ ARR in 60 days with just 15 people. Learn the Spartan mentality, community strategies, AI-powered support, and team culture that made it possible.
Llama 3 at 1,000 tokens/s on SambaNova AI Platform
Full workshop on achieving unprecedented Llama 3 inference speeds of 1,000 tokens/second using SambaNova's Composition of Experts architecture and custom RDU hardware. Includes hands-on RAG implementation with LlamaIndex, ChromaDB, and performance benchmarks (16 chips vs 576, full precision).
Why Cisco Ditched RAG for Fine-Tuning in Production AI Agents
Ola Mabadeje from Cisco's Outshift group reveals how they built a 5-agent system for network change management, why fine-tuning beat RAG for knowledge graph queries (drastic token reduction), and how knowledge graphs serve as digital twins for safe testing before production. Complete architecture with real quotes.
Ship Agents that Ship: Building Production AI Agents with Guardrails
Kyle Penfound and Jeremy Adams from Dagger demonstrate building production-ready AI agents in a hands-on workshop. Learn guardrails, container-native development, GitHub integration, function calling patterns, and practical production-ready agent architecture.
From Arc to Dia: How The Browser Company Built AI-Native Tools
Samir Mody from The Browser Company shares lessons from building Arc and Dia. Learn how they achieved 10x iteration speed, built Jeba for automated prompt optimization, embraced non-engineers as AI developers, and navigated security challenges in AI browsers.
Why Bolt.new Won and Most DevTools AI Pivots Failed
From Death's Door to $100M ARR: The Three-Step Framework, Anti-Patterns to Avoid, and How to Create Categories Instead of Features
Rust is the Language of AGI: Why AI Prefers Rust Over Python
Michael Yuan explains why Rust is the perfect language for AI-generated code. Learn how the Rust compiler serves as a reward function for AI, the MCP tools that automate code generation, and the path to AGI through verifiable code with 1,000+ developers already using Rust Coder in production.
Code World Model: Building World Models for Computation
Jacob Kahn from Meta FAIR presents Code World Model (CWM) - a 32B parameter model that explicitly models program execution dynamics rather than just syntax. Learn how execution tracing, asynchronous RL with mid-trajectory updates, and "neural debugging" enable AI to simulate code execution without running it, effectively approximating solutions to the halting problem.
tldraw.computer: The Visual AI Language That Executes Like Code
Steve Ruiz demos tldraw.computer - a visual programming language where LLMs execute through graph-based nodes. See multimodal computing, self-scripting nodes, and AI as collaborator returning structured shapes, not pixels.
Agent Reinforcement Fine Tuning: OpenAI's Breakthrough in Training AI Agents
Will Hang and Cathy Zhou from OpenAI introduce Agent RFT - the first time models can interact with the outside world during training. Learn how Cognition, Cosine, and Qodo achieved dramatic improvements with as few as 10 examples, the four success principles, and why parallel tool calling reduces latency from 8-10 steps to 4.
AI Copilots for Tech Architecture: The Highest-ROI Use Case
Why Architecture Copilots Deliver Higher ROI Than Coding Copilots: Preventing Costly Mistakes, Justifying Nine-Figure Infrastructure Spends, and Enabling Safe Delegation to Developers
Claude plays Minecraft! Emergent AI Behavior & Agent Engineering
An AWS engineer demonstrates building Rocky, a Minecraft bot powered by Claude Haiku and Amazon Bedrock Agents. Live demo with emergent behavior, real-time gameplay, and practical lessons on architecture evolution from LangChain to Bedrock.
AI Kernel Generation: What's Working, What's Not, What's Next
Natalie Serrino from Gimlet Labs on AI-Driven GPU Optimization: 25-70% Speedups, Agentic Synthesis Swarm, Hardware-in-the-Loop Verification, and the Path Forward for PTX Generation and Formal Verification
Giving a Voice to AI Agents: Voice AI 2.0, Contextual AI, and <500ms Latency
Scott Stephenson, CEO at Deepgram, explains the evolution from Siri-era Voice AI 1.0 to LLM-powered Voice AI 2.0, the Intelligence Revolution timeline (25-30 years), accuracy improvements (75% to 90%+), latency breakthroughs (2-5s to 100-200ms), and how contextual AI passes conversation context to enable human-like voice interactions with <500ms roundtrip latency.
Efficient Reinforcement Learning: Asynchronous Pipeline RL & GPU Optimization
Rhythm Garg and Linden Li from Applied Compute present efficient RL systems for enterprise applications. Learn about asynchronous vs synchronous RL, GPU utilization optimization, staleness trade-offs, system-level modeling, and first-principles optimization for end-to-end performance.
A Year of Gemini Progress + What's Next: 50x Growth, Universal Assistant, and Agentic AI
Logan Kilpatrick from Google DeepMind recaps a transformative year — 10 years of progress in 12 months, 50x inference growth, Gemini 2.5 Pro final update, organizational evolution, and what's next for universal assistant vision, omnimodal models, agentic AI, and developer platform expansions (Embeddings API, Deep Research API, Veo 3, Imagen 4, AI Studio repositioning).
Top Ten Challenges to Reach AGI
Stephen Chin and Andreas Kollegger explore the fundamental obstacles to AGI through science fiction memes—from Memento's memory problem to The Matrix's simulation control. A concise 4-minute lightning talk covering memory limitations, alignment problems, transparency issues, and the ultimate question: do we know what to ask AGI?
Taxonomy for Next-Gen Reasoning: Why AI Gains Aren't Free
Nathan Lambert's Four-Pillar Framework: Skills, Calibration, Strategy, Abstraction—10-100x Token Waste Problem, Post-Training RL Compute Revolution (1% → 10%+)
2025: LLMs, Pelicans & Bicycles
Why Agent Engineering
swyx Landmark Keynote: Why 2025 is the Year of Agents - 6 Enabling Factors, Agent Definitions, PMF Use Cases, ChatGPT Growth Analysis, and the Evolution of AI Engineering as a Discipline
Latent Space Paper Club: DeepSeek R1/V3 and Test Time Compute
8B = 235B Distillation Breakthrough, Doubled Reasoning Tokens, and the New Scaling Paradigm from Chinchilla to Inference
Agentic GraphRAG: AI's Logical Edge
Neo4j MCP Tools, GraphRAG Architecture, and Enterprise Case Study with 85% Adoption
Anchoring Enterprise GenAI with Knowledge Graphs: 75% Faster Onboarding
Pfizer & Neo4j Case Study: How GraphRAG Achieved 3 Months → 3 Weeks with Knowledge Graphs. Real Enterprise Lessons on Technology Transfer, Workforce Knowledge Retention Crisis (20 Years → 3 Years Tenure), and Navigating Organizational Politics.
AI Engineering at Jane Street - Building AI Tools in OCaml
John Crepezzi shares how Jane Street builds custom AI infrastructure when off-the-shelf tools won't work. Learn workspace snapshotting, Code Evaluation Service (CES) running 50-100x faster than builds, Aid sidecar architecture for multi-editor support, and why they have more OCaml code than exists publicly worldwide.
AI Agents, Meet Test Driven Development
Why TDD is Critical for Reliable AI Agents: L0-L4 Agentic Workflow Framework, Evaluation Loops, and SEO Agent Demo with 60% Performance Improvement
2025 is the Year of Evals!
Why AI Evaluation Finally Breaks Through: Three Converging Forces, C-Suite Alignment & Market Validation
2026: The Year The IDE Died
Why Vibe Coding Will Transform Software Development
The Price of Intelligence: AI Agent Pricing in 2025
Comprehensive analysis of AI agent pricing models, cost structures, and the economics of intelligence — outcome-based pricing, prepaid credits, cost optimization strategies, and 2025 predictions from 13+ real company examples
BlackRock: 8 Months → 2 Days
How to Build Custom Knowledge Apps at Scale - Human-in-the-Loop Design, LLM Strategies, and Why Autonomous Agents Don't Work in Finance
How to Build World-Class AI Products
Sarah Sachs (AI Lead, Notion) and Carlos Esteban (Braintrust) share their evaluation-first approach to building AI products. Learn why Notion spends 90% of time on evaluation and 10% on prompting, with practical guidance on data management, trace-based debugging, user feedback analysis, multi-turn conversation evaluation, and production monitoring with online scoring.
Five Hard Earned Lessons About Evals: Why Braintrust Ships 2 Weeks After New Model Releases
Ankur Goyal (CEO, Braintrust) shares hard-earned lessons: YAML vs JSON (15% token savings), GPT-4o 10% → Claude 4 Sonnet viable (6x better), Notion's 24-hour model integration, continuous reconciliation, and why great evals must be engineered like any other software system.
12-Factor Agents: Building Reliable LLM Applications
Production AI Methodology from HumanLayer - Transform Unreliable Demos into Dependable Systems
3 Ingredients for Building Reliable Enterprise Agents
The Mathematical Formula for Agent Success: P(success) × Value - Cost(failure) > Cost(running)
Agents are Robots Too
What Self-Driving Taught Me About Building Agents: Agentics, 1% vs 99% Problem, and Closed-Loop Systems
AI Code Quality: Hype vs Reality
AI Consulting in Practice
AI Music Generation: From Prompt to Production
Hands-on workshop exploring AI music generation tools (Udio, Suno, Stable Audio), voice cloning (RVC), stem separation (Wave, UVR5), and the RIAA legal battle. Learn practical workflows for generating professional-quality music from text prompts.
AI + Security & Safety: Why Your Agent Can't Go to Production
The Single-Process Security Flaw, Real-World Production Blocker, and Three-Layer Defense Framework from Apache Ranger's Creator - Don Bosco Durai, Priv
Vibes Won't Cut It
Production Reality vs. AI Hype in Software Engineering - Why Professional Engineers Are Skeptical and What Actually Works
MongoDB Atlas Vector Search: RAG Without the Complexity
Unified Platform: HNSW Algorithm, Search Nodes for Independent Scaling, Framework Integrations (LangChain, LlamaIndex), and Production-Ready RAG with 4,096 Dimensions Support
Building in the Gemini Era
Google DeepMind's Vision for AI-Assisted Development
#define AI Engineer: Technical Humility & Research-Engineering Symbiosis
Greg Brockman (OpenAI President) & Jensen Huang (NVIDIA CEO) on the evolution from AlexNet to 100K GPU clusters, why technical humility matters, and the future of domain-specific AI agents. "If you don't have the idea, you're dead in the water. But if you don't have the engineering, that idea is not going to live and see the light of day." Learn about the 3-phase evolution of AI engineering at OpenAI, the cultural divide between engineers and researchers, and predictions for AGI-era development workflows.
The Next Unicorns: 7 Top AI Startups from HF0 Residency
Real Revenue, Validated Models: 25M Users, $100M ARR Across Portfolio. Meet Krea, OpenHome, Koframe, Federous AI, Upside, OpenAudio, Glow, Favored, and OpenRouter.
The AI Developer Experience Doesn't Have to Suck
Why and How Modal Rebuilt Cloud Infrastructure from Scratch
AI Native Company
AMP Code: Next Generation AI Coding
What Data from 20m Pull Requests Reveal About AI Transformation
Jellyfish Research: 2x Throughput, 24% Faster Cycles, and the Architecture Correlation That Determines AI Success (4x vs 0x Gains)
Shipping AI That Works: An Evaluation Framework for PMs
LLM-as-Judge Methodology with 4 Components: Role, Task, Context, Goal. Why Even OpenAI and Anthropic CPOs Say Models Hallucinate. Transition from Vibe Coding to Thrive Coding in Production.
Architecting Agent Memory: Principles, Patterns, and Best Practices
MongoDB's Guide to Building Stateful AI Agents: Memory Management Lifecycle, Four Memory Types, and Voyage AI Integration
Autonomy Is All You Need
How Replit Broke the One-Hour Autonomy Barrier for Non-Technical Users
Claude Code Evolution
Claude plays Minecraft! When AI Spontaneously Exhibits Unexpected Behavior
AWS Solutions Architect's live demo of Rocky, a Minecraft bot powered by Claude Haiku and AWS Bedrock, showcasing emergent AI behaviors including autonomous escape from a hole, 3D spatial reasoning, and the critical Return of Control pattern for production agents
Compilers Age of LLMs
Continual System Prompt Learning for Code Agents
5-15% Improvement with Only 150 Examples - A Practical Alternative to Reinforcement Learning
Building Cursor Composer: Fast, Smart, and Parallel
Lee Robinson reveals how Cursor built their first agent model with 4x token efficiency, parallel tool calling breakthrough, and RL infrastructure secrets. Learn about the 3.5x Blackwell speedup, semantic search impact, and vertical integration advantages.
Developing Taste in Coding Agents
Meta Neuro-Symbolic RL: 10x PR Increase with Acquired Taste
Devin 2.0 and Moore's Law for AI Agents
Scott Wu's Framework: 70-Day Doubling Cycle - From Tab Completion to Autonomous Engineers in 18 Months. Deep Wiki, Automated Testing, Backlog Processing, and the Future of Software Engineering
The DevOps Engineer Who Never Sleeps: AI Agents at Datadog
Diamond Bishop from Datadog shares what they learned building AI agents that automate on-call duties, handle incident response, and transform DevOps. Covers evaluation strategies, team composition, LLM observability, and predictions about agents surpassing humans as SaaS users.
Don't Build Agents, Build Skills
Enterprise Deep Research
Finetuning 500M Agents
Future-Proof Coding Agents
OpenAI Guide to Building AI That Writes Code and Survives Model Evolution
Good Design Hasn't Changed With AI
Hard-Won Lessons: Cline
How Claude Code Works
Architecture Deep Dive: Flexible Loops, Skills System & Comparison with Cursor, AMP, and OpenAI Codex
How to Look at Your Data
A Practical Guide to Evaluating RAG Systems: Fast Evals, Cluster Analysis, and Data-Driven Decision Making
Your Personal Open-Source Humanoid Robot for $8,999
How K-Scale Labs Built a $9k Open-Source Humanoid in 5 Months: Sim-to-Real RL, Python SDK, and Democratizing Robotics
The Infinite Software Crisis: When AI Generates Faster Than We Understand
Jake Nations from Netflix argues that while AI has dramatically accelerated code generation, it has created a dangerous gap between what we can produce and what we can understand. He presents "context compression" as a three-phase solution (Research, Planning, Implementation) to maintain control over complex systems.
Luminal Python Compiler
How Search Conquered Compiler Complexity
Luminal AI Automatically Rediscovered Flash Attention Using Search-Based Compilation—12 Primitives, E-Graphs, and 3M → 5K Lines of Code
Making Codebases Agent Ready
Organizational Readiness for Autonomous AI Development
Production Software Keeps Breaking and It Will Only Get Worse
Why AI Writes Code Faster But Debugging Gets Harder - Three-Part Framework: Causal ML + LLMs + Swarms of Agents. DigitalOcean Case Study: 40% MTTR Reduction
Paying Engineers Like Salespeople
Poolside's Path to AGI
Reinforcement Learning, Defense & Vertical Integration
Reliable Enterprise Agents
Serving Voice AI at $1/hr: Open-source, LoRAs, Latency, Load Balancing
Neil Dwyer reveals how to achieve $1/hr voice AI costs using the Orpheus model, LoRA fine-tuning, vLLM with FP8, and consistent hash load balancing.
Skills vs Agents
The Unbearable Lightness of Agent Optimization
Why Your Agent Optimization is Failing (And How to Fix It)
Unlocking AI Powered DevOps Within Your Organization
Practical Patterns from GitHub: Realistic Metrics (30% Average), IDE Integration Over Chat Tools, Human-in-the-Loop Autonomous Agents
Coding Evals: From Code Snippets to Codebases
How AI Code Evaluation Evolved from Single Functions to Hour-Long Challenges—and Why 30% is Reward Hacking
ChatGPT is poorly designed. So I fixed it
Multimodal Voice + Text Integration Using GPT-4o Realtime API - "Shipping the Org Chart" Anti-Pattern and the FaceTime + iMessage Solution
Building Conversational AI Agents
Multilingual Architecture with ElevenLabs: 31 Languages Now, 99 Coming in V3, $5M Voice Marketplace, Production-Grade Low-Latency Pipeline
Code World Model: Building World Models for Computation
Jacob Kahn from Meta FAIR presents a revolutionary framework for understanding code through execution modeling. Learn how 32B parameter models trained on execution traces enable semantic understanding, neural debugging, and approximation of undecidable problems like the halting problem, all through bash-first async RL with mid-trajectory model updates.
The Cure for the Vibe Coding Hangover
Systematic Framework for Reliable AI-Augmented Development: 5-Step Planning Phase, Multi-Sensory Feedback Loop, Binary Dependencies, and Circular Resolution Strategies That Transform AI Agents from Erratic Novices into Predictable Implementation Partners
War on Slop
Enterprise Ready MCP: The Complete Guide
From Localhost to Production: Taking Model Context Protocol Servers from Demo to Enterprise Deployments - Security Challenges, Compliance Requirements, and Implementation Realities
LinkedIn 360Brew: One Model to Replace All Recommendation Systems
How LinkedIn replaced dozens of specialized models with a single LLM—achieved 7x latency reduction and 30x throughput improvement through promptification and model distillation
Best Practices for Evaluating LLM Applications with llmeval
Niklas Nielsen from Log10 introduces llmeval - a command-line tool for reliable LLM evaluation built on Meta's Hydra. Learn about flexible test criteria for fuzzy model outputs, Python-based metrics, model-based evaluation with its pitfalls (self-preference bias, score inflation), and the innovative "Auto-John" concept for scaling human feedback through AI personas.
Netflix Foundation Model: One Model to Rule All Recommendations
How Netflix proved scaling laws apply to recommendation systems—applying LLM techniques like multi-token prediction, long-context training, semantic embeddings, and rich multi-task objectives to achieve infrastructure consolidation and quality improvements
Leadership in AI Assisted Engineering
Justin Reock shares aggregated data from 140,000 engineers revealing extreme variability in AI impact (+20% to -20%) and provides a leadership framework for successful AI adoption. Learn why writing code has never been the bottleneck, how Theory of Constraints applies to AI, and the 7 leadership principles that separate +20% outcomes from -20% outcomes.
Government Agents: AI Agents Meet Tough Regulations
Mark Myshatyn from Los Alamos National Lab reveals how one of the most secure government organizations is deploying AI agents that design fusion capsules and execute code on HPC systems. Learn about 1000+ security controls, FedRAMP compliance, OpenAI models on classified networks, the Venado supercomputer (2500+ Grace Hopper nodes), and four architecture principles for government-ready AI. "We are not a t-shirt company... People can die if we do this wrong."