Compilers in the Age of LLMs
"You've got an open client in your codebase. You've got a few Hugging Face tabs open. You've got three different repos with the word playground in them. And you've got at least one agentic workflow that's really just stringing together a bunch of HTTP calls."
Yusuf Olokoba
Muna • 00:00:00
The AI Engineering Crisis
Why current infrastructure is broken
Complexity That Only Grows Over Time
"What developers actually want is something way simpler. Just give me an OpenAI-style client that just works. Let me point it to any model at all. It doesn't matter if it's running locally, if it's running remotely, if it's Llama.cpp or TensorRT. I just want something that works with minimal code changes."
— Yusuf Olokoba (00:01:22)
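As a rough sketch of that experience, the snippet below points the standard OpenAI Python client at different OpenAI-compatible backends, so switching between a local server and a hosted model is a configuration change rather than new infrastructure. The base URLs, API keys, and model name are placeholders, not values from the talk.

```python
# Sketch of the "one client, any model" experience: the stock OpenAI client
# pointed at OpenAI-compatible backends. URLs, keys, and model names are
# placeholders.
from openai import OpenAI

local = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")   # e.g. a llama.cpp or vLLM server
cloud = OpenAI(base_url="https://api.example.com/v1", api_key="sk-...")  # hosted endpoint

def ask(client: OpenAI, model: str, prompt: str) -> str:
    """Same call shape regardless of where the model actually runs."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Switching models or backends is a one-line change, not a rebuild.
print(ask(local, "llama-3.1-8b", "Summarize this release note."))
```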
❌ Docker Container Hell
Every new model means writing Dockerfiles, spinning up containers, and managing infrastructure
❌ Tool Context Overflow
Each model becomes another tool in the agent's context, bloating prompts
❌ Infrastructure Rebuild
"How do I use more models in more places without having to rebuild or extend my infrastructure every single time?"
❌ Hardware Lock-in
Models are tied to specific hardware and can't run anywhere else
The Future: Hybrid Inference
"We expect that in the future we will see smaller models typically much closer to users either locally on their devices or in edge locations working with cloud AI models that are much larger and have bigger reasoning abilities."
— Yusuf Olokoba (00:03:31)
🔵 Edge/Local Models
Smaller, faster, closer to users
🟣 Cloud Models
Larger, more reasoning power
The Implication: "This means that developers have to move away from the cages of Python code and Docker containers into something that is a lot more low-level, closer to the hardware, and a lot more responsive."
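One way to picture that hybrid split is a small routing layer: quick requests stay on a local model, and anything that needs deeper reasoning goes to a larger cloud model. The heuristic and the stand-in callables below are assumptions for illustration, not a prescription from the talk.

```python
# Hypothetical routing policy for hybrid inference: small model at the edge
# for quick requests, large cloud model when more reasoning is needed.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Route:
    name: str
    generate: Callable[[str], str]  # any callable that maps prompt -> text

def make_router(edge: Route, cloud: Route, max_edge_tokens: int = 256):
    def route(prompt: str, needs_deep_reasoning: bool = False) -> str:
        # Crude heuristic: long prompts or an explicit flag go to the cloud.
        if needs_deep_reasoning or len(prompt.split()) > max_edge_tokens:
            return cloud.generate(prompt)
        return edge.generate(prompt)
    return route

# Usage with stand-in callables (real ones would wrap local/remote clients).
router = make_router(
    edge=Route("local-small", lambda p: f"[edge] {p[:40]}..."),
    cloud=Route("cloud-large", lambda p: f"[cloud] {p[:40]}..."),
)
print(router("Classify this short support ticket."))
print(router("Plan a multi-step data migration...", needs_deep_reasoning=True))
```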
The Solution: A Python Compiler
Write simple Python, get self-contained binaries that run anywhere
"Simple Plain Python → Tiny Self-Contained Binary"
"I'll walk you through how we decided to build a compiler for Python that enables developers to write simple plain Python code and then convert that into a tiny self-contained binary that can then run anywhere at all."
— Yusuf Olokoba (00:01:43)
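The developer-facing half of that workflow looks like ordinary Python. The compile step in the comments below is a hypothetical illustration of the "Python in, self-contained binary out" idea, not Muna's documented interface.

```python
# Plain Python business logic: no serving framework, no Dockerfile.
def sentiment(text: str) -> str:
    """Nothing here is container- or hardware-specific."""
    return "positive" if "great" in text.lower() else "negative"

# Imagined compiler entry point (assumption, not a real API):
#   compile_to_binary(sentiment, target="aarch64-apple-darwin")
# would trace the function, lower it to C++/Rust, and emit one
# dependency-free artifact. The pipeline sections below break that
# process down step by step.
print(sentiment("This talk was great"))
```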
Universal Portability
"Simply because every piece of technology that you've ever touched has a C or C++ compiler. This is what gives us the ability to take high-level Python code and convert it into a form that is self-contained and that can now run anywhere at all."
— Yusuf Olokoba (00:14:25)
Cloud, Apple silicon, Linux, Windows, edge devices — if it has a C/C++ compiler, your model can run there
Write Python
Simple, familiar Python code — no low-level expertise needed
Get Binary
Self-contained executable with zero dependencies
Run Anywhere
Cloud, edge, local — any hardware with a C compiler
Minimal Changes
Switch models without rebuilding infrastructure
How It Works: The Compiler Pipeline
From Python code to universal binary in four steps
Tracing
Analyze Python code using its Abstract Syntax Tree (AST) to build an internal representation
Challenge: PyTorch FX tracing failed because dynamic Python is hard to trace
Solution: Custom AST tracer with internal heuristics
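A minimal sketch of AST-based tracing with Python's standard `ast` module: parse a function's source and record the calls and operations it performs. A production tracer would layer many more heuristics on top of this.

```python
# Toy AST tracer: collect the calls and binary operations a function uses,
# as a stand-in for the internal representation a compiler front end builds.
import ast
import inspect
import textwrap

class OpCollector(ast.NodeVisitor):
    def __init__(self):
        self.calls: list[str] = []
        self.binops: list[str] = []

    def visit_Call(self, node: ast.Call):
        self.calls.append(ast.unparse(node.func))
        self.generic_visit(node)

    def visit_BinOp(self, node: ast.BinOp):
        self.binops.append(type(node.op).__name__)
        self.generic_visit(node)

def trace(fn):
    source = textwrap.dedent(inspect.getsource(fn))
    collector = OpCollector()
    collector.visit(ast.parse(source))
    return collector

def normalize(x: list[float]) -> list[float]:
    total = sum(x)
    return [v / total for v in x]

ops = trace(normalize)
print(ops.calls)   # ['sum']
print(ops.binops)  # ['Div']
```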
Type Propagation
Bridge Python's dynamic typing with the static type systems of C++ and Rust
"Python is a very dynamic language. So one variable X could be assigned to an integer and then immediately after assigned to say a string. Whereas in lower level languages like C++ and Rust if you declare a variable you must give it a type and that type can never change."
— Yusuf Olokoba (00:08:41)
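The toy pass below shows the flavor of type propagation: start from annotated parameter types, walk the AST to assign a static type to each expression, and reject rebinding a variable to a different type. The mapping to native types and the promotion rule are simplifying assumptions.

```python
# Toy type-propagation pass: infer static types for simple assignments from
# parameter annotations, and reject type-changing rebinds.
import ast

PY_TO_NATIVE = {"int": "int64_t", "float": "double", "str": "std::string"}

def propagate_types(source: str) -> dict[str, str]:
    fn = ast.parse(source).body[0]
    assert isinstance(fn, ast.FunctionDef)
    env: dict[str, str] = {arg.arg: arg.annotation.id for arg in fn.args.args}

    def infer(node: ast.expr) -> str:
        if isinstance(node, ast.Name):
            return env[node.id]
        if isinstance(node, ast.Constant):
            return type(node.value).__name__
        if isinstance(node, ast.BinOp):
            left, right = infer(node.left), infer(node.right)
            return "float" if "float" in (left, right) else left  # naive promotion
        raise NotImplementedError(ast.dump(node))

    for stmt in fn.body:
        if isinstance(stmt, ast.Assign) and isinstance(stmt.targets[0], ast.Name):
            name, inferred = stmt.targets[0].id, infer(stmt.value)
            if name in env and env[name] != inferred:
                raise TypeError(f"{name}: {env[name]} cannot become {inferred}")
            env[name] = inferred
    return {name: PY_TO_NATIVE.get(t, t) for name, t in env.items()}

src = """
def scale(x: int, factor: float):
    y = x * factor
    z = y + 1
    return z
"""
print(propagate_types(src))
# {'x': 'int64_t', 'factor': 'double', 'y': 'double', 'z': 'double'}
```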
LLM Code Generation
Mass-produce native code translations using LLMs
"We can simply have LLMs generate all the code that we need that translates a function from Python right into C++ and Rust. And so this gives us the ability to basically mass-produce a lot of the operations that we would otherwise have had to manually rewrite ourselves in native code."
— Yusuf Olokoba (00:13:13)
Key Insight: Variety in code comes from combinations, not unique operations
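A sketch of the mass-production idea: for each operation, prompt an LLM to emit a native translation, then keep it only if it passes verification. The prompt text, the model name, and the use of the OpenAI chat API are illustrative assumptions about how such a pipeline could be wired, not Muna's actual tooling.

```python
# Illustrative codegen loop: ask an LLM to translate one Python operation
# into C++, then gate the result behind a verification check before it is
# allowed into the compiler's operation library.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set; any code LLM would do

PROMPT = """Translate this Python function into a standalone C++17 function
with the same name and semantics. Return only the C++ code.

{python_source}
"""

def generate_native_op(python_source: str, model: str = "gpt-4o") -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT.format(python_source=python_source)}],
    )
    return response.choices[0].message.content

def accept(cpp_source: str) -> bool:
    # Placeholder for the verification stage: compile the candidate and run
    # it against the Python reference on sample inputs before accepting it.
    return "TODO" not in cpp_source

op = "def relu(x: float) -> float:\n    return x if x > 0 else 0.0"
cpp = generate_native_op(op)
if accept(cpp):
    print(cpp)
```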
Compilation
Generate self-contained binaries via C/C++ compilers
Result: Single binary file with no Python dependencies
Portability: Runs on any hardware with a C compiler
Invocation: FFI + OpenAI-style client wrapper
"And with this entire system in place, we have just recreated the official OpenAI client, but given it access to any open-source model that we can get into a Python function."
— Yusuf Olokoba (00:16:08)
The Implication:
Universal model compatibility — use any model (local, remote, any format) through a familiar OpenAI-style client interface, without any infrastructure changes
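The snippet below sketches that invocation step, assuming the compiler emits a shared library that exports a C function: `ctypes` loads it, and a thin wrapper gives it a familiar client-style call shape. The file name `model.so` and the symbol `predict` are hypothetical placeholders.

```python
# Hypothetical FFI bridge: load a compiled shared library and expose its
# exported C function behind a small client-style wrapper.
import ctypes

lib = ctypes.CDLL("./model.so")              # placeholder artifact name
lib.predict.argtypes = [ctypes.c_char_p]     # placeholder exported symbol
lib.predict.restype = ctypes.c_char_p

class CompiledModelClient:
    """Minimal client facade over the compiled binary."""

    def create(self, prompt: str) -> str:
        result = lib.predict(prompt.encode("utf-8"))
        return result.decode("utf-8")

client = CompiledModelClient()
print(client.create("Hello from plain Python"))
```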
The LLM Innovation
Using LLMs to mass-produce compiler code — meta!
The Problem: Manual Translation
Converting Python operations to C++/Rust manually is tedious and error-prone. The variety of code combinations is massive.
Challenge: "The variety you'll ever see in source code in the wild is not because there's such a giant volume of these operations... It's actually because you can combine operations in so many different ways."
— Yusuf Olokoba (00:12:07)
The Solution: LLM Mass Production
Use LLMs to automatically generate all the native code translations needed.
"We can simply have LLMs generate all the code that we need that translates a function from Python right into C++ and Rust."
— Yusuf Olokoba (00:13:13)
Why This Approach Works
Operations Are Finite
Core operations in Python are limited — the variety comes from how you combine them
Combinations Are Manageable
LLMs can handle the combinatorial explosion of operation combinations
Verification Ensures Correctness
Generated code is verified and tested before compilation, as sketched below
Mass Production Becomes Possible
What would have taken months of manual work can now be done automatically at scale
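The differential-testing sketch referenced above compares the Python reference implementation with the compiled translation on randomized inputs. `run_compiled` is a hypothetical hook standing in for the FFI call into the generated native code.

```python
# Differential-testing sketch for the verification stage: the original Python
# function and the compiled artifact must agree on the same inputs.
import math
import random

def relu(x: float) -> float:          # Python reference implementation
    return x if x > 0 else 0.0

def run_compiled(name: str, x: float) -> float:
    # Placeholder: a real pipeline would call the compiled C++/Rust
    # translation via FFI here. This stub just mirrors the reference.
    return relu(x)

def verify(reference, compiled_name: str, trials: int = 1000) -> bool:
    for _ in range(trials):
        x = random.uniform(-1e6, 1e6)
        expected = reference(x)
        actual = run_compiled(compiled_name, x)
        if not math.isclose(expected, actual, rel_tol=1e-9, abs_tol=1e-12):
            return False
    return True

assert verify(relu, "relu")
```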
Actionable Takeaways
How to apply these insights today
For AI Engineers
Infrastructure strategy
- Escape Docker hell: Compilers provide universal portability without container overhead
- Support any model: Single codebase works with local, remote, any format
- Enable hybrid inference: Edge + cloud models in the same application
- Reduce tool bloat: No more adding every model as an MCP tool
For Product Teams
Deployment flexibility
- Deploy anywhere: Cloud, edge, customer premises, offline
- Reduce infrastructure costs: No need for separate model servers
- Faster iteration: Switch models without rebuilding infrastructure
- Better performance: Native binaries avoid Python interpreter overhead
For Researchers
Novel applications
- LLMs for compiler code: Use LLMs to mass-produce translation logic
- AST-based analysis: Deep code understanding vs simple text replacement
- Type propagation: Bridge dynamic and static type systems automatically
- Verification frameworks: Ensure LLM-generated code is correct
Future Directions
What's coming next
- Hybrid inference patterns: Small local models + large cloud models working together
- Edge AI revolution: Move beyond server-side to run on any hardware
- Model portability: "Write once, run anywhere" for AI models
- Lower-level development: Move from Python cages to closer-to-hardware approaches
"Simply because every piece of technology that you've ever touched has a C or C++ compiler."
— Yusuf Olokoba, Muna (00:14:25)
This simple fact enables universal portability: take high-level Python code, convert it to self-contained binaries, and run it anywhere — from cloud servers to edge devices.
Video Reference
Compilers in the Age of LLMs
Yusuf Olokoba, Muna
Duration: ~17 min
Event: AI Engineer Conference
Video ID: q2nHsJVy4FE
Speaker: Yusuf Olokoba