Deep analysis of 200+ video transcripts revealing the real patterns of AI transformation—from Fortune 500 enterprises to AI-native startups. What works, what fails, and why the gap between leaders and laggards is widening.
The most conservative, low-tech industries—legal, healthcare, insurance—are adopting AI faster than tech companies. Why? They have clear high-value use cases and less legacy AI baggage.
Harvey (Legal AI)
>$70M
Reached >$70M ARR within two years. "AI is now essential to being competitive in the legal industry."
Sierra (Customer Service)
70%
Of customer queries automated for SiriusXM and ADT. Sierra's CEO is also chairman of OpenAI's board.
The Adoption Chasm
10x
Difference in effectiveness between 90% and 100% AI adoption.
The fundamental economic shift has occurred: code generation is no longer the bottleneck; context management and orchestration are now the primary challenges.
Professional services shifting from junior-heavy teams doing grunt work to AI-amplified senior experts who can scale their decades of experience through AI agents.
Engineers are becoming 'spec writers' rather than 'code writers': they review AI-generated code instead of writing it by hand. The ability to articulate requirements clearly is now more valuable than syntax knowledge.
The best AI products don't require prompts: they anticipate needs through context and workflow understanding. If users must prompt, your product has UX problems.
APIs and documentation are now designed for LLM consumption (llms.txt, curl examples), not just for humans. 'Documentation is a conversation with future AI systems.'
Agents need persistent state and crash recovery. Event-driven architecture is wrong for long-running AI workflows—durable execution (Temporal) is the right abstraction.
These points of disagreement represent the highest-uncertainty, highest-opportunity areas for strategic decision-making.
VIEWPOINT A
Sarah Guo (Conviction): Co-pilots are underrated. Human tolerance for failure decreases as latency increases. Build the 'Iron Man suit' augmentation first, then extend it into an autonomous suit.
VIEWPOINT B
Donald Hruska (Retool): Agents are the $500B opportunity. Vibe coding needs agents to work. The question isn't whether to build agents, but where to use managed platforms versus hand-rolled ones.
VIEWPOINT A
Mark Zuckerberg: AI will handle mid-level engineering work by the end of the year. Entry-level positions are being wiped out.
VIEWPOINT B
Beth Glenfield and others: AI amplifies engineers and creates new roles. We'll have MORE engineering jobs, just different ones. YC is running an AI school for young people.
VIEWPOINT A
Jan Siml: Build in-house once the workflow is yours. 'Stop ordering AI takeout.' We built ours in two weeks and it delivered millions in revenue. Buy for vendor integrations.
VIEWPOINT B
Others: Use SaaS to explore the unknown; build in-house once the workflow is proven. Don't reinvent the wheel for non-core capabilities.
VIEWPOINT A
Jane Street, Meta's CodeCompose: Fine-tuning is essential for niche domains (more OCaml training data exists inside Jane Street than publicly).
VIEWPOINT B
Box: Prefers prompts and orchestration. Models improve too fast to justify fine-tuning investment. Focus on data quality instead.
VIEWPOINT A
John Welsh (Anthropic): MCP is the industry standard. We built an MCP gateway for all internal integrations. 'MCP is just JSON streams.'
VIEWPOINT B
David Cramer (Sentry): MCP is not good yet. It's effectively in beta, and clients break constantly. 'You cannot just proxy OpenAPI—must design for agent context.'
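To make "design for agent context" concrete, here is a minimal sketch using the official Python MCP SDK's FastMCP helper. The server name, tool, and data are hypothetical placeholders; the point is that the tool returns a compact, pre-summarized digest sized for an agent's context window rather than proxying a raw OpenAPI response.

```python
# Minimal MCP server sketch (official Python SDK, FastMCP helper).
# Returns a short, ranked digest instead of dumping raw API JSON into
# the agent's context.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("error-triage")  # hypothetical internal server name

@mcp.tool()
def recent_errors(project: str, limit: int = 5) -> str:
    """Summarize the most recent production errors for a project."""
    # Placeholder data; a real server would query your error tracker here.
    errors = [
        {"title": "TypeError in checkout", "count": 412, "last_seen": "2h ago"},
        {"title": "Timeout calling payments API", "count": 57, "last_seen": "40m ago"},
    ][:limit]
    return "\n".join(
        f"{i + 1}. {e['title']} ({e['count']} events, last seen {e['last_seen']})"
        for i, e in enumerate(errors)
    )

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
```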
VIEWPOINT A
Some say 'don't hire juniors, AI will replace them.' TechCrunch reports that entry-level engineering jobs are being wiped out.
VIEWPOINT B
YC runs an AI school specifically for 2,000 young people. Juniors who leverage AI from day one can contribute at higher levels immediately.
Leadership asks for AI adoption without specific use cases
Can't answer 'what problem does this solve and what's the dollar value?'
Measuring success by F1 scores or NDCG instead of business outcomes
Building complex multi-agent systems for simple problems
Benchmarks show 95% success but real users struggle
Developers can't explain what good looks like
No domain experts on the AI team
Evaluations are expensive, so you skip them or run them infrequently
Relying solely on semantic similarity for retrieval (see the hybrid retrieval sketch after this list)
Hiring AI researchers without domain expertise
Can't switch models because prompts are vendor-specific
Treating time savings as primary ROI rather than revenue or new capabilities
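As a sketch of the retrieval anti-pattern above: one common remedy is to fuse a keyword ranking with the embedding ranking rather than trusting semantic similarity alone. The corpus and both search functions below are placeholders (the "vector" search is a stand-in so the example runs end to end); the fusion step uses standard reciprocal rank fusion.

```python
from collections import defaultdict

# Tiny in-memory corpus standing in for your real indexes (hypothetical data).
DOCS = {
    "doc1": "reset a user password via the admin console",
    "doc2": "rotate API keys and revoke stale tokens",
    "doc3": "password policy and lockout thresholds",
}

def keyword_search(query: str) -> list[str]:
    # Placeholder for BM25 / exact-term search: rank by shared terms.
    q = set(query.lower().split())
    scored = {d: len(q & set(text.lower().split())) for d, text in DOCS.items()}
    return [d for d, s in sorted(scored.items(), key=lambda x: -x[1]) if s > 0]

def vector_search(query: str) -> list[str]:
    # Placeholder for embedding similarity; swap in your vector store here.
    return keyword_search(query)

def hybrid_search(query: str, k: int = 3, rrf_k: int = 60) -> list[str]:
    # Reciprocal rank fusion: merge both rankings without tuning weights.
    scores: dict[str, float] = defaultdict(float)
    for ranking in (vector_search(query), keyword_search(query)):
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] += 1.0 / (rrf_k + rank + 1)
    return sorted(scores, key=scores.__getitem__, reverse=True)[:k]

print(hybrid_search("reset password"))
```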
Real strategies from companies that have successfully scaled AI adoption.
Six-week sprints with tangible deliverables at each phase. Leadership can pull the plug at any point, which eliminates sunk-cost bias.
Example
Northwestern Mutual GenBI: Research → Metadata → Search → Pivoting. After each phase, the business could productize or stop.
Training developers on AI literacy dramatically increases adoption. Daily users see 30% productivity gains vs minimal impact for untrained users.
Example
Booking.com: Training + hackathons transformed non-users into passionate daily users showing 30%+ gains.
Let tax analysts write prompts, let ML engineers focus on quality and metrics. The people who understand the problem should specify the solution.
Example
Intuit: Tax professionals write prompts, ML engineers handle evaluation and quality assurance.
Start with BI experts, then business managers; don't give it to executives until it is highly accurate. Executives can't use AI yet: too many errors.
Example
Northwestern Mutual: BI experts → Business managers → (executives not ready yet).
Use Temporal or a similar durable-execution engine for agent workflows: automatic retries, state persistence, and crash recovery. Events are the wrong abstraction.
Example
Temporal customers: Applications are no longer brought down by mismanaged queues or race conditions.
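A minimal sketch of what this looks like with the Temporal Python SDK (temporalio), assuming a two-step research-style agent. The LLM activity is a placeholder, and executing it still requires a Temporal server plus a Worker process; the point is that each completed step is checkpointed, so crashes and retries don't lose workflow progress.

```python
# Durable agent workflow sketch with the Temporal Python SDK (temporalio).
from datetime import timedelta
from temporalio import activity, workflow

@activity.defn
async def call_llm(prompt: str) -> str:
    # Placeholder: call your model provider here. If this raises or the
    # process dies, Temporal retries the activity automatically.
    return f"[model output for: {prompt[:40]}...]"

@workflow.defn
class ResearchAgentWorkflow:
    @workflow.run
    async def run(self, topic: str) -> str:
        plan = await workflow.execute_activity(
            call_llm,
            f"Plan research steps for: {topic}",
            start_to_close_timeout=timedelta(minutes=2),
        )
        # Each step is checkpointed; a crash here resumes after `plan`.
        return await workflow.execute_activity(
            call_llm,
            f"Write a report following this plan:\n{plan}",
            start_to_close_timeout=timedelta(minutes=10),
        )
```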
Tight feedback loops make users happy → happy users suggest improvements → experiments test those ideas → results drive adoption → adoption generates prioritization data → the flywheel accelerates.
Example
Internal tool: User feedback drove roadmap, healthy competition via leaderboards, managers invested.
GPT-4 is 60x more expensive than 4-mini with minimal quality difference. Better to invest in triggers and data.
Example
Jan Siml: 'When we changed models, the only things that changed were costs and evals. Build for what users need.'
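One way to make model swaps that cheap is to keep prompts and model choice out of vendor-specific code. A minimal sketch, with hypothetical provider adapters stubbed out (wire them to your actual SDK calls):

```python
# Vendor-neutral prompts and model config: swapping models changes config,
# costs, and eval results, not application code.
from dataclasses import dataclass
from typing import Callable

@dataclass
class ModelConfig:
    provider: str          # e.g. "openai", "anthropic" (illustrative values)
    model: str             # model name lives in config, not in code
    max_output_tokens: int = 1024

def _call_openai(cfg: ModelConfig, prompt: str) -> str:
    return "stub"          # placeholder: call your OpenAI client here

def _call_anthropic(cfg: ModelConfig, prompt: str) -> str:
    return "stub"          # placeholder: call your Anthropic client here

PROVIDERS: dict[str, Callable[[ModelConfig, str], str]] = {
    "openai": _call_openai,
    "anthropic": _call_anthropic,
}

# Prompts stay plain text with no vendor-specific syntax baked in.
SUMMARIZE_PROMPT = "Summarize the following support ticket in three bullets:\n{ticket}"

def summarize(ticket: str, cfg: ModelConfig) -> str:
    prompt = SUMMARIZE_PROMPT.format(ticket=ticket)
    return PROVIDERS[cfg.provider](cfg, prompt)

# Switching models is a one-line config change; rerun evals and compare costs.
cfg = ModelConfig(provider="openai", model="gpt-4o-mini")
print(summarize("Customer cannot log in after password reset.", cfg))
```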
The best UI is one you never need to use. A proactive system had an NPS 20 points higher than the chat app and an order of magnitude more engagement.
Example
Internal system sending daily digests of what you need to know today.
Provide llms.txt files, curl examples, OpenAPI schemas, and MCP servers. Documentation is a conversation with future AI systems.
Example
Anthropic standardizes on MCP internally for all context plumbing.
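For illustration, a hypothetical llms.txt following the proposed llmstxt.org convention (a title, a one-line summary, then curated links to machine-readable resources). The project name and URLs are invented.

```
# Example Analytics API

> Hypothetical API for querying revenue dashboards. Docs curated for LLM agents.

## Docs

- [Quickstart](https://docs.example.com/quickstart.md): auth and a first query in curl
- [OpenAPI schema](https://docs.example.com/openapi.json): full machine-readable spec
- [MCP server](https://docs.example.com/mcp.md): connect agents directly to the API

## Optional

- [Changelog](https://docs.example.com/changelog.md)
```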
Rationale
Cognition's Devin is already a top committer at many companies. Evolution path: migrations → complex bugs → backlog clearing → full autonomy.
Rationale
Speech-to-speech models (OpenAI's Realtime API) provide lower latency. Voice is more information-dense and carries emotion and tone.
Rationale
Costs dropped 99.7% from 2022 to 2024. DeepSeek is releasing models competitive with frontier models at a 'fraction of training cost.'
Rationale
Every (99% AI-written code) and Bolt.new ($20M ARR in 60 days with 15 people). The gap between leaders and laggards is widening rapidly.
Rationale
Voyage AI is betting long-term on RAG; MongoDB is integrating retrieval natively. The debate with fine-tuning advocates continues.
Rationale
Google DeepMind: 'deep thinking with millions of inference tokens.' OpenAI's o1 and o3 already show this path.
Rationale
Linear, Factory, and others are building agent-coordination platforms. 'By end of 2025 we'll see more multi-agent collaborations in production.'
Rationale
Strong disagreement: some say yes; others say AI will augment rather than replace engineers, creating MORE engineering jobs.
"I saved" / time saved claims
Booking.com calls this 'semi BS': it's based on limited research and isn't statistically meaningful, and it doesn't account for rework, quality issues, or actual business impact.
Commit count / PR count / lines of code
Stanford found that AI generates more code, but ~30% of it is rework (fixing bugs the AI introduced). More commits ≠ more productivity.
F1 scores, NDCG, offline eval metrics
Jan Siml: "Offline evals never sign a contract. Nobody at board meeting asks for your F1 score."
Track to dollar-based outcomes
Lead time for changes, quality metrics, modernization impact, revenue recovered, FTE savings. "Instrument everything until you can say this task led to $20 here."
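A minimal sketch of what "instrument everything" to dollar outcomes can look like. Event names, fields, and values are illustrative; in practice these records would land in your analytics warehouse rather than an in-memory list.

```python
# Instrument AI-assisted tasks to dollar-based outcomes, not offline metrics.
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class TaskOutcome:
    use_case: str        # e.g. "collections_email", "churn_save" (illustrative)
    task_id: str
    revenue_usd: float   # revenue recovered or generated, attributed to this task
    hours_saved: float   # secondary signal; never the headline number

EVENTS: list[TaskOutcome] = []

def record(outcome: TaskOutcome) -> None:
    EVENTS.append(outcome)   # placeholder for an analytics/warehouse write

def rollup_by_use_case() -> dict[str, float]:
    totals: dict[str, float] = defaultdict(float)
    for e in EVENTS:
        totals[e.use_case] += e.revenue_usd
    return dict(totals)

record(TaskOutcome("collections_email", "task-001", revenue_usd=20.0, hours_saved=0.3))
print(rollup_by_use_case())   # {'collections_email': 20.0} -> "this task led to $20 here"
```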