AI Engineer Summit 2024

The DevOps Engineer Who Never Sleeps: AI Agents at Datadog

Diamond Bishop shares what Datadog learned building AI agents that automate on-call duties, handle incident response, and transform DevOps workflows—from evaluation strategies to predicting when agents will surpass humans as primary SaaS users.

"There's a good chance that agents surpass humans as users in the next five years... this means that you shouldn't just be building for humans or building your own agents, you should really think about agents that might use your product as well."

— Diamond Bishop, AI Engineer at Datadog • 65:00

15+ years

Career in AI

Microsoft, Amazon, Meta, Datadog

2015

AI Since

Proactive alerting & impact analysis

2+

Agent Types

On-call engineer & Software engineer

5 years

Prediction

Agents surpass humans as users

Why This Talk Matters

The Era Shift: Intelligence Too Cheap to Meter

We're experiencing a paradigm shift comparable to the microprocessor or the move to SaaS. Foundation models are making intelligence too cheap to meter, and products like Cursor are growing explosively as people expect more from AI every day.

The #1 Mistake: Building Demos Without Evaluation

Diamond's most important warning: "The number of mistakes we made by not thinking about eval first is frustrating."

  • It's easy to build demos that look like they work
  • Hard to verify and improve over time
  • Start with eval, not with features
  • Make it measurable at each step
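The eval-first approach above can be sketched as a minimal harness: define verifiable pass/fail cases before building features, and score the agent against them. Everything here (the `EvalCase` shape, the `run_agent` stub, the checks) is an illustrative assumption, not Datadog's implementation.

```python
# Minimal eval-first harness: measurable cases come before features.
# All names (EvalCase, run_agent) are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class EvalCase:
    name: str
    alert: str                      # input the agent sees
    check: Callable[[str], bool]    # verifiable pass/fail for this step

def run_agent(alert: str) -> str:
    """Stub standing in for the agent under test; a real agent would call an LLM."""
    return "root cause: connection pool exhausted; suggest scaling"

def evaluate(cases: List[EvalCase]) -> float:
    """Score the agent end-to-end; each step is independently verifiable."""
    passed = sum(1 for c in cases if c.check(run_agent(c.alert)))
    return passed / len(cases)

cases = [
    EvalCase("finds_cause", "db latency alert", lambda out: "root cause" in out),
    EvalCase("suggests_fix", "db latency alert", lambda out: "scal" in out),
]
score = evaluate(cases)
```

Because each case is a plain predicate, domain experts can contribute checks as verifiers without writing agent logic themselves.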

From DevOps Platform to AI Agent Platform

Datadog is transitioning from being just an observability platform to building AI agents that use that platform for you. This requires developing agents, running evals, and building new kinds of observability.

"Use your domain experts, but use them more like design partners or task verifiers. Don't use them as the people who will go and kind of write the code or rules for it, because there is a big difference in how these kind of stochastic models work versus how experts work."

Diamond Bishop, AI Engineer at Datadog

42:00

About Diamond Bishop

Datadog logo

15 Years Building AI Friends and Co-workers

Diamond has spent their entire career working in AI, through multiple AI winters and waves: Microsoft's Cortana, building Alexa at Amazon, working on PyTorch at Meta, and founding a DevOps-assistant startup before joining Datadog to build Bits AI.

Microsoft
Cortana
Amazon
Alexa
Meta
PyTorch
Startup
DevOps Assistant

About Datadog

Datadog is the observability and security platform for cloud applications. They've been shipping AI features since 2015 (proactive alerting, root cause analysis, impact analysis, change tracking) and are now transitioning to build AI agents that use the platform for you.

Observability
Security
Cloud Infrastructure
AI Since 2015
Bits AI

Datadog's AI Agents in Private Beta

AI On-Call Engineer

The agent that "wakes up for you in the middle of the night" so you don't have to respond to 2 AM alerts. It proactively investigates incidents, reads runbooks, and suggests remediations.

"Our on call engineer is there to really make it so you can keep sleeping."

Diamond Bishop

14:00

AI Software Engineer

The "proactive developer" that observes errors, analyzes them, identifies causes, proposes solutions, and even generates code fixes with tests to prevent recurrence.

"This workflow significantly reduces the time spent by an engineer manually writing and testing code and greatly reduces human time spent overall."

Diamond Bishop

33:00

How the AI On-Call Engineer Works

1

Situational Orientation

Agent wakes up when alert occurs, reads runbooks, grabs context of the alert

2

Investigation Loop

Looks through logs, metrics, and traces to figure out what's happening

3

Hypothesis Generation

Creates hypotheses about root causes and ways to test them

4

Tool Usage

Uses tools to run queries against logs, metrics, and traces to validate hypotheses

5

Remediation Suggestions

Suggests fixes—page in another team, scale infrastructure, or execute existing workflows

6

Postmortem Generation

Writes incident postmortems documenting what occurred and what humans did

Four Key Learnings from Building AI Agents

Diamond shares hard-won lessons from Datadog's journey building production AI agents.

1. Start with Evaluation, Not Demos

"It's very easy to build out demos quickly, much harder sometimes to scope and eval what's occurring."

  • Scope tasks for evaluation: Define jobs to be done step-by-step from the human angle
  • Make it measurable: Every step should be verifiable
  • Build vertical, task-specific agents: Not generalized ones
  • Domain experts as design partners: Use them as task verifiers, not rule writers

2. Build the Right Team

"You don't have to have a bunch of ML experts... what you really need is you want to seed it with one or two and then have a bunch of optimistic generalists."

  • 1-2 ML experts: Seed the team with experts
  • Optimistic generalists: Good at writing code, willing to try things fast
  • Frontend matters: UX is terribly important for AI collaboration
  • AI-augmented mindset: Teammates excited to be augmented by AI

3. The UX of AI is Changing

"I'm partial to agents that work more and more like human teammates instead of building out a bunch of new pages or buttons."

  • Transparency earns trust: Show reasoning, hypothesis, and evidence
  • Human-AI collaboration: Let humans verify and learn from agent decisions
  • Ask follow-up questions: Enable dialogue like with a junior engineer
  • New UX patterns: Old patterns are changing—be comfortable with ambiguity

4. Observability Matters (Who Watches the Watchmen?)

"Observability is actually really important and don't make it an afterthought... these are complex workflows you really need situational awareness to debug problems."

  • LLM observability: Full stack from GPUs to LLM monitoring to system end-to-end
  • Agent graphs: Visualize complex multi-step calls with human-readable error nodes
  • Hundreds of calls: Agent workflows get messy fast—you need visualization
  • Saved us time: Observability has been critical for debugging agent workflows
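The agent-graph idea above can be sketched as a thin tracing layer that records every tool call as a node, including human-readable error nodes. The `Trace`/span shape here is an illustrative assumption, not Datadog's LLM-observability API.

```python
# Sketch of agent-step tracing: record each tool call as a node so multi-step
# workflows can be visualized and debugged. The Trace shape is an illustrative
# assumption, not Datadog's LLM-observability API.
import time

class Trace:
    def __init__(self):
        self.spans = []

    def record(self, step: str, fn, *args):
        """Run one agent step and record it as a span (node in the agent graph)."""
        start = time.monotonic()
        try:
            out = fn(*args)
            self.spans.append({"step": step, "ok": True,
                               "ms": (time.monotonic() - start) * 1000})
            return out
        except Exception as exc:
            # Human-readable error node, as in the talk's agent graphs.
            self.spans.append({"step": step, "ok": False, "error": str(exc)})
            raise

trace = Trace()
trace.record("query_logs", lambda q: ["timeout in pool"], "service:db")
trace.record("query_metrics", lambda q: {"p99_ms": 950}, "service:db")
steps = [s["step"] for s in trace.spans]
```

With hundreds of calls per workflow, a span list like this is what makes a visual graph (and debugging) possible at all.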

The Agent "Bitter Lesson": Generalization Over Fine-Tuning

"General methods that can leverage new off-the-shelf models are ultimately the most effective... by a large margin. You sit there, you fine-tune, you do all this work on the specific project, and then all of a sudden OpenAI or someone comes out with a new model and it handles all this quickly."

Diamond Bishop, AI Engineer at Datadog

59:00

Rising Tide Lifts All Boats

Don't feel stuck to a particular model you've been working with. Build systems that can easily swap in new models as they're released. The agent or application layer bitter lesson is that flexibility beats fine-tuning.
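One way to build the swap-friendly systems described above is to route all completions through a thin provider-agnostic interface, so a newly released model is a one-line change. The registry pattern and `complete` signature below are illustrative assumptions, not a real SDK.

```python
# Sketch of avoiding model lock-in: a thin provider-agnostic interface so a
# newly released model is a one-line swap. Names here are illustrative
# assumptions, and the lambdas stand in for real provider SDK calls.
from typing import Callable, Dict

ModelFn = Callable[[str], str]

REGISTRY: Dict[str, ModelFn] = {}

def register(name: str, fn: ModelFn) -> None:
    """Make a model available under a stable name."""
    REGISTRY[name] = fn

def complete(prompt: str, model: str = "default") -> str:
    """All agent code calls this; no provider SDK leaks into the agent."""
    return REGISTRY[model](prompt)

register("default", lambda p: f"[modelA] {p}")
register("new-frontier-model", lambda p: f"[modelB] {p}")

# Swapping to a newer model changes one argument, not the agent code.
old = complete("summarize the incident")
new = complete("summarize the incident", model="new-frontier-model")
```

The agent code never imports a provider SDK directly, so when the tide rises, the whole system floats with it.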

Five Bold Predictions for the Future

1

Agents Surpass Humans as Users in 5 Years

"There's a good chance that agents surpass humans as users in the next five years... I think we're somewhere around the five-year mark."

This means SaaS companies should design APIs and contexts for agent consumption, not just human UI.
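What "designing for agent consumption" might look like in practice: structured, self-describing payloads with machine-actionable next steps instead of human-oriented pages. The schema below is an illustrative assumption about such a payload, not a Datadog API.

```python
# Sketch of an agent-friendly API response: structured JSON with explicit
# available actions, so a consuming agent can plan its next step.
# The schema is an illustrative assumption, not a real Datadog endpoint.
import json

def incident_summary_for_agents(incident_id: str) -> str:
    """Return a self-describing payload an agent can act on without scraping UI."""
    payload = {
        "incident_id": incident_id,
        "status": "investigating",
        "evidence": [{"kind": "metric", "name": "p99_latency_ms", "value": 950}],
        # Machine-actionable options; an agent picks one rather than parsing prose.
        "available_actions": ["scale_service", "rollback_deploy", "page_team"],
    }
    return json.dumps(payload)

doc = json.loads(incident_summary_for_agents("inc-7"))
```

Enumerating `available_actions` in the response is what lets a third-party agent use the product "just like a human would," without a human UI in the loop.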

2

"DevSecOps Agents as a Service" For Hire

"I strongly believe that we'll be able to offer a team of DevSecOps agents for hire to each of you soon."

You won't integrate with platforms directly—your agents will do that for you, handling on-call and everything else.

3

AI Agents Will Be Paying Customers

"AI agents will be customers... many of you building out SRE agents and other types of agents, coding agents, should use our platform, should use our tools just like a human would."

Third-party agents like Claude using Datadog directly via MCP is just the beginning.

4

Order-of-Magnitude More Ideas Become Reality

"Small companies are going to be built by someone who can use auto developers like Cursor or Devin to get their ideas out into the real world and then agents like ours to handle operations and security."

The combination of coding agents + operations agents enables an order of magnitude more ideas to ship.

5

Intelligence Becomes "Too Cheap to Meter"

"This general shift where intelligence becomes too cheap to meter."

Similar to electricity or bandwidth, intelligence will become so inexpensive it's not worth metering—fundamentally changing software economics.

Key Takeaways for Building AI Agents

Diamond's Actionable Advice

  • Start with Evaluation: Think deeply about your eval before building features. Build offline, online, and living evals with end-to-end measurements.
  • Scope Tasks Carefully: Define jobs to be done step-by-step from the human perspective. Build vertical, task-specific agents, not generalized ones.
  • Build the Right Team: Seed with 1-2 ML experts, then add optimistic generalists who write code fast and are excited to be AI-augmented.
  • UX Matters More Than You Think: Transparency earns trust. Design for human-AI collaboration with agents that work like teammates.
  • Observability is Non-Negotiable: Complex agent workflows require situational awareness. Build LLM observability from day one.
  • Avoid Model Lock-In: General methods that leverage new off-the-shelf models beat fine-tuning. Rising tide lifts all boats.
  • Design for Agent Users: In 5 years, agents will surpass humans as primary SaaS users. Design APIs and contexts for agent consumption.
  • Embrace the Weird Future: The future will be weird but fun. AI is accelerating every day. Be ready for DevSecOps agents for hire.
"The future is going to be weird, it'll be fun, and AI is accelerating each and every day."

Diamond Bishop, AI Engineer at Datadog

68:00

Related Resources

Research Notes & Methodology

This highlight page is based on a comprehensive analysis of Diamond Bishop's talk at the AI Engineer Summit 2024. The analysis covers Datadog's journey building AI agents for DevOps automation, including the AI On-Call Engineer and AI Software Engineer.

Source Material:
  • Full VTT transcript (452 lines, 16,855 chars)
  • Complete talk recording (~75 minutes)
  • Datadog official documentation
  • Datadog Engineering blog posts
Analysis Method:
  • Complete transcript analysis
  • Verbatim quote extraction
  • Fact-checking against sources
  • Cross-reference with documentation

Video: The DevOps Engineer Who Never Sleeps — Diamond Bishop, Datadog

Speaker: Diamond Bishop, AI Engineer at Datadog • Event: AI Engineer Summit 2024