Episode 71: Durable Agents - How to Build AI Systems That Survive a Crash with Samuel Colvin | Vanishing Gradients

Episode 71: Durable Agents - How to Build AI Systems That Survive a Crash with Samuel Colvin

FEB 18, 2026 | 51 MIN

Description

Our thesis is that AI is still just engineering… those people who tell us, for fun and profit, that somehow AI is so, so profound, so new, so different from anything that’s gone before that it somehow eclipses the need for good engineering practice are wrong. We need that good engineering practice still, and for the most part, most things are not new. But there are some things that have become more important with AI. One of those is durability.

Samuel Colvin, creator of Pydantic AI, joins Hugo to talk about applying battle-tested software engineering principles to build durable and reliable AI agents.

They discuss:

* Production agents require engineering-grade reliability: Unlike messy coding agents, production agents need tight constraints, reliability, and the ability to perform hundreds of tasks without drifting into unusual behavior;
* Agents are the new “quantum” of AI software: Modern architectures use discrete “agentlets” (small, specialized building blocks) stitched together for sub-tasks within larger, durable systems;
* Stop building “chocolate teapot” execution frameworks: Ditch rudimentary snapshotting; use battle-tested durable execution engines like Temporal for robust retry logic and state management;
* AI observability will be a native feature: In five years, AI observability will be integrated, with token counts and prompt traces becoming standard features of all observability platforms;
* Split agents into deterministic workflows and stochastic activities: Ensure true durability by isolating deterministic workflow logic from stochastic activities (IO, LLM calls) to cache results and prevent redundant model calls;
* Type safety is essential for enterprise agents: Sacrificing type safety for flexible graphs leads to unmaintainable software; professional AI engineering demands strict type definitions for parallel node execution and state recovery;
* Standardize on OpenTelemetry for portability: Use OpenTelemetry (OTel) to ensure agent traces and logs are portable, preventing vendor lock-in and integrating seamlessly into existing enterprise monitoring.

You can also find the full episode on <a target="_blank" href="https://open.spotify.com/show/3yuz89gqAhcMcdy3SZPe4X?si=AKl2jvIARD2Liw1bBH2Nng&nd=1&dlsi=8dfe7221896c4fc3">Spotify</a>, <a target="_blank" href="https://podcasts.apple.com/us/podcast/vanishing-gradients/id1610318868">Apple Podcasts</a>, and <a target="_blank" href="https://youtu.be/qM9wQxSM1ow">YouTube</a>.

You can also interact directly with the transcript in <a target="_blank" href="https://notebooklm.google.com/notebook/f4acc37b-078e-4b00-ba6f-c2caa67e9533">NotebookLM</a>. If you do, let us know what you find in the comments!

LINKS

* <a target="_blank" href="https://www.linkedin.com/in/samuel-colvin/">Samuel Colvin on LinkedIn</a>
* <a target="_blank" href="https://pydantic.dev/">Pydantic</a>
* <a target="_blank" href="https://github.com/pydantic/pydantic-stack-demo">Pydantic Stack Demo repo</a>
* <a target="_blank" href="https://github.com/pydantic/pydantic-stack-demo/blob/main/durable-exec/deep_research.py">Deep research example code</a>
* <a target="_blank" href="https://temporal.io/">Temporal</a>
* <a target="_blank" href="https://docs.dbos.dev/">DBOS (Postgres alternative to Temporal)</a>
* <a target="_blank" href="https://luma.com/calendar/cal-8ImWFDQ3IEIxNWk">Upcoming Events on Luma</a>
* <a target="_blank" href="https://www.youtube.com/@vanishinggradients">Vanishing Gradients on YouTube</a>
* <a target="_blank" href="https://www.youtube.com/live/Qr4eiLbCfg4">Watch the podcast video on YouTube</a>

👉 Want to learn more about Building AI-Powered Software? Check out our <a target="_blank" href="https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles">Building AI Applications course</a>. It’s a live cohort with hands-on exercises and office hours. Our final cohort starts March 10, 2026. Here is a <a target="_blank" href="https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgfs">25% discount code for listeners</a>. 👈

Get full access to Vanishing Gradients at <a href="https://hugobowne.substack.com/subscribe?utm_medium=podcast&utm_campaign=CTA_4">hugobowne.substack.com/subscribe</a>
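
For readers who want a concrete picture of the "deterministic workflows and stochastic activities" point in the description above, here is a minimal, standard-library-only sketch. It is not Temporal's, DBOS's, or Pydantic AI's API. `Journal`, `fake_llm_call`, and `research_workflow` are hypothetical names invented for illustration, and the JSON file stands in for whatever durable store a real engine would use. The idea it tries to show: the workflow runs the same steps in the same order on every replay, while each activity's result is recorded once and reused after a crash, so the model is not called again for work that already finished.

```python
import json
import tempfile
from pathlib import Path


class Journal:
    """Hypothetical on-disk record of completed activity results."""

    def __init__(self, path: Path) -> None:
        self.path = path
        # Reload earlier results if a previous run left a journal behind.
        self.results: dict[str, str] = (
            json.loads(path.read_text()) if path.exists() else {}
        )

    def run_activity(self, step_id: str, fn, *args) -> str:
        # Replay: a step that completed before the crash returns its cached result.
        if step_id in self.results:
            return self.results[step_id]
        result = fn(*args)
        self.results[step_id] = result
        self.path.write_text(json.dumps(self.results))  # persist before moving on
        return result


calls = {"n": 0}  # counts how often the "model" is actually invoked


def fake_llm_call(prompt: str) -> str:
    """Stand-in for a stochastic activity; a real one would call a model."""
    calls["n"] += 1
    return f"answer to: {prompt}"


def research_workflow(journal: Journal, topic: str, crash_after_plan: bool = False) -> str:
    """Deterministic workflow: same steps, same order, on every replay."""
    plan = journal.run_activity("plan", fake_llm_call, f"plan research on {topic}")
    if crash_after_plan:
        raise RuntimeError("simulated crash")
    return journal.run_activity("summary", fake_llm_call, f"summarise using {plan}")


journal_path = Path(tempfile.mkdtemp()) / "journal.json"

try:
    research_workflow(Journal(journal_path), "durable agents", crash_after_plan=True)
except RuntimeError:
    pass  # the process "died" after the first activity completed

# Restart with a fresh Journal: "plan" is read from disk, only "summary" runs.
final = research_workflow(Journal(journal_path), "durable agents")
print(calls["n"], final)
```

In this toy version the "plan" step is executed once across both runs. A production engine adds what the sketch leaves out, such as retries, timeouts, concurrency, and checks that the workflow code really is deterministic.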