AI Engineering Podcast

Tobias Macey


Details

This show is your guidebook to building scalable and maintainable AI systems. You will learn how to architect AI applications, apply AI to your work, and navigate the considerations involved in building or customizing new models: everything you need to know to deliver real impact and value with machine learning and artificial intelligence.

Recent Episodes

Kubernetes, Compliance, and Control: The Operational Backbone of AI Sovereignty
FEB 25, 2026
Summary&nbsp;<br />In this episode of the AI Engineering Podcast, Stephen Watt, leader of the Office of the CTO at Red Hat, discusses practical paths to achieving AI sovereignty for organizations. He shares his two-decade experience in AI, highlighting how governments are building GPU platforms and protected data hubs to maintain control over AI workloads. Steve emphasizes why self-managed infrastructure is becoming a strategic necessity as companies outgrow cloud costs and require tighter control over models, data, and compliance. The conversation explores the operational substrate for AI sovereignty, including Kubernetes as the scale-out backbone for LLM serving, bridging the gap with PyTorch ecosystems, observability and policy for non-deterministic systems, and emerging security needs such as confidential inference and agentic identity. They also discuss model and hardware optionality (GPUs, CPUs, and new accelerators), the growing demand for energy-efficient inference, and the importance of open models and post-training to create durable differentiation. Steve identifies access to GPUs as the biggest gap hindering sovereign AI adoption today, emphasizing that broad GPU availability is what allows sovereign AI workloads to thrive. The conversation also touches on evolving architectures beyond transformers, the interplay between AI and data sovereignty, consolidation pressures from pilot chaos to standardized platforms, and the societal triad of universities, startups, and sovereign infrastructure.&nbsp;<br /><br /><br />Announcements&nbsp;<br /><ul><li>Hello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systems</li><li>Unlock the full potential of your AI workloads with a seamless and composable data infrastructure. Bruin is an open source framework that streamlines integration from the command line, allowing you to focus on what matters most - building intelligent systems. 
Write Python code for your business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. With native support for ML/AI workloads, Bruin empowers data teams to deliver faster, more reliable, and scalable AI solutions. Harness Bruin's connectors for hundreds of platforms, including popular machine learning frameworks like TensorFlow and PyTorch. Build end-to-end AI workflows that integrate seamlessly with your existing tech stack. Join the ranks of forward-thinking organizations that are revolutionizing their data engineering with Bruin. Get started today at <a href="https://www.aiengineeringpodcast.com/bruin" target="_blank">aiengineeringpodcast.com/bruin</a>, and for dbt Cloud customers, enjoy a $1,000 credit to migrate to Bruin Cloud.</li><li>Your host is Tobias Macey and today I'm interviewing Stephen Watt about how to adapt your existing infrastructure investments to support your AI workloads and gain "AI Sovereignty"</li></ul><br />Interview<br />&nbsp;<br /><ul><li>Introduction</li><li>How did you get involved in machine learning?</li><li>Can you describe what you mean by the term "AI sovereignty"?</li><li>What are the motivating factors for investing in that as an organizational capability?</li><li>What do you see as the scale, sophistication, regulatory triggers that tip someone from buying off-the-shelf AI services and into operating their own AI stacks?</li><li>There has been substantial investment in MLOps toolchains and patterns over the past decade, along with corresponding evolution of LLMOps techniques. 
What do you see as the areas of overlap between those technology patterns and the "traditional" infrastructure capabilities that organizations have matured over the past ~20 years?</li><li>What are the aspects that are disjoint and contribute to operational pain for DevOps/platform teams?</li><li>How do AI/agentic workloads strain the security and governance frameworks that teams already operate for their cloud-native workloads?</li><li>What are the options for extending those frameworks and what are the requirements that force a new approach? (e.g. guardrails, LLM interpretability, etc.)</li><li>What are the elements of cloud-native architecture that have left us (as an industry) well situated to absorb the complexity of AI/agentic workloads?</li><li>How does the complexity shift as you go along the continuum of model training to finetuning to inference?</li><li>Beyond the ability to host and execute inference on a model, it is the surrounding data stores and tool availability that make generative AI a competitive advantage. How much of that (e.g. agentic memory, vector stores, MCP/A2A tools, etc.) are actually net new vs. 
a new coat of paint on existing techniques?</li><li>What are the most interesting, innovative, or unexpected ways that you have seen teams operationalizing AI workloads on their infrastructure?</li><li>What are the most interesting, unexpected, or challenging lessons that you have learned while working on empowering organizations to achieve AI sovereignty?</li><li>When is operating your own AI infrastructure the wrong choice?</li><li>What are your predictions for the future evolution of operational substrates for AI workloads?</li></ul><br />Contact Info<br />&nbsp;<br /><ul><li><a href="https://www.linkedin.com/in/wattsteve/" target="_blank">LinkedIn</a></li></ul><br />Parting Question<br />&nbsp;<br /><ul><li>From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?</li></ul><br />Closing Announcements<br />&nbsp;<br /><ul><li>Thank you for listening! Don't forget to check out our other shows. The <a href="https://www.dataengineeringpodcast.com" target="_blank">Data Engineering Podcast</a> covers the latest on modern data management. <a href="https://www.pythonpodcast.com" target="_blank">Podcast.__init__</a> covers the Python language, its community, and the innovative ways it is being used.</li><li>Visit the <a href="https://www.aiengineeringpodcast.com" target="_blank">site</a> to subscribe to the show, sign up for the mailing list, and read the show notes.</li><li>If you've learned something or tried out a project from the show then tell us about it! 
Email [email protected] with your story.</li><li>To help other people find the show please leave a review on <a href="https://podcasts.apple.com/us/podcast/the-machine-learning-podcast/id1626358243" target="_blank">iTunes</a> and tell your friends and co-workers.</li></ul><br />Links<br />&nbsp;<br /><ul><li><a href="https://www.redhat.com/en" target="_blank">RedHat</a></li><li><a href="https://en.wikipedia.org/wiki/Bayes_classifier" target="_blank">Bayesian Classifier</a></li><li><a href="https://hadoop.apache.org/" target="_blank">Hadoop</a></li><li><a href="https://hbase.apache.org/" target="_blank">HBase</a></li><li><a href="https://deepseek.com/" target="_blank">DeepSeek</a></li><li><a href="https://reflection.ai/" target="_blank">Reflection AI</a></li><li><a href="https://www.nvidia.com/en-us/data-center/technologies/blackwell-architecture/" target="_blank">Nvidia Blackwell</a></li><li><a href="https://vllm.ai/" target="_blank">vLLM</a></li><li><a href="https://research.ibm.com/blog/spyre-for-z" target="_blank">IBM Spyre</a></li><li><a href="https://docs.vllm.ai/en/stable/getting_started/installation/cpu/" target="_blank">vLLM CPU</a></li><li><a href="https://www.ibm.com/history/watson-jeopardy" target="_blank">IBM Watson on Jeopardy</a></li><li><a href="https://en.wikipedia.org/wiki/Neuromorphic_computing" target="_blank">Neuromorphic Computing</a></li><li><a href="https://kubernetes.io/" target="_blank">Kubernetes</a></li><li><a href="https://pytorch.org/foundation/" target="_blank">PyTorch Foundation</a></li><li><a href="https://en.wikipedia.org/wiki/MLOps" target="_blank">MLOps</a></li><li><a href="https://www.ibm.com/think/topics/llmops" target="_blank">LLMOps</a></li><li><a href="https://github.com/vllm-project/semantic-router" target="_blank">Semantic Router</a></li><li><a href="https://en.wikipedia.org/wiki/BERT_(language_model)" target="_blank">BERT</a></li><li><a href="https://agntcy.org/" target="_blank">AGNTCY</a></li><li><a 
href="https://www.openpolicyagent.org/" target="_blank">OPA == Open Policy Agent</a></li><li><a href="https://www.cedarpolicy.com/en" target="_blank">CEDAR</a></li><li><a href="https://en.wikipedia.org/wiki/Web_Services_Description_Language" target="_blank">WSDL == Web Services Description Language</a></li><li><a href="https://www.ibm.com/docs/en/rsas/7.5.0?topic=standards-universal-description-discovery-integration-uddi" target="_blank">UDDI</a></li><li><a href="https://spark.apache.org/" target="_blank">Spark</a></li><li><a href="https://github.com/ggml-org/llama.cpp" target="_blank">llama.cpp</a></li><li><a href="https://ollama.com/" target="_blank">Ollama</a></li><li><a href="https://arpa-h.gov/" target="_blank">ARPA-H</a></li></ul><br />The intro and outro music is from <a href="https://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Tales_Of_A_Dead_Fish/Hitmans_Lovesong/" target="_blank">Hitman's Lovesong feat. Paola Graziano</a> by <a href="http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/" target="_blank">The Freak Fandango Orchestra</a>/<a href="https://creativecommons.org/licenses/by-sa/3.0/" target="_blank">CC BY-SA 3.0</a>
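The episode's discussion of policy enforcement for agentic workloads (OPA, Cedar, agentic identity) can be sketched in miniature as an allow-list gate on agent tool calls. This is a hypothetical stand-in, not the OPA or Cedar API; the agent identities, tool names, and policy format are invented for illustration.

```python
# Toy policy gate: each agent identity may only invoke tools granted to it.
# In a real deployment this decision would be delegated to a policy engine
# such as OPA or Cedar rather than a hard-coded dict.
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    agent: str
    tool: str
    arguments: dict = field(default_factory=dict)


# Allow-list policy, keyed by agent identity (names are invented).
POLICY = {
    "billing-agent": {"read_invoice", "summarize_text"},
    "support-agent": {"summarize_text", "search_kb"},
}


def authorize(call: ToolCall) -> bool:
    """Permit the call only if the agent's identity grants access to the tool."""
    return call.tool in POLICY.get(call.agent, set())


allowed = authorize(ToolCall("billing-agent", "read_invoice"))  # True
denied = authorize(ToolCall("support-agent", "read_invoice"))   # False
```

The key property, default-deny for unknown agents and unlisted tools, is what makes this pattern useful for non-deterministic systems: the model can propose any action, but only explicitly granted ones execute.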
61 MIN
From Blind Spots to Observability: Operationalizing LLM Apps with OpenLit
FEB 15, 2026
Summary&nbsp;<br />In this episode of the AI Engineering Podcast, Aman Agarwal, creator of OpenLit, discusses the operational foundations required to run LLM-powered applications in production. He highlights common early blind spots teams face, including opaque model behavior, runaway token costs, and brittle prompt management, emphasizing that strong observability and cost tracking must be established before an MVP ships. Aman explains how OpenLit leverages OpenTelemetry for vendor-neutral tracing across models, tools, and data stores, and introduces features such as prompt and secret management with versioning, evaluation workflows (including LLM-as-a-judge), and fleet management for OpenTelemetry collectors. The conversation covers experimentation patterns, strategies to avoid vendor lock-in, and how detailed stepwise traces reshape system design and debugging. Aman also shares recent advancements like a Kubernetes operator for zero-code instrumentation, multi-database configurations for environment isolation, and integrations with platforms such as Grafana and Dash0. They conclude by discussing lessons learned from building in the open, prioritizing reliability, developer experience, and data security, and preview future work on context management and closing the loop from experimentation to prompt/dataset improvements.&nbsp;<br /><br />Announcements&nbsp;<br /><ul><li>Hello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systems</li><li>Unlock the full potential of your AI workloads with a seamless and composable data infrastructure. Bruin is an open source framework that streamlines integration from the command line, allowing you to focus on what matters most - building intelligent systems. Write Python code for your business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. 
With native support for ML/AI workloads, Bruin empowers data teams to deliver faster, more reliable, and scalable AI solutions. Harness Bruin's connectors for hundreds of platforms, including popular machine learning frameworks like TensorFlow and PyTorch. Build end-to-end AI workflows that integrate seamlessly with your existing tech stack. Join the ranks of forward-thinking organizations that are revolutionizing their data engineering with Bruin. Get started today at <a href="https://www.aiengineeringpodcast.com/bruin" target="_blank">aiengineeringpodcast.com/bruin</a>, and for dbt Cloud customers, enjoy a $1,000 credit to migrate to Bruin Cloud.</li><li>Your host is Tobias Macey and today I'm interviewing Aman Agarwal about the operational investments that are necessary to ensure you get the most out of your AI models</li></ul><br />Interview<br />&nbsp;<br /><ul><li>Introduction</li><li>How did you get involved in the area of AI/data management?</li><li>Can you start by giving your assessment of the main blind spots that are common in the existing AI application patterns?</li><li>As teams adopt agentic architectures, how common is it to fall prey to those same blind spots?</li><li>There are numerous tools/services available now focused on various elements of "LLMOps". What are the major components necessary for a minimum viable operational platform for LLMs?</li><li>There are several areas of overlap, as well as disjoint features, in the ecosystem of tools (both open source and commercial). How do you advise teams to navigate the selection process? (point solutions vs. 
integrated tools, and handling frameworks with only partial overlap)</li><li>Can you describe what OpenLit is and the story behind it?</li><li>How would you characterize the feature set and focus of OpenLit compared to what you view as the "major players"?</li><li>Once you have invested in a platform like OpenLit, how does that change the overall development workflow for the lifecycle of AI/agentic applications?</li><li>What are the most complex/challenging elements of change management for LLM-powered systems? (e.g. prompt tuning, model changes, data changes, etc.)</li><li>How can the information collected in OpenLit be used to develop a self-improvement flywheel for agentic systems?</li><li>Can you describe the architecture and implementation of OpenLit?</li><li>How have the scope and goals of the project changed since you started working on it?</li><li>Given the foundational aspects of the project that you have built, what are some of the adjacent capabilities that OpenLit is situated to expand into?</li><li>What are the sharp edges and blind spots that are still challenging even when you have OpenLit or similar integrated?</li><li>What are the most interesting, innovative, or unexpected ways that you have seen OpenLit used?</li><li>What are the most interesting, unexpected, or challenging lessons that you have learned while working on OpenLit?</li><li>When is OpenLit the wrong choice?</li><li>What do you have planned for the future of OpenLit?</li></ul><br />Contact Info<br />&nbsp;<br /><ul><li><a href="https://www.linkedin.com/in/amanagarwal041/" target="_blank">LinkedIn</a></li></ul><br />Parting Question<br />&nbsp;<br /><ul><li>From your perspective, what is the biggest gap in the tooling or technology for data/AI management today?</li></ul><br />Closing Announcements<br />&nbsp;<br /><ul><li>Thank you for listening! Don't forget to check out our other shows. 
The <a href="https://www.dataengineeringpodcast.com" target="_blank">Data Engineering Podcast</a> covers the latest on modern data management. <a href="https://www.pythonpodcast.com" target="_blank">Podcast.__init__</a> covers the Python language, its community, and the innovative ways it is being used.</li><li>Visit the <a href="https://www.aiengineeringpodcast.com" target="_blank">site</a> to subscribe to the show, sign up for the mailing list, and read the show notes.</li><li>If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.</li><li>To help other people find the show please leave a review on <a href="https://podcasts.apple.com/us/podcast/the-machine-learning-podcast/id1626358243" target="_blank">iTunes</a> and tell your friends and co-workers.</li></ul><br />Links<br />&nbsp;<br /><ul><li><a href="https://openlit.io/" target="_blank">OpenLit</a></li><li><a href="https://docs.openlit.io/latest/openlit/observability/fleet-hub" target="_blank">Fleet Hub</a></li><li><a href="https://opentelemetry.io/" target="_blank">OpenTelemetry</a></li><li><a href="https://langfuse.com/" target="_blank">LangFuse</a></li><li><a href="https://www.langchain.com/langsmith/evaluation" target="_blank">LangSmith</a></li><li><a href="https://www.tensorzero.com/" target="_blank">TensorZero</a></li><li><a href="https://www.aiengineeringpodcast.com/tensorzero-llm-gateway-prompt-optimization-episode-45" target="_blank">AI Engineering Podcast Episode</a></li><li><a href="https://traceloop.com/" target="_blank">Traceloop</a></li><li><a href="https://www.helicone.ai/" target="_blank">Helicone</a></li><li><a href="https://clickhouse.com/" target="_blank">Clickhouse</a></li></ul><br />The intro and outro music is from <a href="https://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Tales_Of_A_Dead_Fish/Hitmans_Lovesong/" target="_blank">Hitman's Lovesong feat. 
Paola Graziano</a> by <a href="http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/" target="_blank">The Freak Fandango Orchestra</a>/<a href="https://creativecommons.org/licenses/by-sa/3.0/" target="_blank">CC BY-SA 3.0</a>
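The span-per-step tracing idea behind tools like OpenLit can be illustrated with a minimal hand-rolled recorder. This is not the OpenLit or OpenTelemetry API, just a sketch of what an LLM trace captures: which step ran, how long it took, and token-count attributes; the model name and attribute keys are invented.

```python
# Minimal sketch of span-based LLM tracing. A real system would export these
# records through OpenTelemetry to a backend instead of a module-level list.
import time
from contextlib import contextmanager

TRACE: list[dict] = []  # stand-in for an OpenTelemetry exporter


@contextmanager
def span(name: str, **attributes):
    """Record a named span with arbitrary attributes and wall-clock duration."""
    record = {"name": name, **attributes}
    start = time.perf_counter()
    try:
        yield record  # callers may attach more attributes mid-span
    finally:
        record["duration_s"] = time.perf_counter() - start
        TRACE.append(record)


def fake_llm_call(prompt: str) -> str:
    # Wrap the (stubbed) model call in a span that records token counts.
    with span("llm.completion", model="example-model",
              prompt_tokens=len(prompt.split())) as s:
        answer = "stub answer"  # stand-in for a real model response
        s["completion_tokens"] = len(answer.split())
        return answer


fake_llm_call("why is the sky blue")
```

Tracing every step this way is what turns "opaque model behavior" and "runaway token costs" into queryable data, which is the operational foundation the episode argues must exist before an MVP ships.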
50 MIN
Taming Voice Complexity with Dynamic Ensembles at Modulate
FEB 8, 2026
Summary&nbsp;<br />In this episode of the AI Engineering Podcast, Carter Huffman, co-founder and CTO of Modulate, discusses the engineering behind low-latency, high-accuracy Voice AI. He explains why voice is a uniquely challenging modality due to its rich non-textual signals like tone, emotion, and context, and how simple speech-to-text-to-speech pipelines can't capture the necessary nuance. Carter introduces Modulate's Ensemble Listening Model (ELM) architecture, which uses dynamic routing and cost-based optimization to achieve scalability and precision in various audio environments. He covers topics such as reliability under distributed systems constraints, watchdogging with periodic model checks, structured long-horizon memory for conversations, and the trade-offs that make ensemble approaches compelling for repeated tasks at scale. Carter also shares insights on how ELMs generalize beyond voice, draws parallels to database query planners and mixture-of-experts, and discusses strategies for observability and evaluation in complex processing pipelines.&nbsp;<br /><br />Announcements&nbsp;<br /><ul><li>Hello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systems</li><li>Unlock the full potential of your AI workloads with a seamless and composable data infrastructure. Bruin is an open source framework that streamlines integration from the command line, allowing you to focus on what matters most - building intelligent systems. Write Python code for your business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. With native support for ML/AI workloads, Bruin empowers data teams to deliver faster, more reliable, and scalable AI solutions. Harness Bruin's connectors for hundreds of platforms, including popular machine learning frameworks like TensorFlow and PyTorch. 
Build end-to-end AI workflows that integrate seamlessly with your existing tech stack. Join the ranks of forward-thinking organizations that are revolutionizing their data engineering with Bruin. Get started today at <a href="https://www.aiengineeringpodcast.com/bruin" target="_blank">aiengineeringpodcast.com/bruin</a>, and for dbt Cloud customers, enjoy a $1,000 credit to migrate to Bruin Cloud.</li><li>Your host is Tobias Macey and today I'm interviewing Carter Huffman about his work building an ensemble approach to low latency voice AI</li></ul><br />Interview<br />&nbsp;<br /><ul><li>Introduction</li><li>How did you get involved in machine learning?</li><li>Can you describe the "Ensemble Listening" approach and the story behind why Modulate moved away from monolithic architectures?</li><li>When designing a real-time voice system, how do you handle the routing logic between specialized models without blowing your latency budget?</li><li>What does the "gatekeeper" or routing layer actually look like in code?</li><li>You’ve mentioned "evals that don’t lie." How do you build a validation pipeline for noisy, adversarial voice data that catches regressions that a simple word-error-rate (WER) might miss?</li><li>In an ensemble of models, a failure in one specialized node might not crash the system, but it can degrade the output quality. How do you monitor for these "silent failures" in real-time without introducing massive overhead?</li><li>For many teams, the default is to call an API for a frontier model. 
At what point in the scaling or latency curve does it become technically (or economically) necessary to swap a general LLM for a suite of specialized, smaller models?</li><li>How do you track the real-world costs associated with the technical and human overhead of this more complex system?</li><li>What are the most interesting, innovative, or unexpected ways that you have seen orchestrated ensembles used in live conversation environments?</li><li>What are the most interesting, unexpected, or challenging lessons that you have learned while managing the lifecycle of multiple specialized models simultaneously?</li><li>When is an ensemble approach the wrong choice? (e.g., At what level of complexity or throughput is the overhead of orchestration more trouble than it’s worth?)</li><li>What do you have planned for the future of Ensemble Listening Models?</li><li>Are we looking at self-optimizing routers, or perhaps moving these ensembles closer to the edge?</li></ul><br />Contact Info<br />&nbsp;<br /><ul><li><a href="https://www.linkedin.com/in/carter-huffman-a9aba05b/" target="_blank">LinkedIn</a></li></ul><br />Parting Question<br />&nbsp;<br /><ul><li>From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?</li></ul><br />Closing Announcements<br />&nbsp;<br /><ul><li>Thank you for listening! Don't forget to check out our other shows. The <a href="https://www.dataengineeringpodcast.com" target="_blank">Data Engineering Podcast</a> covers the latest on modern data management. <a href="https://www.pythonpodcast.com" target="_blank">Podcast.__init__</a> covers the Python language, its community, and the innovative ways it is being used.</li><li>Visit the <a href="https://www.aiengineeringpodcast.com" target="_blank">site</a> to subscribe to the show, sign up for the mailing list, and read the show notes.</li><li>If you've learned something or tried out a project from the show then tell us about it! 
Email [email protected] with your story.</li><li>To help other people find the show please leave a review on <a href="https://podcasts.apple.com/us/podcast/the-machine-learning-podcast/id1626358243" target="_blank">iTunes</a> and tell your friends and co-workers.</li></ul><br />Links<br />&nbsp;<br /><ul><li><a href="https://www.modulate.ai/" target="_blank">Modulate</a></li><li><a href="https://www.jpl.nasa.gov/" target="_blank">Nasa Jet Propulsion Laboratory</a></li><li><a href="https://openai.com/index/whisper/" target="_blank">OpenAI Whisper</a></li><li><a href="https://en.wikipedia.org/wiki/Multi-armed_bandit" target="_blank">Multi-Armed Bandit</a></li><li><a href="https://en.wikipedia.org/wiki/Query_optimization#Cost_estimation" target="_blank">Cost-Based Optimizer</a></li><li><a href="https://en.wikipedia.org/wiki/GPT-5" target="_blank">GPT 5</a></li><li><a href="https://en.wikipedia.org/wiki/Attention_(machine_learning)" target="_blank">LLM Attention</a></li><li><a href="https://en.wikipedia.org/wiki/Transformer_(deep_learning)" target="_blank">Transformer Architecture</a></li><li><a href="https://en.wikipedia.org/wiki/Mixture_of_experts" target="_blank">Mixture of Experts</a></li><li><a href="https://www.geeksforgeeks.org/machine-learning/dilated-convolution/" target="_blank">Dilated Convolution</a></li><li><a href="https://en.wikipedia.org/wiki/WaveNet" target="_blank">Wavenet</a></li></ul><br />The intro and outro music is from <a href="https://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Tales_Of_A_Dead_Fish/Hitmans_Lovesong/" target="_blank">Hitman's Lovesong feat. Paola Graziano</a> by <a href="http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/" target="_blank">The Freak Fandango Orchestra</a>/<a href="https://creativecommons.org/licenses/by-sa/3.0/" target="_blank">CC BY-SA 3.0</a>
59 MIN
GPU Clouds, Aggregators, and the New Economics of AI Compute
JAN 27, 2026
Summary&nbsp;<br />In this episode I sit down with Hugo Shi, co-founder and CTO of Saturn Cloud, to map the strategic realities of sourcing and operating GPUs across clouds. Hugo breaks down today’s provider landscape—from hyperscalers to full-service GPU clouds, bare metal/concierge providers, and emerging GPU aggregators—and how to choose among them based on security posture, managed services, and cost. We explore practical layers of capability (compute, orchestration with Kubernetes/Slurm, storage, networking, and managed services), the trade-offs of portability on “Kubernetes-native” stacks, and the persistent challenge of data gravity. We also discuss current supply dynamics, the growing availability of on-demand capacity as newer chips roll out, and how AMD’s ecosystem is maturing as real competition to NVIDIA. Hugo shares patterns for separating training and inference across providers, why traditional ML is far from dead, and how usage varies wildly across domains like biotech. We close with predictions on consolidation, full‑stack experiences from GPU clouds, financial-style GPU marketplaces, and much-needed advances in reliability for long-running GPU jobs.&nbsp;<br /><br />Announcements&nbsp;<br /><ul><li>Hello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systems</li><li>Unlock the full potential of your AI workloads with a seamless and composable data infrastructure. Bruin is an open source framework that streamlines integration from the command line, allowing you to focus on what matters most - building intelligent systems. Write Python code for your business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. With native support for ML/AI workloads, Bruin empowers data teams to deliver faster, more reliable, and scalable AI solutions. 
Harness Bruin's connectors for hundreds of platforms, including popular machine learning frameworks like TensorFlow and PyTorch. Build end-to-end AI workflows that integrate seamlessly with your existing tech stack. Join the ranks of forward-thinking organizations that are revolutionizing their data engineering with Bruin. Get started today at <a href="https://www.aiengineeringpodcast.com/bruin" target="_blank">aiengineeringpodcast.com/bruin</a>, and for dbt Cloud customers, enjoy a $1,000 credit to migrate to Bruin Cloud.</li><li>Your host is Tobias Macey and today I'm interviewing Hugo Shi about the strategic realities of sourcing GPUs in the cloud for your training and inference workloads</li></ul><br />Interview<br /><ul><li>Introduction</li><li>How did you get involved in machine learning?</li><li>Can you start by giving a summary of your understanding of the current market for "cloud" GPUs?</li><li>How would you characterize the customer base for the "neocloud" providers?</li><li>How is the access to the GPU compute typically mediated?</li><li>The predominant cloud providers (AWS, GCP, Azure) have gained market share by offering numerous differentiated services and ease-of-use features. What are the types of services that you might expect from a GPU provider?</li><li>The "cloud-native" ecosystem was developed with the promise of enabling workload portability, but the realities are often more complicated. What are some of the difficulties that teams encounter when trying to adapt their workloads to these different cloud providers?</li><li>What are the toolchains/frameworks/architectures that you are seeing as most effective at adapting to these different compute environments?</li><li>One of the major themes in the 2010s that worked against multi-cloud strategies was the idea of "data gravity". 
What are the strategies that teams are using to mitigate that tax on their workloads?</li><li>That is a more substantial impact when dealing with training workloads than for inference compute. How are you seeing teams think about the balance of cost savings vs. operational complexity for those different workloads?</li><li>What are the most interesting, innovative, or unexpected ways that you have seen teams capitalize on GPU capacity across these new providers?</li><li>What are the most interesting, unexpected, or challenging lessons that you have learned while working on enabling teams to execute workloads on these neoclouds?</li><li>When is a "neocloud" or "GPU cloud" provider the wrong choice?</li><li>What are your predictions for the future evolutions of GPU-as-a-service as hardware availability improves and model architectures become more efficient?</li></ul><br />Contact Info<br /><ul><li><a href="https://www.linkedin.com/in/hugo-shi/" target="_blank">LinkedIn</a></li></ul><br />Parting Question<br /><ul><li>From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?</li></ul><br />Closing Announcements<br /><ul><li>Thank you for listening! Don't forget to check out our other shows. The <a href="https://www.dataengineeringpodcast.com" target="_blank">Data Engineering Podcast</a> covers the latest on modern data management. <a href="https://www.pythonpodcast.com" target="_blank">Podcast.__init__</a> covers the Python language, its community, and the innovative ways it is being used.</li><li>Visit the <a href="https://www.aiengineeringpodcast.com" target="_blank">site</a> to subscribe to the show, sign up for the mailing list, and read the show notes.</li><li>If you've learned something or tried out a project from the show then tell us about it! 
Email [email protected] with your story.</li><li>To help other people find the show please leave a review on <a href="https://podcasts.apple.com/us/podcast/the-machine-learning-podcast/id1626358243" target="_blank">iTunes</a> and tell your friends and co-workers.</li></ul><br />Links<br /><ul><li><a href="https://saturncloud.io/" target="_blank">Saturn Cloud</a></li><li><a href="https://pandas.pydata.org/" target="_blank">Pandas</a></li><li><a href="https://numpy.org/" target="_blank">NumPy</a></li><li><a href="https://www.mathworks.com/products/matlab.html" target="_blank">MatLab</a></li><li><a href="https://aws.amazon.com/" target="_blank">AWS</a></li><li><a href="https://cloud.google.com/?hl=en" target="_blank">GCP</a></li><li><a href="https://azure.microsoft.com/en-us" target="_blank">Azure</a></li><li><a href="https://www.oracle.com/cloud/" target="_blank">Oracle Cloud</a></li><li><a href="https://www.runpod.io/" target="_blank">RunPod</a></li><li><a href="https://www.fluidstack.io/" target="_blank">FluidStack</a></li><li><a href="https://sfcompute.com/" target="_blank">SFCompute</a></li><li><a href="https://www.kubeflow.org/" target="_blank">KubeFlow</a></li><li><a href="https://lightning.ai/" target="_blank">Lightning AI</a></li><li><a href="https://dstack.ai/" target="_blank">DStack</a></li><li><a href="https://metaflow.org/" target="_blank">Metaflow</a></li><li><a href="https://flyte.org/" target="_blank">Flyte</a></li><li><a href="https://lexsi.ai/" target="_blank">Arya AI</a></li><li><a href="https://dagster.io/" target="_blank">Dagster</a></li><li><a href="https://www.coreweave.com/" target="_blank">Coreweave</a></li><li><a href="https://www.vultr.com/" target="_blank">Vultr</a></li><li><a href="https://nebius.com/" target="_blank">Nebius</a></li><li><a href="https://vast.ai/" target="_blank">Vast.ai</a></li><li><a href="https://www.weka.io/" target="_blank">Weka</a></li><li><a href="https://www.vastdata.com/" target="_blank">Vast Data</a></li><li><a 
href="https://slurm.schedmd.com/documentation.html" target="_blank">Slurm</a></li><li><a href="https://www.cncf.io/" target="_blank">CNCF == Cloud Native Computing Foundation</a></li><li><a href="https://kubernetes.io/" target="_blank">Kubernetes</a></li><li><a href="https://developer.hashicorp.com/terraform" target="_blank">Terraform</a></li><li><a href="https://aws.amazon.com/ecs/" target="_blank">ECS</a></li><li><a href="https://helm.sh/" target="_blank">Helm Chart</a></li><li><a href="https://aws.amazon.com/what-is/block-storage/" target="_blank">Block Storage</a></li><li><a href="https://aws.amazon.com/what-is/object-storage/" target="_blank">Object Storage</a></li><li><a href="https://www.redhat.com/en/topics/cloud-native-apps/what-is-a-container-registry" target="_blank">Container Registry</a></li><li><a href="https://www.crusoe.ai/" target="_blank">Crusoe</a></li><li><a href="https://www.alluxio.io/" target="_blank">Alluxio</a></li><li><a href="https://en.wikipedia.org/wiki/Data_virtualization" target="_blank">Data Virtualization</a></li><li><a href="https://www.nvidia.com/en-us/data-center/gb300-nvl72/" target="_blank">GB300</a></li><li><a href="https://www.nvidia.com/en-us/data-center/h100/" target="_blank">H100</a></li><li><a href="https://aws.amazon.com/ec2/spot/" target="_blank">Spot Instance</a></li><li><a href="https://aws.amazon.com/ai/machine-learning/trainium/" target="_blank">AWS Trainium</a></li><li><a href="https://cloud.google.com/tpu?hl=en" target="_blank">Google TPU (Tensor Processing Unit)</a></li><li><a href="https://www.amd.com/en.html" target="_blank">AMD</a></li><li><a href="https://www.amd.com/en/products/software/rocm.html" target="_blank">ROCm</a></li><li><a href="https://pytorch.org/" target="_blank">PyTorch</a></li><li><a href="https://cloud.google.com/vertex-ai?hl=en" target="_blank">Google Vertex AI</a></li><li><a href="https://aws.amazon.com/bedrock/" target="_blank">AWS Bedrock</a></li><li><a 
href="https://github.com/NVIDIA/cuda-python" target="_blank">CUDA Python</a></li><li><a href="https://www.modular.com/mojo" target="_blank">Mojo</a></li><li><a href="https://xgboost.readthedocs.io/en/stable/" target="_blank">XGBoost</a></li><li><a href="https://en.wikipedia.org/wiki/Random_forest" target="_blank">Random Forest</a></li><li><a href="https://ludwig.ai/latest/" target="_blank">Ludwig</a> - Uber Deep Learning AutoML</li><li><a href="https://www.paperspace.com/" target="_blank">Paperspace</a></li><li><a href="https://www.voltagepark.com/" target="_blank">Voltage Park</a></li><li><a href="https://wandb.ai/site/" target="_blank">Weights &amp; Biases</a></li></ul><br />The intro and outro music is from <a href="https://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Tales_Of_A_Dead_Fish/Hitmans_Lovesong/" target="_blank">Hitman's Lovesong feat. Paola Graziano</a> by <a href="http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/" target="_blank">The Freak Fandango Orchestra</a>/<a href="https://creativecommons.org/licenses/by-sa/3.0/" target="_blank">CC BY-SA 3.0</a>
46 MIN
The Future of Dev Experience: Spotify’s Playbook for Organization‑Scale AI
JAN 20, 2026
Summary&nbsp;<br />In this episode of the AI Engineering Podcast, Niklas Gustavsson, Chief Architect at Spotify, talks about scaling AI across engineering and product. He explores how Spotify's highly distributed architecture positioned it to support rapid adoption of coding agents like Copilot, Cursor, and Claude Code, enabled by standardization and Backstage. The conversation covers the tension between bottom-up experimentation and platform standardization, and how Spotify is moving toward monorepos and fleet management. Niklas discusses the emergence of "fleet-wide agents" that can execute complex code changes with robust testing and LLM-as-judge loops to ensure quality. He also touches on the shift in engineering workflows as code generation accelerates, the growing use of agents beyond coding, and the lessons learned in sandboxing, agent skills/rules, and shared evaluation frameworks. Niklas highlights Spotify's decade-long experience with ML product work and shares his vision for deeper end-to-end integration of agentic capabilities across the full product lifecycle and for making collaborative "team-level memory" for agents a reality.&nbsp;<br /><br />Announcements&nbsp;<br /><ul><li>Hello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systems</li><li>Unlock the full potential of your AI workloads with a seamless and composable data infrastructure. Bruin is an open source framework that streamlines integration from the command line, allowing you to focus on what matters most - building intelligent systems. Write Python code for your business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. With native support for ML/AI workloads, Bruin empowers data teams to deliver faster, more reliable, and scalable AI solutions. 
Harness Bruin's connectors for hundreds of platforms, including popular machine learning frameworks like TensorFlow and PyTorch. Build end-to-end AI workflows that integrate seamlessly with your existing tech stack. Join the ranks of forward-thinking organizations that are revolutionizing their data engineering with Bruin. Get started today at <a href="https://www.aiengineeringpodcast.com/bruin" target="_blank">aiengineeringpodcast.com/bruin</a>, and for dbt Cloud customers, enjoy a $1,000 credit to migrate to Bruin Cloud.</li><li>Your host is Tobias Macey and today I'm interviewing Niklas Gustavsson about how Spotify is scaling AI usage in engineering and product work</li></ul><br />Interview<br />&nbsp;<br /><ul><li>Introduction</li><li>How did you get involved in machine learning?</li><li>Can you start by giving an overview of your engineering practices independent of AI?</li><li>What was your process for introducing AI into the developer experience? (e.g. pioneers doing early work (bottom-up) vs. top-down)</li><li>There are countless agentic coding tools on the market now. How do you balance organizational standardization vs. exploration?</li><li>Beyond the toolchain, what are your methods for sharing best practices and upskilling engineers on the use of agentic toolchains for software/product engineering?</li><li>Spotify has been operationalizing ML/AI features since before the introduction of LLMs and transformer models. 
How has that history helped inform your adoption of generative AI in your overall engineering organization?</li><li>As you use these generative and agentic AI utilities in your day-to-day, how have those lessons learned fed back into your AI-powered product features?</li><li>What are some of the platform capabilities/developer experience investments that you have made to improve the overall effectiveness of agentic coding in your engineering organization?</li><li>What are some examples of guardrails/speedbumps that you have introduced to avoid injecting unreliable or untested work into production?</li><li>As the (time/money/cognitive) cost of writing code drops, the burden of reviewing that code grows. What are some of the ways that you are working to scale that side of the equation?</li><li>What are some of the ways that agentic coding/CLI utilities have bled into other areas of engineering/operations/product development beyond just writing code?</li><li>What are the most interesting, innovative, or unexpected ways that you have seen your team applying AI/agentic engineering practices?</li><li>What are the most interesting, unexpected, or challenging lessons that you have learned while working on operationalizing and scaling agentic engineering patterns in your teams?</li><li>When is agentic code generation the wrong choice?</li><li>What do you have planned for the future of AI and agentic coding patterns and practices in your organization?</li></ul><br />Contact Info<br />&nbsp;<br /><ul><li><a href="https://www.linkedin.com/in/protocol7/?originalSubdomain=se" target="_blank">LinkedIn</a></li></ul><br />Parting Question<br />&nbsp;<br /><ul><li>From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?</li></ul><br />Closing Announcements<br />&nbsp;<br /><ul><li>Thank you for listening! Don't forget to check out our other shows. 
The <a href="https://www.dataengineeringpodcast.com" target="_blank">Data Engineering Podcast</a> covers the latest on modern data management. <a href="https://www.pythonpodcast.com" target="_blank">Podcast.__init__</a> covers the Python language, its community, and the innovative ways it is being used.</li><li>Visit the <a href="https://www.aiengineeringpodcast.com" target="_blank">site</a> to subscribe to the show, sign up for the mailing list, and read the show notes.</li><li>If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.</li><li>To help other people find the show please leave a review on <a href="https://podcasts.apple.com/us/podcast/the-machine-learning-podcast/id1626358243" target="_blank">iTunes</a> and tell your friends and co-workers.</li></ul><br />Links<br />&nbsp;<br /><ul><li><a href="https://spotify.com/" target="_blank">Spotify</a></li><li><a href="https://en.wikipedia.org/wiki/Developer_experience" target="_blank">Developer Experience</a></li><li><a href="https://en.wikipedia.org/wiki/Large_language_model" target="_blank">LLM == Large Language Model</a></li><li><a href="https://en.wikipedia.org/wiki/Transformer_(deep_learning)" target="_blank">Transformers</a></li><li><a href="https://backstage.io/" target="_blank">BackStage</a></li><li><a href="https://github.com/features/copilot" target="_blank">GitHub Copilot</a></li><li><a href="https://cursor.com/" target="_blank">Cursor</a></li><li><a href="https://cursor.com/" target="_blank">Claude Skills</a></li><li><a href="https://en.wikipedia.org/wiki/Monorepo" target="_blank">Monorepo</a></li><li><a href="https://modelcontextprotocol.io/docs/getting-started/intro" target="_blank">MCP == Model Context Protocol</a></li><li><a href="https://code.claude.com/docs/en/overview" target="_blank">Claude Code</a></li><li><a href="https://en.wikipedia.org/wiki/Product_manager" target="_blank">Product Manager</a></li><li><a 
href="https://en.wikipedia.org/wiki/DevOps_Research_and_Assessment" target="_blank">DORA Metrics</a></li><li><a href="https://typing.python.org/en/latest/spec/annotations.html" target="_blank">Type Annotations</a></li><li><a href="https://cloud.google.com/bigquery" target="_blank">BigQuery</a></li><li><a href="https://en.wikipedia.org/wiki/Product_requirements_document" target="_blank">PRD == Product Requirements Document</a></li><li><a href="https://www.ibm.com/think/topics/ai-agent-evaluation" target="_blank">AI Evals</a></li><li><a href="https://www.evidentlyai.com/llm-guide/llm-as-a-judge" target="_blank">LLM-as-a-Judge</a></li><li><a href="https://www.ibm.com/think/topics/ai-agent-memory" target="_blank">Agentic Memory</a></li></ul><br />The intro and outro music is from <a href="https://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Tales_Of_A_Dead_Fish/Hitmans_Lovesong/" target="_blank">Hitman's Lovesong feat. Paola Graziano</a> by <a href="http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/" target="_blank">The Freak Fandango Orchestra</a>/<a href="https://creativecommons.org/licenses/by-sa/3.0/" target="_blank">CC BY-SA 3.0</a>
56 MIN