Latent Space: The AI Engineer Podcast

swyx + Alessio

Details

The podcast by and for AI Engineers! In 2024, over 2 million readers and listeners came to Latent Space to hear about news, papers and interviews in Software 3.0. We cover Foundation Models changing every domain in Code Generation, Multimodality, AI Agents, GPU Infra and more, directly from the founders, builders, and thinkers involved in pushing the cutting edge. We strive to give you everything from the definitive take on the Current Thing to your first introduction to the tech you'll be using in the next 3 months! We break news and exclusive interviews from OpenAI, Anthropic, Gemini, Meta (Soumith Chintala), Sierra (Bret Taylor), tiny (George Hotz), Databricks/MosaicML (Jon Frankle), Modular (Chris Lattner), Answer.ai (Jeremy Howard), et al. Full show notes always on https://latent.space

Recent Episodes

AI to AE's: Grit, Glean, and Kleiner Perkins' next Enterprise AI hit — Joubin Mirzadegan, Roadrunner
DEC 12, 2025
Glean started as a Kleiner Perkins incubation and is now a $7B, $200m ARR Enterprise AI leader. Now KP has tapped its own podcaster to lead its next big swing. From building go-to-market the hard way in startups (and scaling Palo Alto Networks' public cloud business) to joining Kleiner Perkins to help technical founders turn product edge into repeatable revenue, Joubin Mirzadegan has spent the last decade obsessing over one thing: distribution, and how ideas actually spread, sell, and compound. That obsession took him from launching the CRO-only podcast Grit (https://www.youtube.com/playlist?list=PLRiWZFltuYPF8A6UGm74K2q29UwU-Kk9k) as a hiring wedge, to working alongside breakout companies like Glean and Windsurf, to now incubating Roadrunner, an AI-native rethink of CPQ and quoting workflows as pricing models collapse from "seats" into consumption, bundles, renewals, and SKU sprawl.

We sat down with Joubin to dig into the real mechanics of making conversations feel human (rolling early, never sending questions, temperature + lighting hacks), what Windsurf got right about "Google-class product and Salesforce-class distribution," how to hire early sales leaders without getting fooled by shiny logos, why CPQ is quietly breaking the back of modern revenue teams, and his thesis for his new company and KP incubation Roadrunner (https://www.roadrunner.ai/): rebuild the data model from the ground up, co-develop with the hairiest design partners, and eventually use LLMs to recommend deal structures the way the best reps do, without the Slack-channel chaos of deal desk.

We discuss:
- How to make guests instantly comfortable: rolling early, no "are you ready?", temperature, lighting, and room dynamics
- Why Joubin refuses to send questions in advance (and when you might have to anyway)
- The origin of the CRO-only podcast: using media as a hiring wedge and relationship engine
- The "commit to 100 episodes" mindset: why most shows die before they find their voice
- Founder vs exec interviews: why CEOs can speak more freely (and what it unlocks in conversation)
- What Glean taught him about enterprise AI: permissions, trust, and overcoming "category is dead" skepticism
- Design partners as the real unlock: why early believers matter and how co-development actually works
- Windsurf's breakout: what it means to be serious about "Google-class product + Salesforce-class distribution"
- Why technical founders struggle with GTM and how KP built a team around sales, customer access, and demand gen
- Hiring early sales leaders: anti-patterns (logos), what to screen for (motivation), and why stage-fit is everything
- The CPQ problem & Roadrunner's thesis: rebuilding CPQ/quoting from the data model up for modern complexity
- How "rules + SKUs + approvals" create a brittle graph and what it takes to model it without tipping over
- The two-year window: incumbents rebuilding slowly vs startups out-sprinting with AI-native architecture
- Where AI actually helps: quote generation, policy enforcement, approval routing, and deal recommendation loops

Joubin
X: https://x.com/Joubinmir
LinkedIn: https://www.linkedin.com/in/joubin-mirzadegan-66186854/

Where to find Latent Space
X: https://x.com/latentspacepod
Substack: https://www.latent.space/

Chapters
00:00:00 Introduction and the Zuck Interview Experience
00:03:26 The Genesis of the Grit Podcast: Hiring CROs Through Content
00:13:20 Podcast Philosophy: Creating Authentic Conversations
00:15:44 Working with Arvind at Glean: The Enterprise Search Breakthrough
00:26:20 Windsurf's Sales Machine: Google-Class Product Meets Salesforce-Class Distribution
00:30:28 Hiring Sales Leaders: Anti-Patterns and First Principles
00:39:02 The CPQ Problem: Why Salesforce and Legacy Tools Are Breaking
00:43:40 Introducing Roadrunner: Solving Enterprise Pricing with AI
00:49:19 Building Roadrunner: Team, Design Partners, and Data Model Challenges
00:59:35 High Performance Philosophy: Working Out Every Day and Reducing Friction
01:06:28 Defining Grit: Passion Plus Perseverance
The Future of Email: Superhuman CTO on Your Inbox As the Real AI Agent (Not ChatGPT) — Loïc Houssier
DEC 11, 2025
From applied cryptography and offensive security in France's defense industry to optimizing nuclear submarine workflows, then selling his e-signature startup to Docusign (https://www.docusign.com/company/news-center/opentrust-joins-docusign-global-trust-network), and now running AI as CTO of Superhuman Mail (Superhuman, recently acquired by Grammarly: https://techcrunch.com/2025/07/01/grammarly-acquires-ai-email-client-superhuman/), Loïc Houssier has lived the full arc from deep infra and compliance hell to obsessing over 100ms product experiences and AI-native email.

We sat down with Loïc to dig into how you actually put AI into an inbox without adding latency, why Superhuman leans so hard into agentic search and "Ask AI" over your entire email history, how they design tools vs. agents and fight agent laziness, what box-priced inference and local-first caching mean for cost and reliability, and his bet that your inbox will power your future AI EA while AI massively widens the gap between engineers with real fundamentals and those faking it.

We discuss:
- Loïc's path from applied cryptography and offensive security in France's defense industry to submarines, e-signatures, Docusign, and now Superhuman Mail
- What 3,000+ engineers actually do at a "simple" product like Docusign: regional compliance, on-prem appliances, and why global scale explodes complexity
- How Superhuman thinks about AI in email: auto-labels, smart summaries, follow-up nudges, "Ask AI" search, and the rule that AI must never add latency or friction
- Superhuman's agentic framework: tools vs. agents, fighting "agent laziness," deep semantic search over huge inboxes, and pagination strategies to find the real needle in the haystack
- How they evaluate OpenAI, Anthropic, Gemini, and open models: canonical queries, end-to-end evals, date reasoning, and Rahul's infamous "what wood was my table?" test
- Infra and cost philosophy: local-first caching, vector search backends, Baseten "box" pricing vs. per-token pricing, and thinking in price-per-trillion-tokens instead of price-per-million
- The vision of Superhuman as your AI EA: auto-drafting replies in your voice, scheduling on your behalf, and using your inbox as the ultimate private data source
- How the Grammarly + Coda + Superhuman stack could power truly context-aware assistance across email, docs, calendars, contracts, and more
- Inside Superhuman's AI-dev culture: free-for-all tool adoption, tracking AI usage on PRs, and going from ~4 to ~6 PRs per engineer per week
- Why Loïc believes everyone should still learn to code, and how AI will amplify great engineers with strong fundamentals while exposing shallow ones even faster

Loïc Houssier
LinkedIn: https://www.linkedin.com/in/houssier/

Where to find Latent Space
X: https://x.com/latentspacepod
Substack: https://www.latent.space/

Chapters
00:00:00 Introduction and Loïc's Journey from Nuclear Submarines to Superhuman
00:06:40 Docusign Acquisition and the Enterprise Email Stack
00:10:26 Superhuman's AI Vision: Your Inbox as the Real AI Agent
00:13:20 Ask AI: Agentic Search and the Quality Problem
00:18:20 Infrastructure Choices: Model Selection, Base10, and Cost Management
00:27:30 Local-First Architecture and the Database Stack
00:30:50 Evals, Quality, and the Rahul Wood Table Test
00:42:30 The Future EA: Auto-Drafting and Proactive Assistance
00:46:40 Grammarly Acquisition and the Contextual Advantage
00:38:40 Voice, Video, and the End of Writing
00:51:40 Knowledge Graphs: The Hard Problem Nobody Has Solved
00:56:40 Competing with OpenAI and the Browser Question
01:02:30 AI Coding Tools: From 4 to 6 PRs Per Week
01:08:00 Engineering Culture, Hiring, and the Future of Software Development
World Models & General Intuition: Khosla's largest bet since LLMs & OpenAI
DEC 6, 2025
From building Medal into a 12M-user game clipping platform with 3.8B highlight moments to turning down a reported $500M offer from OpenAI (https://www.theinformation.com/articles/openai-offered-pay-500-million-startup-videogame-data) and raising a $134M seed from Khosla (https://techcrunch.com/2025/10/16/general-intuition-lands-134m-seed-to-teach-agents-spatial-reasoning-using-video-game-clips/) to spin out General Intuition, Pim is betting that world models trained on peak human gameplay are the next frontier after LLMs.

We sat down with Pim to dig into why game highlights are "episodic memory for simulation" (and how Medal's privacy-first action labels became a world-model goldmine: https://medal.tv/blog/posts/enabling-state-of-the-art-security-and-protections-on-medals-new-apm-and-controller-overlay-features), what it takes to build fully vision-based agents that just see frames and output actions in real time, how General Intuition transfers from games to real-world video and then into robotics, why world models and LLMs are complementary rather than rivals, what founders with proprietary datasets should know before selling or licensing to labs, and his bet that spatial-temporal foundation models will power 80% of future atoms-to-atoms interactions in both simulation and the real world.

We discuss:
- How Medal's 3.8B action-labeled highlight clips became a privacy-preserving goldmine for world models
- Building fully vision-based agents that only see frames and output actions yet play like (and sometimes better than) humans
- Transferring from arcade-style games to realistic games to real-world video using the same perception–action recipe
- Why world models need actions, memory, and partial observability (smoke, occlusion, camera shake) vs. "just" pretty video generation
- Distilling giant policies into tiny real-time models that still navigate, hide, and peek corners like real players
- Pim's path from RuneScape private servers, Tourette's, and reverse engineering to leading a frontier world-model lab
- How data-rich founders should think about valuing their datasets, negotiating with big labs, and deciding when to go independent
- GI's first customers: replacing brittle behavior trees in games, engines, and controller-based robots with a "frames in, actions out" API
- Using Medal clips as "episodic memory of simulation" to move from imitation learning to RL via world models and negative events
- The 2030 vision: spatial–temporal foundation models that power the majority of atoms-to-atoms interactions in simulation and the real world

Pim
X: https://x.com/PimDeWitte
LinkedIn: https://www.linkedin.com/in/pimdw/

Where to find Latent Space
X: https://x.com/latentspacepod
Substack: https://www.latent.space/

Chapters
00:00:00 Introduction and Medal's Gaming Data Advantage
00:02:08 Exclusive Demo: Vision-Based Gaming Agents
00:06:17 Action Prediction and Real-World Video Transfer
00:08:41 World Models: Interactive Video Generation
00:13:42 From Runescape to AI: Pim's Founder Journey
00:16:45 The Research Foundations: Diamond, Genie, and SEMA
00:33:03 Vinod Khosla's Largest Seed Bet Since OpenAI
00:35:04 Data Moats and Why GI Stayed Independent
00:38:42 Self-Teaching AI Fundamentals: The Francois Fleuret Course
00:40:28 Defining World Models vs Video Generation
00:41:52 Why Simulation Complexity Favors World Models
00:43:30 World Labs, Yann LeCun, and the Spatial Intelligence Race
00:50:08 Business Model: APIs, Agents, and Game Developer Partnerships
00:58:57 From Imitation Learning to RL: Making Clips Playable
01:00:15 Open Research, Academic Partnerships, and Hiring
01:02:09 2030 Vision: 80 Percent of Atoms-to-Atoms AI Interactions
After LLMs: Spatial Intelligence and World Models — Fei-Fei Li & Justin Johnson, World Labs
NOV 25, 2025
Fei-Fei Li and Justin Johnson are the cofounders of World Labs, which recently launched Marble (https://marble.worldlabs.ai/), a new kind of generative "world model" that can create editable 3D environments from text, images, and other spatial inputs. Marble lets creators generate persistent 3D worlds, precisely control cameras, and interactively edit scenes, making it a powerful tool for games, film, VR, robotics simulation, and more. In this episode, Fei-Fei and Justin share how their journey from ImageNet and Stanford research led to World Labs, why spatial intelligence is the next frontier after LLMs, and how world models could change how machines see, understand, and build in 3D.

We discuss:
- The massive compute scaling from AlexNet to today, and why world models and spatial data are the most compelling way to "soak up" modern GPU clusters compared to language alone
- What Marble actually is: a generative model of 3D worlds that turns text and images into editable scenes using Gaussian splats, supports precise camera control and recording, and runs interactively on phones, laptops, and VR headsets
- Fei-Fei's essay (https://drfeifei.substack.com/p/from-words-to-worlds-spatial-intelligence) on spatial intelligence as a distinct form of intelligence from language: from picking up a mug to inferring the 3D structure of DNA, and why language is a lossy, low-bandwidth channel for describing the rich 3D/4D world we live in
- Whether current models "understand" physics or just fit patterns: the gap between predicting orbits and discovering F=ma, and how attaching physical properties to splats and distilling physics engines into neural networks could lead to genuine causal reasoning
- The changing role of academia in AI, why Fei-Fei worries more about under-resourced universities than "open vs closed," and how initiatives like national AI compute clouds and open benchmarks can rebalance the ecosystem
- Why transformers are fundamentally set models, not sequence models, and how that perspective opens up new architectures for world models, especially as hardware shifts from single GPUs to massive distributed clusters
- Real use cases for Marble today: previsualization and VFX, game environments, virtual production, interior and architectural design (including kitchen remodels), and generating synthetic simulation worlds for training embodied agents and robots
- How spatial intelligence and language intelligence will work together in multimodal systems, and why the goal isn't to throw away LLMs but to complement them with rich, embodied models of the world
- Fei-Fei and Justin's long-term vision for spatial intelligence: from creative tools for artists and game devs to broader applications in science, medicine, and real-world decision-making

Fei-Fei Li
X: https://x.com/drfeifei
LinkedIn: https://www.linkedin.com/in/fei-fei-li-4541247

Justin Johnson
X: https://x.com/jcjohnss
LinkedIn: https://www.linkedin.com/in/justin-johnson-41b43664

Where to find Latent Space
X: https://x.com/latentspacepod
Substack: https://www.latent.space/

Chapters
00:00:00 Introduction and the Fei-Fei Li & Justin Johnson Partnership
00:02:00 From ImageNet to World Models: The Evolution of Computer Vision
00:12:42 Dense Captioning and Early Vision-Language Work
00:19:57 Spatial Intelligence: Beyond Language Models
00:28:46 Introducing Marble: World Labs' First Spatial Intelligence Model
00:33:21 Gaussian Splats and the Technical Architecture of Marble
00:22:10 Physics, Dynamics, and the Future of World Models
00:41:09 Multimodality and the Interplay of Language and Space
00:37:37 Use Cases: From Creative Industries to Robotics and Embodied AI
00:56:58 Hiring, Research Directions, and the Future of World Labs
The PhD Student & Professor Reinventing AI: Fei-Fei Li & Justin Johnson on Spatial Intelligence
NOV 25, 2025
Fei-Fei Li is the co-founder and CEO of World Labs, where she and Justin Johnson are building Marble, a new kind of generative "world model" that can create editable 3D environments from text, images, and other spatial inputs. Marble lets creators generate persistent 3D worlds, precisely control cameras, and interactively edit scenes, making it a powerful tool for games, film, VR, robotics simulation, and more. In this episode, Fei-Fei and Justin share how their journey from ImageNet and Stanford research led to World Labs, why spatial intelligence is the next frontier after LLMs, and how world models could change how machines see, understand, and build in 3D.

We discuss:
- The massive compute scaling from AlexNet to today, and why world models and spatial data are the most compelling way to "soak up" modern GPU clusters compared to language alone
- What Marble actually is: a generative model of 3D worlds that turns text and images into editable scenes using Gaussian splats, supports precise camera control and recording, and runs interactively on phones, laptops, and VR headsets
- The case for spatial intelligence as a distinct form of intelligence from language: from picking up a mug to inferring the 3D structure of DNA, and why language is a lossy, low-bandwidth channel for describing the rich 3D/4D world we live in
- Whether current models "understand" physics or just fit patterns: the gap between predicting orbits and discovering F=ma, and how attaching physical properties to splats and distilling physics engines into neural networks could lead to genuine causal reasoning
- The changing role of academia in AI, why Fei-Fei worries more about under-resourced universities than "open vs closed," and how initiatives like national AI compute clouds and open benchmarks can rebalance the ecosystem
- Why transformers are fundamentally set models, not sequence models, and how that perspective opens up new architectures for world models, especially as hardware shifts from single GPUs to massive distributed clusters
- Real use cases for Marble today: previsualization and VFX, game environments, virtual production, interior and architectural design (including kitchen remodels), and generating synthetic simulation worlds for training embodied agents and robots
- How spatial intelligence and language intelligence will work together in multimodal systems, and why the goal isn't to throw away LLMs but to complement them with rich, embodied models of the world
- Fei-Fei and Justin's long-term vision for spatial intelligence: from creative tools for artists and game devs to broader applications in science, medicine, and real-world decision-making

Where to find Fei-Fei Li
X: https://x.com/drfeifei
LinkedIn: https://www.linkedin.com/in/fei-fei-li-4541247

Where to find Justin Johnson
X: https://x.com/jcjohnss
LinkedIn: https://www.linkedin.com/in/justin-johnson-41b43664

Where to find Shawn Wang
X: https://x.com/swyx
LinkedIn: https://www.linkedin.com/in/shawnswyxwang/

Where to find Alessio Fanelli
X: https://x.com/FanaHOVA
LinkedIn: https://www.linkedin.com/in/fanahova/

Where to find Latent Space
X: https://x.com/latentspacepod
Substack: https://www.latent.space/

Chapters
00:00:00 Introduction and the Fei-Fei Li & Justin Johnson Partnership
00:02:00 From ImageNet to World Models: The Evolution of Computer Vision
00:12:33 Dense Captioning and Early Vision-Language Work
00:19:39 Spatial Intelligence: Beyond Language Models
00:28:49 Introducing Marble: World Labs' First Spatial Intelligence Model
00:33:24 Gaussian Splats and the Technical Architecture of Marble
00:35:50 Physics, Dynamics, and the Future of World Models
00:44:00 Multimodality and the Interplay of Language and Space
00:37:37 Use Cases: From Creative Industries to Robotics and Embodied AI
00:57:03 Hiring, Research Directions, and the Future of World Labs