Data Engineering Podcast

Tobias Macey


Details

This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

Recent Episodes

Branches, Diffs, and SQL: How Dolt Powers Agentic Workflows
FEB 1, 2026
Summary&nbsp;<br />In this episode Tim Sehn, founder and CEO of DoltHub, talks about Dolt - the world’s first version‑controlled SQL database - and why Git‑style semantics belong at the heart of data systems and AI workflows. Tim explains how Dolt combines a MySQL/Postgres‑compatible interface with a novel storage engine built on a “Prolly tree” to enable fast, row‑level branching, merging, and diffs of both schema and data. He digs into real production use cases: powering applications that expose version control to end users, reproducible ML feature stores, managing massive configuration for games, and enabling safe agentic writes via branch‑based review flows. He compares Dolt’s approach to LakeFS, Neon, and PlanetScale, and explores developer workflows unlocked by decentralized clones, full audit logs, and PR‑style data reviews.&nbsp;<br /><br />Announcements&nbsp;<br /><ul><li>Hello and welcome to the Data Engineering Podcast, the show about modern data management</li><li>If you lead a data team, you know this pain: Every department needs dashboards, reports, custom views, and they all come to you. So you're either the bottleneck slowing everyone down, or you're spending all your time building one-off tools instead of doing actual data work. Retool gives you a way to break that cycle. Their platform lets people build custom apps on your company data—while keeping it all secure. Type a prompt like 'Build me a self-service reporting tool that lets teams query customer metrics from Databricks'—and they get a production-ready app with the permissions and governance built in. They can self-serve, and you get your time back. It's data democratization without the chaos. Check out Retool at <a href="https://www.dataengineeringpodcast.com/retool" target="_blank">dataengineeringpodcast.com/retool</a> today and see how other data teams are scaling self-service. 
Because let's be honest—we all need to Retool how we handle data requests.</li><li>Your host is Tobias Macey and today I'm interviewing Tim Sehn about Dolt, a version controlled database engine and its applications for agentic workflows</li></ul><br />Interview<br />&nbsp;<br /><ul><li>Introduction</li><li>How did you get involved in the area of data management?</li><li>Can you describe what Dolt is and the story behind it?</li><li>What are the key use cases that you are focused on solving by adding version control to the database layer?</li><li>There are numerous projects related to different aspects of versioning in different data contexts (e.g. LakeFS, Datomic, etc.). What are the versioning semantics that you are focused on?</li><li>You position Dolt as "the database for AI". How does data versioning relate to AI use cases?</li><li>What types of AI systems are able to make best use of Dolt's versioning capabilities?</li><li>Can you describe how Dolt and Doltgres are implemented?</li><li>How have the design and scope of the project changed since you first started working on it?</li><li>What are some of the architecture and integration patterns around relational databases that change when you introduce version control semantics as a core primitive?</li><li>What are some anti-patterns that you have seen teams develop around Dolt's versioning functionality?</li><li>What are the most interesting, innovative, or unexpected ways that you have seen Dolt used?</li><li>What are the most interesting, unexpected, or challenging lessons that you have learned while working on Dolt?</li><li>When is Dolt the wrong choice?</li><li>What do you have planned for the future of Dolt?</li></ul><br />Contact Info<br />&nbsp;<br /><ul><li><a href="https://www.linkedin.com/in/timothysehn" target="_blank">LinkedIn</a></li></ul><br />Parting Question<br />&nbsp;<br /><ul><li>From your perspective, what is the biggest gap in the tooling or technology for data management today?</li></ul><br 
/>Closing Announcements<br />&nbsp;<br /><ul><li>Thank you for listening! Don't forget to check out our other shows. <a href="https://www.pythonpodcast.com" target="_blank">Podcast.__init__</a> covers the Python language, its community, and the innovative ways it is being used. The <a href="https://www.aiengineeringpodcast.com" target="_blank">AI Engineering Podcast</a> is your guide to the fast-moving world of building AI systems.</li><li>Visit the <a href="https://www.dataengineeringpodcast.com" target="_blank">site</a> to subscribe to the show, sign up for the mailing list, and read the show notes.</li><li>If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.</li></ul><br />Links<br />&nbsp;<br /><ul><li><a href="https://docs.dolthub.com/" target="_blank">Dolt</a></li><li><a href="https://www.dolthub.com/" target="_blank">DoltHub</a></li><li><a href="https://www.dolthub.com/discover" target="_blank">Stockmarket Data</a></li><li><a href="https://lakefs.io/" target="_blank">LakeFS</a></li><li><a href="https://docs.datomic.com/datomic-overview.html" target="_blank">Datomic</a></li><li><a href="https://git-scm.com/" target="_blank">Git</a></li><li><a href="https://www.mysql.com/" target="_blank">MySQL</a></li><li><a href="https://docs.dolthub.com/architecture/storage-engine/prolly-tree" target="_blank">Prolly Tree</a></li><li><a href="https://neon.com/" target="_blank">Neon</a></li><li><a href="https://www.djangoproject.com/" target="_blank">Django</a></li><li><a href="https://www.featurestore.org/" target="_blank">Feature Store</a></li><li><a href="https://modelcontextprotocol.io/docs/getting-started/intro" target="_blank">MCP Server</a></li><li><a href="https://projectnessie.org/" target="_blank">Nessie</a></li><li><a href="https://iceberg.apache.org/" target="_blank">Iceberg</a></li><li><a href="https://planetscale.com/" target="_blank">PlanetScale</a></li><li>O(NlogN) <a 
href="https://en.wikipedia.org/wiki/Big_O_notation" target="_blank">Big O Complexity</a></li><li><a href="https://en.wikipedia.org/wiki/B-tree" target="_blank">B-Tree</a></li><li><a href="https://git-scm.com/docs/git-merge" target="_blank">Git Merge</a></li><li><a href="https://git-scm.com/docs/git-rebase" target="_blank">Git Rebase</a></li><li><a href="https://en.wikipedia.org/wiki/Abstract_syntax_tree" target="_blank">AST == Abstract Syntax Tree</a></li><li><a href="https://supabase.com/" target="_blank">Supabase</a></li><li><a href="https://www.cockroachlabs.com/" target="_blank">CockroachDB</a></li><li><a href="https://en.wikipedia.org/wiki/Document-oriented_database" target="_blank">Document Database</a></li><li><a href="https://www.mongodb.com/" target="_blank">MongoDB</a></li><li><a href="https://github.com/steveyegge/gastown" target="_blank">Gastown</a></li><li><a href="https://github.com/steveyegge/beads" target="_blank">Beads</a></li></ul><br />The intro and outro music is from <a href="http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Love_death_and_a_drunken_monkey/04_-_The_Hug" target="_blank">The Hug</a> by <a href="http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/" target="_blank">The Freak Fandango Orchestra</a> / <a href="http://creativecommons.org/licenses/by-sa/3.0/" target="_blank">CC BY-SA</a>
56 MIN
Logical First, Physical Second: A Pragmatic Path to Trusted Data
JAN 25, 2026
Summary&nbsp;<br />In this episode of the Data Engineering Podcast Jamie Knowles, Product Director for ER/Studio, talks about data architecture and its importance in driving business meaning. He discusses how data architecture should start with business meaning, not just physical schemas, and explores the pitfalls of jumping straight to physical designs. Jamie shares his practical definition of data architecture centered on shared semantic models that anchor transactional, analytical, and event-driven systems. The conversation covers strategies for evolving an architecture in tandem with delivery, including defining core concepts, aligning teams through governance, and treating the model as a living product. He also examines how generative AI can both help and harm data architecture, accelerating first drafts but amplifying risk without a human-approved ontology. Jamie emphasizes the importance of doing the hard work upfront to make meaning explicit, keeping models simple and business-aligned, and using tools and patterns to reuse that meaning everywhere.&nbsp;<br /><br />Announcements&nbsp;<br /><ul><li>Hello and welcome to the Data Engineering Podcast, the show about modern data management</li><li>If you lead a data team, you know this pain: Every department needs dashboards, reports, custom views, and they all come to you. So you're either the bottleneck slowing everyone down, or you're spending all your time building one-off tools instead of doing actual data work. Retool gives you a way to break that cycle. Their platform lets people build custom apps on your company data—while keeping it all secure. Type a prompt like 'Build me a self-service reporting tool that lets teams query customer metrics from Databricks'—and they get a production-ready app with the permissions and governance built in. They can self-serve, and you get your time back. It's data democratization without the chaos. 
Check out Retool at <a href="https://www.dataengineeringpodcast.com/retool" target="_blank">dataengineeringpodcast.com/retool</a> today and see how other data teams are scaling self-service. Because let's be honest—we all need to Retool how we handle data requests.</li><li>You’re a developer who wants to innovate—instead, you’re stuck fixing bottlenecks and fighting legacy code. MongoDB can help. It’s a flexible, unified platform that’s built for developers, by developers. MongoDB is ACID compliant, Enterprise-ready, with the capabilities you need to ship AI apps—fast. That’s why so many of the Fortune 500 trust MongoDB with their most critical workloads. Ready to think outside rows and columns? Start building at <a href="https://mongodb.com/Build" target="_blank">MongoDB.com/Build</a></li><li>Composable data infrastructure is great, until you spend all of your time gluing it together. Bruin is an open source framework, driven from the command line, that makes integration a breeze. Write Python and SQL to handle the business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. Bruin allows you to build end-to-end data workflows using AI, has connectors for hundreds of platforms, and helps data teams deliver faster. Teams that use Bruin need less engineering effort to process data and benefit from a fully integrated data platform. Go to <a href="https://www.dataengineeringpodcast.com/bruin" target="_blank">dataengineeringpodcast.com/bruin</a> today to get started. 
And for dbt Cloud customers, they'll give you $1,000 credit to migrate to Bruin Cloud.</li><li>Your host is Tobias Macey and today I'm interviewing Jamie Knowles about the impact that a well-developed data architecture (or lack thereof) has on data engineering work</li></ul><br />Interview<br /><ul><li>Introduction</li><li>How did you get involved in the area of data management?</li><li>Can you start by giving your definition of "data architecture" and what it encompasses?</li><li>How does the nuance change depending on the type of system you are designing? (e.g. data warehouse vs. transactional application database vs. event-driven streaming service)</li><li>In application teams that are large enough there is typically a software architect, but that work often ends up happening organically through trial and error. Who is the responsible party for designing and enforcing a proper data architecture?</li><li>There have been several generational shifts in approach to data warehouse projects in particular. What are some of the anti-patterns that crop up when there is no-one forming a strong opinion on the design/architecture of the warehouse?</li><li>The current stage is largely defined by the ELT pattern. What are some of the ways that workflow can encourage shortcuts?</li><li>Often the need for a proper architecture isn't felt until an organic architecture has developed. What are some of the ways that teams can short-circuit that pain and iterate toward a more sustainable design?</li><li>The common theme in all of the data architecture conversations that I've had is the need for business involvement. There is also a strong push for the business to just want the engineers to deliver data. 
What are some of the ways that AI utilities can help to accelerate delivery while also capturing business context?</li><li>For teams that are already neck deep in a messy architecture, what are the strategies and tactics that they need to start working toward today to get to a better data architecture?</li><li>What are the most interesting, innovative, or unexpected ways that you have seen teams approach the creation and implementation of their data architecture?</li><li>What are the most interesting, unexpected, or challenging lessons that you have learned while working in data architecture?</li><li>How do you see the introduction of AI at each stage of the data lifecycle changing the ways that teams think about their architectural needs?</li></ul><br />Contact Info<br /><ul><li><a href="https://www.linkedin.com/in/jamieknowlesltd?originalSubdomain=uk" target="_blank">LinkedIn</a></li></ul><br />Parting Question<br /><ul><li>From your perspective, what is the biggest gap in the tooling or technology for data management today?</li></ul><br />Closing Announcements<br /><ul><li>Thank you for listening! Don't forget to check out our other shows. <a href="https://www.pythonpodcast.com" target="_blank">Podcast.__init__</a> covers the Python language, its community, and the innovative ways it is being used. The <a href="https://www.aiengineeringpodcast.com" target="_blank">AI Engineering Podcast</a> is your guide to the fast-moving world of building AI systems.</li><li>Visit the <a href="https://www.dataengineeringpodcast.com" target="_blank">site</a> to subscribe to the show, sign up for the mailing list, and read the show notes.</li><li>If you've learned something or tried out a project from the show then tell us about it! 
Email [email protected] with your story.</li></ul><br />Links<br /><ul><li><a href="https://www.idera.com/" target="_blank">Idera</a></li><li><a href="https://erstudio.com/" target="_blank">ER Studio</a></li><li><a href="https://en.wikipedia.org/wiki/Extract,_load,_transform" target="_blank">ELT</a></li><li><a href="https://en.wikipedia.org/wiki/Resource_Description_Framework" target="_blank">RDF == Resource Description Framework</a></li><li><a href="https://en.wikipedia.org/wiki/Object%E2%80%93relational_mapping" target="_blank">ORM == Object-Relational Mapping</a></li></ul><br />The intro and outro music is from <a href="http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Love_death_and_a_drunken_monkey/04_-_The_Hug" target="_blank">The Hug</a> by <a href="http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/" target="_blank">The Freak Fandango Orchestra</a> / <a href="http://creativecommons.org/licenses/by-sa/3.0/" target="_blank">CC BY-SA</a>
40 MIN
Your Data, Your Lake: How Observe Uses Iceberg and Streaming ETL for Observability
JAN 18, 2026
Summary&nbsp;<br />In this episode Jacob Leverich, cofounder and CTO of Observe, talks about applying lakehouse architectures to observability workloads. Jacob discusses Observe’s decision to leverage cloud-native warehousing and open table formats for scale and cost efficiency. He digs into the core pain points teams face with fragmented tools, soaring costs, and data silos, and how a lakehouse approach - paired with streaming ingest via OpenTelemetry, Kafka-backed durability, curated/columnarized tables, and query orchestration - can deliver low-latency, interactive troubleshooting across logs, metrics, and traces at petabyte scale. He also explores the practicalities of loading and organizing telemetry by use case to reduce read amplification, the role of Iceberg (including v3’s JSON shredding) and Snowflake’s implementation, and why open table formats enable “your data in your lake” strategies.&nbsp;<br />Announcements&nbsp;<br /><ul><li>Hello and welcome to the Data Engineering Podcast, the show about modern data management</li><li>If you lead a data team, you know this pain: Every department needs dashboards, reports, custom views, and they all come to you. So you're either the bottleneck slowing everyone down, or you're spending all your time building one-off tools instead of doing actual data work. Retool gives you a way to break that cycle. Their platform lets people build custom apps on your company data—while keeping it all secure. Type a prompt like 'Build me a self-service reporting tool that lets teams query customer metrics from Databricks'—and they get a production-ready app with the permissions and governance built in. They can self-serve, and you get your time back. It's data democratization without the chaos. Check out Retool at <a href="https://www.dataengineeringpodcast.com/retool" target="_blank">dataengineeringpodcast.com/retool</a> today and see how other data teams are scaling self-service. 
Because let's be honest—we all need to Retool how we handle data requests.</li><li>You’re a developer who wants to innovate—instead, you’re stuck fixing bottlenecks and fighting legacy code. MongoDB can help. It’s a flexible, unified platform that’s built for developers, by developers. MongoDB is ACID compliant, Enterprise-ready, with the capabilities you need to ship AI apps—fast. That’s why so many of the Fortune 500 trust MongoDB with their most critical workloads. Ready to think outside rows and columns? Start building at <a href="https://mongodb.com/Build" target="_blank">MongoDB.com/Build</a></li><li>Composable data infrastructure is great, until you spend all of your time gluing it together. Bruin is an open source framework, driven from the command line, that makes integration a breeze. Write Python and SQL to handle the business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. Bruin allows you to build end-to-end data workflows using AI, has connectors for hundreds of platforms, and helps data teams deliver faster. Teams that use Bruin need less engineering effort to process data and benefit from a fully integrated data platform. Go to <a href="https://www.dataengineeringpodcast.com/bruin" target="_blank">dataengineeringpodcast.com/bruin</a> today to get started. And for dbt Cloud customers, they'll give you $1,000 credit to migrate to Bruin Cloud.</li><li>Your host is Tobias Macey and today I'm interviewing Jacob Leverich about how data lakehouse technologies can be applied to observability for unlimited scale and orders of magnitude improvement on economics</li></ul><br />Interview<br />&nbsp;<br /><ul><li>Introduction</li><li>How did you get involved in the area of data management?</li><li>Can you start by giving an overview of what the major pain points have been in the observability space? (e.g. 
limited scale/retention, costs, integration fragmentation)</li><li>What are the elements of the ecosystem and tech stacks that led to that state of the world?</li><li>What are you building at Observe that circumvents those pain points?</li><li>What are the major ecosystem evolutions that make this a feasible architecture? (e.g. columnar storage, distributed compute, protocol consolidation)</li><li>Can you describe the architecture of the Observe platform?</li><li>How has the design of the platform evolved/changed direction since you first started working on it?</li><li>What was your process for determining which core technologies to build on top of?</li><li>What were the missing pieces that you had to engineer around to get a cohesive and performant platform?</li><li>The perennial problem with observability systems and data lakes is their tendency to succumb to entropy. What are the guardrails that you are relying on to help customers maintain a well-structured and usable repository of information?</li><li>Data lakehouses are excellent for flexibility and scaling to massive data volumes, but they're not known for being fast. What are the areas of investment in the ecosystem that are changing that narrative?</li><li>As organizations overcome the constraints of limited retention periods and anxiety over cost, what new use cases does that unlock for their observability data?</li><li>How do AI applications/agents change the requirements around observability data? 
(collection, scale, complexity, applications, etc.)</li><li>What are the most interesting, innovative, or unexpected ways that you have seen Observe/lakehouse technologies used for observability?</li><li>What are the most interesting, unexpected, or challenging lessons that you have learned while working on Observe?</li><li>When is Observe/lakehouse technologies the wrong choice?</li><li>What do you have planned for the future of Observe?</li></ul><br />Contact Info<br />&nbsp;<br /><ul><li><a href="https://www.linkedin.com/in/jacob-leverich/" target="_blank">LinkedIn</a></li></ul><br />Parting Question<br />&nbsp;<br /><ul><li>From your perspective, what is the biggest gap in the tooling or technology for data management today?</li></ul><br />Closing Announcements<br />&nbsp;<br /><ul><li>Thank you for listening! Don't forget to check out our other shows. <a href="https://www.pythonpodcast.com" target="_blank">Podcast.__init__</a> covers the Python language, its community, and the innovative ways it is being used. The <a href="https://www.aiengineeringpodcast.com" target="_blank">AI Engineering Podcast</a> is your guide to the fast-moving world of building AI systems.</li><li>Visit the <a href="https://www.dataengineeringpodcast.com" target="_blank">site</a> to subscribe to the show, sign up for the mailing list, and read the show notes.</li><li>If you've learned something or tried out a project from the show then tell us about it! 
Email [email protected] with your story.</li></ul><br />Links<br />&nbsp;<br /><ul><li><a href="https://www.observeinc.com/" target="_blank">Observe Inc.</a></li><li><a href="https://www.ibm.com/think/topics/data-lakehouse" target="_blank">Lakehouse Architecture</a></li><li><a href="https://www.splunk.com/" target="_blank">Splunk</a></li><li><a href="https://en.wikipedia.org/wiki/Observability" target="_blank">Observability</a></li><li><a href="https://www.rsyslog.com/" target="_blank">RSyslog</a></li><li><a href="https://www.gluster.org/" target="_blank">GlusterFS</a></li><li><a href="https://research.google/pubs/dremel-interactive-analysis-of-web-scale-datasets-2/" target="_blank">Dremel</a></li><li><a href="https://drill.apache.org/" target="_blank">Drill</a></li><li><a href="https://cloud.google.com/bigquery" target="_blank">BigQuery</a></li><li><a href="https://dl.acm.org/doi/10.1145/2882903.2903741" target="_blank">Snowflake SIGMOD Paper</a></li><li><a href="https://prometheus.io/" target="_blank">Prometheus</a></li><li><a href="https://www.datadoghq.com/" target="_blank">Datadog</a></li><li><a href="https://newrelic.com/" target="_blank">NewRelic</a></li><li><a href="https://en.wikipedia.org/wiki/AppDynamics" target="_blank">AppDynamics</a></li><li><a href="https://www.dynatrace.com/" target="_blank">DynaTrace</a></li><li><a href="https://grafana.com/oss/loki/" target="_blank">Loki</a></li><li><a href="https://cortexmetrics.io/" target="_blank">Cortex</a></li><li><a href="https://grafana.com/oss/mimir/" target="_blank">Mimir</a></li><li><a href="https://grafana.com/oss/tempo/" target="_blank">Tempo</a></li><li><a href="https://www.observeinc.com/blog/understanding-high-cardinality-in-observability" target="_blank">Cardinality</a></li><li><a href="https://fluentbit.io/" target="_blank">FluentBit</a></li><li><a href="https://www.fluentd.org/" target="_blank">FluentD</a></li><li><a href="https://opentelemetry.io/" target="_blank">OpenTelemetry</a></li><li><a 
href="https://opentelemetry.io/docs/specs/otel/protocol/" target="_blank">OTLP == OpenTelemetry Protocol</a></li><li><a href="https://kafka.apache.org/" target="_blank">Kafka</a></li><li><a href="https://aws.amazon.com/blogs/aws/vpc-flow-logs-log-and-view-network-traffic-flows/" target="_blank">VPC Flow Logs</a></li><li><a href="https://www.simplyblock.io/glossary/read-amplification/" target="_blank">Read Amplification</a></li><li><a href="https://docs.lancedb.com/lance" target="_blank">Lance</a></li><li><a href="https://iceberg.apache.org/" target="_blank">Iceberg</a></li><li><a href="https://hudi.apache.org/" target="_blank">Hudi</a></li><li><a href="https://prometheus.io/docs/prometheus/latest/querying/basics/" target="_blank">PromQL</a></li></ul><br />The intro and outro music is from <a href="http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Love_death_and_a_drunken_monkey/04_-_The_Hug" target="_blank">The Hug</a> by <a href="http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/" target="_blank">The Freak Fandango Orchestra</a> / <a href="http://creativecommons.org/licenses/by-sa/3.0/" target="_blank">CC BY-SA</a>
72 MIN
Semantic Operators Meet Dataframes: Building Context for Agents with FENIC
JAN 12, 2026
Summary&nbsp;<br />In this episode Kostas Pardalis talks about Fenic - an open-source, PySpark-inspired dataframe engine designed to bring LLM-powered semantics into reliable data engineering workflows. Kostas shares why today’s data infrastructure assumptions (BI-first, expert-operated, CPU-bound) fall short for AI-era tasks that are increasingly inference- and IO-bound. He explores how Fenic introduces semantic operators (e.g., semantic filter, extract, join) as first-class citizens in the logical plan so the optimizer can reason about inference, costs, and constraints. This enables developers to turn unstructured data into explicit schemas, compose transformations lazily, and offload LLM work safely and efficiently. He digs into Fenic’s architecture (lazy dataframe API, logical/physical plans, Polars execution, DuckDB/Arrow SQL path), how it exposes tools via MCP for agent integration, and where it fits in context engineering as a companion for memory/state management in agentic systems.&nbsp;<br /><br /><br />Announcements&nbsp;<br /><ul><li>Hello and welcome to the Data Engineering Podcast, the show about modern data management</li><li>You’re a developer who wants to innovate—instead, you’re stuck fixing bottlenecks and fighting legacy code. MongoDB can help. It’s a flexible, unified platform that’s built for developers, by developers. MongoDB is ACID compliant, Enterprise-ready, with the capabilities you need to ship AI apps—fast. That’s why so many of the Fortune 500 trust MongoDB with their most critical workloads. Ready to think outside rows and columns? Start building at <a href="https://mongodb.com/Build" target="_blank">MongoDB.com/Build</a></li><li>Composable data infrastructure is great, until you spend all of your time gluing it together. Bruin is an open source framework, driven from the command line, that makes integration a breeze. 
Write Python and SQL to handle the business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. Bruin allows you to build end-to-end data workflows using AI, has connectors for hundreds of platforms, and helps data teams deliver faster. Teams that use Bruin need less engineering effort to process data and benefit from a fully integrated data platform. Go to <a href="https://www.dataengineeringpodcast.com/bruin" target="_blank">dataengineeringpodcast.com/bruin</a> today to get started. And for dbt Cloud customers, they'll give you $1,000 credit to migrate to Bruin Cloud.</li><li>If you lead a data team, you know this pain: Every department needs dashboards, reports, custom views, and they all come to you. So you're either the bottleneck slowing everyone down, or you're spending all your time building one-off tools instead of doing actual data work. Retool gives you a way to break that cycle. Their platform lets people build custom apps on your company data—while keeping it all secure. Type a prompt like 'Build me a self-service reporting tool that lets teams query customer metrics from Databricks'—and they get a production-ready app with the permissions and governance built in. They can self-serve, and you get your time back. It's data democratization without the chaos. Check out Retool at <a href="https://www.dataengineeringpodcast.com/retool" target="_blank">dataengineeringpodcast.com/retool</a> today and see how other data teams are scaling self-service. 
Because let's be honest—we all need to Retool how we handle data requests.</li><li>Your host is Tobias Macey and today I'm interviewing Kostas Pardalis about Fenic, an opinionated, PySpark-inspired DataFrame framework for building AI and agentic applications</li></ul><br />Interview<br />&nbsp;<br /><ul><li>Introduction</li><li>How did you get involved in the area of data management?</li><li>Can you describe what Fenic is and the story behind it?</li><li>What are the core problems that you are trying to address with Fenic?</li><li>Dataframes have become a popular interface for doing chained transformations on structured data. What are the benefits of using that paradigm for LLM use-cases?</li><li>Can you describe the architecture and implementation of Fenic?</li><li>How have the design and scope of the project changed since you first started working on it?</li><li>You position Fenic as a means of bringing reliability to LLM-powered transformations. What are some of the anti-patterns that teams should be aware of when getting started with Fenic?</li><li>What are some of the most common first steps that teams take when integrating Fenic into their pipelines or applications?</li><li>What are some of the ways that teams should be thinking about using Fenic and semantic operations for data pipelines and transformations?</li><li>How does Fenic help with context engineering for agentic use cases?</li><li>What are some examples of toolchains/workflows that could be replaced with Fenic?</li><li>How does Fenic integrate with the broader ecosystem of data and AI frameworks? (e.g. 
Polars, Arrow, Qdrant, LangChain/Pydantic AI)</li><li>What are the most interesting, innovative, or unexpected ways that you have seen Fenic used?</li><li>What are the most interesting, unexpected, or challenging lessons that you have learned while working on Fenic?</li><li>When is Fenic the wrong choice?</li><li>What do you have planned for the future of Fenic?</li></ul><br />Contact Info<br />&nbsp;<br /><ul><li><a href="https://www.linkedin.com/in/kostaspardalis" target="_blank">LinkedIn</a></li></ul><br />Parting Question<br />&nbsp;<br /><ul><li>From your perspective, what is the biggest gap in the tooling or technology for data management today?</li></ul><br />Closing Announcements<br />&nbsp;<br /><ul><li>Thank you for listening! Don't forget to check out our other shows. <a href="https://www.pythonpodcast.com" target="_blank">Podcast.__init__</a> covers the Python language, its community, and the innovative ways it is being used. The <a href="https://www.aiengineeringpodcast.com" target="_blank">AI Engineering Podcast</a> is your guide to the fast-moving world of building AI systems.</li><li>Visit the <a href="https://www.dataengineeringpodcast.com" target="_blank">site</a> to subscribe to the show, sign up for the mailing list, and read the show notes.</li><li>If you've learned something or tried out a project from the show then tell us about it! 
Email [email protected] with your story.</li></ul><br />Links<br />&nbsp;<br /><ul><li><a href="https://github.com/typedef-ai/fenic" target="_blank">Fenic</a></li><li><a href="https://www.rudderstack.com/" target="_blank">RudderStack</a></li><li><a href="https://www.dataengineeringpodcast.com/rudderstack-open-source-customer-data-platform-episode-263" target="_blank">Podcast Episode</a></li><li><a href="https://trino.io/" target="_blank">Trino</a></li><li><a href="https://www.starburst.io/" target="_blank">Starburst</a></li><li><a href="https://trino.io/blog/2022/05/05/tardigrade-launch.html" target="_blank">Trino Project Tardigrade</a></li><li><a href="https://www.typedef.ai/" target="_blank">Typedef AI</a></li><li><a href="https://www.getdbt.com/" target="_blank">dbt</a></li><li><a href="https://spark.apache.org/docs/latest/api/python/index.html" target="_blank">PySpark</a></li><li><a href="https://en.wikipedia.org/wiki/User-defined_function" target="_blank">UDF == User-Defined Function</a></li><li><a href="https://lotus-ai.readthedocs.io/en/latest/" target="_blank">LOTUS</a></li><li><a href="https://pandas.pydata.org/" target="_blank">Pandas</a></li><li><a href="https://pola.rs/" target="_blank">Polars</a></li><li><a href="https://en.wikipedia.org/wiki/Relational_algebra" target="_blank">Relational Algebra</a></li><li><a href="https://arrow.apache.org/" target="_blank">Arrow</a></li><li><a href="https://duckdb.org/" target="_blank">DuckDB</a></li><li><a href="https://en.wikipedia.org/wiki/Markdown" target="_blank">Markdown</a></li><li><a href="https://ai.pydantic.dev/" target="_blank">Pydantic AI</a></li><li><a href="https://www.aiengineeringpodcast.com/pydantic-ai-type-safe-agent-framework-episode-63" target="_blank">AI Engineering Podcast Episode</a></li><li><a href="https://www.langchain.com/" target="_blank">LangChain</a></li><li><a href="https://docs.ray.io/en/latest/" target="_blank">Ray</a></li><li><a href="https://www.dask.org/" 
target="_blank">Dask</a></li></ul><br />The intro and outro music is from <a href="http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Love_death_and_a_drunken_monkey/04_-_The_Hug" target="_blank">The Hug</a> by <a href="http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/" target="_blank">The Freak Fandango Orchestra</a> / <a href="http://creativecommons.org/licenses/by-sa/3.0/" target="_blank">CC BY-SA</a>
56 MIN
Beyond Dashboards: How Data Teams Earn a Seat at the Table
JAN 5, 2026
Summary&nbsp;<br />In this episode Goutham Budati talks about his Data–Perspective–Action framework and how it empowers data teams to become true business partners. Goutham traces his path from automating Excel reports to leading high‑impact data organizations, then breaks down why technical excellence alone isn’t enough: teams must pair reliable data systems with deliberate storytelling, clear problem framing, and concrete action plans. He digs into tactics for moving from reactive ticket-taking to proactive influence — weekly one‑page narratives, design-first discovery, sampling stakeholders for real pain points, and treating dashboards as living roadmaps. He also explores how to right-size technical scope, preserve trust in core metrics, organize teams as “build” and “storytelling” duos, and translate business macros and micros into resilient system designs.&nbsp;<br /><br />Announcements&nbsp;<br /><ul><li>Hello and welcome to the Data Engineering Podcast, the show about modern data management</li><li>Composable data infrastructure is great, until you spend all of your time gluing it together. Bruin is an open source framework, driven from the command line, that makes integration a breeze. Write Python and SQL to handle the business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. Bruin allows you to build end-to-end data workflows using AI, has connectors for hundreds of platforms, and helps data teams deliver faster. Teams that use Bruin need less engineering effort to process data and benefit from a fully integrated data platform. Go to <a href="https://www.dataengineeringpodcast.com/bruin" target="_blank">dataengineeringpodcast.com/bruin</a> today to get started. And for dbt Cloud customers, they'll give you $1,000 credit to migrate to Bruin Cloud.</li><li>You’re a developer who wants to innovate—instead, you’re stuck fixing bottlenecks and fighting legacy code. 
MongoDB can help. It’s a flexible, unified platform that’s built for developers, by developers. MongoDB is ACID compliant, Enterprise-ready, with the capabilities you need to ship AI apps—fast. That’s why so many of the Fortune 500 trust MongoDB with their most critical workloads. Ready to think outside rows and columns? Start building at <a href="https://mongodb.com/Build" target="_blank">MongoDB.com/Build</a></li><li>Your host is Tobias Macey and today I'm interviewing Goutham Budati about his data-perspective-action framework for empowering data teams to be more influential in the business</li></ul><br />Interview<br />&nbsp;<br /><ul><li>Introduction</li><li>How did you get involved in the area of data management?</li><li>Can you describe what the Data-Perspective-Action framework is and the story behind it?</li><li>What does it look like when someone operates at each of those three levels?<ul><li>How does that change the day-to-day work of an individual contributor?</li></ul></li><li>Why does technically excellent data work sometimes fail to drive decisions?<ul><li>How do you identify whether a data system or pipeline is actually creating value versus just existing?</li></ul></li><li>What's the moment when you realized that building reliable systems wasn't the same as enabling better decisions?<ul><li>Better decisions still need to be powered by reliable systems. How do you manage the tension of focusing on up-time against focusing on impact?</li></ul></li><li>What does it mean to add "Perspective" to data? How is that different from analysis or insights?</li><li>How do you know when you're overwhelming stakeholders versus giving them what they need?</li><li>What changes when you start designing systems to surface signal rather than just providing comprehensive data?</li><li>How do you learn what business context matters for turning data into something actionable?</li><li>What does it mean to design for Action from day one? 
How does that change what you build?<ul><li>How do you get stakeholders to actually act on data instead of just consuming it?</li></ul></li><li>Walk us through how you structure collaboration with business partners when you're trying to drive decisions, not just inform them.</li><li>What's the relationship between iteration and trust when you're building data products?</li><li>What does the transition from order-taker to strategic partner actually look like? What has to change?<ul><li>How do you position data work as driving the business rather than supporting it?</li></ul></li><li>Why does storytelling matter for data professionals? What role does it play that technical communication doesn't cover?</li><li>What organizational structures or team setups help data people gain influence?</li><li>Tell us about a time when you built something technically sound that failed to create impact. What did you learn?</li><li>What are the common patterns in dysfunctional data organizations? What causes the breakdown?</li><li>How do you rebuild credibility when you inherit a data function that's lost trust with the business?<ul><li>What's the relationship between technical excellence and stakeholder trust? Can you have one without the other?</li></ul></li><li>When is this framework the wrong lens? 
What situations call for a different approach?</li><li>How do you balance the demand for technical depth with the need to develop business and communication skills?</li><li>How should data professionals position themselves as AI and ML tools become more accessible?</li><li>What shifts do you see coming in how businesses think about data work?<ul><li>How is your thinking about data impact evolving?</li></ul></li><li>For someone who recognizes they're focused purely on the technical work and wants to expand their impact—where should they start?</li></ul><br />Contact Info<br />&nbsp;<br /><ul><li><a href="https://www.linkedin.com/in/gouthambudati/" target="_blank">LinkedIn</a></li></ul><br />Parting Question<br />&nbsp;<br /><ul><li>From your perspective, what is the biggest gap in the tooling or technology for data management today?</li></ul><br />Closing Announcements<br />&nbsp;<br /><ul><li>Thank you for listening! Don't forget to check out our other shows. <a href="https://www.pythonpodcast.com" target="_blank">Podcast.__init__</a> covers the Python language, its community, and the innovative ways it is being used. The <a href="https://www.aiengineeringpodcast.com" target="_blank">AI Engineering Podcast</a> is your guide to the fast-moving world of building AI systems.</li><li>Visit the <a href="https://www.dataengineeringpodcast.com" target="_blank">site</a> to subscribe to the show, sign up for the mailing list, and read the show notes.</li><li>If you've learned something or tried out a project from the show then tell us about it! 
Email [email protected] with your story.</li></ul><br />The intro and outro music is from <a href="http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Love_death_and_a_drunken_monkey/04_-_The_Hug" target="_blank">The Hug</a> by <a href="http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/" target="_blank">The Freak Fandango Orchestra</a> / <a href="http://creativecommons.org/licenses/by-sa/3.0/" target="_blank">CC BY-SA</a><br />&nbsp;
49 MIN