Enhancing Data Accessibility and Governance with Gravitino

SEP 1, 2024 · 38 MIN
Data Engineering Podcast

Description

Summary<br />As data architectures become more elaborate and the number of applications of data increases, it becomes increasingly challenging to locate and access the underlying data. Gravitino was created to provide a single interface to locate and query your data. In this episode, Junping Du explains how Gravitino works, the capabilities that it unlocks, and how it fits into your data platform.<br />Announcements<br /><ul><li>Hello and welcome to the Data Engineering Podcast, the show about modern data management</li><li>Your host is Tobias Macey, and today I'm interviewing Junping Du about Gravitino, an open source metadata service for a unified view of all of your schemas</li></ul>Interview<br /><ul><li>Introduction</li><li>How did you get involved in the area of data management?</li><li>Can you describe what Gravitino is and the story behind it?</li><li>What problems are you solving with Gravitino?<ul><li>What are the methods that teams have relied on in the absence of Gravitino to address those use cases?</li></ul></li><li>What led to the Hive Metastore being the default for so long?<ul><li>What are the opportunities for innovation and new functionality in the metadata service?</li></ul></li><li>The documentation suggests that Gravitino has overlap with a number of tool categories such as table schema (Hive Metastore), metadata repository (OpenMetadata), and data federation (Trino/Alluxio). 
What are the capabilities that it can completely replace, and which will require other systems for more comprehensive functionality?</li><li>What are the capabilities that you are explicitly keeping out of scope for Gravitino?</li><li>Can you describe the technical architecture of Gravitino?<ul><li>How have the design and scope evolved from when you first started working on it?</li></ul></li><li>Can you describe how Gravitino integrates into an overall data platform?<ul><li>In a typical day, what are the different ways that a data engineer or data analyst might interact with Gravitino?</li></ul></li><li>One of the features that you highlight is centralized permissions management. Can you describe the access control model that you use for unifying across underlying sources?</li><li>What are the most interesting, innovative, or unexpected ways that you have seen Gravitino used?</li><li>What are the most interesting, unexpected, or challenging lessons that you have learned while working on Gravitino?</li><li>When is Gravitino the wrong choice?</li><li>What do you have planned for the future of Gravitino?</li></ul>Contact Info<br /><ul><li><a href="https://www.linkedin.com/in/junping-du/" target="_blank">LinkedIn</a></li><li><a href="https://github.com/JunpingDu" target="_blank">GitHub</a></li></ul>Parting Question<br /><ul><li>From your perspective, what is the biggest gap in the tooling or technology for data management today?</li></ul>Closing Announcements<br /><ul><li>Thank you for listening! Don't forget to check out our other shows. <a href="https://www.pythonpodcast.com" target="_blank">Podcast.__init__</a> covers the Python language, its community, and the innovative ways it is being used. 
The <a href="https://www.aiengineeringpodcast.com" target="_blank">AI Engineering Podcast</a> is your guide to the fast-moving world of building AI systems.</li><li>Visit the <a href="https://www.dataengineeringpodcast.com" target="_blank">site</a> to subscribe to the show, sign up for the mailing list, and read the show notes.</li><li>If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.</li></ul>Links<br /><ul><li><a href="https://gravitino.apache.org/" target="_blank">Gravitino</a></li><li><a href="https://hadoop.apache.org" target="_blank">Hadoop</a></li><li><a href="https://datastrato.ai/" target="_blank">Datastrato</a></li><li><a href="https://pytorch.org/" target="_blank">PyTorch</a></li><li><a href="https://www.ray.io/" target="_blank">Ray</a></li><li><a href="https://www.gartner.com/en/data-analytics/topics/data-fabric" target="_blank">Data Fabric</a></li><li><a href="https://hive.apache.org/" target="_blank">Hive</a></li><li><a href="https://iceberg.apache.org/" target="_blank">Iceberg</a><ul><li><a href="https://www.dataengineeringpodcast.com/iceberg-with-ryan-blue-episode-52" target="_blank">Podcast Episode</a></li></ul></li><li><a href="https://cwiki.apache.org/confluence/display/hive/design#Design-Metastore" target="_blank">Hive Metastore</a></li><li><a href="https://trino.io/" target="_blank">Trino</a></li><li><a href="https://open-metadata.org/" target="_blank">OpenMetadata</a><ul><li><a href="https://www.dataengineeringpodcast.com/openmetadata-universal-metadata-layer-episode-237/" target="_blank">Podcast Episode</a></li></ul></li><li><a href="https://www.alluxio.io/" target="_blank">Alluxio</a></li><li><a href="https://atlan.com/" target="_blank">Atlan</a><ul><li><a href="https://www.dataengineeringpodcast.com/atlan-data-team-collaboration-episode-179" target="_blank">Podcast Episode</a></li></ul></li><li><a href="https://spark.apache.org/" 
target="_blank">Spark</a></li><li><a href="https://thrift.apache.org/" target="_blank">Thrift</a></li></ul>The intro and outro music is from <a href="http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Love_death_and_a_drunken_monkey/04_-_The_Hug" target="_blank">The Hug</a> by <a href="http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/" target="_blank">The Freak Fandango Orchestra</a> / <a href="http://creativecommons.org/licenses/by-sa/3.0/" target="_blank">CC BY-SA</a>