The Power of Community in AI Development with Oumi
MAR 16, 2025 56 MIN
Description
Summary<br />In this episode of the AI Engineering Podcast, Emmanouil (Manos) Koukoumidis, CEO of Oumi, talks about his vision for an open platform for building, evaluating, and deploying AI foundation models. Manos shares his journey from working on natural language AI services at Google Cloud to founding Oumi with a mission to advance open-source AI, emphasizing the importance of community collaboration and accessibility. He discusses the need for open-source models that are not constrained by proprietary APIs, highlights the role of Oumi in facilitating open collaboration, and touches on the complexities of model development, open data, and community-driven advancements in AI. He also explains how Oumi can be used throughout the entire lifecycle of AI model development, post-training, and deployment.<br /><br /><br />Announcements<br /><ul><li>Hello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systems</li><li>Your host is Tobias Macey and today I'm interviewing Manos Koukoumidis about Oumi, an all-in-one production-ready open platform to build, evaluate, and deploy AI models</li></ul>Interview<br /><ul><li>Introduction</li><li>How did you get involved in machine learning?</li><li>Can you describe what Oumi is and the story behind it?</li><li>There are numerous projects, both full suites and point solutions, focused on every aspect of "AI" development. What is the unique value that Oumi provides in this ecosystem?</li><li>You have stated the desire for Oumi to become the Linux of AI development. That is an ambitious goal and one that Linux itself didn't start with. What do you see as the biggest challenges that need addressing to reach a critical mass of adoption?</li><li>In the vein of "open source" AI, the most notable project that I'm aware of that fits the proper definition is the OLMo models from AI2. 
What lessons have you learned from their efforts that influence the ways that you think about your work on Oumi?</li><li>On the community building front, HuggingFace has been the main player. What do you see as the benefits and shortcomings of that platform in the context of your vision for open and collaborative AI?</li><li>Can you describe the overall design and architecture of Oumi?<ul><li>How did you approach the selection process for the different components that you are building on top of?</li><li>What are the extension points that you have incorporated to allow for customization/evolution?</li></ul></li><li>Some of the biggest barriers to entry for building foundation models are the cost and availability of hardware used for training, and the ability to collect and curate the data needed. How does Oumi help with addressing those challenges?</li><li>For someone who wants to build or contribute to an open source model, what does that process look like?<ul><li>How do you envision the community building/collaboration process?</li></ul></li><li>Your overall goal is to build a foundation for the growth and well-being of truly open AI. How are you thinking about the sustainability of the project and the funding needed to grow and support the community?</li><li>What are the most interesting, innovative, or unexpected ways that you have seen Oumi used?</li><li>What are the most interesting, unexpected, or challenging lessons that you have learned while working on Oumi?</li><li>When is Oumi the wrong choice?</li><li>What do you have planned for the future of Oumi?</li></ul>Contact Info<br /><ul><li><a href="https://www.linkedin.com/in/koukoumidis/" target="_blank">LinkedIn</a></li></ul>Parting Question<br /><ul><li>From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?</li></ul>Closing Announcements<br /><ul><li>Thank you for listening! Don't forget to check out our other shows. 
The <a href="https://www.dataengineeringpodcast.com" target="_blank">Data Engineering Podcast</a> covers the latest on modern data management. <a href="https://www.pythonpodcast.com" target="_blank">Podcast.__init__</a> covers the Python language, its community, and the innovative ways it is being used.</li><li>Visit the <a href="https://www.aiengineeringpodcast.com" target="_blank">site</a> to subscribe to the show, sign up for the mailing list, and read the show notes.</li><li>If you've learned something or tried out a project from the show then tell us about it! Email hosts@aiengineeringpodcast.com with your story.</li><li>To help other people find the show please leave a review on <a href="https://podcasts.apple.com/us/podcast/the-machine-learning-podcast/id1626358243" target="_blank">iTunes</a> and tell your friends and co-workers.</li></ul>Links<br /><ul><li><a href="https://oumi.ai/" target="_blank">Oumi</a></li><li><a href="https://cloud.google.com/vertex-ai/generative-ai/docs/deprecations/palm" target="_blank">Cloud PaLM</a></li><li><a href="https://deepmind.google/technologies/gemini/" target="_blank">Google Gemini</a></li><li><a href="https://deepmind.google/" target="_blank">DeepMind</a></li><li><a href="https://en.wikipedia.org/wiki/Long_short-term_memory" target="_blank">LSTM == Long Short-Term Memory</a></li><li><a href="https://en.wikipedia.org/wiki/Transformer_(deep_learning_architecture)" target="_blank">Transformers</a></li><li><a href="https://openai.com/index/chatgpt/" target="_blank">ChatGPT</a></li><li><a href="https://en.wikipedia.org/wiki/Partial_differential_equation" target="_blank">Partial Differential Equation</a></li><li><a href="https://allenai.org/olmo" target="_blank">OLMo</a></li><li><a href="https://opensource.org/ai" target="_blank">OSI AI definition</a></li><li><a href="https://mlflow.org/" target="_blank">MLflow</a></li><li><a href="https://metaflow.org/" target="_blank">Metaflow</a></li><li><a 
href="https://docs.skypilot.co/en/latest/docs/index.html" target="_blank">SkyPilot</a></li><li><a href="https://www.llama.com/" target="_blank">Llama</a></li><li><a href="https://en.wikipedia.org/wiki/Retrieval-augmented_generation" target="_blank">RAG</a><ul><li><a href="https://www.aiengineeringpodcast.com/retrieval-augmented-generation-implementation-episode-34" target="_blank">Podcast Episode</a></li></ul></li><li><a href="https://en.wikipedia.org/wiki/Synthetic_data" target="_blank">Synthetic Data</a><ul><li><a href="https://www.aiengineeringpodcast.com/gretel-syntehtic-data-for-ai-episode-46" target="_blank">Podcast Episode</a></li></ul></li><li><a href="https://www.evidentlyai.com/llm-guide/llm-as-a-judge" target="_blank">LLM As Judge</a></li><li><a href="https://github.com/sgl-project/sglang" target="_blank">SGLang</a></li><li><a href="https://github.com/vllm-project/vllm" target="_blank">vLLM</a></li><li><a href="https://gorilla.cs.berkeley.edu/leaderboard.html" target="_blank">Function Calling Leaderboard</a></li><li><a href="https://en.wikipedia.org/wiki/DeepSeek" target="_blank">DeepSeek</a></li></ul>The intro and outro music is from <a href="https://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Tales_Of_A_Dead_Fish/Hitmans_Lovesong/" target="_blank">Hitman's Lovesong feat. Paola Graziano</a> by <a href="http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/" target="_blank">The Freak Fandango Orchestra</a>/<a href="https://creativecommons.org/licenses/by-sa/3.0/" target="_blank">CC BY-SA 3.0</a>