EACL 2026: LLMs Can Hear… But Can They Reason? A New Benchmark for Audio Intelligence
What does it actually mean for a model to understand audio?

Paper: https://arxiv.org/abs/2601.19673

In this episode, I talk with Iwona Christop, a PhD student at Adam Mickiewicz University, about her recent EACL paper introducing ART (Audio Reasoning Tasks), a new benchmark designed to evaluate whether multimodal LLMs can truly reason over audio, not just transcribe or classify it.

Most existing benchmarks test audio skills in isolation (such as ASR or classification). But real-world intelligence requires something deeper: combining signals, comparing sounds, tracking context, and making decisions.

This work takes a different approach:
- No text-only shortcuts: tasks can't be solved via transcription alone
- Reasoning-first design: models must combine multiple audio cues
- No expert knowledge required: anyone can verify correctness

We also dive into the diverse task design, including:
- Audio arithmetic (counting and comparing sounds)
- Cross-recording speaker & language identification
- Sound-based reasoning (e.g., inferring properties from audio)
- Speech feature comparison (accents, variations)
- Multimodal reasoning across text and sound

The dataset includes 9 tasks, 9,000 samples, and 30+ hours of audio, all generated in a scalable way using templates and TTS.

If you care about multimodal reasoning, evaluation, or the limits of current LLM capabilities, this conversation is for you.

Iwona Christop: https://www.linkedin.com/in/iwona-christop/

Like & subscribe for more deep dives into cutting-edge AI research
New episodes from EACL 2026 coming soon

#WiAIR #EACL2026