100% Jailbreak Success? The Hard Truth About AI Safety, with Dr. Saadia Gabriel (Part 2)
APR 17, 2026 · 33 MIN
Description
<p>What actually happens when AI systems fail in the real world?</p><p><br /></p><p>In this final part of our conversation with Saadia Gabriel (UCLA), we unpack one of the most urgent challenges in modern AI: why even the most advanced models remain vulnerable to manipulation, and what that means for safety, fairness, and society.</p><p><br /></p><p>From multi-turn jailbreaking attacks with near-100% success rates to misinformation shaping human beliefs, this conversation goes beyond surface-level concerns and dives into how harms actually emerge in deployed systems.</p><p><br /></p><p>We explore:</p><ul><li>Why current guardrails are not enough</li><li>How realistic attack scenarios differ from academic benchmarks</li><li>The connection between model vulnerabilities and societal harm</li><li>What AI can (and cannot) do about misinformation and persuasion</li><li>The open research problems that still don’t have solutions</li></ul><p><br /></p><p>Resources & Links:</p><ul><li><a href="https://aclanthology.org/2024.emnlp-main.487/" target="_blank" rel="ugc noopener noreferrer">Generative AI in the Era of 'Alternative Facts'</a></li><li><a href="https://aclanthology.org/2025.emnlp-main.1571/" target="_blank" rel="ugc noopener noreferrer">ModelCitizens: Representing Community Voices in Online Safety</a></li><li><a href="https://arxiv.org/abs/2601.11778" target="_blank" rel="ugc noopener noreferrer">Translation as a Scalable Proxy for Multilingual Evaluation</a></li></ul><p><br /></p><p>Connect with Dr. Saadia Gabriel:</p><ul><li><a href="https://x.com/GabrielSaadia" target="_blank" rel="ugc noopener noreferrer">https://x.com/GabrielSaadia</a></li><li><a href="https://bsky.app/profile/skgabrie.bsky.social" target="_blank" rel="ugc noopener noreferrer">https://bsky.app/profile/skgabrie.bsky.social</a></li></ul><p><br /></p>