EACL 2026: LLMs Can Hear… But Can They Reason? A New Benchmark for Audio Intelligence
What does it actually mean for a model to understand audio?

Paper: https://arxiv.org/abs/2601.19673

In this episode, I talk with Iwona Christop, a PhD student at Adam Mickiewicz University, about her recent EACL paper introducing ART (Audio Reasoning Tasks), a new benchmark designed to evaluate whether multimodal LLMs can truly reason over audio, not just transcribe or classify it.

Most existing benchmarks test audio skills in isolation (such as ASR or classification). But real-world intelligence requires something deeper: combining signals, comparing sounds, tracking context, and making decisions.

This work takes a different approach:
- No text-only shortcuts: tasks can't be solved via transcription alone
- Reasoning-first design: models must combine multiple audio cues
- No expert knowledge required: anyone can verify correctness

We also dive into the diverse task design, including:
- Audio arithmetic (counting and comparing sounds)
- Cross-recording speaker & language identification
- Sound-based reasoning (e.g., inferring properties from audio)
- Speech feature comparison (accents, variations)
- Multimodal reasoning across text and sound

The dataset includes 9 tasks, 9,000 samples, and 30+ hours of audio, all generated in a scalable way using templates and TTS.

If you care about multimodal reasoning, evaluation, or the limits of current LLM capabilities, this conversation is for you.

Iwona Christop: https://www.linkedin.com/in/iwona-christop/

Like & subscribe for more deep dives into cutting-edge AI research
New episodes from EACL 2026 coming soon

#WiAIR #EACL2026