Hello SundAI - our world through the lense of AI

AUG 24, 2025

The Illusion of Thinking: Decoding AI's Reasoning Limits

In this episode, we enter the world of Large Reasoning Models (LRMs). We explore advanced AI systems such as OpenAI's o1/o3, DeepSeek-R1, and Claude 3.7 Sonnet Thinking: models that generate detailed "thinking processes" (Chain-of-Thought, CoT) with built-in self-reflection before answering. These systems promise a new era of problem-solving. Yet their true capabilities, scaling behavior, and limitations remain only partially understood.

Through systematic experiments in controlled puzzle environments (the Tower of Hanoi, Checker Jumping, River Crossing, and Blocks World), the researchers uncover both the strengths and the surprising weaknesses of LRMs. These environments allow precise control over task complexity while avoiding the data contamination issues that often plague established benchmarks in mathematics and coding.

A striking finding: LRMs face a complete accuracy collapse beyond certain complexity thresholds. Paradoxically, their reasoning effort (measured in "thinking tokens") first increases with complexity, then declines after a certain point, even when the token budget is sufficient.

The study identifies three distinct performance regimes:

- Low-complexity tasks: standard Large Language Models (LLMs) still outperform LRMs.
- Medium-complexity tasks: the additional "thinking" of LRMs shows a clear advantage.
- High-complexity tasks: both LLMs and LRMs collapse entirely.

Another challenge is "overthinking." On simpler problems, LRMs often find the correct solution early but continue to pursue incorrect alternatives, wasting computational resources. Even more surprising is their weakness in exact computation: they fail to benefit from explicit algorithms, even when these are provided, and they reason inconsistently across different puzzle types.

This episode invites you to rethink assumptions about AI's capacity for generalizable reasoning. What does it truly mean for a machine to "think" under increasing complexity? And how should these insights shape the next generation of AI design and deployment?

Sources: Shojaee, P., Mirzadeh, I., Alizadeh, K., Horton, M., Bengio, S., & Farajtabar, M. (2025). The illusion of thinking: Understanding the strengths and limitations of reasoning models via the lens of problem complexity (Unpublished manuscript). https://arxiv.org/abs/2506.06941

Disclaimer: This podcast is generated by Roger Basler de Roca using AI. The voices are artificially generated and the discussion is based on public research data. I do not claim any ownership of the presented material, as it is for educational purposes only. Contact: https://rogerbasler.ch/en/contact/
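To make the idea of a puzzle with "precise control over task complexity" concrete, here is a minimal Python sketch for the Tower of Hanoi. It is an illustration only and not the evaluation code from the paper: the function names `hanoi_moves` and `is_valid_solution` and the peg labels are assumptions made for this example. The point it shows is that a single parameter (the number of disks) sets the difficulty, because the shortest solution needs 2^n - 1 moves, and that any proposed move list can be checked mechanically.

```python
def hanoi_moves(n, source="A", target="C", spare="B"):
    """Return the shortest move list for n disks as (disk, from_peg, to_peg) tuples."""
    if n == 0:
        return []
    return (
        hanoi_moves(n - 1, source, spare, target)    # clear the n-1 smaller disks
        + [(n, source, target)]                      # move the largest disk
        + hanoi_moves(n - 1, spare, target, source)  # restack the smaller disks
    )


def is_valid_solution(n, moves):
    """Replay the moves on three pegs; reject any illegal move or wrong end state."""
    pegs = {"A": list(range(n, 0, -1)), "B": [], "C": []}
    for disk, src, dst in moves:
        if not pegs[src] or pegs[src][-1] != disk:
            return False  # the named disk is not on top of the source peg
        if pegs[dst] and pegs[dst][-1] < disk:
            return False  # a larger disk may not sit on a smaller one
        pegs[dst].append(pegs[src].pop())
    return pegs["C"] == list(range(n, 0, -1))


if __name__ == "__main__":
    # Complexity is controlled by one number: the disk count n.
    for n in range(1, 11):
        moves = hanoi_moves(n)
        assert is_valid_solution(n, moves)
        print(f"{n} disks -> {len(moves)} moves")
```

Because the required number of moves doubles with each added disk, a checker like this can score a model's answer move by move at any difficulty level.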

15 MIN

JUN 9, 2025

AI Cannot Think: When AI Reasoning Models Hit Their Limit

Join us as we dive into a groundbreaking study that systematically investigates the strengths and fundamental limitations of Large Reasoning Models (LRMs), the cutting-edge AI systems behind advanced "thinking" mechanisms like Chain-of-Thought with self-reflection.

Moving beyond traditional, often contaminated, mathematical and coding benchmarks, this research uses controllable puzzle environments like the Tower of Hanoi, Checker Jumping, River Crossing, and Blocks World to precisely manipulate problem complexity and offer unprecedented insights into how LRMs "think".

You'll discover surprising findings, including:

- Three distinct performance regimes: Standard Large Language Models (LLMs) surprisingly outperform LRMs on low-complexity tasks; LRMs demonstrate an advantage on medium-complexity tasks due to their additional "thinking" processes; but crucially, both model types experience a complete accuracy collapse on high-complexity tasks.
- A counter-intuitive scaling limit: LRMs' reasoning effort, measured by token usage, increases up to a certain complexity point, then paradoxically declines despite an adequate token budget. This suggests a fundamental inference-time scaling limitation in their reasoning capabilities relative to problem complexity.
- Inconsistencies and limitations in exact computation: LRMs struggle to benefit from being explicitly given algorithms, failing to improve performance even when provided with step-by-step instructions for puzzles like the Tower of Hanoi. They also exhibit inconsistent reasoning across different puzzle types, performing many correct moves in one scenario (e.g., Tower of Hanoi) but failing much earlier in another (e.g., River Crossing), indicating potential issues with generalizable reasoning rather than just problem-solving strategy discovery.
- "Overthinking" phenomenon: For simpler problems, LRMs often find correct solutions early in their reasoning trace but then continue to inefficiently explore incorrect alternatives, wasting computational effort.

This episode challenges prevailing assumptions about LRM capabilities and raises crucial questions about their true reasoning potential, paving the way for future investigations into more robust AI reasoning.

Disclaimer: This podcast is generated by Roger Basler de Roca using AI. The voices are artificially generated and the discussion is based on public research data. I do not claim any ownership of the presented material, as it is for educational purposes only. Contact: https://rogerbasler.ch/en/contact/

15 MIN

APR 27, 2025

The Art and Science of Prompt Engineering by Google

In this show, we break down the art of crafting prompts that help AI deliver precise, useful, and reliable results.

Whether you're summarising text, answering questions, generating code, or translating content, we'll show you how to guide LLMs effectively.

We explore real-world techniques, from simple zero-shot prompts to advanced strategies like Chain of Thought, Tree of Thoughts, and ReAct, which combines reasoning with external tools.

We'll also dive into how to control AI output (tweaking things like temperature, token limits, and sampling settings) to shape your results.

Plus, we'll share best practices for writing, testing, and refining prompts, including tips on examples, formatting, and structured outputs like JSON.

Whether you're just getting started or already deep into advanced prompting, this podcast will help you sharpen your skills and stay ahead of the curve.

Let's unlock the full potential of AI, one prompt at a time.

Disclaimer: This podcast is generated by Roger Basler de Roca using AI. The voices are artificially generated and the discussion is based on public research data. I do not claim any ownership of the presented material, as it is for educational purposes only. Contact: https://rogerbasler.ch/en/contact/
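For listeners curious what "temperature" and "sampling settings" do under the hood, the following Python sketch may help. It is a simplified illustration and not the code of any particular model or API: the function names, the toy logits, and the choice of top-k as the example sampling setting are all assumptions made here. It shows that a lower temperature concentrates probability on the most likely token, while a higher temperature spreads it out, and that top-k restricts sampling to the k most likely candidates.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_top_k(logits, temperature=1.0, top_k=None, rng=random):
    """Sample one token index, optionally restricted to the top_k most likely tokens."""
    probs = softmax_with_temperature(logits, temperature)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]
    weights = [probs[i] for i in ranked]
    return rng.choices(ranked, weights=weights, k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
    for t in (0.5, 1.0, 2.0):
        rounded = [round(p, 3) for p in softmax_with_temperature(toy_logits, t)]
        print(f"temperature={t}: {rounded}")
    print("top_k=1 always picks index", sample_top_k(toy_logits, top_k=1))
```

With `top_k=1` the sampler always returns the single most likely token, which is why very restrictive settings make output more repeatable and less varied.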

42 MIN

APR 20, 2025

AI finally passed the Turing Test

Has AI finally passed the Turing Test? Dive into the groundbreaking news from UC San Diego, where research published in March 2025 claims that GPT-4.5 convinced human judges it was a real person 73% of the time, even more often than actual humans in the same test. But what does this historic moment truly signify for the future of artificial intelligence?

This podcast explores the original concept of the Turing Test, proposed by Alan Turing in 1950 as a practical measure of a machine's ability to exhibit intelligent behavior indistinguishable from that of a human through conversation. We'll examine the rigorous controlled study that led to GPT-4.5's alleged success, involving 284 participants and five-minute conversations.

We'll delve into what passing the Turing Test actually means and, crucially, what it doesn't. Is this the dawn of true AI consciousness or Artificial General Intelligence (AGI)? The source article clarifies that the Turing Test specifically measures conversational ability and human likeness in dialogue, not sentience or general intelligence.

Discover the key factors that contributed to this breakthrough, including massive increases in model parameters and training data, sophisticated prompting (especially the use of a "persona prompt"), learning from human feedback, and models designed for conversation. We will also discuss the intriguing finding that human judges often identified someone as human when they lacked knowledge or made mistakes, showing a shift in our perception of AI.

However, the podcast will also address the criticisms and limitations of the Turing Test. We'll explore the argument that it's merely a test of functionality and doesn't necessarily indicate genuine human-like thinking. We'll also touch on alternative tests for AI that aim to assess creativity, problem-solving, and other aspects of intelligence beyond conversation, such as the Metzinger Test and the Lovelace 2.0 Test.

Finally, we will consider the profound implications of AI systems convincingly simulating human conversation, including the economic impact on roles requiring human-like interaction, the potential effects on social relationships, and the ethical considerations around deception and manipulation.

Join us to unpack this milestone in computing history and discuss what the blurring lines between human and machine communication mean for our society, economy, and lives.

Source: https://theconversation.com/chatgpt-just-passed-the-turing-test-but-that-doesnt-mean-ai-is-now-as-smart-as-humans-253946

Disclaimer: This podcast is generated by Roger Basler de Roca using AI. The voices are artificially generated and the discussion is based on public research data. I do not claim any ownership of the presented material, as it is for educational purposes only. Contact: https://rogerbasler.ch/en/contact/

17 MIN

APR 15, 2025

Google's approach to AGI - artificial general intelligence

This episode examines the 145-page paper from Google DeepMind outlining its strategic approach to managing the risks and responsibilities of AGI development.

1. Defining AGI and 'Exceptional AGI'

We begin by clarifying what DeepMind means by AGI: an AI system capable of performing any task a human can. More specifically, they introduce the notion of 'Exceptional AGI': a system whose performance matches or exceeds that of the top 1% of professionals across a wide range of non-physical tasks. (Note: DeepMind is a British AI company, founded in 2010 and acquired by Google in 2014.)

2. Understanding the Risk Landscape

AGI, while full of potential, also presents serious risks, from systemic harm to outright existential threats. DeepMind identifies four core areas of concern:

- Misuse (intentional misuse by actors with harmful intent)
- Misalignment (the AI system knowingly acting against the intent of its developers)
- Mistakes (unexpected failures or flaws in design)
- Structural risks (long-term unintended societal or economic consequences)

Among these, misuse and misalignment are given particular attention due to their immediacy and severity.

3. Mitigating AGI Threats: DeepMind's Technical Strategy

To counter these dangers, DeepMind proposes a multi-layered technical safety strategy. The goal is twofold:

- To prevent bad actors from accessing powerful capabilities
- To better understand and predict AI behaviour as systems grow in autonomy and complexity

This approach integrates mechanisms for oversight, constraint, and continual evaluation.

4. Debate Within the AI Field

However, the path is far from settled. Within the AI research community, there is ongoing skepticism regarding both the feasibility of AGI and the assumptions underlying safety interventions. Some critics argue that AGI remains too vaguely defined to justify such extensive safeguards, while others warn that dismissing the risks could be equally shortsighted.

5. Timelines and Trajectories

When might we see AGI? DeepMind's report considers the emergence of 'Exceptional AGI' plausible before the end of this decade, that is, before 2030. While no exact date is predicted, the implication is clear: preparation cannot wait.

This episode offers a rare look behind the scenes at how a leading AI lab is thinking about, and preparing for, the future of artificial general intelligence. It also raises the broader question: how should societies respond when technology begins to exceed traditional human limits?

Source: https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/evaluating-potential-cybersecurity-threats-of-advanced-ai/An_Approach_to_Technical_AGI_Safety_Apr_2025.pdf

Disclaimer: This podcast is generated by Roger Basler de Roca using AI. The voices are artificially generated and the discussion is based on public research data. I do not claim any ownership of the presented material, as it is for educational purposes only. Contact: https://rogerbasler.ch/en/contact/

26 MIN
