Current “thinking” AI models still cannot reason at a level consistent with human-like artificial general intelligence, researchers have found.
Apple researchers found that even top AI models struggle to reason, suggesting there is still a long way to go before artificial general intelligence (AGI) is achieved.
Large reasoning models (LRMs) have been incorporated into recent updates of popular AI large language models (LLMs), such as OpenAI’s ChatGPT and Anthropic’s Claude. However, in a June paper titled “The Illusion of Thinking,” the Apple researchers stated that the fundamental capabilities, scaling properties, and limitations of these models “remain insufficiently understood.”
They pointed out that existing evaluations “emphasize final answer accuracy” and focus mainly on established mathematical and coding benchmarks. This kind of assessment, they argued, reveals little about the models’ actual capacity for reasoning.
That skepticism stands in contrast to industry predictions that artificial general intelligence will arrive within the next few years.
Researchers at Apple test “thinking” AI models
To go beyond standard mathematical benchmarks, the researchers designed a set of puzzle games to test “thinking” and “non-thinking” variants of Claude Sonnet, OpenAI’s o3-mini and o1, and DeepSeek’s R1 and V3 models.
Contrary to expectations of AGI-level ability, they found that “frontier LRMs face a complete accuracy collapse beyond certain complexities,” that the models fail to generalize their reasoning effectively, and that their advantage fades as complexity increases.
The paper states: “We found that LRMs have limitations in exact computation: they fail to use explicit algorithms and reason inconsistently across puzzles.”
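To make “explicit algorithms” concrete: the Tower of Hanoi, one of the paper’s puzzle benchmarks, has an exact recursive solution that a few lines of code apply flawlessly at any size, whereas the paper found LRM accuracy collapses as the disk count grows. The sketch below is illustrative and not taken from the paper’s own evaluation code.

```python
def hanoi(n, src="A", dst="C", aux="B", moves=None):
    """Return the optimal move list for an n-disk Tower of Hanoi.

    The explicit algorithm: move n-1 disks to the auxiliary peg,
    move the largest disk to the destination, then move the n-1
    disks on top of it. Always yields exactly 2**n - 1 moves.
    """
    if moves is None:
        moves = []
    if n == 0:
        return moves
    hanoi(n - 1, src, aux, dst, moves)   # clear the way
    moves.append((src, dst))             # move the largest disk
    hanoi(n - 1, aux, dst, src, moves)   # restack on top of it
    return moves

print(len(hanoi(3)))  # 7 moves, i.e. 2**3 - 1
```

Because the move count doubles with each added disk, puzzles like this let researchers dial up complexity precisely and observe where a model’s reasoning breaks down.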

Researchers claim AI chatbots are overthinking
They found the models’ reasoning to be inconsistent and shallow, and also observed “overthinking,” in which the chatbots produced accurate responses early on before veering off course.
Rather than displaying AGI-level reasoning, the researchers concluded, LRMs imitate reasoning patterns without truly internalizing or generalizing them.
They wrote: “These insights challenge prevailing assumptions about LRM capabilities and suggest that current approaches may be encountering fundamental barriers to generalizable reasoning.”

The competition to create AGI
Artificial general intelligence (AGI), the ultimate goal of AI development, is the point at which a machine matches human intelligence and can think and reason as a person does.
OpenAI CEO Sam Altman said in January that the company was closer than ever to building AGI. “We are now confident we know how to build AGI as we have traditionally understood it,” he declared at the time.
Dario Amodei, CEO of Anthropic, said in November that artificial general intelligence would surpass human capabilities within a year or two. “It does give you the impression that we’ll reach that point by 2026 or 2027 if you just look at how quickly these capabilities are growing,” he said.