4 Sources
[1]
Samsung AI researcher's new, open reasoning model TRM outperforms models 10,000X larger -- on specific problems
The trend of AI researchers developing new, small open source generative models that outperform far larger, proprietary peers continued this week with yet another staggering advancement. Alexia Jolicoeur-Martineau, Senior AI Researcher at Samsung's Advanced Institute of Technology (SAIT) in Montreal, Canada, has introduced the Tiny Recursion Model (TRM) -- a neural network so small it contains just 7 million parameters (internal model settings), yet it competes with or surpasses cutting-edge language models 10,000 times larger in parameter count, including OpenAI's o3-mini and Google's Gemini 2.5 Pro, on some of the toughest reasoning benchmarks in AI research.

The goal is to show that highly performant new AI models can be created affordably, without massive investments in the graphics processing units (GPUs) and power needed to train the larger, multi-trillion-parameter flagship models powering many LLM chatbots today. The results are described in a research paper published on the open access website arxiv.org, entitled "Less is More: Recursive Reasoning with Tiny Networks."

"The idea that one must rely on massive foundational models trained for millions of dollars by some big corporation in order to solve hard tasks is a trap," wrote Jolicoeur-Martineau on the social network X. "Currently, there is too much focus on exploiting LLMs rather than devising and expanding new lines of direction." Jolicoeur-Martineau added: "With recursive reasoning, it turns out that 'less is more'. A tiny model pretrained from scratch, recursing on itself and updating its answers over time, can achieve a lot without breaking the bank."

TRM's code is available now on GitHub under an enterprise-friendly, commercially viable MIT License -- meaning anyone, from researchers to companies, can take it, modify it, and deploy it for their own purposes, including commercial applications.

One Big Caveat

Readers should be aware that TRM was designed specifically to perform well on structured, visual, grid-based problems such as Sudoku, mazes, and puzzles from the ARC (Abstraction and Reasoning Corpus)-AGI benchmark, the latter of which offers tasks that should be easy for humans but difficult for AI models, such as sorting colors on a grid based on a prior, but not identical, solution.

From Hierarchy to Simplicity

The TRM architecture represents a radical simplification. It builds upon a technique called the Hierarchical Reasoning Model (HRM), introduced earlier this year, which showed that small networks could tackle logical puzzles like Sudoku and mazes. HRM relied on two cooperating networks -- one operating at high frequency, the other at low -- supported by biologically inspired arguments and mathematical justifications involving fixed-point theorems. Jolicoeur-Martineau found this unnecessarily complicated.

TRM strips these elements away. Instead of two networks, it uses a single two-layer model that recursively refines its own predictions. The model begins with an embedded question and an initial answer, represented by the variables x, y, and z. Through a series of reasoning steps, it updates its internal latent representation z and refines the answer y until it converges on a stable output. Each iteration corrects potential errors from the previous step, yielding a self-improving reasoning process without extra hierarchy or mathematical overhead.
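To make the x, y, z description above concrete, here is a minimal, hedged sketch of the recursive refinement loop in PyTorch. The names (TinyNet, refine, n_latent_steps) and the exact update rules are illustrative assumptions based on this description, not the released TRM code.

# Minimal sketch of TRM-style recursive refinement (illustrative, not the official code).
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """A small two-layer network reused at every reasoning step."""
    def __init__(self, dim: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Linear(3 * dim, dim), nn.ReLU(),
            nn.Linear(dim, dim),
        )

    def forward(self, x, y, z):
        # One step: read the question x, the current answer y, and the latent state z.
        return self.block(torch.cat([x, y, z], dim=-1))

def refine(net: TinyNet, x, y, z, n_latent_steps: int = 6):
    """One improvement cycle: update the latent state several times, then revise the answer."""
    for _ in range(n_latent_steps):
        z = z + net(x, y, z)   # update the internal reasoning state z
    y = y + net(x, y, z)       # refine the answer y from the updated state
    return y, z

# Usage: embed a question, start from a blank answer and latent state, and loop.
dim = 128
net = TinyNet(dim)
x = torch.randn(1, dim)                          # embedded question (illustrative)
y, z = torch.zeros(1, dim), torch.zeros(1, dim)  # initial answer and scratch state
for _ in range(16):                              # up to sixteen improvement cycles
    y, z = refine(net, x, y, z)

The point of the sketch is that the same tiny network is reused at every step, so additional "thinking" costs compute rather than parameters.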
How Recursion Replaces Scale

The core idea behind TRM is that recursion can substitute for depth and size. By iteratively reasoning over its own output, the network effectively simulates a much deeper architecture without the associated memory or computational cost. This recursive cycle, run over as many as sixteen supervision steps, allows the model to make progressively better predictions -- similar in spirit to how large language models use multi-step "chain-of-thought" reasoning, but achieved here with a compact, feed-forward design.

The simplicity pays off in both efficiency and generalization. The model uses fewer layers, no fixed-point approximations, and no dual-network hierarchy. A lightweight halting mechanism decides when to stop refining, preventing wasted computation while maintaining accuracy.

Performance That Punches Above Its Weight

Despite its small footprint, TRM delivers benchmark results that rival or exceed those of models up to 10,000 times larger. In testing, the model achieved:

* 87.4% accuracy on Sudoku-Extreme (up from 55% for HRM)
* 85% accuracy on Maze-Hard puzzles
* 45% accuracy on ARC-AGI-1
* 8% accuracy on ARC-AGI-2

These results surpass or closely match performance from several high-end large language models, including DeepSeek R1, Gemini 2.5 Pro, and o3-mini, despite TRM using less than 0.01% of their parameters. Such results suggest that recursive reasoning, not scale, may be the key to handling abstract and combinatorial reasoning problems -- domains where even top-tier generative models often stumble.

Design Philosophy: Less Is More

TRM's success stems from deliberate minimalism. Jolicoeur-Martineau found that reducing complexity led to better generalization. When the researcher increased layer count or model size, performance declined due to overfitting on small datasets. By contrast, the two-layer structure, combined with recursive depth and deep supervision, achieved optimal results. The model also performed better when self-attention was replaced with a simpler multilayer perceptron on tasks with small, fixed contexts like Sudoku. For larger grids, such as ARC puzzles, self-attention remained valuable. These findings underline that model architecture should match data structure and scale rather than default to maximal capacity.

Training Small, Thinking Big

TRM is now officially available as open source under an MIT license on GitHub. The repository includes full training and evaluation scripts, dataset builders for Sudoku, Maze, and ARC-AGI, and reference configurations for reproducing the published results. It also documents compute requirements ranging from a single NVIDIA L40S GPU for Sudoku training to multi-GPU H100 setups for ARC-AGI experiments.

The open release confirms that TRM is designed specifically for structured, grid-based reasoning tasks rather than general-purpose language modeling. Each benchmark -- Sudoku-Extreme, Maze-Hard, and ARC-AGI -- uses small, well-defined input-output grids, aligning with the model's recursive supervision process. Training involves substantial data augmentation (such as color permutations and geometric transformations), underscoring that TRM's efficiency lies in its parameter count rather than its total compute demand.

The model's simplicity and transparency make it more accessible to researchers outside of large corporate labs. Its codebase builds directly on the earlier Hierarchical Reasoning Model framework but removes HRM's biological analogies, multiple network hierarchies, and fixed-point dependencies. In doing so, TRM offers a reproducible baseline for exploring recursive reasoning in small models -- a counterpoint to the dominant "scale is all you need" philosophy.
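As a hedged illustration of the data augmentation mentioned above, the sketch below applies a color permutation and a random rotation or flip to a puzzle grid. The function name, color count, and transform set are assumptions for illustration, not the repository's actual dataset builders.

# Illustrative sketch: grid augmentation via color permutation and geometric transforms.
import numpy as np

NUM_COLORS = 10  # ARC-style grids use a small, fixed color palette (assumption)

def augment_grid(grid: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Return a color-permuted, rotated and possibly flipped copy of a puzzle grid."""
    perm = rng.permutation(NUM_COLORS)              # relabel colors consistently across the grid
    grid = perm[grid]
    grid = np.rot90(grid, int(rng.integers(0, 4)))  # random 0/90/180/270 degree rotation
    if rng.integers(0, 2):
        grid = np.fliplr(grid)                      # random horizontal flip
    return grid

# Usage: in practice the same permutation and transform would be applied to both the
# input and target grids of a training pair so the example remains consistent.
rng = np.random.default_rng(0)
example = rng.integers(0, NUM_COLORS, size=(5, 5))
print(augment_grid(example, rng))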
Community Reaction

The release of TRM and its open-source codebase prompted an immediate debate among AI researchers and practitioners on X. While many praised the achievement, others questioned how broadly its methods could generalize. Supporters hailed TRM as proof that small models can outperform giants, calling it "10,000× smaller yet smarter" and a potential step toward architectures that think rather than merely scale. Critics countered that TRM's domain is narrow -- focused on bounded, grid-based puzzles -- and that its compute savings come mainly from size, not total runtime.

Researcher Yunmin Cha noted that TRM's training depends on heavy augmentation and recursive passes, "more compute, same model." Cancer geneticist and data scientist Chey Loveday stressed that TRM is a solver, not a chat model or text generator: it excels at structured reasoning but not open-ended language. Machine learning researcher Sebastian Raschka positioned TRM as an important simplification of HRM rather than a new form of general intelligence. He described its process as "a two-step loop that updates an internal reasoning state, then refines the answer." Several researchers, including Augustin Nabele, agreed that the model's strength lies in its clear reasoning structure but noted that future work would need to show transfer to less-constrained problem types.

The consensus emerging online is that TRM may be narrow, but its message is broad: careful recursion, not constant expansion, could drive the next wave of reasoning research.

Looking Ahead

While TRM currently applies to supervised reasoning tasks, its recursive framework opens several future directions. Jolicoeur-Martineau has suggested exploring generative or multi-answer variants, where the model could produce multiple possible solutions rather than a single deterministic one. Another open question involves scaling laws for recursion -- determining how far the "less is more" principle can extend as model complexity or data size grows.

Ultimately, the study offers both a practical tool and a conceptual reminder: progress in AI need not depend on ever-larger models. Sometimes, teaching a small network to think carefully -- and recursively -- can be more powerful than making a large one think once.
[2]
Samsung researchers create tiny AI model that shames the biggest LLMs in reasoning puzzles - SiliconANGLE
Researchers from Samsung Electronics Co. Ltd. have created a tiny artificial intelligence model that punches far above its weight on certain kinds of "reasoning" tasks, challenging the industry's long-held logic that "bigger means better." Released this week, the Tiny Recursive Model, or TRM, has just 7 million parameters, far fewer than most other AI models. Yet it shows that it can outperform powerful large language models such as Google LLC's Gemini 2.5 Pro on tough reasoning puzzles such as Sudoku.

Alexia Jolicoeur-Martineau, a senior researcher at the Samsung Advanced Institute of Technology AI Lab in Montreal, published a paper on arXiv that demonstrates how clever design can be more effective than simply increasing the number of parameters in AI models. TRM uses a special "recursive reasoning" process that enables it to think in "loops," going over the same problem repeatedly in order to improve its answers.

The paper, titled "Less is More: Recursive Reasoning with Tiny Networks," reveals how TRM was designed specifically to tackle logic puzzles and reasoning challenges. It's not able to chat with humans, write stories or create images like other models can. But its narrow focus means it can solve some really hard problems with greater accuracy than its much larger counterparts.

For instance, TRM achieved 87% accuracy on Sudoku-Extreme, a benchmark that challenges AI models to complete multiple Sudoku puzzles. It also racked up an 85% score on Maze-Hard, which tasks models with finding their way through complex mazes in the fastest time possible. And it scored 45% and 8% on the ARC-AGI-1 and ARC-AGI-2 benchmarks, which consist of more abstract reasoning puzzles designed to test for "general intelligence."

In each of these tasks, TRM outperformed much larger models. For instance, Gemini 2.5 Pro could only score 4.9% on the ARC-AGI-2 test, while OpenAI's o3-mini-high scored just 3%, DeepSeek Ltd.'s R1 achieved just 1.3% and Anthropic PBC's Claude 3.7 could only muster a 0.7% score. TRM achieved this with less than 0.01% of the parameters used by the most powerful large language models.

Rather than build a large neural network, Samsung's researchers looked at the possibility of using recursion, a technique humans can also use. Essentially, the model looks at its answer and asks itself, "Is it any good? If not, can I come up with a better answer?" It then attempts to solve the puzzle again, refining its answer, and repeats this process until it's satisfied.

To do this, TRM maintains two short-term memories: it remembers the current solution, and it also creates a kind of scratchpad to jot down the intermediate steps it takes to try to improve on that. At each step, the model updates the scratchpad by reviewing the task, the current solution and its previous notes, before generating an improved output based on that information. It repeats this loop multiple times, gradually refining its answers, eliminating the need for lengthy reasoning chains that can only be handled by billions of parameters. Instead, only a small network of a few million parameters is required.

The researchers stated in the paper that TRM is programmed to "recursively refine latent and output states without assuming convergence." This means the model is not forced to settle on an answer too soon; rather, it is allowed to keep repeating the loop until it can no longer improve its output. It uses an "adaptive halting" technique that allows it to figure out for itself when that point has been reached, preventing it from running indefinitely. The model also employs deep supervision, meaning it obtains feedback at multiple steps of its reasoning process instead of just at the end. This helps the model learn more effectively, the authors said.
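The following is a hedged sketch of how the deep supervision and adaptive halting described above could be combined in a training step. It assumes a refine_fn that performs one improvement cycle over the solution y and scratchpad z, per-cell class logits in y, and a small halt_head that maps the scratchpad to a stop probability; these names and the loss weighting are illustrative, not taken from the released code.

# Hedged sketch: deep supervision with a learned halting signal (illustrative only).
import torch
import torch.nn.functional as F

def deeply_supervised_loss(refine_fn, halt_head, x, y, z, target, max_steps=16):
    """Accumulate a loss after every refinement step, plus a halting loss that
    encourages stopping once the current answer already matches the target."""
    total = 0.0
    for _ in range(max_steps):
        y, z = refine_fn(x, y, z)                         # one improvement cycle
        total = total + F.cross_entropy(y, target)        # feedback at every step
        p_halt = torch.sigmoid(halt_head(z)).squeeze(-1)  # learned "should I stop?" signal
        solved = (y.argmax(-1) == target).float()         # 1.0 where the prediction is right
        total = total + F.binary_cross_entropy(p_halt, solved.detach())
        y, z = y.detach(), z.detach()                     # keep each step's gradient graph small
    return total

# At inference time, the same halting signal can simply end the loop early once it crosses
# a threshold, which keeps easy puzzles cheap while harder ones get more refinement cycles.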
Jolicoeur-Martineau said in a blog post that the research is significant because it demonstrates that small, highly targeted models can achieve excellent results on narrow, structured reasoning tasks, and that this could be a significant development for the broader AI industry.

The obvious benefit is that it makes powerful AI systems more accessible. The biggest LLMs, with billions or even trillions of parameters, can only be run on enormous clusters of specialized and expensive graphics processing units. These consume vast amounts of energy, which means that only a handful of rich companies and well-funded universities can experiment with them. But a model like TRM, which has only a few million parameters, can be run on commodity hardware with a much lower energy footprint. It potentially opens the door for many more universities, startups and independent developers to experiment with advanced AI models and accelerate innovation.

That said, Jolicoeur-Martineau's team pointed out that the findings don't mean LLMs are obsolete. TRM can only operate effectively when handling well-defined grid problems, and is not suitable for open-ended, text-based or multimodal tasks. Nonetheless, it represents a promising development, and the researchers plan to conduct further experiments to try to adapt recursive learning models to new domains.
[3]
Tiny Model from Samsung AI Lab Beats Gemini 2.5 Pro, o3-mini on ARC-AGI | AIM
A research study from the Samsung Advanced Institute of Technology AI Lab in Montreal proposes a small AI model called the Tiny Recursive Model (TRM). TRM is a 7-million-parameter model that achieved 45% accuracy on the ARC-AGI-1 benchmark, which assesses the performance of AI models on human-like, abstract, visual reasoning tasks. Notably, TRM scored higher than models such as Google's Gemini 2.5 Pro (37%), OpenAI's o3-mini-high (34.5%), and DeepSeek-R1 (15.8%), all of which are significantly larger, with hundreds of billions of parameters.

On the ARC-AGI-2 benchmark, the latest and most challenging iteration, TRM achieved 7.8% accuracy, whereas Gemini 2.5 Pro scored 4.9% and o3-mini-high scored 3%. Currently, xAI's Grok 4 leads both the ARC-AGI-1 and ARC-AGI-2 benchmarks with 66.7% and 16% accuracy, respectively.

Alexia Jolicoeur-Martineau, the author of the paper, confirmed on X that it took less than $500, four NVIDIA H100 GPUs, and just two days to train the model. This is significantly less than what it takes to train large, billion-parameter, general-purpose language models. "Yes, it's still possible to do cool stuff without a data centre," said Sebastian Raschka, an AI research engineer, reacting to the cost efficiency on X.

Instead of relying on billions of parameters, TRM gets smarter by thinking in loops. Simply put, it begins with a rough answer, checks itself, and refines that answer through several incremental steps. "This recursive process allows the model to progressively improve its answer (potentially addressing any errors from its previous answer) in an extremely parameter-efficient manner while minimising overfitting," said the study.

The result supports the thesis that, with architectural innovations, small models can reason better than large ones on specific tasks. Aptly, the study is titled 'Less is More'. For more details on how the model was built, its improvements over the Hierarchical Reasoning Model, and additional information on the evaluations, refer to the full technical report.

Several voices in the industry took to social media to react to the study, and many believe this could be a huge AI breakthrough. Deedy Das, partner at Menlo Ventures, said in a post on X, "Most AI companies today use general-purpose LLMs with prompting for tasks. For specific tasks, smaller models may not just be cheaper, but far higher quality!" He added that startups could train models for under $1,000 for specific subtasks like PDF extraction or time series forecasting. These models would enhance the general model, boost performance, and help build IP for automation tasks.
[4]
Samsung's Tiny AI Model Outperforms Huge LLMs Like Gemini 2.5 Pro On ARC-AGI Puzzles
Samsung's camera division might be bereft of any meaningful innovation at the moment, but the same can't be said of its AI efforts, aptly epitomized by its latest AI model, which just beat some other Large Language Models (LLMs) that are around 10,000x larger! In a paper titled "Less is More: Recursive Reasoning with Tiny Networks," Samsung has detailed the novel architecture of its new Tiny Recursive Model (TRM), which relies on a single, 2-layer model.

Samsung's approach, akin to a person re-reading their own draft and fixing mistakes with each pass, is quite superior to the more conventional approach, where LLMs often choke on logic problems if a single step goes wrong, collapsing their entire reasoning. Chain-of-thought prompting helps, but remains quite brittle.

The takeaway: Keep it simple

Samsung tried increasing the model's layer count but found that doing so decreased generalization due to overfitting. Decreasing the number of layers while increasing the number of recursions actually improved the TRM's overall performance.

Results

Critically, Samsung's TRM either surpasses or closely matches the performance of various LLMs, including DeepSeek R1, Google's Gemini 2.5 Pro, and OpenAI's o3-mini, despite using only a tiny fraction of their parameters.
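The layers-versus-recursions trade-off described above can be illustrated with a rough back-of-the-envelope comparison. The parameter formula and configuration numbers below are illustrative assumptions, not figures from the paper.

# Rough comparison of "more layers" versus "fewer layers, more recursion" (illustrative).
def param_count(n_layers: int, dim: int) -> int:
    return n_layers * (dim * dim + dim)  # roughly one dense layer's weights plus bias

def effective_depth(n_layers: int, n_recursions: int, n_supervision_steps: int) -> int:
    return n_layers * n_recursions * n_supervision_steps

dim = 512
configs = {
    "deeper network": dict(n_layers=8, n_recursions=1),    # more layers, no extra recursion
    "tiny + recursion": dict(n_layers=2, n_recursions=4),  # TRM-style: small net, more loops
}
for name, cfg in configs.items():
    print(name,
          "| params:", param_count(cfg["n_layers"], dim),
          "| effective depth:", effective_depth(cfg["n_layers"], cfg["n_recursions"], 16))
# Both configurations reach the same effective depth, but the recursive one carries
# roughly 4x fewer weights, leaving less room to overfit a small training set.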
Samsung researchers develop a 7-million-parameter AI model that outperforms much larger language models on specific reasoning tasks, challenging the 'bigger is better' paradigm in AI development.
Researchers at Samsung's Advanced Institute of Technology (SAIT) in Montreal have introduced a groundbreaking AI model that challenges the prevailing notion that bigger is always better in artificial intelligence. The Tiny Recursive Model (TRM), developed by Senior AI Researcher Alexia Jolicoeur-Martineau and her team, contains just 7 million parameters yet outperforms language models up to 10,000 times larger on specific reasoning tasks [1].
TRM's success lies in its innovative architecture and use of recursive reasoning. Unlike traditional large language models, TRM employs a single two-layer model that recursively refines its own predictions [1]. This approach allows the model to simulate a much deeper architecture without the associated memory or computational costs.

The model starts with an embedded question and an initial answer, then iteratively updates its internal representation and refines the answer until it converges on a stable output. This process can involve up to sixteen supervision steps, enabling progressively better predictions [1].
Despite its small size, TRM has demonstrated remarkable performance on various reasoning benchmarks, including 87.4% accuracy on Sudoku-Extreme, 85% on Maze-Hard, 45% on ARC-AGI-1, and around 8% on ARC-AGI-2 [1][3]. These results surpass or closely match the performance of much larger models, including Google's Gemini 2.5 Pro, OpenAI's o3-mini, and DeepSeek R1 [2].
TRM's small footprint offers significant advantages in terms of efficiency and accessibility. The model was trained in just two days using four NVIDIA H100 GPUs, costing less than $500 [3]. This efficiency opens up possibilities for universities, startups, and independent developers to experiment with advanced AI models without the need for expensive hardware or massive energy consumption [2].
The success of TRM challenges the industry's focus on developing ever-larger language models. Jolicoeur-Martineau argues that the idea of relying on massive foundational models trained by big corporations is a trap, and that there is currently too much emphasis on exploiting LLMs rather than exploring new directions [1].
This research demonstrates that small, highly targeted models can achieve excellent results on narrow, structured reasoning tasks. It suggests that recursive reasoning, rather than scale, may be the key to handling abstract and combinatorial reasoning problems [1][2].
While TRM's performance is impressive, it's important to note that the model is designed specifically for structured, visual, grid-based problems. It cannot perform general tasks like chatting, writing stories, or creating images [2]. However, this specialization allows it to excel in its targeted domain.

The research opens up new possibilities for AI development, suggesting that startups could train specialized models for under $1,000 for specific subtasks like PDF extraction or time series forecasting [3]. This approach could enhance general models, boost performance, and help build intellectual property for automation tasks.