Google Gemini 3.1 Pro doubles reasoning score, beats rivals in key AI benchmarks

Reviewed by Nidhi Govil


Google has released Gemini 3.1 Pro with dramatically improved problem-solving and reasoning capabilities, more than doubling its predecessor's score on abstract reasoning tests. The new large language model scored 77.1% on ARC-AGI-2 and a record 44.4% on Humanity's Last Exam, beating OpenAI and Anthropic models on most benchmarks as competition among the major AI labs intensifies.

Google Unveils Gemini 3.1 Pro With Enhanced Reasoning Abilities

Google released Gemini 3.1 Pro on Thursday, marking another significant advance in the accelerating competition among major tech companies developing advanced LLMs. The new, more powerful large language model arrives just months after Gemini 3's November launch, bringing improved problem-solving and reasoning capabilities that position it as one of the most capable AI tools available today [1][2].

Source: Geeky Gadgets

The AI model is currently rolling out in preview for developers, enterprises, and consumers through multiple platforms, including the Gemini app, NotebookLM, and the Gemini API [3]. Google describes the release as "a step forward in core reasoning," designed for complex tasks where simple answers prove insufficient [3].

Gemini 3.1 Pro Doubled Its Reasoning Score on Critical Tests

The new model delivers dramatic improvements across industry-standard evaluations. On the ARC-AGI-2 benchmark, which tests novel logic problems that cannot be directly trained into AI systems, Gemini 3.1 Pro scored 77.1%, more than doubling Gemini 3's modest 31.1% [4]. This leap addresses a previous weakness: Gemini 3 lagged behind competing models that scored in the 50s and 60s [1].

Google also announced a record score on Humanity's Last Exam, a rigorous test designed to measure advanced domain-specific knowledge. Gemini 3.1 Pro achieved 44.4%, surpassing Gemini 3 Pro's 37.5% and OpenAI GPT-5.2's 34.5% [1]. The model also excelled in scientific knowledge assessments, scoring 94.3% on the GPQA Diamond test, ahead of GPT-5.2's 92.4% and Claude Opus 4.6's 91.3% [5].

Source: GameReactor

How Gemini 3.1 Pro Beats Rivals in Key AI Benchmarks

Across 19 major benchmarks, Gemini 3.1 Pro outperformed competition from OpenAI and Anthropic in 12 categories [5]. On ARC-AGI-2 specifically, it beat GPT-5.2's 52.9% and Claude Opus 4.6's 68.8% [5]. However, the competitive landscape remains nuanced: Claude Opus 4.6 currently edges out Gemini 3.1 Pro on the Arena leaderboard for text capabilities by four points at 1504, and multiple Anthropic and OpenAI models still lead in coding benchmarks [1].

Brendan Foody, CEO of AI startup Mercor, noted that "Gemini 3.1 Pro is now at the top of the APEX-Agents leaderboard"; the APEX system measures how well models perform real professional tasks. He emphasized that the results demonstrate "how quickly agents are improving at real knowledge work" [2].

Deep Think Integration and Practical Applications

Google revealed that Gemini 3.1 Pro serves as the "core intelligence" behind last week's Deep Think tool upgrade, which focuses on tough research challenges in chemistry, physics, math, and coding, where problems lack clear solutions and data remains messy [4]. The company positioned the model as the "upgraded core intelligence that makes those breakthroughs possible" [4].

Practical demonstrations showcase the model's versatility. Google highlighted its ability to create code-based animations, including scalable SVG images generated from text prompts, build entire websites based on literary characters, and construct interactive 3D experiences such as a starling murmuration with dynamic soundscapes [3].

Source: VentureBeat

Access and Availability Across Platforms

Developers can access Gemini 3.1 Pro in preview through AI Studio, Android Studio, Google Antigravity, Vertex AI, and the Gemini CLI [4]. Enterprise customers can try it in Vertex AI and Gemini Enterprise, while regular users will find it in NotebookLM and the Gemini app [4]. Free Gemini users have access but face usage limits before being temporarily switched to another model, while paid AI Pro and AI Ultra subscribers get higher usage thresholds [5].
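For developers who want to try the model through the Gemini API, a minimal request looks roughly like the sketch below, which uses only the Python standard library against the public generateContent REST endpoint. The model id "gemini-3.1-pro-preview" is an assumption for illustration; check Google's current model list for the actual preview identifier, and note that a real call requires a valid API key.

```python
import json
import os
import urllib.request

API_ROOT = "https://generativelanguage.googleapis.com/v1beta"
MODEL_ID = "gemini-3.1-pro-preview"  # assumed preview id; verify against Google's model list


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Construct (but do not send) a generateContent request."""
    url = f"{API_ROOT}/models/{MODEL_ID}:generateContent"
    body = json.dumps({"contents": [{"parts": [{"text": prompt}]}]}).encode()
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
    )


def ask_gemini(prompt: str) -> str:
    """Send the request and return the first candidate's text."""
    req = build_request(prompt, os.environ["GEMINI_API_KEY"])
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["candidates"][0]["content"]["parts"][0]["text"]


# Only attempt a live call when a key is actually configured.
if __name__ == "__main__" and "GEMINI_API_KEY" in os.environ:
    print(ask_gemini("Summarize the ARC-AGI-2 benchmark in one sentence."))
```

Google's official SDKs (such as the google-genai Python package) wrap this same endpoint; the raw-HTTP form is shown here only to make the request shape explicit.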

ZDNET senior contributing editor David Gewirtz cautioned that "model capabilities are ultimately relative," noting that while the test numbers suggest substantial improvement over Gemini 3, true performance will only become clear through time and testing [4]. He added that the competitive landscape may shift again when OpenAI releases GPT-5.3, providing a more universal comparison point [4]. The release underscores how rapidly AI development progresses: each new model remains impressive only until the next lab ships its state-of-the-art upgrade.

TheOutpost.ai

© 2026 Triveous Technologies Private Limited