Google launches Gemini 3.1 Pro with record benchmark scores in heated AI model race

Google unveiled Gemini 3.1 Pro on Thursday, marking a significant advancement in AI model capabilities with enhanced reasoning and complex problem-solving. The large language model more than doubled its predecessor's performance on key benchmarks, scoring 77.1% on ARC-AGI-2 compared to Gemini 3's 31.1%. The release intensifies competition with OpenAI and Anthropic as tech giants race to dominate the AI landscape.

Google Unveils Gemini 3.1 Pro with Enhanced Reasoning Capabilities

Google announced the release of Gemini 3.1 Pro on Thursday, positioning the AI model as a major step forward in core reasoning and complex problem-solving. The large language model arrives just months after Gemini 3 debuted in November, reflecting the accelerating pace of the AI model race among tech giants [1]. Available now in preview for developers and consumers, Gemini 3.1 Pro promises substantial improvements, outperforming its predecessor across multiple evaluation metrics [2].

The release comes as competition intensifies among major AI labs, with OpenAI and Anthropic recently launching their own advanced models. According to CEO Sundar Pichai, Google's first-party models now process over 10 billion tokens per minute via direct API use, while the Gemini App has grown to over 750 million monthly active users [5]. This expansion demonstrates the growing demand for sophisticated AI tools capable of handling increasingly complex tasks.

Record Benchmark Scores Signal Major Performance Leap

Gemini 3.1 Pro achieved record benchmark scores across several industry-standard tests, with particularly impressive gains in reasoning tasks. On the ARC-AGI-2 benchmark, which tests novel logic problems that can't be directly trained into an AI model, Gemini 3.1 Pro scored 77.1% compared to Gemini 3's 31.1% [3]. This more than doubling of performance represents a significant advancement in the model's ability to tackle entirely new logic patterns [4].

On Humanity's Last Exam, designed to measure advanced domain-specific knowledge, Gemini 3.1 Pro achieved 44.4%, surpassing Gemini 3 Pro's 37.5% and OpenAI's GPT 5.2 at 34.5% [1]. Brendan Foody, CEO of AI startup Mercor, noted that "Gemini 3.1 Pro is now at the top of the APEX-Agents leaderboard," adding that the model's impressive results demonstrate "how quickly agents are improving at real knowledge work" [2].

Competitive Landscape Shows Claude Opus Still Leading in Some Areas

Despite the strong performance, Anthropic's Claude Opus 4.6 maintains an edge in certain categories. On the Arena leaderboard for text capabilities, Claude Opus 4.6 scores 1504, edging out Gemini 3.1 Pro by four points [1]. In code-related tasks, both Opus 4.6 and OpenAI's GPT 5.2 High run ahead of Google's latest offering. However, experts caution that the Arena leaderboard relies on user preferences, which can reward outputs that appear correct regardless of actual accuracy [1].

The competitive dynamics underscore how provisional any single model's lead is in the rapidly evolving AI race. As ZDNET senior contributing editor David Gewirtz noted, "The test numbers seem to imply that it's got substantial improvement over Gemini 3, and Gemini 3 was pretty good, but I don't think we're really going to know right away" [4]. He emphasized that the upcoming release of GPT 5.3 will provide a more comprehensive comparison point.

Practical Applications and Availability for Developers

Google highlighted several practical use cases demonstrating Gemini 3.1 Pro's capabilities in creative and technical domains. The model can generate code-based animations, creating scalable SVG images ready for websites from simple text prompts. In one demonstration, it generated an entire website based on a character from Emily Brontë's Wuthering Heights, reimagined as a landscape photographer's portfolio [3]. Another example showcased a 3D interactive starling murmuration with dynamically generated soundscapes that respond to bird movements.

The model is now available to developers through the Gemini API in Google AI Studio, Android Studio, Google Antigravity, and Gemini CLI [4]. Enterprise customers can access it via Vertex AI and Gemini Enterprise, while consumers with AI Pro or Ultra plans can use it through NotebookLM and the Gemini app [3]. Notably, the model is also accessible through Microsoft services including GitHub Copilot, Visual Studio, and Visual Studio Code [5].
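
For developers trying the preview, access follows the Gemini API's standard generate-content flow. Below is a minimal sketch using Google's google-genai Python SDK, echoing the SVG-from-text demo described above; note that "gemini-3.1-pro-preview" is an assumed placeholder model id, as the announcement does not specify the exact identifier.

    # pip install google-genai
    from google import genai

    # The client reads an API key from the GOOGLE_API_KEY environment
    # variable when none is passed explicitly.
    client = genai.Client()

    # NOTE: "gemini-3.1-pro-preview" is an assumed placeholder; check
    # Google AI Studio for the actual preview model name.
    response = client.models.generate_content(
        model="gemini-3.1-pro-preview",
        contents=(
            "Generate a self-contained SVG image of a starling in flight, "
            "ready to embed directly in a website."
        ),
    )

    # The reply text should contain the generated SVG markup.
    print(response.text)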

Deep Think Integration Powers Scientific Breakthroughs

The announcement revealed that Gemini 3.1 Pro serves as the "core intelligence" behind Google's Deep Think tool upgrade announced last week [1]. Google positioned the model as the foundation enabling Deep Think's new capabilities in chemistry and physics, alongside enhanced performance in math and coding. The company described Deep Think as built to address "tough research challenges -- where problems often lack clear guardrails or a single correct solution and data is often messy or incomplete" [4]. While Deep Think scored even higher on some benchmarks (84.6% on ARC-AGI-2 and 48.4% on Humanity's Last Exam), Google emphasized that Gemini 3.1 Pro provides the underlying intelligence making those scientific breakthroughs possible. This integration suggests Google's strategy focuses on building versatile foundation models that can power specialized applications across research and enterprise domains.
