Anthropic releases Claude Opus 4.7, outperforming GPT-5.4 on coding benchmarks

Reviewed by Nidhi Govil

26 Sources

Anthropic has launched Claude Opus 4.7, its most capable publicly available AI model, with benchmark-leading scores on software engineering tasks. The large language model achieves 64.3% on SWE-bench Pro, surpassing OpenAI's GPT-5.4 at 57.7% and Google's Gemini 3.1 Pro at 54.2%. While positioned as less capable than the restricted Claude Mythos Preview, Opus 4.7 introduces new cybersecurity safeguards and delivers significant improvements in coding, vision, and multi-step reasoning.

Anthropic Unveils Claude Opus 4.7 With Benchmark-Leading Performance

Anthropic has released Claude Opus 4.7, its most capable generally available AI model, marking a significant leap in advanced software engineering capabilities just over two months after launching Opus 4.6. The new large language model scores 64.3% on SWE-bench Pro, a benchmark that tests a model's ability to resolve real-world software issues from open-source repositories, ahead of OpenAI's GPT-5.4 at 57.7% and Google's Gemini 3.1 Pro at 54.2% [4]. On SWE-bench Verified, a human-validated subset of the original SWE-bench, Opus 4.7 scores 87.6%, compared with 80.8% for its predecessor and 80.6% for Gemini 3.1 Pro [4]. The model is available immediately across Claude Pro, Max, Team, and Enterprise plans, as well as through Amazon Bedrock, Vertex AI, and Microsoft Foundry, priced at $5 per million input tokens and $25 per million output tokens [4].
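
To make the quoted pricing concrete, here is a minimal Python sketch that estimates the charge for a single request from the two list prices above. The function name and the example token counts are hypothetical, and the sketch ignores any discounts, caching, or batch pricing, none of which the article describes.

```python
from decimal import Decimal

# List prices quoted in the article, in US dollars per million tokens.
INPUT_USD_PER_MTOK = Decimal("5")
OUTPUT_USD_PER_MTOK = Decimal("25")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the charge for one request at the quoted list prices.

    Hypothetical helper for illustration; not part of any Anthropic SDK.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        Decimal(input_tokens) * INPUT_USD_PER_MTOK
        + Decimal(output_tokens) * OUTPUT_USD_PER_MTOK
    )
    return total / TOKENS_PER_MILLION


# Assumed workload: 200,000 input tokens and 20,000 output tokens.
print(estimate_cost_usd(200_000, 20_000))  # 1.5
```

At these prices an output token costs five times as much as an input token, so long generated responses dominate the bill faster than long prompts do.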

Source: VentureBeat

Enhanced Capabilities for Complex Multi-Step Workflows

Claude Opus 4.7 delivers a 14% improvement over Opus 4.6 on complex multi-step workflows while using fewer tokens and producing a third of the tool errors, according to Anthropic [4]. The model introduces multi-agent coordination, which lets it orchestrate parallel workstreams rather than process tasks sequentially, a capability that translates directly into throughput for enterprise users running code review, document analysis, and data processing simultaneously [4]. Anthropic states that users can now hand off their hardest coding work to Opus 4.7 with confidence, as the model handles complex, long-running tasks with rigor and consistency [2]. The model is the first Claude iteration to pass "implicit-need tests," in which it must infer the required tools or actions rather than receive explicit instructions [4]. On agentic reasoning tasks, Opus 4.7 is designed to be more resilient: it continues executing through tool failures that would have halted Opus 4.6, recovering and adapting rather than stopping [4].

Source: Gizmodo

Improved Instruction Following and Enhanced Vision Capabilities

The new model takes instructions "literally," where previous models skipped or loosely interpreted parts of a prompt, according to Anthropic [1]. This stricter instruction following reduces ambiguity but may require developers to adjust existing prompts, as tighter adherence also reduces the creative or unexpected outputs that sometimes emerged from earlier versions [4]. On the vision side, Opus 4.7 processes images at resolutions up to 2,576 pixels on the long edge, more than three times the capacity of prior Claude models [4]. The improvement targets enterprise document analysis, where scanned contracts, technical drawings, and financial statements contain fine print that lower-resolution vision models often miss or hallucinate [4]. Anthropic also claims the model is more "tasteful and creative" when creating interfaces, documents, and slide decks, though it has not disclosed what it counts as good or bad taste [1].
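
As an illustration of what the long-edge limit means in practice, the sketch below computes the dimensions an image would need to be scaled to so that its longer side fits within 2,576 pixels while keeping its aspect ratio. The article does not say whether oversized images are rejected or downscaled automatically, so this is a client-side assumption, and `fit_long_edge` is a hypothetical helper rather than part of any Anthropic API.

```python
MAX_LONG_EDGE_PX = 2576  # long-edge limit quoted in the article


def fit_long_edge(
    width: int, height: int, max_long_edge: int = MAX_LONG_EDGE_PX
) -> tuple[int, int]:
    """Return (width, height) scaled down so the longer side fits the limit.

    Images already within the limit are returned unchanged. Integer
    arithmetic is used so the result never exceeds the limit.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    long_edge = max(width, height)
    if long_edge <= max_long_edge:
        return width, height
    # Scale both sides by max_long_edge / long_edge, rounding down,
    # and keep each side at least one pixel.
    new_width = max(1, width * max_long_edge // long_edge)
    new_height = max(1, height * max_long_edge // long_edge)
    return new_width, new_height


# Assumed example: a 5152 x 3000 scan is exactly twice the limit on its long edge.
print(fit_long_edge(5152, 3000))  # (2576, 1500)
```

The resulting dimensions can then be passed to whatever image library the application already uses for the actual resampling.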

Source: Inc.

Cybersecurity Safeguards as Testing Ground for Mythos-Class Models

Claude Opus 4.7 is not as powerful as Claude Mythos Preview, Anthropic's most advanced model, which excels at identifying security flaws and is restricted to select companies through Project Glasswing. Instead, it serves as a testing ground for new cybersecurity safeguards [3]. Anthropic is releasing Opus 4.7 with safeguards that automatically detect and block requests indicating prohibited or high-risk cybersecurity uses [1]. These protections represent a watered-down version of what will appear in Mythos-class models, and lessons from real-world deployment will inform Anthropic's eventual goal of a broad release [1]. The new model scored 73.1% on cybersecurity vulnerability reproduction benchmarks, a slight decrease from Opus 4.6's 73.8%, potentially reflecting the impact of these new safeguards [5]. Making Opus 4.7 broadly available, including through the API, allows Anthropic to test these protections at scale before deploying them in more capable models that could pose greater security risks if misused [3].

Commercial Momentum Amid Intensifying Competition

The release arrives as Anthropic runs at a $30 billion annualized revenue rate and has attracted investor offers valuing it at roughly $800 billion, with early IPO discussions underway [4]. Opus 4.7 must justify these valuations not by winning every benchmark but by becoming the model that enterprises and developers choose to build on, according to industry observers [4]. Claude Code alone hit $2.5 billion in annualized revenue in February, and AI-assisted coding has become one of the fastest-growing categories in software [4]. On graduate-level reasoning as measured by GPQA Diamond, the field has converged: Opus 4.7 scores 94.2%, GPT-5.4 Pro 94.4%, and Gemini 3.1 Pro 94.3%, differences within noise that indicate frontier models have effectively saturated this benchmark [4]. This convergence signals that competitive differentiation is shifting from raw reasoning scores toward applied performance on complex, multi-step tasks, where Opus 4.7 claims advantages. For developers already using Claude as the default choice in tools like Cursor, where the model scored 70% on CursorBench compared to 58% for Opus 4.6, the improvements directly affect daily workflows [4].
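
The convergence claim can be checked with simple arithmetic on the scores quoted in this article. The sketch below computes the gap between the highest and lowest score on each benchmark; the `spread` helper is hypothetical, and note that the GPQA Diamond figure is for GPT-5.4 Pro while the SWE-bench Pro figure is for GPT-5.4.

```python
from decimal import Decimal

# Scores in percent, exactly as quoted in the article.
SCORES = {
    "SWE-bench Pro": {
        "Opus 4.7": "64.3",
        "GPT-5.4": "57.7",
        "Gemini 3.1 Pro": "54.2",
    },
    "GPQA Diamond": {
        "Opus 4.7": "94.2",
        "GPT-5.4 Pro": "94.4",
        "Gemini 3.1 Pro": "94.3",
    },
}


def spread(scores: dict[str, str]) -> Decimal:
    """Return the gap, in percentage points, between the best and worst score."""
    values = [Decimal(v) for v in scores.values()]
    return max(values) - min(values)


for benchmark, scores in SCORES.items():
    print(f"{benchmark}: spread of {spread(scores)} percentage points")
# SWE-bench Pro: spread of 10.1 percentage points
# GPQA Diamond: spread of 0.2 percentage points
```

A 0.2-point spread on GPQA Diamond against a 10.1-point spread on SWE-bench Pro is the numerical basis for saying that differentiation has moved to applied coding tasks.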

TheOutpost.ai

© 2026 Triveous Technologies Private Limited