26 Sources
[1]
No, Anthropic's New Claude Opus 4.7 Model Is Not Mythos Preview
Anthropic on Thursday released a new AI model, and no, it's not Claude Mythos Preview. Claude Opus 4.7 is now generally available, meant to help developers and vibe coders with their hardest coding tasks. Opus 4.7, like a well-trained dog, is supposedly better at following instructions. Anthropic wrote in its blog post that Opus 4.7 takes instructions "literally," where previous models skipped or loosely interpreted prompts. It has improvements to its file-based memory system, so it should be able to recall information from previous sessions and documents. And it can handle larger image files and analyze data from charts more easily. Anthropic also said the model is more "tasteful and creative" when creating interfaces, documents and slide decks. There are no details on exactly what Anthropic considers bad versus good taste. Anthropic made waves earlier this month when it revealed it had created Claude Mythos Preview, its next-generation model, which proved so good at finding security gaps that the company said it would share it only with tech and internet infrastructure companies -- like Cisco, CrowdStrike and Amazon Web Services -- so they could address the issues Mythos found. The idea is that if tech companies can improve their systems with the help of AI, they will be more resilient to cyberattacks by bad actors who can use publicly available AI models like everyone else. While Opus 4.7 isn't the same as Mythos, Anthropic is testing some of its new cybersecurity protections in Opus 4.7. These safeguards, which "automatically detect and block requests that indicate prohibited or high-risk cybersecurity uses," are a watered-down version of what will be in "Mythos-class" models, the company's blog post said. But they're still important as cybersecurity becomes increasingly saturated with AI, both for defense and for attack.
[2]
Anthropic has revealed Claude Opus 4.7, and you can use it right now
As the LLM wars heat up, the biggest players in the AI market are rapidly releasing new models, each more powerful than the last. Things are moving at such a breakneck speed that it's easy to get lost, but with big companies taking on huge amounts of investment to work as fast as they can, the competition has to move just as quickly or be left behind. Anthropic knows this as well as anyone, which is why it has released Opus 4.7. The company claims it's a lot more useful than Opus 4.6, which is remarkable given that the prior version came out just over two months ago. And if you'd like to try it for yourself, we're getting reports that people can access it right now.

Anthropic releases Opus 4.7 to the public
You should be able to try it right now

As announced on the Anthropic website, the company has pulled back the curtain on Opus 4.7. The company states that you can give this new model a try right now; sure enough, several members of the XDA team report being able to select the model on the Claude website, sporting the description "Most capable for ambitious work." If you want an idea of how fast the world of AI is moving, just check out the table above, which pits Opus 4.7 against the most recent version, released on February 5th. That's just over two months of work, and we're already seeing huge jumps in how well the model works. Anthropic wastes no time announcing all the things its new model can do: Opus 4.7 is a notable improvement on Opus 4.6 in advanced software engineering, with particular gains on the most difficult tasks. Users report being able to hand off their hardest coding work -- the kind that previously needed close supervision -- to Opus 4.7 with confidence. 
Opus 4.7 handles complex, long-running tasks with rigor and consistency, pays precise attention to instructions, and devises ways to verify its own outputs before reporting back. Anthropic also claims that Opus 4.7 is better with vision, rendering things in higher resolution and "producing higher-quality interfaces, slides, and docs." And while Anthropic openly admits that Opus 4.7 isn't quite as mighty as its legendary Mythos model (which you can also see in the table above), this is still the strongest AI model you can use without being a part of a special group of companies.
[3]
Anthropic rolls out Claude Opus 4.7, an AI model that is 'broadly less capable' than Mythos
Claude Mythos Preview is Anthropic's most powerful AI model that excels at identifying weaknesses and security flaws within software. The company announced the model earlier this month and said it would roll out to a select group of companies as part of a new cybersecurity initiative called Project Glasswing. Anthropic said that while Claude Opus 4.7 is not as powerful as Claude Mythos Preview, it still shows improvements over Claude Opus 4.6, which the company announced in February. "We are releasing Opus 4.7 with safeguards that automatically detect and block requests that indicate prohibited or high-risk cybersecurity uses," Anthropic said in a release. "What we learn from the real-world deployment of these safeguards will help us work towards our eventual goal of a broad release of Mythos-class models."
[4]
Claude Opus 4.7 leads on SWE-bench and agentic reasoning, beating GPT-5.4 and Gemini 3.1 Pro
In short: Anthropic has released Claude Opus 4.7, its most capable generally available model, with benchmark-leading scores on SWE-bench Pro (64.3% vs GPT-5.4's 57.7%), multi-agent coordination for hours-long workflows, 3x higher image resolution, and a 14% improvement in multi-step agentic reasoning with a third of the tool errors. Priced at $5/$25 per million tokens, it is available across Claude plans and through Amazon Bedrock, Vertex AI, and Microsoft Foundry. Anthropic has released Claude Opus 4.7, its most capable generally available model to date, with benchmark-leading performance in software engineering and agentic reasoning that widens the gap between Claude and both OpenAI's GPT-5.4 and Google's Gemini 3.1 Pro on the tasks that matter most to developers and enterprise users. The release comes at a moment when Anthropic's commercial momentum is difficult to overstate. The company is running at a $30 billion annualised revenue rate, has attracted investor offers at roughly $800 billion, and is in early IPO talks. Opus 4.7 is the model that has to justify those numbers, not by winning every benchmark, but by being the model that enterprises and developers choose to build on. The headline numbers are in software engineering. On SWE-bench Pro, the benchmark that tests a model's ability to resolve real-world software issues from open-source repositories, Opus 4.7 scores 64.3%, up from 53.4% on Opus 4.6 and well ahead of GPT-5.4 at 57.7% and Gemini 3.1 Pro at 54.2%. On SWE-bench Verified, a curated subset, the score is 87.6%, compared with 80.8% for its predecessor and 80.6% for Gemini 3.1 Pro. CursorBench, which measures autonomous coding performance in the popular AI code editor, shows a similar jump: 70%, up from 58% on Opus 4.6. For a model that is already the default choice in Cursor and Claude Code, the improvement on the benchmark most directly tied to how developers actually use it is significant. 
Claude Code alone hit $2.5 billion in annualised revenue in February, and AI-assisted coding has become one of the fastest-growing categories in software. On graduate-level reasoning, measured by GPQA Diamond, the field has converged. Opus 4.7 scores 94.2%, GPT-5.4 Pro scores 94.4%, and Gemini 3.1 Pro scores 94.3%. The differences are within noise. The frontier models have effectively saturated this benchmark, which means the competitive differentiation is shifting away from raw reasoning scores and toward applied performance on complex, multi-step tasks. Opus 4.7's most consequential improvements may not be captured by any single benchmark. Anthropic says the model delivers a 14% improvement over Opus 4.6 on complex multi-step workflows while using fewer tokens and producing a third of the tool errors. It is the first Claude model to pass what Anthropic calls "implicit-need tests," tasks where the model must infer what tools or actions are required rather than being told explicitly. The model also introduces multi-agent coordination, the ability to orchestrate parallel AI workstreams rather than processing tasks sequentially. For enterprise users running Claude across code review, document analysis, and data processing simultaneously, this is the kind of capability that translates directly into throughput. Anthropic says Opus 4.7 is engineered to sustain focus over hours-long workflows, a claim that, if it holds, addresses one of the most common complaints about frontier models: that they lose coherence and precision on extended agentic tasks. Resilience is another emphasis. The model is designed to continue executing through tool failures that would have stopped Opus 4.6, recovering and adapting rather than halting. For automated pipelines where a single failure can cascade, this kind of robustness matters more than marginal benchmark gains. 
Opus 4.7 processes images at resolutions up to 2,576 pixels on the long edge, more than three times the capacity of prior Claude models. The improvement is aimed at enterprise document analysis, where scanned contracts, technical drawings, and financial statements often contain fine print and detail that lower-resolution vision models miss or hallucinate. The context window remains at one million tokens, half of Gemini 3.1 Pro's two million but sufficient for most enterprise use cases. On long-context research benchmarks, Opus 4.7 tied for the top overall score at 0.715 across six research modules and delivered what evaluators described as the most consistent long-context performance of any model tested. Anthropic notes that the model follows instructions more literally than its predecessors, a change that may require users to adjust existing prompts. This is a trade-off: tighter instruction-following reduces the ambiguity that sometimes produces creative or unexpected outputs, but it also reduces the hallucination and off-task behaviour that frustrates enterprise deployments. Opus 4.7 is available immediately on Claude Pro, Max, Team, and Enterprise plans, and through the API at $5 per million input tokens and $25 per million output tokens. Prompt caching offers up to 90% cost savings, and the Batch API provides a 50% discount on both input and output. The model is also available through Amazon Bedrock, Google Cloud's Vertex AI, and Microsoft Foundry. The pricing is unchanged from Opus 4.6, which means Anthropic is delivering substantially better performance at the same cost. Gemini 3.1 Pro undercuts it at $2 and $12 per million tokens for input and output respectively, but Opus 4.7's lead on the benchmarks that enterprise buyers care about, particularly SWE-bench and agentic reasoning, may justify the premium for customers whose workloads demand the highest capability. 
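The pricing mechanics described above lend themselves to a quick back-of-envelope calculation. The sketch below is a rough cost estimator using only the figures quoted in this article ($5/$25 per million input/output tokens, up to 90% savings on cached input, a 50% Batch API discount); the function name and the way discounts compose are illustrative assumptions, not official billing rules.

```python
# Back-of-envelope cost estimator for the Opus 4.7 list pricing
# quoted above. Prices and discounts are the article's figures,
# not an official rate card; how discounts stack is an assumption.

INPUT_PER_MTOK = 5.00    # $ per million input tokens
OUTPUT_PER_MTOK = 25.00  # $ per million output tokens

def estimate_cost(input_tokens, output_tokens,
                  cached_fraction=0.0, batch=False):
    """Estimate a request's cost in dollars.

    cached_fraction: share of input tokens served from the prompt
    cache, modeled here as a 90% discount on those tokens.
    batch: apply the 50% Batch API discount to the whole request.
    """
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    input_cost = (fresh + cached * 0.10) * INPUT_PER_MTOK / 1_000_000
    output_cost = output_tokens * OUTPUT_PER_MTOK / 1_000_000
    total = input_cost + output_cost
    if batch:
        total *= 0.5
    return total
```

On these assumptions, a batched request with one million input and one million output tokens would land at $15 instead of $30, which is the kind of arithmetic behind the article's point that Gemini 3.1 Pro's lower list price may not decide the comparison on its own.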
Anthropic has also added cyber safeguards that automatically detect and block requests indicating prohibited or high-risk cybersecurity uses, a nod to the dual-use concerns that led the company to restrict its more powerful Mythos model to just 11 organisations under Project Glasswing. Opus 4.7 is not a paradigm shift. It is a meaningful improvement across every dimension that matters to the people who pay for Claude: better coding, better agentic reasoning, better vision, better instruction-following, and better resilience on long tasks. The model does not win every benchmark against every competitor, but it wins convincingly on the ones most directly tied to real-world productivity. For Anthropic, the release reinforces the position that has driven its extraordinary revenue growth. Claude is the model that developers and enterprises reach for when they need reliable, high-quality output on complex work. Opus 4.7 extends that lead at a moment when the company's commercial trajectory depends on it. The competition is close, and closing. But for now, on the tasks that generate the most revenue, Anthropic has the best model on the market.
[5]
Anthropic Releases Claude Opus 4.7 to Remind Everyone How Great Mythos Is
Anthropic announced Thursday the release of its latest AI model, Claude Opus 4.7, which the company is calling a "notable improvement" over Opus 4.6 but "less broadly capable" than the too-dangerous-to-be-released Claude Mythos Preview. Claude Opus 4.7 is something of a doubling down on what Anthropic's models are already good at. Per the company, the latest iteration of its flagship option comes with jumps in performance on coding, engineering, and multi-step tasks. The company claims it is "more thorough and consistent on difficult work, with better results across professional knowledge work." As with every new model release, this one comes with a fresh set of benchmarking tests to prove its prowess. Claude Opus 4.7 has retaken the top spot for agentic coding among publicly available models, scoring 64.3% on SWE-bench Pro and leading on SWE-bench Verified -- two of the main tests of a model's ability to handle complex engineering tasks. Claude Opus 4.7 also improved on 4.6's standard for agentic computer use (i.e., autonomously navigating across an operating system to complete tasks), and graduate-level reasoning, among other categories. Interestingly, Claude Opus 4.7 represents a slight backsliding compared to Claude Opus 4.6 in cybersecurity vulnerability reproduction. The new model scored 73.1% in benchmarking tests, compared to the previous iteration's 73.8%. Per Anthropic, the new model introduces "safeguards that automatically detect and block requests that indicate prohibited or high-risk cybersecurity uses," so perhaps that has slightly dampened the performance. It's hard to ignore the fact that the release of Claude Opus 4.7 reads as a promotion for Claude Mythos Preview, the company's model that is so powerful that Anthropic is currently only inviting specific organizations to use it. The benchmark table shows Mythos blowing away every other major model in just about every single test that it participated in. 
Anthropic can't help but compare everything to it, even at the expense of talking up its latest release. "We stated that we would keep Claude Mythos Preview's release limited and test new cyber safeguards on less capable models first. Opus 4.7 is the first such model: its cyber capabilities are not as advanced as those of Mythos Preview," the company wrote in the blog post for today's model update. At another point, the company describes Opus 4.7 as "less broadly capable than our most powerful model, Claude Mythos Preview." Per Anthropic, Claude Opus 4.7 will be available starting today across all Claude products and through the company's API, with no change in price compared to previous models. So check it out if you want to use the watered-down version of the product that Anthropic really wants you to be thinking about.
[6]
Anthropic reveals new Opus 4.7 model with focus on advanced software engineering - 9to5Mac
Anthropic has announced its latest AI model with Claude Opus 4.7. The new version arrives two months after the previous model upgrade, matching Anthropic's previous upgrade cadence. Claude Opus 4.7 is the latest generally available version of Anthropic's AI with a focus on advanced software development. Opus 4.7 is a notable improvement on Opus 4.6 in advanced software engineering, with particular gains on the most difficult tasks. Users report being able to hand off their hardest coding work -- the kind that previously needed close supervision -- to Opus 4.7 with confidence. Opus 4.7 handles complex, long-running tasks with rigor and consistency, pays precise attention to instructions, and devises ways to verify its own outputs before reporting back. Anthropic says its model has better vision and more taste for creating higher-quality work. The model also has substantially better vision: it can see images in greater resolution. It's more tasteful and creative when completing professional tasks, producing higher-quality interfaces, slides, and docs. The company shows favorable benchmarks across a range of uses, including agentic coding and computer use, that put Opus 4.7 ahead of 4.6, GPT-5.4, and Gemini 3.1 Pro, but behind the more broadly capable Claude Mythos Preview. However, Mythos isn't generally available like Opus 4.7 since Anthropic is only sharing it with key software platform vendors like Apple. You can see the benchmark comparison table in Anthropic's blog post here. Anthropic highlights instruction following, multimodal support, real-world work, and memory as other areas of improvement in Opus 4.7. "Opus 4.7 is better at using file system-based memory," the company says. "It remembers important notes across long, multi-session work, and uses them to move on to new tasks that, as a result, need less up-front context." Notably, Anthropic has established a more predictable cadence for directly upgrading its Claude Opus model. 
Opus 4.7 arrives two months after Opus 4.6, which arrived two months after Opus 4.5. There was a three-month gap between Opus 4.1 and Opus 4.5. Anthropic's announcement includes a note to users about how token usage is handled with Opus 4.7: Opus 4.7 is a direct upgrade to Opus 4.6, but two changes are worth planning for because they affect token usage. First, Opus 4.7 uses an updated tokenizer that improves how the model processes text. The tradeoff is that the same input can map to more tokens -- roughly 1.0-1.35× depending on the content type. Second, Opus 4.7 thinks more at higher effort levels, particularly on later turns in agentic settings. This improves its reliability on hard problems, but it does mean it produces more output tokens. The company has a separate post detailing migration, and the Claude Opus 4.7 System Card is available here. In addition to new models, Anthropic has been iterating on Claude Code, part of the Claude Mac app, in recent weeks:
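The token-usage note above lends itself to a quick planning exercise. The sketch below projects how a monthly input-token budget might grow under the quoted 1.0-1.35× tokenizer inflation; the workload mix and the per-content-type multipliers are made-up illustrative values, not measurements from any migration guide.

```python
# Rough projection of the tokenizer change described above
# (1.0-1.35x input-token inflation, varying by content type).
# The multiplier range comes from the announcement; the workload
# split and per-type values below are hypothetical examples.

def projected_input_tokens(monthly_tokens_by_type, multipliers):
    """monthly_tokens_by_type: {content_type: tokens under the old tokenizer}
    multipliers: {content_type: assumed new-tokenizer multiplier}
    Types missing from `multipliers` are assumed unchanged (1.0x)."""
    return sum(tokens * multipliers.get(ctype, 1.0)
               for ctype, tokens in monthly_tokens_by_type.items())

# Hypothetical monthly workload and assumed inflation per type
workload = {"prose": 40_000_000, "code": 60_000_000}
assumed = {"prose": 1.05, "code": 1.35}
print(projected_input_tokens(workload, assumed))
```

A team running this kind of projection against its own usage logs would get a rough upper bound on the billing impact before migrating prompts, which is presumably what Anthropic's "worth planning for" note is pointing at.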
[7]
I tested Anthropic's new Claude Opus 4.7 -- and it's the first AI that actually 'reasons' through tasks
7 stress tests to see if Claude Opus 4.7 can really code, research, and design autonomously

Anthropic just released Claude Opus 4.7, and after testing it, I can say that the difference I noticed immediately is how the model listens and responds to my questions. I put the latest update through its paces with a battery of tasks designed to trip it up: autonomous coding, self-verifying research and even some home design advice. What I found was a model that's moving away from being a mere chatbot and toward being a reliable digital architect. If you're used to AI taking 'creative liberties' with your requests, Opus 4.7 is about to be a major wake-up call. Here are seven prompts that highlight what the model can do now. 1. Autonomous engineering Prompt: "Build me a full task-tracking web app with drag-and-drop columns, filters, and local storage. Don't ask me clarifying questions, just ship it." Within a few minutes, Claude Opus 4.7 built a single-file HTML task board with drag-and-drop for both tasks (between and within columns) and column reordering, plus filtering by priority, due date, labels and live search. Everything persists to localStorage, and there's export/import/reset for moving data around. The model did exactly what I asked, creating a sandboxed Linux environment where it could write files and then share them with me via the outputs directory, which is how I got a downloadable .html file instead of just a code block in chat. 2. Self-verification Prompt: "Research which electric SUVs have the best real-world range in cold weather, then check your own answer and flag anything you're not sure about before you give it to me." The model took my two-part request seriously and separated what it knew from what it was guessing, then built the response around that split. It ran a web search for recent info and then fetched the full Recurrent study directly. 
After that, it organized the answer around one authoritative ranking (using Recurrent's actual ranked list and their specific findings). Lastly, it ran a self-check and split uncertainties into categories. 3. High-resolution vision Prompt: "Here's an image of my kitchen. Tell me everything you notice about the layout, lighting, and what's on the counters, then suggest three changes." The AI described the layout, lighting and surface contents of my breakfast nook based on what was visible in the photo, then offered three suggested changes. The suggestions focused on adding layered lighting, editing the pillow arrangement and restyling the tabletop centerpiece. Then, it closed by offering to go deeper or recommend specific products. Not sure if I will implement these changes, but it was interesting to see what Claude would do. 4. Creative 'taste' Prompt: "Write me a one-page cover letter for a senior product manager role at a climate-tech startup. I want it to sound like a human wrote it." This was a really interesting test, especially because I am not actually applying for a job. I told Claude to "make something up" about me and it pushed back several times (something ChatGPT rarely does). It wanted real specifics. Once I gave it some details, it drafted the letter using my actual words about climate and tech working together. It left bracketed placeholders for professional wins and included a candid paragraph about the corporate-to-startup trade-off to make it read as human-written. Seriously impressive! 5. Taste with autonomous engineering Prompt: "Design an app for my ready-to-eat cold pizza company 'Crusted.' Make it look like something a real design studio would ship, not a generic SaaS template." The model built a single-screen ordering app mockup for Crusted with an editorial/deli aesthetic. The app has a warm paper palette, stylish fonts and CSS-only pizza illustrations that vary by variety. 
After the model finished, it suggested that I do this type of project in Claude Cowork next time. 6. Vision and self-verification Prompt: "Take this messy PDF research paper and extract the key findings, cross-check the numbers in the charts against the text and flag any inconsistencies." On the first upload for this prompt, I purposely included a mistake to see if Claude would recognize it. Sure enough, it did immediately and even said, "I can't go any further. This appears to be the wrong document." Once I uploaded a whitepaper on Insomnia and Mental Health, it extracted the main statistical claims, flagged problems within the text and even said the images were wrong/didn't match the text well enough. Finally, it noted that the information should be cross-checked, which was a correct suggestion, as the PDF is several years old and just happened to be one of the files on my computer. 7. Decision making Prompt: "I'm trying to decide between three job offers. Ask me the questions that actually matter, then give me your real recommendation." This prompt is completely hypothetical, but I could see it being useful for someone with a tough decision. Claude immediately asked questions in a multiple-choice fashion. They can be answered or skipped (at which point Claude will ask a new question). Each question I answered led to Claude diving deeper into the decision process. For tricky life scenarios, this seems like it could be a good place to start to help lay all the pros and cons on the table. The takeaway After stress-testing these prompts, the conclusion is that Opus 4.7 is effectively the most sophisticated AI currently available to the public. It has crossed the threshold from a reactive tool to a genuine collaborator. It's obvious that there is a huge degree of 'thought' behind its outputs and a level of discernment evident whether it was pushing back on my fictional job details or identifying errors in a legacy PDF. 
We're paying a higher 'token tax' for this version, but given the autonomy and self-correction on display, it's an easy trade-off to make. Have you tried it yet? Let me know in the comments what you think of this new model.
[8]
Anthropic: Claude Opus 4.7 has a 92% honesty rate, fewer hallucinations
Anthropic released a new hybrid reasoning model on Thursday: Claude Opus 4.7. Anthropic has a reputation as a safety-first AI company, and the Opus 4.7 system card reports that the model is less likely to hallucinate or engage in sycophancy than both prior Anthropic models and other frontier AI models. We dove into the Opus 4.7 system card to see exactly what Anthropic had to say about the model's safety, honesty, and sycophancy. Why put the TL;DR version at the end? Anthropic says Claude Opus 4.7 makes improvements on various types of hallucinations and overall honesty. Anthropic also gave the new model top marks on sycophancy and encouragement of user delusions. (Anthropic also reports that Opus 4.7 scores much better on these behaviors than Gemini 3.1 Pro and Grok 4.20.) "Claude Opus 4.7 is more reliably honest than Opus 4.6 or Sonnet 4.6, with large reductions in the rate of important omissions, and moderate improvements in factuality and rates of hallucinated input," Anthropic reports. Anthropic measures Claude's honesty and hallucination rates in multiple ways, but let's look at one representative example -- the Model Alignment between Statements and Knowledge (MASK) benchmark. MASK was developed by Scale AI and the Center for AI Safety. Claude Opus 4.7 had a MASK honesty rate of 91.7 percent, compared to 90.3 percent for Opus 4.6 and 89.1 percent for Sonnet 4.6. While that's lower than the 95.4 percent score achieved by Claude Opus 4.5, the new model performs better on other hallucination scores (more on that below). Interestingly, Claude Mythos was more honest still, with an honesty rate of 95.4 percent. Since Anthropic repeatedly compares Opus 4.7 to Claude Mythos, let's quickly review the differences between the two models. 
Claude Opus 4.7 is the latest hybrid reasoning model available to paid Claude subscribers. Claude Mythos is an unreleased model that Anthropic has only made available to partners via Project Glasswing. Under normal circumstances, we would expect Claude Opus 4.7 to be Anthropic's most advanced and powerful model to date. However, Anthropic says it lags behind the unreleased Claude Mythos in key areas. Because of its advanced cybersecurity capabilities, Anthropic deemed Claude Mythos too dangerous to release to the public. Still, Claude Opus 4.7 improves upon Opus 4.6 in many ways, particularly advanced coding, visual intelligence, and document analysis, Anthropic says. When using Opus 4.7, how likely is Claude to tell a lie, invent facts, or deceive users? There isn't a single hallucination rate that Anthropic provides, because there are multiple types of hallucinations. So, this section is for the AI nerds. Anthropic identifies a few different ways to measure hallucination and honesty. We've already covered the MASK honesty rate, and Claude Opus 4.7 shows similar gains on the other measures, according to Anthropic. At this time, we cannot independently verify Anthropic's results. To measure factual hallucinations, Anthropic used four different tests and recorded correct responses, incorrect responses, and abstentions. In this case, abstentions are good -- the model should decline to answer a question rather than guessing. Across all four tests, Opus 4.7 scored higher than Opus 4.6 and Sonnet 4.6 but lower than Claude Mythos. Anthropic measured Opus 4.7's input hallucination in two ways: "prompts requesting an unavailable tool" and "prompts referencing missing context." Opus 4.7 scored 89.5 percent on the former, beating Claude Mythos's 84.8 percent; on the latter, Opus 4.7 scored 91.8 percent, two points lower than Claude Mythos's 93.8 percent. 
This shows just how stubborn AI hallucinations are, with even leading AI companies like Anthropic recording input-hallucination scores of only around 90 percent (higher is better on these tests). Anthropic's reported hallucination rates are similar to the latest OpenAI models, which provide responses with incorrect information up to 5.8 percent of the time (with browsing enabled) to 10.9 percent (browsing disabled), per OpenAI. What about Opus 4.7's honesty rate for false premises, i.e., will Claude tell a user they're wrong? According to the system card, Claude will push back on false premises 77.2 percent of the time. That's better than all other recent Anthropic models except for -- you guessed it -- Claude Mythos, which will reject false premises 80 percent of the time. There's not much new to report in terms of sycophancy. While Anthropic's expert red-team testers reported that Opus 4.7 was prone to "sycophantic agreement under pushback," it has very similar scores to prior models from Anthropic and OpenAI, and noticeably better scores than Gemini 3.1 Pro and Grok 4.20. Again, this is according to Anthropic. To measure bad behaviors like sycophancy and "encouragement of user delusion," Anthropic uses Petri 2.0, its open-source behavioral audit tool. This test scores models on a 1-10 scale, with lower scores reflecting better behavior. The Petri score isn't akin to a percentage, as it measures both the rate of a behavior and the severity. Anthropic scored Opus 4.7 well (that is, low on this particular scale) on both sycophancy and user delusions. Mashable reached out to Anthropic for comment but did not receive a response in time for publication.
[9]
Anthropic releases Claude Opus 4.7, narrowly retaking lead for most powerful generally available LLM
Anthropic is publicly releasing its most powerful large language model yet, Claude Opus 4.7, today -- as it continues to keep an even more powerful successor, Mythos, restricted to a small number of external enterprise partners, who are using it for cybersecurity testing and for patching vulnerabilities -- which Mythos exposed rapidly -- in the software those enterprises use. The big headlines are that Opus 4.7 exceeds its most direct rivals -- OpenAI's GPT-5.4, released in early March 2026, scarcely more than a month ago; and Google's latest flagship model Gemini 3.1 Pro from February -- on key benchmarks including agentic coding, scaled tool-use, agentic computer use, and financial analysis. But also, it's notable how tight the race is getting: on directly comparable benchmarks, Opus 4.7 leads GPT-5.4 by a count of just 7 to 4. It currently leads the market on the GDPVal-AA knowledge work evaluation with an Elo score of 1753, surpassing both GPT-5.4 (1674) and Gemini 3.1 Pro (1314). Yet, the model does not represent a "clean sweep" across all categories. Competitors like GPT-5.4 and Gemini 3.1 Pro still hold the lead in specific domains such as agentic search, where GPT-5.4 scores 89.3% compared to Opus 4.7's 79.3%, as well as in multilingual Q&A and raw terminal-based coding. This positioning defines Opus 4.7 not as a unilateral victor in all AI tasks, but as a specialized powerhouse optimized for the reliability and long-horizon autonomy required by the burgeoning agentic economy. Claude Opus 4.7 is available today across all major cloud platforms, including Amazon Bedrock, Google Cloud's Vertex AI, and Microsoft Foundry, with API pricing held steady at $5/$25 per million tokens. Claude Opus 4.7 is a direct evolution of the Opus 4.6 architecture, but its performance delta is most visible in the "hard" sciences of agentic workflows: software engineering and complex document reasoning. At its core, the model has been re-tuned to exhibit what Anthropic describes as "rigor". 
This isn't just marketing parlance; it refers to the model's new ability to devise its own verification steps before reporting a task as complete. For example, in internal tests, the model was observed building a Rust-based text-to-speech engine from scratch and then independently feeding its own generated audio through a separate speech recognizer to verify the output against a Python reference. This level of autonomous self-correction is designed to reduce the "hallucination loops" that often plague earlier iterations of agentic software. The most significant architectural upgrade is the move to high-resolution multimodal support. Opus 4.7 can now process images up to 2,576 pixels on their longest edge -- roughly 3.75 megapixels. This represents a three-fold increase in resolution compared to previous iterations. For developers building "computer-use" agents that must navigate dense, high-DPI interfaces, or for analysts extracting data from intricate technical diagrams, this change effectively removes the "blurry vision" ceiling that previously limited autonomous navigation. This visual acuity is reflected in benchmarks from XBOW, where the model jumped from a 54.5% success rate in visual-acuity tests to 98.5%. On the benchmark front, Opus 4.7 has claimed the top spot in several critical categories. Crucially, Anthropic warns that this increased precision requires a shift in how users approach prompting. Opus 4.7 follows instructions literally. While older models might "read between the lines" and interpret ambiguous prompts loosely, Opus 4.7 executes the exact text provided. This means that legacy prompt libraries may require re-tuning to avoid unexpected results caused by the model's strict adherence to the letter of the request. The "agentic" nature of Opus 4.7 -- its tendency to pause, plan, and verify -- comes with a trade-off in token consumption and latency. To address this, Anthropic is introducing a new "effort" parameter.
Users can now select an xhigh (extra high) effort level, positioned between high and max, allowing for more granular control over the depth of reasoning the model applies to a specific problem. Internal data shows that while max effort yields the highest scores (approaching 75% on coding tasks), the xhigh setting provides a compelling sweet spot between performance and token expenditure. To manage the costs associated with these more "thoughtful" runs, the Claude API is introducing "task budgets" in public beta. This allows developers to set a hard ceiling on token spend for autonomous agents, ensuring that a long-running debugging session doesn't result in an unexpected bill. These product changes signal a maturing market where AI is no longer a novelty but a production line item that requires fiscal and operational guardrails. Furthermore, Opus 4.7 utilizes an updated tokenizer that improves text processing efficiency, though it can increase the token count of certain inputs by 1.0-1.35x. Within the Claude Code environment, the update brings a new /ultrareview command. Unlike standard code reviews that look for syntax errors, /ultrareview is designed to simulate a senior human reviewer, flagging subtle design flaws and logic gaps. Additionally, "auto mode" -- a setting where Claude can make autonomous decisions without constant permission prompts -- has been extended to Max plan users. Anthropic continues to walk a narrow line regarding cybersecurity. The recent announcement of the aforementioned cybersecurity partnership around Mythos with external industry partners -- known as "Project Glasswing" -- highlighted the dual-use risks of high-capability models. Consequently, while the flagship Mythos Preview model remains restricted, Opus 4.7 serves as the testbed for new automated safeguards. The model includes systems designed to detect and block requests that suggest high-risk cyberattacks, such as automated vulnerability exploitation.
To bridge the gap for the security industry, Anthropic is launching the Cyber Verification Program. This allows legitimate professionals -- vulnerability researchers, penetration testers, and red-teamers -- to apply for access to use Opus 4.7's capabilities for defensive purposes. This "verified user" model suggests a future where the most capable AI features are not universally available, but gated behind professional credentials and compliance frameworks. In cybersecurity vulnerability reproduction (CyberGym), Opus 4.7 maintains a 73.1% success rate, trailing Mythos Preview's 83.1% but leading GPT-5.4's 66.3%. Early testimonials from enterprise customers shared by Anthropic indicate a tangible shift in how teams perceive the model from 4.6 to 4.7, going from "impressed by the tech" to "relying on the output". Clarence Huang, VP of Technology at Intuit, noted that the model's ability to "catch its own logical faults during the planning phase" is a game-changer for velocity. This sentiment was echoed by Replit President Michele Catasta, who stated that the model achieved higher quality at a lower cost for tasks like log analysis and bug hunting, adding, "It really feels like a better coworker". Perhaps the most telling reaction came from Aj Orbach, CEO of a dashboard-building firm, who remarked on the model's "design taste," noting that its choices for data-rich interfaces were of a quality he would "actually ship". For enterprise leaders, Claude Opus 4.7 represents a shift from generative AI as a "creative assistant" to a "reliable operative." But importantly, it is not a "clean win" for every use case. Instead, it is a decisive upgrade for teams building autonomous agents or complex software systems. The primary value proposition is the model's new capability for self-verification and rigor; it no longer just generates an answer but creates internal tests to verify that the answer is correct before responding.
This reliability makes it a superior choice for long-horizon engineering tasks where the cost of human supervision is the primary bottleneck. However, an immediate, wholesale migration from Opus 4.6 requires caution. The model's increased literalism in instruction following means that prompts engineered to be "loose" or conversational with previous versions may now produce unexpected or overly rigid results. Furthermore, enterprises must prepare for a significant increase in operational costs. Opus 4.7 uses an updated tokenizer that can increase input token counts by 1.0-1.35x, and its tendency to "think harder" at high effort levels results in higher output token consumption. For legacy applications where prompts are fragile and margins are thin, a phased rollout with significant re-tuning is recommended. This release arrives at a paradoxical moment for Anthropic. Financially, the company is an undisputed juggernaut, with venture capital firms reportedly extending investment offers at a staggering $800 billion valuation -- more than double its $380 billion Series G valuation from February 2026. This momentum is fueled by explosive growth, with the company's annual run-rate revenue skyrocketing to $30 billion in April 2026, driven largely by enterprise adoption and the success of Claude Code. Yet, this commercial success is being contested by intense regulatory and technical friction. Anthropic is currently embroiled in a high-stakes legal battle with the U.S. Department of War (DoW), which recently labeled the company a "supply chain risk" after Anthropic refused to allow its models to be used for mass surveillance or fully autonomous lethal weapons. While a San Francisco judge initially blocked the designation, a federal appeals panel recently denied Anthropic's bid to stay the blacklisting, leaving the company excluded from lucrative defense contracts during an active military conflict. 
Simultaneously, Anthropic is fending off a growing rebellion from its most loyal power users. Despite the company's "market leader" status, developers have flooded GitHub and X with accusations of "AI shrinkflation," claiming that the preceding Opus 4.6 model and Claude Code product have been quietly degraded. Users report that recent versions are more prone to exploration loops, memory loss, and ignored instructions, leading some to describe the newly released Claude Code desktop app as "unpolished" and unbefitting a firm with a near-trillion-dollar valuation. Opus 4.7 is Anthropic's attempt to silence these critics by proving that "deep thinking" can be paired with the rigorous execution that its enterprise clients now demand. Ultimately, Opus 4.7 is a model defined by its discipline. In a market where models are often incentivized to be "helpful" to a fault -- sometimes hallucinating answers to please the user -- Opus 4.7 marks a return to rigor. By allowing users to control effort, set budgets, and verify outputs, Anthropic is moving closer to the goal of a truly autonomous digital labor force. For the engineering teams at Replit, Notion, and beyond, the shift from "watching the AI work" to "managing the AI's results" has officially begun.
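The GDPVal-AA Elo figures reported for the three models (Opus 4.7 at 1753, GPT-5.4 at 1674, Gemini 3.1 Pro at 1314) can be put in perspective with the standard Elo expected-score formula. This is a generic illustration of how Elo gaps translate into head-to-head preference rates, not Anthropic's published methodology:

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# Elo scores reported for the GDPVal-AA knowledge-work evaluation
opus_47, gpt_54, gemini_31 = 1753, 1674, 1314

print(round(elo_expected_score(opus_47, gpt_54), 3))    # ~0.61: a modest edge
print(round(elo_expected_score(opus_47, gemini_31), 3)) # a much larger gap
```

The 79-point lead over GPT-5.4 implies roughly a 61% head-to-head preference rate, which matches the article's framing of a tight race, while the 439-point gap to Gemini 3.1 Pro implies a far more lopsided comparison.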
[10]
Anthropic releases Claude Opus 4.7, its most capable public model
Anthropic released Claude Opus 4.7, its most capable generally available AI model, while acknowledging the model is "less broadly capable" than Claude Mythos Preview, a more powerful system the company has declined to release publicly. According to Anthropic, the release marks an advancement over Claude Opus 4.6 in several areas, among them software engineering, adherence to instructions, and the ability to carry out practical tasks. It also supports higher-resolution images -- up to 2,576 pixels on the long edge, more than three times the limit of prior Claude models -- and includes a new "xhigh" effort level that gives users finer control over the tradeoff between reasoning depth and response speed. During training, Anthropic took deliberate steps to pull back on what the model can do in cybersecurity contexts, a process the company characterized as working to "differentially reduce" those capabilities relative to Mythos Preview. Built into the release are automated protections intended to intercept queries that fall into prohibited or elevated-risk cybersecurity categories before they can be acted on. Researchers and practitioners who want to apply the model to sanctioned security work have been pointed toward a dedicated application pathway, the Cyber Verification Program. The release connects directly to Project Glasswing, Anthropic's cybersecurity initiative announced earlier this month, which brought in partners including AWS, Apple $AAPL, Microsoft $MSFT, Google $GOOGL, and Cisco $CSCO to test Claude Mythos Preview. That model, described in internal materials as "by far the most powerful AI model we've ever developed," has been made available only to a select group of companies. Anthropic has said its goal is to use what it learns from deploying less capable models -- such as Opus 4.7 -- to work toward a broader release of Mythos-class systems. 
The existence of Mythos first became known after draft materials describing the model were left in a publicly accessible data store on Anthropic's website. Those documents described it as more advanced in cybersecurity tasks than any competing AI model and warned it could allow attacks to scale faster than defenders could respond. Pricing for the new model holds at $5 per million input tokens and $25 per million output tokens, matching what Anthropic charged for Opus 4.6. Access is available through the company's consumer-facing Claude products, its API, and a range of third-party cloud infrastructure including Amazon $AMZN Bedrock, Google Cloud's Vertex AI, and Microsoft Foundry. Anthropic also noted that Mythos Preview remains the best-aligned model the company has trained according to its own evaluations, and that Opus 4.7's alignment assessment concluded the model is "largely well-aligned and trustworthy, though not fully ideal in its behavior." Two additional developer features launched alongside the model: task budgets in public beta, which allow developers to guide how Claude allocates token spend across longer runs, and a new "/ultrareview" command in Claude Code that produces a dedicated review session to flag bugs and design issues.
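Since pricing holds at $5 per million input tokens and $25 per million output tokens, budgeting a workload against the API is simple arithmetic. A minimal sketch; the rates are from the announcement, while the usage numbers in the example are hypothetical:

```python
INPUT_PRICE_PER_MTOK = 5.00    # USD per million input tokens (Opus 4.6 and 4.7)
OUTPUT_PRICE_PER_MTOK = 25.00  # USD per million output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single call at the published Opus 4.7 rates."""
    return (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000

# Hypothetical agentic session: 200k tokens of context in, 50k tokens out
print(f"${request_cost(200_000, 50_000):.2f}")  # $2.25
```

Because output tokens cost five times as much as input tokens at these rates, the "thinks more at higher effort levels" caveat in the release notes matters more for cost than the input side does.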
[11]
Anthropic Preps Opus 4.7 and Full-Stack AI Studio -- While Sitting on Something Much Scarier - Decrypt
The industry still can't reliably measure AI improvements, making claims about Opus 4.7's gains hard to verify. Anthropic is gearing up to release Claude Opus 4.7 alongside a new AI-powered design tool that lets users build websites, presentations, and landing pages with plain English prompts -- news that caused a dip in Adobe, Wix, and Figma shares on Monday, according to The Information. The products could drop as soon as this week, a person with knowledge of the plans told The Information. The design tool targets developers and non-technical users alike, putting it on a collision course with startups like Gamma and Google's Stitch. Anthropic did not respond to Decrypt's request for comment. Opus 4.7 isn't even Anthropic's most powerful model. That title belongs to Claude Mythos -- a cybersecurity-focused beast the company is quietly handing to select security firms while keeping it away from the public. The UK's AI Security Institute recently evaluated Mythos Preview and found it can autonomously execute sophisticated cyber attacks at rates no other model has matched. It became the first AI to complete "The Last Ones," a 32-step corporate network attack simulation that typically takes human red teams 20 hours. Mythos nailed it in three out of ten attempts, averaging 22 of 32 steps -- compared to Opus 4.6's 16. This matters beyond enterprise security. Measuring what AI can actually do has become an industry-wide headache. OpenAI recently called the leading coding benchmark "contaminated," yet models continue to be compared using those same tests. A separate ARC-AGI-3 evaluation saw Gemini score 0.37% and GPT-5.4 hit 0.26% -- while humans got 100%. The result is a landscape where benchmarks are both contested and still used as evidence, making it difficult to contextualize claims about Opus 4.7's gains until Anthropic releases a detailed model card. The relationship between Opus and Mythos is closer than most realize. 
Anthropic builds its frontier models by fine-tuning atop the Opus line -- the same backbone powering public Claude products gets stress-tested and hardened into Mythos. Opus 4.7 is the foundation that eventually gets the cybersecurity kung fu beaten into it. Anthropic has also been steering its efforts more toward the developer and enterprise use case. The leak of Claude Code, the release of the skills system and the MCP protocol, the focus on agentic AI, and the attention paid to coding benchmarks all make this even more apparent. While Anthropic hasn't formally announced it, the leaks reinforce the broader shift from LLM provider to something that resembles a full-stack "AI studio" model, where Claude doesn't just generate text but builds and deploys complete products.
[12]
Anthropic just released a 'Civilian' version of its 'Mythos' AI that's too dangerous for the public
Opus 4.7 was purposely 'nerfed' to meet Project Glasswing safety standards
Today, Anthropic officially released Claude Opus 4.7, the most powerful AI model available to the general public. On paper, it is promised to be a beast: a notable leap in advanced software engineering, substantially better vision capabilities for analysis, and a new "self-verification" mode that allows it to audit its own work before it reports back to the user. But there is a shadow hanging over this launch. For the first time in the history of frontier AI, a company has admitted to purposely making a model dumber in order to protect the world from it. Let me explain.
Opus 4.7 is the 'civilian-safe' version of the Mythos model
To truly get why the release of Opus 4.7 is such a milestone, you first have to understand the implications of Anthropic's Claude Mythos Preview. I'm mentioning it alongside today's launch mainly because Mythos remains the company's most powerful model. However, its release is strictly limited to cyber defenders and critical infrastructure partners. While Opus 4.7 is a "notable improvement" over previous versions, it is fundamentally a secondary tier. In the release notes for Opus 4.7, Anthropic dropped a bombshell, stating that during training the team experimented with efforts to "differentially reduce" the model's cyber-offensive capabilities. For you and me, that means the company intentionally nerfed the model's ability to be used as a digital weapon.
Project Glasswing and the first real-world test
Opus 4.7 serves as the first live guinea pig for Project Glasswing, the security initiative Anthropic unveiled last week. This framework introduces automated safeguards that detect and block prohibited or high-risk cybersecurity requests in real time. For the average developer, this means a more helpful assistant. For the security community, it means a gatekeeper. If you are a professional researcher, you can no longer access these features anonymously.
You must now apply for Anthropic's new Cyber Verification Program. That move effectively puts "Frontier AI" behind a background check.
Opus 4.7 upgrades
Even with its wings clipped in cybersecurity, Opus 4.7 is promised to be a massive upgrade for professional workflows. If you aren't trying to hack a mainframe, here is what you're getting:
* Autonomous engineering: This new model makes it easier than ever to hand off your hardest coding work. Anthropic promises tasks that previously required "close supervision" can now be done with total confidence.
* Self-verification: Opus 4.7 no longer just "guesses." It devises ways to verify its own outputs, running internal logical checks before reporting back. This is huge for hallucination reduction and fact-checking.
* High-resolution vision: While image generation is still not part of Claude's features, the model can now see images in significantly greater resolution. This breakthrough could be useful for parsing complex technical diagrams, UI/UX mockups, and even professional slides for your next presentation.
* Creative "taste": Anthropic claims the model is more "tasteful" when generating professional documents, producing higher-quality interfaces and docs that feel less "AI-generated" and more human-refined. This is something I'm still eager to play around with, as "taste" has long been considered one of the hardest human qualities to replicate.
The takeaway
Claude Opus 4.7 is a "safe" powerhouse with pricing unchanged from Opus 4.6: $5/M input tokens, $25/M output tokens. It promises to deliver a massive 3x increase in production task completion and nearly perfect vision accuracy (98.5%), all at the same price as its predecessor. However, I'm cautiously optimistic, because the real story here is that it's the "civilian" version of Anthropic's secret Mythos model, purposefully limited in its hacking abilities to test a new era of gated, identity-verified AI.
We've entered a new era of AI and I'll be watching (and reporting) closely. Have you tried it yet? Let me know in the comments what you think.
[13]
Anthropic releases Claude Opus 4.7: How to try it, benchmarks, safety
Anthropic has been shipping products and making news at a blistering pace in 2026, and on Thursday, the AI company announced the launch of Claude Opus 4.7. Claude Opus 4.7 is Anthropic's most intelligent model available to the general public. Notably, Anthropic said in a press release that Opus 4.7 is not as powerful as Claude Mythos, which Anthropic deemed too dangerous for public release. Claude Opus is a family of hybrid reasoning models capable of multi-step reasoning and advanced coding. Until the announcement of Claude Mythos on April 7, Claude Opus was considered Anthropic's most advanced series of AI models. Claude Opus 4.7 is available now via Claude AI, the Claude API, and Anthropic partners such as Microsoft Foundry. The new model is priced the same as Claude Opus 4.6. However, Anthropic noted that because "Opus 4.7 thinks more at higher effort levels," it uses more output tokens than its predecessor. Users can read more about how to optimize token usage in the Opus 4.7 migration guide. As expected, Claude Opus 4.7 offers improved capabilities across the board. In particular, Anthropic says Claude Opus 4.7 is better at advanced coding tasks, visual intelligence, and document analysis. Anthropic also says Opus 4.7 is "more tasteful and creative when completing professional tasks, producing higher-quality interfaces, slides, and docs." "Users report being able to hand off their hardest coding work -- the kind that previously needed close supervision -- to Opus 4.7 with confidence. Opus 4.7 handles complex, long-running tasks with rigor and consistency, pays precise attention to instructions, and devises ways to verify its own outputs before reporting back," reads an Anthropic blog post. Anthropic released a detailed model card outlining how Claude Opus 4.7 compares to other Anthropic models and frontier models from OpenAI, Google, and xAI.
Opus 4.7 lags behind the unreleased Claude Mythos, which Anthropic reports scores significantly higher on common benchmarks such as Humanity's Last Exam. "Claude Opus 4.7 is less capable than Claude Mythos Preview on every relevant axis we measured and does not advance our capability frontier," the model card states. That means Claude Opus 4.7 is not evidence that AI development has accelerated beyond existing trend lines. On Humanity's Last Exam (without tools), Anthropic reports that Claude Opus 4.7 outperforms all other frontier models except Claude Mythos. With tools, GPT-5.4 Pro scored 58.7 percent compared to Opus 4.7's 54.7 percent. Mythos beat them both with 64.7 percent. Mashable has not independently verified these benchmark results. Full results are available in the Opus 4.7 model card. Overall, Anthropic scored Opus 4.7 above other leading models in some benchmarks, though Gemini 3.1 Pro and GPT-5.4 score higher in some areas. Anthropic also reports that Opus 4.7 shows a low risk of misaligned behaviors, with a similar risk profile to Opus 4.6. For example, Anthropic says Opus 4.7 is less likely to hallucinate and shows lower rates of reward hacking. "Claude Opus 4.7 is more reliably honest than Opus 4.6 or Sonnet 4.6, with large reductions in the rate of important omissions, and moderate improvements in factuality and rates of hallucinated input," the model card states.
[14]
Anthropic's Claude Opus 4.7 is finally here
Anthropic announced the release of its latest AI model, Claude Opus 4.7, which the company describes as a notable improvement over Opus 4.6 but less capable than the unreleased Claude Mythos Preview. The launch of Claude Opus 4.7 emphasizes enhancements in performance for tasks related to coding, engineering, and multi-step processes. The model reportedly shows improvements in thoroughness and consistency, particularly in challenging professional knowledge applications. Benchmarking tests confirm that Claude Opus 4.7 has regained the top position for agentic coding, achieving a score of 64.3% on SWE-bench Pro, a key metric assessing a model's ability to manage complex engineering tasks. The new version also surpassed its predecessor, Opus 4.6, in agentic computer use and graduate-level reasoning metrics. However, the model recorded a minor decline in cybersecurity vulnerability reproduction, scoring 73.1%, down from 73.8% for Opus 4.6. Anthropic attributed this decrease to new safety measures that block requests associated with high-risk cybersecurity scenarios. The release of Claude Opus 4.7 also appears to support the promotion of the more powerful Claude Mythos Preview. This upcoming model has demonstrated exceptional performance in tests, significantly outperforming others. "We stated that we would keep Claude Mythos Preview's release limited and test new cyber safeguards on less capable models first," Anthropic noted. They added that Opus 4.7 is the first model to have these new safeguards implemented. Claude Opus 4.7 is now available across all Claude products and through the company's API, with pricing unchanged from previous versions.
[15]
Anthropic's Claude Opus 4.7 Is Here, and It's Already Outperforming Gemini 3.1 Pro and GPT 5
This comes shortly after Anthropic launched Claude Opus 4.6 in February. And the model is "less broadly capable" than its most recent offering, Claude Mythos Preview. But at this time Anthropic has no plans to release Claude Mythos Preview to the general public. It says the effort is aimed at understanding how models of that caliber could eventually be deployed at scale.
How Does Opus 4.7 Compare?
According to The Next Web, the most striking gains are in software engineering: on SWE-bench Pro, an AI evaluation benchmark, Opus 4.7 scored 64.3 percent -- up from 53.4 percent on Opus 4.6 and ahead of both GPT-5.4 at 57.7 percent and Gemini 3.1 Pro at 54.2 percent.
Opus 4.7 Token Usage
Users upgrading from Opus 4.6 should note two changes that affect token usage. An updated tokenizer improves how the model processes text but can increase token counts by roughly 1.0 to 1.35 times depending on content type. The model also thinks more deeply at higher effort levels, particularly in later turns of agentic tasks, which boosts reliability on complex problems but produces more output tokens.
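The tokenizer change described above means the same prompt text can bill as up to 1.35x more input tokens after migration. A rough sketch of the resulting input-cost range; the 1.0-1.35x multiplier and the $5-per-million input rate come from the coverage, while the example workload is hypothetical:

```python
INPUT_PRICE_PER_MTOK = 5.00  # USD per million input tokens

def input_cost_range(old_token_count: int,
                     low_mult: float = 1.0,
                     high_mult: float = 1.35) -> tuple[float, float]:
    """Best- and worst-case input cost after the reported tokenizer change."""
    best = old_token_count * low_mult * INPUT_PRICE_PER_MTOK / 1_000_000
    worst = old_token_count * high_mult * INPUT_PRICE_PER_MTOK / 1_000_000
    return best, worst

# A prompt that measured 1M tokens under the old tokenizer
best, worst = input_cost_range(1_000_000)
print(f"${best:.2f} to ${worst:.2f}")  # $5.00 to $6.75
```

Note this covers only the input side; the second effect (more output tokens at higher effort levels) compounds on top of it and is harder to bound in advance.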
[16]
Anthropic's New Claude Opus 4.7 Model Is Still Less Capable Than Claude Mythos
This was done intentionally to make the model safer for public release
Anthropic, on Thursday, released another major update to its Opus model, dubbed Claude Opus 4.7. The new artificial intelligence (AI) model comes just days after the San Francisco-based startup released Claude Mythos, a model so capable at cybersecurity tasks that the company has limited its access. Opus 4.7 is built using the same architecture but has been intentionally kept less advanced to ensure that it cannot be used to carry out cyberattacks. Compared to the older Opus model, the latest iteration also brings improvements across coding and vision-related tasks.
Anthropic Releases Claude Opus 4.7
In a newsroom post, Anthropic announced that Claude Opus 4.7 is now generally available. It can be accessed across all Claude products as well as via the application programming interface. Additionally, third-party enterprise platforms, including Amazon Bedrock, Google Cloud's Vertex AI, and Microsoft Foundry, will also host it. The company has kept the pricing the same as Opus 4.6, with input tokens priced at $5 per million and output tokens at $25 per million. One of the key highlights of the latest model is the improved multimodal support. The large language model (LLM) can better analyse high-resolution images, with support for images up to 2,576 pixels on the long edge (approximately 3.75 megapixels), a 3x improvement compared to the older version. Anthropic says this will let Opus 4.7 process dense visual information from charts, screenshots, and PDFs. Another area of improvement is software engineering, or coding. The company claims that the model has made significant improvements on difficult tasks that previously required close supervision. Opus 4.7 is said to handle complex and long-running tasks with consistency and can verify its own output before notifying the user.
In terms of internal benchmark evaluations, Anthropic claimed that the model performed better than OpenAI's GPT-5.4 and Google's Gemini 3.1 Pro. However, the scores reveal that Opus 4.7 is still less capable than the Claude Mythos Preview, which is currently only available to the 40 organisations affiliated with Project Glasswing.
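The 2,576-pixel long-edge limit reported above implies a simple pre-check when feeding screenshots or charts to the model. Below is a hypothetical helper: the limit is from the article, but the proportional-downscale policy is an illustrative assumption, not documented Anthropic behavior:

```python
MAX_LONG_EDGE = 2576  # reported Opus 4.7 long-edge limit, in pixels

def fit_to_limit(width: int, height: int) -> tuple[int, int]:
    """Proportionally downscale so the longest edge is at most MAX_LONG_EDGE."""
    long_edge = max(width, height)
    if long_edge <= MAX_LONG_EDGE:
        return width, height  # already within the limit
    scale = MAX_LONG_EDGE / long_edge
    return round(width * scale), round(height * scale)

# A double-width screenshot gets halved; a 1080p chart passes through untouched
print(fit_to_limit(5152, 2912))  # (2576, 1456)
print(fit_to_limit(1920, 1080))  # (1920, 1080)
```

As a sanity check on the article's arithmetic, a 2576x1456 image is 3,750,656 pixels, which matches the "approximately 3.75MP" figure.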
[17]
Anthropic to launch Claude Opus 4.7 this week
Anthropic is set to launch Claude Opus 4.7, its next flagship AI model, alongside a new tool for designing websites and presentations, The Information reports. Both products could launch as soon as this week, according to a person familiar with the plans. The release of Opus 4.7 represents an incremental upgrade to Anthropic's Claude lineup, building on Claude Opus 4.6, which launched in February. This prior model introduced enhancements for coding and task execution, and featured a one-million-token context window. Internal references to Opus 4.7 have emerged in recent weeks, indicating an imminent release. Claude Opus 4.7 is distinct from Claude Mythos, a more powerful AI model that Anthropic has withheld from the public due to cybersecurity concerns. Anthropic operates a dual-track strategy, with Opus 4.7 serving as the commercial alternative while Mythos is being controlled through Project Glasswing, expected to be formally unveiled in May in San Francisco. The new AI design tool is designed for creating websites and presentations, and news of its development has impacted shares of design software companies. Specifically, Figma and Wix experienced declines on Monday following the report. This move intensifies competition within the design tool market at a time when the S&P 500 Software and Services Index has fallen nearly 26% this year. Concerns hover over the potential for AI tools to decrease demand for traditional software products. Previous launches from Anthropic, such as the Claude Cowork assistant and related automation plugins, spurred considerable selloffs in software stocks earlier this year. Anthropic's foray into design tools marks an expansion into visual and creative workflows. The company has partnered with Figma to convert AI-generated code into editable design files and integrated Claude into Microsoft Word and PowerPoint. Since January 2026, Anthropic has released major updates approximately every two weeks, including new models and enhancements.
The week of April 14 is being highlighted as potentially one of the busiest in AI history, with announcements expected from OpenAI and from Meta, which is hosting its LlamaCon event.
[18]
Claude Opus 4.7 hits 92% honesty rate -- are we closer than ever to human-like AI with less hallucination? Here's what Anthropic's new AI model is capable of
Claude Opus 4.7 benchmarks explained start with a strong data point: 87.6% on SWE-bench Verified. This jump signals real coding gains in 2026. Developers now see better issue resolution and faster workflows. The model also posts 64.3% on SWE-bench Pro, beating GPT-5.4 and Gemini 3.1 Pro. Tool use leads at 77.3% on MCP-Atlas. Computer use reaches 78.0%. However, BrowseComp drops to 79.3%, which means weaker research performance. Overall, the benchmarks show a focused upgrade for coding, automation, and real-world AI agents.
[19]
Anthropic Releases Claude Opus 4.7, Introducing Mythos-Inspired Cybersecurity Protections - Apple (NASDAQ:AAPL)
Anthropic has released its latest AI model, Claude Opus 4.7, which will test new cyber capabilities "not as advanced" as those of Mythos Preview. "We stated that we would keep Claude Mythos Preview's release limited and test new cyber safeguards on less capable models first," a company press release stated. Last week, Anthropic announced the creation of Project Glasswing, a security-focused collaboration that includes big-name companies spanning both finance and tech. The group plans to use the unreleased Anthropic model, Claude Mythos Preview, to hunt and fix software flaws in an effort to "reshape" cybersecurity.
A New Cyber Front Opens
That initiative, although limited to approximately 40 companies, has drawn scrutiny from regulators over potential cybersecurity concerns. Many of the largest U.S. banks still run core systems on legacy code dating back decades. If Mythos can surface flaws that every existing security tool missed, banks might be one of the more vulnerable sectors. The updated Claude Opus 4.7 will be released with safeguards to automatically detect and block requests that indicate prohibited or high-risk cybersecurity uses. "What we learn from the real-world deployment of these safeguards will help us work towards our eventual goal of a broad release of Mythos-class models," Anthropic stated. The updated model will also allow users to see images in higher resolution, and will assist with the "most difficult tasks." Although it's less broadly capable than Mythos, it provides better results than Opus 4.6, the company pointed out. Early testing of the updated model suggests improved instruction-following capabilities, along with stronger performance in a finance analyst role. The release also notes that Opus 4.7 will include a file-system-based memory, enabling it to retain prior work and operate with less upfront context.
Photo: Shutterstock Market News and Data brought to you by Benzinga APIs To add Benzinga News as your preferred source on Google, click here.
[20]
Internet roasts Claude Opus 4.7 as it flunks viral car wash puzzle in bizarre blunder
Claude Opus 4.7 fails car wash puzzle: Anthropic launched Claude Opus 4.7, its most powerful generally available model for advanced software engineering. Despite improvements, a viral "car wash" puzzle highlighted a reasoning gap, sparking debate about AI's common sense. This incident contrasts with Mythos Preview, Anthropic's more advanced, yet restricted, cybersecurity model. Claude Opus 4.7 fails car wash puzzle: Anthropic has released Claude Opus 4.7, its most powerful "generally available" model to date, positioning it as a step up from Opus 4.6 for advanced software engineering tasks, particularly complex coding work that previously required more hand-holding. The company also says it improves image analysis, instruction following, and can display more "creativity" when generating slides and documents. The launch comes alongside Mythos Preview, Anthropic's cybersecurity-focused model announced earlier this month and described by the company as its most powerful overall. However, Opus 4.7 does not advance the company's "capability frontier," as Mythos Preview reportedly achieved higher results across all relevant evaluations in Anthropic's system card. Mythos Preview is currently limited to select partners including Nvidia, JPMorgan Chase, Google, Apple, and Microsoft for security reasons. Anthropic says Opus 4.7 is being used to test cybersecurity safeguards on less capable models before a broader release of Mythos-class systems, and it is also being tested with early customers including Intuit, Harvey, Replit, Cursor, Notion, Shopify, Vercel, and Databricks. Pricing remains unchanged from Opus 4.6. Meanwhile, attention on social media quickly shifted after a viral "car wash" puzzle circulated, where a widely shared screenshot posted by AI builder Min Choi showed Claude Opus 4.7 suggesting walking to avoid getting dirt on the return trip while missing that the car itself needed to be taken to the car wash. 
The response drew sarcastic reactions and jokes online. As per the screenshot shared by Choi, he asked Claude Opus 4.7, "I want to wash my car. The car wash is 100ft away. Should I walk or drive." The AI model responded, saying, "Walk. It's about 30 seconds on foot, and driving 100 ft to a car wash just gets your freshly-cleaned car dirty on the way back." @DevinSoto commented, "LMFAOOO I was thinking walk at first then I was like wait.. Can't wash a car without a car LOL." @arammelkoumov said, "Well I guess we aren't being replaced any time soon by AI." @Athashri_k wrote, "Imagine AGI robots walking to car wash without the car lolðŸ˜" @burkov reacted, saying, "For those living under a rock: LLMs stopped becoming smarter around summer 2025," adding, "Everything impressive you see since then is about finetuning them for specific tasks (mainly coding and software-tool-based task solving) and building tooling around them (such as agentic coding systems)." @manigopal1111 said, "Peak AI intelligence, had enough compute to calculate walking distance but zero common sense." While, @damoosmann pointed out that, "The win isn't the walk suggestion. It's noticing that driving to a car wash re-dirties the clean car on the trip back. That second-order effect is the kind of thing old models flattened into 'shortest path wins.'" The incident comes amid broader discussion that even top models, including GPT-5.4, reportedly struggle with similar reasoning tests at 20-40% success rates, while models like Grok and Gemini were said to have solved it correctly, as per a summary of posts on X. The moment has highlighted ongoing challenges in reasoning despite reported improvements, including around 13% gains in coding performance, as per the summary. What is Claude Opus 4.7? It's Anthropic's latest generally available AI model focused on coding and complex tasks. Is it better than Opus 4.6? Yes, especially in advanced software engineering and instruction-following.
[21]
Anthropic's New Design Tool Rivals Adobe and Figma | PYMNTS.com
By completing this form, you agree to receive marketing communications from PYMNTS and to the sharing of your information with our sponsor, if applicable, in accordance with our Privacy Policy and Terms and Conditions. Until now, Anthropic's products covered chat interfaces and developer tools. The design tool is its first move into visual and creative workflows. Dataconomy reported that the company has already partnered with Figma to convert AI-generated code into editable design files. Anthropic has also integrated Claude into Microsoft Word and PowerPoint. But the new tool goes further. It does not augment an existing design workflow. It replaces the starting point. A user describes what they want. The model builds it. No prior design experience is required. That is a different proposition than what Adobe and Figma currently offer. Adobe Firefly is embedded across Photoshop, Illustrator and Premiere. It assists designers already working inside those tools. Figma AI works the same way inside its own interface. The Next Web reported that Figma commands an estimated 80 to 90% market share in UI and UX design. Both products assume a trained designer is in the loop. Anthropic's tool does not. Adobe reported $23.77 billion in revenue for fiscal year 2025, but its stock has declined as investors question whether its per-application model can survive a market where competitors offer capable tools at lower prices. The competitive pressure predates Anthropic's announcement. Claude Opus 4.7's debut adds a new front to that pressure. The design tool fits a larger pattern. Decrypt noted that internal signals point toward Anthropic repositioning from a language model provider toward a full-stack AI studio, where Claude builds and deploys complete products. Venture capitalists are valuing Anthropic at up to $800 billion, more than double the $380 billion valuation from its February funding round. Annualized revenue has jumped from $9 billion to $30 billion. 
Design is an upstream input to digital commerce. Product interfaces and landing pages drive conversion. When a model generates those assets from a prompt, the cost and time of building digital products falls. Agencies and in-house teams that bill for design work face direct competition from the tool itself. Google launched Stitch with Claude Code integration already built in. Microsoft embedded AI design into Designer. Since January, Anthropic has released major updates approximately every two weeks. The pace is not slowing. Opus 4.7 is not Anthropic's most capable model; Claude Mythos holds that distinction. Mythos is currently being tested by early partners, using it to find security vulnerabilities in their software, PYMNTS reported. Anthropic has not made it available to the public. The design tool and Opus 4.7 are the commercial layer, while Mythos is the frontier. Anthropic seems to be operating a dual-track strategy, with Opus 4.7 serving as the commercial product while Mythos remains under restricted access. The Opus line has been building toward this moment. PYMNTS reported that Claude Opus 4.6, released in February, was built around three enterprise outcomes: finding information, analyzing it and producing finished outputs closer to production-ready quality on the first attempt. It also integrated directly into Microsoft PowerPoint, reading existing layouts and generating slides that preserve those design elements. Opus 4.7 is the next step in that progression.
[22]
Anthropic rolls out Claude Opus 4.7 with major coding and AI agent upgrades
Anthropic has introduced Claude Opus 4.7, now generally available across Claude products and major cloud platforms. The model is a direct upgrade over Opus 4.6, delivering stronger performance in advanced software engineering, long-running AI agent tasks, instruction accuracy, multimodal vision, and real-world knowledge work. Claude Opus 4.7 is designed for complex, long-running workflows that require consistency, accuracy, and structured reasoning. It improves performance in advanced coding, multi-step tasks, and professional knowledge work. The model shows stronger instruction following, better self-verification, improved memory usage, and enhanced multimodal capabilities. It can now handle difficult coding and analysis tasks with reduced supervision, while remaining less broadly capable than Claude Mythos Preview. Claude Opus 4.7 maintains a safety profile similar to Opus 4.6, with targeted improvements in honesty and resistance to prompt injection attacks. Anthropic also notes that Opus 4.7 has reduced cybersecurity capability compared to Claude Mythos Preview under Project Glasswing, with built-in safeguards that block high-risk cybersecurity requests. Security professionals can apply for the Cyber Verification Program for legitimate use cases such as vulnerability research and red-teaming. Anthropic recommends testing real workloads before full migration due to changes in token behavior.
[23]
Claude Opus 4.7 launched: Smarter, safer AI - but why isn't it more powerful than Anthropic's Claude Mythos explained
Claude Opus 4.7: Anthropic has introduced a new artificial intelligence model, Claude Opus 4.7, positioning it as a practical upgrade focused on real-world tasks rather than cutting-edge power. The company said the latest version improves performance in areas like software engineering, following instructions, and handling everyday work tasks, as per a report. It is now its most powerful model available to the public. However, Anthropic made it clear that Opus 4.7 is not as broadly capable as its more advanced system, Claude Mythos Preview, which is currently limited to select companies, as per a CNBC report. Unlike Mythos Preview, which is part of the company's cybersecurity-focused Project Glasswing, Opus 4.7 comes with built-in safeguards. These protections are designed to automatically detect and block requests linked to high-risk or prohibited cybersecurity uses. Anthropic said, "We are releasing Opus 4.7 with safeguards that automatically detect and block requests that indicate prohibited or high-risk cybersecurity uses," adding, "What we learn from the real-world deployment of these safeguards will help us work towards our eventual goal of a broad release of Mythos-class models," as quoted by CNBC. Since its founding in 2021, Anthropic has focused on building a reputation around responsible AI development. Its latest release reflects that approach, balancing improved capabilities with tighter controls. The launch also follows growing attention from policymakers and industry leaders, including discussions involving members of the Trump administration, tech executives, and banking leaders about the risks tied to powerful AI systems. Anthropic noted that while Mythos-class models are not yet ready for broad release, the company aims to eventually scale them responsibly. In the meantime, Opus 4.7 builds on its predecessor, Claude Opus 4.6, outperforming it across several benchmarks such as coding, reasoning, and tool use. 
The model is now available across Anthropic's Claude products, API, and cloud platforms including Microsoft, Google, and Amazon, at the same price as the previous version. What is Claude Opus 4.7? It is Anthropic's latest AI model designed to handle real-world tasks more effectively. How is Opus 4.7 different from Mythos Preview? Opus 4.7 is publicly available but less advanced than Mythos Preview, which is limited to select users.
[24]
Anthropic's Claude Opus 4.7: The AI is so powerful it's spooking web design tools
Anthropic's rumored Claude Opus 4.7 is poised to revolutionize web design by generating websites and prototypes from simple text prompts. This potential shift has already impacted design-focused stocks like Figma and Adobe, as investors anticipate AI-driven automation in the creative space. The new model could democratize design, allowing non-technical users to build digital products with ease. The AI race may be entering a new phase, and this time the focus appears to be web design. Reports suggest Anthropic is preparing to launch Claude Opus 4.7, a new model that could help users create websites, landing pages, presentations, and prototypes using simple prompts. Even before its release, the market reaction has been swift. For design and SaaS companies, the buzz around the new Claude tool has already sparked concern. Stocks linked to the design space, including Figma, Adobe, Wix, and GoDaddy, reportedly slipped following the leak, as per reports. Anthropic appears to be moving beyond conversational AI and coding assistance into a much broader productivity space. According to reports, Claude Opus 4.7 could be aimed at both technical and non-technical users, allowing them to generate websites, landing pages, product mockups, and presentation decks using just a single natural-language prompt. That seems to be the biggest talking point around the rumored launch. If the reports are accurate, Anthropic's next move is centered on design automation, an area currently dominated by companies such as Figma, Adobe, Wix, and Google's Stitch. The tool is said to simplify complex design workflows by turning plain text prompts into usable visual outputs. This means users may no longer need advanced design expertise to build prototypes or launch-ready landing pages. The market reaction was immediate. Reports indicate that Figma shares fell around 6%, while Adobe, Wix, and GoDaddy also saw declines after news of the potential launch surfaced. 
The sharp drop appears to be driven by investor concerns that AI-generated design tools could automate a large portion of traditional UI and web design work. The possibility that users can create polished websites and presentations through prompts alone has raised concerns about how existing design platforms may be affected, as per Investing.com UK. If launched as reported, Claude Opus 4.7 may mark a major shift in how digital products are created. Instead of starting from scratch in traditional design software, users may be able to describe what they need in plain language and receive an instant prototype or full layout. All eyes remain on Anthropic as speculation continues to build around what could be one of the biggest AI design launches of the year. What is Claude Opus 4.7 expected to do? It is reportedly designed to create websites, presentations, and prototypes from natural language prompts. Why did Figma and Adobe stocks fall? Investors reacted to reports that Anthropic may launch a competing AI design tool. (You can now subscribe to our Economic Times WhatsApp channel)
[25]
Claude Opus 4.6 to 4.7: What Anthropic actually changed
AI model upgrades have a habit of sounding more incremental than they really are. Is Claude Opus 4.7 continuing that tradition or is it actually better? According to Anthropic, it doesn't reinvent what Claude does, but fixes the things that made you distrust it. Also read: Claude Opus 4.7 announced: Three interesting things you should know While Opus 4.6 was able but overly cautious in its interpretation of prompts, with many tasks being abandoned partway because of difficulty at agentic processing level, Opus 4.7 interprets prompts very literally (to almost the point of being uncomfortable). Anthropic has cautioned developers that prompts they developed for Opus 4.6 may get unexpected results from Opus 4.7 because it provides very straightforward implementations based on exactly what you've asked for versus what you were probably wanting. That's a feature, not a bug. With this change, the progress with coding has followed suit- for example, when trying to measure dev teams using the same benchmarks internally, Cursor had a respective 70% completion rate with Opus 4.7 compared to 58% with Opus 4.6. Notion also got 14% better performance with fewer tool errors when performing multi-step workflows. These increases are not just minor improvements, but show that the new model is able to finish what it started. Also read: OpenAI's Agents SDK 2026 Update: What's New in Building AI Agents The resolution of images processed by Opus 4.6 was sufficient for general use, but it was unreliable for precise applications. With Opus 4.7, images may now be fully processed up to 3.75 megapixels. This significant increase from 4.6 translates into a substantial increase in the quality of stored images including complicated diagrams, graphic depictions of chemical structures, and complex technical diagrams. According to Oege de Moor, CEO of XBOW, the scores on their internal visual acuity benchmark went from 54.5% on 4.6 to 98.5% on 4.7. 
While Opus 4.6 started each session fresh (with a minimal number of previous files in memory), Opus 4.7 provides improved multi-session file system-based memories. For users running long agentic workflows, this memory will significantly enhance the user experience. It would be dishonest to call 4.7 a pure upgrade with no caveats. he new tokenizer means the same input can map to 1.0 -1.35 times more tokens than before, and higher effort levels generate more output. Costs can creep up. Anthropic acknowledges this and offers mitigation through effort parameters and task budgets, but teams should measure before assuming efficiency gains. Still, the overall picture is of a model that has grown up. Opus 4.6 was impressive and occasionally unreliable. Opus 4.7 feels like the version Anthropic always meant to ship.
[26]
Claude Opus 4.7 announced: Three interesting things you should know
Anthropic's latest flagship model arrived today, and while the benchmarks are as you would expect with a new model, the devil is in the details. Here's what actually matters about Claude Opus 4.7. Also read: Claude announces ID verification: What it means for your account and privacy The main focus of Opus 4.7 is writing code, however it is more than just a way to make your experience quicker. The model not only creates code, but also verifies it and determines if there are logical errors in the design phase. Prior models would quit working on difficult tasks or produce plausible but incorrect results, whereas Opus 4.7 continues to solve those problems.Caitlin Colgrove, co-founder and CTO of HEX stated that "correctly reports when data is missing instead of providing plausible-but-incorrect fallbacks". This has been a common problem with any AI-based work for a long time. This results in software teams being able to rethink how much supervision agentic actions actually require. Tests have shown the model is already achieving 14% better score than Predecessor models at many companies including: Cursor, Warp, and Notion, and verified that it has successfully completed tasks where previous Claude models couldn't. The difference between a model producing results compared to a model producing results that customer can ship. Also read: Apple and Google reportedly hosting deepfake nudity apps, despite breach of policy Vision has always been something of an afterthought in language model upgrades, functional, but rarely transformative. Opus 4.7 is a genuine step change. The model now accepts images up to roughly 3.75 megapixels, more than three times the resolution of earlier Claude models. That might sound like a technical footnote, but the practical applications are significant. For anyone doing computer-use automation, reading dense screenshots or complex technical diagrams is no longer a coin flip. 
One life sciences company testing the model flagged major improvements in reading chemical structures. Another noted near-perfect visual acuity on benchmarks that Opus 4.6 had scored just above 50% on. The upgrade isn't cosmetic, it opens up an entire class of multimodal work that simply wasn't reliable before. In all of the excitement surrounding launches, it is easy to overlook this one; however, it is something worth noting. Anthropic recently revealed Project Glasswing where they raised a number of serious concerns about the implications of artificial intelligence in regard to cybersecurity. Opus 4.7 is the first time where the training has been specifically done based on these concerns. The company indicates that they experimented with purposely limiting a selection of cyber capabilities throughout the training of this model. In addition, they developed the ability to automatically detect and block requests to the Opus Model related to prohibited or higher risk cyber uses. For the entire industry, this is a strong signal of what is possible with model-generated content. It is one thing to simply add a content filter; it is considerably different to make decisions regarding the safety of a model at the model level and then to be transparent regarding those decisions and the associated tradeoffs. Security professionals who are legitimately involved, such as penetration testers; red teamers; and vulnerability researchers will be able to apply to a new Cyber Verification Program that will allow them to gain full access to Opus. For all others, they will receive it with purposeful limitations. Opus 4.7 will be available across Claude's entire suite of products and through the API at a cost of $5 per million tokens input and $25 for every million tokens output. Also read: Anthropic uses AI agents for AI alignment breakthrough, but at what cost?
Share
Share
Copy Link
Anthropic has launched Claude Opus 4.7, its most capable publicly available AI model, with benchmark-leading scores on software engineering tasks. The large language model achieves 64.3% on SWE-bench Pro, surpassing OpenAI's GPT-5.4 at 57.7% and Google's Gemini 3.1 Pro at 54.2%. While positioned as less capable than the restricted Claude Mythos Preview, Opus 4.7 introduces new cybersecurity safeguards and delivers significant improvements in coding, vision, and multi-step reasoning.
Anthropic has released Claude Opus 4.7, its most capable generally available AI model, marking a significant leap in advanced software engineering capabilities just over two months after launching Opus 4.6. The new large language model achieves 64.3% on SWE-bench Pro, the benchmark testing a model's ability to resolve real-world software issues from open-source repositories, decisively outperforming OpenAI's GPT-5.4 at 57.7% and Google's Gemini 3.1 Pro at 54.2%
4
. On SWE-bench Verified, a curated subset of the benchmark, Opus 4.7 scores 87.6%, compared with 80.8% for its predecessor and 80.6% for Gemini 3.1 Pro4
. The model is available immediately across Claude Pro, Max, Team, and Enterprise plans, as well as through Amazon Bedrock, Vertex AI, and Microsoft Foundry, priced at $5 per million input tokens and $25 per million output tokens4
.
Source: VentureBeat
Claude Opus 4.7 delivers a 14% improvement over Opus 4.6 on complex multi-step workflows while using fewer tokens and producing a third of the tool errors, according to Anthropic
4
. The AI model introduces multi-agent coordination, enabling it to orchestrate parallel workstreams rather than processing tasks sequentially—a capability that translates directly into throughput for enterprise users running simultaneous code review, document analysis, and data processing4
. Anthropic states that users can now hand off their hardest coding work to Opus 4.7 with confidence, as the model handles complex, long-running tasks with rigor and consistency2
. The model is the first Claude iteration to pass "implicit-need tests," where it must infer required tools or actions rather than receiving explicit instructions4
. For agentic reasoning tasks, Opus 4.7 demonstrates improved resilience, designed to continue executing through tool failures that would have halted Opus 4.6, recovering and adapting rather than stopping4
.
Source: Gizmodo
The new model takes instructions "literally," where previous models skipped or loosely interpreted prompts, according to Anthropic
1
. This improved instruction following reduces ambiguity but may require developers to adjust existing prompts, as tighter adherence reduces the creative or unexpected outputs that sometimes emerged from earlier versions4
. For enhanced vision capabilities, Opus 4.7 processes images at resolutions up to 2,576 pixels on the long edge—more than three times the capacity of prior Claude models4
. The improvement targets enterprise document analysis, where scanned contracts, technical drawings, and financial statements contain fine print that lower-resolution vision models often miss or hallucinate4
. Anthropic also claims the model is more "tasteful and creative" when creating interfaces, documents, and slide decks, though specifics on what constitutes good versus bad taste remain undisclosed1
.
Source: Inc.
Related Stories
While Claude Opus 4.7 is not as powerful as Claude Mythos Preview—Anthropic's most advanced model that excels at identifying security flaws and is restricted to select companies through Project Glasswing—it serves as a testing ground for new cybersecurity safeguards
3
. Anthropic is releasing Opus 4.7 with safeguards that automatically detect and block requests indicating prohibited or high-risk cybersecurity uses1
. These protections represent a watered-down version of what will appear in Mythos-class models, with real-world deployment learnings informing Anthropic's eventual goal of a broad release1
. The new model scored 73.1% on cybersecurity vulnerability reproduction benchmarks, a slight decrease from Opus 4.6's 73.8%, potentially reflecting the impact of these new safeguards5
. The API and broader availability of Opus 4.7 allows Anthropic to test these protections at scale before deploying them in more capable models that could pose greater security risks if misused3
.The release arrives as Anthropic runs at a $30 billion annualized revenue rate and has attracted investor offers at roughly $800 billion, with early IPO discussions underway
4
. Opus 4.7 must justify these valuations not by winning every benchmark but by becoming the model that enterprises and developers choose to build on, according to industry observers4
. Claude Code alone hit $2.5 billion in annualized revenue in February, and AI-assisted coding has become one of the fastest-growing categories in software4
. On graduate-level reasoning measured by GPQA Diamond, the field has converged, with Opus 4.7 scoring 94.2%, GPT-5.4 Pro at 94.4%, and Gemini 3.1 Pro at 94.3%—differences within noise that indicate frontier models have effectively saturated this benchmark4
. This convergence signals that competitive differentiation is shifting from raw reasoning scores toward applied performance on complex, multi-step tasks where Opus 4.7 claims advantages. For developers already using Claude as the default choice in tools like Cursor, where the model scored 70% on CursorBench compared to 58% for Opus 4.6, the improvements directly impact daily workflows4
.Summarized by
Navi
[2]
[4]
24 Nov 2025•Business and Economy

06 Aug 2025•Technology

23 May 2025•Technology

1
Policy and Regulation

2
Technology

3
Policy and Regulation
