28 Sources
[1]
Anthropic Says a Mythos-Class AI Model Will Be Available Soon
Expertise Artificial intelligence, home energy, heating and cooling, home technology. Anthropic's most capable generative AI model is still in the hands of a select few organizations and cybersecurity professionals, but the most powerful Claude model you can use now is getting an upgrade. Claude Opus 4.8, released Thursday, is a "modest but tangible improvement" over Opus 4.7, Anthropic said in its announcement blog post. But the company also said it is making significant progress on producing a version of its Claude Mythos Preview model that it's willing to release to the public. Right now, it has restricted access to Mythos to a consortium of partners as part of what it called Project Glasswing, explaining that the model's cybersecurity capabilities were advanced enough to warrant giving cybersecurity experts and major tech companies some lead time to patch flaws found by the model. "Models of this capability level require stronger cyber safeguards before they can be generally released," Anthropic said. "We're making swift progress on developing these safeguards and expect to be able to bring Mythos-class models to all our customers in the coming weeks." Mythos for everyone? Anthropic's decision to withhold Mythos Preview from the general public, at least for now, has been an interesting one. Was it a wise, forward-thinking move to protect the internet's critical infrastructure from potential flaws? Was it an easy way to churn up marketing hype? Security researchers have found that the model is certainly capable of finding exploits much more quickly than human hackers, even if it isn't necessarily pushing beyond human capabilities. Mozilla's latest version of Firefox included more than 200 fixes identified by Mythos Preview. But the fact that Mythos will soon be available to anyone, even with significant cybersecurity guardrails, means we'll finally get to see if the model lives up to the hype, with all the risks that might entail. Darren Williams, founder and CEO of the cybersecurity firm BlackFog, told CNET in an email that big model releases are often tense moments. "On one hand, Anthropic's decision to stage the release, holding back until safeguards are developed, shows the right instincts," he said. "But the more capable a model is, the higher the stakes if those safeguards fall short or if the model is eventually misused. The window between a powerful model's release and broad adoption of defenses against it is always a vulnerable moment." Mythos is going to be far more expensive to run than other AI models, however, and that could limit its usefulness to hackers. Jake Williams, a cybersecurity researcher and faculty member at IANS Research, said Mythos was 30 times as expensive in tests as the previous Opus model. "This is outside the reach of many, including any commodity threat actors," Williams told CNET in an email. "Nation state actors already had better technology for finding vulnerabilities. This is only changing the game for a small percentage of threat actors." What's new about Claude Opus 4.8 As for Opus 4.8, Anthropic said it's an improvement across benchmarks compared to Opus 4.7. Tests found that Opus 4.8 was less likely to make unsupported claims and more likely to indicate uncertainty, the company said. A few new features are also coming to Anthropic's AI products, including the ability to control how much "effort" a model will use to respond to a prompt on Claude.ai and in Claude Cowork. Higher effort will likely get better results as the model spends more time on a response, but it will burn through your usage limits faster. A lower setting will respond faster and hit rate limits more slowly.
[2]
Claude's new model is more 'honest' when it messes up
Anthropic is releasing Claude Opus 4.8 on Thursday, and the company is touting the model's "honesty." According to Anthropic, it trains "all [its] models to be honest -- for instance, to avoid making claims that they can't support." But it notes that "a general problem with AI models is that they sometimes jump to conclusions, confidently presenting their work as making progress despite thin evidence." The AI lab claims that early testers have found that Opus 4.8 "is more likely to flag uncertainties about its work and less likely to make unsupported claims." In the company's evaluations, Opus 4.8 is "around 4x less likely than its predecessor to allow flaws in code it's written to pass unremarked." In addition to the honesty improvements, with Opus 4.8, users can direct the amount of effort Claude puts into a task. Higher-effort responses will use more tokens, giving users the option of lower-effort responses if they don't want to burn through their rate limits as quickly. Anthropic is also launching a feature called "dynamic workflows" in research preview, which the company says will let Claude "take on even bigger tasks." With dynamic workflows, "Claude can plan the work and then run hundreds of parallel subagents in a single session (and with Opus 4.8, the agents can run for even longer). It then verifies its outputs before reporting back to the user."
[3]
Anthropic launches Opus 4.8, with honesty as its killer feature
Follow ZDNET: Add us as a preferred source on Google. ZDNET's key takeaways * Claude Opus 4.8 promises more honest AI answers. * Dynamic workflows can run hundreds of Claude subagents. * Fast mode gets cheaper, while regular Opus pricing stays put. Diogenes was a fourth-century B.C. Greek philosopher known for his performance art. He is said to have roamed the streets of Athens in the middle of the day, carrying a lit lantern, crying out, "I am looking for an honest man." If that myth were modernized for the present day, we'd all be looking for an honest AI. Anthropic is both announcing and releasing Claude Opus 4.8, a large language model it believes might have satisfied Diogenes' quest. "One of the most prominent improvements in Opus 4.8 is its honesty," the company said Thursday in a blog post. Also: Your Claude agents can 'dream' now - how Anthropic's new feature works Now, perhaps, this new frontier model will behave itself better. Anthropic reports that Opus 4.8 is less likely to make unsupported claims. It's also more likely to tell you when it's uncertain of an answer. "This is borne out in our evaluations, which show that Opus 4.8 is around 4x less likely than its predecessor to allow flaws in code it's written to pass unremarked," the company said. In Claude Code, I found Opus 4.7 to be a substantial improvement over 4.6. While 4.6 would often misinterpret instructions or deliver erroneous results, Opus 4.7 regularly tells me that the way it first looked at a problem didn't work, and it's taking a different tactic. Recent project assignments have shown a much greater degree of understanding than with 4.6. So, given the jump in quality from 4.6 to 4.7, which was subjectively quite noticeable over many sessions, I'm hoping we'll see the same in the jump from 4.7 to 4.8. Also: The 5 myths of the agentic coding apocalypse It would seem this is the case, at least according to Tom Pritchard, staff engineer at Spotify, who has already tested Opus 4.8. "Claude Opus 4.8 has noticeably better judgment. In Claude Code, it asks the right questions, catches its own mistakes, pushes back when a plan isn't sound, and builds up confidence around complex, multi-service explorations before making big changes. It's a great model to build with," he said in the blog post. That'll be nice. A matter of effort Claude Code has had the ability to set effort since at least 4.7 (at least, that's when I first noticed it). Effort is essentially a measure of how much AI oomph the model throws at a problem, measured in tokens. In Opus 4.8, Claude Code's default of high effort produces what the company said is "the best overall balance of quality and user experience." In coding tasks, this default spends a similar number of tokens as the default level offered in Claude Code Opus 4.7, but with better performance. Also: Anthropic's Mythos is evolving faster than expected, reports AI safety agency This effort capability is now moving into Claude.ai and Cowork. With higher effort settings, Claude will "think more frequently and more deeply." With a lower effort set, Claude responds faster, and users will find their AI experiences are throttled less. Dynamic workflows At launch time, this feature hasn't been fully defined, but it's interesting. Launching as a research preview, Opus 4.8 can plan work, run hundreds of parallel subagents in one session, and verify outputs before reporting back. This feature is designed for very large-scale tasks. The example Anthropic gave was codebase-scale migrations across hundreds of thousands of lines. It seems like Claude can generate and manage the workflow as the task evolves. Rather than running off a fixed plan, agents can change their priorities and tasks based on what they find while doing their work. This could be powerful. Also: Anthropic's new Claude Security tool scans your codebase for flaws - and helps you decide what to fix first Anthropic said that the subagents verify their results before reporting back to users. If Claude is coordinating hundreds of subagents, users need it to notice uncertainty, bad assumptions, and failed outputs. Interestingly, this connects right back to the honesty claims discussed at the beginning of the article. If Claude is going to launch "thousands of agents," getting back reliable and vetted results really matters, because there's no way human oversight can keep up on its own. The dynamic workflows capability will be available to Claude Code users on Enterprise, Team, and Max plans. Price and availability Anthropic said Claude Opus 4.8 is available everywhere Thursday through Claude and the Claude API as claude-opus-4-8. In practice, especially if you're using Claude Code, you might find that you'll need to restart your session or wait a day or so for Claude Code to notice it. When Anthropic jumped Opus 4.6 to 4.7, I kept asking Claude Code what model it was using, and it wasn't until the next morning that it stopped reporting Opus 4.6 and started reporting Opus 4.7. Overall pricing hasn't changed since Opus 4.7. Regular token-based pricing remains $5 per million input tokens and $25 per million output tokens. Also: This exec offers 4 ways to be a successful innovator in the age of agentic AI The company said that fast mode, which enables the model to work at 2.5 times the speed of normal mode, will be "three times cheaper than it was for previous models." While I don't spend on fast mode, I do see the appeal. I've watched a lot of YouTube, waiting for Claude Code to respond to a prompt, hour after hour. Would you rather have Claude respond faster with lower effort or think longer with higher effort? Let us know in the comments below. You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, on Bluesky at @DavidGewirtz.com, and on YouTube at YouTube.com/DavidGewirtzTV.
[4]
Claude Opus 4.7 was just released last month, and 4.8 is already here with some massive improvements
* Anthropic just launched Claude Opus 4.8, upgrading 4.7 barely a month after its release. * Big gains: ~5 pts in agentic coding and ~8+ pts in agentic terminal coding. * Opus 4.8 is notably more honest -- flags uncertainty more and hallucinates less. If you ever need proof that the world of AI tech moves at the speed of light, you only need to take a peek at what Anthropic is doing. Just over a month ago, I took to the stand to report that Anthropic had just released Opus 4.7 to the public, and that the new model posted some impressive improvements over the last model. Well, here I am again to say that Opus 4.7 is old news, and that 4.8 is here to make the previous model look outdated. Claude Opus 4.8 posts even better benchmarks than its previous month-old model Anthropic will run out of version numbers before its momentum fizzles out Anthropic broke down what's new with Claude Opus 4.8 over on its blog. You can see the progress the company made over 4.7 in the chart above. The most notable upgrades are in agentic coding (with almost a 5 percentage point increase in capabilities) and in agentic terminal coding (with over an 8 percentage point increase). Remember, Claude Opus 4.7 came out mid-April, and Anthropic is already serving up a new version with big improvements. However, apart from the statistical upgrades, Claude's Opus 4.8 model comes with a nice additional trait: it will no longer lie to your face as often. As Anthropic puts it: One of the most prominent improvements in Opus 4.8 is its honesty. We train all our models to be honest -- for instance, to avoid making claims that they can't support. But a general problem with AI models is that they sometimes jump to conclusions, confidently claiming to have made progress in their work despite the evidence being thin. Early testers report that Opus 4.8 is more likely to flag uncertainties about its work and less likely to make unsupported claims. You can give Claude Opus 4.8 a spin right now, so be sure to try it out and see how it improves your workflow. And I suppose I'll see you again next month when Anthropic inevitably releases Opus 4.9 then. One of the best LLMs for programming just got even better at it, and you can try it out for free The improvements are looking very good. Posts 1 By Simon Batt
[5]
Anthropic to roll out Claude Mythos in coming weeks, launches Opus 4.8
May 28 (Reuters) - Anthropic on Thursday said it is launching an upgraded Claude Opus 4.8 model even as it works to release its powerful, market-moving Mythos model to all customers in the coming weeks. Mythos is the AI lab's large language model with advanced cybersecurity capabilities that have raised concerns among executives and world leaders about its impact. As a part of Project Glasswing, major tech firms such as Amazon (AMZN.O), opens new tab, Microsoft (MSFT.O), opens new tab and Apple (AAPL.O), opens new tab are permitted to use Mythos for cybersecurity purposes. The company said Opus 4.8 will be available for the same price as its predecessor and shows improvements across benchmarks, particularly in honesty. Early testers report the model is more likely to flag uncertainties around its work and less likely to make unsupported claims, Anthropic added. "A general problem with AI models is that they sometimes jump to conclusions, confidently claiming to have made progress in their work despite the evidence being thin," the Claude maker said. Reporting by Zaheer Kachwala in Bengaluru; Editing by Vijay Kishore Our Standards: The Thomson Reuters Trust Principles., opens new tab
[6]
Anthropic Unveils Claude Opus 4.8, a New, More Powerful Model
The new system, called Opus 4.8, tops industry standard benchmarks in areas related to computer programming. Companies like Anthropic and OpenAI continue to sharpen the coding skills of their artificial intelligence technologies, with no ceiling in sight. On Thursday, Anthropic unveiled a new flagship model, called Claude Opus 4.8, that is significantly better than its predecessor, Claude Opus 4.7, at generating computer code, according to independent benchmark tests. It is particularly good at vibe coding, which is when A.I. technologies create software in response to prompts written in conversational English. The new model outperforms all other publicly available A.I. technologies on the vibe coding benchmark from Vals AI, a company that tracks the performance of the latest A.I. technologies. Opus 4.8 scored 10 percent higher on this benchmark test than the previous Anthropic model, said Rayan Krishnan, Vals AI's chief executive. The new model also outperformed its predecessor in mathematics, another area where the leading A.I. technologies continue to improve by leaps and bounds. In a blog post on Thursday, Anthropic said Opus 4.8 topped its predecessor when operating as an A.I. agent, which is when A.I. technologies use other software programs to automate various tasks. Last year, Anthropic and OpenAI released technologies that took an unexpected leap in their ability to write code. This accelerated the development of new software and suddenly unleashed a new wave of A.I. agents. As A.I. systems have improved at writing computer code, they have also become better at identifying security vulnerabilities in software -- a skill that is fundamentally changing cybersecurity. Last month, Anthropic unveiled a new technology called Claude Mythos and said the technology was too dangerous to be released publicly because it could be used to identify security vulnerabilities in the computer software that underpins the internet. It shared Mythos with a small group of tech companies and organizations. OpenAI unveiled similar technology soon after and took a different tack, sharing the technology with a much larger group and cybersecurity experts. It also upgraded its consumer chatbot with the technology. Anthropic still has not released Mythos publicly, but has indicated it will eventually share it with a wider group of companies and government agencies. Mythos is essentially a more powerful version of Opus 4.8, the technology that Anthropic released on Thursday. (The New York Times has sued OpenAI and its partner, Microsoft, accusing them of copyright infringement of news content related to A.I. systems. OpenAI and Microsoft have denied those claims.)
[7]
Anthropic's Claude Opus 4.8 is four times more honest, Mythos next
Anthropic has released Claude Opus 4.8, an upgrade to its flagship AI model that is four times less likely to let code flaws pass unremarked. The company also teased Mythos-class models, which have already found more than 10,000 critical software vulnerabilities through Project Glasswing, and announced a $65 billion Series H round at a $965 billion post-money valuation. Anthropic has released Claude Opus 4.8, an upgrade to its flagship AI model that the company says is more honest, more reliable in agentic tasks, and better at catching its own mistakes. The model is available immediately at the same price as its predecessor, $5 per million input tokens and $25 per million output tokens, and is rolling out across all Anthropic products including claude.ai, Claude Code, and the API. The headline improvement is honesty. Anthropic says Opus 4.8 is around four times less likely than Opus 4.7 to let flaws in code it has written pass unremarked. Early testers report the model is more willing to flag uncertainties about its work and less likely to make unsupported claims, a persistent problem across AI models that tend to project confidence regardless of whether it is warranted. Benchmark gains across the board Opus 4.8 improves on its predecessor across Anthropic's published benchmarks. On agentic coding (Terminal-Bench 2.1), the score rises from 64.3% to 69.2%. Multidisciplinary reasoning with tools improves from 54.7% to 57.9%. Agentic computer use moves from 82.8% to 83.4%, and knowledge work scores rise from 1,753 to 1,890. Anthropic's alignment assessment found that Opus 4.8 reaches new highs on measures of prosocial traits, including supporting user autonomy and acting in the user's best interest. Rates of misaligned behaviour such as deception or cooperation with misuse are substantially lower than in Opus 4.7, and comparable to Claude Mythos Preview, Anthropic's best-aligned model. Early testers see practical gains The release is accompanied by endorsements from companies already using the model. Cognition, the company behind the AI coding agent Devin, said Opus 4.8 uses tools cleanly and fixes comment-verbosity and tool-calling issues that appeared in Opus 4.7. Cursor, the AI-powered code editor, reported improvements across every effort level on its CursorBench evaluation. Harvey, which builds AI for legal work, said Opus 4.8 delivers the highest score recorded on its Legal Agent Benchmark and is the first model to break 10% overall on the all-pass standard. Databricks reported that Opus 4.8 handles deeper multistep questions faster in its Genie AI agent, at 61% cheaper token cost than Opus 4.7. Thomson Reuters said CoCounsel Legal saw meaningful improvements in consistency and reasoning quality. Hebbia, which builds AI for financial document analysis, noted better citation precision and more token efficiency on retrieval tasks. New features alongside the model Anthropic is launching several features alongside Opus 4.8. A new effort control in claude.ai and Cowork lets users choose how much computation Claude applies to a response, trading speed against quality. Claude Code gains a dynamic workflows feature that allows it to plan work and run hundreds of parallel subagents in a single session, enabling codebase-scale migrations across hundreds of thousands of lines of code. For developers, the Messages API now accepts system entries inside the messages array, allowing instructions to be updated mid-task without breaking the prompt cache. Fast mode for Opus 4.8, which runs at 2.5 times the speed, is now three times cheaper than it was for previous models. Mythos is the bigger story The more significant announcement may be what comes next. Anthropic said it plans to release a new class of model with higher intelligence than Opus, based on the Claude Mythos architecture. A small number of organisations are already using Claude Mythos Preview through Project Glasswing, an initiative focused on using the model for cybersecurity work. Anthropic and roughly 50 partners, including Apple, Google, Microsoft, and Amazon Web Services, have used Mythos Preview to find more than 10,000 high- or critical-severity vulnerabilities across critical software infrastructure. Mythos-class models require stronger cyber safeguards before general release, Anthropic said, but the company expects to bring them to all customers in the coming weeks. The model sits a full capability tier above Opus 4.7 and can autonomously find zero-day vulnerabilities and create exploits for them, which explains both the excitement and the caution around its deployment. A company approaching $1 trillion The Opus 4.8 launch arrives as Anthropic's valuation continues to climb. The company announced a $65 billion Series H round at a $965 billion post-money valuation on the same day, up from the $380 billion valuation at which it closed its $30 billion Series G in February. Revenue has grown from roughly $1 billion at the end of 2024 to an estimated $30 billion annualised run rate in 2026, driven by enterprise adoption of Claude. Anthropic also opened a new office in Milan on 28 May, its sixth in Europe, and appointed KiYoung Choi as Representative Director of Korea ahead of a Seoul office opening. The expansion reflects growing demand for Claude in enterprise markets outside the United States. The competitive context Opus 4.8 enters a market where the pace of model releases has accelerated sharply. OpenAI launched GPT-5.5 as its first fully retrained base model since GPT-4.5, and GPT-5.4 set new records on professional benchmarks earlier this year. Google has invested up to $40 billion in Anthropic but continues to develop its own Gemini models. The frontier AI market has consolidated into a three-way race between Anthropic, OpenAI, and Google, with each company releasing incremental model upgrades at an increasing pace. For Anthropic, the distinction it is trying to draw with Opus 4.8 is not raw capability but reliability. A model that catches its own mistakes, flags its uncertainties, and follows instructions consistently is more useful in agentic workflows where AI systems operate with limited human oversight. Whether that positioning holds as Mythos-class models arrive, promising higher intelligence with new safety constraints, will determine whether Anthropic can maintain its lead in the enterprise market it has worked to dominate.
[8]
Anthropic Debuts Claude Opus 4.8, Teases Upcoming Launch of 'Mythos-Class Models'
On Thursday, Anthropic launched Claude Opus 4.8, the latest and most advanced version of its flagship AI model. It's available everywhere at the same price as its predecessor, Opus 4.7 ($5 per million input tokens and $25 per million output tokens). Opus 4.8 boasts industry-leading scores on tasks like agentic coding and agentic computer use, which is par for the course for a new Anthropic model. The key differentiator that's being underscored by the company is the model's "honesty" -- and by extension, its overall reliability. According to a company blog post, Opus 4.8 specializes in catching its own mistakes and flagging them to users: "a general problem with AI models is that they sometimes jump to conclusions, confidently claiming to have made progress in their work despite the evidence being thin," the company wrote. "Early testers report that Opus 4.8 is more likely to flag uncertainties about its work and less likely to make unsupported claims." For example, Michael Ran, a senior investment associate at asset management firm Bridgewater, was quoted in Anthropic's blog post saying that Opus 4.8 was able to "proactively flag issues with the inputs and outputs of an analysis, something other models routinely missed and left to the users to catch." Opus 4.8 also presents a "substantially lower" risk of misaligned and dangerous behaviors, including the generation of harmful sexual content and "undermining liberal democracy," according to the model's system card. Dynamic workflows and effort control In addition to the new model, Anthropic also announced the launch of "dynamic workflows," a new feature now available as a research preview, which allows Claude to handle more complex coding tasks by deploying hundreds of subagents that can work in parallel to one another. Users can expect a noticeable improvement from Opus 4.8, especially for bigger coding tasks, but it's not a game-changer. Anthropic even tried to hedge expectations, writing in its new blog post that Opus 4.8 is "a modest but tangible improvement on its predecessor," Opus 4.7. That model debuted a little over a month ago and received a tepid early response from users, some of whom complained that its "adaptive thinking" feature sometimes caused it to spend too much time on tasks that should've been quick and easy, and not enough time on tasks that deserve more effort. Perhaps in direct response to that complaint, Anthropic also announced on Thursday the launch of a new "effort control" panel (found in the model selector dropdown menu) for Claude, which lets you manually choose the amount of effort -- and tokens -- you want it to spend on a given task. It's set to "Low" by default, and you can switch it to "Medium," "High," and "Max," or toggle on adaptive thinking mode. "Mythos-class models" Anthropic also teased the upcoming debut of "a new class of model" with capabilities that are allegedly on par with those of Mythos, the mysterious model that's been sending cold shivers up Silicon Valley's spine. The company has yet to release the model publicly, citing the model's unprecedented power and its cybersecurity risks. According to its new blog post, Anthropic is currently working on testing safeguards for Mythos and expects to release "Mythos-class models to all our customers in the coming weeks." That's obviously extremely vague, probably intentionally so. Time will tell whether these new models live up to the paradigm-shattering early rumors that have been circulating around Mythos, or if (more likely than not) the new category of models is a substantially watered-down version of the original behemoth. AI developers, after all, tend to hype up their own models' abilities and dangers before they're released, and in most cases, the reality doesn't quite meet the expectations. (Remember all the excitement about GPT-5 being AGI?) Then again, maybe Anthropic is actually ready to unleash world-shattering models that were deemed an existential threat to global security just a couple of months ago. Time will tell, and we'll report back as soon as we know more.
[9]
Anthropic says Claude Opus 4.8 is more honest and better at coding
Anthropic has launched Claude Opus 4.8, a new version of its flagship AI model that the company says is less likely to hide mistakes or make unsupported claims while performing complex tasks. The upgrade builds on Claude Opus 4.7 and arrives as AI firms race to make autonomous systems more reliable for coding, research, and enterprise workflows. Anthropic said the model shows improvements across coding, reasoning, and agentic benchmarks while also becoming more transparent about uncertainty. One of the biggest changes in Opus 4.8 is its focus on honesty during long-running tasks. AI models often present incorrect information confidently or claim progress without enough evidence. Anthropic said the new model is better at flagging uncertainty and identifying flaws in its own outputs instead of silently passing errors through. The company said internal evaluations showed Opus 4.8 was "around four times less likely than its predecessor to allow flaws in code it has written to pass unremarked." Anthropic said early testers found the model more reliable when handling agentic tasks, where AI systems independently plan and execute actions over multiple steps. The company also highlighted improvements in alignment and safety behavior. According to Anthropic, its alignment team concluded that Opus 4.8 "reaches new highs on our measures of prosocial traits like supporting user autonomy and acting in the user's best interest." The assessment also found lower rates of misaligned behavior, including deception and cooperation with misuse, compared with Opus 4.7. Alongside the model upgrade, Anthropic introduced new features aimed at expanding how Claude handles large-scale coding and reasoning tasks. One of them is a research preview feature called Dynamic Workflows for Claude Code. The system allows Claude to break large tasks into smaller jobs handled by hundreds of parallel AI subagents operating within a single session. Anthropic said the feature can carry out codebase-scale migrations involving hundreds of thousands of lines of code while checking outputs against existing test suites before reporting results back to users. The company also added an effort control setting on claude.ai and Cowork. Users can now decide how much computational effort the model spends on a task. Lower effort settings prioritize faster responses and reduced token use, while higher settings allow the model to spend more time reasoning through difficult prompts. Anthropic said Opus 4.8 defaults to a high-effort mode designed to balance quality and user experience. The company additionally reduced pricing for its fast mode, which now runs at 2.5 times the speed of earlier models while costing less than previous versions. Anthropic said it is also preparing more advanced "Mythos-class" models under Project Glasswing. These systems are currently being tested for cybersecurity applications with a small group of organizations before broader release. The company said stronger cyber safeguards are still being developed before those models can become widely available. Claude Opus 4.8 is now available through claude.ai and the Claude API.
[10]
Claude Opus 4.8 is learning to say AI's three hardest words: "I don't know
Opus 4.8 represents a significant step forward in making AI systems more transparent about their knowledge limitations and uncertainties. Honesty is a key sticking point with even the most powerful LLMs. It's not so much that they're intentionally lying to you; instead, they'll confidently tell you things they're not 100 percent (or even 50 percent) sure about. With Opus 4.8, its latest Claude model, Anthropic says it's made Claude more honest about telling you what it doesn't know, or if it has a low level of confidence in what it's telling you. Released Thursday, Claude Opus 4.8 is not Claude Mythos Preview, Anthropic's new "frontier" model that's so powerful, only a handful of "trusted partners" have been allowed to test it for security reasons. There's still no solid release date for Claude Mythos. Arriving about six weeks after Claude Opus 4.7, Opus 4.8 takes over as Anthropic's most powerful model in general availability, and for the most part, it marks a "modest" improvement over its predecessor, while Mythos Preview handily bests it in cybersecurity tasks, Anthropic says. But according to the company's benchmarks, Opus 4.8 is tops in a key category: honesty, with the model snaring "near-perfect" scores when it comes to admitting it doesn't know the answer to a coding question. Even the crazy-powerful Mythos Preview couldn't best Opus 8.7 in this particular honesty test, coming in a close second, while Opus 4.7 finished a distant fourth. Of course, these are Anthropic's benchmarks we're seeing; we'll have to wait for third-party testing to get more objective results, not to mention reports from the wild. I plan on taking Opus 4.8 for a spin in the coming days. Anthropic also shared some "concerning hints related to evaluation awareness"-meaning that Opus 4.8 showed signs that it knew it was being tested-while noting a "tendency for the model to reason about how its outputs will be graded." Those concerns aren't unique to Opus 4.8; indeed, the latest "frontier" models often seem to know when they're being poked and prodded. Still, it's good to see that models like Opus 4.8 are dialing down the BS, at least on paper. Hopefully it'll maintain that level of honesty in practice.
[11]
Claude Opus 4.8 launches today with agentic improvements, new features
Anthropic has today announced Claude Opus 4.8, the latest version of its AI model, with a focus on improving agentic capabilities. Available now, Claude Opus 4.8 delivers improvements across a number of different benchmarks, but primarily in agentic tasks. There are also several new features available, as Anthropic explains: Opus 4.8 launches alongside several new features. Users on claude.ai now have control over the amount of effort Claude puts into a task. Claude Code has a new "dynamic workflows" feature that allows it to tackle very large-scale problems. And fast mode for Opus 4.8 -- where the model can work at 2.5× the speed -- is now three times cheaper than it was for previous models. In a comparison with Opus 4.7, there are some clear areas of improvement in agentic coding, computer use, and reasoning. Anthropic specifically says that Opus 4.8 is meant to be a "modest but tangible improvement" on 4.7 and, as such, will be pricing it the same - $5 per million input tokens and $25 per million output tokens. Agents were also a huge focus of Google I/O last week, with the upcoming Gemini 3.5 Pro model expected to deliver further improvements there. Anthropic didn't compare Opus 4.8 to Gemini 3.5 Flash as they're a separate class of model, but there were some comparisons made in Anthropic's Opus 4.8 system card.
[12]
Anthropic upgrades Claude with new Opus 4.8 model, here's what's new
Anthropic has released its latest AI model with Claude Opus 4.8. The new version arrives less than two months after the previous model upgrade, ramping up Anthropic's upgrade cadence. Opus 4.8 is a 'more effective collaborator,' Anthropic says Anthropic describes Claude Opus 4.8 as having "sharper judgement, more honesty about its progress, and the ability to work independently for longer than its predecessors." "Early testers report that Opus 4.8 is more likely to flag uncertainties about its work and less likely to make unsupported claims," according to Anthropic. The company also notes that pricing remains the same between Claude Opus 4.7 and Opus 4.8. Here are some benchmarks between 4.7 and 4.8 models shared by Anthropic: * Agentic coding score increases from 64.3% to 69.2% * Multidisciplinary reasoning with tools jumps from 54.7% to 57.9% * Agentic computer use moves from 82.8% to 83.4% * Knowledge work score increases from 1753 to 1890 * Agentic financial analysis improves from 51.5% to 53.9% Anthropic says Opus 4.8 fast mode is roughly 2.5 times quicker now. The upgraded fast mode also costs three times less than before, the company says. In addition to Opus 4.8's release, Anthropic is launching three more updates today: * Dynamic workflows. This new feature, available in research preview, allows Claude to take on even bigger tasks in Claude Code. * Effort control in claude.ai and Cowork. A new control alongside the model selector lets users choose how much effort Claude puts into a response. * The Messages API now accepts system entries inside the messages array. Developers can update Claude's instructions mid-task without breaking the prompt cache or routing the update through a user turn. You can learn more much about these features and the new Claude Opus 4.8 upgrade from Anthropic's announcement. Anthropic's previous Opus model upgrade arrived on April 16 with the release of version 4.7. Today's release comes just six weeks later. The new model is available globally from today.
[13]
Anthropic Launches Claude Opus 4.8 With Gains in Coding and Honesty
Anthropic today announced the launch of its latest AI model, Claude Opus 4.8. Anthropic claims the model is a "more effective collaborator" with improvements in agentic coding, multidisciplinary reasoning, agentic computer use, knowledge work, and agentic financial analysis. Testers have found Opus 4.8 to be "more reliable and sharper in its judgement" when doing agentic tasks, and the model also made gains in honesty. Early testers report that Opus 4.8 is more likely to flag uncertainties about its work and less likely to make unsupported claims. This is borne out in our evaluations, which show that Opus 4.8 is around four times less likely than its predecessor to allow flaws in code it has written to pass unremarked. Alignment assessments suggest the model hits new highs on measures of prosocial traits like supporting user autonomy and acting in the user's best interest. Rates of misaligned behavior like deception are lower than Opus 4.7 and similar to the Claude Mythos Preview. Anthropic benchmarks indicate Opus 4.8 scored a 69.2% on SWE-Bench Pro, outperforming GPT-5.5 and Gemini 3.1 Pro on the test and several other benchmarks, though GPT-5.5 leads on the terminal-coding benchmark. Opus 4.8's fast mode also runs at 2.5x the speed, and it is now three times cheaper than prior models. Along with Opus 4.8, Anthropic is adding new features to its product lineup. * Dynamic workflows (research preview) - Claude can complete bigger tasks in Claude Code. It is able to plan work and run hundreds of parallel subagents in a single session. It is able to complete codebase-scale migrations across hundreds of thousands of lines of code. The feature is available for Claude Code for Enterprise, Team, and Max plans. * Effort control - In Claude.ai and Cowork, users can choose how much effort Claude puts into a response. With a lower setting, Claude will respond faster and use up rate limits more slowly. Opus 4.8 defaults to high effort, which Anthropic says is the best balance of quality and user experience. * Messages API - The Messages API accepts system entries inside the messages array, so developers can update Claude's instructions mid-task. Claude Opus 4.8 is available everywhere today. Pricing for regular use has not changed compared to Opus 4.7. Anthropic is working on models that have the same capabilities as Opus 4.8 at a lower cost, and a new class of model that's even more intelligent than Opus. Anthropic says it has been developing safeguards for the Claude Mythos model it is testing with a small number of organizations, and it expects to be able to bring Mythos-class models to all customers "in the coming weeks."
[14]
Claude Opus 4.8 just launched -- and Anthropic says it's far less likely to 'fake' answers
Anthropic has officially launched Claude Opus 4.8, the latest version of its flagship AI mode. With this rollout the company promises that it may have fixed one of AI's biggest flaws. According to Anthropic, Claude Opus 4.8 is designed to be more honest, more thoughtful and significantly less likely to confidently pretend it knows something when it doesn't. The company says Opus 4.8 is around four times less likely than its predecessor to allow flaws in code it writes to pass without warning the user. At a time when AI companies are racing to make models faster, more agentic and more autonomous, Anthropic appears to be focusing on the one thing that seems to be ignored: AI should know when it might be wrong and admit it. Claude Opus 4.8 is being positioned as a better collaborator Anthropic says early testers described working with Opus 4.8 as feeling more like working with a real collaborator than previous models. According to the company, testers said the model asks better questions, catches its own mistakes and pushes back when plans don't make sense instead of simply agreeing with users. One of the biggest criticisms surrounding chatbots is that many of them have become "yes-men," often validating bad ideas, weak assumptions or outright incorrect information rather than challenging users. Anthropic appears to be leaning heavily into the opposite approach. The company also says Opus 4.8 showed major improvements in legal reasoning, coding, browser agents and long-form analysis tasks, with several early access partners claiming it outperformed previous Opus models and even GPT-5.5 in some agentic workflows. Users can now control how much the model thinks Alongside the new model, Anthropic is launching a new effort control system inside Claude. Users can now decide how much thinking Claude puts into a task. Higher effort modes allow the AI to spend more time reasoning through responses, while lower effort settings prioritize speed and lower token usage. Anthropic says Opus 4.8 defaults to "high effort" because it offers the best balance between quality and usability. Instead of chatbots instantly firing back responses, companies increasingly seem focused on making models pause, reason and verify information before answering. OpenAI, Google and Anthropic are all moving in this direction as AI systems become more autonomous and agent-driven. Claude can now run more dynamic workflows In addition to the new model, a new research preview feature called Dynamic Workflows was also announced. The feature allows Claude to launch hundreds of parallel subagents during a single task, verify the work and combine the results before responding back to the user. According to Anthropic, Claude Code can now handle massive codebase migrations involving hundreds of thousands of lines of code from start to finish. The future increasingly looks less like a chatbot answering one prompt at a time and more like autonomous systems quietly coordinating multiple AI processes behind the scenes. Anthropic hints at something even bigger coming next by revealing it's already working on "a new class of model with even higher intelligence than Opus." The company says some organizations are already testing a system called Claude Mythos Preview for cybersecurity work, though Anthropic claims models at that level will require stronger safeguards before broader release. Looking ahead At the same time AI companies continue pushing toward more powerful systems, they're also increasingly acknowledging that the next generation of models may carry risks requiring entirely new safety standards. It's clear that Anthropic is trying to position AI like a careful collaborator that questions itself, flags uncertainty and thinks longer before responding, a strategy that seems long overdue. Ironically, that restraint may end up becoming one of the most valuable AI features of all. Follow Tom's Guide on Google News and add us as a preferred source to get our up-to-date news, analysis, and reviews in your feeds. Subscribe to Tom's Guide on YouTube and follow us on TikTok.
[15]
Anthropic's Claude Opus 4.8 is here with 3X cheaper fast mode and near-Mythos level alignment
Anthropic today released Claude Opus 4.8, an upgrade to its flagship model that ships at the same price as its predecessor, alongside a dramatically cheaper "fast mode" tier and a new feature that lets the model spawn hundreds of parallel subagents for codebase-scale work. The model is available immediately across Anthropic's surfaces -- claude.ai, Claude Code, the API, and Cowork -- at unchanged pricing: $5 per million input tokens and $25 per million output tokens. Developers can call it as . The headline efficiency story is fast mode. Anthropic has slashed the price of running Opus 4.8 in fast mode -- where the model produces tokens at roughly 2.5x normal speed -- to $10 per million input tokens and $50 per million output tokens, down from $30/$150 for Opus 4.7 That's a 3X reduction from the fast-mode pricing of previous models, and brings high-throughput inference within reach of latency-sensitive production workloads. Fast mode is available immediately in Claude Code via the command; API access is gated, with a waitlist at claude.com/fast-mode. In regular mode, Claude Opus 4.8 remains among the more expensive of leading frontier models, but still comes in under chief rival OpenAI's GPT-5.5. Frontier AI Model API Pricing Snapshot Modest gains over 4.7, but Mythos-class capabilities coming On benchmarks, Opus 4.8 is a step up rather than a leap. It scores 88.6% on SWE-bench Verified (vs. 87.6% for Opus 4.7), 69.2% on the harder SWE-bench Pro (vs. 64.3%), and 74.6% on Terminal-Bench 2.1 (vs. 66.1%). Anthropic itself characterizes the model as "a modest but tangible improvement on its predecessor." It beats GPT-5.5 regular across at least 12 benchmarks, including most knowledge-work, coding (issue-level), agentic tool-use, and long-context benchmarks. GPT-5.5 wins on terminal/CLI workflows and is roughly tied on web browsing and graduate-level science. The bigger signal sits in Anthropic's internal capability ladder: Opus 4.8 lands between Opus 4.7 and the more capable Claude Mythos Preview, which is currently restricted to a small number of organizations under Project Glasswing for cybersecurity work. Anthropic says it expects to bring "Mythos-class models to all our customers in the coming weeks" once additional cyber safeguards are in place. Several enterprise partners cited material gains. Databricks reported that Opus 4.8 unlocks "a step change in agentic reasoning" inside its Genie data agent, at "61% cheaper token cost than Opus 4.7" thanks to multimodal efficiency on PDFs and diagrams. Hebbia cited better citation precision and token efficiency on dense financial filings. Devin-maker Cognition said the release "translates directly into faster capability gains for engineers" and noted Opus 4.8 fixed comment-verbosity and tool-calling issues from 4.7. A computer-use vendor reported 84% on Online-Mind2Web, a jump over both Opus 4.7 and GPT-5.5. Dynamic workflows: hundreds of parallel subagents Alongside the model, Anthropic launched a research preview of dynamic workflows in Claude Code -- a feature designed for tasks too large for a single context window. Claude plans the work, spawns hundreds of parallel subagents, then verifies its own outputs before reporting back. Anthropic's example: a codebase-scale migration "across hundreds of thousands of lines of code from kickoff to merge, with the existing test suite as its bar." Dynamic workflows is available on Claude Code's Enterprise, Team, and Max plans. Two smaller additions round out the release: Honesty, and an "evaluation awareness" caveat Anthropic is leading with honesty as a headline trait. The company's alignment team reports Opus 4.8 is "around four times less likely than its predecessor to allow flaws in code it has written to pass unremarked," and that misaligned behavior rates are now "substantially lower than Opus 4.7, and similar to our best-aligned model, Claude Mythos Preview." Indeed, a bar chart released by Anthropic shows how close Opus 4.8 is to the still selectively released Mythos in terms of its misalignment (a lower score is better), coming in at roughly 1.9, down from 2.5 for Opus 4.7 and effectively tied with the more capable, restricted Mythos Preview. The score is based on roughly 2,600 simulated investigation sessions per model. The 244-page system card publicly released by Anthropic also goes into greater detail on specific categories of misalignment -- whether a model produces potentially harmful content around "military-grade weapons," "harmful sexual content", "disallowed cyberoffense", and "undermining liberal democracy," and again, across all of them, Opus 4.8 scores markedly better than 4.7 or Sonnet 4.6, and comes quite close to Mythos. Anthropic flags one finding it considers "the most concerning" from training: Opus 4.8 shows a growing tendency to reason explicitly about how its outputs will be graded, including in environments where it wasn't told it was being evaluated. In other words: the model knows it is likely being graded, and produces a response it thinks will earn it a good grade on the test, not one it would necessarily produce if it thought it wasn't being graded. Anthropic says this didn't translate into worse observable behavior -- Opus 4.8 shows fewer misleading task-success claims than prior models -- but calls it "a concerning trend that could complicate training in the future." Preliminary interpretability work also found unverbalized grader-related reasoning in roughly 5% of training episodes. Anthropic ran the model through a one-week live bug bounty for prompt injection -- a first -- and concluded Opus 4.8 sits between Opus 4.7 and Sonnet 4.6 on robustness, ahead of "all comparable frontier models" tested, with deployed safeguards bringing browser-use attack success rates to near zero. What's next? Anthropic teased two trajectories. Near-term: cheaper models that provide "many of the same capabilities as Opus." Longer-term: the Mythos-class models, which the company says represent higher intelligence than Opus but require stronger cyber safeguards before general release. For now, Opus 4.8 is positioned as the new go-to enterprise and development workhorse -- slightly smarter than 4.7, dramatically cheaper to run fast, and noticeably more honest about what it doesn't know.
[16]
Anthropic releases new model, Opus 4.8
Why it matters: The new model still lags the performance of Mythos, Anthropic's most advanced, but the company says Mythos-class models are expected "in the coming weeks." Driving the news: Anthropic said Opus 4.8 outperformed competitors on a number of key benchmarks, including agentic coding, reasoning, financial analysis and knowledge work. * The model has a fast mode that allows it to work at 2.5x the speed while being three times cheaper than it was for previous models. * The company is launching other updates alongside the model, including a "dynamic workflow" that lets Claude run multiple subagents at once and a control panel that lets users decide how much "effort" Claude puts into a response. Zoom in: Opus 4.8 scored well on "prosocial" traits like supporting user autonomy and acting in the user's best interest, achieving a similar level to Mythos, Anthropic's most powerful model, the company said. * This kind of metric may get more attention after a study on AI agents running simulated towns went viral. It showed Claude's agents were the least likely among the models to commit crime. Between the lines: Customers are increasingly looking for ways to use AI more affordably. * This launch from Anthropic emphasizes tools that allow customers to tailor their usage to how much they want to spend. * High effort settings, for example, will "think" for longer, while lower effort settings provide quicker responses that take up fewer tokens, letting users hit rate limits more slowly. Yes, but: Opus 4.8 still isn't Mythos. * The company first released Mythos to just a handful of partners due to security concerns. * A Mythos-class model should be available to all customers "in the coming weeks" following the development of stronger safeguards, according to Anthropic. What we're watching: How model updates account for the increased customer focus on price going forward.
[17]
Anthropic's Claude Opus 4.8 Is Here: Better AI Coding, Smarter Safety -- Same Huge Price
Opus 4.8's alignment scores are now comparable to Claude Mythos Preview, Anthropic's restricted frontier model, with rates of deceptive or misuse-friendly behavior substantially lower than its predecessor. Six weeks. That's how long it took Anthropic to go from Opus 4.7 to Opus 4.8. The new model is faster and smarter on benchmark tests, and ships with a suite of new features -- but the price didn't move: It's $5 per million input tokens and $25 per million output tokens, same as before. There's also a fast mode that runs the same model at 2.5 times the speed for $10 input and a whopping $50 output per million. Anthropic says that rate is now three times cheaper than what fast mode cost on previous models, which is a nice way of saying it was much more expensive before. SWE-bench Pro is probably the most important benchmark to watch and have an idea of how good this model is. It measures whether an AI can actually solve hard, multi-language software engineering problems drawn from real production codebases -- scored as a percentage of problems passing. On that test, Opus 4.8 hit 69.2%, up from 64.3% for Opus 4.7. OpenAI's GPT-5.5 scored 58.6%, and Google's Gemini 3.1 Pro trailed at 54.2%. For a model at the same price point, that's a meaningful jump. On Humanity's Last Exam -- expert-level questions across dozens of academic disciplines, scored as a percentage correct -- Opus 4.8 reached 49.8% without tools and 57.9% with them, ahead of all three rivals. OSWorld-Verified, which tests real-world computer use tasks like navigating software UIs, came in at 83.4%, nudging past Opus 4.7's score of 82.8%. The one loss: Terminal-Bench 2.1, which measures AI performance on command-line tasks. GPT-5.5 leads at 78.2%, while Opus 4.8 scores 74.6% -- better than Opus 4.7's 66.1% and ahead of Gemini's 70.3%, but second place is still ultimately losing. Five ways to think Anthropic is now letting users control how hard the model thinks. "High" is the default and handles most tasks well, while "Extra" -- called "xhigh" inside Claude Code -- spends more compute for harder problems. "Max" is the deep end. "Low" and "Medium" dedicate less tokens to the same task, saving some time in exchange for accuracy. The effort control sits alongside the model selector in claude.ai and Cowork, available on all plans. Anthropic says default high uses roughly the same tokens as Opus 4.7's default with better results -- which is either impressive engineering or good messaging, and probably both. It's also important to remember that Anthropic's new tokenizer for Opus uses more tokens per task. So Claude users are inevitably bound to burn a lot more money to get things done, should they choose Opus instead of Claude Sonnet -- a less capable model, but probably good enough for everyday tasks and complex problems that don't reach the level of frontier science or coding. Rate limits in Claude Code were also raised to absorb the higher token spend that the Extra and Max settings produce. Almost as safe as Claude Mythos Anthropic's alignment team said Opus 4.8 "reaches new highs on our measures of prosocial traits like supporting user autonomy and acting in the user's best interest." More concretely: deception rates and misuse-cooperation rates came in substantially lower than Opus 4.7, and comparable to Claude Mythos Preview -- Anthropic's most locked-down model. Opus 4.8 is also four times less likely than 4.7 to let bugs in its own code slide past without flagging them. That Mythos comparison deserves context. Mythos is a tier above Opus entirely -- Anthropic describes it as "larger and more intelligent than our Opus models." It currently exists only as a preview, accessible to a handful of vetted organizations doing cybersecurity work through Project Glasswing. The U.K.'s AI Security Institute found it could autonomously complete "The Last Ones," a 32-step corporate network attack simulation that usually takes human red teams 20 hours. That's why it's not yet for sale. Anthropic says stronger cyber safeguards are in progress, and expects to bring Mythos-class models to everyone "in the coming weeks." Also shipping today: dynamic workflows in Claude Code, in research preview. The feature lets Claude write its own orchestration scripts and spin up parallel subagents in a single session, verify their outputs, and report back -- just like what Hermes has been doing for a while now. Dynamic workflows are available for Enterprise, Team, and Max plan users, and Anthropic is upfront that they burn significantly more tokens than a standard Claude Code session. The widening price gap Anthropic's $5/$25 pricing looks very different next to what China has been doing lately. DeepSeek V4 Pro made its 75% discount permanent last week: $0.435 per million input tokens and $0.87 per million output tokens. Xiaomi MiMo V2.5 Pro runs at the same rates via providers like OpenRouter. Anthropic's fast mode costs $10 input and $50 output per million -- more expensive than standard Opus 4.8 itself, and roughly 57 times more per output token than DeepSeek V4 Pro. Corporations have already spent millions of dollars in inference on American models. Go wild with Opus and your enterprise may reach millions of dollars pretty quickly. Anthropic's answer to the price gap is quality and safety. On SWE-bench Pro, Opus 4.8 beats both Chinese models. On alignment, neither comes close to Anthropic's published benchmarks. Those things matter in production environments where a model quietly cooperating with bad inputs is an actual risk -- regulated industries, legal work, and anything where "it seemed fine" isn't an acceptable post-incident report. For everyone else, the gap is hard to ignore. We tested it We ran a quick coding test to create a 3D zombie game to see how Claude Opus 4.8 stacks against ChatGPT and DeepSeek, arguably its most popular competitors from the U.S. and China. We set Opus 4.8 on default high, GPT-5.5 on high effort, and DeepSeek V4 Pro on high effort -- three models, one prompt, no retries. GPT-5.5 finished first. Its game had no zombie visuals and no sound effects. It was fast, sure, but it missed the brief entirely. DeepSeek V4 Pro came in second with mouse movement, actual zombie characters, sound effects, solid mechanics, and a clean aesthetic. No complaints there. Opus 4.8 took roughly three times as long as GPT-5.5, but delivered the best splash screen, the best zombie designs, the best game mechanics, and decent sound effects. It was the slowest, but the best output. Still, that's probably not enough to justify using it over DeepSeek, given the cost gap. All the games are available on our Itch.io Profile. GPT-5.5 generated Zombie Typing, Opus generated Typing Dead, and DeepSeek v4 Pro generated a game without a name that takes you right into the action. Let's call it TypeSeek. A full comparative review is coming. For now: Claude Opus 4.8 codes better than GPT-5.5 and Opus 4.7 for this kind of task, at the same price Anthropic has charged since 4.7. Developers who were already paying $5 per million tokens just got a better model for free.
[18]
Anthropic's Claude Mythos AI Model Nearing Release After Raising Cybersecurity Alarms
Anthropic has limited Claude Mythos access through its Project Glasswing program. Anthropic said Thursday that it expects to expand access to its Claude Mythos AI models "in the coming weeks," signaling the company may soon move beyond the tightly restricted rollout surrounding the cybersecurity-focused system. "We're making swift progress on developing these safeguards and expect to be able to bring Mythos-class models to all our customers in the coming weeks," Anthropic said alongside the announcement of its new Opus 4.8 model release. The statement is Anthropic's clearest indication yet that wider availability for Mythos is in the works, following months of warnings from researchers, governments, and cybersecurity experts about the model's capabilities. Anthropic did not say what safeguards still need to be completed before Mythos receives a wider rollout, or whether all customers will receive the same level of access once the model becomes more broadly available. Anthropic did not immediately respond to a request for comment by Decrypt. Users on Myriad -- a prediction market platform operated by Decrypt's parent company, Dastan -- increasingly believe that the model will release by the end of June, giving that possibility a 44% chance as of that writing. That's up from just 17.5% this morning. Claude Mythos first surfaced in March after draft Anthropic blog materials leaked online. At the time, Anthropic described Mythos as "the most powerful AI model we've ever developed," and positioned it as a new tier above the company's cutting-edge Opus models. "Although Mythos is currently far ahead of any other AI model in cyber capabilities, it presages an upcoming wave of models that can exploit vulnerabilities in ways that far outpace the efforts of defenders," the company wrote. Anthropic has since restricted access to Mythos through Project Glasswing, a program that gives select technology companies, security researchers, and government partners access to the model under controlled conditions. The company has argued that Mythos could help defenders identify and patch software vulnerabilities before attackers exploit them. However, security researchers and government agencies warned that the same capabilities could speed up cyberattacks. Those concerns intensified after the U.K.'s AI Security Institute found Mythos could autonomously complete a 32-step simulated corporate network attack during testing. In April, Mozilla said Mythos identified 271 vulnerabilities in Firefox during internal security evaluations. Earlier this month, security startup Calif claimed a preview version of the Mythos model helped researchers develop an exploit chain targeting Apple's M5 Mac chips. The model has also become part of a broader debate inside the AI industry over how advanced systems should be released, and whether AI developers were using fear of an AI apocalypse to push their products. During an interview with the "Core Memory" podcast last month, OpenAI CEO Sam Altman accused Anthropic of using "fear-based marketing," arguing that warnings about cybersecurity risks could become a justification for limiting access to powerful AI systems.
[19]
Anthropic Says Its Latest Claude Model Is the 'Most Honest' Yet
The newest AI model model to hit the market is advertised as being remarkably honest. Anthropic has released Claude Opus 4.8, the next version of its large-sized series of Claude AI models, and allegedly its most honest model yet. Along with the new model, Anthropic is releasing new features for Claude, including the ability for people to control how much effort the model puts into any given task. It is also teasing the imminent release of Claude Mythos, an even larger and more powerful version of Claude. The company says that one of the most prominent improvements to Opus 4.8 is its honesty. It's a well-known fact that AI models can hallucinate and jump to conclusions, but according to Anthropic, early Opus 4.8 testers reported that the new model was "more likely to flag uncertainties about its work and less likely to make unsupported claims." Anthropic didn't offer any explanations regarding how they made the new model so much more truthful. According to Anthropic's internal testing, Opus 4.8 has state-of-the-art coding abilities. In SWE-Bench Pro, a benchmark for judging the performance of AI-powered coding agents, Opus 4.8 scored a record 69.2 percent, compared with its predecessor Claude Opus 4.7's 64.3 percent and rival OpenAI's GPT-5.5's 58.6 percent.
[20]
The Hidden Strategy Behind Anthropic's New Claude Opus 4.8 Release
Anthropic has released Opus 4.8, introducing updates designed to enhance its AI's performance in areas like coding accuracy, reasoning and task management. A notable feature is the addition of dynamic workflows, which break down complex operations into smaller, verifiable subtasks to streamline automation. According to Universe of AI, these updates reflect Anthropic's attempt to meet user demands and remain competitive against alternatives such as OpenAI's GPT-5.5, though the absence of the anticipated Mythos model adds complexity to understanding their broader direction. Learn how features like effort control and fast mode aim to address varied user requirements. Discover the practical applications of dynamic workflows in scenarios like security audits and large-scale code migrations. Gain insight into Anthropic's emphasis on model alignment and transparency as part of its strategy to rebuild user confidence and refine its position in the AI field. Key Enhancements in Opus 4.8 Opus 4.8 builds upon the foundation of its predecessor, Opus 4.7, by introducing targeted improvements to its core functionalities. These enhancements are designed to address user needs while maintaining a competitive edge. Key updates include: * Improved Coding Accuracy: The model demonstrates a fourfold improvement in identifying and correcting flaws within its own code, making it a more reliable tool for developers. * Enhanced Reasoning: Upgrades in logical processing aim to improve the model's decision-making and problem-solving capabilities, allowing it to handle complex scenarios with greater precision. * Better Task Handling: The model is now equipped to manage intricate and multi-layered tasks more efficiently, reducing the need for extensive user intervention. Despite these advancements, Anthropic has described Opus 4.8 as an evolutionary update rather than a new innovation. The pricing structure remains unchanged, with input tokens priced at $5 per million and output tokens at $25 per million. While this consistency may appeal to existing users, it does little to differentiate Opus 4.8 in an increasingly cost-sensitive market. Focusing on Honesty and User Alignment A notable aspect of Opus 4.8 is its emphasis on improving model honesty and alignment with user intent. Anthropic has implemented measures to reduce instances of deceptive or misleading outputs, aiming to enhance trust and reliability. The update also focuses on delivering responses that better align with user needs, reflecting a broader industry trend toward creating AI systems that are not only powerful but also trustworthy. These claims, however, require extensive real-world testing to validate their effectiveness. For Anthropic, rebuilding user trust is a critical objective as it seeks to compete in a market dominated by established players. The company's ability to deliver on these promises will likely influence its standing in the AI industry. Explore further guides and articles from our vast library that you may find relevant to your interests in Anthropic. Dynamic Workflows: Automating Complex Tasks One of the standout features of Opus 4.8 is the introduction of dynamic workflows, currently available as a research preview. This feature is designed to automate the orchestration of complex tasks by breaking them into smaller, manageable subtasks. It also uses multiple AI agents to ensure accuracy and reliability through integrated verification loops. Dynamic workflows are particularly well-suited for large-scale tasks, including: * Bug detection and resolution * Comprehensive security audits * Code migrations across platforms While this feature holds significant potential, its full capabilities and limitations remain unclear as it is still in the research phase. If successfully implemented, dynamic workflows could pave the way for more autonomous and scalable AI solutions, offering users a glimpse into the future of task automation. New Features: Effort Control and Fast Mode Opus 4.8 introduces two additional features aimed at enhancing user flexibility and operational efficiency: * Effort Control: This feature allows users to customize the depth and speed of the model's responses based on their specific requirements. Available across all pricing plans, it offers versatility for a wide range of use cases, from detailed analyses to quick summaries. * Fast Mode: Designed for time-sensitive tasks, Fast Mode delivers quicker responses at a premium cost of $10 per million input tokens and $50 per million output tokens. This option is particularly appealing to businesses and professionals who prioritize speed over cost. These features reflect Anthropic's commitment to providing customizable solutions that cater to diverse user needs. However, their real-world adoption and impact on user satisfaction remain to be seen, as they will depend on how effectively these tools integrate into existing workflows. Navigating a Competitive Market Anthropic faces significant challenges in a market where OpenAI's GPT-5.5 continues to dominate. The anticipated release of GPT-5.6 further intensifies the competition, raising the stakes for all players in the AI industry. Anthropic's decision to delay the release of its advanced Mythos model suggests a cautious and strategic approach. According to the company, Mythos is undergoing rigorous cybersecurity testing and may be tied to Project Glasswing, an initiative aimed at redefining Anthropic's market strategy. While Opus 4.8 introduces meaningful improvements, it falls short of being the fantastic leap that some users may have hoped for. Anthropic has acknowledged this, framing the update as a calculated step forward rather than a innovative change. The company's ability to deliver on its promises with Mythos will likely determine its future trajectory in the AI market. Looking Ahead Opus 4.8 represents Anthropic's effort to refine its AI offerings and address evolving user needs. The update brings notable enhancements in coding accuracy, reasoning and task handling, alongside innovative features like dynamic workflows and effort control. However, these improvements are incremental and their long-term impact on Anthropic's market position remains uncertain. As the company prepares for the eventual release of Mythos, its ability to deliver fantastic solutions will be critical. In a competitive market where innovation and trust are paramount, Opus 4.8 is a step in the right direction. Whether it will be enough to regain market share and compete effectively with industry leaders like OpenAI remains to be seen. Media Credit: Universe of AI Disclosure: Some of our articles include affiliate links. If you buy something through one of these links, Geeky Gadgets may earn an affiliate commission. Learn about our Disclosure Policy.
[21]
Anthropic Debuts More Honest AI Model As Competition Intensifies
Anthropic has rolled out Claude Opus 4.8, replacing Opus 4.7 with improved performance and new controls for consumers and developers. The update delivers stronger results across a range of evaluations. For example, it aims to address a general problem with AI models: "They sometimes jump to conclusions." Opus 4.8 flags uncertainties about its work and is less likely to make unsupported claims, the company claims. Alongside the update, Claude users now get an "effort" setting that trades speed for deeper processing, while Claude Code adds a research preview feature called dynamic workflows for handling larger tasks. Early testers saw improvements in reliability and decision-making for agent-style work. They emphasized efforts to reduce overconfident outputs without supporting evidence. For developers, the Messages API now supports system entries within the messages list. It allows teams to adjust instructions mid-run without disrupting prompt caching or routing changes through a user turn. The company also said Opus 4.8's fast mode runs 2.5x faster and is priced below earlier fast options, while the model now defaults to higher-effort processing. Claude Code rate limits have also been increased to support heavier token usage at "extra" and "max" effort levels. Anthropic added that it is working on future models that match Opus-level capability at lower cost, as well as a new class of models designed to exceed Opus in intelligence. As part of Project Glasswing, certain organizations are testing Claude Mythos Preview for cybersecurity work. Models of this capability level require stronger cyber safeguards before they can be released more broadly, the company added. "We're making swift progress on developing these safeguards and expect to be able to bring Mythos-class models to all our customers in the coming weeks," the company noted. Last week, Anthropic shared a sweeping update on Project Glasswing, saying its artificial intelligence-assisted security testing effort has already uncovered "more than 10,000 high-or critical-severity vulnerabilities" across widely used software systems. Photo Courtesy: Stockinq on Shutterstock.com This content was partially produced with the help of AI tools and was reviewed and published by Benzinga editors. Market News and Data brought to you by Benzinga APIs To add Benzinga News as your preferred source on Google, click here.
[22]
How to Automate Multi-Step Tasks with Claude Opus 4.8 Workflows
Anthropic's latest release, Claude Opus 4.8, builds on the strengths of its predecessor while addressing key limitations that users highlighted in earlier versions. One of the standout improvements is its enhanced ability to handle ambiguous queries, a feature designed to provide more nuanced and context-aware responses. The AI Advantage explores how this update positions Claude Opus 4.8 as a competitive alternative to models like GPT-4.5, particularly with its focus on enterprise-grade features such as dynamic workflows for automating multi-step tasks. In this breakdown, you'll gain insight into the model's expanded platform availability, its performance benchmarks against leading competitors and its practical applications for both individual users and large organizations. Explore how features like customizable effort levels and improved natural language understanding contribute to its adaptability across diverse use cases. Whether you're interested in its role in enterprise operations or its creative potential, this overview provides a clear understanding of what Claude Opus 4.8 brings to the table. What's New in Claude Opus 4.8? Claude Opus 4.8 directly responds to user feedback on the 4.7 model, which faced criticism for overly literal interpretations. The updated version refines its ability to handle ambiguous queries, bringing its performance closer to the well-regarded Claude 4.6 while introducing new features tailored to diverse user needs. Key updates include: * Enhanced ambiguity handling for nuanced and complex queries. * Broader availability across platforms, including web apps, cloud environments, and API integrations. * Improved user experience for both individual users and enterprise clients. These enhancements aim to deliver a seamless and adaptable AI experience, making sure that users across various industries can use its capabilities effectively. Dynamic Workflows: A Key Feature for Enterprises One of the standout features of Claude Opus 4.8 is its dynamic workflows, specifically designed for enterprise users and those on maximum-tier plans. This functionality automates complex, multi-step tasks by deploying sub-agents to handle specific components of larger projects, significantly improving efficiency. Examples of dynamic workflow applications include: * Large-scale code refactoring for software development teams. * Creation of custom dashboards tailored to organizational needs. * Streamlined project management for intricate workflows involving multiple stakeholders. While this feature offers substantial productivity gains, it requires careful resource management. Tasks involving hundreds of sub-agents can quickly consume cloud account limits and tokens, making monitoring essential. Despite these challenges, enterprise users have praised the feature for its ability to simplify and accelerate complex processes, making it a valuable tool for organizations handling large-scale operations. Check out more relevant guides from our extensive collection on Claude Opus that you might find useful. Performance and Benchmark Insights Claude Opus 4.8 demonstrates significant performance improvements over its predecessor, with benchmark tests showing it rivals GPT-4.5 in key areas such as natural language understanding and task execution. Real-world evaluations, including those conducted on platforms like Deepswe, highlight its ability to handle nuanced queries and generate creative outputs effectively. User feedback has further validated these advancements, with many praising the model's adaptability and enhanced creativity. These attributes make Claude Opus 4.8 a strong contender in the competitive AI landscape, particularly for users seeking a balance between precision and flexibility. How Does It Compare to Competitors? In the competitive AI market, Claude Opus 4.8 holds its ground against leading models. While GPT-4.5 excels in agentic applications such as task automation and decision-making, Claude Opus 4.8 offers a compelling alternative with its improved ambiguity handling and dynamic workflows. Gemini 3.5 Flash, on the other hand, has struggled to meet expectations in advanced use cases, leaving room for Claude Opus 4.8 to gain traction, particularly among enterprise users seeking robust and reliable AI solutions. Its ability to adapt to diverse user needs and deliver consistent results makes it a preferred choice for many. User Feedback and Preferences The reception to Claude Opus 4.8 has been overwhelmingly positive, with users highlighting several key improvements that set it apart from its predecessor. These include: * Enhanced creative outputs and more engaging responses. * Greater control through adjustable effort levels for specific tasks, allowing users to fine-tune outputs. * Improved adaptability to a wide range of user needs, from casual queries to enterprise-level operations. These enhancements reflect Anthropic's commitment to incorporating user feedback into its development process. By addressing user concerns and introducing meaningful updates, Claude Opus 4.8 continues to evolve in line with expectations, making sure its relevance in an ever-changing AI landscape. Broader AI Ecosystem Updates The release of Claude Opus 4.8 coincides with significant developments in the broader AI industry, highlighting the rapid pace of innovation and adoption. Key trends include: * DuckDuckGo reporting a 30% increase in app installs following updates to Google's AI search capabilities, demonstrating the growing demand for privacy-focused AI tools. * Google introducing personalized AI search results, sparking mixed reactions regarding their accuracy and relevance. * The rise of AI-powered productivity tools, such as Chat for PowerPoint, which simplify tasks like creating professional presentations and enhance workplace efficiency. These trends underscore the expanding influence of AI across both consumer and enterprise applications, with models like Claude Opus 4.8 playing a pivotal role in shaping the future of AI-driven technologies. Final Thoughts on Claude Opus 4.8 Claude Opus 4.8 represents a significant step forward for Anthropic, addressing past criticisms while introducing features that cater to a wide range of users. Its dynamic workflows, improved ambiguity handling, and competitive performance make it a strong contender in the evolving AI market. Whether you are an enterprise user managing complex workflows or an individual seeking advanced capabilities, Claude Opus 4.8 offers a versatile and compelling option. As the AI landscape continues to evolve, innovations like these will play a critical role in defining the future of AI-driven solutions, making sure that users can navigate increasingly complex challenges with confidence. Media Credit: The AI Advantage Disclosure: Some of our articles include affiliate links. If you buy something through one of these links, Geeky Gadgets may earn an affiliate commission. Learn about our Disclosure Policy.
[23]
Anthropic rolls out Claude Opus 4.8 with improved reasoning and coding performance
Anthropic has announced Claude Opus 4.8, the latest version of its flagship AI model. It builds on Opus 4.7 with improvements across coding, reasoning, and agentic tasks while maintaining the same pricing structure. The release also introduces new workflow controls, developer features, and efficiency upgrades across Claude products. Claude Opus 4.8 Claude Opus 4.8 is the newest version of Anthropic's flagship model. It is designed to improve performance in complex, multi-step tasks such as coding, research, and system-level workflows. The model shows improved results across benchmarks for coding, reasoning, agentic skills, and practical knowledge work. It also demonstrates better consistency in long-context tasks and improved handling of uncertainty during execution. Key features Claude Opus 4.8 focuses on improvements in reasoning accuracy, coding reliability, and workflow efficiency. It also reduces unsupported or overconfident outputs and improves response consistency in complex tasks. Core improvements include * Stronger performance across coding, reasoning, and agentic benchmarks * Around 4× fewer unflagged coding issues compared to Opus 4.7 * Better long-context and multi-step task stability * Improved uncertainty handling and response transparency Dynamic workflows in Claude Code Claude Code introduces Dynamic Workflows, designed for large-scale engineering tasks. It enables the model to break down complex work into structured steps and execute them in parallel. This system supports running hundreds of sub-agents in a single session, processing large codebases, and validating outputs using existing test frameworks before final results are delivered. It is intended for enterprise-scale tasks such as system migrations and large refactoring projects. Effort control system Claude Opus 4.8 adds effort controls in claude.ai and Cowork, allowing users to adjust compute usage depending on task complexity. This provides flexibility between speed, cost, and output quality. * Low effort: faster responses, lower token usage * High effort (default): balanced performance and quality * Extra / xhigh: deeper reasoning for complex tasks * Max: highest compute for demanding workflows Higher effort levels improve output quality but increase token usage, with updated rate limits to support heavier workloads. Faster and cheaper fast mode Fast mode has been optimized in Opus 4.8, delivering improved efficiency for large-scale and real-time workloads. It offers up to 2.5× faster performance and is around 3× cheaper than previous fast mode versions. Developer API updates The Messages API now supports system entries inside the messages array. This allows developers to update system instructions during task execution without breaking prompt caching. It enables runtime adjustments such as modifying permissions, token limits, or execution context without restarting workflows. Safety Anthropic reports improved safety and alignment in Claude Opus 4.8 compared to Opus 4.7. The model shows reduced unsupported or misleading outputs and improved handling of uncertainty. It also demonstrates stronger alignment with user intent and more stable behavior in complex workflows. Internal evaluations indicate lower rates of misalignment and improved transparency in reasoning tasks. Pricing and availability Claude Opus 4.8 is available globally across Claude products and via API under Anthropic, with pricing unchanged from the previous version. Standard pricing * $5 per million input tokens * $25 per million output tokens Fast mode pricing * $10 per million input tokens * $50 per million output tokens The model is available in claude.ai, Claude Code, Cowork, and API access under the identifier . What's next Anthropic describes Opus 4.8 as a steady upgrade while continuing work on more advanced AI systems. Future development includes lower-cost models with similar capabilities and a new class of higher-intelligence models beyond Opus. The company is also advancing "Project Glasswing," where Claude Mythos Preview is being tested for cybersecurity applications under controlled environments. These systems require stronger safety safeguards before broader release, with expansion expected in the coming weeks.
[24]
Anthropic Releases Opus 4.8 as Battle for AI Supremacy Hots Up
This is the quickest upgrade from Anthropic and could be a result of the recent upgrades announced by OpenAI and Google to their AI models or the lukewarm response to the earlier model. Anthropic has released Opus 4.8, the newest version of their most advanced publicly available model, barely 40 days after they released its predecessor, making it their fastest update yet. What caused this rush? Could be the recent upgrades announced by OpenAI and Google to their AI models or the lukewarm response to the earlier model. Coming to Opus 4.8, Anthropic says in a blog post: "Claude Opus 4.8. It builds on Opus 4.7 with improvements across benchmarks, and is a more effective collaborator. It's available today for the same price." As expected, the new release comes with best-in-class benchmark results as well as a renewed focus on how it manages bad or somewhat uncertain data. The company said that early testing results indicate that the new model is "more likely to flag uncertainties about its work and less likely to make unsupported claims." It is well-known that Anthropic and OpenAI are involved in a race to the finish with their respective IPOs and every new launch or release targets the possibility of higher valuations and better investor support. OpenAI had launched Codex recently while Google came out with the Gemini Flash model. Anthropic says Opus 4.8 offers several new features whereby users on claude.ai will have control over the amount of effort Claude puts into a task. The new "dynamic workflow" feature allows users to tackle large-scale challenges while the fast mode at 2.5x speed is three times cheaper than it was for previous models. This aspect of Opus 4.8 features right at the top in Anthropic's PR material and is quite obviously directed at reports around the cost of AI usage. A report by author Madison Mills referred to corporate America walking up to AI price tags that spoke of a CFO fretting about a "half-billion dollar accidental AI bill." Another area where Anthropic seems to be blowing its own trumpet with Opus 4.8 is on the honesty quotient. A general problem with all AI models is they jump to conclusions claiming to have made progress despite thin evidence. "Early testers report that Opus 4.8 is more likely to flag uncertainties about its work and less likely to make unsupported claims," they say. As for Dynamic Workflows that got launched in a limited research preview, the company says it designed the system to help larger models like Opus manage complex tasks across hundreds of parallel sub-agents. "Claude Code alongside Opus 4.8 can now carry out codebase-scale migrations across hundreds of thousands of lines of code from kick-off to merge, with the existing test suite as its bar," the post says. The blog also announced that the Messages API will henceforth accept system entries inside the messages array where developers can update Claude's instructions mid-task without breaking the prompt cache or routing the update through a user turn. This can be used in a given harness to update permissions, token budgets, or environment context as an agent runs. In a throw forward, Anthropic describes Opus 4.8 as a tangible improvement over its predecessor and says they are now working on developing releasing models that provide many of the same capabilities as Opus at a lower cost. "Not only that, but we plan to release a new class of model with even higher intelligence than Opus. As part of Project Glasswing, a small number of organizations are currently using Claude Mythos Preview for cybersecurity work. Models of this capability level require stronger cyber safeguards before they can be generally released, the post said.
[25]
Anthropic launches Claude Opus 4.8 with upgraded code and reasoning skills By Investing.com
Investing.com -- Artificial intelligence heavyweight Anthropic has officially rolled out its latest flagship model, Claude Opus 4.8, to users worldwide. The updated architecture builds directly upon the foundations of the previous Opus 4.7 iteration, delivering performance improvements across core technical benchmarks. This release introduces a suite of advanced features designed to grant developers and enterprise clients greater control over their computational workflows. Among the key upgrades is a new effort control setting on Claude.ai, which allows users to dynamically scale the model's depth of reasoning based on task complexity. Weigh what this news means for stocks by upgrading to InvestingPro - get 50% off today The safety and structural integrity of the software have also received significant attention in this generation. Internal alignment assessments indicate that Opus 4.8 is four times less likely than its predecessor to let coding errors pass unremarked. For large-scale enterprise engineering, the company is introducing a research preview of dynamic workflows within Claude Code. This specific feature enables the system to coordinate hundreds of parallel subagents to execute massive codebase migrations seamlessly. Financially, Anthropic is maintaining its existing pricing structure for standard usage to keep the technology accessible to its current developer base. However, the operational economics of the platform have improved, with the model's accelerated fast mode now running three times cheaper than previous versions. This major product milestone arrives at a critical juncture for the San Francisco-based artificial intelligence startup. The firm is reportedly closing a massive $30-plus billion pre-IPO funding round that could value the AI developer at over $900 billion. While leadership has not yet confirmed an official date to go public, the company is actively taking key steps to position itself for a public listing as early as 2026. This aggressive capital strategy comes as SpaceX and OpenAI similarly prepare to list their respective companies on public exchanges.
[26]
Anthropic to roll out Claude Mythos in coming weeks, launches Opus 4.8
May 28 (Reuters) - Anthropic on Thursday said it is launching an upgraded Claude Opus 4.8 model even as it works to release its powerful, market-moving Mythos model to all customers in the coming weeks. Mythos is the AI lab's large language model with advanced cybersecurity capabilities that have raised concerns among executives and world leaders about its impact. As a part of Project Glasswing, major tech firms such as Amazon, Microsoft and Apple are permitted to use Mythos for cybersecurity purposes. The company said Opus 4.8 will be available for the same price as its predecessor and shows improvements across benchmarks, particularly in honesty. Early testers report the model is more likely to flag uncertainties around its work and less likely to make unsupported claims, Anthropic added. "A general problem with AI models is that they sometimes jump to conclusions, confidently claiming to have made progress in their work despite the evidence being thin," the Claude maker said. (Reporting by Zaheer Kachwala in Bengaluru; Editing by Vijay Kishore)
[27]
Claude Opus 4.8 is uncertain: Here's why that's a good thing
Usually, when Anthropic announces a new model, I talk about how it is smarter, more token-friendly, or how much it outperforms the previous one on benchmarks. That is still the case with Claude Opus 4.8 but there is something else more interesting about it that I would rather get into. It has always been annoying to me when an AI chatbot has confidently continued to give answers when it is clear as day that it has no idea, Opus 4.8 might just be changing that. According to Anthropic, early testers have said that "Opus 4.8 is more likely to flag uncertainties about its work and less likely to make unsupported claims." This is something that we are still yet to see but if it is true, it is a much bigger story than any benchmark. Also read: "Don't build around one processor": DFI President Smit Shah India's drone supply chain Numbers back this up to some extent. Anthropic reports that Opus 4.8 is about four times less likely than its predecessor to fail to flag flaws in the code it has generated. That's no small accomplishment. Anyone who has dug through the code that an AI was absolutely certain of - only to find it plagued with subtle errors - knows the significance of such a metric all too well. Confidence that is unfounded is not an asset, it's a liability. The honesty issue in AI is not merely a technical one. What Anthropic describes here is a system that is aware of its limitations - that doesn't overpromise but instead tries to deliver on its promises. The entire journey of artificial intelligence has taken us in the direction of an appearance of certainty - perhaps, because it's what we have been seeking all along. But what good is certainty when you know next to nothing? An answer hedged in uncertainty may seem like a less confident response. However, the opposite just might be true. Also read: What happens to our brain in deep sleep? AI finally has an answer Opus 4.8 also remains an improvement over its predecessor in coding, reasoning, and agentic tasks. Furthermore, Opus 4.8 retains the price of Opus 4.7, which is $5 per million input tokens and $25 per million output tokens. Anthropic has also introduced a "fast mode," which operates twice as fast and costs three times less than its equivalent cost for the previous generation of models. Moreover, control over effort has also been enabled on claude.ai, and users can now increase or decrease effort spent by the model on particular tasks. None of that, though, is the story. The story is a model that, when it does not know something, says so. In a space that has been drowning in confident nonsense, that is the most interesting thing Anthropic has shipped in a while.
[28]
Claude Opus 4.8 is here but Anthropic is already teasing Mythos class AI models: What you should know
'One of the most prominent improvements in Opus 4.8 is its honesty,' the company said. Anthropic has announced that it is upgrading its Claude Opus with a new version dubbed, Opus 4.8. The company says the new model improves on Claude Opus 4.7 with better agentic coding capabilities, computer use, knowledge work and more. Alongside the new model, Anthropic has also announced several features designed to give users more control and help developers handle larger projects. However, the bigger news may be what comes next. Here is everything you need to know, including key capabilities and availability of Claude Opus 4.8. Anthropic Claude Opus 4.8: Key capabilities AI models can sometimes make confident claims even when they are not fully sure. Anthropic says Opus 4.8 is much less likely to do this and is better at pointing out possible mistakes in its own work. 'One of the most prominent improvements in Opus 4.8 is its honesty,' the company said in a blogpost. According to the company, the model is also better at identifying flaws in code and is less likely to overlook problems. Anthropic's internal testing found that Opus 4.8 is around four times less likely than its predecessor to leave coding issues unmentioned. Anthropic also claims that Opus 4.8 performs better than its predecessor on tests of coding, agentic skills, reasoning, and practical knowledge work tasks. The company has also introduced a new effort control option on claude.ai and Cowork. This lets users decide how much thinking time Claude should spend on a task. Higher effort settings can produce better answers, while lower settings provide faster responses and use fewer resources. Another new feature is Dynamic Workflows for Claude Code. 'Claude can plan the work and then run hundreds of parallel subagents in a single session (and with Opus 4.8, the agents can run for even longer). It then verifies its outputs before reporting back to the user,' the company explained. Also read: Apple's biggest Siri upgrade yet? AI chatbot, smarter search revamp and more tipped ahead of WWDC 2026 Claude Opus 4.8: Availability Claude Opus 4.8 is available globally starting today. Pricing remains unchanged from Opus 4.7, with input tokens costing $5 per million and output tokens costing $25 per million. Developers can access the model through Claude API. Also read: ChatGPT may soon let users share chats with colourful preview cards: What we know Mythos class AI models While Opus 4.8 is Anthropic's newest release, the company is already looking ahead. The company said it plans to release a new class of model with even higher intelligence than Opus. 'As part of Project Glasswing, a small number of organisations are currently using Claude Mythos Preview for cybersecurity work. Models of this capability level require stronger cyber safeguards before they can be generally released. We're making swift progress on developing these safeguards and expect to be able to bring Mythos-class models to all our customers in the coming weeks,' Anthropic said.
Share
Copy Link
Anthropic released Claude Opus 4.8 with significant improvements in honesty and coding capabilities, just over a month after version 4.7. The AI model is 4x less likely to make unsupported claims and shows notable gains in agentic coding. Meanwhile, the company plans to release its powerful Mythos-class AI model to all customers within weeks after restricting access through Project Glasswing.
Anthropic launched Claude Opus 4.8 on Thursday, delivering what the company describes as a "modest but tangible improvement" over its predecessor just over a month after releasing version 4.7
1
. The rapid iteration demonstrates how quickly the AI model landscape evolves, with the new large language model posting impressive benchmark gains particularly in agentic coding capabilities and what Anthropic calls "honesty in AI responses"4
.The most prominent improvement centers on reducing hallucination and unsupported claims. According to Anthropic, early testers found that Claude Opus 4.8 "is more likely to flag uncertainties about its work and less likely to make unsupported claims"
2
. The company's evaluations show the model is around 4x less likely than its predecessor to allow flaws in code it has written to pass unremarked3
. This addresses a persistent problem where AI models sometimes jump to conclusions, confidently presenting their work as making progress despite thin evidence5
.
Source: Decrypt
Beyond honesty improvements, Claude Opus 4.8 demonstrates substantial performance gains across multiple benchmarks. The model shows almost a 5 percentage point increase in agentic coding and over an 8 percentage point increase in agentic terminal coding compared to version 4.7
4
. Tom Pritchard, a staff engineer at Spotify who tested the model, noted that "Claude Opus 4.8 has noticeably better judgment. In Claude Code, it asks the right questions, catches its own mistakes, pushes back when a plan isn't sound, and builds up confidence around complex, multi-service explorations before making big changes"3
.The model also introduces new features that give users more control over computational resources. Users can now direct the amount of effort Claude puts into a task through Claude.ai and Claude Cowork. Higher-effort responses will use more tokens and potentially deliver better results, while lower settings respond faster and hit rate limits more slowly
1
. In coding tasks, the default high effort setting spends a similar number of tokens as Claude Code Opus 4.7's default level but with better performance3
.
Source: NYT
Anthropic is launching a feature called dynamic workflows in research preview, designed for large-scale tasks such as codebase migrations across hundreds of thousands of lines of code. With dynamic workflows, "Claude can plan the work and then run hundreds of parallel subagents in a single session (and with Opus 4.8, the agents can run for even longer). It then verifies its outputs before reporting back to the user"
2
. Rather than following a fixed plan, agents can adjust their priorities and tasks based on what they discover during their work3
. The debugging capabilities built into this feature connect directly to the model's improved honesty, as verifying results from hundreds of subagents requires reliable detection of uncertainty and failed outputs. Dynamic workflows will be available to Claude Code users on Enterprise, Team, and Max plans3
.
Source: Gizmodo
Related Stories
While releasing Claude Opus 4.8, Anthropic announced it is making "swift progress" on developing safeguards for its Claude Mythos Preview model and expects to bring Mythos-class AI model capabilities to all customers "in the coming weeks"
1
. The company has restricted access to Mythos through Project Glasswing, limiting it to a consortium of partners including major tech firms such as Amazon, Microsoft, and Apple for cybersecurity purposes5
.The restricted release stems from the model's enhanced cybersecurity capabilities, which Anthropic deemed advanced enough to warrant giving cybersecurity experts and major tech companies lead time to patch flaws found by the model. Security researchers have confirmed the model can find exploits much more quickly than human hackers, with Mozilla's latest Firefox version including more than 200 fixes identified by Mythos Preview
1
. However, the model's cost may limit its appeal to threat actors. Jake Williams, a cybersecurity researcher at IANS Research, found Mythos was 30 times as expensive in tests as the previous Opus model, placing it "outside the reach of many, including any commodity threat actors"1
.Darren Williams, founder and CEO of cybersecurity firm BlackFog, noted that "the window between a powerful model's release and broad adoption of defenses against it is always a vulnerable moment," highlighting the tension inherent in releasing such capable models even with safeguards
1
. The upcoming public release will reveal whether Mythos lives up to expectations and how effective Anthropic's cybersecurity guardrails prove in practice.🟡 compliments to the story and in right order.Summarized by
Navi
[4]
15 Apr 2026•Technology

24 Nov 2025•Business and Economy

06 Aug 2025•Technology

1
Policy and Regulation

2
Policy and Regulation

3
Business and Economy
