7 Sources
[1]
Claude's new model is more 'honest' when it messes up
Anthropic is releasing Claude Opus 4.8 on Thursday, and the company is touting the model's "honesty." According to Anthropic, it trains "all [its] models to be honest -- for instance, to avoid making claims that they can't support." But it notes that "a general problem with AI models is that they sometimes jump to conclusions, confidently presenting their work as making progress despite thin evidence." The AI lab claims that early testers have found that Opus 4.8 "is more likely to flag uncertainties about its work and less likely to make unsupported claims." In the company's evaluations, Opus 4.8 is "around 4x less likely than its predecessor to allow flaws in code it's written to pass unremarked." In addition to the honesty improvements, with Opus 4.8, users can direct the amount of effort Claude puts into a task. Higher-effort responses will use more tokens, giving users the option of lower-effort responses if they don't want to burn through their rate limits as quickly. Anthropic is also launching a feature called "dynamic workflows" in research preview, which the company says will let Claude "take on even bigger tasks." With dynamic workflows, "Claude can plan the work and then run hundreds of parallel subagents in a single session (and with Opus 4.8, the agents can run for even longer). It then verifies its outputs before reporting back to the user."
[2]
Anthropic launches Opus 4.8, with honesty as its killer feature
Follow ZDNET: Add us as a preferred source on Google. ZDNET's key takeaways * Claude Opus 4.8 promises more honest AI answers. * Dynamic workflows can run hundreds of Claude subagents. * Fast mode gets cheaper, while regular Opus pricing stays put. Diogenes was a fourth-century B.C. Greek philosopher known for his performance art. He is said to have roamed the streets of Athens in the middle of the day, carrying a lit lantern, crying out, "I am looking for an honest man." If that myth were modernized for the present day, we'd all be looking for an honest AI. Anthropic is both announcing and releasing Claude Opus 4.8, a large language model it believes might have satisfied Diogenes' quest. "One of the most prominent improvements in Opus 4.8 is its honesty," the company said Thursday in a blog post. Also: Your Claude agents can 'dream' now - how Anthropic's new feature works Now, perhaps, this new frontier model will behave itself better. Anthropic reports that Opus 4.8 is less likely to make unsupported claims. It's also more likely to tell you when it's uncertain of an answer. "This is borne out in our evaluations, which show that Opus 4.8 is around 4x less likely than its predecessor to allow flaws in code it's written to pass unremarked," the company said. In Claude Code, I found Opus 4.7 to be a substantial improvement over 4.6. While 4.6 would often misinterpret instructions or deliver erroneous results, Opus 4.7 regularly tells me that the way it first looked at a problem didn't work, and it's taking a different tactic. Recent project assignments have shown a much greater degree of understanding than with 4.6. So, given the jump in quality from 4.6 to 4.7, which was subjectively quite noticeable over many sessions, I'm hoping we'll see the same in the jump from 4.7 to 4.8. Also: The 5 myths of the agentic coding apocalypse It would seem this is the case, at least according to Tom Pritchard, staff engineer at Spotify, who has already tested Opus 4.8. "Claude Opus 4.8 has noticeably better judgment. In Claude Code, it asks the right questions, catches its own mistakes, pushes back when a plan isn't sound, and builds up confidence around complex, multi-service explorations before making big changes. It's a great model to build with," he said in the blog post. That'll be nice. A matter of effort Claude Code has had the ability to set effort since at least 4.7 (at least, that's when I first noticed it). Effort is essentially a measure of how much AI oomph the model throws at a problem, measured in tokens. In Opus 4.8, Claude Code's default of high effort produces what the company said is "the best overall balance of quality and user experience." In coding tasks, this default spends a similar number of tokens as the default level offered in Claude Code Opus 4.7, but with better performance. Also: Anthropic's Mythos is evolving faster than expected, reports AI safety agency This effort capability is now moving into Claude.ai and Cowork. With higher effort settings, Claude will "think more frequently and more deeply." With a lower effort set, Claude responds faster, and users will find their AI experiences are throttled less. Dynamic workflows At launch time, this feature hasn't been fully defined, but it's interesting. Launching as a research preview, Opus 4.8 can plan work, run hundreds of parallel subagents in one session, and verify outputs before reporting back. This feature is designed for very large-scale tasks. The example Anthropic gave was codebase-scale migrations across hundreds of thousands of lines. It seems like Claude can generate and manage the workflow as the task evolves. Rather than running off a fixed plan, agents can change their priorities and tasks based on what they find while doing their work. This could be powerful. Also: Anthropic's new Claude Security tool scans your codebase for flaws - and helps you decide what to fix first Anthropic said that the subagents verify their results before reporting back to users. If Claude is coordinating hundreds of subagents, users need it to notice uncertainty, bad assumptions, and failed outputs. Interestingly, this connects right back to the honesty claims discussed at the beginning of the article. If Claude is going to launch "thousands of agents," getting back reliable and vetted results really matters, because there's no way human oversight can keep up on its own. The dynamic workflows capability will be available to Claude Code users on Enterprise, Team, and Max plans. Price and availability Anthropic said Claude Opus 4.8 is available everywhere Thursday through Claude and the Claude API as claude-opus-4-8. In practice, especially if you're using Claude Code, you might find that you'll need to restart your session or wait a day or so for Claude Code to notice it. When Anthropic jumped Opus 4.6 to 4.7, I kept asking Claude Code what model it was using, and it wasn't until the next morning that it stopped reporting Opus 4.6 and started reporting Opus 4.7. Overall pricing hasn't changed since Opus 4.7. Regular token-based pricing remains $5 per million input tokens and $25 per million output tokens. Also: This exec offers 4 ways to be a successful innovator in the age of agentic AI The company said that fast mode, which enables the model to work at 2.5 times the speed of normal mode, will be "three times cheaper than it was for previous models." While I don't spend on fast mode, I do see the appeal. I've watched a lot of YouTube, waiting for Claude Code to respond to a prompt, hour after hour. Would you rather have Claude respond faster with lower effort or think longer with higher effort? Let us know in the comments below. You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, on Bluesky at @DavidGewirtz.com, and on YouTube at YouTube.com/DavidGewirtzTV.
[3]
Anthropic upgrades Claude with new Opus 4.8 model, here's what's new
Anthropic has released its latest AI model with Claude Opus 4.8. The new version arrives less than two months after the previous model upgrade, ramping up Anthropic's upgrade cadence. Opus 4.8 is a 'more effective collaborator,' Anthropic says Anthropic describes Claude Opus 4.8 as having "sharper judgement, more honesty about its progress, and the ability to work independently for longer than its predecessors." "Early testers report that Opus 4.8 is more likely to flag uncertainties about its work and less likely to make unsupported claims," according to Anthropic. The company also notes that pricing remains the same between Claude Opus 4.7 and Opus 4.8. Here are some benchmarks between 4.7 and 4.8 models shared by Anthropic: * Agentic coding score increases from 64.3% to 69.2% * Multidisciplinary reasoning with tools jumps from 54.7% to 57.9% * Agentic computer use moves from 82.8% to 83.4% * Knowledge work score increases from 1753 to 1890 * Agentic financial analysis improves from 51.5% to 53.9% Anthropic says Opus 4.8 fast mode is roughly 2.5 times quicker now. The upgraded fast mode also costs three times less than before, the company says. In addition to Opus 4.8's release, Anthropic is launching three more updates today: * Dynamic workflows. This new feature, available in research preview, allows Claude to take on even bigger tasks in Claude Code. * Effort control in claude.ai and Cowork. A new control alongside the model selector lets users choose how much effort Claude puts into a response. * The Messages API now accepts system entries inside the messages array. Developers can update Claude's instructions mid-task without breaking the prompt cache or routing the update through a user turn. You can learn more much about these features and the new Claude Opus 4.8 upgrade from Anthropic's announcement. Anthropic's previous Opus model upgrade arrived on April 16 with the release of version 4.7. Today's release comes just six weeks later. The new model is available globally from today.
[4]
Claude Opus 4.8 just launched -- and Anthropic says it's far less likely to 'fake' answers
Anthropic has officially launched Claude Opus 4.8, the latest version of its flagship AI mode. With this rollout the company promises that it may have fixed one of AI's biggest flaws. According to Anthropic, Claude Opus 4.8 is designed to be more honest, more thoughtful and significantly less likely to confidently pretend it knows something when it doesn't. The company says Opus 4.8 is around four times less likely than its predecessor to allow flaws in code it writes to pass without warning the user. At a time when AI companies are racing to make models faster, more agentic and more autonomous, Anthropic appears to be focusing on the one thing that seems to be ignored: AI should know when it might be wrong and admit it. Claude Opus 4.8 is being positioned as a better collaborator Anthropic says early testers described working with Opus 4.8 as feeling more like working with a real collaborator than previous models. According to the company, testers said the model asks better questions, catches its own mistakes and pushes back when plans don't make sense instead of simply agreeing with users. One of the biggest criticisms surrounding chatbots is that many of them have become "yes-men," often validating bad ideas, weak assumptions or outright incorrect information rather than challenging users. Anthropic appears to be leaning heavily into the opposite approach. The company also says Opus 4.8 showed major improvements in legal reasoning, coding, browser agents and long-form analysis tasks, with several early access partners claiming it outperformed previous Opus models and even GPT-5.5 in some agentic workflows. Users can now control how much the model thinks Alongside the new model, Anthropic is launching a new effort control system inside Claude. Users can now decide how much thinking Claude puts into a task. Higher effort modes allow the AI to spend more time reasoning through responses, while lower effort settings prioritize speed and lower token usage. Anthropic says Opus 4.8 defaults to "high effort" because it offers the best balance between quality and usability. Instead of chatbots instantly firing back responses, companies increasingly seem focused on making models pause, reason and verify information before answering. OpenAI, Google and Anthropic are all moving in this direction as AI systems become more autonomous and agent-driven. Claude can now run more dynamic workflows In addition to the new model, a new research preview feature called Dynamic Workflows was also announced. The feature allows Claude to launch hundreds of parallel subagents during a single task, verify the work and combine the results before responding back to the user. According to Anthropic, Claude Code can now handle massive codebase migrations involving hundreds of thousands of lines of code from start to finish. The future increasingly looks less like a chatbot answering one prompt at a time and more like autonomous systems quietly coordinating multiple AI processes behind the scenes. Anthropic hints at something even bigger coming next by revealing it's already working on "a new class of model with even higher intelligence than Opus." The company says some organizations are already testing a system called Claude Mythos Preview for cybersecurity work, though Anthropic claims models at that level will require stronger safeguards before broader release. Looking ahead At the same time AI companies continue pushing toward more powerful systems, they're also increasingly acknowledging that the next generation of models may carry risks requiring entirely new safety standards. It's clear that Anthropic is trying to position AI like a careful collaborator that questions itself, flags uncertainty and thinks longer before responding, a strategy that seems long overdue. Ironically, that restraint may end up becoming one of the most valuable AI features of all. Follow Tom's Guide on Google News and add us as a preferred source to get our up-to-date news, analysis, and reviews in your feeds. Subscribe to Tom's Guide on YouTube and follow us on TikTok.
[5]
Anthropic releases new model, Opus 4.8
Why it matters: The new model still lags the performance of Mythos, Anthropic's most advanced, but the company says Mythos-class models are expected "in the coming weeks." Driving the news: Anthropic said Opus 4.8 outperformed competitors on a number of key benchmarks, including agentic coding, reasoning, financial analysis and knowledge work. * The model has a fast mode that allows it to work at 2.5x the speed while being three times cheaper than it was for previous models. * The company is launching other updates alongside the model, including a "dynamic workflow" that lets Claude run multiple subagents at once and a control panel that lets users decide how much "effort" Claude puts into a response. Zoom in: Opus 4.8 scored well on "prosocial" traits like supporting user autonomy and acting in the user's best interest, achieving a similar level to Mythos, Anthropic's most powerful model, the company said. * This kind of metric may get more attention after a study on AI agents running simulated towns went viral. It showed Claude's agents were the least likely among the models to commit crime. Between the lines: Customers are increasingly looking for ways to use AI more affordably. * This launch from Anthropic emphasizes tools that allow customers to tailor their usage to how much they want to spend. * High effort settings, for example, will "think" for longer, while lower effort settings provide quicker responses that take up fewer tokens, letting users hit rate limits more slowly. Yes, but: Opus 4.8 still isn't Mythos. * The company first released Mythos to just a handful of partners due to security concerns. * A Mythos-class model should be available to all customers "in the coming weeks" following the development of stronger safeguards, according to Anthropic. What we're watching: How model updates account for the increased customer focus on price going forward.
[6]
Anthropic Says Its Latest Claude Model Is the 'Most Honest' Yet
The newest AI model model to hit the market is advertised as being remarkably honest. Anthropic has released Claude Opus 4.8, the next version of its large-sized series of Claude AI models, and allegedly its most honest model yet. Along with the new model, Anthropic is releasing new features for Claude, including the ability for people to control how much effort the model puts into any given task. It is also teasing the imminent release of Claude Mythos, an even larger and more powerful version of Claude. The company says that one of the most prominent improvements to Opus 4.8 is its honesty. It's a well-known fact that AI models can hallucinate and jump to conclusions, but according to Anthropic, early Opus 4.8 testers reported that the new model was "more likely to flag uncertainties about its work and less likely to make unsupported claims." Anthropic didn't offer any explanations regarding how they made the new model so much more truthful. According to Anthropic's internal testing, Opus 4.8 has state-of-the-art coding abilities. In SWE-Bench Pro, a benchmark for judging the performance of AI-powered coding agents, Opus 4.8 scored a record 69.2 percent, compared with its predecessor Claude Opus 4.7's 64.3 percent and rival OpenAI's GPT-5.5's 58.6 percent.
[7]
Anthropic launches Claude Opus 4.8 with upgraded code and reasoning skills By Investing.com
Investing.com -- Artificial intelligence heavyweight Anthropic has officially rolled out its latest flagship model, Claude Opus 4.8, to users worldwide. The updated architecture builds directly upon the foundations of the previous Opus 4.7 iteration, delivering performance improvements across core technical benchmarks. This release introduces a suite of advanced features designed to grant developers and enterprise clients greater control over their computational workflows. Among the key upgrades is a new effort control setting on Claude.ai, which allows users to dynamically scale the model's depth of reasoning based on task complexity. Weigh what this news means for stocks by upgrading to InvestingPro - get 50% off today The safety and structural integrity of the software have also received significant attention in this generation. Internal alignment assessments indicate that Opus 4.8 is four times less likely than its predecessor to let coding errors pass unremarked. For large-scale enterprise engineering, the company is introducing a research preview of dynamic workflows within Claude Code. This specific feature enables the system to coordinate hundreds of parallel subagents to execute massive codebase migrations seamlessly. Financially, Anthropic is maintaining its existing pricing structure for standard usage to keep the technology accessible to its current developer base. However, the operational economics of the platform have improved, with the model's accelerated fast mode now running three times cheaper than previous versions. This major product milestone arrives at a critical juncture for the San Francisco-based artificial intelligence startup. The firm is reportedly closing a massive $30-plus billion pre-IPO funding round that could value the AI developer at over $900 billion. While leadership has not yet confirmed an official date to go public, the company is actively taking key steps to position itself for a public listing as early as 2026. This aggressive capital strategy comes as SpaceX and OpenAI similarly prepare to list their respective companies on public exchanges.
Share
Copy Link
Anthropic launched Claude Opus 4.8, positioning honesty as its standout capability. The AI model is four times less likely to let code flaws pass unremarked and flags uncertainties more readily. The release includes dynamic workflows that run hundreds of parallel subagents and an effort control system that lets users balance response quality with token usage.
Anthropic released Claude Opus 4.8 on Thursday, marking a significant shift in how AI companies approach model development. Rather than racing solely for speed and capability, Anthropic is prioritizing what it calls "honesty"—the ability of the AI model to acknowledge uncertainty and avoid making unsupported claims
1
. According to the company, this large language model addresses one of AI's most persistent problems: the tendency to confidently present work as making progress despite thin evidence2
.
Source: Tom's Guide
Early testers have found that Claude Opus 4.8 is more likely to flag uncertainties about its work and less likely to make claims it cannot support. In Anthropic's evaluations, the model is around four times less likely than its predecessor to allow flaws in code it has written to pass unremarked
4
. Tom Pritchard, a staff engineer at Spotify who tested the model, described Claude Opus 4.8 as having "noticeably better judgment," noting that "it asks the right questions, catches its own mistakes, pushes back when a plan isn't sound, and builds up confidence around complex, multi-service explorations before making big changes"2
.The AI model with enhanced honesty also delivers measurable improvements across multiple metrics. Agentic coding scores increased from 64.3% to 69.2%, while multidisciplinary reasoning with tools jumped from 54.7% to 57.9%
3
. Knowledge work scores rose from 1753 to 1890, and financial analysis capabilities improved from 51.5% to 53.9%3
. These gains suggest the model's debugging capabilities and analytical judgment have strengthened substantially since the previous version.
Source: Axios
Anthropic emphasized that Claude Opus 4.8 outperformed competitors on several key benchmarks, including agentic coding, reasoning, and knowledge work tasks
5
. The model also scored well on "prosocial" traits like supporting user autonomy and acting in the user's best interest, achieving a level similar to Mythos, Anthropic's most powerful model5
.Alongside the model release, Anthropic launched an effort control system in Claude.ai and Cowork, allowing users to direct how much computational effort Claude puts into a task
1
. Higher-effort responses use more tokens, enabling the AI to think more frequently and deeply. Lower-effort settings prioritize speed and help users avoid hitting rate limits as quickly2
. This approach reflects growing customer demand for more affordable AI usage options, as organizations seek ways to tailor their spending based on task complexity5
.The default high-effort setting in Claude Opus 4.8 offers what Anthropic describes as "the best overall balance of quality and user experience"
2
. Meanwhile, the fast mode now operates at 2.5 times the speed while costing three times less than previous models3
. Regular Opus pricing remains unchanged from version 4.73
.Anthropic introduced dynamic workflows in research preview, a feature that allows Claude to handle substantially larger projects. With this capability, Claude can plan work and run hundreds of parallel subagents in a single session, then verify outputs before reporting back to the user
1
. The company cited codebase migrations across hundreds of thousands of lines as an example use case2
.
Source: The Verge
What distinguishes this approach is that agents can adjust their priorities and tasks based on what they discover during execution, rather than following a fixed plan. The subagents verify their results before reporting back, which becomes critical when coordinating at scale where human oversight cannot keep pace
2
. This connects directly to the model's enhanced honesty features—when launching thousands of agents, reliable and vetted results matter significantly. Dynamic workflows will be available to Claude Code users on Enterprise, Team, and Max plans2
.Related Stories
The Messages API now accepts system entries inside the messages array, enabling developers to update Claude's instructions mid-task without breaking the prompt cache or routing updates through a user turn
3
. This technical improvement gives developers more flexibility in managing complex workflows and adjusting model behavior dynamically.While Claude Opus 4.8 represents a substantial upgrade, it still lags behind Mythos, Anthropic's most advanced model. The company revealed that some organizations are already testing Claude Mythos Preview for cybersecurity applications
4
. However, Anthropic states that models at that intelligence level require stronger AI safeguards before broader release4
. A Mythos-class model is expected to become available to all customers "in the coming weeks" following the development of enhanced safety measures5
.The release arrives just six weeks after Claude Opus 4.7 launched on April 16, indicating an accelerated upgrade cadence from Anthropic
3
. The model is available globally through Claude and the Claude API as claude-opus-4-82
. As AI systems become more autonomous and agent-driven, the emphasis on models that pause, reason, and verify information before responding may prove as valuable as raw capability gains.Summarized by
Navi
[4]
15 Apr 2026•Technology

06 Aug 2025•Technology

24 Nov 2025•Business and Economy

1
Policy and Regulation

2
Science and Research

3
Technology
