38 Sources
[1]
Anthropic says its new AI model "maintained focus" for 30 hours on multistep tasks
On Monday, Anthropic released Claude Sonnet 4.5, a new AI language model the company calls its "most capable model to date," with improved coding and computer use capabilities. The company also revealed Claude Code 2.0, a command-line AI agent for developers, and the Claude Agent SDK, which is a tool developers can use to build their own AI coding agents. Anthropic says it has witnessed Sonnet 4.5 working continuously on the same project "for more than 30 hours on complex, multi-step tasks," though the company did not provide specific details about the tasks. In the past, agentic models have been known to typically lose coherence over long periods of time as errors accumulate and context windows (a type of short-term memory for the model) fill up. In the past, Anthropic has mentioned that previous Claude 4.0 models have played Pokémon for over 24 hours or refactored code for seven hours. To understand why Sonnet exists, you need to know a bit about how AI language models work. Traditionally, Anthropic has produced three differently sized AI models in the Claude family: Haiku (the smallest), Sonnet (mid-range), and Opus (the largest). Anthropic last updated Haiku in November 2024 (to 3.5), Sonnet this past May (to 4.0), and Opus in August (to 4.1). Model size in parameters, which are values stored in its neural network, is roughly proportional to overall contextual depth (the number of multidimensional connections between concepts, which you might call "knowledge") and better problem-solving capability, but larger models are also slower and more expensive to run. So AI companies always seek a sweet spot in the middle with reasonable performance-cost trade-offs. Claude Sonnet has filled that role for Anthropic quite well for several years now. 
Claude is popular with some software developers thanks to Claude Code, and Anthropic is confident about the latest version of Sonnet's coding capability: "Claude Sonnet 4.5 is the best coding model in the world," the company boasts on its website. "It's the strongest model for building complex agents. It's the best model at using computers. And it shows substantial gains in reasoning and math." Anthropic backs up those claims with strong benchmark performance. The Sonnet 4.5 model achieved a reported 77.2 percent score on SWE-bench Verified, a benchmark that attempts to measure real-world software coding abilities, and it currently leads the OSWorld benchmark, which tests AI models on real-world computer tasks, at 61.4 percent. The SWE-bench Verified score beats OpenAI's GPT-5 Codex (which scored 74.5 percent) and Google's Gemini 2.5 Pro (67.2 percent). In other testing, Claude Sonnet 4.5 showed gains across multiple other evaluations such as AIME 2024, a mathematics competition benchmark, and MMMLU, which tests subject knowledge across 14 non-English languages. On finance-specific tasks measured by Vals AI's Finance Agent benchmark, which is a relatively new benchmark that "tests the ability of agents to perform tasks expected of an entry-level financial analyst," Sonnet 4.5 scored 92 percent. Sonnet 4.5 also reportedly demonstrated improved computer use capabilities compared to its predecessor in testing. Four months ago, Claude Sonnet 4 scored 42.2 percent on OSWorld. The new version increases that score to 61.4 percent. Anthropic uses these capabilities in its Claude for Chrome extension. Similar to OpenAI's ChatGPT Agent, Claude's extension can navigate websites, fill spreadsheets, and complete other browser-based tasks with various degrees of success. As always, it's worth noting that AI benchmarks can be gamed easily, poorly designed, or suffer from dataset contamination (a scenario where the model is inadvertently trained on answers in the benchmark). 
So always take any benchmarks with a grain of salt until they are independently verified. Even with a skeptical eye on the self-reported numbers, it seems that Sonnet 4.5 represents a solid step up from 4.0, and given Anthropic's history of delivering more capable models over time, we have no particular reason to doubt that. Simon Willison, a veteran software developer and frequent source of independent expert perspective on AI models for Ars Technica, wrote about Sonnet 4.5 on his blog today. He seems generally impressed: "Anthropic gave me access to a preview version of a 'new model' over the weekend which turned out to be Sonnet 4.5," he wrote. "My initial impressions were that it felt like a better model for code than GPT-5-Codex, which has been my preferred coding model since it launched a few weeks ago. This space moves so fast -- Gemini 3 is rumored to land soon so who knows how long Sonnet 4.5 will continue to hold the 'best coding model' crown." Claude 4.5 is available everywhere today. Through the API, the model maintains the same pricing as Claude Sonnet 4, at $3 per million input tokens and $15 per million output tokens. Developers can access it through the Claude API using "claude-sonnet-4-5" as the model identifier.
Other new features
Some ancillary features of the Claude family got some upgrades today, too. For example, Anthropic added code execution and file creation directly within conversations for users of Claude's web interface and dedicated apps. Along those lines, users can now generate spreadsheets, slides, and documents without leaving the chat interface. The company also released a five-day research preview called "Imagine with Claude" for Max subscribers, which demonstrates the model generating software in real time. Anthropic describes it as "a fun demonstration showing what Claude Sonnet 4.5 can do" when combined with appropriate infrastructure. 
As mentioned above, the command-line development tool Claude Code also received several updates today, alongside the new model. The company added checkpoints that save progress and allow users to roll back to previous states, refreshed the terminal interface, and shipped a native VS Code extension. The Claude API also gains a new context editing feature and memory tool for handling longer-running agent tasks. Right now, AI companies are particularly clinging to software development benchmarks as proof of AI assistant capability because progress in other fields is difficult to objectively measure, and it's a domain where LLMs have arguably shown high utility compared to other fields that might suffer from confabulations. But people still use AI chatbots like Claude as general assistants. And given the recent news about troubles with some users going down fantasy rabbit holes with AI chatbots, it's perhaps more notable than usual that Anthropic claims that Claude Sonnet 4.5 shows reduced "sycophancy, deception, power-seeking, and the tendency to encourage delusional thinking" compared to previous models. Sycophancy, in particular, is the tendency for an AI model to praise the user's ideas, even if they are wrong or potentially dangerous. We could quibble with how Anthropic frames some of those AI output behaviors through a decidedly anthropomorphic lens, as we have in the past, but overall, attempts to reduce sycophancy are welcome news in a world that has been increasingly turning to chatbots for far more than just coding assistance.
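The API pricing quoted in this article ($3 per million input tokens, $15 per million output tokens) can be turned into a rough per-request estimate. The following is a minimal sketch, not an official calculator: the function name and the example token counts are hypothetical, and it assumes flat per-token pricing with no discounts, caching, or batch rates.

```python
# Prices as quoted in the article, in US dollars per million tokens.
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in US dollars (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 200,000 input tokens and 50,000 output tokens.
    print(f"${estimate_cost_usd(200_000, 50_000):.2f}")  # prints $1.35
```

Because output tokens cost five times as much as input tokens at these rates, long-running agent sessions that generate a lot of code are likely to be dominated by output cost.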
[2]
Anthropic launches Claude Sonnet 4.5, its best AI model for coding | TechCrunch
On Monday, Anthropic launched a new frontier model called Claude Sonnet 4.5, which it claims offers state-of-the-art performance on coding benchmarks. The company says Claude Sonnet 4.5 is capable of building "production-ready" applications, a leap in reliability from previous AI models. Claude Sonnet 4.5 will be available via the Claude API and in the Claude.ai chatbot. The pricing for developers is the same as Claude Sonnet 4: $3 per million input tokens (roughly 750,000 words, or more than the entire Lord of the Rings series) and $15 per million output tokens. In the last year, Anthropic's AI models have emerged as a favorite among developers and enterprises, in large part due to their strong performance on software engineering tasks. Apple and Meta reportedly use Claude AI models internally, and Anthropic has made a significant business selling API access to AI coding applications such as Cursor, Windsurf, and Replit. Recently, OpenAI's GPT-5 has challenged Anthropic's dominance in the space, outperforming Claude models on a variety of coding benchmarks. Anthropic says Claude Sonnet 4.5 offers industry-leading performance on several coding benchmarks, including SWE-Bench Verified. However, Anthropic AI researcher David Hershey tells TechCrunch that it is hard to capture Claude Sonnet 4.5's performance on benchmarks alone. Hershey says he's seen Claude Sonnet 4.5 code autonomously for up to 30 hours during early trials with some enterprise customers. In that time, he watched the AI model not only build an application, but stand up database services, purchase domain names, and perform a SOC 2 audit to make sure the product was secure. In a statement shared with TechCrunch, Cursor CEO Michael Truell said Claude Sonnet 4.5 represents state-of-the-art coding performance, specifically on longer horizon tasks. Windsurf CEO Jeff Wang said in a statement that Claude Sonnet 4.5 represents a "new generation of coding models." 
Anthropic also claims that Claude Sonnet 4.5 is its most aligned frontier AI model yet, with lower rates of sycophancy and deception than previous models. The company says it has also improved Claude's resistance to prompt injection attacks. Alongside the launch of Claude Sonnet 4.5, Anthropic is also launching the Claude Agent SDK. The company says this is the same infrastructure that powers Claude Code, and can be used to help developers build their own agents. Anthropic is also releasing a temporary research preview called "Imagine with Claude" for Max subscribers, which shows the AI model generating software on the fly. The company says the model will respond to user requests in real time, with no predetermined functionality or prewritten code.
[3]
Anthropic's New Claude Sonnet 4.5 AI Model Promises to Be a Coding Beast
Imad is a senior reporter covering Google and internet culture. Hailing from Texas, Imad started his journalism career in 2013 and has amassed bylines with The New York Times, The Washington Post, ESPN, Tom's Guide and Wired, among others. Claude Sonnet 4.5 is out today and brings major coding improvements, including checkpoints, code execution, file creation and a refreshed terminal to the AI model, Anthropic said in a press release on Monday. Claude Code gains a much-requested feature with the addition of checkpoints, allowing coders to save their progress or roll back to a previous state. Claude can now execute code and create files, such as spreadsheets, slides and documents. On the agent side, the Claude API lets agents run longer and handle more complex tasks. And with the Claude Agent SDK, developers can make their own AI agents that can better manage memory, handle permissions and work with subagents to solve tasks. Claude Sonnet 4.5 is the "most aligned frontier model we've ever released," according to Anthropic. This means that Sonnet 4.5 has seen major improvements in "sycophancy, deception, power-seeking and the tendency to encourage delusional thinking." Anthropic says it's also made "considerable progress" in defending against prompt injection attacks, when bad actors use specially crafted language to trick a model into doing things it wasn't meant to do. "Claude Sonnet 4.5 resets our expectations -- it handles 30+ hours of autonomous coding, freeing our engineers to tackle months of complex architectural work in dramatically less time while maintaining coherence across massive codebases," Sean Ward, CEO of iGent AI, said in a press release. Claude Sonnet 4.5 comes as the AI race is heating up. While much attention has been given to OpenAI's ChatGPT and Google's Gemini, players like Anthropic, too, have been pushing AI technology forward. 
Fans appreciate Claude for its coding ability and the chatbot's conversational nature. In GDPval, a benchmarking tool made by OpenAI, Claude Opus 4.1 was the most performant model, beating GPT-5. It could be why OpenAI was caught using Claude Code and subsequently had its access removed for violating Anthropic's terms of service. OpenAI responded by saying it's standard in the industry to evaluate competing models for accuracy and safety and that its API would still be made available to Anthropic. In August, the two companies announced the results of a joint exercise in which each company evaluated the other's models. (Disclosure: Ziff Davis, CNET's parent company, in April filed a lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.) As Anthropic continues to excel in certain areas, it's raising billions in capital. Recently, Anthropic completed its series F fundraising round of $13 billion and is now valued at $183 billion. Anthropic also settled a $1.5 billion lawsuit with authors for illegally pirating their work earlier this month. OSWorld, a tool that tests how AI models perform in real-world computer tasks, benchmarked Sonnet 4.5 at 61.4%, whereas Sonnet 4 was 42.2% four months prior. The Claude for Chrome extension, which is currently available to those who signed up for the waitlist last month, takes advantage of Sonnet 4.5's agentic improvements.
[4]
Anthropic releases Claude Sonnet 4.5 in latest bid for AI agents and coding supremacy
Anthropic's latest AI model spent 30 hours running by itself to code a chat app akin to Slack or Teams. It spat out about 11,000 lines of code, according to Anthropic, and it only stopped running when it had completed the task. The model, Claude Sonnet 4.5, was announced today, and its ability to operate autonomously for 30 hours straight is a huge jump forward. Before, the company's Opus 4 model made headlines in May for its ability to operate for seven hours. It's all a significant step in Anthropic's battle to corner the market on both AI agents and AI coding. The company called Claude Sonnet 4.5 "the best model in the world for real-world agents, coding, and computer use" and said it "leads the market at using computers," referencing the Computer Use feature Anthropic debuted nearly a year ago. The new model is particularly adept in fields like cybersecurity, financial services, and research, according to Anthropic. One of its beta-testers, Canva, said the new model helped with "complex, long-context tasks -- from engineering in our codebase to in-product features and research." Anthropic, OpenAI, Google, and other companies have been continuously releasing incremental updates and features that allow their technology to act as an assistant both for consumers (researching topics, scheduling meet-ups, and looking up flights) and for enterprise and developer use (creating slide decks, helping with coding tasks, and analyzing spreadsheets). The battle for attention and reliance heats up nearly every month, if not every week. Days ago, OpenAI announced Pulse, its newest ChatGPT feature designed to be part of users' morning routines and research topics relevant to their days. Anthropic also said the new model would be paired with other updates to help developers code their own AI agents. "We're combining the launch of the model with access to virtual machines, memory, context management, and multi-agent support," the company wrote in a release. 
"This essentially packages the same building blocks that power Claude Code - enabling developers to build their own cutting-edge agents." Dianne Penn, a head of product management at Anthropic, told The Verge in an interview that the model's improvements in its computer use capabilities surprised even her. Claude Sonnet 4.5 is more than three times as skilled at navigating a browser and using a computer compared to Anthropic's tech from last October. Penn said the team had received feedback from early-access customers -- "the GitHubs and Cursors of the world" -- and spent the past month working intensively on the model. Scott White, product lead for Claude.ai, told The Verge that the new model operates at "chief-of-staff level" and can find availability between multiple people's calendars and schedule a meeting, look at a data dashboard and pull together insights, write status updates based on one-on-one meetings with his direct reports, and more. Neither White nor Penn had yet tried vibe-coding with the new model when The Verge spoke to them. But Penn said she uses Claude Sonnet 4.5 for hiring potential new team members at Anthropic. "It's been actually really helpful to have a continuous running prompt that I use of, 'Do a deep web search, come up with like these parameters for profiles to source for certain types of roles on my team,'" Penn said. "That's been really, really helpful. And I've seen the Sonnet 4.5 just do even better than in the past, on the quality and the depth of the searches and actually generating a spreadsheet with LinkedIn profiles so then I can email them."
[5]
Claude Sonnet 4.5 could be your next breakthrough coding tool - how to access it today
Anthropic's latest model, Claude Sonnet 4.5, is here. It scored very highly on coding benchmarks. Claude Code also got long-awaited upgrades. Anthropic's coding tools have become well-regarded amongst developers, with its Claude 4 Sonnet model, released in May, serving as a free and reliable coding assistant for many. Just months later, Anthropic has released its next-generation model, featuring upgrades to its performance across the board. Claude Sonnet 4.5 is now available, the company said Monday, claiming it is the "best coding model in the world" as well as the best model for building complex agents and using computers, with gains in reasoning and mathematical capabilities. Anthropic also launched updates across its Claude Code offering, Claude for Chrome extension, and more. If you have been closely tracking Anthropic's releases, you may recall that in May, Claude Opus 4 and Sonnet 4 scored highest amongst frontier models on the industry-standard software engineering benchmark test (SWE-bench), which evaluates LLMs' abilities to solve real-world software engineering tasks sourced from GitHub. Claude Opus 4.1, released in August, surpassed it. Now, Claude Sonnet 4.5 has lapped that last model, outperforming it on the SWE-bench Verified evaluation, a human-filtered subset of the SWE-bench. Claude Sonnet 4.5 also outperformed leading models from competitors, including GPT-5 Codex, GPT-5, and Gemini 2.5 Pro. Anthropic said that on the SWE-bench Verified, Sonnet 4.5 held its focus for more than 30 hours on complex, multi-step tasks. This capability is specifically useful for agentic tasks, which oftentimes require solo work in the background for extended periods of time. 
Other improvements include its performance on computers, as indicated by its score on the OSWorld benchmark, which tests the performance of AI models on real-world computer tasks: "Sonnet 4.5 now leads at 61.4%. Just four months ago, Sonnet 4 held the lead at 42.2%," Anthropic noted in the release. The Claude for Chrome extension, now rolled out to everyone who joined the waitlist last month, uses these capabilities. Anthropic also saw improvements across math and reasoning. Claude Sonnet 4.5 is also Anthropic's "most aligned" frontier model yet, according to the company. This means it's the model that adheres most closely to humans' instructions and intended use cases, and that has reduced instances of behaviors such as sycophancy and deception. The model is also better at resisting prompt injection attacks and has AI Safety Level 3 (ASL-3) protections on Anthropic's model framework. Claude Sonnet 4.5 is available everywhere, including in the Claude.ai chatbot. Of course, developers and professionals can access the new model in the API and Claude Code, and for the same price as Sonnet 4. Anthropic also upgraded its other coding offerings, starting with Claude Code, which now has checkpoints that allow users to save progress and revisit a previous state. It also has what Anthropic is calling a "refreshed" terminal interface and a native VS Code extension. Anthropic also launched the Claude Agent SDK, which is the same infrastructure that powers Claude Code, allowing developers to build their own agents with it. 
The Claude API has introduced a new context editing feature and a memory tool that enables agents to work more efficiently and tackle more complex problems, according to the company. The company also upgraded Claude apps so they can execute code and create files in chat.
[6]
Anthropic Debuts Claude Sonnet 4.5 With More Coding, Less 'Deception'
Emily is an experienced reporter who covers cutting-edge tech, from AI and EVs to brain implants. She stays grounded by hiking and playing guitar. Anthropic released a new AI model, Claude Sonnet 4.5, with stronger coding abilities and new features to streamline the process. It's "the best coding model in the world," Anthropic says. The company also referred to its last model as "the world's best" for coding. But, unsurprisingly, it says the newest one is a cut above. A major new feature is "checkpoints," which allows programmers to save their progress and revert to a previous version. Other additions include a new terminal interface, context adjusting capabilities, and file creation (spreadsheets, slides, and documents) without leaving the chat window, which is now available with all paid plans. Developers can create their own AI agents using the Claude Agent SDK. "The infrastructure that powers our frontier products -- and allows them to reach their full potential -- is now yours to build with," says Anthropic. In the company's internal evaluations, Sonnet 4.5 earned a score of 77.2% on the "agentic coding," or self-directed coding, test. That's compared to a slightly lower 74.5% for its predecessor Opus 4.1 and Codex, the programming tool within OpenAI's GPT-5. But remember, these are internal benchmarks. It's always worth trying multiple tools to find the one that's best for you. Happily, Anthropic did not increase the price of Sonnet 4.5 through the developer API, which remains the same as Sonnet 4, at $3/$15 per million tokens. Beyond coding, Anthropic is positioning Claude as the go-to chatbot for all workplace tasks. That differentiates it from ChatGPT, which is used for non-work-related conversations over 70% of the time, according to an OpenAI study released this month. Anthropic lists financial services, cybersecurity, and law as other fields its chatbot excels in. 
Claude Sonnet 4.5 "creates presentations, spreadsheets, and PDFs you'd actually be proud to share with your boss or clients and sharpens your thinking on complex problems," Anthropic says. The AI can even use your computer for you, performing simple tasks like navigating websites and filling out spreadsheets. It works in Google Chrome with an extension, which is now available for those with a Max plan ($100 to $200-per-month) to sign up for through a waitlist. However, these capabilities are still nascent and flawed, just like agentic coding. Anthropic says it bolstered Claude's defenses against prompt injection attacks when in computer use mode, a type of cyberattack that represents "one of the most serious risks." Claude, like all chatbots, can also be kind of a jerk. It's prone to "sycophancy, deception, power-seeking, and the tendency to encourage delusional thinking," in certain scenarios. It's known to do things like "praise obviously-terrible business ideas" and confirm to users that they are indeed the Matrix, according to the system card for Sonnet 4.5. Anthropic says Sonnet 4.5 is its least likely model to engage in these behaviors, and that it expects it to be "much more direct and much less likely to mislead users than any recent popular large language model (LLM)." When OpenAI and Anthropic evaluated each other's models over the summer, OpenAI reported Claude was less likely to engage in sycophantic and harmful behaviors than ChatGPT. Over the past year, Claude has emerged as a favorite LLM among individuals and businesses (not book authors). Apple and Meta reportedly use Claude internally, TechCrunch reports. You may start to see Anthropic advertisements on streaming platforms like Netflix and Hulu, and at live sporting events as well, since the company launched its first major advertising campaign this fall, AdWeek reports.
[7]
Anthropic Says New Model Can Code On Its Own for 30 Hours Straight
Anthropic is releasing a new artificial intelligence model that is designed to code longer and more effectively than prior versions, its latest attempt to stay ahead of rivals like OpenAI in offering tools for software developers. The new model, called Claude Sonnet 4.5, is better at following instructions and can code on its own for up to 30 hours straight, the company said on Monday. By comparison, a previous model called Claude Opus 4 is said to be able to field coding tasks for up to seven hours by itself. The updated version of Sonnet is also intended to excel at using a person's computer to take actions for them, improving on a feature Anthropic introduced a year ago. Anthropic has been an early leader in building so-called AI agents that field complex tasks on a user's behalf, particularly for streamlining the process of writing and debugging code. The company, now valued at $183 billion, reached $5 billion in run-rate revenue in August, fueled in part by traction for its coding software. But other companies, including OpenAI and Alphabet Inc.'s Google, are also vying to win over programmers with similar capabilities. Anthropic's latest release comes a week before OpenAI is set to hold its annual developer event. Jared Kaplan, Anthropic's co-founder and chief science officer, said Sonnet 4.5 is "stronger in almost every way" than its most recent high-end Opus model. Anthropic is also working to build a better version of the Opus model, which he expects will likely come out later this year. "We get benefits from having usage at both model sizes."
[8]
Anthropic launches Claude 4.5, touts better abilities, targets business customers
Sept 29 (Reuters) - Anthropic unveiled the Claude 4.5 AI model on Monday, saying the newest version can code for longer uninterrupted stretches and handle finance and scientific tasks better, as the startup pushes deeper into enterprise AI. The Alphabet (GOOGL.O) and Amazon.com-backed (AMZN.O) AI startup is racing rivals to build models that can reliably operate software and complete multi-step work, key for AI agents, which can perform tasks on behalf of humans. The Sonnet 4.5 model created a web app from scratch in internal tests, and one customer had the AI chatbot code autonomously for 30 hours, up from a seven-hour run achieved by Anthropic's earlier Claude Opus 4 for a different client, Chief Product Officer Mike Krieger said. Anthropic is targeting power users and business customers rather than chasing a viral consumer moment, he said. Claude 4.5 is stronger at finance and scientific reasoning and better at using computers, scoring about 60% on a benchmark that tests operating-system dexterity versus roughly 40% for prior models, the company said. "It's a lot more visceral when you just see the model using a computer the way a person does if you're not a coder," said Chief Science Officer Jared Kaplan. Separately on Monday, Microsoft said it would add new Microsoft 365 Copilot features powered by Anthropic models, including "Agent Mode" in Excel and Word and an "Office Agent" in Copilot chat, with PowerPoint to follow. Microsoft last week said it would bring Anthropic's models to Microsoft 365 Copilot to diversify beyond longtime partner OpenAI. Anthropic, founded by former OpenAI executives, has positioned Claude for workplace use with guardrails it says reduce risky outputs. The company has been marketing Claude's coding and data-analysis skills to regulated industries and teams that want models to work across multiple software tools. 
Krieger said the company's focus is on sustained, reliable performance over long tasks rather than short demos. Reporting by Jeffrey Dastin, Deepa Seetharaman in San Francisco and Akash Sriram in Bengaluru; Editing by Anil D'Silva
[9]
Claude Sonnet 4.5 is Anthropic's safest AI model yet
In May, Anthropic announced two new AI systems, Opus 4 and Sonnet 4. Now, less than six months later, the company is introducing Sonnet 4.5, and calling it the best coding model in the world to date. Anthropic's basis for that claim is a selection of benchmarks where the new AI outperforms not only its predecessor but also the more expensive Opus 4.1 and competing systems, including Google's Gemini 2.5 Pro and GPT-5 from OpenAI. For instance, in OSWorld, a suite that tests AI models on real-world computer tasks, Sonnet 4.5 set a record score of 61.4 percent, putting it 17 percentage points above Opus 4.1. At the same time, the new model is capable of autonomously working on multi-step projects for more than 30 hours, a significant improvement from the seven or so hours Opus 4 could maintain at launch. That's an important milestone for the type of agentic systems Anthropic wants to build. Perhaps more importantly, the company claims Sonnet 4.5 is its safest AI system to date, with the model having undergone "extensive" safety training. That training translates to a chatbot Anthropic says is "substantially" less prone to "sycophancy, deception, power-seeking and the tendency to encourage delusional thinking" -- all potential model traits that have landed OpenAI in hot water in recent months. At the same time, Anthropic has strengthened Sonnet 4.5's protections against prompt injection attacks. Due to the sophistication of the new model, Anthropic is releasing Sonnet 4.5 under its AI Safety Level 3 framework, meaning it comes with filters designed to prevent potentially dangerous outputs related to prompts around chemical, biological and nuclear weapons. With today's announcement, Anthropic is also rolling out quality of life improvements across the Claude product stack. To start, Claude Code, the company's popular coding agent, has a refreshed terminal interface, with a new feature called checkpoints included. 
As you can probably guess from the name, they allow you to save your progress and roll back to a previous state if Claude writes some funky code that isn't quite working like you imagined it would. File creation, which Anthropic began rolling out at the start of the month, is now available directly in conversations with the chatbot, and if you joined the waitlist for Claude for Chrome, you can start using the extension today.
[10]
Anthropic launches Claude Sonnet 4.5, its latest AI model that's 'more of a colleague'
Anthropic on Monday announced its latest artificial intelligence model: Claude Sonnet 4.5. The model is better at coding, using computers and meeting practical business needs, and it excels in specialized fields like cybersecurity, finance and research, Anthropic said. The Amazon-backed startup, which is valued at $183 billion, is making Claude Sonnet 4.5 available to all users. Anthropic said Claude Sonnet 4.5 is the "best coding model in the world" according to industry benchmarks like SWE-bench Verified, a test set that measures an AI system's software coding abilities. "People are just noticing with this model, because it's just smarter and more of a colleague, that it's kind of fun to work with it when encountering problems and fixing them," Jared Kaplan, Anthropic's co-founder and chief science officer, told CNBC in an interview. The model generates higher-quality code, is better at identifying code improvements and can follow instructions more reliably, the company said.
[11]
Anthropic launches Claude Sonnet 4.5 with longer coding sessions and enhanced safety
What just happened? Anthropic unveiled a major upgrade to its AI toolset this week with Claude Sonnet 4.5, a new model designed to handle longer and more complex programming tasks while offering significant improvements in instruction-following and real-world business applications. The rollout underscores Anthropic's push to maintain its presence in the increasingly crowded market for AI-powered developer tools, where it faces rising competition from industry leaders such as OpenAI and Google. According to Anthropic, Claude Sonnet 4.5 can maintain autonomous coding sessions for up to 30 hours - a substantial increase over the company's previous Claude Opus 4 model, which supported roughly seven uninterrupted hours. The firm claims that Sonnet 4.5 is stronger "in almost every way" compared with earlier versions, offering improvements not only in task persistence and execution speed, but also in the range of tasks it can perform on a user's system. Key upgrades include checkpoints within the Claude Code tool, enabling developers to snapshot and revert their coding progress, a refreshed terminal interface, and an official Visual Studio Code extension for seamless integration with popular development environments. The updated Claude API introduces context editing and enhanced memory management, allowing the AI to handle longer, more complex requests without losing track or slowing down. For end users, new capabilities include direct code execution and file creation such as spreadsheets, presentations, and documents, directly within conversational workflows. Anthropic is positioning Claude Sonnet 4.5 as a "frontier model," highlighting its leading performance on industry benchmarks. On the SWE-bench Verified test, which evaluates models' ability to solve real-world coding problems, Sonnet 4.5 sets the current standard. 
Meanwhile, on OSWorld, a benchmark assessing AI proficiency in practical computer tasks, the model scores 61.4 percent, a significant improvement over the earlier Sonnet 4's 42.2 percent. This version of Claude also incorporates enhanced defenses against common vulnerabilities in AI agent deployments, such as prompt injection attacks. In addition, Anthropic reports improvements in alignment, a measure of how consistently an AI system behaves as intended. Executive summaries and public-facing system cards now include results from safety and alignment tests including mechanistic interpretability, demonstrating reductions in undesirable behaviors such as sycophancy, deception, and power-seeking. Developers can now access the Claude Agent SDK, the same infrastructure used by Anthropic's own teams to build and scale agentic tools. The SDK provides resources for memory management, user permission systems, and coordination among sub-agents. Claude Sonnet 4.5 is available immediately via the Claude API, retaining the pricing model of its predecessor. Anthropic executives noted that further improvements are forthcoming, with additional Opus model updates expected later this year.
[12]
Claude 4.5 just launched -- 7 prompts that show what it can really do
Anthropic's new Claude 4.5 just launched, and Anthropic claims it's the company's most powerful AI model yet. With the ability to run for 30+ hours on autonomous coding tasks, Claude 4.5 promises to be a powerhouse model for everything from coding to nuanced reasoning. And while the benchmarks are impressive, sometimes the best way to test what a new model can do is to give it a try yourself. The new Claude is free for all users, although subscribers get more tokens to use the model longer. So, let's see just how impressive this new model is with 7 prompts that show off what Claude 4.5 can (and can't) do right now. Prompt: "Write a simple budgeting app in Python that lets me input expenses, categories, and shows a weekly summary." In less than two minutes, Claude 4.5 created a useful budgeting app that fulfilled all the prompt requests. Simple, yes, but it gets the job done. It's clear that based on the speed and usability of this app, Claude 4.5 has made big gains in coding and can scaffold usable software quite quickly. Prompt: "Plan a 7-day European itinerary with train travel only, balancing cost, culture, and family-friendly activities." At lightning speed, Claude 4.5 knocked out a comprehensive and completely usable travel itinerary with a variety of family-friendly activities and a cost breakdown. Anthropic claims the new model shines when it comes to multi-step reasoning, and in this test, it did not disappoint. It balanced all variables with ease. Prompt: "Here's a broken code snippet [paste code]. Debug it, explain what was wrong, and suggest two alternative fixes." This broken code was an obvious fix, but judging by how fast Claude 4.5 generated multiple solutions in seconds, I know this will come in handy for vibe coding. Whether you're just starting with vibe coding or have been coding for years, Claude 4.5 seems like it could make this kind of work smoother and more productive, especially with the ability to run longer "autonomous sessions." 
Prompt: "Explain how to set up a home Wi-Fi mesh network with three routers, step by step, including diagrams in ASCII." Testing this prompt was a jaw-dropping moment for me. You'd think by now I'd be over the speed, but it's truly unreal just how quickly Claude 4.5 generates the response and with incredible accuracy. One great thing about speed, other than the obvious of not waiting, the faster a model can generate a response, the less of a hit the environment takes. The step-by-step instructions are detailed and easy to follow, which makes me want to remember Claude next time I need something broken down in simpler steps. Prompt: "Pretend you're a film director. Pitch me a 3-scene short film about humans teaching AI how to dance." I laughed out loud at the responses. Claude 4.5 knocked this one out of the park with funny, interesting and thought-provoking film ideas. I recently wrote about how AI creativity is basic math, but I have to say, Claude feels fresher than other chatbots. It can handle storytelling with style and emotion. I wish I could include the entire response here, but am limited for space. I encourage you to try this prompt on either this idea or one of your own. I already know what I'll be doing this evening. Prompt: "Solve this: A factory produces 120 widgets in 4 hours with 6 machines. How many machines are needed to produce 900 widgets in 10 hours?" Whether you're a math whiz or struggle with it like me, Claude 4.5 could be the ultimate sidekick for working through quick word problems. It breaks down the steps and shows the work, which I find helpful. As a mom of a fifth grader, who hasn't had to think through a math word problem in years, I'm a little rusty and might need to use Claude to keep up with him. Prompt: "Act as if you're navigating a desktop. Open a folder, create a file called draft.txt, add the line 'Hello Claude 4.5' and show me the file tree." 
I just had to try this prompt since the model is touted as more capable with "computer use." This prompt supports this claim. I can think of several uses where this prompt could support productivity and efficiency such as training new hires, assisting older computer users and cleaning up files. These seven different prompts showcase the wide range of abilities from Anthropic's Claude 4.5. Is it the smartest model yet? I encourage you to give these prompts a test as well as some of your own to see if the hype holds up. I'm impressed so far and plan to keep testing it. But one thing is clear: Anthropic is pushing Claude toward a future of faster, smarter and more autonomous agents.
[13]
Anthropic's Claude Sonnet 4.5 can work autonomously for 30 hours
Why it matters: To act as an agent, AI models must sustain work on a single task for hours -- something many earlier models couldn't do.
Driving the news: The new version of Claude can work for 30 hours or more on its own, a big step up from the seven hours of autonomous work with Claude Opus 4.
* Beyond math and coding, where Claude has previously excelled, Sonnet 4.5 is strong on tasks requiring research and diligence, Scott White, a product lead at Anthropic, told Axios.
* The company offered a variety of benchmarks and customer comments touting the power and performance of the new model, which is priced the same as its predecessor, Claude Sonnet 4.0.
* Anthropic will give developers access to Claude Code's building blocks -- virtual machines, memory and context management -- to make it easier to create Claude-powered agents.
The big picture: Anthropic said the rapid progress, marked by major Sonnet updates in February and May, shows a pattern where every six months its new model can handle tasks that are twice as complex.
* "This is a continued evolution on Claude, going from an assistant to more of a collaborator to a full, autonomous agent that's capable of working for extended time horizons," White said.
Between the lines: Anthropic also claims that the new Sonnet is the company's "most-aligned" model yet.
[14]
Anthropic's new Claude can code for 30 hours. Think of it as your AI coworker
Anthropic launched Claude Sonnet 4.5 on Monday, positioning the artificial intelligence model as "the best coding model in the world" in a direct challenge to OpenAI's recently released GPT-5, as the two AI giants battle for dominance in the lucrative enterprise software development market. The San Francisco-based startup claims its newest model achieves state-of-the-art performance on critical coding benchmarks, scoring 77.2% on SWE-bench Verified -- a rigorous software engineering evaluation -- compared to GPT-5's performance. More remarkably, Anthropic says Claude Sonnet 4.5 can maintain focus on complex, multi-step tasks for more than 30 hours, a dramatic leap in AI's ability to handle sustained work. "Sonnet 4.5 achieves 77.2% on SWE-bench Verified (82% with parallel test-time compute). It is SOTA," an Anthropic spokesperson told this reporter, using industry shorthand for "state of the art." The company also highlighted the model's 50% score on Terminal-bench, another coding benchmark where it claims leadership. The announcement follows mounting pressure from OpenAI's recent advances and pointed criticism from high-profile figures like Elon Musk, who recently posted on X.com that "winning was never in the set of possible outcomes for Anthropic." When asked about Musk's statement, Anthropic declined to comment. The release arrives just seven weeks after OpenAI's GPT-5 launch in August, underscoring the breakneck pace of competition in artificial intelligence as companies race to capture enterprise customers increasingly relying on AI for software development. The timing is particularly noteworthy as Anthropic grapples with questions about its heavy dependence on just two major customers.
Anthropic dominates coding market despite customer concentration risks
The competition centers on a market that has emerged as AI's first major profitable use case beyond chatbots. 
Anthropic commands 42% of the code generation market -- more than double OpenAI's 21% share -- according to a Menlo Ventures survey of 150 enterprise technical leaders. That dominance has translated into remarkable financial performance, with the company reaching a $5 billion revenue run rate earlier this year. However, industry analysis reveals that coding applications Cursor and GitHub Copilot drive approximately $1.4 billion of Anthropic's revenue, creating a potentially dangerous customer concentration that could leave the company vulnerable if either relationship falters. "Our run-rate revenue has grown significantly, even when you exclude these two customers," the Anthropic spokesperson said, pushing back on concerns about customer concentration. The company provided supportive quotes from both Cursor CEO Michael Truell and GitHub Chief Product Officer Mario Rodriguez praising Claude Sonnet 4.5's performance. The new model achieves significant advances in computer use capabilities, scoring 61.4% on OSWorld, a benchmark that tests AI models on real-world computer tasks. Just four months ago, Claude Sonnet 4 held the lead at 42.2%, demonstrating rapid improvement in AI's ability to interact with software interfaces.
OpenAI's aggressive pricing strategy threatens Anthropic's premium positioning
Anthropic's announcement comes as the company grapples with competitive pressure from GPT-5's aggressive pricing strategy. Early analysis shows Claude Opus 4 costing roughly seven times more per million tokens than GPT-5 for certain tasks, creating immediate pressure on Anthropic's premium positioning. The pricing disparity signals a fundamental shift in competitive dynamics that could force enterprise procurement teams to reconsider vendor relationships previously built on performance rather than price. Companies managing exponentially growing AI budgets now face comparable capability at a fraction of the cost. 
Yet Anthropic is maintaining its pricing strategy with Claude Sonnet 4.5. "Sonnet 4.5's cost remains the same as Sonnet 4," the spokesperson confirmed, keeping prices at $3 per million input tokens and $15 per million output tokens.
Claude Sonnet 4.5 delivers 30-hour autonomous work sessions and enhanced security
Beyond performance improvements, Anthropic positions Claude Sonnet 4.5 as its "most aligned frontier model yet," showing significant reductions in concerning behaviors like sycophancy, deception, and power-seeking tendencies. The company has made "considerable progress on defending against prompt injection attacks," a critical security concern for enterprise deployments. The model is being released under Anthropic's AI Safety Level 3 (ASL-3) protections, which include classifiers designed to detect potentially dangerous inputs and outputs related to chemical, biological, radiological, and nuclear weapons. While these safeguards sometimes flag normal content, Anthropic says it has reduced false positives by a factor of ten since initially describing them. Perhaps most significantly for developers, Anthropic is releasing the Claude Agent SDK -- the same infrastructure that powers its Claude Code product. "We built Claude Code because the tool we needed didn't exist yet," the company said in its announcement. "The Agent SDK gives you the same foundation to build something just as capable for whatever problem you're solving."
International expansion accelerates as $1.5 billion copyright settlement finalizes
The model launch coincides with Anthropic's aggressive international expansion, as the company seeks to diversify beyond its U.S.-concentrated customer base. The startup recently announced plans to triple its international workforce and expand its applied AI team fivefold in 2025, driven by data showing that nearly 80% of Claude usage now comes from outside the United States. However, the expansion comes amid significant legal costs. 
Anthropic recently agreed to pay $1.5 billion in a copyright settlement with authors and publishers over allegations the company illegally used their books to train AI models without permission. The settlement, approved by a federal judge last week, requires payments of $3,000 for each publication listed in the case.
Enterprise AI spending doubles as companies prioritize performance over cost
The rapid-fire model releases from both companies reflect the high stakes in enterprise AI adoption. Model API spending has more than doubled to $8.4 billion in just six months, according to Menlo Ventures, as enterprises shift from experimental projects to production deployments. Customer behavior patterns suggest enterprises consistently prioritize performance over price, upgrading to the newest models within weeks of release regardless of cost. This behavior could work in Anthropic's favor if Claude Sonnet 4.5's performance advantages prove compelling enough to overcome GPT-5's pricing advantage. However, the dramatic price differential introduced by GPT-5 could overcome typical switching inertia, especially for cost-conscious enterprises facing budget pressures. Industry observers note that model switching costs remain relatively low, with 66% of enterprises upgrading within existing providers rather than switching vendors. For enterprises, the intensifying competition delivers better performance and lower costs through continuously improving capabilities. The rapid pace of model improvements -- with new versions launching monthly rather than annually -- provides organizations with expanding AI capabilities while vendors compete aggressively for their business. While the corporate rivalry between Anthropic and OpenAI dominates industry headlines, the real economic impact extends far beyond Silicon Valley boardrooms. 
The development of AI systems capable of sustained coding work for 30 hours represents a fundamental shift in how software gets built, with implications that extend across every industry relying on technology infrastructure. These advancing capabilities signal broader workplace transformation ahead. As AI systems demonstrate increasing proficiency at complex, sustained intellectual work, the technology industry's competition for coding supremacy foreshadows similar disruptions across fields requiring analytical thinking, problem-solving, and technical expertise.
[15]
Anthropic launches Claude Sonnet 4.5 -- 'best coding model in the world'
Anthropic has formally announced Claude Sonnet 4.5, a new AI model specifically made for coding. Anthropic didn't mince any words during its announcement, calling Claude Sonnet 4.5 the "best coding model in the world." Starting today, it'll be powering Claude Code, a popular choice for vibe coders and professionals alike. The new model is a step up from the old models and seems to be able to do quite a lot. Per Cognition co-founder and CEO Scott Wu, the new model features "the biggest jump we've seen since the release of Claude Sonnet 3.6" and can "run longer, handle harder tasks, and deliver production-ready code." Anthropic shows this through a variety of charts demonstrating how effective the model can be. For example, Claude Sonnet 4.5 has a lower instance of misaligned behaviors than its direct competitors, including older models from Anthropic. Anthropic AI researcher David Hershey told TechCrunch that he's seen the model code for 30 hours without interruption in early trials, so overall performance may be difficult to show in benchmarks. In addition to better smarts, Anthropic also announced several new features for Claude Code to coincide with the release. That includes checkpoints, a feature that Anthropic says has been requested quite a lot. Checkpoints will save snapshots of the code the user is working on, and then grant the ability to roll back to a prior checkpoint if things go off the rails. There is also a new context editing feature and memory tool that allows AI agents to run longer and handle more complex instructions. Anthropic has been on a roll in 2025, as have most AI companies. New models seem to drop every couple of months like clockwork these days. Anthropic's prior big model release, Claude Opus 4, which was also designed for advanced coding, launched in May 2025. OpenAI launched its latest GPT-5 in early August, and Google joined the fray with Gemini 2.5 over the summer. 
Thus, AI fans have a lot of new stuff to check out if they haven't done so in a while.
[16]
Anthropic's Claude Sonnet 4.5 is available now - 'the best AI model in the world for real-world agents, coding, and computer use'
Claude Code now has checkpoints and you can create files directly from within Claude's chatbot
Anthropic has released Claude Sonnet 4.5, the next generation of its incredibly popular AI model. The company is calling Sonnet 4.5 "the best coding model in the world" and claims it's the "strongest model for building complex agents" and for "using computers." The new upgrade to Claude launches alongside other upgrades to Anthropic's most popular products, including checkpoints for Claude Code so you can save your progress and roll back to previous states, as well as code execution and file creation for spreadsheets, slides, and documents from within your conversation. These capabilities will be available on all paid plans. Alongside these big announcements, Anthropic also confirmed that Claude for Google Chrome will become available to everyone who had previously joined the waitlist. Anthropic says Sonnet 4.5 is the company's "most aligned model yet" and claims the new upgrade will "substantially improve the model's behavior, reducing concerning behaviors like sycophancy, deception, power-seeking, and the tendency to encourage delusional thinking." Claude Sonnet 4.5 is available today, and prices remain the same per million tokens as the previous model Sonnet 4. OpenAI's latest study shows Claude Opus 4.1 beats GPT-5, Gemini, and Grok in real-world job tasks. Now Anthropic says Sonnet 4.5 shows "clear progress" over its predecessor, and not only does the new model achieve higher performance in benchmarks, but it is also "better at meeting practical business needs than its predecessor, allowing our customers to do more, solve harder problems, and be more creative." Sonnet 4.5 can now help users build custom agents using natural language, and Anthropic even claims that the model can work autonomously for 30 hours, a massive improvement over Claude Opus 4's 7-hour capability. 
Anthropic shared some quotes from its clients during the reveal of Sonnet 4.5, and the model has been met by early adopters with glowing praise. Danny Wu, Head of AI Products at Canva, said, "Claude Sonnet 4.5 delivers impressive gains on our most complex, long-context tasks -- from engineers in our codebase to in-product features and research. It's noticeably more intelligent, helping us push what 240M+ users can design with Canva." While Sean Ward, CEO of iGent AI, said the new model, "resets our expectations -- it handles 30+ hours of autonomous coding, freeing our engineers to tackle months of complex architectural work in dramatically less time while maintaining coherence across massive codebases." Claude Sonnet 4.5 looks set to be one of the biggest AI releases of the year, and while we're yet to try it ourselves, it sure does sound incredibly promising.
[17]
Anthropic launches new AI model, touting coding supremacy
US startup Anthropic on Monday announced the launch of its new generative artificial intelligence model, Claude Sonnet 4.5, which it says is the world's best for computer programming. Anthropic was created in early 2021 by former OpenAI staff who felt their employer, led by CEO Sam Altman, was not doing enough to control and prevent the potentially harmful effects of its models. Backed by Amazon, it quickly joined the major players in generative AI that embarked on a frantic race after the arrival of ChatGPT from OpenAI in November 2022, with new models being released at a furious pace with ever-expanding capabilities. While trailing OpenAI in terms of users and name recognition, Anthropic had been considered for several months the top performer in generative AI for computer coding. This is seen as a highly strategic accomplishment, with programming often cited as the specialty most ripe for disruption -- and revenue generation -- by AI in the near term. But OpenAI's most recent assistant, GPT-5, launched in early August, had taken the lead in certain rankings for AI-generated programming, putting pressure on Anthropic to deliver more capability in its next offering. In a key benchmark, Claude Sonnet 4.5, a new generation of language model, can operate autonomously for 30 hours straight once it is assigned a task. This is a significant leap from Anthropic's most powerful version until now, Claude 4 Opus, which could only run for seven hours. These generative AI programs function alone for several hours as they regularly evaluate their own output and make changes and corrections autonomously. Claude Sonnet 4.5 achieved the highest score when tested by the independent evaluation system SWE-Bench Verified, developed by researchers from Princeton and Stanford universities. It is also, according to Anthropic, the most advanced model for developing AI agents capable of making real-world decisions for which they have not been trained or specifically programmed. 
Anthropic's new release is also the most sophisticated for applications that allow an AI assistant to use a computer as a human would. Upon request in everyday language, the interface can perform a Google search or update a calendar. This functionality was first offered by Anthropic in October 2024. OpenAI launched an equivalent product, Operator, in January 2025.
[18]
Anthropic releases Claude 4.5, a model it says can build software and accomplish business tasks autonomously | Fortune
Anthropic has launched Claude Sonnet 4.5, its newest AI model, claiming significant advancements in autonomous work and coding. The company said that the model was able to run autonomously for 30 hours, maintaining sustained focus with minimal oversight while building an entire software application. It's a significant improvement over the company's previous Opus 4 model, released four months ago, which could operate autonomously for only seven hours. Anthropic said Claude Sonnet 4.5 also outperformed Opus on key benchmarks and was more effective in meeting customers' practical business needs. The company said the model was even better at coding than previous frontier models, and state-of-the-art on SWE-Bench Verified, a key benchmark that tests how models perform at software development tasks. Anthropic said that Claude Sonnet 4.5 was better than its predecessors at following instructions, identifying code improvements, and generating more production-ready code. When tested on tasks from the financial services industry, the company said the new model outperformed earlier Claude models in tasks such as researching, building financial models, and forecasting. Anthropic appears to be pushing further ahead of its competitors in coding assistance and autonomous task completion, positioning its models toward corporate and workplace use. The company's previous Claude 4.1 Opus model already bested competitors on OpenAI's new benchmark of professional task completion, GDPval, which tested how models performed compared to human professionals across a range of industries and jobs. Last week, OpenAI said its GPT-5 model and Anthropic's Claude Opus 4.1 were "already approaching the quality of work produced by industry experts." Dueling usage studies released earlier this month also suggested that Anthropic's Claude models were emerging as more professionally-oriented AI models, especially in comparison to OpenAI's ChatGPT, which is increasingly being used as a consumer product. 
According to the study, most Claude users were turning to the models for workplace or productivity tasks, with mathematical tasks and coding cited as the dominant activities globally for Claude.ai, and making up 36% of all use cases. Business use of Claude leaned heavily toward task automation. According to the study, approximately 77% of prompts that the model receives through its API -- the application programming interface that is primarily used by enterprise customers -- involve users requesting the system to perform tasks on their behalf, rather than just providing advice or suggestions. These business-focused interactions are also concentrated in coding, which accounts for 44% of API use. A further 5% of API usage was dedicated to developing or evaluating AI systems. The tasks that business users automate also tend to be the most expensive ones to run. The findings indicate a shift in how businesses approach these tools. Rather than using them mainly for decision support or research, many teams are relying on them to take work off their plates entirely. If models like Claude are able to become more capable of autonomous work, especially in complex, time-intensive domains like software engineering, the implications for businesses and employees could be significant. Autonomous agents can reduce the need for constant human oversight and lower costs on repetitive workflows, speeding up a company's operations and potentially reducing the need for headcount.
[19]
Anthropic Claims 'Best Coding Model in the World' With Claude Sonnet 4.5 -- We Tested It - Decrypt
Anthropic claimed improvements on alignment and safety, but jailbreakers cracked it within minutes. Anthropic released Claude Sonnet 4.5 on Monday, calling it "the best coding model in the world" and releasing a suite of new developer tools alongside the model. The company said the model can focus for more than 30 hours on complex, multi-step coding tasks and shows gains in reasoning and mathematical capabilities. The model scored 77.2% on SWE-bench Verified, a benchmark that measures real-world software coding abilities, according to Anthropic's announcement. That score rises to 82% when using parallel test-time compute. This puts the new model ahead of the best offerings from OpenAI and Google, and even Anthropic's Claude 4.1 Opus (per the company's naming scheme, Haiku is a small model, Sonnet is a medium size, and Opus is the heaviest and most powerful model in the family). Claude Sonnet 4.5 also leads on OSWorld, a benchmark testing AI models on real-world computer tasks, scoring 61.4%. Four months ago, Claude Sonnet 4 held the lead at 42.2%. The model shows improved capabilities across reasoning and math benchmarks, and experts in specific business fields like finance, law and medicine. We tried the model, and our first quick test found it capable of generating our usual "AI vs Journalists" game using zero-shot prompting without iterations, tweaks, or retries. The model produced functional code faster than Claude 4.1 Opus while maintaining top quality output. The application it created showed visual polish comparable to OpenAI's outputs, a change from earlier Claude versions that typically produced less refined interfaces. Anthropic released several new features with the model. Claude Code now includes checkpoints, which save progress and allow users to roll back to previous states. The company refreshed the terminal interface and shipped a native VS Code extension. 
The Claude API gained a context editing feature and a memory tool that lets agents run longer and handle greater complexity. Claude apps now include code execution and file creation for spreadsheets, slides, and documents directly in conversations. Pricing remains unchanged from Claude Sonnet 4 at $3 per million input tokens and $15 per million output tokens. All Claude Code updates are available to all users, while Claude Developer Platform updates, including the Agent SDK, are available to all developers. Anthropic also called Claude Sonnet 4.5 "our most aligned frontier model yet," saying it made substantial improvements in reducing concerning behaviors like sycophancy, deception, power-seeking, and encouraging delusional thinking. The company also said it made progress on defending against prompt injection attacks, which it identified as one of the most serious risks for users of agentic and computer use capabilities. Of course, it took Pliny -- the world's most famous AI prompt engineer -- a few minutes to jailbreak it and generate drug recipes like it was the most normal thing in the world. The release comes as competition intensifies among AI companies for coding capabilities. OpenAI released GPT-5 last month, while Google's models compete on various benchmarks. This can be a shocker for some prediction markets, which up until a few hours ago were almost completely certain that Gemini was going to be the best model of the month. It may be a race against time. Right now, the model does not appear on the rankings, but LM Arena announced it was already available for ranking. Depending on the number of interactions, the outcome tomorrow could be pretty surprising, considering Claude 4.1 Opus is in second place and Claude 4.5 Sonnet is much better. Anthropic is also releasing a temporary research preview called "Imagine with Claude," available to Max subscribers for five days. 
In the experiment, Claude generates software on the fly with no predetermined functionality or prewritten code, responding and adapting to requests as users interact. "What you see is Claude creating in real time," the company said. Anthropic described it as a demonstration of what's possible when combining the model with appropriate infrastructure.
[20]
Claude Sonnet 4.5 can code for 30 hours straight -- and it could change the future of work forever
Anthropic has just announced Claude Sonnet 4.5 and is calling it the "best in the world" for coding, real-world agent and complex computer use. In internal testing, Sonnet 4.5 ran autonomously for more than 30 hours straight while maintaining performance and focus. This is a giant leap from the seven hours possible with Claude Opus 4 just months ago. With nearly a full work week of nonstop AI effort, this new model underscores the possibilities of where the future of work and personal productivity might be headed. With vibe coding so easy that anyone can do it, Claude Sonnet 4.5 hints at how AI could soon handle everyday tasks more reliably. Instead of just spitting out snippets of code or short answers, this model can stay focused for hours, which means it's finally practical for real projects or just about anything else they are prompted to do. From apps and websites to so much more, here's what is now possible: Claude Sonnet 4.5 is here just four months after Sonnet 4, highlighting just how quickly AI is evolving. Instead of being limited to short bursts, it now sustains output across multi-day projects. Anthropic positions it as both more powerful and safer, with a balance of speed and cost that could attract not just businesses, but everyday users who want a reliable assistant for demanding projects. Anthropic could potentially win over Gemini and ChatGPT users, especially those who want AI to tackle their biggest and most complex workloads with speed and efficiency. Because Claude Sonnet 4.5 can sustain effort, remember context, and interact with the tools you already use - all with human-like conversations - the dream of a true digital sidekick is quickly becoming a reality.
[21]
Anthropic launches new AI model, touting coding supremacy
New York (AFP) - US startup Anthropic on Monday announced the launch of its new generative artificial intelligence model, Claude Sonnet 4.5, which it says is the world's best for computer programming. Anthropic was created in early 2021 by former OpenAI staff who felt their employer, led by CEO Sam Altman, was not doing enough to control and prevent the potentially harmful effects of its models. Backed by Amazon, it quickly joined the major players in generative AI that embarked on a frantic race after the arrival of ChatGPT from OpenAI in November 2022, with new models being released at a furious pace with ever-expanding capabilities. While trailing OpenAI in terms of users and name recognition, Anthropic had been considered for several months the top performer in generative AI for computer coding. This is seen as a highly strategic accomplishment, with programming often cited as the specialty most ripe for disruption -- and revenue generation -- by AI in the near term. But OpenAI's most recent assistant, GPT-5, launched in early August, had taken the lead in certain rankings for AI-generated programming, putting pressure on Anthropic to deliver more capability in its next offering. In a key benchmark, Claude Sonnet 4.5, a new generation of language model, can operate autonomously for 30 hours straight once it is assigned a task. This is a significant leap from Anthropic's most powerful version until now, Claude 4 Opus, which could only run for seven hours. These generative AI programs function alone for several hours as they regularly evaluate their own output and make changes and corrections autonomously. Claude Sonnet 4.5 achieved the highest score when tested by the independent evaluation system SWE-Bench Verified, developed by researchers from Princeton and Stanford universities. 
It is also, according to Anthropic, the most advanced model for developing AI agents capable of making real-world decisions for which they have not been trained or specifically programmed. Anthropic's new release is also the most sophisticated for applications that allow an AI assistant to use a computer as a human would. Upon request in everyday language, the interface can perform a Google search or update a calendar. This functionality was first offered by Anthropic in October 2024. OpenAI launched an equivalent product, Operator, in January 2025.
[22]
Anthropic sets AI coding record with new flagship Claude Sonnet 4.5 model - SiliconANGLE
Anthropic PBC today debuted its newest large language model, Claude Sonnet 4.5, and a toolkit for building artificial intelligence agents. The company describes the LLM as the world's best coding model. Additionally, it says that Sonnet 4.5 has set a record on a benchmark designed to evaluate neural networks' tool use capabilities. Sonnet 4.5 is a hybrid reasoning model, which means it has two modes. When users enter relatively simple queries, the LLM quickly generates a response using a limited amount of computing power. When it receives a more complicated question, Sonnet 4.5 can spend a significant amount of time working on an answer. That approach boosts output quality at the expense of higher hardware usage. Anthropic evaluated the model's programming capabilities using a benchmark called SWE-bench Verified. Sonnet 4.5 set a new industry record with an 82% score. The next two highest scores were also achieved by Anthropic models while the fourth place went to GPT-5 Codex, which answered 74.5% of the questions correctly. Sonnet 4.5 also set a record on a second benchmark called OSWorld. It's used to measure how well neural networks interact with external applications such as databases. Sonnet 4.5 achieved a record score of 61.4%, an improvement of nearly 20 percentage points over the Sonnet 4 model Anthropic released four months ago. The company claims that its latest LLM also outperformed the competition across more than a half dozen other benchmarks. According to Anthropic, those tests evaluate AI models' ability to perform tasks such as interpreting graphs and analyzing financial data. Sonnet 4.5 is available through Anthropic's Claude chatbot service, Claude Code programming assistant and its application programming interface. The latter two products received updates today in conjunction with the LLM launch. Developers interact with Claude Code by entering instructions into a command line interface. 
Anthropic has made several usability improvements to that interface as part of today's update. Additionally, it's rolling out an extension that embeds Claude Code in the popular Visual Studio Code programming tool. The extension is currently available in beta. The other major addition to Claude Code is a feature that automatically saves the user's code after every major change. If an error finds its way into the workflow, developers can rewind their code to an earlier, reliable version. The upgrades are rolling out alongside a development toolkit called the Claude Agent SDK. According to Anthropic, its engineers originally built the toolkit to power Claude Code. Customers can use it to build AI agents. Claude Agent SDK enables an agent to delegate work to so-called subagents that can perform multiple tasks in parallel, which speeds up processing. Additionally, the toolkit makes it easier to build AI applications that can interact with external systems. To reduce the risk of hallucinations, agents built with Claude Agent SDK can check their output for accuracy issues. The toolkit can be used with the Claude API, which now provides access to Sonnet 4.5. The LLM is joined by several other enhancements.
[23]
Anthropic Launches Claude Sonnet 4.5, Touts It as 'Best Coding Model in the World' | AIM
Claude Sonnet 4.5 achieved top scores on the SWE-bench Verified evaluation, which tests real-world software coding skills. Anthropic on Monday announced the release of Claude Sonnet 4.5, its latest AI model for coding and agent-based tasks. The company said the model demonstrates improvements in reasoning, math, and long-duration task management. "Claude Sonnet 4.5 is the best coding model in the world. It's the strongest model for building complex agents," the company said in its blog post. "It's also the best model at using computers and shows substantial gains in reasoning and math." The model is available via the Claude API at the same pricing as Sonnet 4, $3 per million input tokens and $15 per million output tokens. Anthropic said that the Claude API has added context editing and memory tools to support longer tasks, and the Claude apps now allow code execution and file creation directly within conversations. Anthropic also released the Claude for Chrome extension for Max users on the waitlist. Claude Sonnet 4.5 is also integrated into Claude Code, which now includes checkpoints to save progress and roll back to previous states, a refreshed terminal interface, and a native VS Code extension. Developers can access the Claude Agent SDK, which provides the infrastructure used internally to build Claude Code. "The Agent SDK gives you the same foundation to build something just as capable for whatever problem you're solving," the spokesperson said. Claude Sonnet 4.5 achieved top scores on the SWE-bench Verified evaluation, which tests real-world software coding skills. On OSWorld, a benchmark for real-world computer tasks, the model scored 61.4%, up from 42.2% for Claude Sonnet 4. Early users reported improved performance across finance, law, medicine, and STEM domains. The company emphasised safety and alignment improvements, noting reductions in misaligned behaviour such as sycophancy, deception, and power-seeking. 
The model is released under Anthropic's AI Safety Level 3 framework, which includes classifiers to flag potentially dangerous content. Anthropic also introduced a temporary research preview, "Imagine with Claude," which allows users to see the model generate software in real time. It is available to Max subscribers for five days at claude.ai/imagine.
[24]
Anthropic releases Claude Sonnet 4.5 with advanced coding and agent capabilities
AI company Anthropic has released Claude Sonnet 4.5, a new flagship model that the company positions as its most capable for coding, building complex AI agents, and using computer systems, with significant gains in reasoning and mathematics. The new model is available now and is accompanied by a new developer toolkit and major updates across the Claude product line. According to Anthropic's blog post, the model achieves state-of-the-art performance on the SWE-bench Verified evaluation, a benchmark that measures real-world software coding abilities. It also shows improved performance on the OSWorld benchmark, which tests an AI model's ability to perform real-world tasks on a computer, such as navigating websites and filling spreadsheets. The company also reports that experts in finance, law, medicine, and STEM found Sonnet 4.5 to have dramatically better domain-specific knowledge and reasoning compared to previous models. Alongside the new model, Anthropic has launched the Claude Agent SDK. This software development kit provides developers with the same infrastructure the company uses to power its Claude Code product, enabling them to build their own custom AI agents. The SDK is designed to solve common challenges in agent development, such as managing memory for long-running tasks, handling permission systems, and coordinating subagents working toward a shared goal. The launch of Sonnet 4.5 includes several significant upgrades to existing Claude products. Anthropic states that Claude Sonnet 4.5 is its most aligned model to date, with improvements in reducing undesirable behaviors like deception and sycophancy. The model is released under the company's AI Safety Level 3 (ASL-3) framework, which includes safeguards like classifiers designed to detect potentially dangerous inputs and outputs, particularly those related to chemical, biological, radiological, and nuclear (CBRN) weapons. 
For a limited time, Anthropic is offering a research preview called "Imagine with Claude" for its Max subscribers. In this demonstration, the model generates software in real time in response to user requests, with no prewritten code. This preview is designed to showcase the capabilities of Sonnet 4.5 when combined with the right infrastructure. Claude Sonnet 4.5 is available now through the Claude API. The pricing is the same as the previous Claude Sonnet 4 model, at $3 per million input tokens and $15 per million output tokens. Anthropic recommends upgrading to Sonnet 4.5 for all uses, as it provides improved performance for the same cost. The release of Claude Sonnet 4.5 has intensified the competition at the forefront of artificial intelligence, directly challenging GPT-5. While both models represent advanced AI development, they showcase distinct strengths, particularly in the realms of coding, agentic capabilities, and overall performance. Claude Sonnet 4.5 has been positioned as the "best coding model in the world." This claim is substantiated by its leading performance on several key benchmarks. On SWE-bench Verified, which measures a model's ability to solve real-world GitHub issues, Sonnet 4.5 scores an impressive 77.2%, outperforming GPT-5's 72.8%. With additional computing power, Sonnet 4.5's score jumps to 82%. Furthermore, on Terminal-Bench, a test of an AI's ability to use a command-line interface, Sonnet 4.5 achieved a 50% success rate, significantly ahead of GPT-5's 43.8%. This suggests that for developers and technical users who need an AI to perform complex, multi-step tasks in a terminal environment, Sonnet 4.5 holds a distinct advantage. In contrast, GPT-5 is presented as a powerful, general-purpose coding model. While it set new state-of-the-art benchmarks at the time of its release, the specialized focus of Sonnet 4.5 appears to give it an edge in developer-centric tasks. 
A standout feature of Claude Sonnet 4.5 is its ability to function as a long-running autonomous agent. Reports indicate the model can maintain focus and performance on complex tasks for more than 30 hours, a significant increase from previous models. This endurance is crucial for tasks that require sustained effort, such as large-scale code refactoring or in-depth data analysis. On the OSWorld benchmark, which evaluates an AI's ability to perform real-world tasks on a computer, Sonnet 4.5 has taken the top spot with a success rate of 61.4%. This proficiency is further demonstrated in its tool use capabilities, where it scored a remarkable 98.0% in the Telecom domain of the τ-bench evaluations, nearly doubling the performance of its predecessor and surpassing GPT-5. GPT-5, on the other hand, is designed as a unified system that can intelligently switch between different reasoning approaches based on the task's complexity. This allows it to handle a wide variety of tasks efficiently, but it does not emphasize the same long-duration autonomy as Sonnet 4.5. In areas of general reasoning and mathematics, the competition is much closer. On the AIME 2025 high school math competition, Sonnet 4.5 achieved a perfect 100% score when using Python, slightly edging out GPT-5's 99.6%. For graduate-level reasoning, as measured by the GPQA Diamond benchmark, the models are highly competitive, with GPT-5 holding a slight lead. Early user reports and hands-on tests suggest that Sonnet 4.5 is noticeably faster...
[25]
Anthropic Says Its Latest Claude AI Is 'the Best Coding Model in the World'
Anthropic has announced Claude Sonnet 4.5, the latest version of its default model. The company says the model isn't just "the best coding model in the world," it's also "the strongest model for building complex agents." In the context of AI, an agent is an AI model that uses tools that allow it to take actions, like running code and taking over an internet browser. Anthropic said that when it comes to coding, Sonnet 4.5 is better at both identifying small improvements and considering larger changes to code, and follows instructions more directly when coding on users' behalf. In data shared with Inc., Anthropic claimed that the new model exhibited state-of-the-art performance across a wide variety of benchmarks. For example, on SWE-Bench Verified, a widely-used benchmark that measures an AI model's ability to solve real-world software engineering tasks, Sonnet 4.5 was able to successfully solve 77.2 percent of tasks, up from the 74.5 percent solved by Claude Opus 4.1, a larger and much more expensive model released in August. AI agents built using Sonnet 4.5 will also be a step up thanks to a new software development kit (SDK) called Claude Agent SDK. The SDK gives developers access to the same agentic tools used by the company's popular coding agent, Claude Code. These tools enable developers to easily build Sonnet 4.5-based agents that can read and write files, manage context while working on long-running tasks, run code, search the web, pass on context from one agent to another, and coordinate multiple sub-agents to work on tasks simultaneously.
[26]
Anthropic Says Claude Sonnet 4.5 Is the 'Best Coding Model in the World'
Claude Sonnet 4.5 also improves agentic and computer use functions Anthropic released the Claude Sonnet 4.5 artificial intelligence (AI) model on Monday. Calling it the "best coding model in the world," the company highlighted that it has been improved across multiple domains including coding, agentic operations, computer use, reasoning, and domain-specific knowledge. The new model will be available across the Claude website and mobile apps, Claude Code, the application programming interface (API), as well as the under-testing Claude for Chrome extension. Anthropic also claims that the AI model can work autonomously for 30 hours on a task. Claude Sonnet 4.5 Features and Capabilities In a blog post, the AI firm detailed the new AI model. Based on the company's claims, Claude Sonnet 4.5 is designed to make a massive leap in coding and agentic performance, although other areas have also been upgraded. However, despite the upgrades, the new model only offers incremental improvements and does not bring any new capability or modality. Based on internal testing, Anthropic claims that the large language model (LLM) achieved a score of 77.2 percent on the SWE-bench Verified benchmark, which measures the agentic coding capabilities of a model. Notably, this is higher than what OpenAI's GPT-5 and Google's Gemini 2.5 Pro scored, or the company's Opus 4.1. Claude Sonnet 4.5 demo Gadgets 360 staff members briefly tested the model's coding capabilities. We asked it to create a WhatsApp-like messaging chatbot, complete with individual and group chats, as well as audio and video calls. In just two minutes, it wrote 436 lines of code in React, and was able to generate a preview for the fully-functional (minus the server connectivity) interface. Other benchmarks where the AI model is said to have led the charts include Terminal Bench, OSWorld for Computer Use, AIME 2025 for high-school mathematics, and Finance Agent for financial analysis. 
In reasoning-based GPQA Diamond, Gemini 2.5 Pro fared better, while GPT-5 led the charts in the MMMU benchmark for visual reasoning and MMMLU for multilingual performance. Anthropic also claimed that the model surpassed all the company's older models in domain-specific knowledge and reasoning across finance, law, medicine, and STEM fields. Coming to safety, the AI firm claims that Claude Sonnet 4.5 is its "most aligned frontier model." Anthropic claims that it has reduced behaviours such as sycophancy, deception, power-seeking, and the tendency to encourage delusional thinking. Safeguards have been taken to protect it from prompt injections as well.
[27]
Claude Sonnet 4.5 launched by Anthropic: New features, upgrades, free access and more
Anthropic has unveiled its latest AI model, Claude Sonnet 4.5, which the company claims can work autonomously for up to 30 hours to create applications from scratch. This marks a significant improvement over its previous model, Opus 4.1, which could operate independently for just seven hours. According to Anthropic, Claude Sonnet 4.5 demonstrated its capabilities by creating a chat application similar to Slack and Microsoft Teams, generating over 11,000 lines of code without stopping until the task was complete. The company describes Sonnet 4.5 as the "best model at using computers," emphasizing its focus on building complex agents. Claude Sonnet 4.5 expands on the Computer Use functionality introduced last year. The AI can now interpret what appears on the screen and navigate it autonomously, performing tasks in a manner akin to human users. Anthropic has added a checkpoint feature to Claude Code, allowing users to easily revert to previous versions of their code during any project. Additionally, the Claude API has been enhanced, enabling agents to operate for longer durations and tackle more complex challenges. Users can now generate documents, slides, and spreadsheets directly within a conversation with Claude. The new Imagine with Claude feature allows users to transform their ideas into software in real-time. Unlike traditional coding tools, it requires no preset commands or code, letting users create software purely based on their prompts. Free users will have access to Claude Sonnet 4.5, though usage may be limited by a daily token allowance. Anthropic has confirmed that premium members will not face any price increase while gaining access to the new AI model.
[28]
Sculptor: The Missing Claude Code UI You've Been Waiting For
What if the tools meant to simplify AI development often end up complicating it instead? For many developers working with Claude Code, the lack of a dedicated, intuitive interface has been a persistent frustration. Managing multiple agents, debugging errors, and collaborating across teams often feels like navigating a maze without a map. Enter Sculptor, a new desktop interface designed to bridge this gap. With its focus on parallel processing, real-time collaboration, and intelligent error handling, Sculptor doesn't just streamline workflows, it redefines them. Imagine a workspace where managing AI agents is as seamless as sketching ideas on a blank canvas. That's the promise Sculptor brings to the table. Imbue explain how Sculptor transforms the way developers build, refine, and deploy Claude Code agents. From its ability to run multiple agents in secure containers to its forward-thinking features like conversation forking, this tool is more than just a convenience, it's a fantastic option for innovation. Whether you're a startup founder scaling your AI operations or an engineer troubleshooting complex ecosystems, Sculptor offers solutions that feel almost tailor-made. But how does it achieve this balance of simplicity and power? Let's delve into the features that make Sculptor not just a tool, but the missing UI developers have been waiting for. Managing multiple AI agents simultaneously is often a daunting task, but Sculptor makes it intuitive and efficient. Its ability to run multiple Claude Code agents in parallel, each within its own secure container, ensures optimal performance and safety. This architecture allows you to test, refine, and deploy agents without the risk of interference or resource conflicts. For instance, you can operate a customer service bot alongside a data analysis agent, each performing distinct tasks in real time. 
This capability is especially beneficial for teams working on complex AI ecosystems, as it minimizes downtime and maximizes efficiency. Sculptor's parallel agent management is a fantastic option for organizations aiming to scale their AI operations seamlessly. Collaboration is a cornerstone of Sculptor's design philosophy. The platform's Pairing Mode enables real-time testing and editing directly within the integrated development environment (IDE). This feature allows team members to make changes and immediately observe their impact, fostering faster iterations and smoother teamwork. For distributed teams or projects requiring rapid updates, this functionality is invaluable. Imagine a scenario where one team member edits an agent's code while another tests its functionality, all within the same synchronized environment. This streamlined workflow ensures that your team remains aligned and productive, regardless of geographical location or time zone. Debugging and resolving errors can often consume a significant portion of development time. Sculptor addresses this challenge by not only detecting coding errors but also offering actionable suggestions to resolve them. Its advanced error-handling capabilities extend to managing merge conflicts, a common issue in collaborative coding environments. For example, if two developers make conflicting changes to an agent's logic, Sculptor analyzes the differences and proposes a resolution. This feature reduces frustration, saves time, and ensures that your codebase maintains a high standard of quality. By simplifying error resolution, Sculptor allows you to focus on innovation rather than troubleshooting. Sculptor is built with a forward-looking approach, making sure it evolves alongside advancements in AI technology. 
Upcoming updates include conversation forking, which enables you to explore multiple development paths simultaneously, and enhanced AI-driven suggestions to support better decision-making during the development process. Additionally, Sculptor's planned integration with GPT-5 will bring more advanced language capabilities to Claude Code agents, allowing them to handle complex tasks with greater precision. The platform will also support custom Docker file integration, giving you the flexibility to tailor your development environment to meet specific project requirements. These features position Sculptor as a tool that not only meets today's needs but also anticipates tomorrow's challenges. Sculptor ensures accessibility for developers working across diverse operating systems. With support for both Mac and Linux, the platform eliminates compatibility concerns, making it easy for teams with varied hardware setups to adopt and use the tool effectively. Whether you're working on a MacBook or a Linux workstation, Sculptor delivers a consistent, reliable experience that adapts to your workflow. Sculptor is more than just an interface, it's a comprehensive solution for managing Claude Code agents. By combining parallel processing, real-time collaboration, intelligent error handling, and future-ready features, it enables developers to focus on creativity and innovation rather than logistical challenges. Whether you're building AI agents for customer support, data analysis, or other specialized applications, Sculptor provides the tools and flexibility you need to succeed. Its robust feature set, coupled with a commitment to enhancing productivity, makes it an indispensable resource for developers and engineers navigating the rapidly evolving AI landscape.
[29]
Anthropic launches new vibe coding model Claude Sonnet 4.5: All you need to know - The Economic Times
Anthropic launched Claude Sonnet 4.5, its latest AI coding model, which it claims has improved coding, reasoning, and mathematical skills. It also shows better computer use. Experts note its enhanced domain knowledge. The model is available globally today. Pricing remains unchanged from Sonnet 4. Anthropic targets global expansion, including India. Anthropic launched its latest artificial intelligence (AI) coding model, Claude Sonnet 4.5, on Monday, claiming it could handle longer coding sessions, and perform better on reasoning and mathematical tasks. The AI major also claimed that the model has shown significant improvement in computer use. The new model has been rolled out as Anthropic eyes global expansion -- including plans for India -- to meet the rising enterprise demand for AI. Alphabet and Amazon-backed Anthropic has been marketing Claude's coding and data-analysis skills to regulated industries and teams that want models to work across multiple software tools. Claude Sonnet 4.5 features Anthropic claims that Claude Sonnet 4.5 scores the highest against its own models as well as OpenAI's GPT-5 and GPT-5 Codex and Google's Gemini 2.5 Pro on the SWE-bench Verified evaluation. This benchmark measures real-world software coding abilities. Claude Sonnet 4.5 was seen maintaining focus for more than 30 hours on complex, multi-step tasks, Anthropic said. Claude Sonnet 4.5 also shows improvements in computer use. It leads at 61.4% on OSWorld, a benchmark that tests AI models on real-world computer tasks. According to Anthropic, experts in finance, law, medicine, and STEM found Sonnet 4.5 shows dramatically better domain-specific knowledge and reasoning compared to older models, including Opus 4.1. Claude Sonnet 4.5 availability and price Anthropic said Claude Sonnet 4.5 is available everywhere today. Developers can access the coding model via Claude API. Pricing remains the same as Claude Sonnet 4, at $3/$15 per million input/output tokens, the AI company said.
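To make the quoted pricing concrete, the short sketch below estimates the cost of a single request at $3 per million input tokens and $15 per million output tokens. The function name and the example token counts are hypothetical and chosen only for illustration; actual bills may differ depending on features or discounts that the articles here do not describe.

```python
# List prices quoted in the coverage above, in USD per million tokens.
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request at the quoted list prices.

    Hypothetical helper for illustration; not part of any Anthropic SDK.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


if __name__ == "__main__":
    # Assumed workload: 200,000 input tokens and 50,000 output tokens.
    print(f"${estimate_cost_usd(200_000, 50_000):.2f}")
```

Because output tokens cost five times as much as input tokens at these rates, a long-running agent that writes a lot of code will tend to be dominated by the output side of the bill.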
[30]
Anthropic Launches Claude Sonnet 4.5 and Introduces Claude Agent SDK | PYMNTS.com
"Claude Sonnet 4.5 is state-of-the-art on the SWE-bench Verified evaluation, which measures complex real-world software coding abilities," the company said Monday (Sept. 29) in an announcement. "Practically speaking, we've observed it maintaining focus for more than 30 hours on complex, multi-step tasks." Anthropic added in the post that Sonnet 4.5 leads a benchmark that tests AI models on real-world computer tasks, OSWorld, at 61.4%. Together with the release of Sonnet 4.5, Anthropic has released upgrades to its products, according to the post. These include the addition of checkpoints to Claude Code, enabling users to save their progress and roll back to a previous state; the addition of a new context editing feature and memory tool to the Claude API, letting agents run longer and handle greater complexity; and the addition of code execution and file creation directly into the conversation in Claude apps, per the post. Anthropic also introduced Claude Agent SDK, which gives developers the ability to build AI agents with the same infrastructure that powers its frontier products, the post said. In addition, the Claude for Chrome extension is now available to Max users who joined the waitlist last month, according to the post. "We recommend upgrading to Claude Sonnet 4.5 for all uses," Anthropic said in the post. "Whether you're using Claude through our apps, our API, or Claude Code, Sonnet 4.5 is a drop-in replacement that provides much improved performance for the same price." When Anthropic launched a feature preview on Sept. 
9 that allows users of Claude to create and edit files directly within Claude.ai and the desktop app, PYMNTS reported that the move positions the company to more directly compete against OpenAI's ChatGPT Enterprise, Microsoft Copilot and Google Gemini, which are all pitching AI-driven productivity tools to corporate users.
[31]
New Claude Code 2.0 Agentic AI Coding Agent : The Secret Weapon Every Dev Needs
What if your code could think for itself, anticipating your next move, debugging with precision, and even automating entire workflows? With the release of Claude Code 2.0, this isn't just a futuristic dream, it's a reality reshaping how developers approach software creation. Powered by the advanced reasoning capabilities of Claude Sonnet 4.5, this new AI coding agent is more than just a tool; it's a partner in innovation. From seamless integration into your IDE to autonomous agents that tackle complex challenges, Claude Code 2.0 promises to redefine productivity and creativity in development. But is this the breakthrough developers have been waiting for, or just another overhyped upgrade? In this overview, World of AI explore how Claude Code 2.0 transforms the development landscape with its agentic workflows and developer-centric design. You'll discover how its native VS Code extension, enhanced terminal interface, and Claude Agent SDK empower you to work smarter, not harder. Whether you're debugging intricate algorithms, automating compliance checks, or building AI-driven applications, this platform offers tools tailored to meet the demands of modern coding. So, how does it all come together to elevate your workflow? At the heart of this innovation lies Claude Sonnet 4.5, a model engineered to deliver exceptional reasoning capabilities, mathematical accuracy, and high computational performance. Scoring an impressive 82% on SWE-bench, it is specifically optimized for agentic coding, empowering you to tackle complex problems with confidence and precision. Whether you're designing intricate algorithms or building AI-driven applications, Claude Sonnet 4.5 provides a reliable computational backbone to bring your concepts to fruition. Its ability to handle nuanced tasks ensures that you can focus on innovation while relying on a robust AI foundation. 
Claude Code 2.0 introduces a suite of features designed to enhance your development experience and streamline your workflow. These include: These features are tailored to improve productivity, whether you're debugging, testing, or deploying code. By integrating these tools into your workflow, you can achieve greater efficiency and focus on solving critical challenges. The Cloud Agent SDK is a powerful resource for building autonomous, agent-driven workflows. It supports advanced functionalities such as sub-agents, hooks, and background tasks, allowing you to create specialized systems for tasks like backend API development or in-depth debugging. This SDK is particularly valuable in industries where precision and reliability are paramount, such as finance, healthcare, and cybersecurity. By automating tasks that traditionally require manual intervention, it minimizes the risk of human error and saves valuable time. For instance, you can develop agents that independently handle regulatory compliance checks or proactively detect cybersecurity threats, streamlining operations in high-stakes environments. This capability not only enhances operational efficiency but also ensures that critical processes are executed with consistency and accuracy. Claude Code 2.0 is designed with a strong emphasis on usability, making sure that developers of all levels can integrate its tools into their workflows effortlessly. Key features include: These features are designed to eliminate unnecessary friction, allowing you to focus on innovation and problem-solving rather than navigating cumbersome processes. By prioritizing developer-friendly design, Claude Code 2.0 ensures that you can maximize your productivity and creativity. The versatility of Claude Code 2.0 makes it an invaluable tool across a wide range of industries. Its capabilities are particularly impactful in sectors where compliance, security, and innovation are critical. 
Notable applications include: These use cases highlight the fantastic potential of Claude Code 2.0 in driving innovation and efficiency across diverse domains. By using its advanced features, you can develop solutions that meet the demands of today's dynamic technological landscape. Claude Code 2.0 and Claude Sonnet 4.5 represent a significant leap forward in AI-driven software development. By combining innovative technology with a focus on usability, these tools empower you to build autonomous agents, streamline workflows, and tackle complex challenges with confidence. Whether you're creating new applications or optimizing existing systems, Claude Code 2.0 equips you with the resources needed to succeed in an increasingly fast-paced and competitive environment. With its robust capabilities and developer-centric design, it sets a new standard for what AI-driven coding can achieve.
[32]
Anthropic launches Claude 4.5, touts better abilities, targets business customers - The Economic Times
Anthropic unveiled the Claude 4.5 AI model on Monday, saying the newest version can code for longer uninterrupted stretches and handle finance and scientific tasks better, as the startup pushes deeper into enterprise AI. The Alphabet and Amazon.com-backed AI startup is racing rivals to build models that can reliably operate software and complete multi-step work, key for AI agents, which can perform tasks on behalf of humans. The Sonnet 4.5 model created a web app from scratch in internal tests, and one customer had the AI chatbot code autonomously for 30 hours, up from a seven-hour run achieved by Anthropic's earlier Claude Opus 4 for a different client, Chief Product Officer Mike Krieger said. Anthropic is targeting power users and business customers rather than chasing a viral consumer moment, he said. Claude 4.5 is stronger at finance and scientific reasoning and better at using computers, scoring about 60% on a benchmark that tests operating-system dexterity versus roughly 40% for prior models, the company said. "It's a lot more visceral when you just see the model using a computer the way a person does if you're not a coder," said Chief Science Officer Jared Kaplan. Separately on Monday, Microsoft said it would add new Microsoft 365 Copilot features powered by Anthropic models, including "Agent Mode" in Excel and Word and an "Office Agent" in Copilot chat, with PowerPoint to follow. Microsoft last week said it would bring Anthropic's models to Microsoft 365 Copilot to diversify beyond longtime partner OpenAI. Anthropic, founded by former OpenAI executives, has positioned Claude for workplace use with guardrails it says reduce risky outputs.
The company has been marketing Claude's coding and data-analysis skills to regulated industries and teams that want models to work across multiple software tools. Krieger said the company's focus is on sustained, reliable performance over long tasks rather than short demos.
[33]
Claude Agents 2.0 are INSANE : Non-Stop AI Coding Until Its Finished 30hrs+
What if the final barrier to seamless AI-driven coding wasn't a massive leap, but the elusive last 1%? Imagine AI agents so advanced they could tackle tasks for over 30 uninterrupted hours, rewinding their steps with precision and adapting to your workflow as if they were an extension of your own mind. Bold claim? Perhaps. But with the release of Claude Sonnet 4.5 and Claude Code 2, we're witnessing a transformation in AI coding tools that feels less like an upgrade and more like a paradigm shift. These tools don't just promise efficiency, they deliver a new level of autonomy that redefines what's possible in automation and task management. In this exploration, AI Labs uncovers how these next-gen AI agents from Anthropic are solving the challenges that have long plagued developers: from automating complex workflows to integrating seamlessly with platforms like GitHub and Slack. You'll discover how features like enhanced IDE extensions and rewind functionality are empowering coders to focus on creativity and strategy rather than repetitive tasks. But it's not just about coding, these tools are expanding into general-purpose automation, offering a glimpse into the future of AI-driven productivity. Could this be the tipping point where AI moves from assistant to indispensable partner? Let's explore the possibilities. The latest versions of Claude Sonnet and Claude Code bring a range of updates designed to improve functionality and user experience. These tools now feature AI agents capable of maintaining focus on tasks for extended periods, sometimes exceeding 30 hours. This advancement is particularly valuable for managing complex, long-duration tasks without interruptions, ensuring consistent performance and reliability. These improvements aim to make coding more efficient and accessible, empowering developers to focus on higher-priority tasks while minimizing repetitive manual effort.
One of the most impactful uses of these tools is in automating workflows. By integrating Claude Code with platforms like GitHub Actions, you can automate repetitive tasks, enhance collaboration, and improve overall team efficiency. These capabilities significantly reduce manual intervention, allowing teams to focus on strategic objectives. However, successful implementation requires careful configuration of tool permissions and thorough integration testing to ensure seamless operation. Cloud-based task management is another area where these updates excel. Claude Code enables developers to automate background tasks and implement features directly within GitHub repositories. To fully use these benefits, it is essential to provide detailed SDK documentation and rigorously test MCP servers. These steps help prevent errors and ensure reliable performance, maximizing the tools' potential. Despite their advanced capabilities, these tools are not without challenges. Overcoming these challenges requires a proactive approach. Regular updates, meticulous testing, and detailed documentation are critical to optimizing the functionality of these tools. By addressing these issues, developers can unlock the full potential of Claude Sonnet 4.5 and Claude Code 2, ensuring smooth and reliable performance. The advancements in Claude Sonnet 4.5 and Claude Code 2 extend the capabilities of AI agents beyond traditional coding tasks. These tools now support general-purpose automation, allowing a wide range of applications. By using these features, organizations can streamline operations, improve efficiency, and focus on higher-value activities. These tools empower users to create more adaptable and scalable processes, enhancing their ability to meet evolving demands in a dynamic technological environment.
Claude Sonnet 4.5 and Claude Code 2 represent a significant step forward in the realm of AI-powered coding tools. Their ability to handle long-running tasks, enhanced IDE features, and expanded integration capabilities make them indispensable for developers and organizations seeking to optimize workflows and improve task management. While challenges remain, the solutions provided by these tools pave the way for a more versatile and reliable future in AI-driven automation. By addressing persistent issues and introducing innovative features, these updates set a new standard for what AI coding tools can achieve, offering practical benefits that extend far beyond the coding environment.
[34]
Anthropic launches Claude Sonnet 4.5, claims world's best coding model By Investing.com
Investing.com -- Anthropic has released Claude Sonnet 4.5, which the company describes as "the best coding model in the world" with enhanced capabilities for building complex agents and using computers. The new model shows substantial improvements in reasoning and math compared to previous versions, according to Anthropic. The company reports that Claude Sonnet 4.5 leads on the SWE-bench Verified evaluation, which measures real-world software coding abilities, and has achieved a 61.4% score on OSWorld, a benchmark for AI models performing real-world computer tasks. Alongside the model release, Anthropic has introduced several product upgrades. Claude Code now features checkpoints that allow users to save progress and roll back to previous states. The terminal interface has been refreshed, and a new VS Code extension brings Claude directly to the integrated development environment. For developers using the Claude API, Anthropic has added context editing to automatically clear stale context and a memory tool that stores information outside the context window, helping to manage long-running tasks without hitting context limits. The company has also made the Claude for Chrome extension available to users who joined the waitlist last month, and introduced the Claude Agent SDK, which provides developers with the infrastructure used to build Claude Code. Claude Sonnet 4.5 is available on the Claude Developer Platform, Amazon Bedrock, and Google Cloud's Vertex AI, with pricing remaining the same as Sonnet 4 at $3/$15 per million tokens. Anthropic claims this is their "most aligned frontier model yet," with improvements in reducing behaviors like sycophancy, deception, and power-seeking. The model is being released under Anthropic's AI Safety Level 3 protections, which include filters to detect potentially dangerous inputs and outputs. 
As a temporary feature, Anthropic is offering "Imagine with Claude," a five-day research preview available to Max subscribers that demonstrates the model generating software in real time.
[35]
Claude 4.5 Sonnet Fully Tested : From Coding to Complex Problem Solving
What if an AI could not only write code but also reason through complex problems, manage multi-step workflows for hours, and even design a functional game or simulate a solar system? Enter Claude Sonnet 4.5, the latest innovation from Anthropic that's reshaping the landscape of artificial intelligence. With its 200K context window -- expandable to 1M in beta mode -- this model isn't just an upgrade; it's a bold redefinition of what AI can achieve. Whether you're a developer tackling intricate software projects, a researcher analyzing massive datasets, or a designer crafting user-friendly interfaces, Claude 4.5 promises to be more than a tool, it's a partner in innovation. In this detailed testing of the new Anthropic AI model, World of AI uncovers how Claude Sonnet 4.5 achieves its new performance benchmarks, outpacing competitors like GPT-5 and Gemini 2.5 Pro in reasoning, coding efficiency, and long-form content generation. From its multimodal input capabilities to its ability to sustain focus on tasks for up to 30 hours, this AI model is engineered for the demands of modern workflows. But it's not just about technical specs, Claude 4.5 has been rigorously tested in real-world applications, from creating SaaS platforms to designing intricate physics simulations. What makes it truly remarkable, though, is how it bridges the gap between raw computational power and practical, creative problem-solving. Claude Sonnet 4.5 builds upon the foundation of its predecessor, Sonnet 4, and outshines competitors such as Opus 4.1, GPT-5, and Gemini 2.5 Pro in critical benchmarks like SWE-bench. These benchmarks evaluate essential capabilities, including reasoning, mathematical problem-solving, and coding efficiency. One of the standout features of Claude 4.5 is its exceptional reliability, maintaining focus on multi-step tasks for up to 30 hours.
This level of consistency makes it a dependable choice for handling complex workflows that demand sustained attention and precision. Claude Sonnet 4.5 introduces a range of advanced features that enhance its versatility and adaptability. These features collectively position Claude 4.5 as a powerful tool for tackling a wide range of professional challenges, from data analysis to creative content generation. Claude Sonnet 4.5 has been rigorously tested in various scenarios, showcasing its versatility and problem-solving capabilities. These applications underline the model's ability to adapt to diverse industries, making it a valuable asset for professionals seeking innovative solutions to complex problems. Claude Sonnet 4.5 provides a robust set of tools and integrations designed to empower developers and creators. These tools provide developers with the resources they need to innovate and create, making Claude 4.5 an essential component of modern development ecosystems. Claude Sonnet 4.5 adopts a straightforward token-based pricing structure, ensuring accessibility for a wide range of users. Input tokens are priced at $3 per 1M, while output tokens cost $15 per 1M. This transparent pricing model caters to individual developers, small businesses, and large enterprises alike, making it a cost-effective solution for diverse professional needs. While Claude Sonnet 4.5 offers impressive capabilities, there are areas where it can improve. Certain aspects of code generation and context handling still require refinement to achieve optimal performance. These limitations are expected to be addressed in future iterations, such as the anticipated Claude 5. Despite these challenges, the model's current performance establishes it as a leader in AI-driven coding and reasoning, setting the stage for continued innovation in the field. Claude Sonnet 4.5 stands out as an innovative tool for coding, reasoning, and multi-step task management.
Its advanced features, reliable performance, and wide-ranging applications make it an indispensable resource for professionals across industries. While there is room for growth, its current capabilities solidify its position as a frontrunner in artificial intelligence, paving the way for future advancements that will continue to redefine what AI can achieve.
[36]
Claude Sonnet 4.5 Agentic Coding AI Released By Anthropic
What if the future of AI wasn't just about speed or raw power, but about unwavering focus and adaptability? Imagine an AI capable of managing intricate, multi-step tasks for hours, perhaps even days, without losing sight of the end goal. Enter Sonnet 4.5, the latest innovation from Anthropic, which promises to redefine what it means to build and deploy agents. In a world where many AI systems falter under the weight of complexity or prolonged tasks, Sonnet 4.5 stands out as a precision-engineered solution for industries where reliability isn't just a preference, it's a necessity. From regulated sectors like healthcare and finance to innovative scientific research, this model is poised to set a new standard for agentic coding. In this overview, Prompt Engineering explains how Sonnet 4.5's enhanced context awareness and dynamic memory management make it a strong option for long-duration, high-stakes applications. You'll discover why its ability to sustain focus over extended periods is a breakthrough for tasks that demand iterative problem-solving or compliance with strict regulations. We'll also dive into its innovative tools, such as the Claude Agent SDK and a native VS Code extension, which amplify its utility for developers. Whether you're curious about its performance benchmarks or intrigued by its alignment with safety standards, this analysis will unpack why Sonnet 4.5 might just be the best agentic coding AI yet. After all, in a field as fast-moving as AI, it's not just about keeping up, it's about staying ahead. Sonnet 4.5 redefines agentic coding by maintaining unwavering focus on specific, multi-step tasks for up to 30 hours. Unlike many AI models that struggle to balance competing objectives, this system excels at task-specific execution, ensuring consistent and accurate results over extended periods.
A standout feature of Sonnet 4.5 is its improved context awareness, which ensures the model remains effective even during prolonged and complex tasks. By optimizing token usage and memory allocation, it minimizes the risk of task abandonment, a common challenge in long-duration operations. These enhancements ensure that Sonnet 4.5 can adapt to evolving task requirements while maintaining its focus, making it an invaluable tool for industries that demand accuracy and adaptability. Its benchmark results underscore its ability to deliver consistent, high-quality results in demanding environments, reinforcing its position as a reliable and efficient solution for enterprise needs. To complement the capabilities of Sonnet 4.5, Anthropic has introduced two powerful tools designed to enhance its functionality and streamline workflows. Additionally, a native VS Code extension ensures a smooth and efficient development experience, further amplifying the model's utility for software engineers and technical teams. Sonnet 4.5 is Anthropic's most aligned model to date, making it a reliable choice for industries such as finance, law, and healthcare, where compliance and safety are paramount. Its advanced AI safety measures minimize the risk of misaligned behavior, ensuring adherence to industry standards and regulatory requirements. This alignment is particularly critical for enterprises operating in high-stakes environments, where even minor errors can have significant consequences. By prioritizing safety and compliance, Sonnet 4.5 provides a dependable solution for sensitive applications, offering peace of mind to organizations navigating complex regulatory landscapes. Despite its significant upgrades, Sonnet 4.5 retains the same pricing as its predecessor, making it a cost-effective option for businesses seeking innovative AI solutions.
Its superior performance, alignment capabilities, and enterprise focus position it as a leader in the AI market. Whether you're developing software, managing multi-agent systems, or operating in regulated industries, Sonnet 4.5 is designed to meet your needs with precision, efficiency, and reliability. Its combination of advanced features and competitive pricing ensures that it delivers exceptional value for enterprises of all sizes.
[37]
Claude Sonnet 4.5 explained: Why Anthropic claims it's the world's best coding model
New Claude Sonnet 4.5 dominates SWE-bench, OSWorld coding benchmarks
When Anthropic unveiled Claude Sonnet 4.5, it wasn't just another AI upgrade. The company called it their "most aligned frontier model" yet - one that doesn't just generate text, but thinks, reasons, and codes across long, complex tasks. With this release, Anthropic is making an audacious claim: that Claude Sonnet 4.5 is the best coding model in the world. At the heart of Anthropic's case are numbers. On SWE-bench Verified, a leading benchmark for software problem-solving, Claude Sonnet 4.5 sets new highs, outperforming both its predecessor and rivals. On OSWorld, a test of real-world computer interactions and tool use, the model scores 61.4%, a sharp jump from Sonnet 4's 42.2% just months earlier. These metrics suggest Sonnet 4.5 is moving beyond code completion into something closer to genuine software engineering assistance, tackling debugging, multi-file reasoning, and complex tool orchestration. One of the biggest challenges in AI coding is keeping context coherent across long projects. Anthropic claims Sonnet 4.5 can sustain reasoning for 30+ hours on extended tasks, making it possible to manage sprawling projects instead of just snippets of code. That opens the door to AI handling entire development cycles: writing code, testing it, and revising it without losing the thread. For developers, this could mean a model that doesn't just help solve problems but can stick with them from start to finish. Anthropic is pairing Sonnet 4.5 with new features that make it feel less like a chatbot and more like a developer environment. In short, Claude isn't just suggesting code; it's becoming a collaborator inside the tools programmers already use.
Another big piece is Anthropic's push into "agentic" AI: models that don't just generate responses but act in the world. With Sonnet 4.5, Anthropic introduced a Claude Agent SDK, giving developers access to the infrastructure behind its tool-using behaviors. This means Claude can run commands, manipulate files, and carry out workflows that once required human oversight. For Anthropic, this agentic leap is what transforms Claude from assistant to co-worker. Of course, more capable AI also means more risk. Anthropic stresses that Sonnet 4.5 is released under AI Safety Level 3 (ASL-3) protections. New classifiers and filters are designed to stop dangerous misuse, especially around sensitive technical knowledge like cyber or biosecurity, while reducing false positives by an order of magnitude compared with earlier models. And if a conversation gets blocked by filters? Users can still fall back to Claude Sonnet 4, a safer but less capable sibling. Importantly, Anthropic hasn't raised prices. Sonnet 4.5 is available across its apps and API at the same rate as Sonnet 4: $3 per million input tokens, $15 per million output tokens. That pricing keeps it in line with top competitors like OpenAI and Google, even as Anthropic leans hard into its coding advantage. But rivals aren't standing still. OpenAI's GPT-5 is being pitched as a generalist model that performs at human levels across a range of jobs, while Google's Gemini 1.5 has been making strides in reasoning and multimodal tasks. Anthropic is betting that owning the coding niche with unmatched benchmarks and developer-friendly features will set Claude apart. In the end, Claude Sonnet 4.5 isn't just about coding speed. It's about sustained reasoning, workflow integration, and safe autonomy.
Anthropic is positioning it as the model for builders, the one you choose when you want an AI that doesn't just autocomplete but engineers, debugs, and persists across entire projects. Whether it really is the "world's best coding model" will depend on how it performs outside controlled benchmarks. But for now, Anthropic has planted its flag: Claude isn't just a conversationalist, it's a coder.
[38]
Anthropic launches Claude Sonnet 4.5, claims it can build production-ready apps
Anthropic has officially announced its latest AI model, dubbed Claude Sonnet 4.5. According to the company, the model is positioned around coding performance and reliability and can make more than just prototypes: Anthropic claims it can autonomously build "production-ready" applications -- a step up from its predecessors. Claude Sonnet 4.5 is now available through the Claude API and chatbot, with the pricing remaining unchanged from Claude Sonnet 4 at $3 per million input tokens and $15 per million output tokens. With this model, the company aims to compete with OpenAI's GPT-5, which made headlines recently for surpassing Claude models in several coding benchmarks. Despite the competition, Anthropic claims its new release is the best in the business. The model reportedly performed well in SWE-bench Verified and other evaluations. However, according to Anthropic researcher David Hershey, benchmarks only capture a portion of its capabilities. He stated that in early enterprise trials, Claude Sonnet 4.5 coded autonomously for up to 30 hours, managing not only application development but also database setup, domain purchases, and even a security compliance audit. The company also claims that Claude Sonnet 4.5 is its most aligned model to date, with improved resistance to prompt injection attacks and decreased tendencies towards sycophancy or misleading responses. The Claude Agent SDK, which enables developers to create their own AI-powered agents using the same infrastructure as Claude Code, was announced along with Claude Sonnet 4.5.
Anthropic is also giving a sneak peek at "Imagine with Claude," a research tool available to Max subscribers that allows for real-time software creation without the need for prewritten code. Previously, the company unveiled the Claude Opus 4.1 in an attempt to boost competition in the AI industry.
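Several of the sources above quote the same pricing for Claude Sonnet 4.5: $3 per million input tokens and $15 per million output tokens. As a rough illustration of what those rates mean in practice, here is a minimal Python sketch that turns token counts into a dollar estimate. The function name and the example workload are hypothetical and chosen only for illustration; the rates are the ones reported in the articles and may not match current pricing.

```python
# Rates as reported in the articles above (USD per million tokens).
# Treat these as assumptions; check current pricing before relying on them.
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in USD of a workload from its token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * INPUT_USD_PER_MTOK
             + output_tokens * OUTPUT_USD_PER_MTOK)
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens and 500k output tokens.
    print(f"${estimate_cost_usd(2_000_000, 500_000):.2f}")  # prints $13.50
```

Because output tokens cost five times as much as input tokens at these rates, a long-running agentic task that generates a lot of code can be dominated by output cost even when it reads far more than it writes.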
Anthropic releases Claude Sonnet 4.5, claiming it to be the world's best coding model with improved capabilities in building complex agents and computer use. The model demonstrates unprecedented focus, maintaining coherence for over 30 hours on complex tasks.
Anthropic has released Claude Sonnet 4.5, its latest AI language model, claiming it to be the "most capable model to date" with significant improvements in coding and computer use capabilities [1]. This release marks a substantial leap forward in AI technology, particularly in the realms of autonomous coding and complex task management.
One of the most striking features of Claude Sonnet 4.5 is its ability to maintain focus on complex, multi-step tasks for extended periods. Anthropic reports that the model has worked continuously on the same project "for more than 30 hours" [1]. This level of sustained coherence is a significant improvement over previous models, which typically struggled with long-term task management.
Anthropic boasts that Claude Sonnet 4.5 is "the best coding model in the world" [1]. The model has achieved impressive scores on various benchmarks [1]. These scores surpass those of competitors like OpenAI's GPT-5 Codex and Google's Gemini 2.5 Pro [1].
Alongside Claude Sonnet 4.5, Anthropic has introduced several new features and tools for developers [1][3].
Anthropic claims that Claude Sonnet 4.5 is their "most aligned frontier model" yet, with reduced instances of sycophancy, deception, and power-seeking behaviors [3]. The company also reports improved defenses against prompt injection attacks, enhancing the model's overall safety and reliability [3].
Claude Sonnet 4.5 is now available through the Claude API and the Claude.ai chatbot. For developers, the pricing remains the same as Claude Sonnet 4: $3 per million input tokens and $15 per million output tokens [2].
The release of Claude Sonnet 4.5 intensifies the competition in the AI industry, particularly in the realms of coding and autonomous agents. As companies like Anthropic, OpenAI, and Google continue to push the boundaries of AI capabilities, we can expect to see further advancements in the near future [4].
Summarized by Navi