Curated by THEOUTPOST
On Wed, 26 Mar, 12:06 AM UTC
39 Sources
[1]
Gemini 2.5 Pro is here with bigger numbers and great vibes
Just a few months after releasing its first Gemini 2.0 AI models, Google is upgrading again. The company says the new Gemini 2.5 Pro Experimental is its "most intelligent" model yet, offering a massive context window, multimodality, and reasoning capabilities. Google points to a raft of benchmarks that show the new Gemini clobbering other large language models (LLMs), and our testing seems to back that up -- Gemini 2.5 Pro is one of the most impressive generative AI models we've seen. Gemini 2.5, like all Google's models going forward, has reasoning built in. The AI essentially fact-checks itself along the way to generating an output. We like to call this "simulated reasoning," as there's no evidence that this process is akin to human reasoning. However, it can go a long way to improving LLM outputs. Google specifically cites the model's "agentic" coding capabilities as a beneficiary of this process. Gemini 2.5 Pro Experimental can, for example, generate a full working video game from a single prompt. We've tested this, and it works with the publicly available version of the model. Google says a lot of things about Gemini 2.5 Pro; it's smarter, it's context-aware, it thinks -- but it's hard to quantify what constitutes improvement in generative AI bots. There are some clear technical upsides, though. Gemini 2.5 Pro comes with a 1 million token context window, which is common for the big Gemini models but massive compared to competing models like OpenAI GPT or Anthropic Claude. You could feed multiple very long books to Gemini 2.5 Pro in a single prompt, and the output maxes out at 64,000 tokens. That's the same as Flash 2.0, but it's still objectively a lot of tokens compared to other LLMs. Naturally, Google has run Gemini 2.5 Experimental through a battery of benchmarks, in which it scores a bit higher than other AI systems. 
For example, it squeaks past OpenAI's o3-mini in GPQA and AIME 2025, which measure how well the AI answers complex questions about science and math, respectively. It also set a new record in the Humanity's Last Exam benchmark, which consists of 3,000 questions curated by domain experts. Google's new AI managed a score of 18.8 percent to OpenAI's 14 percent.
[2]
Google unveils a next-gen AI reasoning model
On Tuesday, Google unveiled Gemini 2.5, a new family of AI reasoning models that pause to "think" before answering a question. To kick off the new family of models, Google is launching Gemini 2.5 Pro Experimental, a multimodal, reasoning AI model that the company claims is its most intelligent model yet. This model will be available on Tuesday in the company's developer platform, Google AI Studio, as well as in the Gemini app for subscribers to the company's $20-a-month AI plan, Gemini Advanced. Moving forward, Google says all of its new AI models will have reasoning capabilities baked in. Since OpenAI launched the first AI reasoning model, o1, in September 2024, the tech industry has raced to match or exceed that model's capabilities with their own. Today, Anthropic, DeepSeek, Google, and xAI all have AI reasoning models, which use extra computing power and time to fact-check and reason through problems before delivering an answer. Reasoning techniques have helped AI models achieve new heights in math and coding tasks. Many in the tech world believe reasoning models will be a key component of AI agents, autonomous systems that can perform tasks largely without human intervention. However, these models are also more expensive. Google claims that Gemini 2.5 Pro outperforms its previous frontier AI models, and some of the competing leading AI models, on several benchmarks. Specifically, Google says it designed Gemini 2.5 to excel at creating visually compelling web apps and agentic coding applications. On an evaluation measuring code editing, called Aider Polyglot, Google says Gemini 2.5 Pro scores 68.6%, outperforming top AI models from OpenAI, Anthropic, and Chinese AI lab DeepSeek. However, on another test measuring software dev abilities, SWE-bench Verified, Gemini 2.5 Pro scores 63.8%, outperforming OpenAI's o3-mini and DeepSeek's R1, but underperforming Anthropic's Claude 3.7 Sonnet, which scored 70.3%.
On Humanity's Last Exam, a multimodal test consisting of thousands of crowdsourced questions relating to mathematics, humanities, and the natural sciences, Google says Gemini 2.5 Pro scores 18.8%, performing better than most rival flagship models. To start, Google says Gemini 2.5 Pro is shipping with a 1 million token context window, which means the AI model can take in roughly 750,000 words in a single go. That's longer than the entire "Lord of The Rings" book series. And soon, Gemini 2.5 Pro will support double the input length (2 million tokens).
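The word counts quoted above follow from a simple ratio. As a rough sketch, the snippet below assumes 0.75 English words per token, a figure derived only from the article's own "1 million tokens is roughly 750,000 words" comparison; real ratios vary by tokenizer and by text, and the function names here are hypothetical.

```python
import math

# Assumption: 0.75 words per token, taken from the article's own
# "1 million tokens ~ 750,000 words" comparison. Real ratios vary.
WORDS_PER_TOKEN = 0.75


def tokens_to_words(tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Estimate how many English words fit in a given token budget."""
    return round(tokens * words_per_token)


def words_to_tokens(words: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Estimate how many tokens a text of the given word count needs (rounded up)."""
    return math.ceil(words / words_per_token)


current_window_words = tokens_to_words(1_000_000)  # today's 1 million token window
planned_window_words = tokens_to_words(2_000_000)  # the promised 2 million token window
```

Under the same assumption, doubling the window to 2 million tokens doubles the estimate to about 1.5 million words.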
[3]
The hottest AI models, what they do, and how to use them | TechCrunch
AI models are being cranked out at a dizzying pace, by everyone from Big Tech companies like Google to startups like OpenAI and Anthropic. Keeping track of the latest ones can be overwhelming. Adding to the confusion is that AI models are often promoted based on industry benchmarks. But these technical metrics often reveal little about how real people and companies actually use them. To cut through the noise, TechCrunch has compiled an overview of the most advanced AI models released since 2024, with details on how to use them and what they're best for. We'll keep this list updated with the latest launches, too. There are literally over a million AI models out there: Hugging Face, for example, hosts over 1.4 million. So this list might miss some models that perform better, in one way or another. Gemini 2.5 Pro Experimental, a reasoning model, excels at building web apps and code agents according to Google. It underperforms on one popular coding benchmark compared to Claude Sonnet 3.7, however. The model requires a $20 monthly Gemini Advanced subscription. OpenAI has upgraded its existing GPT-4o model to generate images, not just text. The souped-up model soon went viral for transforming images into Studio Ghibli-style anime, despite obvious copyright concerns. Accessing GPT-4o requires, at minimum, a $20 per month ChatGPT Plus subscription. Image generation startup Stability AI has launched a model that the company says can generate 3D scenes and camera angles from a single 2D image. However, it still struggles with scenes featuring more complex elements like humans and moving water. The model is available for noncommercial research use on HuggingFace. Cohere released a multimodal model called Aya Vision that it claims is best in class at doing things like captioning images and answering questions about photos. It also excels in languages other than English, unlike other models, Cohere claims. It is available for free on WhatsApp. 
OpenAI calls Orion their largest model to date, touting its strong "world knowledge" and "emotional intelligence." However, it underperforms on certain benchmarks compared to newer reasoning models. Orion is available to subscribers of OpenAI's $200-per-month plan. Anthropic says this is the industry's first "hybrid" reasoning model, because it can both fire off quick answers and really think things through when needed. It also gives users control over how long the model can think for, per Anthropic. Sonnet 3.7 is available to all Claude users, but heavier users will need a $20-per-month Pro plan. Grok 3 is the latest flagship model from Elon Musk-founded startup xAI. It's claimed to outperform other leading models on math, science, and coding. The model requires X Premium (which is $50 per month.) After one study found Grok 2 leaned left, Musk pledged to shift Grok more "politically neutral" but it's not yet clear if that's been achieved. This is OpenAI's latest reasoning model and is optimized for STEM-related tasks like coding, math, and science. It's not OpenAI's most powerful model but because it's smaller, the company says it's significantly lower cost. It is available for free but requires a subscription for heavy users. OpenAI's Operator is meant to be a personal intern that can do things independently, like help you buy groceries. It requires a $200-per-month ChatGPT Pro subscription. AI agents hold a lot of promise, but they're still experimental: A Washington Post reviewer says Operator decided on its own to order a dozen eggs for $31, paid with the reviewer's credit card. Google Gemini's much-awaited flagship model says it excels at coding and understanding general knowledge. It also has a super-long context window of 2 million tokens, helping users who need to quickly process massive chunks of text. The service requires (at minimum) a Google One AI Premium subscription of $19.99 a month. This Chinese AI model took Silicon Valley by storm. 
DeepSeek's R1 performs well on coding and math, while its open source nature means anyone can run it locally. Plus, it's free. However, R1 integrates Chinese government censorship and faces rising bans for potentially sending user data back to China. Deep Research summarizes Google's search results in a simple and well-cited document. The service is helpful for students and anyone else who needs a quick research summary. However, its quality isn't nearly as good as an actual peer-reviewed paper. Deep Research requires a $19.99 Google One AI Premium subscription. This is the newest and most advanced version of Meta's open source Llama AI models. Meta has touted this version as its cheapest and most efficient yet, especially for math, general knowledge, and instruction following. It is free and open source. Sora is a model that creates realistic videos based on text. While it can generate entire scenes rather than just clips, OpenAI admits that it often generates "unrealistic physics." It's currently only available on paid versions of ChatGPT, starting with Plus, which is $20 a month. This model is one of the few to rival OpenAI's o1 on certain industry benchmarks, excelling in math and coding. Ironically for a "reasoning model," it has "room for improvement in common sense reasoning," Alibaba says. It also incorporates Chinese government censorship, TechCrunch testing shows. It's free and open source. Claude's Computer Use is meant to take control of your computer to complete tasks like coding or booking a plane ticket, making it a predecessor of OpenAI's Operator. Computer use, however, remains in beta. Pricing is via API: $0.80 per million tokens of input and $4 per million tokens of output. Elon Musk's AI company, xAI, has launched an enhanced version of its flagship Grok 2 chatbot it claims is "three times faster." Free users are limited to 10 questions every two hours on Grok, while subscribers to X's Premium and Premium+ plans enjoy higher usage limits. 
xAI also launched an image generator, Aurora, that produces highly photorealistic images, including some graphic or violent content. OpenAI's o1 family is meant to produce better answers by "thinking" through responses through a hidden reasoning feature. The model excels at coding, math, and safety, OpenAI claims, but has issues with trying to deceive humans, too. Using o1 requires subscribing to ChatGPT Plus, which is $20 a month. Claude Sonnet 3.5 is a model Anthropic claims as being best in class. It's become known for its coding capabilities and is considered a tech insider's chatbot of choice. The model can be accessed for free on Claude, although heavy users will need a $20 monthly Pro subscription. While it can understand images, it can't generate them. OpenAI has touted GPT 4o-mini as its most affordable and fastest model yet, thanks to its small size. It's meant to enable a broad range of tasks like powering customer service chatbots. The model is available on ChatGPT's free tier. It's better suited for high-volume simple tasks compared to more complex ones. Cohere's Command R+ model excels at complex retrieval-augmented generation (or RAG) applications for enterprises. That means it can find and cite specific pieces of information really well. (The inventor of RAG actually works at Cohere.) Still, RAG doesn't fully solve AI's hallucination problem.
[4]
Google Gemini 2.5 Is the Newest Model Set To Compete With DeepSeek R1
Google describes the new line of Gemini 2.5 models as "thinking models," ones that recursively analyze their answers before giving users a final output. Per benchmarks on LMArena, Gemini 2.5 is leading in reasoning, science, math and agentic coding. It's not winning in all tests, however. For example, OpenAI o3-mini still leads it in LiveCodeBench v5. Gemini 2.5 is rolling out to paid Advanced users now. Some users on Reddit are reporting that they needed to delete and reinstall the Gemini app for 2.5 to show up. On desktop, Gemini 2.5 can be found on Google AI Studio. One advantage that Google's AI models have over the competition is the high token rate -- the ability to understand or produce complex sets of data. Google has always touted Gemini as the AI that can handle large context windows with high token output. On X, the social media platform formerly known as Twitter, people are also experimenting with Gemini 2.5's capabilities. Fei Xia, who's a staff researcher at Google DeepMind, was able to take a crude drawing of a three-tier cake and convert it into a 3D print file. Google published a video showing Gemini 2.5 making a simple endless runner video game in seconds. Another user on X made a simple flight simulation video game. Google didn't immediately respond to a request for comment. The launch of Gemini 2.5 is the latest weapon thrown into the AI gladiator ring. The launch of DeepSeek R1 earlier this year from China put American AI companies on notice. DeepSeek released a free and open-source reasoning model that was more efficient than what was available from OpenAI. Google's also betting big on AI. The generative technology has infiltrated pretty much everything in the company's product portfolio, from Search to Docs. Google plans to invest $75 billion in AI development in 2025 alone. Considering the AI market is projected to grow to $1.8 trillion by 2030, according to Grand View Research, Google has strong financial incentive to dominate the space.
In addition to Gemini 2.5, Google also introduced Gemini 2.0 Flash Thinking earlier this year, which aims to be a speedier reasoning model. Last month, Google released Gemini Code Assist, a free AI coding tool with very generous input token support.
[5]
Google releases 'most intelligent' experimental Gemini 2.5 Pro - here's how to try it
Gemini's latest model outperformed OpenAI's o3 mini and Anthropic's Claude 3.7 Sonnet on the latest benchmarks. Moments after DeepSeek released its latest model, another AI giant has already stolen back some of the limelight. On Tuesday, Google announced Gemini 2.5, its "most intelligent" model. The company announced that this initial release is an "experimental version of 2.5 Pro, which is state-of-the-art on a wide range of benchmarks and debuts at #1 on LMArena by a significant margin." The release, a family of thinking models that reason through their responses, follows Google's Gemini 2.0 Flash Thinking, which landed in December. Most notably, Gemini 2.5 Pro Experimental outperformed OpenAI's o3 mini and Anthropic's Claude 3.7 Sonnet on Humanity's Last Exam (HLE), a recently created benchmark designed to combat saturation, or the problem of industry tests becoming too easy for rapidly evolving models. HLE is, therefore, a relatively harder test to perform well on; Gemini 2.5 scored 18.8% compared to o3 mini's 14% (evaluated using text problems only, no images) and Claude 3.7 Sonnet's 8.9%. Already topping the Chatbot Arena leaderboard, the new model also outperformed competitors on common benchmarks for science, math, and coding, though usually by a smaller margin, which is now expected given the rate at which new models are accelerating. Google reported that Gemini 2.5 Pro Experimental shows improvements in reasoning, multimodal, and agentic capabilities, even from a "single line prompt." Google said Gemini 2.5 Pro is available today with a one million token context window for Gemini Advanced users via Google AI Studio and the Gemini app, and will be "coming to Vertex AI soon." The company added that it will release pricing information in the next few weeks.
[6]
Google says its new 'reasoning' Gemini AI models are the best ones yet
Richard Lawler is a senior editor following news across tech, culture, policy, and entertainment. He joined The Verge in 2021 after several years covering news at Engadget. After delivering a new "open" AI model with better performance on a single GPU, Google has now introduced an update to the AI models for its products with Gemini 2.5, which combines "a significantly enhanced base model with improved post-training" for better overall performance. It's claiming that the first release, Gemini 2.5 Pro experimental, leads competition from OpenAI, Anthropic, xAI, and DeepSeek on common AI benchmarks that measure understanding, mathematics, coding, and other capabilities. The new model is available to access in Google AI Studio or for Gemini Advanced subscribers in the app's model dropdown menu.
[7]
Google releases Gemini 2.5 AI model for complex thinking
Google has the pedal to the metal on its AI development. Just a few months after the debut of Gemini 2.0, the tech giant has unveiled another upgrade in Gemini 2.5. As with any new AI launch, Google is touting a strong performance on LMArena for Gemini 2.5, particularly its capabilities in coding, mathematics and science. The first model in this series is Gemini 2.5 Pro Experimental. Google said this is a thinking model that's intended to provide responses grounded in more reasoning, analysis and context than the answers offered by classification- and prediction-driven models. It's a different approach than Google took with the Gemini 2.0 series, which started off with the more efficient and less expensive Flash version. "With Gemini 2.5, we've achieved a new level of performance by combining a significantly enhanced base model with improved post-training," the company said in a blog post attributed to Koray Kavukcuoglu, CTO of Google DeepMind. "Going forward, we're building these thinking capabilities directly into all of our models, so they can handle more complex problems and support even more capable, context-aware agents." Google had only just started rolling out Gemini 2.0 to its services, using it to power the newly added AI Mode in search and Deep Research for handling more complex queries. With today's launch, expect to hear more updates from the company about getting this latest version. Gemini 2.5 Pro Experimental is available now in Google AI Studio, and Gemini Advanced members can use it directly in the Gemini app.
[8]
Gemini 2.5: Meet Google's most intelligent reasoning model to date
The new AI model is currently available for developers and Gemini Advanced users. It's only been a few months since the debut of Gemini 2.0. But at the breakneck pace Google is moving at with AI development, December probably seems like ancient history. The company is already moving on to the AI model's next update -- Gemini 2.5. Today, Google announced Gemini 2.5, its latest and most intelligent AI model to date. The first release of the 2.5 generation will be Gemini 2.5 Pro, which is a thinking model (capable of reasoning through thoughts before responding) just like Gemini 2.0 Flash Thinking. Pro will come with a 1 million token context window, and Google says this will expand to 2 million soon. According to the tech giant, Gemini 2.5 outperforms some of the most popular AI models across a range of benchmarks, including reasoning, science, and mathematics. The company achieved this feat by "combining a significantly enhanced base model with improved post-training." It appears that enhancing coding performance was a major focus this time around. The company claims it made a big improvement compared to 2.0, allowing it to create "visually compelling web apps and agentic code applications." To show off the model's coding capabilities, Google shared this video of the AI generating a game based on a text prompt. Gemini 2.5 is launching as an experimental model that will be available to developers in AI Studio and Gemini Advanced users in the Gemini app. If you're a Gemini Advanced user, you'll be able to select Gemini 2.5 from the model dropdown on desktop and in mobile app. Google says Gemini 2.5 will also be available for Vertex AI in the coming weeks, but it did not offer an exact date.
[9]
Gemini 2.5 Pro, Google's 'most intelligent AI model,' is rolling out now
Summary: Google has launched Gemini 2.5 Pro Experimental, its "most intelligent AI model" yet, which has already topped the LMArena leaderboard. A key feature of Gemini 2.5 Pro is its enhanced ability to "think and reason" before responding, leading to improved performance and accuracy in complex tasks. Currently available for Gemini Advanced users on the web (mobile support often lags behind), 2.5 Pro outperformed competitors in reasoning, knowledge, science, and math benchmarks, and has a 1 million-token context window. Google is constantly working on enhancing Gemini's capabilities. Gemini 2.0 Advanced landed on mobile earlier this year, followed by enhancements to the AI tool within Workspace apps. Subsequently, the tech giant expanded Gemini 2.0 Flash access to all, followed by the rollout of its former top-of-the-line model Gemini 2.0 Pro Experimental, with Gemini 2.0 Flash Thinking Experimental and Flash Thinking Experimental with apps in tow. You may be wondering why I described 2.0 Pro as Gemini's former top-of-the-line model. This is because Google today began rolling out Gemini 2.5 Pro, which the tech giant describes as its "most intelligent AI model." From the new 2.5 family, Google has only begun rolling out Gemini 2.5 Pro Experimental, a model that is reportedly designed to tackle increasingly complex problems. It debuted at the #1 spot on the community-driven LMArena LLM leaderboard's 'overall' category, followed by Grok 3 Beta. A key highlight of the new model is its ability to think and reason with its own thoughts before responding -- a quality that we humans could benefit from.
The tech giant suggests that this results in enhanced performance and improved accuracy, likely aiding in avoiding hallucinations. "Going forward, we're building these thinking capabilities directly into all of our models, so they can handle more complex problems and support even more capable, context-aware agents," Google indicated. The new model was compared with others in its league, including OpenAI's o3-mini and GPT-4.5, Claude's Sonnet 3.7, Grok 3 Beta, and DeepSeek R1. Gemini 2.5 Pro managed to outperform all mentioned models when it comes to reasoning and knowledge, science-related queries, mathematics, code editing, visual reasoning, long-context reasoning, and more. It, however, did lag behind some models when it comes to code generation, agentic coding, and even factuality. Notably, 2.5 Pro scored higher than all other comparable models in 'Humanity's Last Exam (no tools),' a language model academic benchmark meant to test human knowledge of a wide range of subjects. Humanity's Last Exam: a dataset designed by hundreds of subject matter experts to capture the human frontier of knowledge and reasoning. You can try out the new model now, but only if you have a Gemini Advanced subscription. Gemini 2.5 Pro Experimental has begun rolling out now for Gemini Advanced users. The model is available to us on the web but not on the mobile app -- but that isn't surprising. Support on mobile often lags by a few weeks. The new model is currently limited to a 1 million token context window, with an upgrade to "2 million coming soon," according to the tech giant.
[10]
Google's Gemini 2.5 Model Family is Already Here
Summary: Gemini 2.5 Pro introduces Google's first full "chain-of-thought" model. The new model by Google reportedly excels in reasoning, coding, and problem-solving tasks. Gemini 2.5 is available now for experimental use by Gemini Advanced users. It hasn't been too long since Google released its Gemini 2.0 family of models, but the company is already moving ahead with what's next. Google has just announced the Gemini 2.5 family, starting with Gemini 2.5 Pro. It seems rushed, but we'll allow it. Google has just announced the introduction of Gemini 2.5, its newest generation of artificial intelligence models. The initial rollout features the experimental version of Gemini 2.5 Pro, which the company positions as a significant advancement in AI reasoning and coding capabilities compared to Gemini 2.0 and even compared to competing models. The big thing to note here is that Gemini 2.5 is Google's first full "chain-of-thought" model, which means that it performs multi-step thinking and checks its responses for accuracy before actually outputting them. Gemini 2.0 already supported this with the 2.0 Flash Thinking model (which is also experimental), but Gemini 2.5 is not available in a non-chain-of-thought version at all. It will take longer to respond to queries sometimes, but responses will be more accurate and, hopefully, we'll also have fewer hallucinations -- as it turns out, that's still a huge problem with AI, even with how advanced large language models have gotten. The generational gains Google is claiming here look pretty good. In areas requiring advanced reasoning, the company claims Gemini 2.5 Pro performs pretty well on benchmarks such as GPQA (Graduate-Level Google-Proof Q&A) and AIME 2025 (American Invitational Mathematics Examination problems).
Furthermore, it reportedly scored 18.8% on Humanity's Last Exam, a challenging dataset designed by subject matter experts, when tested without external tool use. The model also debuted at the top position on the LMArena leaderboard, a platform that ranks AI models based on human preference evaluations, sitting above recently released models like OpenAI's GPT 4.5 or xAI's Grok 3. Google claims that Gemini 2.5 Pro performs great when it comes to generating web applications, agentic code (code designed to perform tasks autonomously), code transformation, and editing. On the SWE-Bench Verified benchmark, which evaluates agentic coding skills, Gemini 2.5 Pro achieved a score of 63.8% using a custom agent setup. To further flaunt its capabilities, the company even said that the model is capable of generating executable code for a video game from a single-line prompt. I tried exactly that last week when the new Canvas feature was released and it kind of sucked, so I'd need to try that out again with the new model to see if it's true. Gemini 2.0 was first released publicly in late January, so it hasn't even been two full months since that particular model family was released. As a fun note, Google has also completely scrubbed the experimental version of Gemini 2.0 Pro and replaced it with Gemini 2.5, so unless the stable version of that model is coming soon, we could technically say the short-lived Gemini 2.0 family didn't have a stable "advanced" model at all. Yes, we moved that quickly. With everyone wanting to claim the AI throne for themselves and competition ramping up, companies releasing models in rapid succession will likely become an increasingly common sight. The model is currently available in an experimental stage for Gemini Advanced users, so if you have a subscription, you can try it out from now. If you don't see it yet, it might take a few more days to pop up. 
We're not sure when we'll see this become stable, or when we might see a smaller Gemini 2.5 Flash model for free users. Source: Google
[11]
Gemini 2.5 Pro is Google's 'most intelligent AI model' with thinking built-in
Following updates to the models that are available for all users, Google today announced Gemini 2.5 Pro (experimental) for Advanced subscribers and developers. Like before, Google is doing another mid-year/model update. Notably, all models in the Gemini 2.5 family, including future ones, are "thinking models, capable of reasoning through their thoughts before responding, resulting in enhanced performance and improved accuracy." Google is "building these thinking capabilities directly into all of [its] models" to allow them to "handle more complex problems and support even more capable, context-aware agents." Compared to 2.0 Flash Thinking, which was first revealed in December and got an update this month, Google is no longer explicitly attaching the "Thinking" label. Users can "Show thinking" in the Gemini app to see the train of thought. In the field of AI, a system's capacity for "reasoning" refers to more than just classification and prediction. It refers to its ability to analyze information, draw logical conclusions, incorporate context and nuance, and make informed decisions. Gemini 2.5 features a "new level of performance by combining a significantly enhanced base model with improved post-training." Gemini 2.5 Pro (gemini-2.5-pro-exp-03-25) is the first model in this family. Aimed at complex tasks, Google notes how it "tops the LMArena leaderboard -- which measures human preferences -- by a significant margin." It also leads on math (AIME 2025), and science (GPQA diamond) benchmarks "without test-time techniques that increase cost, like majority voting." It also scores a state-of-the-art 18.8% across models without tool use on Humanity's Last Exam, a dataset designed by hundreds of subject matter experts to capture the human frontier of knowledge and reasoning. There's also a focus on advanced coding with a "big leap over 2.0" and "more improvements to come."
In addition to native multimodality, Gemini 2.5 Pro has a 1 million token context window with 2 million coming soon. It can comprehend vast datasets and handle complex problems from different information sources, including text, audio, images, video and even entire code repositories. Gemini 2.5 Pro (experimental) is rolling out first to Gemini Advanced and Google AI Studio, with Vertex AI following in the coming weeks. We'll also introduce pricing in the coming weeks, enabling people to use 2.5 Pro with higher rate limits for scaled production use. In the Gemini app, it replaces last month's 2.0 Pro (experimental) with access to apps (@Gmail, @YouTube, etc.) and file uploads.
[12]
OpenAI just unveiled enhanced image generator within ChatGPT-4o -- here's what you can do now
Google has unveiled Gemini 2.5, the tech giant's most advanced AI model to date. Capable of enhanced reasoning, coding proficiency and multimodal functionalities, the latest model is said to be able to analyze complex information, incorporate contextual nuances and draw logical conclusions with unprecedented accuracy. According to Google's official blog, the model's latest improvements are achieved by combining a significantly enhanced base model with improved post-training techniques. Gemini 2.5 reportedly leads in math and science benchmarks, scoring 18.8% on Humanity's Last Exam, a dataset designed to assess AI's ability to handle complex knowledge-based questions. For comparison, OpenAI's deep research model can complete 26% of Humanity's Last Exam. In the realm of coding, Gemini 2.5 is said to demonstrate remarkable proficiency. This is good news for the average user or non-developers. Because the model excels at creating visually compelling web applications and agentic code applications, as well as code transformation and editing, users don't need advanced skills themselves. For instance, on SWE-Bench Verified, a human-validated subset of SWE-bench that more reliably evaluates AI models' ability to solve real-world software issues. the industry standard for agentic code evaluations, Gemini 2.5 Pro scores 63.8% with a custom agent setup. As of January 2025, no model had yet crossed 50% completion on SWE-bench Verified, though the updated Claude 3.5 Sonnet is at 49%. Gemini 2.5 is designed to comprehend vast amounts of data and handle complex problems across various information sources, including text, audio, images, video, and even code repositories. The model features native multimodality and supports a context window of up to 1 million tokens, with Google planning to extend this to 2 million tokens in the near future, though an exact timeline has not been disclosed. 
Tokens and context windows are two concepts that are essential to understand when it comes to how AI processes and generates language. So, what is a token? A token is the smallest unit of data that an AI model processes. Depending on the model's design, a token can represent something as simple as an individual word or a single character. It could also be a segment of a word or a punctuation mark. For example, the sentence "The cat jumped over the fence and disappeared quickly." might be split into about 12 tokens, with the exact count depending on the tokenizer. This breakdown allows the AI to analyze and generate text effectively. A context window refers to the amount of information an AI model can process at one time. You can think of it as the model's short-term memory, encompassing the sequence of tokens the AI considers when generating a response. The size of the context window determines how much prior information the model can utilize to produce coherent and contextually relevant outputs. For instance, take the earlier sentence: "The cat jumped over the fence and disappeared quickly." If an AI model has a context window limited to 5 tokens, it would only process the last part of the input. Therefore, if you were to ask the model "Who jumped over the fence and disappeared quickly?" it might not correctly identify "The cat" as the subject because it lacks access to the initial portion of the sentence. If Google increases the context window of Gemini 2.5 to 2 million tokens, this expansive capacity will enable the model to consider and retain a vast amount of information when generating responses. Essentially, the larger the context window, the greater the model's ability to process extensive prompts, resulting in outputs that are more consistent, relevant, and useful. For comparison, the combined word count of the "Lord of the Rings" trilogy is around 500,000 words. This means you could provide the entire trilogy as context to Gemini 2.5 Pro and still stay within its 1 million token window.
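The token and context-window ideas above can be illustrated with a minimal Python sketch. It assumes a toy tokenizer that splits on words and punctuation, which is not the subword tokenizer Gemini or any other production model uses, so the count it produces (10) differs from the figure quoted above. The names toy_tokenize and truncate_to_window are hypothetical.

```python
import re

def toy_tokenize(text):
    """Split text into word and punctuation tokens.

    Illustrative assumption only: production models use subword tokenizers.
    """
    return re.findall(r"\w+|[^\w\s]", text)

def truncate_to_window(tokens, window_size):
    """Keep only the most recent `window_size` tokens, mimicking a context limit."""
    return tokens[-window_size:] if window_size > 0 else []

sentence = "The cat jumped over the fence and disappeared quickly."
tokens = toy_tokenize(sentence)
visible = truncate_to_window(tokens, 5)

print(len(tokens))       # 10 with this toy tokenizer
print(visible)           # ['fence', 'and', 'disappeared', 'quickly', '.']
print("cat" in visible)  # False: the subject fell outside the window
```

With only five tokens visible, the subject of the sentence is no longer part of what the model can see, which is the failure described above.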
The Gemini 2.5 Pro Experimental model is now accessible in Google AI Studio and within the Gemini app for Gemini Advanced subscribers. The release of Gemini 2.5 Pro Experimental gives subscribers paying $20 per month broader usage with higher rate limits for production-scale applications.
[13]
Google releases 'most intelligent model to date,' Gemini 2.5 Pro
Just a few months after the release of Gemini 2.0 and the rise of DeepSeek, Google announced its "most intelligent model" yet, Gemini 2.5, capable of reasoning and with better performance and accuracy. Gemini 2.5 comes three months after Google released its previously most intelligent model family, Gemini 2.0, which introduced reasoning and agentic use cases. This new model is available as Gemini 2.5 Pro (experimental) on Google's AI Studio and for Gemini Advanced users on the Gemini chat interface. It will be available on Vertex AI soon. Koray Kavukcuoglu, CTO at Google DeepMind, said in a blog post that Gemini 2.5 represents the next step in Google's goal of making "AI smarter and more capable of reasoning." "Now, with Gemini 2.5, we've achieved a new level of performance by combining a significantly enhanced base model with improved post-training," Kavukcuoglu wrote. "Going forward, we're building these thinking capabilities directly into all of our models, so they can handle more complex problems and support even more capable, context-aware agents."
More context and comprehension
Like Gemini 2.0 and Gemini 2.0 Flash Thinking, Gemini 2.5 Pro "thinks" before it responds. The new model can handle multimodal input from text, audio, images, videos and large datasets. Gemini 2.5 Pro can also understand entire code repositories for coding projects. Gemini 2.5 Pro offers some of the largest context windows available for experimental models on Gemini. It ships with a 1 million token context window but will expand to 2 million tokens soon. Google AI Studio product manager Logan Kilpatrick posted on X that Gemini 2.5 Pro is "the first experimental model with higher rate limits + billing." Google plans to release pricing for Gemini 2.5 models soon.
Enhanced coding and reasoning performance
Google said the model leads in advanced reasoning benchmark tests. The company said Gemini 2.5 Pro "leads in math and science benchmarks like GPQA and AIME 2025." Kavukcuoglu said the model also scored "a state-of-the-art 18.8% across models without tool use on Humanity's Last Exam," a dataset aiming to capture human knowledge and reasoning. Gemini 2.5 Pro also performs strongly on coding tasks and scored better than Gemini 2.0 in specific benchmarks. Google noted the new model "excels at creating visually compelling web apps and agentic code applications, along with code transformation and editing."
A more competitive market
Gemini 2.5 Pro enters the reasoning model fray in a significantly different environment from the one Gemini 2.0 entered in December. The release of DeepSeek's reasoning large language model (LLM) DeepSeek-R1 showed that powerful models can perform well at a fraction of the training and compute cost. Furthermore, DeepSeek showed that open-source models can compete with more closed-source LLMs, such as OpenAI's o1 and o3 models. Besides DeepSeek's ever-expanding model offerings, Google has to compete with OpenAI's reasoning models. While the newest model from OpenAI was GPT-4.5 -- not a reasoning model -- the company is still expected to develop more reasoning models soon. Gemini 2.5 is Google's second new model this month. Earlier in March, the company released the latest version of its small language model, Gemma 3, which offered a 128,000-token context window and was best suited for use in on-the-go devices.
[14]
Gemini 2.5 is now available for Advanced users and it seriously improves Google's AI reasoning
Google just announced Gemini 2.5, and it's a major upgrade to Gemini that the company is calling its "most intelligent AI model" yet. Announced on the company's blog, Google revealed the experimental version of 2.5 Pro, which is available today for all Gemini Advanced subscribers. More 2.5 models will arrive in the future. Google's Gemini 2.5 models are a new generation of thinking models that are able to reach "a new level of performance by combining a significantly enhanced base model with improved post-training." 2.5's thinking capabilities will be implemented into all future Google AI models, which the company says will allow them to "handle more complex problems and support even more capable, context-aware agents." So what does this mean? Well, Google is doubling down on its impressively frequent AI updates, and this time we're getting better reasoning capabilities than ever before. Available right now, you can access Gemini 2.5 Pro Experimental simply by selecting the model in the Gemini app or directly in Google's AI Studio. You'll need a Gemini Advanced subscription to see this as an option. Pricing for the improved model (for those who want to use it for scaled production use) will be announced in the coming weeks, and more 2.5 models are expected to launch in due course. Google shared some benchmark results for Gemini 2.5 Pro Experimental and the results are seriously impressive. The new AI model scores 18.8% on Humanity's Last Exam compared to 14% for ChatGPT's o3-mini and 8.6% for DeepSeek R1. Humanity's Last Exam is the most thorough and difficult AI benchmark, so to score substantially higher than its competitors is no mean feat. 18.8% is the highest score we've ever seen on Humanity's Last Exam (without tool use). Google is calling Gemini 2.5 Pro's reasoning capabilities "state-of-the-art" and it's clear to see why. Google continues to drive forward with its AI development at a rapid pace. 
Just last week the company made Gemini Deep Research free for all and followed that up with improvements to its impressive AI podcasting tool, NotebookLM. We'll be testing Gemini 2.5 Pro and putting the new Experimental model through its paces, so stay tuned to TechRadar for further Google AI coverage.
[15]
Hands on with Gemini 2.5 Pro: why it might be the most useful reasoning model yet
Unfortunately for Google, the release of its latest flagship language model, Gemini 2.5 Pro, got buried under the Studio Ghibli AI image storm that sucked the air out of the AI space. And perhaps fearful of its previous failed launches, Google cautiously presented it as "Our most intelligent AI model" instead of following the approach of other AI labs, which introduce their new models as the best in the world. However, practical experiments with real-world examples show that Gemini 2.5 Pro is really impressive and might currently be the best reasoning model. This opens the way for many new applications and possibly puts Google at the forefront of the generative AI race.
Long context with good coding capabilities
The outstanding feature of Gemini 2.5 Pro is its very long context window and output length. The model can process up to 1 million tokens (with 2 million coming soon), making it possible to fit multiple long documents and entire code repositories into the prompt when necessary. The model also has an output limit of 64,000 tokens instead of around 8,000 for other Gemini models. The long context window also allows for extended conversations, as each interaction with a reasoning model can generate tens of thousands of tokens, especially if it involves code, images and video (I've run into this issue with Claude 3.7 Sonnet, which has a 200,000-token context window). For example, software engineer Simon Willison used Gemini 2.5 Pro to create a new feature for his website. Willison said in a blog post, "It crunched through my entire codebase and figured out all of the places I needed to change -- 18 files in total, as you can see in the resulting PR. The whole project took about 45 minutes from start to finish -- averaging less than three minutes per file I had to modify.
I've thrown a whole bunch of other coding challenges at it, and the bottleneck on evaluating them has become my own mental capacity to review the resulting code!"
Impressive multimodal reasoning
Gemini 2.5 Pro also has impressive reasoning abilities over unstructured text, images and video. For example, I provided it with the text of my recent article about sampling-based search and prompted it to create an SVG graphic that depicts the algorithm described in the text. Gemini 2.5 Pro correctly extracted key information from the article and created a flowchart for the sampling and search process, even getting the conditional steps right. (For reference, the same task took multiple interactions with Claude 3.7 Sonnet and I eventually maxed out the token limit.) The rendered image had some visual errors (the arrowheads were misplaced) and could use a facelift, so I next tested Gemini 2.5 Pro with a multimodal prompt, giving it a screenshot of the rendered SVG file along with the code and prompting it to improve it. The results were impressive. It corrected the arrowheads and improved the visual quality of the diagram. Other users have had similar experiences with multimodal prompts. For example, in their tests, DataCamp replicated the runner game example presented in the Google Blog, then provided the code and a video recording of the game to Gemini 2.5 Pro and prompted it to make some changes to the game's code. The model could reason over the visuals, find the part of the code that needed to be changed, and make the correct modifications. It is worth noting, however, that like other generative models, Gemini 2.5 Pro is prone to making mistakes such as modifying unrelated files and code segments. The more precise your instructions are, the lower the risk of the model making incorrect changes.
Data analysis with useful reasoning trace
Finally, I tested Gemini 2.5 Pro on my classic messy data analysis test for reasoning models.
I provided it with a file containing a mix of plain text and raw HTML data I had copied and pasted from different stock history pages in Yahoo! Finance. Then I prompted it to calculate the value of a portfolio that would invest $140 at the beginning of each month, spread evenly across the Magnificent 7 stocks, from January 2024 to the latest date in the file. The model correctly identified which stocks it had to pick from the file (Amazon, Apple, Nvidia, Microsoft, Tesla, Alphabet and Meta), extracted the financial information from the HTML data, and calculated the value of each investment based on the price of the stocks at the beginning of each month. It responded with a well-formatted table showing stock and portfolio values for each month and provided a breakdown of how much the entire investment was worth at the end of the period. More importantly, I found the reasoning trace to be very useful. It is not clear whether Google reveals the raw chain-of-thought (CoT) tokens for Gemini 2.5 Pro, but the reasoning trace is very detailed. You can clearly see how the model is reasoning over the data, extracting different bits of information, and calculating the results before generating the answer. This can help troubleshoot the model's behavior and steer it in the right direction when it makes mistakes.
Enterprise-grade reasoning?
One concern about Gemini 2.5 Pro is that it is only available in reasoning mode, which means the model always goes through the "thinking" process even for very simple prompts that can be answered directly. Gemini 2.5 Pro is currently in preview release. Once the full model is released and pricing information is available, we will have a better understanding of how much it will cost to build enterprise applications over the model. However, as inference costs continue to fall, we can expect it to become practical at scale. Gemini 2.5 Pro might not have had the splashiest debut, but its capabilities demand attention.
Its massive context window, impressive multimodal reasoning and detailed reasoning chain offer tangible advantages for complex enterprise workloads, from codebase refactoring to nuanced data analysis.
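The portfolio test described above boils down to simple arithmetic: $140 a month split evenly across seven stocks is $20 per stock per month, and each purchase buys $20 divided by that month's price in shares. A minimal Python sketch of that calculation follows. The prices are made-up placeholders, not data from the article or from Yahoo! Finance, the tickers are simply the usual symbols for the seven companies, and the function name portfolio_value is hypothetical.

```python
def portfolio_value(monthly_prices, final_prices, monthly_budget=140.0):
    """Value of a portfolio built by investing `monthly_budget` each month,
    split evenly across all stocks, and valued at `final_prices`."""
    shares = {ticker: 0.0 for ticker in final_prices}
    per_stock = monthly_budget / len(final_prices)  # $20 for seven stocks
    for prices in monthly_prices:
        for ticker, price in prices.items():
            shares[ticker] += per_stock / price  # fractional shares bought
    return sum(shares[t] * final_prices[t] for t in shares)

tickers = ["AMZN", "AAPL", "NVDA", "MSFT", "TSLA", "GOOGL", "META"]

# Placeholder prices for illustration only (not real market data).
month_1 = {t: 100.0 for t in tickers}
month_2 = {t: 200.0 for t in tickers}
latest = {t: 200.0 for t in tickers}

value = portfolio_value([month_1, month_2], latest)
print(round(value, 2))  # 420.0 on $280 invested
```

The real task is harder than the arithmetic suggests, since the prices first have to be extracted from a mix of plain text and raw HTML.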
[16]
Google's Gemini 2.5 Pro is the smartest model you're not using - and 4 reasons it matters for enterprise AI
The release of Gemini 2.5 Pro on Tuesday didn't exactly dominate the news cycle. It landed the same week OpenAI's image-generation update lit up social media with Studio Ghibli-inspired avatars and jaw-dropping instant renders. But while the buzz went to OpenAI, Google may have quietly dropped the most enterprise-ready reasoning model to date. Gemini 2.5 Pro marks a significant leap forward for Google in the foundational model race - not just in benchmarks, but in usability. Based on early experiments, benchmark data, and hands-on developer reactions, it's a model worth serious attention from enterprise technical decision-makers, particularly those who've historically defaulted to OpenAI or Claude for production-grade reasoning. Here are four major takeaways for enterprise teams evaluating Gemini 2.5 Pro. 1. Transparent, structured reasoning - a new bar for chain-of-thought clarity What sets Gemini 2.5 Pro apart isn't just its intelligence - it's how clearly that intelligence shows its work. Google's step-by-step training approach results in a structured chain of thought (CoT) that doesn't feel like rambling or guesswork, like what we've seen from models like DeepSeek. And these CoTs aren't truncated into shallow summaries like what you see in OpenAI's models. The new Gemini model presents ideas in numbered steps, with sub-bullets and internal logic that's remarkably coherent and transparent. In practical terms, this is a breakthrough for trust and steerability. Enterprise users evaluating output for critical tasks - like reviewing policy implications, coding logic, or summarizing complex research - can now see how the model arrived at an answer. That means they can validate, correct, or redirect it with more confidence. It's a major evolution from the "black box" feel that still plagues many LLM outputs.
For a deeper walkthrough of how this works in action, check out the video breakdown where we test Gemini 2.5 Pro live. One example we discuss: When asked about the limitations of large language models, Gemini 2.5 Pro showed remarkable awareness. It recited common weaknesses, and categorized them into areas like "physical intuition," "novel concept synthesis," "long-range planning," and "ethical nuances," providing a framework that helps users understand what the model knows and how it's approaching the problem. Enterprise technical teams can leverage this capability to: One limitation worth noting: While this structured reasoning is available in the Gemini app and Google AI Studio, it's not yet accessible via the API - a shortcoming for developers looking to integrate this capability into enterprise applications. 2. A real contender for state-of-the-art - not just on paper The model is currently sitting at the top of the Chatbot Arena leaderboard by a notable margin - 35 Elo points ahead of the next-best model - which notably is the OpenAI 4o update that dropped the day after Gemini 2.5 Pro dropped. And while benchmark supremacy is often a fleeting crown (as new models drop weekly), Gemini 2.5 Pro feels genuinely different. It excels in tasks that reward deep reasoning: coding, nuanced problem-solving, synthesis across documents, even abstract planning. In internal testing, it's performed especially well on previously hard-to-crack benchmarks like the "Humanity's Last Exam," a favorite for exposing LLM weaknesses in abstract and nuanced domains. (You can see Google's announcement here, along with all of the benchmark information.) Enterprise teams might not care which model wins which academic leaderboard. But they'll care that this one can think - and show you how it's thinking. The vibe test matters, and for once, it's Google's turn to feel like they've passed it. 
As respected AI engineer Nathan Lambert noted, "Google has the best models again, as they should have started this whole AI bloom. The strategic error has been righted." Enterprise users should view this not just as Google catching up to competitors, but potentially leapfrogging them in capabilities that matter for business applications. 3. Finally: Google's coding game is strong Historically, Google has lagged behind OpenAI and Anthropic when it comes to developer-focused coding assistance. Gemini 2.5 Pro changes that - in a big way. In hands-on tests, it's shown strong one-shot capability on coding challenges, including building a working Tetris game that ran on first try when exported to Replit - no debugging needed. Even more notable: it reasoned through the code structure with clarity, labeling variables and steps thoughtfully, and laying out its approach before writing a single line of code. The model rivals Anthropic's Claude 3.7 Sonnet, which has been considered the leader in code generation, and a major reason for Anthropic's success in the enterprise. But Gemini 2.5 offers a critical advantage: a massive 1-million token context window. Claude 3.7 Sonnet is only now getting around to offering 500,000 tokens. This massive context window opens new possibilities for reasoning across entire codebases, reading documentation inline, and working across multiple interdependent files. Software engineer Simon Willison's experience illustrates this advantage. When using Gemini 2.5 Pro to implement a new feature across his codebase, the model identified necessary changes across 18 different files and completed the entire project in approximately 45 minutes - averaging less than three minutes per modified file. For enterprises experimenting with agent frameworks or AI-assisted development environments, this is a serious tool. 4. 
Multimodal integration with agent-like behavior While some models like OpenAI's latest 4o may show more dazzle with flashy image generation, Gemini 2.5 Pro feels like it is quietly redefining what grounded, multimodal reasoning looks like. In one example, Ben Dickson's hands-on testing for VentureBeat demonstrated the model's ability to extract key information from a technical article about search algorithms and create a corresponding SVG flowchart - then later improve that flowchart when shown a rendered version with visual errors. This level of multimodal reasoning enables new workflows that weren't previously possible with text-only models. In another example, developer Sam Witteveen uploaded a simple screenshot of a Las Vegas map and asked what Google events were happening nearby on April 9 (see minute 16:35 of this video). The model identified the location, inferred the user's intent, searched online (with grounding enabled), and returned accurate details about Google Cloud Next - including dates, location, and citations. All without a custom agent framework, just the core model and integrated search. The model actually reasons over this multimodal input, beyond just looking at it. And it hints at what enterprise workflows could look like in six months: uploading documents, diagrams, dashboards - and having the model do meaningful synthesis, planning, or action based on the content. Bonus: It's just... useful While not a separate takeaway, it's worth noting: This is the first Gemini release that's pulled Google out of the LLM "backwater" for many of us. Prior versions never quite made it into daily use, as models like OpenAI or Claude set the agenda. Gemini 2.5 Pro feels different. The reasoning quality, long-context utility, and practical UX touches - like Replit export and Studio access - make it a model that's hard to ignore. Still, it's early days. The model isn't yet in Google Cloud's Vertex AI, though Google has said that's coming soon. 
Some latency questions remain, especially with the deeper reasoning process (with so many thought tokens being processed, what does that mean for the time to first token?), and prices haven't been disclosed. Another caveat from my observations about its writing ability: OpenAI and Claude still feel like they have an edge on producing nicely readable prose. Gemini 2.5 feels very structured, and lacks a little of the conversational smoothness that the others offer. This is something I've noticed OpenAI in particular spending a lot of focus on lately. But for enterprises balancing performance, transparency, and scale, Gemini 2.5 Pro may have just made Google a serious contender again. As Zoom CTO Xuedong Huang put it in conversation with me yesterday: Google remains firmly in the mix when it comes to LLMs in production. Gemini 2.5 Pro just gave us a reason to believe that might be more true tomorrow than it was yesterday.
[17]
Google launches Gemini 2.5 Pro, its 'most intelligent AI model' yet
In a blog post today, Google announced Gemini 2.5 Pro (experimental) for developers and Advanced subscribers, aiming to help you tackle increasingly complex problems. It's the first model in the Gemini 2.5 family and is designed to "think" before it responds. Google says it'll be available today in Google AI Studio (its developer platform) and for Advanced subscribers, with Vertex AI support coming soon. Google also claims that Gemini 2.5 Pro outperforms the competition, taking the number one spot on the LMArena leaderboard and scoring 18.8% on Humanity's Last Exam, ahead of rival models from OpenAI and DeepSeek. According to Google's announcement: "Gemini 2.5 Pro is state-of-the-art across a range of benchmarks requiring advanced reasoning. Without test-time techniques that increase cost, like majority voting, 2.5 Pro leads in math and science benchmarks like GPQA and AIME 2025. It also scores a state-of-the-art 18.8% across models without tool use on Humanity's Last Exam, a dataset designed by hundreds of subject matter experts to capture the human frontier of knowledge and reasoning." If you're already subscribed to Gemini Advanced, you can use Gemini 2.5 Pro at no extra cost, but if you need to use it at a higher scale, Google will release pricing for higher usage in the coming weeks. With 2.5 Pro, you get a one-million-token context window, and a two-million-token window is coming soon. Google also says Gemini 2.5 Pro can handle difficult problems from sources like audio, text, images, entire code repositories, and video. Google has kept the AI ball rolling since it launched the "thinking" version of the AI model in December. Recently, Gemini has been available on Google Maps, allowing you to talk to Maps. Google has also announced that Gemini will replace Google Assistant in the coming months.
[18]
Google's Gemini 2.5, Alibaba's new Qwen, and upgraded DeepSeek V3: This week's AI launches
Google (GOOGL) announced Gemini 2.5 this week, which it called its "most intelligent AI model." Gemini 2.5 Pro Experimental is the first release of Gemini 2.5, and is Google's "most advanced model for complex tasks." The Gemini 2.5 models are "thinking models" that can reason through inquiries before responding. Following its first "thinking" model, Gemini 2.0 Flash Thinking, Gemini 2.5 combines an "enhanced base model with improved post-training," which includes techniques such as reinforcement learning and chain-of-thought prompting. Gemini 2.5 Pro Experimental is at the top of the crowdsourced Chatbot Arena rankings, and demonstrates strong reasoning and coding capabilities, Google said. The model is available to Gemini Advanced users in the app.
[19]
Gemini 2.5: Our most intelligent AI model
Today we're introducing Gemini 2.5, our most intelligent AI model. Our first 2.5 release is an experimental version of 2.5 Pro, which is state-of-the-art on a wide range of benchmarks and debuts at #1 on LMArena by a significant margin. Gemini 2.5 models are thinking models, capable of reasoning through their thoughts before responding, resulting in enhanced performance and improved accuracy. In the field of AI, a system's capacity for "reasoning" refers to more than just classification and prediction. It refers to its ability to analyze information, draw logical conclusions, incorporate context and nuance, and make informed decisions. For a long time, we've explored ways of making AI smarter and more capable of reasoning through techniques like reinforcement learning and chain-of-thought prompting. Building on this, we recently introduced our first thinking model, Gemini 2.0 Flash Thinking. Now, with Gemini 2.5, we've achieved a new level of performance by combining a significantly enhanced base model with improved post-training. Going forward, we're building these thinking capabilities directly into all of our models, so they can handle more complex problems and support even more capable, context-aware agents.
[20]
Google introduces Gemini 2.5 Pro with chain-of-thought reasoning built-in - SiliconANGLE
Google introduces Gemini 2.5 Pro with chain-of-thought reasoning built-in Google LLC said today it's updating its flagship Gemini artificial intelligence model family by introducing an experimental Gemini 2.5 Pro version. The company added it is the "most intelligent" yet and will include "thinking" capabilities built-in. All upcoming Gemini 2.5 models will be thinking models, capable of breaking down tasks into multiple steps and reasoning through them before responding. The company said this will result in enhanced performance and improved accuracy. "In the field of AI, a system's capacity for 'reasoning' refers to more than just classification and prediction," Koray Kavukcuoglu, chief technology officer of Google DeepMind, the company's research arm, explained in the announcement. "It refers to its ability to analyze information, draw logical conclusions, incorporate context and nuance, and make informed decisions." This thinking capability was first introduced by Google in its Gemini 2.0 Flash Thinking Experimental AI model, which was released in December. To create the model, the company explored AI-building practices including reinforcement learning and chain-of-thought prompting. In the case of Gemini 2.0 Flash Thinking, users can activate the thinking capability by clicking a button when prompting the model and it would then "think" through tasks. It also shows its reasoning, allowing the user to see the process and the chain of thought that it took to reach its conclusion. Google is no longer adding the "Thinking" label to its models. The company said with the new reasoning capability, Gemini 2.5 Pro Experimental has achieved a new level of performance above the base model due to post-training. It is the most advanced model for complex tasks and topped the LMArena leaderboard - which measures human preferences - by a significant margin. 
It also led with an 18.8% score on Humanity's Last Exam, a dataset designed by hundreds of subject matter experts about human knowledge and reasoning, compared to 14% for OpenAI's o3-mini and 8.6% for DeepSeek R1. For context, o3-mini and R1 are both thinking models capable of complex reasoning in the same manner that Google has designed Gemini 2.5 Pro Experimental. "We've been focused on coding performance, and with Gemini 2.5 we've achieved a big leap over 2.0 -- with more improvements to come," said Kavukcuoglu. To demonstrate the model's new capabilities, Google researchers prompted it to generate an endless-runner-style dinosaur video game using HTML, CSS and JavaScript from a single prompt, and it successfully did so in one pass. The experimental Gemini 2.5 Pro model comes with a context window of 1 million tokens, which allows it to ingest extremely large documents, audio and videos. For English text, that is commonly estimated at roughly 700,000 to 750,000 words. Google said it intends to expand the window to 2 million. With its large context window and high performance, Gemini 2.5 Pro provides a powerful foundation for AI agents. This enables them to process vast datasets and tackle complex problems more effectively. Because AI agents operate and plan autonomously, the model's enhanced reasoning capability will significantly improve their ability to understand data and utilize tools to complete tasks. Developers and enterprise users can start experimenting with Gemini 2.5 Pro in Google AI Studio now, and Gemini Advanced users can select it immediately from the dropdown on desktop and mobile. Users of Vertex AI, Google's managed machine learning platform for building and deploying AI, will be able to experiment with the new model in the coming weeks. In addition to the experimental Gemini 2.5 Pro, Google also announced TxGemma, a collection of open AI models designed to improve the efficiency of drug and therapy development using large language models.
The new models build on Gemma, Google DeepMind's existing lightweight open-source models, and are specifically trained to understand and predict the properties of drugs and gene therapies throughout the entire process of discovery. This includes identifying promising entries and predicting clinical trial outcomes. Google trained TxGemma's family of models from Gemma 2 using 7 million training examples. The models come in three sizes: 2 billion, 9 billion and 27 billion parameters. Each size includes a "predict" version, tailored for narrow tasks drawn from the Therapeutic Data Commons. Examples of these specific tasks include classifying drugs for capability such as crossing the blood-brain barrier, regression for predicting a drug's binding capability or generating other types of drugs based on a particular reaction. TxGemma 9B and 27B also include "chat" versions. These models explain their reasoning, answer questions and engage in conversation. As a result, researchers could ask TxGemma-Chat why it predicted a particular molecule might be toxic and delve into the molecule's structure. Just like every other model that Google builds, TxGemma is designed for integration into advanced agentic AI systems and includes tool use to tackle more complex research problems. "Standard language models often struggle with tasks requiring up-to-date external knowledge or multi-step reasoning," said Shekoofeh Azizi, a staff research scientist at Google. "To address this, we've developed Agentic-Tx, a therapeutics-focused agentic system powered by Gemini 2.0 Pro." Agentic-Tx is equipped with 18 tools that include TxGemma for multi-step reasoning; general search tools from PubMed, Wikipedia and the web; specific molecular tools; and gene and protein tools. This AI agent tool can be used to orchestrate therapeutic research design work and answer multi-step research questions for scientists and clinicians.
[21]
Google Unveils Gemini 2.5, Crushes OpenAI GPT-4.5, DeepSeek R1, & Claude 3.7 Sonnet
The reasoning model is capable of comprehending text, audio, images, video, and full code repositories. Google has announced Gemini 2.5, its latest AI model that can handle complex reasoning and coding tasks. The release includes Gemini 2.5 Pro Experimental, which ranks first on the LMArena leaderboard and leads common coding, math, and science benchmarks. "Gemini 2.5 models are thinking models, capable of reasoning through their thoughts before responding, resulting in enhanced performance and improved accuracy," said Koray Kavukcuoglu, CTO of Google DeepMind. According to Google, the model's reasoning capabilities extend beyond classification and prediction, allowing it to analyse information, draw logical conclusions, and incorporate context and nuance. This new "thinking model" outperforms OpenAI o3 mini, GPT-4.5, DeepSeek-R1, Grok 3 and Claude 3.7 Sonnet in several benchmarks. It also achieves a state-of-the-art 18.8% among models without tool use on Humanity's Last Exam, a dataset created by hundreds of subject matter experts to reflect the limits of human knowledge and reasoning. "For a long time, we've explored ways of making AI smarter and more capable of reasoning through techniques like reinforcement learning and chain-of-thought prompting," the company stated. "Now, with Gemini 2.5, we've achieved a new level of performance by combining a significantly enhanced base model with improved post-training." The model is available in Google AI Studio and the Gemini app for advanced users, with availability on Vertex AI expected soon. Google plans to introduce pricing in the coming weeks for higher-rate production use. Developers and enterprises can start using Gemini 2.5 Pro in Google AI Studio now. "Going forward, we're building these thinking capabilities directly into all of our models, so they can handle more complex problems and support even more capable, context-aware agents," Google said. 
"2.5 Pro excels at creating visually compelling web apps and agentic code applications, along with code transformation and editing," the company said. On SWE-Bench Verified, Gemini 2.5 Pro scores 63.8% with a custom agent setup. Google highlights improvements in Gemini 2.5's context-handling capabilities. "2.5 Pro ships today with a 1 million token context window (2 million coming soon), with strong performance that improves over previous generations," the company stated. The model is capable of comprehending text, audio, images, video, and full code repositories. Gemini 2.5 follows the recent release of Google Gemma 3, the latest iteration in its Gemma family of open-weight models. It succeeds Gemma 2, which was released last year. The tech giant also recently introduced Gemini's native image generation in Gemini 2.0 Flash, which integrates multimodal input, advanced reasoning, and natural language processing (NLP) to produce high-quality visuals. Google's rival OpenAI has also launched image-generation capabilities in GPT-4o. Meanwhile, DeepSeek on Monday announced a new update to its general-purpose AI model DeepSeek-V3. The updated model 'DeepSeek V3-0324' now ranks highest in benchmarks among all non-reasoning models. Artificial Analysis, a platform that benchmarks AI models, stated, "This is the first time an open weights model is the leading non-reasoning model, marking a milestone for open source." The model scored the highest points among all non-reasoning models on the platform's 'Intelligence Index'. Moreover, recently, Reuters reported that DeepSeek plans to release R2 "as early as possible". The company initially intended to launch it in early May but is now contemplating an earlier timeline.
[22]
Did Google Just Build The Best AI Model for Coding?
Gemini 2.5 beats Claude 3.7 Sonnet for coding on the Aider Polyglot leaderboard. It's a sign of trouble when just one company, with a single new feature, manages to monopolise the internet's collective attention. For days, every social media feed was flooded exclusively with 'Ghibli-fied' visuals, all thanks to ChatGPT's newly released image generation feature. Google has gone all in. Instead of merely following OpenAI's spotlight, Google has announced the Gemini 2.5 family of models, and the first of them, 2.5 Pro Experimental, now leads several benchmarks as the top frontier AI model. The model ranks first in the GPQA benchmark, which tests AI models on graduate-level science questions. It scored 83%, outperforming OpenAI's o1-Pro (79%) and Claude 3.7 Sonnet with extended thinking (77%). Similarly, it ranked the highest in many other benchmarks. Moreover, Gemini 2.5 is already receiving acclaim as potentially the best AI model for coding, a title that no other model besides Anthropic's Claude has managed to claim convincingly. Could Claude 3.7 Sonnet finally be facing some genuine competition? In the Aider Polyglot leaderboard, which evaluates LLMs' capabilities in writing and editing code, Gemini 2.5 Pro Experimental scored 72.9%. It performed better than Claude 3.7 Sonnet (64.9%), OpenAI's o1 (61.7%), and o3-mini high (60.4%). "Gemini 2.5 Pro is now easily the best model for code," Mckay Wrigley, a developer, said. He also highlighted that the model doesn't just agree with the user all the time, and that it demonstrated "flashes of genuine brilliance". "Google delivered a real winner here," Wrigley added. In various real-world scenarios, too, the experiences of many developers aligned with the benchmark scores, particularly when compared to Anthropic's Claude 3.7 Sonnet. 
A user on Reddit shared their experience of spending approximately three to four hours building an app with Claude 3.7 Sonnet, resulting in non-functional code with poor security practices, including hardcoded API keys. After they switched to Gemini 2.5 and provided the entire faulty codebase as input, it identified and explained the flaws, while also rewriting the entire application effectively. In another instance, Gemini 2.5 outperformed Claude 3.7 Sonnet in accurately reproducing a user interface. A user on X tested both models' abilities in recreating ChatGPT's user interface. Gemini 2.5 provided a more accurate representation. All things considered, Gemini 2.5 is also a huge leap for Google over the preceding models. Alex Mizrahi, a developer, shared how he used the model to recall about 80-90% of Rell syntax purely from memory -- a significant improvement over earlier Gemini versions, which struggled even when provided with examples. Moreover, users expressed a preference for Gemini 2.5 over other models in the realm of vibe coding. Developer Matthew Berman said on X, "It (Gemini 2.5 Pro) asks me clarifying questions along the way, which no other model has done." This indicates that it is "much more" collaborative. Gemini 2.5 also has an advantage over other coding models due to its long 1 million token input context window. OpenAI models, the o1 and the o3-mini, support only 250k tokens, while Anthropic is reportedly planning to extend to 500k tokens. While Gemini 2.5 is an improvement over other models, it is still imperfect. It continues to pose all the classic concerns associated with using AI models for coding. Kaden Bilyeu, a developer, said on X that Gemini 2.5 was trying to create a client-side API for generating a chat response, indicating that the AI model was going to leak the API key. Besides, there are also mixed reviews of how the model handles large codebases. 
Louie Bacaj, a developer, revealed that Gemini 2.5 struggled significantly when working with a codebase of 3,500 lines of code. He noted that despite claims of enhanced context handling, the model had trouble performing requested tasks even when API calls succeeded. So human judgment and intervention remain essential when using any AI model for coding. Besides, the first Gemini 2.5 model is the 2.5 Pro Experimental, which means it is still in the experimental phase. Further refinements and improvements are therefore likely. However, one area where Google needs to improve its game is packaging its AI models better. This is precisely the reason why OpenAI's GPT-4o gained more traction for image generation, even though Google released the same feature with the Gemini 2.0 Flash model a few days earlier. "I feel a little bit for the Google DeepMind team," said Nikunj Kothari, an angel investor. "You build a world-changing model and everyone is posting Ghibli-fied pictures instead." He also said that this has been the core problem with Google: it can build the best AI models in the world but fails to focus on consumer experience. "I beg of them to take 20% of their best talented folks and give them free rein on building world-class consumer experiences," Kothari added. He also said that the model's personality is quite basic compared to the others. Notably, several other users echoed this view. When native image generation in Gemini 2.0 Flash was released, it earned praise for its capabilities. However, it wasn't easy for many users to find and use the feature in the first place. The user interface was quite unintuitive, with options needlessly buried under menus. But circling back to the entire Ghibli mania, it might not be that Google failed in marketing its product effectively, but rather that OpenAI excelled at tapping into user psychology. 
"You post two pictures and everyone gets it," said a user on X, on OpenAI showcasing the image generation capabilities in GPT-4o. "You ask the same people to read a report generated by 2.0 and compare [it] to 2.5, and that requires more time than scrolling and liking," he added. Scenarios like these highlight that regardless of how powerful your AI models are or how groundbreaking the underlying research might be, the average user tends to gravitate toward results that are enjoyable, relatable, and emotionally engaging.
[23]
Google calls Gemini 2.5 its smartest AI yet
Google DeepMind unveiled Gemini 2.5 on March 25, 2025, calling it their most intelligent AI model yet. This experimental version, Gemini 2.5 Pro, claims the top spot on the LMArena leaderboard and showcases improved reasoning and coding skills. Gemini 2.5 models are designed as "thinking models." They can reason through their thoughts before responding, which boosts both performance and accuracy. This involves analyzing information, making logical deductions, understanding context, and making informed decisions. Gemini 2.5 Pro Experimental is built for complex tasks. The model leads in common coding, math, and science benchmarks. Gemini 2.5 Pro is currently available in Google AI Studio and the Gemini app for Gemini Advanced users. It's slated to arrive on Vertex AI in the coming weeks, with pricing and higher rate limits for production use also on the way. The model achieved a state-of-the-art score of 18.8% on Humanity's Last Exam, a dataset crafted by hundreds of experts to measure human knowledge and reasoning, and leads the field in math and science benchmarks like GPQA and AIME 2025. Gemini 2.5 Pro has made strides in coding performance. The model received 63.8% on SWE-Bench Verified with a custom agent setup, in agentic code evaluations. Gemini 2.5 Pro retains the core strengths of Gemini models, including native multimodality and a long context window. It ships with a 1-million-token context window, with a 2-million-token version coming soon. This allows it to process large datasets and tackle problems involving text, audio, images, video, and even entire code repositories. Developers and businesses can experiment with Gemini 2.5 Pro in Google AI Studio immediately. Gemini Advanced users can select it in the model dropdown, and Vertex AI access will follow in the coming weeks.
[24]
Google releases new AI model as ChatGPT retains 43% market share
ChatGPT dominated the AI tools market with a 43% market share, while competitor DeepSeek ranked third with just 6%. Google introduced Gemini 2.5, its latest experimental artificial intelligence model; it ranked second in a competitive leaderboard for AI-driven web development tools. On March 25, Google announced that it will allow developers to try out Gemini 2.5 Pro. The company described it as a thinking model, capable of reasoning through thoughts before responding. Google said this improves both accuracy and performance, particularly in coding, science and math tasks. It said Gemini 2.5 can support more context-aware agents. Citing self-reported data compiled by the AI benchmarking platform LMArena, Google shared that the new AI model topped the charts in reasoning and knowledge, science and mathematics. Google described Gemini 2.5 as its "most intelligent AI model." Google's new AI model ranked second in LMArena's WebDev leaderboard, a real-time AI coding competition where models compete in web development challenges created by the AI benchmarking platform. The AI model had an arena score of 1267.70, which surpassed competitors including DeepSeek, Grok and ChatGPT. Still, the top spot went to Anthropic's AI model Claude 3.7 Sonnet, which had an arena score of 1354.01. While many companies are working to improve their models' performances, OpenAI's ChatGPT continues to dominate the AI tools market. In 2024, the AI chatbot recorded more than 40 billion yearly visits, representing a market share of nearly 40%. Data from AI statistics and usage trends platform aitools.xyz showed that overall, the AI tools market had 101 billion visits throughout the year. Canva's AI generator came in second place, with 10.4 billion visits, gaining a 10.25% market share. More recently, new contenders in the AI tools market have surfaced. 
In February, the data showed that DeepSeek's AI tools climbed in popularity and now rank third with a 6.58% market share. DeepSeek also ranks first in the Trending list, recording a growth rate of 195% and monthly visits of 792 million. Despite this, ChatGPT continues to dominate the space, with a 43% market share in February and 5.2 billion monthly visits.
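The gap between the two WebDev arena scores quoted in this article can be read as an approximate head-to-head preference rate. The sketch below assumes the scores sit on a classic Elo-style scale (base 10, 400-point divisor), which the article does not state; the function name `expected_win_rate` is illustrative only.

```python
def expected_win_rate(score_a: float, score_b: float) -> float:
    """Probability that model A is preferred over model B.

    Assumption: an Elo-style logistic curve with base 10 and a 400-point
    divisor. The article does not say how arena scores are scaled.
    """
    return 1.0 / (1.0 + 10 ** ((score_b - score_a) / 400.0))


# Arena scores as quoted in the article.
claude_37_sonnet = 1354.01
gemini_25_pro = 1267.70

p = expected_win_rate(claude_37_sonnet, gemini_25_pro)
print(f"Estimated preference rate for Claude 3.7 Sonnet: {p:.1%}")
```

Under that assumption, the roughly 86-point gap would correspond to Claude 3.7 Sonnet being preferred in about 62% of direct comparisons, so second place here would still mean winning a sizable share of matchups.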
[25]
Google Raises the AI Bar with Gemini 2.5 Pro Reasoning - Phandroid
Gemini 2.5 Pro reasoning marks a critical step in Google's push to build AI that doesn't just predict -- but thinks. The new release climbs to the top of the LMArena leaderboard, a sign of its growing preference among human evaluators. But beyond the benchmark wins and code demos, what does it actually mean for an AI model to "reason"? Google defines reasoning not just as pattern-matching, but as the ability to work through context, nuance, and logic. With Gemini 2.5, this ambition starts to materialize. The model scores state-of-the-art results on science and math tests like GPQA and AIME 2025, outperforming rivals like GPT-4.5 and Claude 3.7 Sonnet. And it does so without resorting to expensive test-time tricks like majority voting. More impressive still, Gemini 2.5 Pro reasoning shows up in code. On SWE-Bench Verified, it scores 63.8% using a custom agent setup, which is pretty solid performance for tasks like code transformation, editing, and building apps from one-liner prompts. Google even demos a working video game built from a single sentence. These aren't just numbers. They reflect a model trained to pause and evaluate before responding, rather than immediately regurgitating the most likely output. Google calls it a "thinking model," and with a million-token context window (two million coming soon), Gemini 2.5 is built to handle complex, multi-modal input across code, audio, and video. Yet there's still a question of how useful this "reasoning" is in practice. Benchmarks are one thing; real-world dependability is another. Can users trust these models to make correct decisions in ambiguous or high-stakes settings? Gemini 2.5 Pro reasoning may be the most sophisticated yet. But the true test will be what it gets wrong, and whether it knows when to pause and say, "I don't know."
[26]
Google Introduces Gemini 2.5 Pro as 'Its Most Intelligent AI Model'
The model is available via Google AI Studio and Gemini Advanced. Google released the successor to its Gemini 2.0 series of artificial intelligence (AI) models on Wednesday. Dubbed Gemini 2.5 Pro Experimental, it is the first model the company is releasing from the 2.5 family. The Mountain View-based tech giant says that this series of models will have "thinking" or reasoning capability built directly into the models. It also notes improved benchmark scores across a wide range of functions, outperforming OpenAI's o3-mini in several areas. Google has begun rolling out the model to users. In a blog post, Koray Kavukcuoglu, the CTO of Google DeepMind, detailed the new large language model (LLM). The most notable aspect of the Gemini 2.5 series is that there will no longer be any separate "Thinking" models such as Gemini 2.0 Flash Thinking. The tech giant used an enhanced base model, which was further improved in post-training to deliver inherent reasoning capabilities to all the Gemini 2.5 AI models. Hence, Google will not attach a "Thinking" label to any particular model, as all of them can carry out advanced reasoning and show chain-of-thought (CoT). Google did not reveal a lot about the model specifications, so details around its dataset, training methods, and architecture are not known. However, the tech giant shared its benchmark scores based on internal testing. It is said to have scored 18.8 percent on Humanity's Last Exam, a dataset considered the toughest benchmarking test for AI models. Gemini 2.5 Pro's score was state-of-the-art (SOTA) among models without tool use. Gemini 2.5 Pro is also claimed to have outperformed models like OpenAI's o3-mini, Grok 3 Beta, Claude 3.7 Sonnet, and DeepSeek R1 in several benchmarks, such as GPQA Diamond, AIME 2024 and 2025, Aider Polyglot, and MMMU. Apart from this, Gemini 2.5 Pro also ranked at the top of the LMArena leaderboard at release. 
LMArena is a user-based platform where AI enthusiasts and developers rate models based on their experiences. Currently, it is followed by Grok 3 preview, GPT 4.5 preview, Gemini 2.0 Flash Thinking, and Gemini 2.0 Pro for the second, third, fourth, and fifth positions, respectively. Google claims that the latest LLM also improves coding performance and can create "visually compelling" web apps and agentic code applications. Gemini 2.5 Pro also comes with native multimodal support and a context window of one million tokens. Gemini 2.5 Pro is available to developers and enterprises via the Google AI Studio, and Gemini Advanced subscribers can access the model in Gemini's web client and apps. The company plans to make it available on Vertex AI in the coming weeks.
[27]
Gemini 2.5 Pro: All about Google's 'most intelligent AI model'
Google has revealed Gemini 2.5, which it describes as a significant leap forward in its AI capabilities. Gemini 2.5 Pro is currently available in Google AI Studio and in the Gemini app for Advanced users. Google plans to bring it to Vertex AI soon. Pricing will be announced by the tech giant in the coming weeks. Tech giant Google has introduced its "most intelligent AI model", Gemini 2.5, for its Advanced subscribers and developers. Highlighting a major leap forward in its artificial intelligence (AI) capabilities, the company stated that all the Gemini 2.5 models have thinking ability, making them capable of "reasoning through their thoughts before responding, resulting in enhanced performance and improved accuracy", according to an official blog post. Google's first release in the 2.5 series is an experimental version of 2.5 Pro, which the company claims is "state-of-the-art" across various benchmarks. It also claims that the version shows significant advances in coding and reasoning. The all-new Gemini 2.5 has an advanced reasoning system at its core, allowing the platform to generate effective responses based on its abilities. The company aims to build this thinking capability directly into all of its models, "so they can handle more complex problems and support even more capable, context-aware agents," it said. Unlike with 2.0 Flash Thinking, introduced in December last year, Google is no longer attaching the "Thinking" label, 9to5Google reported. When it comes to AI, the "reasoning" capacity of a system is not restricted to classification and prediction, but also refers to the ability to "analyse information, draw logical conclusions, incorporate context and nuance, and make informed decisions," Google said. 
With Gemini 2.5, the company claims to have achieved a "new level of performance" by combining an enhanced base model with improved post-training. Touted as its "most advanced model," Gemini 2.5 Pro Experimental is Google's answer to complex tasks. It has claimed the top position on the LMArena leaderboard, which measures human preferences for artificial intelligence-based interactions. Further, it has gone on to lead benchmarks like GPQA and AIME 2025, demonstrating its strong mathematics and scientific reasoning abilities. Google's latest model received a score of 18.8 per cent on Humanity's Last Exam, which has been specifically designed to test knowledge and reasoning capabilities. The company has also highlighted its advanced coding, which it said is a "big leap over 2.0," with "more improvements to come" in the future. 2.5 Pro could serve as a game changer when it comes to creating visually compelling web apps as well as agentic code applications. In its blog, the company stated that the new model builds on "native multimodality and a long context window." The 2.5 Pro has a 1 million token context window, while there are plans to increase it to 2 million soon.
1. How to get Gemini 2.5 Pro? Gemini 2.5 Pro is currently available in Google AI Studio as well as in the Gemini app for its Advanced users. It will be introduced for Vertex AI soon.
2. What's the pricing of Gemini 2.5 Pro? The pricing of Gemini 2.5 Pro will be announced in the coming weeks.
[28]
Google unveils Gemini 2.5, claims enhanced reasoning and coding capabilities
Google has introduced Gemini 2.5, its latest artificial intelligence (AI) model, which it claims has improved reasoning and coding performance. In a blog post on Tuesday, the company stated that Gemini 2.5 Pro, an experimental version, is "state-of-the-art across a wide range of benchmarks" and ranks #1 on LMArena by a significant margin. Google describes Gemini 2.5 as a "thinking model", capable of reasoning through its processes before responding, leading to enhanced accuracy. The company claims the model can analyse information, draw logical conclusions, and tackle complex problems more effectively. "Without test-time techniques that increase costs, such as majority voting, 2.5 Pro outperforms in mathematics and science benchmarks, including GPQA and AIME 2025," Google stated. In coding, Google asserts that Gemini 2.5 Pro makes a "big leap over 2.0", excelling in web app development, agentic code applications, and code transformation. On SWE-Bench Verified, a key benchmark for AI-generated code, it scores 63.8% with a customised agent setup. The model retains Gemini's core features, including multimodal capabilities and a 1 million token context window (set to expand to 2 million tokens soon). This allows it to process extensive datasets across text, audio, images, video, and code repositories. Gemini 2.5 Pro is now available in Google AI Studio and the Gemini app for Advanced users, with plans to launch on Vertex AI. Pricing details are expected to be announced in the coming weeks.
[29]
Google's Gemini 2.5 Pro AI Thinking Performance Tested
Google has recently launched its Gemini 2.5 Pro, an advanced AI model designed to provide improved reasoning, coding, and problem-solving capabilities. With its innovative multimodal functionality, extended context window, and record-breaking benchmark performance, this model represents a significant step forward in artificial intelligence. Whether you are a developer, researcher, or business professional, Gemini 2.5 Pro provides tools to address complex challenges with precision, efficiency, and adaptability. What sets Gemini 2.5 Pro apart isn't just its impressive technical specs -- it's how it bridges the gap between innovative technology and real-world usability. With new reasoning capabilities, record-breaking benchmark scores, and a context window that can process a staggering 1 million tokens, this AI is built to handle complexity like never before. Whether you're dreaming up dynamic web applications, decoding ethical dilemmas, or simply looking for smarter ways to work, Gemini 2.5 Pro promises to be the partner you didn't know you needed. Prompt Engineering provides more insight into what makes this model a true standout model in the ever-evolving world of AI. Gemini 2.5 Pro is the flagship model in Google's Gemini 2.5 series, offering state-of-the-art reasoning and coding capabilities. Its core features include: These features make Gemini 2.5 Pro a versatile and powerful tool for tackling intricate problems across various domains, offering solutions that are both practical and scalable. Gemini 2.5 Pro has set new standards in AI performance, particularly in reasoning, mathematics, and scientific problem-solving. Its achievements on industry benchmarks underscore its advanced capabilities: These results highlight Gemini 2.5 Pro's ability to deliver accurate and reliable solutions, making it a valuable asset for professionals who require advanced problem-solving tools. 
Gemini 2.5 Pro offers robust coding functionalities, making it an indispensable tool for software developers. Its advanced features include: These capabilities empower developers to automate repetitive tasks, tackle complex coding challenges, and focus on innovative solutions, enhancing productivity and creativity in software development. The versatility of Gemini 2.5 Pro extends far beyond coding and reasoning, making it a valuable tool across multiple industries. Its multimodal functionality and extended context window enable a wide range of applications, including: These applications demonstrate the model's adaptability and potential to drive innovation across a variety of fields, offering solutions that are both practical and forward-thinking. Gemini 2.5 Pro is currently available in an experimental phase through the Gemini Advanced subscription, with plans for broader accessibility via AI Studio upon its official release. Google has outlined several key areas for future development: These ongoing improvements are expected to further solidify Gemini 2.5 Pro's position as a leader in artificial intelligence, paving the way for new possibilities in AI-driven solutions. Gemini 2.5 Pro represents a significant advancement in AI technology, combining superior reasoning, advanced coding capabilities, and multimodal functionality. Its ability to handle complex tasks with an extended context window and achieve record-breaking benchmark performance positions it as a valuable asset for professionals across various industries. As Google continues to refine and expand its capabilities, Gemini 2.5 Pro is poised to play a pivotal role in shaping the future of artificial intelligence, offering tools that empower users to solve problems, innovate, and achieve their goals with unparalleled efficiency.
[30]
Google Unveils Gemini 2.5 Pro, Shattering Records on Humanity's Last Exam
Gemini 2.5 Pro is now rolling out to Gemini Advanced users. You can also try the new model on Google's AI Studio for free. Google has released a groundbreaking AI model called Gemini 2.5 Pro that has scored 18.8% on Humanity's Last Exam (HLE) without using web search or any other tools. HLE is a rigorous benchmark, designed by subject matter experts and top academicians from around the world to test in-depth knowledge on various subjects. Previously, OpenAI's o3-mini-high achieved 14% on the same benchmark without using any tools. Gemini 2.5 Pro is a thinking model, meaning it's a reasoning model, built on top of a larger base LLM, using reinforcement learning and chain-of-thought prompting. Before the Gemini 2.5 Pro model, Google had released the smaller Gemini 2.0 Flash Thinking model. Google says the Gemini 2.5 Pro model can "analyze information, draw logical conclusions, incorporate context and nuance, and make informed decisions." Gemini 2.5 Pro was being tested on LMArena under the codename "nebula". Now, Gemini 2.5 Pro has taken the top position on the LMArena leaderboard with the highest score of 1,443, beating Grok 3 and GPT-4.5. As for other benchmarks, Google says Gemini 2.5 Pro performs exceptionally well in coding, math, and science. In GPQA Diamond, Gemini 2.5 Pro scored 84%; in AIME 2025, the model achieved 86.7%. Even in the SWE-bench verified benchmark that tests the ability to solve real-world software issues, Gemini 2.5 Pro scored 63.8%, second only to Claude 3.7 Sonnet Extended Thinking, which scored 70.3%. Google says the new Gemini 2.5 Pro model is capable of advanced coding and reasoning. It's rolling out to Gemini Advanced users. Those who want to test the Gemini 2.5 Pro model for free can head to Google AI Studio and select the "Gemini 2.5 Pro Experimental 03-25" model from the drop-down menu.
[31]
New Google Gemini 2.5 : The Thinking Family of AI Models
Google's Gemini 2.5 Pro represents a significant leap in artificial intelligence, offering advanced reasoning, problem-solving, and multimodal functionality. Building on the foundation of its predecessors, this experimental model is designed to tackle complex tasks that demand deep analysis, ethical decision-making, and the seamless integration of diverse inputs such as text and images. Available for testing through AI Studio and the Gemini app, Gemini 2.5 Pro sets a new benchmark for AI performance and versatility, paving the way for innovative applications across industries. Google's Gemini 2.5 Pro is available now in Google AI Studio and in the Gemini app for Gemini Advanced users, and will be coming to Vertex AI soon. But let's be real -- AI models are often hyped as new, only to leave us grappling with their limitations. What makes Gemini 2.5 Pro different? For starters, it's designed with a focus on reasoning, structured problem-solving, and multimodal capabilities, meaning it doesn't just spit out answers -- it thinks through them. From generating functional code to navigating ethical dilemmas, this model is built to handle the kind of nuanced tasks that demand more than surface-level intelligence. And while it's still in its experimental phase, the early results are hard to ignore. Sam Witteveen explains what exactly sets Gemini 2.5 Pro apart, and how it might transform the way we work and create. Gemini 2.5 Pro is the latest milestone in Google's ongoing AI development journey, following the success of Gemini 2.0 Pro. This iteration benefits from advanced pre-training and post-training methodologies, including the use of synthetic data and refined data filtering techniques. By incorporating extensive user feedback and using platforms like AI Studio for rigorous testing, Google has significantly enhanced the model's ability to deliver logical, context-aware outputs. 
These improvements ensure that Gemini 2.5 Pro is not only more accurate but also more efficient in addressing a wide range of tasks, from technical problem-solving to creative applications. At the heart of Gemini 2.5 Pro lies a suite of features designed to excel in reasoning, structured problem-solving, and multimodal integration. Key functionalities include: These features enable Gemini 2.5 Pro to excel in tasks ranging from coding and image analysis to addressing ethical dilemmas and formulating long-term strategies. Its ability to process and synthesize multimodal inputs ensures that it remains versatile and adaptable across various use cases. Gemini 2.5 Pro has demonstrated exceptional performance across a variety of benchmarks, solidifying its position as a leader in AI innovation. Notable achievements include: These accomplishments underscore the model's capacity to handle diverse and challenging tasks, making it a valuable tool for professionals across multiple domains. The versatility of Gemini 2.5 Pro positions it as a fantastic tool for various industries. Its capabilities extend to several impactful use cases, including: These applications highlight the model's potential to streamline workflows and enhance productivity in fields such as software engineering, education, and research. While Gemini 2.5 Pro represents a significant advancement in AI, it is not without its limitations. Current challenges include: These limitations highlight areas where future iterations could improve, ensuring the model becomes even more reliable and effective for broader deployment. Looking ahead, Gemini 2.5 Pro is poised for further development, with several promising advancements on the horizon. 
Potential enhancements include: These developments could further solidify Gemini 2.5 Pro's role as a cornerstone of AI innovation, driving progress in both technical and creative domains.
[32]
Building Agent Workflows with Google's Gemini 2.5 Pro Free Powerful AI Model
Gemini 2.5 Pro is a sophisticated language model designed to support the development of complex agent workflows. With its advanced capabilities in coding, reasoning, and function execution, it provides a robust framework for creating diverse applications. Key features include a large context window, support for both parallel and sequential function calls, and options for automated or manual function execution. These attributes make Gemini 2.5 Pro highly suitable for building tools such as SQL assistants, business intelligence dashboards, and travel planning systems. Its versatility and precision allow developers to tackle intricate tasks with confidence and efficiency. Gemini 2.5 Pro establishes itself as a leader in coding performance, excelling in industry-standard tests like Aider Polyglot. Its large context window is a standout feature, capable of processing up to 1 million input tokens and generating up to 65,000 output tokens. This capacity enables the model to handle complex datasets and multi-step workflows, which benefits applications requiring detailed analysis or intricate data processing. By combining high performance with scalability, Gemini 2.5 Pro sets a new standard for handling large-scale computational tasks. A defining feature of Gemini 2.5 Pro is its robust function execution capabilities, which are accessible through the Gemini SDK for Python. The model supports both automated and manual function execution, giving developers the flexibility to tailor workflows to specific needs. It handles basic, parallel, and sequential function calls, ensuring precision and adaptability in application design. Additionally, Gemini 2.5 Pro generates structured outputs and demonstrates effective tool usage, making it highly applicable to real-world scenarios. 
These capabilities allow developers to create workflows that are both efficient and reliable, addressing a wide range of practical challenges. The versatility of Gemini 2.5 Pro is evident in its ability to power a wide array of applications across various domains. Gemini 2.5 Pro is accessible for free through AI Studio, with rate limits of 5 requests per minute and 50 requests per day. Its knowledge cutoff date, set at January 2025, means the model operates with relatively up-to-date information. However, it does not provide chain-of-thought visibility in API responses, which can limit transparency during debugging. To work around this, developers may need to rely on careful prompt engineering to guide the model effectively. Despite this limitation, the model's accessibility and advanced features make it a valuable tool for developers seeking to build innovative solutions. To get the most out of Gemini 2.5 Pro, it helps to adopt deliberate development practices. Crafting detailed system prompts is a key step in guiding the model's behavior and ensuring accurate outputs. For complex workflows, step-by-step instructions can help the model navigate intricate tasks with greater precision. Additionally, adapting workflows to handle unstructured or messy real-world data is crucial for achieving reliable results. By focusing on these strategies, you can create applications that deliver meaningful and consistent outcomes. While Gemini 2.5 Pro offers advanced capabilities, it is not without limitations. The absence of chain-of-thought outputs in API responses can make debugging and transparency more challenging, requiring developers to invest additional effort in prompt engineering. 
Furthermore, achieving optimal performance often depends on precise and thoughtful input design, which may demand a higher level of expertise. Despite these challenges, the model's strengths in coding, reasoning, and function execution make it a powerful tool for addressing complex workflows when used effectively.
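The free-tier limits quoted above (5 requests per minute and 50 per day) have to be respected on the client side if a workflow makes many calls. The following is a minimal sketch of one way to do that with a sliding-window limiter, assuming a single-process client. The class and method names are hypothetical and not part of any Google SDK; the actual API call would go wherever `try_acquire()` returns `True`.

```python
from collections import deque
import time


class SlidingWindowLimiter:
    """Client-side limiter enforcing several (max_calls, window_seconds) quotas."""

    def __init__(self, quotas, clock=time.monotonic):
        # One deque of call timestamps per quota, e.g. (5, 60) and (50, 86400).
        self._quotas = [(limit, window, deque()) for limit, window in quotas]
        self._clock = clock

    def wait_time(self):
        """Seconds until the next call is allowed (0.0 if allowed now)."""
        now = self._clock()
        delay = 0.0
        for limit, window, stamps in self._quotas:
            # Drop timestamps that have aged out of this window.
            while stamps and now - stamps[0] >= window:
                stamps.popleft()
            if len(stamps) >= limit:
                # The oldest call must leave the window before a new one fits.
                delay = max(delay, stamps[0] + window - now)
        return delay

    def try_acquire(self):
        """Record a call and return True if every quota has room, else False."""
        if self.wait_time() > 0:
            return False
        now = self._clock()
        for _, _, stamps in self._quotas:
            stamps.append(now)
        return True


# Quotas taken from the article: 5 per minute, 50 per day.
FREE_TIER_QUOTAS = [(5, 60), (50, 24 * 60 * 60)]
```

A caller would typically loop on `wait_time()`, sleep for the returned number of seconds, and only then send the request. The clock is injectable so the limiter can be exercised without real delays.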
[33]
Is AI Becoming Just Another Commodity? The Truth Behind Gemini 2.5 and DeepSeek V3
The release of Gemini 2.5 Pro, DeepSeek V3, and advancements from AI leaders like OpenAI and Microsoft has reignited a critical debate: is artificial intelligence (AI) becoming commoditized? With the performance gap between leading AI models narrowing, the industry is undergoing a significant transformation. Increasingly, innovation seems to depend on compute resources rather than new discoveries, signaling a shift in how AI is developed, valued, and deployed. As performance gaps narrow, the focus seems to be moving away from innovative breakthroughs to something far more pragmatic: compute power and resource allocation. But what does this mean for the future of AI? Will the race for innovation give way to a battle of infrastructure and efficiency? Google's Gemini 2.5 Pro has set a new benchmark in the competitive AI landscape. With its ability to process up to 1 million tokens, it excels in long-context understanding, visual processing, and solving complex problems in fields like mathematics, reasoning, and science. These capabilities position it as a strong competitor to OpenAI's GPT-4.5 and Microsoft's advanced AI systems. However, the success of Gemini 2.5 Pro also highlights a broader industry trend: the diminishing uniqueness of AI systems. As the performance gap between leading models narrows, it becomes increasingly difficult for any single system to maintain a distinct competitive edge. This convergence reflects the growing challenge of differentiation in a crowded and rapidly evolving field. DeepSeek V3, developed by Chinese AI researchers, exemplifies the global nature of AI development. This reasoning-focused model rivals OpenAI's GPT-4.5 in areas like mathematics and coding, although it lags slightly in science and general knowledge. Its advancements demonstrate the increasing parity between Western and Chinese AI labs, underscoring the international competition driving the field forward. 
The progress of DeepSeek V3 raises important questions about the future of AI differentiation. As more labs achieve comparable levels of performance, the emphasis may shift from technological innovation to other factors, such as cost efficiency, resource allocation, and infrastructure. This shift could redefine what it means to compete in the AI industry. Microsoft CEO Satya Nadella has been vocal about the commoditization of AI, emphasizing that performance is now more closely tied to compute investment than to unique technological breakthroughs. Microsoft's internal AI models reportedly perform at near-parity with those from OpenAI and Anthropic, reinforcing this perspective. This trend reflects a broader industry shift toward resource-driven development. As companies invest heavily in compute power, the focus moves away from breakthrough innovations and toward incremental improvements. This raises concerns about whether the industry is prioritizing scalability over creativity, potentially limiting the scope for new advancements. Across the AI sector, benchmarks reveal a clear pattern: the performance differences between leading models are shrinking. Companies employ various optimization strategies, such as majority voting and tool integration, to enhance their benchmark scores. While these methods improve specific metrics, they also complicate direct comparisons between models, making it harder to identify clear leaders. This convergence suggests that the competitive edge in AI may no longer lie in new advancements but in the ability to use resources effectively. As a result, the industry risks becoming more homogenized, with fewer opportunities for differentiation. This trend could reshape the competitive dynamics of the AI market, emphasizing operational efficiency over innovation. 
Despite rapid advancements, AI systems continue to face significant limitations. Tasks requiring advanced reasoning or domain-specific expertise, such as complex coding or nuanced decision-making, remain challenging for even the most sophisticated models. Predictions of AI dominance in fields like software engineering often overlook these practical constraints. These challenges highlight the gap between AI's theoretical potential and its real-world applications. While AI systems are becoming increasingly capable, human expertise remains indispensable in many areas. This underscores the importance of tempering expectations about AI's transformative impact and recognizing its current limitations. The commoditization of AI has profound implications for the industry's future. As performance differences narrow, the focus is likely to shift from innovation to cost efficiency and compute power. This could lead to a more homogenized market, where access to resources becomes the primary determinant of success. For AI labs, staying competitive will require more than just technological advancements. Strategic investments in infrastructure, resource management, and operational efficiency will become increasingly critical. Balancing innovation with scalability may define the next phase of AI development, as companies strive to adapt to a landscape where differentiation is harder to achieve. The AI industry is at a pivotal moment. The release of Gemini 2.5 Pro, DeepSeek V3, and advancements from OpenAI and Microsoft exemplify the growing trend of performance convergence. As the competitive landscape evolves, the emphasis on compute power and cost efficiency is reshaping priorities across the sector. While the commoditization of AI raises concerns about the loss of differentiation, it also presents opportunities for new strategies and approaches. Companies that can effectively balance innovation with resource allocation will be well-positioned to thrive in this changing environment. 
The future of AI development will likely hinge on the industry's ability to adapt to these shifts, making sure that progress continues while addressing the challenges of a more homogenized market.
[34]
Google Gemini 2.5 Pro Crushes GPT-4.5 in AI Coding Wars
Google Gemini 2.5 Pro has already established itself as a leader in the rapidly evolving AI landscape, surpassing competitors like GPT-4.5 and Claude 3.5 in critical areas such as coding, reasoning, and creative problem-solving. With its advanced capabilities, including a large context window, self-correction mechanisms, and iterative reasoning, it has become a useful tool for developers, researchers, and innovators. This overview by Wes Roth explores its performance, features, and applications, offering an in-depth perspective on its role in AI-driven solutions. Google Gemini 2.5 Pro has set new standards in coding and reasoning tasks, achieving top scores across various benchmarks. These results make it a reliable and efficient option for addressing intricate challenges, delivering high-quality outcomes with reduced effort. One of the most remarkable aspects of Gemini 2.5 Pro is its coding capability. The model can generate functional, production-ready code for complex projects, often from a single prompt. The outputs typically require minimal debugging, significantly reducing the time and effort needed for development. This efficiency makes it a valuable resource for developers seeking to streamline their workflows. Gemini 2.5 Pro also introduces a range of features that enhance its usability and effectiveness and set it apart from other AI models. These features not only improve the model's performance but also make it accessible and user-friendly for a wide range of applications, from educational purposes to professional development projects. Despite its impressive capabilities, Gemini 2.5 Pro is not without its challenges. 
While its limitations highlight areas for improvement, they do not detract significantly from the model's overall effectiveness and utility. Users have consistently praised Gemini 2.5 Pro for its ability to handle complex prompts with minimal guidance, making it a valuable tool for both beginners and experienced developers. These qualities contribute to a positive user experience and keep the model accessible and practical for diverse use cases. The versatility of Gemini 2.5 Pro is evident in its wide-ranging applications, which span both creative and practical domains. These applications highlight the model's ability to push the boundaries of AI innovation and its potential to transform various industries and fields. Looking ahead, Gemini 2.5 Pro is poised for further advancements that promise to enhance its capabilities and expand its applications. These advancements are likely to solidify Gemini 2.5 Pro's position as a leader in AI-driven coding and problem-solving, unlocking new possibilities for developers and researchers worldwide.
[35]
Google Debuts Touted Gemini 2.5 in the 'Winner-Take-All' AI Model Race | PYMNTS.com
Google unveiled its most powerful generative artificial intelligence model yet, one that opens a new front in the competitive AI race as it performs mostly head-and-shoulders above its staunchest rivals in industry benchmarks. "Google's Gemini 2.5 has landed -- a masterpiece of reasoning, multimodality and raw computational might," Anders Indset, founder of investment firm Njordis, told PYMNTS. Google is "thrusting itself into an AI race that's no longer a sprint but a relentless, winner-take-all siege." In reasoning and knowledge, Gemini 2.5 beat OpenAI's o3-mini and GPT-4.5, Claude 3.7 Sonnet, Grok 3 Beta and DeepSeek R1. The same went for code editing, visual reasoning, long context and multilingual performance. In science, it beat everything except Claude. In math, it was second only to Grok. It performed comparatively weakest in code generation, where it was third. The first version of Gemini 2.5 that Google is releasing is an experimental Pro version. It is available to Gemini Advanced paid users and will head to Google Cloud's Vertex AI platform "soon," per the blog post. Developers and enterprises can try it out at the Google AI Studio. Read also: Google Wants 500 Million Gemini AI Users by Year's End Gemini 2.5 is a thinking or reasoning model, which pauses to cycle through its logic before responding to improve the accuracy and performance of its answers. It analyzes information, comes to logical conclusions, adds context and understands nuance to reach a decision, according to the post. Google's rivals have already released their own reasoning models, which include those from OpenAI, Anthropic, Grok, DeepSeek and others. Google itself has released a reasoning model called Gemini 2.0 Flash Thinking. However, Gemini 2.5 goes beyond the reasoning capabilities of Gemini 2.0 Flash Thinking, which uses reinforcement learning (rewarding right answers, punishing wrong ones) and chain-of-thought prompting, per the post. 
With Gemini 2.5, Google was able to reach a new level of performance by combining a "significantly enhanced base model with improved post-training," the post said. Going forward, Google will incorporate these thinking capabilities directly into all its models, so they can handle "more complex problems and support even more capable, context-aware agents," according to the post. See also: Google's Gemini 2.0 Promises Autonomous Control of Complex Business Tasks Like Google's other Gemini models, Gemini 2.5 is natively multimodal, meaning it can analyze and understand text, audio, video, images and code -- capabilities built in from the ground up, not bolted on. Gemini 2.5 also offers a context window of 1 million tokens (about 750,000 English words), so it can accept very long prompts, a feature matched only by Alibaba on some of its Qwen generative AI models. "The context window is incredibly important for the AI race," Ilia Badeev, head of data science at Trevolution Group, told PYMNTS. "The length of the context is one of the most crucial parameters for the practical use of [AI models]." "With a larger context, the model can provide better assistance with programming, answering questions and text generation -- anything basically," Badeev said. Gemini 2.5 blew away the competition in long-context performance with 83.1%, the blog post said. OpenAI's o3-mini came in at 61.4% and its GPT-4.5 at 64%. Google plans to double the context window soon, per the post. "If Google does indeed implement a 2 million token context, it will be an unprecedented advantage over other models, even with lower benchmarks," Badeev said.
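The "1 million tokens (about 750,000 English words)" figure above implies roughly 0.75 words per token. A minimal sketch of that arithmetic follows; the ratio is the article's rule of thumb rather than a tokenizer measurement, and the function and constant names are illustrative only.

```python
import math

WORDS_PER_TOKEN = 0.75             # rule of thumb implied by the article
CONTEXT_WINDOW_TOKENS = 1_000_000  # Gemini 2.5 context window, per the article


def estimate_tokens(word_count: int) -> int:
    """Rough token estimate for English text of the given word count."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


def fits_in_context(word_count: int, window: int = CONTEXT_WINDOW_TOKENS) -> bool:
    """True if the estimated token count fits within the context window."""
    return estimate_tokens(word_count) <= window
```

Real token counts vary with language and content (code and non-English text usually tokenize less efficiently), so an estimate like this is only a first check before measuring with the model's own tokenizer.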
[36]
Google unveils Gemini 2.5 Pro with advanced reasoning and coding capabilities
Google on Tuesday introduced Gemini 2.5, its most intelligent AI model yet. The first release, an experimental version called 2.5 Pro, debuted at number one on the LMArena leaderboard "by a significant margin," said Koray Kavukcuoglu, CTO of Google DeepMind. He noted that these "thinking models" reason through their thoughts before responding, improving performance and accuracy. Gemini 2.5 Pro, described as the "most advanced model for complex tasks," leads benchmarks like GPQA and AIME 2025 without extra techniques such as majority voting. It scores a state-of-the-art 18.8% on Humanity's Last Exam, a dataset crafted by experts to push knowledge and reasoning limits, and excels in math and science. Kavukcuoglu said they've achieved "a big leap over 2.0" in coding, with more improvements planned. The 2.5 Pro scores 63.8% on SWE-Bench Verified, an industry-standard test, using a custom agent setup. It creates "visually compelling web apps," handles code transformation, and can generate executable video game code from a single prompt. Gemini 2.5 retains native multimodality and launches with a 1 million token context window, set to reach 2 million soon. It processes text, audio, images, video, and entire code repositories effectively. Kavukcuoglu emphasized that feedback will help enhance its abilities rapidly, with the goal of "making our AI more helpful." Developers can test 2.5 Pro in Google AI Studio now, and Gemini Advanced users can select it from the model dropdown on desktop and mobile. It will arrive on Vertex AI in the coming weeks, with pricing for "higher rate limits for scaled production use" to follow, he added.
[37]
Google's Gemini AI Introduces Real-Time Video, Smarter Research, and Personalised Assistants
Gemini AI Gets a Major Upgrade: Real-Time Video, Smarter Research, and More. Google's Gemini AI keeps getting better, with a new set of features that greatly improve its functionality. With enhanced research support, personalisation, app integrations, and Gems for all, Gemini AI is fast becoming an essential tool for users across the globe. Here is a closer look at what's new and how these features might make your experience more seamless and productive. Gemini 2.0 Flash Thinking Experimental brings new features and a longer context window. The major new feature of the 2.0 Flash Thinking Experimental model is support for file uploads. The model, known for breaking prompts down into formatted steps to enhance its reasoning, is now faster and more efficient. Additionally, Gemini Advanced users can now use a 1M token context window, which lets them process and analyse much larger quantities of information. This upgrade allows users to handle more sophisticated problems with greater precision and depth than before.
[38]
Gemini 2.5 Pro: What is it and How Does it Work?
Google has launched Gemini 2.5 Pro, its most advanced artificial intelligence model. The model improves reasoning and problem-solving abilities. It also excels in coding and multimodal data processing. Gemini 2.5 Pro reasons through a problem before answering, which is intended to produce more logical responses and support better decisions. It analyses information, applies context, and solves complex problems. The model leads both the GPQA and AIME 2025 benchmarks, which evaluate an AI system's knowledge and reasoning. On the expert-developed "Humanity's Last Exam," it scored 18.8%, a result reported to surpass all other models.
[39]
Google launches Gemini 2.5: Its most sophisticated AI model to date By Investing.com
Investing.com -- Alphabet Inc's Google (NASDAQ:GOOGL) has revealed Gemini 2.5, representing a significant leap forward in its artificial intelligence capabilities. The initial release, Gemini 2.5 Pro Experimental, showcases remarkable advancements in reasoning and coding, substantially outperforming competitors across major industry benchmarks. At the core of Gemini 2.5's innovation is its advanced reasoning system, allowing the AI to effectively "think through" problems before generating responses. This deliberate reasoning process -- analyzing information, drawing logical conclusions, and incorporating contextual nuances -- results in significantly higher accuracy and performance. The model builds upon Google's earlier work with reinforcement learning and chain-of-thought prompting, which first materialized in Gemini 2.0 Flash Thinking. The 2.5 series combines an enhanced foundation model with sophisticated post-training techniques, enabling it to tackle more complex challenges and support more capable, context-aware applications. Gemini 2.5 Pro Experimental has claimed the top position on the LMArena leaderboard, which measures human preferences for AI interactions. The model demonstrates exceptional reasoning abilities, particularly excelling in mathematics and scientific reasoning benchmarks like GPQA and AIME 2025 -- without relying on costly test-time enhancement techniques. Notably, the model achieved a state-of-the-art score of 18.8% on Humanity's Last Exam, a challenging dataset designed to test the boundaries of knowledge and reasoning capabilities. Google has placed special emphasis on enhancing Gemini's coding abilities. The 2.5 Pro model excels at creating visually sophisticated web applications and complex code structures, while also demonstrating proficiency in code transformation and editing tasks. 
On SWE-Bench Verified, the industry's standard for evaluating code-related AI capabilities, Gemini 2.5 Pro achieved an impressive 63.8% score with a customized agent configuration. Building on previous generations, Gemini 2.5 features native multimodal processing and an expansive context window of 1 million tokens (with plans to increase to 2 million). This allows the model to process and comprehend massive datasets from diverse sources, including text, audio, images, video, and entire code repositories with improved performance over earlier versions. Gemini 2.5 Pro is currently available through Google AI Studio and in the Gemini app for Gemini Advanced subscribers. Google plans to expand access via Vertex AI soon, with pricing details to be announced in the coming weeks. The new pricing structure will enable scaled production use with higher rate limits.
Google has launched Gemini 2.5 Pro, its latest AI model boasting advanced reasoning capabilities, multimodality, and improved performance across various benchmarks. This release marks a significant step in the ongoing AI race among tech giants.
Google has unveiled Gemini 2.5 Pro Experimental, touted as its "most intelligent" AI model to date [1]. This latest addition to the Gemini family introduces advanced reasoning capabilities, multimodality, and an expansive context window, positioning it as a formidable competitor in the rapidly evolving AI landscape [2].
Gemini 2.5 Pro has a 1 million token context window, allowing it to process the equivalent of multiple long books in a single prompt [1]. Google plans to double this capacity to 2 million tokens in the near future [2].
The model incorporates built-in reasoning, enabling it to fact-check and refine its outputs during generation. This process, which reviewers have described as "simulated reasoning," is particularly beneficial for complex tasks such as coding [1]. Gemini 2.5 Pro can generate a fully functional video game from a single prompt, showcasing its agentic coding abilities [1][4].
Google claims that Gemini 2.5 Pro outperforms competing models on several key benchmarks, including GPQA, AIME 2025, and Humanity's Last Exam, where it scored 18.8% [1].
However, it's worth noting that on the SWE-bench Verified test for software development abilities, Gemini 2.5 Pro (63.8%) fell short of Anthropic's Claude 3.7 Sonnet (70.3%) [2].
Gemini 2.5 Pro is now available through Google AI Studio and the Gemini app for subscribers of the $20-a-month Gemini Advanced plan [2][5]. Google has announced that all its future AI models will incorporate reasoning capabilities [2].
The release of Gemini 2.5 Pro comes at a time of intense competition in the AI sector. With companies like OpenAI, Anthropic, DeepSeek, and xAI all vying for dominance in the reasoning model space, Google's latest offering represents a significant move to maintain its position at the forefront of AI innovation [3].
This development is part of Google's broader strategy to integrate AI across its product portfolio. The company plans to invest $75 billion in AI development in 2025 alone, underscoring the importance of this technology to its future [4].
As AI models continue to advance rapidly, the introduction of Gemini 2.5 Pro raises important questions about the future of AI capabilities and their potential applications. While these advancements promise significant benefits in areas such as coding, scientific research, and content creation, they also raise considerations regarding ethical use, data privacy, and the societal impact of increasingly sophisticated AI systems [3][5].
The AI industry's rapid growth, with projections reaching $1.8 trillion by 2030, highlights the critical nature of these developments for tech companies and society at large [4]. As models like Gemini 2.5 Pro push the boundaries of what's possible, ongoing discussions about responsible AI development and deployment will remain crucial.
Google's experimental AI model Gemini-Exp-1121 has tied with OpenAI's GPT-4o for the top spot in AI chatbot rankings, showcasing rapid advancements in AI capabilities. However, this development also raises questions about the effectiveness of current AI evaluation methods.
5 Sources
Google has made its latest AI model, Gemini 2.5 Pro (Experimental), available to free users just days after its initial release to paying subscribers. This model, touted as Google's "most intelligent" yet, introduces simulated reasoning capabilities and outperforms competitors on various benchmarks.
13 Sources
Google's Gemini 2.0 introduces advanced multimodal AI capabilities, integrating text, image, and audio processing with improved performance and versatility across various applications.
59 Sources
Google has announced the release of new Gemini models, showcasing advancements in AI technology. These models promise improved performance and capabilities across various applications.
2 Sources
Recent leaks suggest Google is preparing to launch Gemini 2.0, a powerful AI model that could rival OpenAI's upcoming o1. The new model promises enhanced capabilities in reasoning, multimodal processing, and faster performance.
5 Sources
The Outpost is a comprehensive collection of curated artificial intelligence software tools that cater to the needs of small business owners, bloggers, artists, musicians, entrepreneurs, marketers, writers, and researchers.
© 2025 TheOutpost.AI All rights reserved