Curated by THEOUTPOST
On Thu, 12 Dec, 12:04 AM UTC
59 Sources
[1]
Google Gemini 2 Multimodal and Spatial Awareness in Python
Google's Gemini 2 offers a unified framework that integrates text, images, and structured data. Positioned as a potential competitor to OpenAI's models, it features remarkable capabilities in agent-based applications and specialized tasks, such as underwater image analysis. While still in its experimental phase, Gemini 2 demonstrates significant promise, though certain limitations highlight areas for further refinement. Imagine trying to describe the vibrant, chaotic beauty of an underwater coral reef to someone who's never seen it before. The intricate patterns of coral, the darting movements of fish, the play of light filtering through the water -- it's a scene so rich in detail that words often fall short. Now, imagine an AI capable of not only capturing this complexity in words but also generating images, structured data, and actionable insights from it. As with any innovative technology, Gemini 2 isn't without its quirks and growing pains. While it excels at tasks like identifying fish species and labeling coral in underwater images, it occasionally stumbles on subtleties or produces repetitive outputs. Yet, these imperfections don't overshadow its potential. What makes Gemini 2 particularly exciting is its adaptability and promise for agent-based applications, where AI can take on more autonomous, task-specific roles. In this overview by James Briggs, learn more about what makes Gemini 2 stand out, explore its capabilities and limitations, and consider how it might reshape the landscape of multimodal AI. Gemini 2 is Google's latest multimodal AI model, designed to process and generate outputs across multiple modalities, including text, images, and structured data. Unlike traditional models that focus on a single domain, Gemini 2 adopts a more versatile approach, excelling in tasks that demand contextual understanding and complex outputs. Its agentic capabilities further enhance its functionality, allowing it to autonomously perform task-specific actions with minimal human intervention. By integrating diverse data types into a cohesive framework, Gemini 2 offers a flexible solution for industries requiring advanced multimodal processing. Its design emphasizes adaptability, making it suitable for a wide range of applications, from creative content generation to scientific analysis. Gemini 2 distinguishes itself in the multimodal AI landscape with a suite of advanced features that enhance its versatility and practical utility. These capabilities include: These features make Gemini 2 a powerful tool for industries that rely on multimodal data processing, offering both flexibility and precision in handling complex tasks. Uncover more insights about Gemini 2.0 and AI in previous articles we have written. Extensive testing has highlighted both the strengths and limitations of Gemini 2. In underwater image analysis, the model has demonstrated the ability to identify various fish species and coral types, even under challenging conditions such as motion blur or image noise. For instance, it successfully recognized a clownfish within a coral reef but struggled to differentiate between closely related coral species. While its performance in such scenarios is impressive, occasional inaccuracies -- such as mislabeling objects or failing to distinguish subtle differences -- indicate room for improvement. These observations underscore the experimental nature of the model and the importance of ongoing updates to enhance its reliability in specialized applications. 
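As a concrete illustration of the underwater image analysis described above, the following sketch sends a reef photo to the model and asks for species labels as JSON. It assumes the google-generativeai Python package and the experimental "gemini-2.0-flash-exp" model name; the image path and the JSON fields are illustrative placeholders, not details from the article.

```python
# Minimal sketch: asking Gemini 2 to label marine life in an underwater photo.
# Assumes the `google-generativeai` package and an experimental model name;
# the image path and JSON schema are placeholders.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_GOOGLE_AI_STUDIO_KEY")  # key from Google AI Studio

model = genai.GenerativeModel(
    "gemini-2.0-flash-exp",
    generation_config={"response_mime_type": "application/json"},  # request structured output
)

reef_photo = Image.open("reef.jpg")  # placeholder underwater image
prompt = (
    "List every fish and coral you can identify in this image. "
    "Return JSON: [{\"label\": str, \"confidence\": \"low|medium|high\"}]."
)

response = model.generate_content([prompt, reef_photo])
print(response.text)  # JSON string with the model's labels
```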
Gemini 2's ability to process multimodal inputs and generate meaningful outputs positions it as a valuable tool for researchers and practitioners. However, its performance in highly specialized tasks, such as detailed spatial analysis, remains an area for further refinement. Accessing Gemini 2 requires a Google AI Studio API key, which provides entry to the model's capabilities. Users can run the model locally or in a cloud-based environment like Google Colab, depending on their computational resources and project requirements. Setting up the model involves configuring system prompts and task-specific parameters to optimize its outputs for particular use cases. To tailor Gemini 2 for specific tasks, consider the following steps: This flexibility allows users to adapt Gemini 2 to a wide range of applications, from generating creative content to analyzing complex datasets. Proper configuration ensures that the model delivers outputs aligned with specific project goals. Despite its advanced capabilities, Gemini 2 has certain limitations that may affect its performance in specific scenarios. These include: These challenges highlight the experimental nature of Gemini 2 and the need for continued development to achieve production-level reliability. Users should be aware of these limitations when deploying the model in critical applications. Gemini 2's multimodal capabilities position it as a promising tool for a variety of industries and applications. Its ability to integrate text, images, and structured data into a unified framework opens up new possibilities for innovation and efficiency. Potential use cases include: As the model continues to evolve, fine-tuning for specific tasks and environments will likely enhance its utility. This could encourage broader adoption of non-OpenAI models within the AI community, providing researchers and practitioners with a robust alternative for multimodal data processing. Gemini 2 represents a significant step forward in the development of multimodal AI. Its ability to integrate diverse data types into a cohesive framework sets it apart from many existing models. While challenges such as inconsistent object identification and repetitive outputs remain, its potential for specialized applications and agent-based tasks is evident. With further refinement, Gemini 2 could become a leading AI model, offering a compelling alternative to current industry standards.
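A minimal configuration sketch for the setup just described, assuming the google-generativeai package and a Google AI Studio key; the system prompt and parameter values are illustrative only, and the same code runs locally or in a Colab notebook.

```python
# Minimal configuration sketch (assumptions: `google-generativeai` package,
# experimental model name "gemini-2.0-flash-exp"; prompt text is illustrative).
import google.generativeai as genai

genai.configure(api_key="YOUR_GOOGLE_AI_STUDIO_KEY")

model = genai.GenerativeModel(
    model_name="gemini-2.0-flash-exp",
    # A task-specific system prompt steers the model toward a particular role.
    system_instruction="You are a marine-biology assistant. Be concise and flag uncertainty.",
    generation_config={
        "temperature": 0.2,       # lower values favour consistent, repeatable answers
        "max_output_tokens": 512,
    },
)

print(model.generate_content("Summarise what distinguishes hard and soft corals.").text)
```

Adjusting the system instruction and generation parameters per task is what the article means by configuring the model for a particular use case.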
[2]
How Gemini 2.0 is Changing the Game for Multi-Modal AI
The release of Google's Gemini 2.0 this month marks a significant advancement in artificial intelligence (AI), blending multi-modal capabilities with agentic functionality to deliver a highly versatile system. This new AI platform from Google processes and generates text, images, audio, and video, allowing seamless integration of diverse inputs and outputs. The experimental Flash version of Gemini 2.0 further improves these capabilities, enabling real-time collaboration, advanced reasoning, and novel applications across various industries. Whether you are a developer, researcher, or industry professional, Gemini 2.0 is poised to transform how you interact with and use technology. Imagine a world where technology doesn't just respond to your needs -- it anticipates them, effortlessly combining text, images, audio, and video to create solutions you never thought possible. Whether you're a developer tackling complex coding challenges, a researcher exploring uncharted data, or someone simply curious about the future of AI, Google's Gemini 2.0 is here to transform the game. This isn't just another AI upgrade; it's a leap into a future where multi-modal intelligence redefines how we interact with technology and the world around us. At its core, Gemini 2.0 isn't just about doing more -- it's about doing it smarter, faster, and with unprecedented versatility. From generating lifelike images and audio to solving real-world problems through advanced reasoning and collaboration, this experimental model is already making waves. But what does that mean for you? Whether you're building innovative applications or dreaming of new possibilities, Gemini 2.0 provides tools that could transform your work, spark creativity, and reshape your perspective on what AI can achieve. Let's provide more insight into what makes this innovation so new. Gemini 2.0 is engineered to handle a wide array of tasks with exceptional precision and efficiency. Its multi-modal capabilities make it a versatile tool for tackling challenges across industries. Here are the standout features that define its innovative functionality: These features position Gemini 2.0 as a powerful solution for addressing challenges in sectors such as healthcare, education, entertainment, and beyond. Its ability to integrate multiple data formats and provide real-time insights makes it a valuable asset for professionals seeking innovative solutions. Gemini 2.0 is at the core of several experimental projects that demonstrate its versatility and potential for real-world applications. These initiatives highlight how its capabilities can be harnessed to drive innovation and efficiency: These projects illustrate the diverse applications of Gemini 2.0, from streamlining workflows to allowing new forms of exploration and problem-solving. By using its advanced capabilities, organizations can unlock new opportunities for growth and innovation. Gemini 2.0 offers developers a comprehensive suite of tools and APIs designed to simplify the creation of advanced applications. Its multi-modal live API supports a range of functionalities, allowing developers to build innovative solutions with ease. Key features include: To further support developers, Google offers open source resources on platforms like GitHub. These resources include examples and documentation that make it easier to experiment with Gemini 2.0's capabilities and integrate its features into your projects. 
By providing these tools, Gemini 2.0 enables developers to push the boundaries of what is possible in AI-driven applications. The experimental Flash version of Gemini 2.0 provides a glimpse into the future of artificial intelligence. Its superior performance on benchmarks and expanded capabilities set a new standard for multi-modal AI systems. As the technology continues to evolve, Gemini 2.0 is expected to unlock new possibilities, from real-time problem-solving to enhanced coding assistance and beyond. By combining multi-modal integration with agentic functionality, Gemini 2.0 is not just a tool but a platform for innovation. Its potential to transform industries and redefine workflows underscores its importance in the rapidly advancing field of AI. As more professionals and organizations adopt this technology, the possibilities for its application will continue to expand, shaping the future of artificial intelligence in profound ways.
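For developers who want a feel for the real-time behaviour described here without the full Multimodal Live API, the standard Python SDK can stream partial results as they are generated. A small sketch, assuming the google-generativeai package and the experimental model name:

```python
# Sketch of incremental (streamed) output with the standard Python SDK - a simpler
# cousin of the bidirectional Live API the article describes. Assumes the
# `google-generativeai` package and the experimental "gemini-2.0-flash-exp" model.
import google.generativeai as genai

genai.configure(api_key="YOUR_GOOGLE_AI_STUDIO_KEY")
model = genai.GenerativeModel("gemini-2.0-flash-exp")

# stream=True yields chunks as they are generated instead of one final block,
# which is what makes "live" interfaces feel responsive.
for chunk in model.generate_content("Explain multimodal AI in three sentences.", stream=True):
    print(chunk.text, end="", flush=True)
print()
```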
[3]
Gemini 2.0 Overview: Features, Tools, and How to Get Started for Free
Google has introduced Gemini 2.0, an innovative multimodal AI model designed to transform how we interact with technology. Seamlessly integrated across Google's ecosystem, including Search, Gemini 2.0 features capabilities like real-time interactions and task automation, aiming to improve productivity and user experience. While some features are already available, others remain in testing, signaling even greater possibilities in the near future. Imagine an AI assistant that not only responds to your questions but also understands your environment, anticipates your needs, and assists you in real-time. With Gemini 2.0, this vision is becoming a reality. Whether managing work deadlines, handling household tasks, or seeking smarter ways to stay productive, this multimodal AI model is designed to transform your relationship with technology. Key features such as real-time interaction, task automation, and gaming integration demonstrate its potential to simplify and enhance everyday tasks. However, new technology often feels overwhelming, especially when it's surrounded by technical jargon and experimental tools. What exactly does "multimodal processing" mean, and how does it improve daily life? This article provides insights into Gemini 2.0's standout features, including Project Astra and Mariner, and explains how you can start using it for free. Whether you're a tech enthusiast, a busy professional, or simply curious about AI's future, Gemini 2.0 promises to integrate seamlessly into your life, making complex tasks simpler and more intuitive. Gemini 2.0 represents a significant leap forward in AI technology, building on its predecessor with enhanced processing power and improved user engagement. Its standout features include: These features are accessible through Google AI Studio, which provides free credits for initial exploration. This allows you to test Gemini 2.0's capabilities without any upfront investment, making it easier to evaluate its potential for your specific needs. One of the most notable tools within Gemini 2.0 is Project Astra, an AI assistant designed to interpret both visual and audio inputs. Astra can identify objects, read labels, and assist with everyday tasks such as setting laundry machine cycles or organizing household chores. This functionality underscores AI's growing role in real-world scenarios, offering practical solutions to simplify daily life. Beyond household applications, Astra has potential uses in professional environments. For instance, it could assist with inventory management by scanning and categorizing items or provide accessibility support for individuals with visual impairments. These capabilities highlight the versatility and convenience that Project Astra brings to the table. Expand your understanding of Gemini AI with additional resources from our extensive library of articles. Project Mariner focuses on automating repetitive browser-based tasks, such as filling out forms, conducting online research, and managing data entry. This tool is particularly useful for professionals who handle large volumes of information daily. By automating these processes, Mariner reduces manual workloads, allowing users to focus on more strategic or creative tasks. However, Mariner is not without its challenges. Maintaining accuracy across diverse tasks remains a hurdle, especially when dealing with complex or highly specific workflows. 
As the tool evolves, it is expected to address these limitations, potentially becoming an indispensable resource for businesses and individuals alike. For developers, Gemini 2.0 introduces Jewels, a suite of tools designed to use the model's multimodal capabilities. These tools include APIs for live streaming, advanced reasoning, and multimodal input/output processing. Jewels also supports native audio and image output, allowing developers to create more interactive and engaging applications. Currently, access to Jewels is limited, with a waitlist in place for broader availability. Despite this, the potential of these tools is immense, particularly for developers looking to build applications that require seamless integration of text, audio, and visual data. Jewels represents a significant step forward in empowering developers to harness the full potential of AI. Gemini 2.0 extends its capabilities to the gaming industry, introducing AI agents that assist with turn-based games. These agents provide strategy recommendations, gameplay insights, and even tutorials, enhancing the overall gaming experience. While their current focus is on simpler games, future updates could expand their functionality to include more complex scenarios, such as real-time strategy games or multiplayer environments. This integration not only enhances entertainment but also demonstrates the potential for AI to collaborate with humans in creative and competitive settings. By bridging the gap between AI and human interaction, Gemini 2.0 opens new possibilities for gaming and beyond. Gemini 2.0 delivers significant improvements in performance compared to its predecessor, Gemini 1.5. The Flash model enhances processing speed, personalization, and tone adaptation, making interactions feel more natural and user-centric. Benchmarks reveal faster response times and improved accuracy, positioning Gemini 2.0 as a leader in the competitive AI landscape. These advancements are particularly evident in applications requiring real-time processing, such as live customer support or dynamic content creation. By prioritizing speed and precision, Gemini 2.0 sets a new standard for AI performance. Despite its impressive capabilities, Gemini 2.0 is not without challenges. Tools like Project Astra and Jewels are still in preview stages, limiting their availability to a broader audience. Additionally, features such as live streaming and data handling raise important security concerns, particularly regarding privacy and unauthorized access. Addressing these issues will be critical for Gemini 2.0's long-term success. Google will need to implement robust security measures and transparent data policies to build trust among users and ensure the safe adoption of its tools. You can explore Gemini 2.0 through Google AI Studio, which offers free credits to help you get started. This platform provides an accessible entry point for individuals and businesses interested in testing the model's capabilities. However, some features, such as Jewels, require joining a waitlist for access. As Google continues to roll out updates, broader access to Gemini 2.0's tools is expected in the coming months. This gradual expansion will allow users to fully explore the model's potential and integrate it into their workflows effectively.
[4]
Gemini 2.0 breaks new ground in AI
Google's Gemini 2.0 has new features and capabilities. These include improved multimodal understanding, agentic AI, increased speed, better battery life, and broader integration with other Google solutions. Gemini 2.0 processes information differently than its predecessor and achieves more complex tasks. Integrations with Google products such as Search, Maps, and Workspace are key focus areas, although some features are still rolling out. Gemini 2.0 is accompanied by a major UI update to NotebookLM, Google's Gemini-powered AI information warehouse that leverages your research materials, links, and datasets.
5 Native image and audio processing: eliminating translation promises better responses
Unlike previous models, which required converting images and audio into text before analysis, Gemini 2.0 processes them natively. The goal is to eliminate the information loss associated with translation. Direct processing allows a richer, more nuanced understanding of the input, capturing subtleties and contextual cues that would otherwise be lost. Gemini 2.0 promises a more accurate and efficient interpretation of multimedia content by bypassing the intermediary text conversion step. Gemini 2.0 identifies objects in an image and understands their relationships and the scene context. I tested its abilities, and the response was detailed and accurate. It even recognized the materials from which objects on my coffee table were constructed. I also ran the image through version 1.5 Pro. While it provided some of the same information, its response was less detailed. The Gemini 2.0 Flash model still refused to process an image with people. "If Gemini 1.0 was about organizing and understanding information, Gemini 2.0 is about making it much more useful." - Sundar Pichai, Google CEO
4 Agentic AI: Gemini 2.0 can do more with less
Agentic AI describes AI models that actively interact with the world to achieve specific goals. Gemini 2.0 powers AI agents, allowing them to execute complex, multistep tasks that require planning, decision-making, and interaction with external systems. Agentic AI may mark a turning point where AI becomes a more proactive problem-solver. Gemini 2.0's agentic capabilities are slated to integrate with external tools like Google Search, Maps, and Lens. For example, a Gemini 2.0 AI agent could leverage Google Maps to plan a complex itinerary involving multiple destinations and modes of transportation. However, this functionality wasn't available to me in the 2.0 Flash desktop or from Maps. Google recently rolled out 2.0 in a pre-release version of its mobile app, which is where we expect to see some of these capabilities shine. In its blog post, Google discusses how the new model relates to two major Google initiatives: Project Astra and Project Mariner. Project Astra focuses on agentic AI capabilities integrated with services such as Search and Maps. Project Mariner touches on automated web features such as filling out forms, booking reservations, and gathering information from multiple websites.
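The native image and audio handling described earlier in this article can be exercised directly from Python. Below is a minimal, illustrative sketch that uploads an audio file and asks the model about it, with no manual transcription step; it assumes the google-generativeai package, its File API, the experimental "gemini-2.0-flash-exp" model name, and a placeholder file path.

```python
# Sketch of feeding audio directly to the model (no manual transcription step).
# Assumes the `google-generativeai` package, its File API, and a local
# "meeting.mp3" placeholder file.
import google.generativeai as genai

genai.configure(api_key="YOUR_GOOGLE_AI_STUDIO_KEY")
model = genai.GenerativeModel("gemini-2.0-flash-exp")

audio = genai.upload_file("meeting.mp3")  # uploads the raw audio via the File API
response = model.generate_content(
    ["Summarise this recording and list any action items.", audio]
)
print(response.text)
```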
3 Deeper integrations across the Google ecosystem: AI goes everywhere with Gemini 2.0
Gemini 2.0 integrates deeply across Google's ecosystem of products and services. The promise is a more unified and seamless user experience. Gemini 2.0's extended integrations point toward Google's strategy of using Gemini as a common thread woven throughout Workspace. Google Search is getting deeper integration with Gemini 2.0, facilitating more conversational search experiences and leveraging AI Overviews for comprehensive answers to complex queries, as we predicted in early November. Within Google Workspace, AI-powered features driven by Gemini 2.0 are being incorporated into applications like Docs, Slides, and Meet to enhance productivity and collaboration. Android Assistant is set to receive new capabilities powered by Gemini 2.0. Your mileage may vary during the rollout process.
2 Faster responses and better battery life: Gemini 2.0 Flash doubles the speed of 1.5
The full name of the latest version is Gemini 2.0 Flash Experimental. It's been streamlined for speed and responsiveness. Gemini 2.0 Flash delivers enhanced performance while reducing latency. This positions Gemini 2.0 Flash to better power real-time multimodal interactions. Gemini 2.0 Flash claims notable performance improvements. Google says it's twice the speed of its predecessor. In my experimentation, responses were nearly instantaneous. They were markedly faster than when I fed the same queries to version 1.5 Pro. The faster response times make interactions feel natural and fluid. For audio conversations, the reduced latency could reduce delays and create a more engaging and realistic experience. Gemini 2.0 Flash might extend the battery life for AI processes on mobile devices such as your Google Pixel 9 or other smartphone. This could mean less frequent charging, something everyone can appreciate.
1 NotebookLM's reinvented UI: Gemini 2.0 is accompanied by a redesign of NotebookLM's interface and new features
The NotebookLM refresh isn't part of Gemini 2.0 itself, but the two are different sides of the same coin. The arrival of Gemini 2.0 marks a parallel iteration in NotebookLM. The iteration goes beyond its underlying AI capabilities and into its user interface. The overhaul seeks to make it more intuitive and efficient for users to interact with their notes and documents. It focuses on streamlining workflows, improving navigation, and providing a more refined visual environment.
Gemini moves fast and isn't slowing down
Gemini 2.0 has cool tricks for maximum productivity. Along with recognizing text, it also understands images and sounds. This version promises to do things for you, like using Google Search or Maps to find information or complete complex tasks. Moreover, it has a larger context window than its predecessor. Google pegs Gemini 2.0 Flash at 2 million tokens, meaning it retains and processes twice as much information as Gemini 1.5 Pro. By focusing on multimodal understanding, agentic capabilities, deeper integrations with Google apps, and performance enhancements, Google is making Gemini the foundation of its ecosystem. As mainstream AI continues to mature, 2025 will be an interesting year.
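To relate a real document to the large context window mentioned above, the token count of a prompt can be checked before sending it. A small sketch, assuming the google-generativeai package; the file name is a placeholder.

```python
# Quick way to sanity-check how much of the context window a prompt would use.
# Assumes the `google-generativeai` package; the file name is a placeholder.
import google.generativeai as genai

genai.configure(api_key="YOUR_GOOGLE_AI_STUDIO_KEY")
model = genai.GenerativeModel("gemini-2.0-flash-exp")

with open("research_notes.txt", encoding="utf-8") as f:
    notes = f.read()

usage = model.count_tokens(notes)
print(f"This document would occupy {usage.total_tokens} tokens of the context window.")
```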
[5]
How Google's Gemini 2.0 Multimodal API is Changing the Game for Developers and Creators
Google's Gemini 2.0 represents a significant advancement in multimodal artificial intelligence, offering a versatile API that transforms user interactions with AI systems. By supporting text, voice, and visual inputs alongside real-time streaming capabilities, this platform provides a comprehensive toolkit for diverse applications. From assisting with coding tasks to generating creative content, Gemini 2.0 demonstrates its ability to improve productivity and streamline workflows. While some features remain in early access, its robust performance and accessibility position it as a leader in AI-driven solutions. Imagine a single platform seamlessly integrating text, voice, and visual inputs while adapting to your needs in real time. With its multimodal API and real-time streaming capabilities, Gemini 2.0 simplifies workflows and enhances creativity across industries. Whether you're a developer seeking coding assistance or a content creator exploring image generation, this platform offers versatile tools for a range of users. While some features are still in early access, its potential is clear. This guide by All About AI explores how Gemini 2.0 is transforming AI-driven interactions and why it might become an indispensable tool for many. At the core of Gemini 2.0 is its ability to handle real-time multimodal interactions, allowing seamless engagement through text, voice, or visual inputs. This flexibility allows users to interact with the API in ways that feel intuitive and efficient. Whether you're debugging code, summarizing lengthy articles, or analyzing on-screen content, the platform adapts to your needs. Its versatility makes it an invaluable tool for professionals across various industries, from software development to content creation. Gemini 2.0's real-time capabilities redefine how users interact with AI by offering a natural and efficient experience. This multimodal approach ensures that the API can adapt to diverse tasks, whether technical or creative. For example: This adaptability ensures that Gemini 2.0 meets the needs of users across different domains, enhancing both productivity and user experience. Unlock more potential in Google Gemini by reading previous articles we have written. For developers, Gemini 2.0 offers a powerful toolset designed to simplify coding tasks. By integrating conversational AI with real-time coding assistance, the platform helps developers reduce errors and optimize workflows. Key features include: This seamless integration of coding assistance with conversational AI makes Gemini 2.0 an indispensable resource for developers looking to improve efficiency and accuracy in their work. Gemini 2.0 extends its capabilities into the creative realm, offering tools for image and text generation that open new possibilities for designers, marketers, and content creators. While some features are still in early access, they showcase the platform's potential to transform creative workflows. Examples include: These features highlight Gemini 2.0's ability to support creative professionals in generating high-quality content quickly and effectively. Gemini 2.0 enhances its functionality through integration with various tools, making it a versatile solution for tackling complex tasks. Key integrations include: This combination of analytical and creative tools ensures that Gemini 2.0 can handle a wide range of professional tasks, from technical problem-solving to content creation. 
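The conversational coding assistance described above maps naturally onto a chat session, where earlier turns stay in context across follow-up requests. A brief sketch, assuming the google-generativeai package; the buggy snippet is invented for illustration.

```python
# Sketch of iterative, conversational coding help using a chat session so the
# model keeps earlier turns in context. Assumes the `google-generativeai`
# package; the buggy snippet is a made-up example.
import google.generativeai as genai

genai.configure(api_key="YOUR_GOOGLE_AI_STUDIO_KEY")
model = genai.GenerativeModel("gemini-2.0-flash-exp")

chat = model.start_chat()  # history is kept across send_message calls

buggy = """
def mean(values):
    return sum(values) / len(values)  # crashes on an empty list
"""
print(chat.send_message(f"Find the bug and propose a fix:\n{buggy}").text)
print(chat.send_message("Now add a unit test for the empty-list case.").text)
```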
While Gemini 2.0 offers an impressive array of features, some remain restricted during its early access phase. These limitations include: Despite these constraints, the platform's core features are robust and reliable, providing users with a strong foundation to explore its capabilities. As these experimental features are refined and released, Gemini 2.0 is expected to expand its utility even further. Gemini 2.0 is designed with accessibility in mind, offering free testing options that allow developers and businesses to experiment with its features without significant barriers. During testing, the platform demonstrated: This accessibility encourages innovation, allowing users to explore the API's potential and integrate it into their workflows with ease. The versatility of Gemini 2.0 makes it suitable for a wide range of industries and use cases. Whether you're a developer, a content creator, or a business professional, the platform offers tools to enhance productivity and creativity. Key applications include: By combining text, image, and tool-based functionalities, Gemini 2.0 positions itself as a leader in the AI space, offering solutions that cater to diverse professional needs.
[6]
Google raises the bar with Gemini 2.0 AI platform
Google has launched Gemini 2.0, a significant advancement in its AI models, designed to enhance user interaction and task execution across different platforms. This new model improves upon its predecessor, Gemini 1.5, which was introduced in December 2023. Gemini 2.0 features native multimodal capabilities, allowing it to process and generate content across text, video, images, audio, and code. This model aims to facilitate a more agentic experience in computer tasks, leveraging advanced reasoning to execute user-directed actions. Gemini 2.0 incorporates essential improvements such as enhanced multimodality, including natively generated audio outputs and images. The newly introduced Gemini 2.0 Flash serves as a workhorse model with low latency and high performance, outpacing its predecessor in key benchmarks. Notable capabilities now include the ability to handle multimodal inputs and outputs seamlessly, along with native tool integrations for Google Search and code execution. Sundar Pichai, CEO of Google and Alphabet, emphasized that this advancement builds on their long-standing mission to organize the world's information. "With Gemini 2.0, we're excited to launch our most capable model yet," he stated. The model will be integrated into Google products, starting with Gemini and Search, and will provide new functionalities such as Deep Research, a feature designed to assist with complex topic exploration. AI Overviews, a key feature of Google Search, now reach approximately 1 billion users, facilitating an innovative way to pose queries. With Gemini 2.0's enhanced reasoning skills, AI Overviews will tackle more intricate topics, including advanced mathematics and coding tasks. This rollout began with limited testing this week, aiming for broader availability early next year across various languages and regions. Decade-long investments in custom hardware capabilities, including the Trillium sixth-generation TPUs, have supported the development of Gemini 2.0. These TPUs powered the entirety of the training and inference processes. Gemini 2.0's intent is to not only understand information but also to make it significantly more useful, following an extensive evaluation of feedback from early testers. Gemini 2.0 also introduces several experimental prototypes that explore next-generation AI agent capabilities. The updated Project Astra, for example, enables Gemini 2.0 to perform complex task execution by understanding its environment through camera inputs. Users reported improved dialogue capabilities in multiple languages and better navigation of Google services like Search, Lens, and Maps. Project Astra can remember context for up to ten minutes of in-session communication, enhancing personalization while maintaining user control over memory retention. Project Mariner represents another pivotal prototype, designed for web navigation to aid users with everyday tasks. Demonstrated with a Chrome extension, Project Mariner can fluidly execute actions by interacting with text and images on-screen, exhibiting a benchmark performance of 83.5% against real-world web tasks. Additionally, Jules, a coding assistant powered by Gemini 2.0, integrates within GitHub workflows, enabling developers to delegate complex projects. These advancements showcase a shift in how AI can enhance productivity across various sectors, not limited to coding, but ultimately extending into everyday user applications.
As Google DeepMind explores these new AI capabilities, the responsibility of deploying AI safely remains paramount. The company emphasizes an iterative approach that includes assessing risks, engaging trusted testers, and refining their models based on comprehensive risk evaluations. Significant attention is placed on user privacy and safety, especially with features that allow agents to remember or interact with user data. Controls are in place enabling users to easily delete past interactions, and additional measures are being researched to manage potential vulnerabilities, such as instruction manipulation. Hassabis and Pichai have voiced the importance of responsibly orchestrating AI development, indicating ongoing projects will focus on maintaining consistent user instruction adherence and mitigating risks associated with action execution by agents in both digital and physical realms. The developments surrounding Gemini 2.0 reflect Google's commitment to leading in AI innovation while navigating the intricacies of agentic technology. With the rollout of Gemini 2.0 Flash and its corresponding projects, Google aims to enhance user experience while addressing emerging challenges in the evolving landscape of AI. Further updates will continue to reveal how these capabilities will integrate into daily tasks and activities.
[7]
Google Gemini 2.0 Flash Multimodal AI Development Just Got Faster
Google today unveiled Gemini 2.0 Flash Experimental, designed to enable more immersive and interactive applications while introducing new coding agents that enhance workflows by acting directly on behalf of developers. By combining improved performance, real-time interactivity, and a unified development framework, it offers a versatile platform tailored to the needs of developers, businesses, and end-users. Whether focusing on text generation, spatial reasoning, or live interactions, Gemini 2.0 Flash provides tools to expand the possibilities of AI-driven innovation. Building on the success of Gemini 1.5 Flash, Flash 2.0 is twice as fast as 1.5 Pro and delivers stronger performance. It features new multimodal outputs and native tool integrations, alongside the introduction of a Multimodal Live API for creating dynamic applications with real-time audio and video streaming capabilities. Starting today, developers can explore Gemini 2.0 Flash through the Gemini API in Google AI Studio and Vertex AI during its experimental phase. General availability is expected early next year. For projects where speed and accuracy are paramount, Gemini 2.0 Flash sets a new standard. It delivers double the processing speed of its predecessor, Gemini 1.5 Pro, while maintaining an uncompromising level of precision. This performance boost significantly improves text generation, reasoning, and task execution, ensuring faster and more dependable outcomes. The system's advanced spatial reasoning capabilities further enhance its utility, allowing it to handle complex spatial queries with ease. This feature is particularly beneficial for applications in navigation, design, and gaming, where spatial accuracy is critical. By integrating these performance upgrades, Gemini 2.0 Flash ensures that your projects are not only efficient but also highly precise, meeting the demands of even the most intricate tasks. Gemini 2.0 Flash introduces robust multimodal capabilities, allowing seamless interaction with a variety of input and output formats. It supports inputs such as images, video, and audio, while generating outputs that include native images, inline text, and multilingual, steerable text-to-speech (TTS) audio. This flexibility makes it a powerful tool for diverse applications. For example, you can create a multimedia recipe guide that combines text, visuals, and audio instructions, offering an engaging user experience. Additionally, its conversational image editing feature enables users to modify visuals using natural language prompts, streamlining creative and professional workflows. These multimodal capabilities open up new possibilities for innovation across industries, from education to entertainment. Find more information on Google Gemini 2.0 Flash by browsing our extensive range of articles, guides and tutorials. In today's interconnected world, real-time interactivity is a critical feature, and Gemini 2.0 Flash excels in this area with its bidirectional streaming API. This functionality supports live voice and video interactions, allowing natural, multilingual conversations with instantaneous responses. The ability to communicate in real time enhances user engagement and ensures seamless interactions. Whether you're developing a customer service chatbot or hosting multilingual meetings, Gemini 2.0 Flash facilitates smooth, real-time communication.
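To make the streaming workflow concrete, here is a rough, text-only sketch of connecting to the Multimodal Live API with the google-genai SDK. The model name, the v1alpha API version, and the send/receive method names reflect early experimental releases and are assumptions on our part; treat this as an outline rather than a reference, and check Google's cookbook examples for the exact call signatures.

```python
# Rough sketch of a text-only session against the Multimodal Live API using the
# google-genai SDK. Method and parameter names follow early experimental
# releases and may differ in current versions - consult Google's cookbook for
# the authoritative signatures.
import asyncio
from google import genai

client = genai.Client(
    api_key="YOUR_GOOGLE_AI_STUDIO_KEY",
    http_options={"api_version": "v1alpha"},  # Live API was initially exposed on v1alpha
)

async def main():
    config = {"response_modalities": ["TEXT"]}  # AUDIO can also be requested
    async with client.aio.live.connect(model="gemini-2.0-flash-exp", config=config) as session:
        await session.send(input="Give me a one-line status update.", end_of_turn=True)
        async for message in session.receive():  # responses stream back incrementally
            if message.text:
                print(message.text, end="")

asyncio.run(main())
```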
Its integration with tools like Google Search and custom functions further enhances its ability to provide accurate, context-aware responses, bridging communication gaps effectively. This makes it an invaluable resource for businesses and developers aiming to create dynamic, interactive solutions. For developers, Gemini 2.0 Flash simplifies the AI development process with its unified SDK. By merging the functionalities of AI Studio and Vertex AI SDKs, it eliminates the need for extensive code adjustments when transitioning between platforms. This streamlined approach reduces development time and ensures compatibility across various environments. The unified SDK equips developers with the tools to create a wide range of applications, from interactive chatbots to gaming platforms and customer service solutions. Its flexibility and efficiency empower you to bring your ideas to life with minimal friction, making it an essential resource for modern AI development. The versatility of Gemini 2.0 Flash makes it suitable for a broad spectrum of applications. Its advanced features and capabilities enable innovative solutions across multiple industries. Key use cases include: From interactive storytelling to multilingual voice conversations, Gemini 2.0 Flash provides a solid foundation for developing innovative, AI-driven solutions. Its adaptability ensures that it can meet the unique requirements of various industries, making it a versatile tool for modern challenges. As AI technology continues to evolve, Gemini 2.0 Flash positions itself as a forward-thinking platform with ongoing updates and expanded language support. These enhancements pave the way for developers to create innovative applications that use its advanced capabilities. By adopting Gemini 2.0 Flash, you can stay ahead of the curve and contribute to the next generation of multimodal AI technologies. The platform's commitment to innovation ensures that it will remain a key player in the AI landscape, offering tools and features that adapt to the changing needs of users and developers alike. Its potential to shape the future of AI-driven solutions makes it an essential resource for those looking to push the boundaries of what is possible.
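As a concrete illustration of the custom-function integration mentioned above, the sketch below wires a plain Python function in as a tool and lets the model call it automatically. It assumes the google-generativeai package's automatic function-calling support; the order-status function is a hypothetical stand-in, not something from the article.

```python
# Sketch of registering a custom Python function as a tool. Assumes the
# `google-generativeai` package and its automatic function-calling support;
# the order-status function is a stand-in for a real back-end call.
import google.generativeai as genai

genai.configure(api_key="YOUR_GOOGLE_AI_STUDIO_KEY")

def get_order_status(order_id: str) -> str:
    """Return the shipping status for an order (stubbed for the example)."""
    return f"Order {order_id} shipped yesterday."

model = genai.GenerativeModel("gemini-2.0-flash-exp", tools=[get_order_status])
chat = model.start_chat(enable_automatic_function_calling=True)

# The model decides when to call get_order_status and folds the result into its reply.
print(chat.send_message("Where is order A1234?").text)
```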
[8]
Google's Gemini 2.0 Is All About Efficiency
Google's latest AI model is lightweight, but improves on previous models. Google officially announced Gemini 2.0 on Wednesday, the company's latest upgrade to its flagship AI model. Specifically, Google is rolling out "Gemini 2.0 Flash experimental," succeeding Gemini 1.5 Flash. Google's Flash models are its "lightweight" models, designed for tasks that don't require the most powerful AI models possible, and focus more on efficiency. Still, Google says Gemini 2.0 Flash not only improves upon Flash models like Gemini 1.5 Flash, but also more powerful models like Gemini 1.5 Pro. How does Gemini 2.0 Flash compare to other models? Google says that 2.0 Flash beats both 1.5 Flash and 1.5 Pro in a number of categories, including General MMLU-Pro benchmarking, three different coding benchmarks, a factuality test, two math benchmarks, reasoning, two image benchmarks, and video benchmarks. Some of these wins were close to 1.5 Pro's performance, however others showed significant improvement, such as a 7.5 point increase in the Natural2Code benchmark, or a nine-point increase in the HiddenMath benchmark. 1.5 Pro still beats 2.0 Flash in audio benchmarking (40.1% vs. 39.2%) and long context benchmarking (82.6% vs. 69.2%). In addition to these improvements, Google says 2.0 Flash supports new multimodal outputs, such as AI-generated images combined with text and text-to-speech audio. Plus, it can pull in Google Search, run code, in addition to other third-party functions. Where you'll see Gemini 2.0 Flash You'll probably be seeing a lot of Gemini 2.0 Flash -- whether you know it or not. The company announced that it will be using Gemini 2.0 for Search, specifically AI Overviews. The initial rollout of Google's AI search summaries was, unequivocally, a hot mess. Nevertheless, the company is expressing optimism about the feature: Google says Gemini 2.0 will enable AI Overviews to handle more complicated topics and multi-step queries, as well as new functions like advanced math, multimodal questions (i.e. queries from text, images, documents, etc.) as well as coding. 2.0 Flash is also coming to the Gemini app. In fact, it's already available on desktop and the mobile web experience. You'll just need to choose the model from the drop down menu before testing it out. Google wants AI to do the work for you Google is advertising 2.0 Flash as part of its "agentic era." What this means is that Google wants its products to do more on your behalf, whether that be analyze a question or your surroundings, or to actually complete a task for you. The company says they're working on updates to Project Astra, Google's research department responsible for developing a "universal AI assistant"; Project Mariner, a Chrome extension that utilizes AI to help you while browsing the web; and Jules, the company's AI-powered agent to help developers write code. Google also highlighted a new feature it calls "Deep Research," an AI-powered research assistant that aims to analyze subjects and generate reports for you. You prompt the bot with a topic or question you want to investigate, and it develops a research plan for you to approve or revise. Once approved, it scrapes the internet for sources, and puts together a full report you can export to Google Docs. Like AI Overviews, it includes links to the sources it pulls from, so you can review them on your own.
[9]
Gemini 2.0 is Google's most capable AI model yet and available to preview today
The battle for AI supremacy is heating up. Almost exactly a week after OpenAI kicked off its "12 Days of OpenAI" announcements, Google today is offering a preview of its next-generation Gemini 2.0 model. In a blog post attributed to Google CEO Sundar Pichai, the company says 2.0 is its most capable model yet, with the algorithm offering native support for image and audio output. "It will enable us to build new AI agents that bring us closer to our vision of a universal assistant," says Pichai. Google is doing something different with Gemini 2.0. Rather than starting today's preview by first offering its most advanced version of the model, Gemini 2.0 Pro, the search giant is instead kicking things off with 2.0 Flash. As of today, the more efficient (and affordable) model is available to all Gemini users. If you want to try it yourself, you can enable Gemini 2.0 from the dropdown menu in the Gemini web client, with availability within the mobile app coming soon. Moving forward, Google says its main focus is adding 2.0's smarts to Search (no surprise there), beginning with AI Overviews. According to the company, the new model will allow the feature to tackle more complex and involved questions, including ones involving multi-step math and coding problems. At the same time, following a period of limited testing, Google plans to make AI Overviews available in more languages and countries. Looking forward, Gemini 2.0 is already powering enhancements to some of Google's more moonshot AI applications, including Project Astra, the multi-modal AI agent the company previewed at I/O 2024. Thanks to the new model, Google says the latest version of Astra can converse in multiple languages and even switch between them on the fly. It can also "remember" things for longer, offers improved latency, and can access tools like Google Lens and Maps. As you might expect, Gemini 2.0 Flash offers significantly better performance than its predecessor. For instance, it earned a 63 percent score on HiddenMath, a benchmark that tests the ability of AI models to complete competition-level math problems. By contrast, Gemini 1.5 Flash earned a score of 47.2 percent on that same test. But the more interesting thing here is that the experimental version of Gemini 2.0 even beats Gemini 1.5 Pro in many areas; in fact, according to data Google shared, the only domains where it lags behind are long-context understanding and automatic speech translation. It's for that reason that Google is keeping the older model around, at least for a little while longer. Alongside today's announcement of Gemini 2.0, the company also debuted Deep Research, a new tool that uses Gemini 1.5 Pro's long-context capabilities to write comprehensive reports on complicated subjects.
[10]
Google's new Gemini 2.0 AI model is about to be everywhere
Less than a year after debuting Gemini 1.5, Google's DeepMind division was back Wednesday to reveal the AI's next-generation model, Gemini 2.0. The new model offers native image and audio output, and "will enable us to build new AI agents that bring us closer to our vision of a universal assistant," the company wrote in its announcement blog post. As of Wednesday, Gemini 2.0 is available at all subscription tiers, including free. As Google's new flagship AI model, you can expect to see it begin powering AI features across the company's ecosystem in the coming months. As with OpenAI's o1 model, the initial release of Gemini 2.0 is not the company's full-fledged version, but rather a smaller, less capable "experimental preview" iteration that will be upgraded in Google Gemini in the coming months. "Effectively," Google DeepMind CEO Demis Hassabis told The Verge, "it's as good as the current Pro model is. So you can think of it as one whole tier better, for the same cost efficiency and performance efficiency and speed. We're really happy with that." Google is also releasing a lightweight version of the model, dubbed Gemini 2.0 Flash, for developers. With the release of a more capable Gemini model, Google advances its AI agent agenda, which would see smaller, purpose-built models taking autonomous action on the user's behalf. Gemini 2.0 is expected to significantly boost Google's efforts to roll out its Project Astra, which combines Gemini Live's conversational abilities with real-time video and image analysis to provide users with information about their surrounding environment through a smart glasses interface. Google also announced on Wednesday the release of Project Mariner, the company's answer to Anthropic's Computer Control feature. This Chrome extension is capable of commanding a desktop computer, including keystrokes and mouse clicks, in the same way human users do. The company is also rolling out an AI coding assistant called Jules that can help developers find and improve clunky code, as well as a "Deep Research" feature that can generate detailed reports on the subjects you have it search the internet for. Deep Research, which seems to serve the same function as Perplexity AI and ChatGPT Search, is currently available to English-language Gemini Advanced subscribers. The system works by first generating a "multi step research plan," which it submits to the user for approval before implementing. Once you sign off on the plan, the research agent will conduct a search on the given subject and then hop down any relevant rabbit holes it finds. Once it's done searching, the AI will regurgitate a report on what it's found, including key findings and citation links to where it found its information. You can select it from the chatbot's drop-down model selection menu at the top of the Gemini home page.
[11]
Gemini 2.0 doubles the speed of the AI assistant - and could supercharge search
Google has unveiled the next iteration of the Gemini AI model family, starting with the smallest version, Gemini 2.0 Flash. Gemini 2.0 promises faster performance, sharper reasoning, and enhanced multimodal capabilities, among other upgrades, as it is integrated into Google's various AI and AI-adjacent services. The timing for the news may have something to do with wanting to step on OpenAI and Apple's toes with their respective 12 Days of OpenAI and new Apple Intelligence news this week, especially since Gemini 2.0 is mostly built around experiments for developers. Still, there are some immediate opportunities for those on the non-enterprise side of things. Specifically, Gemini Assistant users and those who see AI Overviews when using Google Search will be able to engage with Gemini 2.0 Flash. If you interact with the Gemini AI through the website on a desktop or mobile browser, you can now play with Gemini 2.0 Flash. You can pick it from the list of models in the drop-down menu. The new model is also on its way to the mobile app at some point. It may not be life-changing, but Gemini 2.0 Flash's speed at processing and generating content is notable. It's far faster than Gemini 1.5 Flash; Google claims the new model will react at twice the speed while still outperforming even the more powerful Gemini 1.5 Pro model. Google is infusing Gemini 2.0 into its AI Overviews feature as well. AI Overviews already write summaries to answer search queries on Google without requiring clicking on websites. The company boasted that there are more than a billion people who have seen at least one AI Overview since the feature debuted and that it has led to engagement with a wider array of sources than usual. Incorporating Gemini 2.0 Flash has made AI Overviews even better at tackling complicated, multi-step questions, Google claims. For example, say you're stuck on a calculus problem. You can upload a photo of the equation, and AI Overviews will not only understand it but walk you through the solution step-by-step. The same goes for debugging code. If you describe the issue in a search submission, the AI Overview might produce an explanation for the issue or even write a corrected version for you. It essentially bakes Gemini's assistant abilities into Google Search. Most of the Gemini 2.0 news centers around developers, who can access Gemini 2.0 Flash through the Gemini API in Google AI Studio and Vertex AI; there's also a new Multimodal Live API for those who want to create interactive, real-time experiences, like virtual tutors or AI-driven customer service bots. Ongoing experiments for developers that may lead to changes for consumers are also getting a boost from Gemini 2.0, including universal AI assistant Project Astra, browser AI task aide Project Mariner, and partnerships with game developers to improve how in-game characters interact with players. It's all part of Google's ongoing effort to put Gemini in everything. But, for now, that just means a faster, better AI assistant that can keep up, if not outright, beat ChatGPT and other rivals.
[12]
Google launches Gemini 2 -- here's why it's a big deal
Gemini 2 is the latest AI model from Google and the first version of the family to launch is the fast, yet powerful Gemini 2 Flash. The company says this is the start of its 'agent era' where AI is capable of performing tasks without human input. Google launched its Gemini AI just over a year ago, bringing in a new era for the search giant that saw the rollout of AI overviews, the Gemini chatbot and more. Initially only available to developers or as an experimental model for Gemini Advanced subscribers, Gemini Flash 2 outperforms the previous version on almost all benchmarks despite being smaller and faster. CEO Sundar Pichai says Gemini 2 has advanced reasoning capabilities and these will also be coming to AI Overviews in Google Search to offer more accurate responses to complex, multi-step questions. Gemini 2 will likely also have a Pro version, with the model powering all of the Google Gemini products including the Android app, chatbot and experiments. Gemini 2 is being dubbed the 'agent era' by Google. It is a model capable of advanced reasoning similar to OpenAI's o1 but can also natively output images, speech, text and more. The first model in the family is Gemini 2.0 Flash but the current release is branded as 'experimental'. Google says it is twice the speed of Gemini Pro 1.5, the previous flagship model, while also outperforming it on key benchmarks. Demis Hassabis, CEO of Google DeepMind describes Gemini 2.0 Flash as the "workhorse model" with low latency and enhanced performance. It can natively generate images, text and speech where previously Gemini had to call on other models like Imagen to perform those tasks. It also outperforms all previous Gemini models at reasoning, is significantly better at visual understanding, can translate speech at speed from audio and can analyze video better than Pro 1.5. Video analysis was already a Gemini special feature. When Gemini 1.0 launched we were in the 'chatbot' era of AI models where you could converse with them and use them to create content. Then, with the arrival of OpenAI o1 we entered the reasoning era, and simultaneously the agent era. Agents, in AI terms, are where a model can create versions of itself to perform a range of functions on your behalf. Google has also launched a new agent tool in Gemini that can go off and browse the web for you, and return a report on a complex topic -- this is known as Deep Research and is built into Gemini Advanced. Hassabis explained: "The practical application of AI agents is a research area full of exciting possibilities. We're exploring this new frontier with a series of prototypes that can help people accomplish tasks and get things done." He added that this includes Project Astra -- a universal virtual assistant revealed during Google I/O and the new Project Mariner. This "explores the future of human-agent interaction, starting with your browser," he said, as well as Jules which is a code agent to help developers. Gemini 2.0 Flash is currently available to subscribers of Gemini Advanced in the drop down model menu. It is labelled as experimental but I found it works fine. You can also use it as a developer in the Gemini API or the powerful Google Gemini AI Studio. Google says it will be coming to all Google products in 2025.
[13]
Gemini 2.0: Google's AI is Transforming Gaming and Beyond
Google has unveiled its wealth of new advancements in artificial intelligence today, spearheaded by the Gemini 2.0 model. These developments are designed to enhance practical applications across diverse domains, including voice assistance, gaming, web automation, spatial reasoning, and developer tools. By integrating multimodal capabilities with real-time interaction, Google is expanding how you engage with technology in everyday tasks, creative endeavors, and professional settings. "We're getting 2.0 into the hands of developers and trusted testers today. And we're working quickly to get it into our products, leading with Gemini and Search. Starting today our Gemini 2.0 Flash experimental model will be available to all Gemini users. We're also launching a new feature called Deep Research, which uses advanced reasoning and long context capabilities to act as a research assistant, exploring complex topics and compiling reports on your behalf. It's available in Gemini Advanced today." Whether you're a developer looking to build smarter tools, a gamer seeking an edge, or someone who just wants a little extra help in the kitchen, these advancements are tailored to fit into your life in meaningful ways. And while the possibilities may sound endless, the real magic lies in how these tools are designed to work with you -- not for you -- making sure that you remain in control every step of the way. Let's dive into how Google's Gemini 2.0 is reshaping the way we interact with technology and what it means for your everyday life. At the core of Google's AI advancements is the Astra AI Voice Assistant, a tool engineered to provide real-time guidance across a variety of activities. Whether you are cooking, gaming, or exercising, Astra delivers actionable insights tailored to your needs. For example: This level of interaction bridges the gap between human intuition and machine precision, making complex or repetitive tasks easier to manage while enhancing your overall experience. Project Mariner introduces a browser-based AI agent designed to simplify and automate intricate web tasks. Whether you are extracting data, navigating websites, or managing tools like Google Sheets, Mariner streamlines these processes while keeping you in control. Its standout features include: This tool is particularly beneficial for professionals handling large-scale data or repetitive online tasks. By automating these processes, Mariner saves time, improves efficiency, and reduces the cognitive load associated with manual operations. Google's AI tools are transforming the gaming landscape by acting as a virtual coach that provides real-time insights and strategies. By analyzing live data from sources like Reddit and other gaming communities, the AI offers tailored recommendations to enhance your gameplay. Key features include: Whether you are a competitive gamer seeking an edge or a casual player looking to improve, these tools enhance both skill development and the overall gaming experience. Gemini 2.0's multimodal capabilities empower developers to create applications that seamlessly integrate text, images, and real-time data. These tools open up a wide range of possibilities, such as: By allowing developers to merge multiple data types effortlessly, these tools foster innovation and problem-solving across industries, making it easier to address complex challenges with precision and creativity. 
Google's advancements in spatial reasoning extend to both 2D and experimental 3D capabilities, allowing the AI to analyze images, identify object positions, and reason about physical spaces. These features have practical applications in various fields, including: These capabilities are particularly valuable in industries where spatial accuracy and visualization are critical, offering tools that enhance both efficiency and precision. To encourage experimentation and customization, Google provides AI Studio and API integration tools. These resources allow developers to tailor Gemini 2.0's capabilities to meet specific needs. For instance, you can: This flexibility ensures that the technology adapts to your requirements, making it a versatile tool for a wide range of applications across different sectors. Google envisions AI as a foundational operating system that enables seamless interaction with technology. Potential applications include: By prioritizing responsible development, Google ensures that these technologies are rigorously tested under human oversight. This approach emphasizes trust, reliability, and ethical use, making sure that AI continues to evolve in ways that benefit individuals and society as a whole.
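As a concrete illustration of the 2D spatial reasoning described above, Gemini can be asked to return object labels with normalized bounding boxes for an image. This is a rough sketch rather than Google's reference implementation: it assumes the google-genai Python SDK, the launch-era "gemini-2.0-flash-exp" model ID, and a local image file named scene.jpg.

from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # hypothetical placeholder key

with open("scene.jpg", "rb") as f:  # assumed local image file
    image_bytes = f.read()

prompt = (
    "List the prominent objects in this image as JSON. For each object, "
    "return a 'label' and a 'box_2d' given as [ymin, xmin, ymax, xmax] "
    "normalized to a 0-1000 scale."
)

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents=[types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"), prompt],
)
print(response.text)  # JSON-style list of labels and normalized boxes

Scaling the returned coordinates back to the image's pixel dimensions is left to the caller, which is how the boxes would be drawn over the original image for inspection.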
[14]
Google's Gemini 2.0 update will supercharge your phone -- 3 changes to try first
Google Gemini 2.0 is here, and it has some impressive changes to help improve the user experience, particularly for Android users. There was a lot to love about the initial Gemini release but also some issues and teething problems with the software. However, Google has since released Google Gemini 2.0, which outperforms the original on almost all benchmarks, despite being smaller and faster. Some of the biggest changes offer some major advantages to phone users. This includes improvements to speed, multimodal understanding and integration. Gemini 2.0 includes native image and audio processing. Essentially, Gemini doesn't have to convert images and audio into text before processing them, which is what the original Gemini did. The advantage of this is that it allows the AI to better capture more subtleties and contextual clues that can be lost in the transfer. This makes using it on your phone much more detailed, especially if you need to quickly get information on a captured image or YouTube video. So, for instance, you can take a photo or record a conversation and be sure to have a much more comprehensive breakdown. One of the biggest advantages of Google Gemini is simply how well it integrates with the rest of Google's services. For instance, Google Search is getting deeper integration, making for more conversational search experiences, and better AI overviews. Google Workspace will also see more integration in features like Docs, Slides and Google Meet. What this means is that the tasks you usually do on your phone will become just that bit better with the AI involved, without you needing to actively do anything. The biggest advantage for phone users, however, is Gemini 2.0 Flash. This new, smaller model offers more speed and responsiveness than 1.5 Flash. Google has designed it to deliver enhanced performance, while at the same time reducing the latency. It helps the AI feel more natural to use, while also making the whole experience a bit more streamlined. Another advantage of this is that, while Gemini 2.0 is more powerful, you could see your battery last much longer while using it. So if you make use of it on something like the Pixel 9 Pro, you'll be able to make use of all the best features without having to run back and forth to charge the device. All of this alone would be a major win, but we are awaiting the implementation of Google Gemini's AI agents, which are capable of executing complex tasks with multiple steps. So, for instance, the AI can easily plan a trip on Google Maps with an itinerary and varied stops at the push of a button. While the feature wasn't available on the 2.0 Flash desktop for Maps, we expect it to be part of the mobile app which recently saw a pre-release rollout. There's a lot to be excited about with Gemini 2.0, and arguably we're only scratching the surface of what the AI is capable of. When we compared Gemini 2.0 to Gemini 1.5 the differences were astounding, so if you are a fan of AI then this is one to grab.
[15]
Gemini 2.0 is here, as Google continues push towards agentic AI
CEO Sundar Pichai says 2.0 will help us to understand information better. Google has announced its latest family of AI models - Gemini 2.0 - with outlandish promises such as the ability to outperform previous models while being twice as fast. Launching in the same month as OpenAI's 12 Days of OpenAI announcements and new Apple Intelligence features in non-US markets, developers can now get their hands on the Gemini 2.0 Flash experimental model through the Gemini API in Google AI Studio and Vertex AI. Gemini 2.0 Flash can now generate text, images and audio in a single API call, helping to streamline AI content generation and boosting productivity. Gemini 2.0 can also execute searches, run code and interact with third-party applications to expand its usefulness across more areas as a single tool to complete the job, rather than users having to jump between the AI chatbot and other applications. "Over the last year, we have been investing in developing more agentic models, meaning they can understand more about the world around you, think multiple steps ahead, and take action on your behalf, with your supervision," Google CEO Sundar Pichai explained. Apart from the direct improvements we can see and benefit from now, Gemini 2.0 also marks a step closer to agentic AI - models capable of planning, reasoning and taking actions with user guidance (and that bit's important). For example, Google has used Gemini 2.0 to power its experimental code agent, Jules, which integrates directly into GitHub workflows despite that platform's Microsoft ownership (and therefore its OpenAI, GPT affiliation). Google DeepMind CEO Demis Hassabis and CTO Koray Kavukcuoglu explained that Jules can "tackle an issue, develop a plan and execute it." They added that the company is building numerous safeguards to protect users from potential harm, including evaluating and training the model further and introducing mitigations against users unintentionally sharing sensitive information. Google's universal AI assistant, Project Astra, is also able to remember more of its previous conversations and work with more tools, like Google Search, Lens and Maps. Project Astra also sees improved latency, allowing it to understand human language "at about the latency of human conversation." Pichai added: "If Gemini 1.0 was about organizing and understanding information, Gemini 2.0 is about making it much more useful."
[16]
Gemini Advanced Gets a More Capable Gemini 2.0 Model; Here's How to Use It
Google says it's designed to handle complex queries including coding, math, and reasoning problems. After releasing the Gemini 2.0 Flash model for all Gemini users, Google has now unveiled a more capable, "Gemini-Exp-1206" model for paid Gemini Advanced users. It's an experimental model, which was being tested on AI Studio for the last few weeks. The new Gemini-Exp-1206 model offers improved performance across coding, math, reasoning, and instruction following. I have used the Gemini-Exp-1206 model on AI Studio before, and it seemed like a powerful model. It doesn't do inference scaling like OpenAI's o1 reasoning models, but the performance is almost there. Basically, you get a much faster model that delivers top-notch performance. Note that Google says Gemini-Exp-1206 is an early preview so it might not work in some cases. Additionally, this model can't access real-time information from the web and some features are currently unavailable. If you are subscribed to Gemini Advanced, you can open the drop-down menu and select "2.0 Experimental Advanced" to use the new Gemini-Exp-1206 model. It's available on both desktop and mobile web, but you can't use it on the mobile app yet. And if you want to use it for free, you can head to aistudio.google.com. Amid the ongoing "12 days of OpenAI" announcements, Google has been releasing exciting new features, projects, and models to overshadow OpenAI's planned releases. So far, Google has launched Veo 2, its AI video generator which truly looks much more capable than competing models including Sora. Other than that, Google has demoed Project Mariner, Project Astra, and Deep Research for Advanced users.
[17]
Google releases Gemini 2.0 with new and improved capabilities
The new upgrades include an AI research assistant with agentic capabilities. Last year, Google released Gemini, its best effort to tackle OpenAI's ChatGPT, and yesterday (11 December), the company released Gemini 2.0, its "most capable model yet". As part of the announcement, the company said it is releasing an experimental version of Gemini 2.0 Flash, updates to Project Astra, a new Project Mariner, a multimodal live API and Deep Research, a new agentic feature in Gemini Advanced. Moreover, Google also released Trillium, its sixth-generation tensor processing unit - the company's custom circuits used to accelerate machine learning workloads - to customers. According to CEO Sundar Pichai, Gemini 2.0, which is currently in the hands of developers and testers, will enable Google to build new AI agents that bring them closer to their vision of a universal assistant. What are the big updates? The new Gemini 2.0 Flash builds on and outperforms its previous iteration - the 1.5 Flash - with twice the speed, enhanced performance and low latency, Google announced. The "workhorse model" supports multimodal inputs including images, video and audio as well as multimodal output including natively generated images mixed with text. Moreover, users can also access a chat-optimised version of the 2.0 Flash experimental, which, according to Google, is an "even more helpful Gemini assistant". Meanwhile, Google has rolled out Deep Research, its AI research tool in Gemini Advanced on desktop and mobile web browsers, which is expected to be available in a mobile app version in early 2025. According to Google, Deep Research uses AI to explore complex topics on a user's behalf. Based on a user's input, the AI assistant refines its analysis, conducts multiple search queries building on its initial findings and provides a "comprehensive report" of its key findings with links to original sources - exportable into a Google Doc. Deep Research is the first feature in Gemini to bring agentic capabilities - or the ability to act independently - to life, said Google. In the announcement yesterday, Pichai said: "We're getting 2.0 into the hands of developers and trusted testers today. And we're working quickly to get it into our products, leading with Gemini and Search. "No product has been transformed more by AI than Search," he said. "As a next step, we're bringing the advanced reasoning capabilities of Gemini 2.0 to AI Overviews ... We started limited testing this week and will be rolling it out more broadly early next year."
[18]
Google Unveils Gemini 2.0 and 'Deep Research' For Advanced Users
It supports native multimodal output including native image generation and audio output. After months of anticipation, Google has finally released its next-generation Gemini 2.0 model. Google is first releasing the Gemini 2.0 Flash model which, according to the company, outperforms its flagship Gemini 1.5 Pro model on key benchmarks. Google says it's 2x faster and more efficient than its larger models. Not only that, it supports native multimodal output such as native image generation and native text-to-speech multilingual audio output. It can also natively interact with Google Search and perform code execution. As for benchmarks, Gemini 2.0 Flash achieves 62.1% in GPQA (Diamond), 76.4% in MMLU-Pro, and 92.9% in Natural2Code. Gemini 2.0 Flash is now available on the web version of Gemini to all users including free and paid users. You need to click on the drop-down menu and select the "Gemini 2.0 Flash Experimental" model to use it. Google says the Gemini 2.0 Flash model will soon be added to the Gemini app on Android and iOS. As for paid Gemini Advanced users, they get access to a new feature called "Deep Research" which uses advanced reasoning to solve complex queries. It can also compile reports for you. It seems Gemini Advanced users have access to something like OpenAI's o1 reasoning models that can "think" step-by-step and reason through harder questions. Google says Gemini 2.0 is coming to AI Overviews early next year. AI Overviews will be able to understand complex queries, advanced math equations, coding questions, and multimodal input. Notably, the search giant says the Gemini 2.0 model was trained entirely on its custom TPUs like Trillium, and inference is also done on its in-house TPUs.
[19]
Google unveils Gemini 2.0 AI model built for agentic experiences
Google on Wednesday announced Gemini 2.0, its most advanced AI model to date, designed to support the evolving "agentic era." With features like native image and audio output, multimodal support, and advanced tool integration, Gemini 2.0 aims to create more capable AI agents, edging closer to Google's vision of a universal assistant. Gemini 2.0 Flash builds on the success of its predecessor, 1.5 Flash, offering significantly faster performance -- twice as fast as the 1.5 Pro model. It supports multimodal inputs such as images, videos, and audio, alongside multimodal outputs like generated images and multilingual text-to-speech (TTS) audio. The model also enables native tool usage, including Google Search and third-party functions, enhancing versatility. Gemini 2.0 Flash introduces capabilities such as multimodal reasoning, long-context understanding, and advanced planning, enabling the development of more complex AI agents. Google is testing these possibilities through various prototypes, including Project Astra, Project Mariner, and Jules. Leveraging DeepMind's gaming experience, Google is developing AI agents that can reason based on game actions and provide real-time suggestions. These agents are being tested in games like Clash of Clans and Hay Day in collaboration with developers like Supercell. Google is also exploring Gemini 2.0's potential for robotics and spatial reasoning in physical environments. Google is committed to responsible AI development, focusing on safety and security as it explores new agentic capabilities, and says these considerations are integral to its approach. Google noted that the release of Gemini 2.0 Flash and several research prototypes mark a new chapter in the Gemini era. The advancements signify an exciting milestone in AI development, as Google continues to build toward Artificial General Intelligence (AGI) while prioritizing safety. Speaking about Gemini 2.0, Google and Alphabet CEO Sundar Pichai said,
[20]
Google Unveils Gemini 2.0: A New AI Model for the Agentic Era
Deep Research provides advanced capabilities for exploring and summarizing complex topics. Google has announced the launch of Gemini 2.0, the latest iteration of its artificial intelligence (AI) model. Designed for what Google calls the "agentic era," Gemini 2.0 introduces advanced multimodal capabilities, enabling it to interact, reason, and take proactive actions across a range of tasks. Building on its predecessors, Gemini 1.0 (introduced last December) and Gemini 1.5, the new model further advances multimodality and long-context understanding to process information across text, video, images, audio, and code. "Information is at the core of human progress. It's why we've focused for more than 26 years on our mission to organise the world's information and make it accessible and useful. And it's why we continue to push the frontiers of AI to organise that information across every input and make it accessible via any output, so that it can be truly useful for you," said Sundar Pichai, CEO of Google and Alphabet. Gemini 2.0 Flash is now available as an experimental model to developers through the Gemini API in Google AI Studio and Vertex AI. Google aims to quickly integrate it into products like Gemini and Search. Starting December 11, Gemini 2.0 Flash will be accessible to all Gemini users. Google also unveiled Deep Research, a feature leveraging advanced reasoning and long-context capabilities to act as a research assistant. It explores complex topics and compiles reports on behalf of users. This feature is available within Gemini Advanced. AI overviews now reach over 1 billion users globally. Google plans to incorporate Gemini 2.0's advanced reasoning capabilities into these overviews to address complex topics, multi-step questions, advanced math equations, multimodal queries, and coding challenges. Testing has begun, with broader rollout expected early next year. By 2025, AI overviews will expand to more countries and languages. "2.0's advances are underpinned by decade-long investments in our differentiated full-stack approach to AI innovation. It's built on custom hardware like Trillium, our sixth-generation TPUs. TPUs powered 100 percent of Gemini 2.0 training and inference," Pichai noted. "If Gemini 1.0 was about organising and understanding information, Gemini 2.0 is about making it much more useful." "We are releasing the first model in the Gemini 2.0 family of models: an experimental version of Gemini 2.0 Flash. It's our workhorse model with low latency and enhanced performance at the cutting edge of our technology, at scale," said Demis Hassabis, CEO of Google DeepMind and Koray Kavukcuoglu, CTO of Google DeepMind on behalf of the Gemini team. The first model in the Gemini 2.0 family, Gemini 2.0 Flash, is optimised for low latency and enhanced performance at scale. According to Google, it outperforms Gemini 1.5 Pro on key benchmarks while operating at twice the speed. Notably, it supports multimodal outputs such as natively generated images combined with text and steerable multilingual text-to-speech (TTS) audio. Google is also releasing a new Multimodal Live API, enabling real-time audio and video streaming inputs with the use of multiple combined tools. 
According to Google, Gemini 2.0 Flash's native user interface action-capabilities, along with other improvements like multimodal reasoning, long context understanding, complex instruction following and planning, compositional function-calling, native tool use and improved latency, all work in concert to enable a new class of agentic experiences. Google said it is exploring prototypes built on Gemini 2.0, including: Project Astra: A personal AI assistant with enhanced memory, multilingual dialogue, and integration with Google tools like Search, Lens, and Maps. Astra now retains up to 10 minutes of in-session memory and can recall past conversations. Google plans to extend these capabilities to Gemini and AR glasses. Project Mariner: A browser-based agent capable of completing tasks by interpreting web elements and user interactions. "Project Mariner is an early research prototype built with Gemini 2.0 that explores the future of human-agent interaction, starting with your browser. As a research prototype, it's able to understand and reason across information in your browser screen, including pixels and web elements like text, code, images and forms, and then uses that information via an experimental Chrome extension to complete tasks for you," Google explained. Jules: An AI-powered coding assistant integrated with GitHub workflows to support software development. This effort is part of Google's long-term goal of building AI agents that are useful across all domains, including coding. At this juncture, Google has also highlighted the launch of its large-scale foundation world model, Genie 2, which the company unveiled on December 4. Genie 2 is a foundation world model capable of generating an endless variety of action-controllable, playable 3D environments for training and evaluating embodied agents. Building on this advancement, Google said it has built agents using Gemini 2.0 that can help users navigate the virtual world of video games. Beyond virtual applications, Google is experimenting with agents that apply Gemini 2.0's spatial reasoning capabilities to robotics, enabling new possibilities in the physical world. "Today's releases mark a new chapter for our Gemini model. With the release of Gemini 2.0 Flash, and the series of research prototypes exploring agentic possibilities, we have reached an exciting milestone in the Gemini era," Google said on December 11. The company plans to integrate Gemini 2.0 across its suite of products, starting with Search and the Gemini app, while continuing to explore its capabilities in collaboration with developers, trusted testers, and experts.
[21]
Google unleashes Gemini 2.0 with new image and audio powers for the AI agent era
This article covers a developing story. Continue to check back with us as we will be adding more information as it becomes available. Generative AI was every tech company's focus in 2023 thanks to its undeniable potential -- but as the calendar turned over to 2024, stagnation set in and things became more iterative. When Google released Gemini 1.0 almost exactly one year ago, multimodal AI was its primary focus, allowing input and output through various forms of media. Now, as the potential for AI agents injects new life into the scene, Gemini 2.0 has arrived to connect some dots between that multimodal past and the agentic future. Google announced Gemini 2.0 on its blog today, with a foreword from CEO Sundar Pichai (via The Verge). The headlining changes are native image and audio abilities and built-in access to external tools like Google Search, but there's a lot more going on under the surface. Native images and audio bring new multimodal powers With the native image and audio handling, Gemini 2.0 can understand photos and sounds just as easily as it does text. This is a major step forward for Google's multimodal approach, as the AI now works with images and audio directly rather than converting them to text first. The other side of this coin is that Gemini can now generate images and audio seamlessly -- no need to call on tools like Imagen 3.
[22]
Google Announces Gemini 2.0, New AI Agents
Google announced Gemini 2.0, a brand new Artificial Intelligence (AI) model, alongside a number of AI agents built on the system, on December 11. The model is capable of processing multimodal input and output natively and seamlessly within conversations and can even use tools like Google Lens or Search. Multimodal models are AI models which are capable of working in different formats like audio, video, images and text. An experimental version of the model, called Gemini 2.0 Flash, is available for all Gemini users, but with text-to-speech and native image generation restricted to early-access partners. The model will be available to the general public in January next year. Google plans on bringing Gemini 2.0 to AI Overviews by early 2025, allowing it to tackle more complicated requests including advanced maths equations and multimodal queries, which include both audio and image inputs. The tech giant has also released a new feature called Deep Research, for Gemini Advanced, which compiles in-depth reports on a user's behalf in response to prompts. Once a user enters a query, Deep Research will create a research plan for the user to approve, reject or modify. Gemini will then begin collecting relevant information from the internet and create a report of the key findings, alongside links to the sources. Project Astra, Google's experimental AI Assistant which debuted last year, returns with improvements. The latest version is built on Gemini 2.0 and includes new features like: Google also announced Project Mariner, a browser-based AI agent prototype built with Gemini 2.0 that can navigate web interfaces autonomously. It demonstrated an 83.5% success rate on the WebVoyager benchmark for completing real-world web tasks. The prototype can interpret browser content including text, images, and web elements, and interact with web pages through a Chrome extension. It also requires user confirmation for sensitive actions and limits interactions to the active browser tab. Jules is an experimental code agent built on Gemini 2.0 aimed at developers. Users can integrate Jules directly into a GitHub workflow and outsource bug fixes and other time-consuming tasks to the agent. Jules creates multi-step plans to address issues and is capable of modifying multiple files and even preparing pull requests to land fixes directly back into GitHub. Developers can review the plans along the way, and modify them as they see fit. They can also review, change and integrate the code Jules writes. The agent is currently undergoing beta testing and will be available to interested developers next year.
[23]
What is the new Gemini 2.0 all about? Here's a breakdown of Google's latest AI model
Google is on the verge of rolling out its latest AI model, Gemini 2.0, and its features are already amazing technology geeks across the world. Here are some of the most important features of Gemini 2.0, and the major differences from its predecessor. Google has finally announced the unveiling of its latest AI model, Gemini 2.0, which will have some new and exciting features that were not present with its predecessor, Gemini 1.0. This announcement by Google is now placing it ahead in the AI race, and it will now be taking on its competitors like OpenAI and Meta Platforms in a major way. Moreover, this decision will hopefully bring in a major movement in Alphabet Inc's stocks as well, as per latest reports. According to Google's CEO Sundar Pichai, Gemini 2.0 is mostly about making information much more useful and this latest launch is projected to bring in 'a new era' for the world of artificial intelligence. Moreover, Gemini 2.0 will be equipped in a manner to understand context much better than its previous version, and will even have the capability of taking supervised actions on behalf of users. Gemini 2.0 works in a major way like every other AI assistant, but its features are far more powerful since it has the ability to think a few steps in advance about the user's needs and choices. Moreover, it will also be able to retrace the prompts of the user and think accordingly about what to do next, even before the user requests it, as per reports. Google's shares have suddenly shot through the roof after the latest announcement of Gemini 2.0 by the company's top leadership. They rose by a whopping 4 per cent on Wall Street recently, according to an AFP report. Meanwhile, Google is promising that this will easily be the next stage of an AI revolution that was initially sparked by the 2022 launch of ChatGPT by OpenAI, which took the world by storm. Is Google launching Gemini 2.0? Google has already announced the launch of Gemini 2.0, and their CEO Sundar Pichai has said that this could be the next wave of AI revolution. What will be Gemini 2.0's best feature? Gemini 2.0 is set to work like one of the most advanced AI assistants in the world, and will have the ability to take supervised actions on behalf of its users.
[24]
Google announces 'agentic' Gemini 2.0 with image and audio support
Not to be outdone by OpenAI's Sora drop, Google just released its new AI model Gemini 2.0. On Wednesday, Google introduced Gemini 2.0 Flash, the first family member of the next generation of AI models. Gemini 2.0 Flash is described in the announcement as a "workhorse model" for developers, capable of powerful performance at scale. Flash supports image and audio generation, has native integration with Google Search, can write code, and works with third-party apps. Alongside the Gemini 2.0 Flash announcement, Google also introduced Deep Research, a Gemini feature that browses the web and compiles research reports based on the initial prompt. Gemini 2.0 Flash is a step up from Gemini 1.0 in that it has improved reasoning, longer context windows, the ability to understand complex instructions, and native tool use -- all of which have been designed to make the model more agentic, in other words capable of executing multi-step tasks on the user's behalf. As part of this, Google said Gemini 2.0 would be available for Project Astra, a research prototype for testing a universal AI assistant. Google also shared other research prototypes: Project Mariner, which is specifically designed to explore "human-agent interaction," and Project Jules for developers. Gemini 2.0 Flash is available as an "experimental model" via the Gemini API which can be accessed in Google AI Studio and Vertex AI. But casual users can also try out its improved chat capabilities in the Gemini desktop app, with mobile app support coming soon.
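The native Google Search integration mentioned here is exposed to developers as a built-in tool. What follows is a hedged sketch, again assuming the google-genai Python SDK and the launch-era experimental model ID rather than anything stated in the article.

from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # hypothetical placeholder key

# Ask the model to ground its answer in Google Search results.
response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="Summarize what Google announced alongside Gemini 2.0 Flash.",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
print(response.text)

Responses produced this way should also carry grounding metadata (the search queries and source links the model relied on) alongside the text, which an application can surface as citations.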
[25]
Google goes "agentic" with Gemini 2.0's ambitious AI agent features
On Wednesday, Google unveiled Gemini 2.0, the next generation of its AI-model family, starting with an experimental release called Gemini 2.0 Flash. The model family can generate text, images, and speech while processing multiple types of input including text, images, audio, and video. It's similar to multimodal AI models like GPT-4o, which powers OpenAI's ChatGPT. "Gemini 2.0 Flash builds on the success of 1.5 Flash, our most popular model yet for developers, with enhanced performance at similarly fast response times," said Google in a statement. "Notably, 2.0 Flash even outperforms 1.5 Pro on key benchmarks, at twice the speed." Gemini 2.0 Flash -- which is the smallest model of the 2.0 family in terms of parameter count -- launches today through Google's developer platforms like Gemini API, AI Studio, and Vertex AI. However, its image generation and text-to-speech features remain limited to early access partners until January 2025. Google plans to integrate the tech into products like Android Studio, Chrome DevTools, and Firebase. The company addressed potential misuse of generated content by implementing SynthID watermarking technology on all audio and images created by Gemini 2.0 Flash. This watermark appears in supported Google products to identify AI-generated content. Google's newest announcements lean heavily into the concept of agentic AI systems that can take action for you. "Over the last year, we have been investing in developing more agentic models, meaning they can understand more about the world around you, think multiple steps ahead, and take action on your behalf, with your supervision," said Google CEO Sundar Pichai in a statement. "Today we're excited to launch our next era of models built for this new agentic era."
[26]
Gemini 2.0: Our latest, most capable AI model yet
Today, we're announcing Gemini 2.0 -- our most capable AI model yet, designed for the agentic era. Gemini 2.0 has new capabilities, like multimodal output with native image generation and audio output, and native use of tools including Google Search and Maps. We're releasing an experimental version of Gemini 2.0 Flash, our workhorse model with low latency and enhanced performance. Developers can start building with this model in the Gemini API via Google AI Studio and Vertex AI. And Gemini and Gemini Advanced users globally can try out a chat optimized version of Gemini 2.0 by selecting it in the model dropdown on desktop. We're also using Gemini 2.0 in new research prototypes, including Project Astra, which explores the future capabilities of a universal AI assistant; Project Mariner, an early prototype capable of taking actions in Chrome as an experimental extension; and Jules, an experimental AI-powered code agent. We continue to prioritize safety and responsibility with these projects, which is why we're taking an exploratory and gradual approach to development, including working with trusted testers. This week, we started testing Gemini 2.0 in AI Overviews in Search, and early next year, we'll expand Gemini 2.0 to more Google products.
[27]
Google's "Agentic" AI Models are a Highlight of Gemini 2.0 - Phandroid
As it further commits to its AI efforts, Google recently announced the arrival of Gemini 2.0, the latest version of its Gemini AI model. The new version comes with support for new features including image and audio output, as well as "agentic era" capabilities, which essentially allows AI models to perform tasks on their own. These Agentic AI models are able to work in different situations and contexts, such as creating to-do lists, ordering items online, scheduling appointments, all through user prompts, and in an autonomous manner. These Agentic AI models include Project Astra which works with Android phones and can integrate with certain apps including Google Search, Lens and Maps. There's also Project Mariner which can find its way within web browsers, with minimal direction from users. Also announced was Gemini 2.0 Flash, an experimental AI version that packs improvements in tasks such as coding and mathematics, and the ability to generate images via Google DeepMind's Imagen 3 model. According to Google, Gemini 2.0 Flash is available now for developers via the Gemini API in Google AI Studio and Vertex AI, with multimodal input and text output available for all developers; text-to-speech and native image generation are also available for early-access partners. We can expect more Gemini 2.0 models next year, as early as January.
[28]
Gemini 2.0: Google's Next-Gen AI Model
Google has unveiled its newest AI model, Gemini 2.0. This Google AI model promises faster performance and more capabilities, like generating images and audio across multiple languages. The company revealed that Gemini 2.0 would assist users in a variety of tasks, from Google searches to coding projects. Gemini 2.0 is twice as fast as its predecessor. New AI agents under Gemini 2.0 will be able to think, remember, plan, and even act efficiently on behalf of users. Google's director of Product Management, Tulsee Doshi, pointed out that these upgrades would help create incredibly interactive and responsive AI-powered assistants. This step is part of Google's broader effort to maintain its lead in the search and ad revenue sectors. Heated competition from companies like OpenAI has prompted Google to embark on this innovation.
[29]
Google launches Gemini 2.0 AI models, and showcases their powers in new agents
Google on Wednesday gave the public and developers a taste of the second generation of its Gemini frontier models, and a preview of some of the agents it will power. The new Gemini 2.0 family of models is designed to power new AI agents that understand more than just text, and reason and complete tasks with more autonomy. Google described how the new models will improve an experimental agent called Project Astra, which lets AI process information seen through a camera. It previewed another experimental agent, now called Project Mariner, that's designed to perform web tasks on behalf of the user. "[O]ur next era of models [is] built for this new agentic era," said Google CEO Sundar Pichai in a blog post Wednesday. "With new advances in multimodality -- like native image and audio output -- and native tool use, it will enable us to build new AI agents that bring us closer to our vision of a universal assistant." The term "universal assistant" implies an AI agent with artificial general intelligence (AGI), or the ability to do most tasks as well or better than humans. Experts say the industry is anywhere from two to 10 years away from realising that aspiration. Google isn't yet unveiling the largest and most capable of its Gemini 2.0 models. That may come in another announcement in January. For now it's releasing to developers an experimental version of a smaller and faster variant called Gemini 2.0 Flash. "It's our workhorse model with low latency and enhanced performance at the cutting edge of our technology, at scale," Google DeepMind CEO Demis Hassabis says in a blog post.
[30]
Google announces Gemini 2.0 with agentic focus, coming to Gemini app
Just over a year after version 1.0, Google today announced Gemini 2.0 as its "new AI model for the agentic era." CEO Sundar Pichai summarizes it as such: "If Gemini 1.0 was about organizing and understanding information, Gemini 2.0 is about making it much more useful." For Google, agents are systems that get something done on your behalf by being able to reason, plan, and have memory. The first model available is Gemini 2.0 Flash, which notably "outperforms 1.5 Pro on key benchmarks" -- across code, factuality, math, reasoning, and more -- at twice the speed. It supports multimodal output like "natively generated images mixed with text" for "conversational, multi-turn editing," and multilingual audio that developers can customize (voices, languages, and accents). Finally, it can natively call tools like Google Search (for more factual answers) and code execution. An experimental version of Gemini 2.0 Flash is available today in AI Studio and Vertex AI for developers. It will enter GA (general availability) in January, with more model sizes coming. Google also has a new Multimodal Live API for "real-time audio, video-streaming input" from cameras or screens. For end users in the Gemini app, Google says the new model results in an "even more helpful Gemini assistant." Both Gemini and Gemini Advanced users will be able to use a chat-optimized version of 2.0 Flash experimental in gemini.google.com this week. Go to the model dropdown menu in the top-left corner. Access is "soon" coming to the mobile app. Google is also beginning to test Gemini 2.0 in Search's AI Overviews. This will allow the generated response to answer "more complex topics and multi-step questions, including advanced math equations, multimodal queries and coding." It will be "more broadly" available early next year. Finally, Gemini 2.0 will be coming to more Google products early next year.
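The Multimodal Live API mentioned above is a bidirectional streaming interface rather than a one-shot call. The sketch below shows a text-only session using the async live module of the google-genai Python SDK roughly as it looked around launch; the method names, the config shape, and the model ID are assumptions that may have shifted in later SDK releases, so treat this as illustrative only.

import asyncio
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # hypothetical placeholder key

async def main():
    # Text responses keep the example simple; streamed audio and video are the headline use case.
    config = {"response_modalities": ["TEXT"]}
    async with client.aio.live.connect(model="gemini-2.0-flash-exp", config=config) as session:
        await session.send(input="Give me a one-line status update.", end_of_turn=True)
        async for message in session.receive():
            if message.text:
                print(message.text, end="")

asyncio.run(main())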
[31]
Google Gemini 2.0: Could this be the beginning of truly autonomous AI?
Google unveiled Gemini 2.0 today, marking an ambitious leap toward AI systems that can independently complete complex tasks while introducing native image generation and multilingual audio capabilities -- features that position the tech giant for direct competition with OpenAI and Anthropic in an increasingly heated race for AI dominance. The release arrives almost exactly one year after Google's initial Gemini launch, emerging during a pivotal moment in artificial intelligence development. Rather than simply responding to queries, these new "agentic" AI systems can understand nuanced context, plan multiple steps ahead, and take supervised actions on behalf of users. How Google's new AI assistant could reshape daily digital life During a recent press conference, Tulsee Doshi, Director of Product Management for Gemini, outlined the system's enhanced capabilities while demonstrating real-time image generation and multilingual conversations. "Gemini 2.0 brings enhanced performance and new capabilities like native image and multilingual audio generation," Doshi explained. "It also has native intelligent tool use, which means that it can directly access Google products like search or even execute code." The initial release centers on Gemini 2.0 Flash, an experimental version that Google claims operates at twice the speed of its predecessor while surpassing the capabilities of more powerful models. This represents a significant technical achievement, as previous speed improvements typically came at the cost of reduced functionality. Inside the new generation of AI agents that promise to transform how we work Perhaps most significantly, Google introduced three prototype AI agents built on Gemini 2.0's architecture that demonstrate the company's vision for AI's future. Project Astra, an updated universal AI assistant, showcased its ability to maintain complex conversations across multiple languages while accessing Google tools and maintaining contextual memory of previous interactions. "Project Astra now has up to 10 minutes of in-session memory, and can remember conversations you've had with it in the past, so you can have a more helpful, personalized experience," explained Bibo Xu, Group Product Manager at Google DeepMind, during a live demonstration. The system smoothly transitioned between languages and accessed real-time information through Google Search and Maps, suggesting a level of integration previously unseen in consumer AI products. The battle for enterprise AI heats up as Google targets developer community For developers and enterprise customers, Google introduced Project Mariner and Jules, two specialized AI agents designed to automate complex technical tasks. Project Mariner, demonstrated as a Chrome extension, achieved an impressive 83.5% success rate on the WebVoyager benchmark for real-world web tasks -- a significant improvement over previous attempts at autonomous web navigation. "Project Mariner is an early research prototype that explores agent capabilities for browsing the web and taking action," said Jaclyn Konzelmann, Director of Product Management at Google Labs. "When evaluated against the WebVoyager benchmark, which tests agent performance on end-to-end, real-world web tasks, Project Mariner achieved the impressive results of 83.5%." 
Custom silicon and massive scale: The infrastructure behind Google's AI ambitions Supporting these advances is Trillium, Google's sixth-generation Tensor Processing Unit (TPU), which becomes generally available to cloud customers today. The custom AI accelerator represents a massive investment in computational infrastructure, with Google deploying over 100,000 Trillium chips in a single network fabric. Logan Kilpatrick, a product manager on the AI studio and Gemini API team, highlighted the practical impact of this infrastructure investment during the press conference. "The growth of flash usage has been more than 900%, which has been incredible to see," Kilpatrick said. "You know, we've had like six experimental model launches in the last few months, there's now millions of developers who are using Gemini." The road ahead: Safety concerns and competition in the age of autonomous AI Google's shift toward autonomous agents represents perhaps the most significant strategic pivot in artificial intelligence since OpenAI's release of ChatGPT. While competitors have focused on enhancing the capabilities of large language models, Google is betting that the future belongs to AI systems that can actively navigate digital environments and complete complex tasks with minimal human intervention. This vision of AI agents that can think, plan, and act marks a departure from the current paradigm of reactive AI assistants. It's a risky bet -- autonomous systems bring inherently greater safety concerns and technical challenges -- but one that could reshape the competitive landscape if successful. The company's massive investment in custom silicon and infrastructure suggests it's prepared to compete aggressively in this new direction. However, the transition to more autonomous AI systems raises new safety and ethical concerns. Google has emphasized its commitment to responsible development, including extensive testing with trusted users and built-in safety measures. The company's approach to rolling out these features gradually, starting with developer access and trusted testers, suggests an awareness of the potential risks involved in deploying autonomous AI systems. The release comes at a crucial moment for Google, as it faces increasing pressure from competitors and heightened scrutiny over AI safety. Microsoft and OpenAI have made significant strides in AI development this year, while other companies like Anthropic have gained traction with enterprise customers. "We firmly believe that the only way to build AI is to be responsible from the start," emphasized Shrestha Basu Mallick, Group Product Manager for the Gemini API, during the press conference. "We'll continue to prioritize making safety and responsibility a key element of our model development process as we advance our models and agents." As these systems become more capable of taking action in the real world, they could fundamentally reshape how people interact with technology. The success of Gemini 2.0 could determine not only Google's position in the AI market but also the broader trajectory of AI development as the industry moves toward more autonomous systems. One year ago, when Google launched the first version of Gemini, the AI landscape was dominated by chatbots that could engage in clever conversation but struggled with real-world tasks. Now, as AI agents begin to take their first tentative steps toward autonomy, the industry stands at another inflection point. 
The question is no longer whether AI can understand us, but whether we're ready to let AI act on our behalf. Google is betting we are -- and it's betting big.
[32]
Google unveils latest AI model, Gemini 2.0
Google unveiled Gemini 2.0, its most advanced AI model, marking a "new agentic era." This model enhances context understanding, multi-step thinking, and supervised actions for users. Initially released to developers, Gemini 2.0 will integrate into Google products, including Search, powered by Google's Trillium processors. Broader Search integration is slated for early 2025. Google on Wednesday announced the launch of Gemini 2.0, its most advanced artificial intelligence model to date, as the world's tech giants race to take the lead in the fast-developing technology. CEO Sundar Pichai said the new model would be marking what the company calls "a new agentic era" in AI development, with AI models designed to understand and make decisions about the world around you. "Gemini 2.0 is about making information much more useful," Pichai said in the announcement, emphasizing the model's enhanced ability to understand context, think multiple steps ahead, and take supervised actions on behalf of users. Google, ChatGPT maker OpenAI, Meta and Amazon are furiously taking steps to release more powerful AI models despite their immense cost and some questions about their immediate usefulness to the broader economy. An AI agent, the latest Silicon Valley trend, is a digital helper that is supposed to sense surroundings, make decisions, and take actions to achieve specific goals. The tech giants promise that agents will be the next stage of an AI revolution that was sparked by the 2022 launch of ChatGPT, which took the world by storm. Gemini 2.0 is initially being rolled out to developers and trusted testers, with plans for broader integration across Google's products, particularly in Search and the Gemini platform. The technology is powered by Google's sixth-generation TPU (Tensor Processing Unit) hardware, dubbed Trillium, which the company has now made generally available to customers. Google emphasized that Trillium processors were used exclusively for both training and running Gemini 2.0. Most AI training has been monopolized by chip juggernaut Nvidia, which has been catapulted by the AI explosion to become one of the world's most valuable companies. Google said that millions of developers are already building applications with Gemini technology, which has been integrated into seven Google products, each serving more than two billion users. The broader rollout of Gemini 2.0's enhanced Search capabilities is scheduled for early 2025, with plans to expand AI Overviews to additional countries and languages throughout the year. The first release from the 2.0 family of models will be Gemini 2.0 Flash, offering faster performance while handling multiple types of input (text, images, video, audio) and output (including generated images and text-to-speech). The Gemini app is getting 2.0 Flash integration globally, with plans to expand to more Google products in early 2025.
[33]
Gemini 2.0 Flash AI Model Now Available on Web and Mobile Apps
Google said Gemini 2.0 Flash outperforms 1.5 Pro in several benchmarks. Google introduced the successor of the Gemini 1.5 family of AI models, dubbed Gemini 2.0, on Wednesday. The new AI models come with improved capabilities including native support for image generation and audio generation, the company highlighted. Currently, the Gemini 2.0 model is available in beta to select developers and testers, whereas the Gemini 2.0 Flash AI model has been added to the web and mobile apps of the chatbot for all users. Google said the larger model will also be pushed to its products soon. Nine months after the release of the Gemini 1.5 series of AI models, Google has now introduced the upgraded version of the large language model (LLM). In a blog post, the company announced that it was releasing the first model in the Gemini 2.0 family -- an experimental version of Gemini 2.0 Flash. The Flash model generally contains fewer parameters and is less suited to highly complex tasks. However, it compensates with low latency and higher efficiency than larger models. The Mountain View-based tech giant highlighted that the Gemini 2.0 Flash now supports multimodal output such as image generation with text and steerable text-to-speech (TTS) multilingual audio. Additionally, the AI model is also equipped with agentic functions. 2.0 Flash natively calls tools like Google Search, code execution-related tools, as well as third-party functions once a user defines them via the API. Coming to performance, Google shared Gemini 2.0 Flash's benchmark scores based on internal testing. On the Massive Multitask Language Understanding (MMLU), Natural2Code, MATH, and Graduate-Level Google-Proof Q&A (GPQA) benchmarks, it outperforms even the Gemini 1.5 Pro model. Gemini users can select the experimental model from the model selector option located at the top left of the web interface and at the top of the mobile app interface. Apart from that, the AI model is also available via the Gemini application programming interface (API) in Google AI Studio and Vertex AI. The model will be available to developers with multimodal input and text output. Image generation and text-to-speech capabilities are currently only available to Google's early-access partners.
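The third-party function calling described above amounts to declaring your own function as a tool and letting the model decide when to invoke it. Here is a minimal sketch assuming the google-genai Python SDK with automatic function calling; the get_shipping_estimate helper and the model ID are illustrative assumptions, not details from the article.

from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # hypothetical placeholder key

def get_shipping_estimate(origin: str, destination: str) -> dict:
    """Hypothetical third-party function the model may call with arguments it infers."""
    return {"origin": origin, "destination": destination, "days": 4}

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="How long would shipping from Berlin to Lisbon take?",
    config=types.GenerateContentConfig(tools=[get_shipping_estimate]),
)
print(response.text)  # the model's answer, informed by the function's return value

In the Python SDK, passing a plain callable like this should enable automatic function calling, so the call-and-respond loop is handled for you; declaring an explicit function schema instead gives finer control over when and how the tool is invoked.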
[34]
Google Launches Gemini 2.0 Making the Age of AI Agents a Reality
'2025 will be the year of AI agents and Gemini 2.0 will be the generation of models that underpin our agent-based work.' As predicted by AIM, Google has finally launched Gemini 2.0, its next-generation AI model, built to redefine multimodal capabilities and introduce agentic functionalities. "Today we're excited to launch our next era of models built for this new agentic era: introducing Gemini 2.0, our most capable model yet. With new advances in multimodality -- like native image and audio output -- and native tool use, it will enable us to build new AI agents that bring us closer to our vision of a universal assistant," Google said in the blog post. "This is really just the beginning. 2025 will be the year of AI agents and Gemini 2.0 will be the generation of models that underpin our agent-based work," said Google DeepMind chief Demis Hassabis. Gemini 2.0 Flash supports multimodal inputs, including images, video, and audio, as well as multimodal outputs such as natively generated images combined with text and steerable text-to-speech (TTS) multilingual audio. It can also natively call tools like Google Search, execute code, and integrate third-party user-defined functions. The Gemini 2.0 Flash model offers faster response times and outperforms its predecessors on major benchmarks. Developers can access Gemini 2.0 Flash through Google AI Studio and Vertex AI, with general availability expected by January 2025. Google has also launched the Multimodal Live API, bringing real-time audio and video input capabilities that allow developers to create dynamic, interactive applications. Introduced at Google I/O 2024, Google's Project Astra, a universal AI assistant, has received several updates. It now supports multilingual and mixed-language conversations, with an improved understanding of accents and uncommon words. Powered by Gemini 2.0, Project Astra can also utilise Google Search, Lens, and Maps, making it a more practical assistant for daily tasks. Its memory has been enhanced, allowing up to 10 minutes of in-session recall and better personalisation through past interactions. Additionally, improved streaming and native audio processing reduce latency, enabling near-human conversational speeds. Google has also announced an early-stage research prototype, Project Mariner, which will understand and reason based on information that can be accessed while a user navigates on a web browser. Google says that the agent uses information it sees on the screen through a Google Chrome extension to complete related tasks. The agent will be able to read information, like text, code, images, forms and even voice-based instructions. "Book a flight from SF to Berlin, departing on March 5 and returning on the 12," is the kind of request the agent is designed to handle. "The era of being able to give a computer a fairly complex high-level task and have it go off and do a lot of the work for you is becoming a reality," said Jeff Dean, chief scientist at Google DeepMind. Google has also introduced Jules, a developer-focused agent that integrates with GitHub workflows to assist with coding tasks under supervision. Google DeepMind is working on AI agents that improve video games and navigate 3D worlds. It has partnered with game developers like Supercell to explore the future of AI-powered gaming companions. Gemini 2.0's spatial reasoning is also being tested in robotics for practical real-world use. Notably, it recently launched Genie 2, a large-scale foundation world model capable of generating a wide variety of playable 3D environments.
[35]
Google releases the first of its Gemini 2.0 AI models
Google released the first version of its Gemini 2.0 family of artificial intelligence models on Wednesday. Gemini 2.0 Flash, as the model is called, is available in a chat version for users globally while an experimental multimodal version of the model, with text-to-speech and image generation features, is available to developers. "If Gemini 1.0 was about organizing and understanding information, Gemini 2.0 is about making it much more useful," Google CEO Sundar Pichai said in a statement. Google's latest large language model outperforms its predecessors in the majority of user request areas, such as code generation and the ability to provide factually correct responses to user requests. One area where it is inferior to Gemini 1.5 Pro is when it comes to evaluating longer contexts. To access the chat-optimized version of the experimental 2.0 Flash, Gemini users can select it in the model drop-down menu on desktop and mobile web. It will be available on the Gemini mobile app soon, the company said. The multimodal version of Gemini 2.0 Flash will be available via Google's AI Studio and Vertex AI developer platforms. General availability of Gemini 2.0 Flash's multimodal version will come in January, along with more Gemini 2.0 model sizes, Google said Wednesday. The company said it also plans to expand Gemini 2.0 to more Google products in early 2025. Gemini 2.0 represents Google's latest efforts in the tech industry's increasingly competitive AI race. Google is competing against rivals like tech giants Microsoft and Meta and startups like OpenAI, the maker of ChatGPT, Perplexity and Anthropic, which makes Claude. Along with the release of the new Flash model are other research prototypes aimed at developing more "agentic" AI models and experiences. Agentic models, according to the company, "can understand more about the world around you, think multiple steps ahead, and take action on your behalf, with your supervision." Last week, in a conversation with Andrew Ross Sorkin at The New York Times' DealBook Summit, Pichai challenged Microsoft's AI advancement, saying he'd "love to do a side-by-side comparison" of the two companies' models "any day, any time."
[36]
Google unveils Gemini 2.0 AI model for agentic era
Gemini 2.0 Flash touts advances in multimodality and is expected to enable the development of AI agents as universal assistants. Emphasizing a new AI model for the agentic era, Google has introduced Gemini 2.0, which the company calls its most capable model yet. Announced December 11, the Gemini 2.0 Flash experimental model will be available to all Gemini users. Gemini 2.0 is billed as having advances in multimodality, such as native image and audio output, and native tool use. Google anticipates Gemini 2.0 enabling the development of new AI agents closer to the vision of a universal assistant. Agentic models can understand more, think multiple steps ahead, and take action on a user's behalf, with supervision, Google CEO Sundar Pichai said. Gemini 2.0's advances are underpinned by decade-long investments in a differentiated full-stack approach to AI innovation, Pichai said. The technology was built on custom hardware such as Trillium, Google's sixth-generation TPU (tensor processing unit), which powered Gemini 2.0 training and inference. Trillium is also generally available to customers who want to build with it.
[37]
Google's next generation of Gemini is here to get some work done
Agentic AI systems built with Gemini 2.0 promise complex problem-solving skills. A little over a year ago, we entered the era of Gemini. Google's ever-evolving AI aspirations picked up their latest branding as the company announced its set of Gemini 1.0 models. It only took another month to start moving into Gemini 1.5 territory, and recently we've been hearing rumors about when Google might cross the big 2.0 threshold. It looks like those predictions of a December launch were right on the money, as Google introduces Gemini 2.0, designed for the age of agentic AI.
[38]
Google's new Gemini model will power AI agents
The tech giant introduced Gemini 2.0, which "will enable us to build new AI agents that bring us closer to our vision of a universal assistant," Google chief executive Sundar Pichai said Wednesday, adding that the company is working to roll out the new model across its products. AI agents are software that can complete complex tasks autonomously for a user. Gemini 2.0 has new multimodal capabilities, including native image and audio output, Pichai said. Google launched Gemini 1.0 last December, which the company said was the first "natively multimodal" model, meaning it could process and respond to text, video, image, audio, and code inquiries. Developers and testers will be the first to get 2.0, while all Gemini users will have access to the Gemini 2.0 Flash experimental model. The Flash model builds off of Gemini 1.5 Flash, which Google launched in July as its fastest, most cost-efficient model. Google will add Gemini 2.0's reasoning capabilities to its AI Overviews feature, which Pichai said now reaches one billion people and is "quickly becoming one of our most popular Search features ever." With Gemini 2.0, AI Overviews will be able to solve advanced multi-step queries, such as mathematical equations and multimodal questions. Limited testing for Gemini 2.0 in AI Overviews started this week, Pichai said, but the reasoning feature will roll out to more users early next year. The reasoning model runs on Google's custom 6th-generation AI chip, Trillium, which became available to Google Cloud customers on Wednesday. The new chip delivers 4x better performance and is 67% more energy efficient than its predecessor, according to the company. Pichai also announced a new Gemini feature called Deep Research, which can "act as a research assistant" by using advanced reasoning and long context capabilities. Deep Research, which is available in Gemini Advanced, can compile research reports on behalf of a user.
[39]
Gemini 2.0, Google's newest flagship AI, can generate text, images, and speech | TechCrunch
Google's next major AI model has arrived to combat a slew of new offerings from OpenAI. On Wednesday, Google announced Gemini 2.0 Flash, which the company says can natively generate images and audio in addition to text. 2.0 Flash can also use third-party apps and services, allowing it to tap into Google Search, execute code, and more. An experimental release of 2.0 Flash will be available through the Gemini API and Google's AI developer platforms, AI Studio and Vertex AI, starting today. However, the audio and image generation capabilities are launching only for "early access partners" ahead of a wide rollout in January. In the coming months, Google says that it'll bring 2.0 Flash in a range of flavors to products like Android Studio, Chrome DevTools, Firebase, Gemini Code Assist, and others. The first-gen Flash, 1.5 Flash, could generate only text, and wasn't designed for especially demanding workloads. This new model is more versatile, Google says, in part because it can call tools like Search and interact with external APIs. "We know Flash is extremely popular with developers for its ... balance of speed and performance," Tulsee Doshi, head of product for Gemini model at Google, said during a briefing Tuesday. "And with 2.0 Flash, it's just as fast as ever, but now it's even more powerful." Google claims that 2.0 Flash, which is twice as fast as the company's Gemini 1.5 Pro model on certain benchmarks, per Google's own testing, is "significantly" improved in areas like coding and image analysis. In fact, the company says, 2.0 Flash displaces 1.5 Pro as the flagship Gemini model, thanks to its superior math skills and "factuality." As alluded to earlier, 2.0 Flash can generate -- and modify -- images alongside text. The model can also ingest photos and videos, as well as audio recordings, to answer questions about them (e.g. "What did he say?"). Audio generation is 2.0 Flash's other key feature, and Doshi described it as "steerable" and "customizable." For example, the model can narrate text using one of eight voices "optimized" for different accents and languages. "You can ask it to talk slower, you can ask it to talk faster, or you can even ask it to say something like a pirate," she added. Now, I'm duty-bound as a journalist to note that Google didn't provide images or audio samples from 2.0 Flash. We have no way of knowing how the quality compares to outputs from other models, at least as of the time of writing. Google says it's using its SynthID technology to watermark all audio and images generated by 2.0 Flash. On software and platforms that support SynthID -- that is, select Google products -- the model's outputs will be flagged as synthetic. That's to allay fears of abuse. Indeed, deepfakes are a growing threat. According to ID verification service Sumsub, there was a 4x increase in deepfakes detected worldwide from 2023 to 2024. The production version of 2.0 Flash will land in January. But in the meantime, Google is releasing an API, the Multimodal Live API, to help developers build apps with real-time audio and video streaming functionality. Using the Multimodal Live API, Google says, developers can create real-time, multimodal apps with audio and video inputs from cameras or screens. The API supports the integration of tools to accomplish tasks, and it can handle "natural conversation patterns" such as interruptions -- along the lines of OpenAI's Realtime API. The Multimodal Live API is generally available as of this morning.
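For developers wondering what the basic Gemini API access mentioned above looks like in practice, here is a minimal sketch using the google-generativeai Python package with a multimodal prompt, an image plus a question. The model id gemini-2.0-flash-exp is assumed from the experimental release described in the article, and the file name is a placeholder; consult Google AI Studio's documentation for current names:

    # Minimal sketch: multimodal input (image + text) with the experimental
    # Gemini 2.0 Flash model via the google-generativeai package.
    import google.generativeai as genai
    import PIL.Image

    genai.configure(api_key="YOUR_API_KEY")                 # placeholder key
    model = genai.GenerativeModel("gemini-2.0-flash-exp")   # assumed model id

    photo = PIL.Image.open("photo.jpg")                     # placeholder image file
    response = model.generate_content(
        [photo, "Describe what is happening in this photo in two sentences."]
    )
    print(response.text)

Audio and image generation, by contrast, are gated to early-access partners at launch, so the text output path above is the part most developers can exercise today.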
[40]
Introducing Gemini 2.0: our new AI model for the agentic era
A note from Google and Alphabet CEO Sundar Pichai: Information is at the core of human progress. It's why we've focused for more than 26 years on our mission to organize the world's information and make it accessible and useful. And it's why we continue to push the frontiers of AI to organize that information across every input and make it accessible via any output, so that it can be truly useful for you. That was our vision when we introduced Gemini 1.0 last December. The first model built to be natively multimodal, Gemini 1.0 and 1.5 drove big advances with multimodality and long context to understand information across text, video, images, audio and code, and process a lot more of it. Now millions of developers are building with Gemini. And it's helping us reimagine all of our products -- including all 7 of them with 2 billion users -- and to create new ones. NotebookLM is a great example of what multimodality and long context can enable for people, and why it's loved by so many. Over the last year, we have been investing in developing more agentic models, meaning they can understand more about the world around you, think multiple steps ahead, and take action on your behalf, with your supervision. Today we're excited to launch our next era of models built for this new agentic era: introducing Gemini 2.0, our most capable model yet. With new advances in multimodality -- like native image and audio output -- and native tool use, it will enable us to build new AI agents that bring us closer to our vision of a universal assistant. We're getting 2.0 into the hands of developers and trusted testers today. And we're working quickly to get it into our products, leading with Gemini and Search. Starting today our Gemini 2.0 Flash experimental model will be available to all Gemini users. We're also launching a new feature called Deep Research, which uses advanced reasoning and long context capabilities to act as a research assistant, exploring complex topics and compiling reports on your behalf. It's available in Gemini Advanced today. No product has been transformed more by AI than Search. Our AI Overviews now reach 1 billion people, enabling them to ask entirely new types of questions -- quickly becoming one of our most popular Search features ever. As a next step, we're bringing the advanced reasoning capabilities of Gemini 2.0 to AI Overviews to tackle more complex topics and multi-step questions, including advanced math equations, multimodal queries and coding. We started limited testing this week and will be rolling it out more broadly early next year. And we'll continue to bring AI Overviews to more countries and languages over the next year. 2.0's advances are underpinned by decade-long investments in our differentiated full-stack approach to AI innovation. It's built on custom hardware like Trillium, our sixth-generation TPUs. TPUs powered 100% of Gemini 2.0 training and inference, and today Trillium is generally available to customers so they can build with it too. If Gemini 1.0 was about organizing and understanding information, Gemini 2.0 is about making it much more useful. I can't wait to see what this next era brings. -Sundar
[41]
Google's Gemini 2.0 AI promises to be faster and smarter via agentic advances
Google has long been obsessed with speed. Whether it's the time it takes to return a search result or the time it takes to bring a product to market, Google has always been in a rush. This approach has largely benefited the company. Faster, more comprehensive search results pushed Google to the top of the market. But fast product releases have resulted in a long history of public betas and failed or discontinued products. There's even a website called Killed by Google that catalogs all of Google's failures. While that list is shockingly extensive, the company has also launched winners like Gmail and AdSense. These products helped skyrocket the company way beyond search. So, you can imagine how frustrated Google's management has been over the last year or so when the AI revolution seemed to leave the company in the dust. While Google has invested in AI technologies for years, ChatGPT just blasted through and achieved chatbot domination in a very short time. Google responded, of course. Its Gemini generative AI tool, introduced at the end of 2023, has been embedded at the top of the Google SERP (search engine results page). In a blog post today, Google and Alphabet CEO Sundar Pichai reports, "Our AI Overviews now reach 1 billion people, enabling them to ask entirely new types of questions -- quickly becoming one of our most popular Search features ever." But, as I reported based on my own testing, Google's AI failed pretty hard, both at coding and even at awareness of its own capabilities. Yet Pichai, in that same blog post, contends that "Since last December when we launched Gemini 1.0, millions of developers have used Google AI Studio and Vertex AI to build with Gemini." I'm sure that's true, and it probably means that Google's AI is suitable for certain development tasks -- and not others. Because Google is so Python-centric, I'd bet that most of those developers were focusing on Python-related projects. In other words, there's been room for improvement. It's quite possible that improvement has just happened. Google today is announcing Gemini 2.0, along with a raft of developer-related improvements. The Gemini 2.0 announcement comes to us through a blog post by Demis Hassabis and Koray Kavukcuoglu, CEO and CTO of Google DeepMind, respectively. The top-level headline says that Gemini 2.0 is "our new AI model for the agentic era." We'll come back to the agentic bit in a minute because first we need to discuss the Gemini 2.0 model. Technically, Gemini 2.0 is a family of models, and what's being announced today is an experimental version of Gemini 2.0 Flash. Google describes it as "our workhorse model with low latency and enhanced performance at the cutting edge of our technology, at scale." That's going to take some unpacking. The Gemini Flash models are not chatbots. They power chatbots and many other applications. Essentially, the Flash designation means that the model is intended for developer use. A key component of the announcement goes back to our speed theme. Gemini 2.0 Flash outperforms Gemini 1.5 Flash by two to one, according to Hassabis and Kavukcuoglu. Earlier versions of Gemini Flash supported multimodal inputs like images, video, and audio. 
Gemini 2.0 Flash supports multimodal output, such as "natively generated images mixed with text and steerable text-to-speech (TTS) multilingual audio. It can also natively call tools like Google search, code execution, as well as third-party user-defined functions." Steerable text-to-speech, by the way, is the idea that you can specify things like voice customizations (male or female, for example), the style of speech (formal, friendly, and so on), speech speed and cadence, and possibly language. Developers can use Gemini 2.0 Flash starting now. It comes in the form of an experimental model that can be accessed using the Google API in Google AI Studio and Vertex AI. Multimodal input and text output is available to all developers, but text-to-speech and image generation features are only available to Google's early-access partners. Non-developers can also play with Gemini 2.0 via the Gemini AI assistant, both in desktop and mobile versions. This "chat optimized" version of 2.0 Flash can be chosen in the model drop-down menu, where "users can experience an even more helpful Gemini assistant." So, now let's get back to the whole agentic thing. Google describes agentic as providing a user interface with "action-capabilities." Pichai, in his blog post, says agentic AI "can understand more about the world around you, think multiple steps ahead, and take action on your behalf, with your supervision." I'm glad he added "with your supervision" because the idea of an AI that understands the world around you and can think multiple steps ahead is the plot behind so many science fiction stories I've read over the years -- and they never ended well for the human protagonists. Gemini 2.0 has a laundry list of improvements that, taken together, help set it up for agentic activities. Google's Project Astra illustrates just how all of these capabilities come together. Project Astra is a prototype AI assistant that integrates real-world information into its responses and results. Think of it as a virtual assistant, where both the location and the assistant are virtual. Tasks Astra might be asked to perform include recommending a restaurant or developing an itinerary. But unlike a chatbot AI, the assistant is expected to combine multiple tools, like Google Maps and Search, make decisions based on the user's existing knowledge, and even take the initiative if, say, there's road construction en route to a possible destination. In that case, the AI might recommend a different route or, if time is constrained, perhaps even a different destination. Project Mariner is another ambitious Google research project, although I find it a bit scarier as well. Mariner works with what's on your browser screen, essentially reading what you're reading, and then responding or taking action based on some criteria. Mariner is expected to interpret pixel content as well as text, code, images, and forms, and -- with some serious guard rails, one would hope -- take on real-world tasks. Right now, Google admits that Mariner is doing fairly well, but isn't always accurate and can sometimes be somewhat slow. Jules is an experimental agent for developers. This one also seems scary to me, so it may well be that I'm just not quite ready to let AIs run loose on their own. 
Jules is an agent that integrates into GitHub workflows and is expected to manage and debug code. According to today's blog post by Shrestha Basu Mallick, Group Product Manager of the Gemini API, and Kathy Korevec, Director of Product at Google Labs, "You can offload Python and Javascript coding tasks to Jules." They go on to say, "Working asynchronously and integrated with your GitHub workflow, Jules handles bug fixes and other time-consuming tasks while you focus on what you actually want to build. Jules creates comprehensive, multi-step plans to address issues, efficiently modifies multiple files, and even prepares pull requests to land fixes directly back into GitHub." I can definitely see how Jules could foster an increase in productivity, but it also makes me uncomfortable. I've occasionally delegated my code to human coders and gotten back stuff that could only be described as, "Holy crap, what were you thinking?" I'm concerned about getting back similarly problematic work from artificial coders. Giving an AI the ability to go in and change my code seems risky. If something goes wrong, finding what was changed and reverting it, even with tools like Git and other version control tools, seems like a big step. I've had to undo work from underperforming human coders. It was not fun. I understand the benefits of automated coding. I certainly don't love debugging and fixing my own code, but giving up that level of control is daunting, at least to me. That said, if Google is willing to trust its own code base to Gemini 2.0 and Jules, who am I to judge? The company is certainly eating its own dog food, and that counts for a lot. Google seems to firmly believe that AI can help make its products more helpful in a wide range of applications. But the company also seems to get the obvious concerns, stating, "We recognize the responsibility it entails, and the many questions AI agents open up for safety and security." Hassabis and Kavukcuoglu say that they're "taking an exploratory and gradual approach to development, conducting research on multiple prototypes, iteratively implementing safety training, working with trusted testers and external experts and performing extensive risk assessments and safety and assurance evaluations." They give a number of examples of the risk management steps they're taking. Google states, "We firmly believe that the only way to build AI is to be responsible from the start and we'll continue to prioritize making safety and responsibility a key element of our model development process as we advance our models and agents." This is good. AI has enormous potential to be a boon to productivity but is also incredibly risky. While there's no guarantee Big Tech won't accidentally create our own Forbin Project Colossus, or a cranky HAL 9000, at least Google is aware of the risks and is paying attention. So, what do you think about all of these Google announcements? Are you excited for Gemini 2.0? Do you think you might use a public version of Project Astra or Mariner? Are you currently using Gemini as your AI chatbot, or do you prefer another one? Let us know in the comments below.
[42]
Google puts AI agents at the center of Gemini update
SAN FRANCISCO - Alphabet's Google on Wednesday released the second generation of its artificial intelligence model Gemini and teased a slate of new ways to use AI beyond chatbots, including through a pair of eyeglasses. CEO Sundar Pichai in a blog post dubbed the moment as the start of a "new agentic era," referring to virtual assistants that can perform tasks with greater autonomy. "They can understand more about the world around you, think multiple steps ahead, and take action on your behalf, with your supervision." The releases underscore the methods by which Google is aiming to reclaim the lead in the race to dominate the emerging technology. Microsoft-backed OpenAI captured global attention when it released chatbot ChatGPT in November 2022. Google unveiled Gemini in December 2023 and now offers four versions. On Wednesday, it released an update to Flash, its second cheapest model, with improved performance and added features to process images and audio. Other models will come next year. OpenAI has in recent days announced a flurry of new offerings to diversify its prospects including a $200-a-month ChatGPT subscription for advanced research use and the availability of its text-to-video model Sora. Google's play involves injecting its AI advances into applications that already enjoy widespread adoption. Search, Android and YouTube are among seven products that the company says are used by more than 2 billion people monthly. That user base is a significant advantage over challenger startups such as search startup Perplexity, which is seeking a $9 billion valuation, and newer research labs like OpenAI, Anthropic or Elon Musk's xAI. The Gemini 2.0 Flash model will power applications including AI Overviews in its search engine. Alphabet's biggest bet is AI for search, Ruth Porat, the president and chief investment officer, said at the Reuters NEXT conference in New York on Tuesday. Google also showed reporters new capabilities for Project Astra, a prototype universal agent which can talk to users about anything captured on their smartphone camera in real time. The tool can now hold a conversation spoken in a mix of languages, as well as process information from Maps and image recognition tool Lens, DeepMind group product manager Bibo Xu told reporters. And Astra will also be tested on prototype eyeglasses, the company's first return to the product area since the failure of Google Glasses. Others have since entered the market including Meta which in September unveiled an AR glasses prototype. Google also showed reporters Project Mariner, a Chrome web browser extension which can automate keystrokes and mouse clicks in the vein of rival lab Anthropic's "computer use" feature, a feature to improve software coding called Jules, and a tool to assist consumers in making decisions like what to do or which items to buy in video games. (Reporting by Kenrick Cai in San Francisco; Editing by Peter Henderson and Kim Coghill)
[43]
Google unveils Gemini 2.0 Flash as the foundation for AI agent experiences - SiliconANGLE
Google LLC today released the first model in the Gemini 2.0 artificial intelligence family, an experimental version of Gemini 2.0 Flash designed to become the foundation for generative AI agents and assistants. Gemini 2.0 Flash builds on the company's 1.5 Flash, a lightweight workhorse large language model optimized for speed and efficiency. The company noted that 2.0 Flash outperforms Gemini 1.5 Pro, the company's largest and most complex AI model, on some key benchmarks, while performing at twice the speed. The model supports inputs such as images, video and audio, and has been updated to also support multimodal outputs such as natively generated images mixed with text and text-to-speech audio. To become a superior model for assistants, Google also enabled it to use external tools such as Google Search, code execution and third-party functions. Just as Gemini 1.5 Flash was popular with developers, 2.0 Flash is now available as an experimental model via the Gemini application programming interface to early access partners through Google AI Studio and Vertex AI, a Google Cloud platform that allows users to train and deploy models. General availability is planned for January. Starting today, 2.0 Flash experimental is available via a dropdown menu on desktop and mobile web in the Gemini chat assistant for users to test. It will be available in the Gemini mobile app soon. The company said it will be coming to more Google products soon. Putting Gemini 2.0 Flash to work, Google's team said it has been exploring several new products that build on the model as a foundation, focused on generative AI agent and assistant capabilities. AI agents are pieces of intelligent software that can work proactively on behalf of human users to gather information and use tools to achieve goals. For example, unlike current assistants, which only converse, answer questions and summarize information, an AI agent would be able to go out and complete tasks such as shopping or purchasing tickets. "Gemini 2.0 Flash's native user interface action-capabilities, along with other improvements like multimodal reasoning, long context understanding, complex instruction following and planning, compositional function-calling, native tool use and improved latency, all work in concert to enable a new class of agentic experiences," Google said about the update. Google introduced Project Astra as an initiative to develop a universal AI assistant at Google I/O 2024 in May. Astra is capable of natural-sounding speech conversations with users and answering questions about the world. With the addition of Gemini 2.0, Astra can interact with Google Search to retrieve information, Lens to identify objects and Maps to understand local areas. The team also improved its ability to remember things, allowing it to recall details from conversations such as reminders, where a user wants to go, phone numbers and lock codes. This also enables users to personalize the assistant. Also thanks to Gemini 2.0, Astra can switch between multiple languages mid-conversation. The same capability also makes it better at understanding accents and uncommon words, which can cause trouble even for many speech-recognition AI models today. Google said it is working on bringing these AI assistant capabilities to testers on more devices, such as hands-free glasses. The company is also expanding the number of trusted testers who have access to Astra. 
Another AI agent prototype that Google is building with Gemini 2.0 Flash is Project Mariner, which will allow the model to surf the web for users. It's able to take control of the browser and understand information on the screen, including elements such as links, text, code, buttons and forms, to navigate web pages. Currently in testing, it works as a Chrome extension that can complete some tasks for users while keeping the human in the loop. In a demonstration, Google had Mariner go through a Google Sheet of company names and the names of people and prompted the AI model to find their contact emails. The model then took over the browser to go to the websites, find email addresses and finally display the information it found. Each step of the way, the model displayed its reasoning and the user could watch it in action - even interrupt it if necessary. Since users could prompt the model to go grocery shopping on e-commerce websites or purchase tickets, Google researchers said that it would not finalize purchases without direct human interaction, but it could be tasked with going through the motions of finding items and loading up carts. Jules is an experimental AI-powered coding agent that uses Gemini 2.0 and can work on its own to complete tedious work through direct integration with a GitHub codebase based on prompts from a developer. "It's very good at bug fixes, small features, things like that, you can almost think of it like a junior engineer and you're there directing it," Kathy Korevec, director of product management at Google Labs, told SiliconANGLE in an interview. Jules exists as a standalone application that takes a GitHub repository and creates its own copy to work on. Once it's given a "task," which is what Google calls the prompt from the developer, it generates a plan to produce the bug fixes or code changes and then shows the user what it intends to do. From there, it begins a multi-step process of fixing and coding to make the appropriate changes. At any time during the process, the developer can interrupt it or change its plan to redirect it, and Jules may revise its own plan if it runs into issues. It may also update code dependencies or modify entire files as it goes. When it's complete, it will wait for the developer to approve the code changes and then prepare a pull request so the fixes can land directly back in GitHub. "I didn't become a software engineer because I dream every day about fixing bugs, that wasn't my ambition," said Korevec. "I want to build really cool, creative apps. What's nice about Jules is that I can say 'Hey, go fix these bugs for me.'" Certainly, Korevec added, some engineers love fixing bugs, but they don't want to handle migrations from one version to another or other similarly tedious tasks. The impetus behind building Jules came from letting developers get to the work they want to do while unleashing Jules on the busywork they don't.
[44]
Gemini 2.0 Flash ushers in a new era of real-time multimodal AI
Google's release of Gemini 2.0 Flash this week, offering users a way to interact live with video of their surroundings, has set the stage for what could be a pivotal shift in how enterprises and consumers engage with technology. This release -- alongside announcements from OpenAI, Microsoft, and others -- is part of a transformative leap forward happening in the technology area called "multimodal AI." The technology allows you to take video -- or audio or images -- that comes into your computer or phone, and ask questions about it. It also signals an intensification of the competitive race among Google and its chief rivals -- OpenAI and Microsoft -- for dominance in AI capabilities. But more importantly, it feels like it is defining the next era of interactive, agentic computing. This moment in AI feels to me like an "iPhone moment," and by that I'm referring to 2007-2008 when Apple released an iPhone that, via a connection with the internet and slick user interface, transformed daily lives by giving people a powerful computer in their pocket. While OpenAI's ChatGPT may have kicked off this latest AI moment with its powerful human-like chatbot in Nov 2022, Google's release here at the end of 2024 feels like a major continuation of that moment -- at a time when a lot of observers had been worried about a possible slowdown in the improvements of AI technology.
Gemini 2.0 Flash: the catalyst of AI's multimodal revolution
Google's Gemini 2.0 Flash offers groundbreaking functionality, allowing real-time interaction with video captured via a smartphone. Unlike prior staged demonstrations (e.g., Google's Project Astra in May), this technology is now available to everyday users through Google's AI Studio. I encourage you to try it yourself. I used it to view and interact with my surroundings -- which for me this morning was my kitchen and dining room. You can see instantly how this offers breakthroughs for education and other use cases. You can see why content creator Jerrod Lew reacted on X yesterday with astonishment when he used Gemini 2.0 Realtime to edit a video in Adobe Premiere Pro. "This is absolutely insane," he said, after Google guided him within seconds on how to add a basic blur effect even though he was a novice user. Sam Witteveen, a prominent AI developer and co-founder of Red Dragon AI, was given early access to test Gemini 2.0 Flash, and he highlighted that Gemini Flash's speed -- it is twice as fast as Google's flagship until now, Gemini 1.5 Pro -- and "insanely cheap" pricing make it not just a showcase for developers to test new products with, but a practical tool for enterprises managing AI budgets. (To be clear, Google hasn't actually announced pricing for Gemini 2.0 Flash yet. It is a free preview. But Witteveen is basing his assumptions on the precedent set by Google's Gemini 1.5 series.) For developers, the Live API behind these multimodal live features offers significant potential, because it enables seamless integration into applications. That API is also available to use, and a demo app is available. Programmer Simon Willison called the streaming API next level: "This stuff is straight out of science fiction: being able to have an audio conversation with a capable LLM about things that it can 'see' through your camera is one of those 'we live in the future' moments." 
He noted how you can ask the API to enable a code execution mode, which lets the models write Python code, run it and consider the result as part of their response - all part of an agentic future. The technology is clearly a harbinger of new application ecosystems and user expectations. Imagine being able to analyze live video during a presentation, suggest edits, or troubleshoot in real time. Yes, the technology is cool for consumers, but it's important for enterprise users and leaders to grasp as well. The new features are the foundation of an entirely new way of working and interacting with technology -- suggesting coming productivity gains and creative workflows.
The competitive landscape: A race to define the future
Google's Gemini 2.0 Flash on Wednesday comes amid a flurry of releases by both Google and its major competitors, which are rushing to ship their latest technologies by the end of the year. They all promise to deliver consumer-ready multimodal capabilities -- live video interaction, image generation, and voice synthesis, but some of them aren't fully baked or even fully available. One reason for the rush is that some of these companies pay employees bonuses for delivering key products before the end of the year. Another is bragging rights when they get new features out first. They can get major user traction by being first, which OpenAI showed in 2022, when its ChatGPT became the fastest-growing consumer product in history. Even though Google had similar technology, it was not prepared for a public release and was left flat-footed. Observers have sharply criticized Google ever since for being too slow. The other companies have all announced their own offerings in the past few days, helping introduce this new era of multimodal AI.
Navigating challenges and embracing opportunities
While these technologies are revolutionary, challenges remain. However, all of these hurdles are outweighed by the technology's potential benefits, and there's no doubt that developers and enterprise companies will be rushing to embrace them over the next year.
Conclusion: A new dawn, led for now by Google
As developer Sam Witteveen and I discuss in our podcast taped Wednesday night after Google's release, Gemini 2.0 Flash is truly an impressive release - the moment when multimodal AI has become real. Google's advancements have set a new benchmark, although it's true that this edge could be extremely fleeting. OpenAI and Microsoft are hot on its tail. We're still very early in this revolution, just like in 2008 when, despite the iPhone's release, it wasn't clear how Google, Nokia, and RIM would respond. History showed Nokia and RIM didn't, and they died. Google responded really well, and has given the iPhone a run. Likewise, it's clear that Microsoft and OpenAI are very much in this race with Google. Apple, meanwhile, has decided to partner on the technology and this week announced a further integration with ChatGPT - but it's certainly not trying to win outright in this new era of multimodal offerings. In our podcast, Sam and I also cover Google's special strategic advantage around the area of the browser. For example, its Project Mariner release, a Chrome extension, allows you to do real-world web browsing tasks with even more functionality than competing technologies offered by Anthropic (called Computer Use) and Microsoft's OmniParser (still in research), although it's true that Anthropic's feature gives you more access to your computer's local resources. 
All of this gives Google a head start in the race to push forward agentic AI technologies in 2025 as well, even if Microsoft appears to be ahead on the actual execution side of delivering agentic solutions to enterprises. AI agents do complex tasks autonomously, with minimal human intervention - for example, they'll soon do advanced research tasks and database checks before performing ecommerce, stock trading or even real estate buying. Google's focus on making these Gemini 2.0 capabilities accessible to both developers and consumers is smart, because it ensures it is addressing the industry with a comprehensive plan. Until now, Google has suffered from a reputation of not being as aggressively focused on developers as Microsoft. The question for decision-makers is not whether to adopt these tools, but how quickly they can be integrated into workflows. It is going to be fascinating to see where the next year takes us.
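The code execution mode that Willison mentions above can, as I understand it, be switched on as a tool in an ordinary generation request. Here is a rough sketch assuming the google-genai Python SDK; the tool class names are assumptions based on that SDK's configuration types, so verify them against the current documentation before relying on them:

    # Rough sketch: asking Gemini 2.0 Flash to write and run Python code as part
    # of its answer, by enabling the code execution tool (names assumed).
    from google import genai
    from google.genai import types

    client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

    response = client.models.generate_content(
        model="gemini-2.0-flash-exp",  # assumed experimental model id
        contents="Write and run Python code to count how many primes are below 1,000.",
        config=types.GenerateContentConfig(
            tools=[types.Tool(code_execution=types.ToolCodeExecution())],
        ),
    )
    # The response can interleave text with the generated code and its execution
    # result; printing the text gives the model's final, result-aware answer.
    print(response.text)

The appeal, as the article notes, is that the model can incorporate the result of the code it just ran into its reply rather than simply guessing at the answer.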
[45]
Google puts AI agents at the center of Gemini update
SAN FRANCISCO, Dec 11 - Alphabet's Google on Wednesday released the second generation of its artificial intelligence model Gemini and teased a slate of new ways to use AI beyond chatbots, including through a pair of eyeglasses. CEO Sundar Pichai in a blog post dubbed the moment as the start of a "new agentic era," referring to virtual assistants that can perform tasks with greater autonomy. "They can understand more about the world around you, think multiple steps ahead, and take action on your behalf, with your supervision." The releases underscore the methods by which Google is aiming to reclaim the lead in the race to dominate the emerging technology. Microsoft-backed OpenAI captured global attention when it released chatbot ChatGPT in November 2022. Google unveiled Gemini in December 2023 and now offers four versions. On Wednesday, it released an update to Flash, its second cheapest model, with improved performance and added features to process images and audio. Other models will come next year. OpenAI has in recent days announced a flurry of new offerings to diversify its prospects including a $200-a-month ChatGPT subscription for advanced research use and the availability of its text-to-video model Sora. Google's play involves injecting its AI advances into applications that already enjoy widespread adoption. Search, Android and YouTube are among seven products that the company says are used by more than 2 billion people monthly. That user base is a significant advantage over challenger startups such as search startup Perplexity, which is seeking a $9 billion valuation, and newer research labs like OpenAI, Anthropic or Elon Musk's xAI. The Gemini 2.0 Flash model will power applications including AI Overviews in its search engine. Alphabet's biggest bet is AI for search, Ruth Porat, the president and chief investment officer, said at the Reuters NEXT conference in New York on Tuesday. Google also showed reporters new capabilities for Project Astra, a prototype universal agent which can talk to users about anything captured on their smartphone camera in real time. The tool can now hold a conversation spoken in a mix of languages, as well as process information from Maps and image recognition tool Lens, DeepMind group product manager Bibo Xu told reporters. And Astra will also be tested on prototype eyeglasses, the company's first return to the product area since the failure of Google Glass. Others have since entered the market including Meta which in September unveiled an AR glasses prototype. Google also showed reporters Project Mariner, a Chrome web browser extension which can automate keystrokes and mouse clicks in the vein of rival lab Anthropic's "computer use" feature, a feature to improve software coding called Jules, and a tool to assist consumers in making decisions like what to do or which items to buy in video games. (Reporting by Kenrick Cai in San Francisco; Editing by Peter Henderson and Kim Coghill)
[46]
Google launched Gemini 2.0, its new AI model for practically everything
Google's latest AI model has a lot of work to do. Like every other company in the AI race, Google is frantically building AI into practically every product it owns, trying to build products other developers want to use, and racing to set up all the infrastructure to make those things possible without being so expensive it runs the company out of business. Meanwhile, Amazon, Microsoft, Anthropic, and OpenAI are pouring their own billions into pretty much the exact same set of problems.
[47]
Google Gemini: Everything you need to know about the generative AI models
Google's trying to make waves with Gemini, its flagship suite of generative AI models, apps, and services. But what's Gemini? How can you use it? And how does it stack up to other generative AI tools such as OpenAI's ChatGPT, Meta's Llama, and Microsoft's Copilot? To make it easier to keep up with the latest Gemini developments, we've put together this handy guide, which we'll keep updated as new Gemini models, features, and news about Google's plans for Gemini are released.
What is Gemini?
Gemini is Google's long-promised, next-gen generative AI model family. Developed by Google's AI research labs DeepMind and Google Research, it comes in four flavors: Ultra, Pro, Flash, and Nano. All Gemini models were trained to be natively multimodal -- that is, able to work with and analyze more than just text. Google says they were pre-trained and fine-tuned on a variety of public, proprietary, and licensed audio, images, and videos; a set of codebases; and text in different languages. This sets Gemini apart from models such as Google's own LaMDA, which was trained exclusively on text data. LaMDA can't understand or generate anything beyond text (e.g., essays, emails, and so on), but that isn't necessarily the case with Gemini models. We'll note here that the ethics and legality of training models on public data, in some cases without the data owners' knowledge or consent, are murky. Google has an AI indemnification policy to shield certain Google Cloud customers from lawsuits should they face them, but this policy contains carve-outs. Proceed with caution -- particularly if you're intending to use Gemini commercially.
What's the difference between the Gemini apps and Gemini models?
Gemini is separate and distinct from the Gemini apps on the web and mobile (formerly Bard). The Gemini apps are clients that connect to various Gemini models and layer a chatbot-like interface on top. Think of them as front ends for Google's generative AI, analogous to ChatGPT and Anthropic's Claude family of apps. Gemini on the web lives at gemini.google.com. On Android, the Gemini app replaces the existing Google Assistant app. And on iOS, the Google and Google Search apps serve as that platform's Gemini clients. On Android, it also recently became possible to bring up the Gemini overlay on top of any app to ask questions about what's on the screen (e.g., a YouTube video). Just press and hold a supported smartphone's power button or say, "Hey Google"; you'll see the overlay pop up. Gemini apps can accept images as well as voice commands and text -- including files like PDFs and soon videos, either uploaded or imported from Google Drive -- and generate images. As you'd expect, conversations with Gemini apps on mobile carry over to Gemini on the web and vice versa if you're signed in to the same Google Account in both places.
Gemini Advanced
The Gemini apps aren't the only means of recruiting Gemini models' assistance with tasks. Slowly but surely, Gemini-imbued features are making their way into staple Google apps and services like Gmail and Google Docs. To take advantage of most of these, you'll need the Google One AI Premium Plan. Technically a part of Google One, the AI Premium Plan costs $20 per month and provides access to Gemini in Google Workspace apps like Docs, Maps, Slides, Sheets, Drive, and Meet. It also enables what Google calls Gemini Advanced, which brings the company's more sophisticated Gemini models to the Gemini apps. 
Gemini Advanced users get extras here and there, too, like priority access to new features, the ability to run and edit Python code directly in Gemini, and a larger "context window." Gemini Advanced can remember the content of -- and reason across -- roughly 750,000 words in a conversation (or 1,500 pages of documents). That's compared to the 24,000 words (or 48 pages) the vanilla Gemini app can handle. Gemini Advanced also gives users access to Google's new Deep Research feature, which uses "advanced reasoning" and "long context capabilities" to generate research briefs. After you prompt the chatbot, it creates a multi-step research plan, asks you to approve it, and then Gemini takes a few minutes to search the web and generate an extensive report based on your query. It's meant to answer more complex questions such as, "Can you help me redesign my kitchen?" Google also offers Gemini Advanced users a memory feature, that allows the chatbot to use your old conversations with Gemini as context for your current conversation. Another Gemini Advanced exclusive is trip planning in Google Search, which creates custom travel itineraries from prompts. Taking into account things like flight times (from emails in a user's Gmail inbox), meal preferences, and information about local attractions (from Google Search and Maps data), as well as the distances between those attractions, Gemini will generate an itinerary that updates automatically to reflect any changes. Gemini across Google services is also available to corporate customers through two plans, Gemini Business (an add-on for Google Workspace) and Gemini Enterprise. Gemini Business costs as low as $6 per user per month, while Gemini Enterprise -- which adds meeting note-taking and translated captions as well as document classification and labeling -- is generally more expensive, but is priced based on a business's needs. (Both plans require an annual commitment.) Gemini in Gmail, Docs, Chrome, dev tools, and more In Gmail, Gemini lives in a side panel that can write emails and summarize message threads. You'll find the same panel in Docs, where it helps you write and refine your content and brainstorm new ideas. Gemini in Slides generates slides and custom images. And Gemini in Google Sheets tracks and organizes data, creating tables and formulas. Google's AI chatbot recently came to Maps, where Gemini can summarize reviews about coffee shops or offer recommendations about how to spend a day visiting a foreign city. Gemini's reach extends to Drive as well, where it can summarize files and folders, and give quick facts about a project. In Meet, meanwhile, Gemini translates captions into additional languages. Gemini recently came to Google's Chrome browser in the form of an AI writing tool. You can use it to write something completely new or rewrite existing text; Google says it'll consider the web page you're on to make recommendations. Elsewhere, you'll find hints of Gemini in Google's database products, cloud security tools, and app development platforms (including Firebase and Project IDX), as well as in apps like Google Photos (where Gemini handles natural language search queries), YouTube (where it helps brainstorm video ideas), and the NotebookLM note-taking assistant. Code Assist (formerly Duet AI for Developers), Google's suite of AI-powered assistance tools for code completion and generation, is offloading heavy computational lifting to Gemini. 
So are Google's security products underpinned by Gemini, like Gemini in Threat Intelligence, which can analyze large portions of potentially malicious code and let users perform natural language searches for ongoing threats or indicators of compromise. Gemini extensions and Gems Announced at Google I/O 2024, Gemini Advanced users can create Gems, custom chatbots powered by Gemini models. Gems can be generated from natural language descriptions -- for example, "You're my running coach. Give me a daily running plan" -- and shared with others or kept private. Gems are available on desktop and mobile in 150 countries and most languages. Eventually, they'll be able to tap an expanded set of integrations with Google services, including Google Calendar, Tasks, Keep, and YouTube Music, to complete custom tasks. Speaking of integrations, the Gemini apps on the web and mobile can tap into Google services via what Google calls "Gemini extensions." Gemini today integrates with Google Drive, Gmail, and YouTube to respond to queries such as "Could you summarize my last three emails?" Later this year, Gemini will be able to take additional actions with Google Calendar, Keep, Tasks, YouTube Music and Utilities, the Android-exclusive apps that control on-device features like timers and alarms, media controls, the flashlight, volume, Wi-Fi, Bluetooth, and so on. Gemini Live in-depth voice chats An experience called Gemini Live allows users to have "in-depth" voice chats with Gemini. It's available in the Gemini apps on mobile and the Pixel Buds Pro 2, where it can be accessed even when your phone's locked. With Gemini Live enabled, you can interrupt Gemini while the chatbot's speaking (in one of several new voices) to ask a clarifying question, and it'll adapt to your speech patterns in real time. At some point, Gemini is supposed to gain visual understanding, allowing it see and respond to your surroundings, either via photos or video captured by your smartphones' cameras. Live is also designed to serve as a virtual coach of sorts, helping you rehearse for events, brainstorm ideas, and so on. For instance, Live can suggest which skills to highlight in an upcoming job or internship interview, and it can give public speaking advice. You can read our review of Gemini Live here. Spoiler alert: We think the feature has a ways to go before it's super useful -- but it's early days, admittedly. Google says that Imagen 3 can more accurately understand the text prompts that it translates into images versus its predecessor, Imagen 2, and is more "creative and detailed" in its generations. In addition, the model produces fewer artifacts and visual errors (at least according to Google), and is the best Imagen model yet for rendering text. Back in February, Google was forced to pause Gemini's ability to generate images of people after users complained of historical inaccuracies. But in August, the company reintroduced people generation for certain users, specifically English-language users signed up for one of Google's paid Gemini plans (e.g., Gemini Advanced) as part of a pilot program. Gemini for teens In June, Google introduced a teen-focused Gemini experience, allowing students to sign up via their Google Workspace for Education school accounts. The teen-focused Gemini has "additional policies and safeguards," including a tailored onboarding process and an "AI literacy guide" to (as Google phrases it) "help teens use AI responsibly." 
Otherwise, it's nearly identical to the standard Gemini experience, down to the "double check" feature that looks across the web to see if Gemini's responses are accurate. On the Google TV Streamer, Gemini uses your preferences to curate content suggestions across your subscriptions and summarize reviews and even whole seasons of TV. On the latest Nest thermostat (as well as Nest speakers, cameras, and smart displays), Gemini will soon bolster Google Assistant's conversational and analytic capabilities. Subscribers to Google's Nest Aware plan later this year will get a preview of new Gemini-powered experiences like AI descriptions for Nest camera footage, natural language video search and recommended automations. Nest cameras will understand what's happening in real-time video feeds (e.g., when a dog's digging in the garden), while the companion Google Home app will surface videos and create device automations given a description (e.g., "Did the kids leave their bikes in the driveway?," "Have my Nest thermostat turn on the heating when I get home from work every Tuesday"). Also later this year, Google Assistant will get a few upgrades on Nest-branded and other smart home devices to make conversations feel more natural. Improved voices are on the way, in addition to the ability to ask follow-up questions and "[more] easily go back and forth." What can the Gemini models do? Because Gemini models are multimodal, they can perform a range of multimodal tasks, from transcribing speech to captioning images and videos in real time. Many of these capabilities have reached the product stage (as alluded to in the previous section), and Google is promising much more in the not-too-distant future. Of course, it's a bit hard to take the company at its word. Google seriously underdelivered with the original Bard launch. More recently, it ruffled feathers with a video purporting to show Gemini's capabilities that was more or less aspirational -- not live. Also, Google offers no fix for some of the underlying problems with generative AI tech today, like its encoded biases and tendency to make things up (i.e., hallucinate). Neither do its rivals, but it's something to keep in mind when considering using or paying for Gemini. Assuming for the purposes of this article that Google is being truthful with its recent claims, here's what the different tiers of Gemini can do now and what they'll be able to do once they reach their full potential: What you can do with Gemini Ultra Google says that Gemini Ultra -- thanks to its multimodality -- can be used to help with things like physics homework, solving problems step-by-step on a worksheet, and pointing out possible mistakes in already filled-in answers. Ultra can also be applied to tasks such as identifying scientific papers relevant to a problem, Google says. The model can extract information from several papers, for instance, and update a chart from one by generating the formulas necessary to re-create the chart with more timely data. Gemini Ultra technically supports image generation. But that capability hasn't made its way into the productized version of the model yet -- perhaps because the mechanism is more complex than how apps such as ChatGPT generate images. Rather than feed prompts to an image generator (like DALL-E 3, in ChatGPT's case), Gemini outputs images "natively," without an intermediary step. 
Ultra is available as an API through Vertex AI, Google's fully managed AI dev platform, and AI Studio, Google's web-based tool for app and platform developers.

Gemini Pro's capabilities

Google says that Gemini Pro is an improvement over LaMDA in its reasoning, planning, and understanding capabilities. The latest version, Gemini 1.5 Pro -- which powers the Gemini apps for Gemini Advanced subscribers -- exceeds even Ultra's performance in some areas.

Gemini 1.5 Pro is improved in a number of areas compared with its predecessor, Gemini 1.0 Pro, perhaps most obviously in the amount of data that it can process. Gemini 1.5 Pro can take in up to 1.4 million words, two hours of video, or 22 hours of audio and can reason across or answer questions about that data (more or less).

Gemini 1.5 Pro became generally available on Vertex AI and AI Studio in June alongside a feature called code execution, which aims to reduce bugs in code that the model generates by iteratively refining that code over several steps. (Code execution also supports Gemini Flash.)

Within Vertex AI, developers can customize Gemini Pro to specific contexts and use cases via a fine-tuning or "grounding" process. For example, Pro (along with other Gemini models) can be instructed to use data from third-party providers like Moody's, Thomson Reuters, ZoomInfo, and MSCI, or source information from corporate datasets or Google Search instead of its wider knowledge bank. Gemini Pro can also be connected to external, third-party APIs to perform particular actions, like automating a back-office workflow.

AI Studio offers templates for creating structured chat prompts with Pro. Developers can control the model's creative range and provide examples to give tone and style instructions -- and also tune Pro's safety settings.

Vertex AI Agent Builder lets people build Gemini-powered "agents" within Vertex AI. For example, a company could create an agent that analyzes previous marketing campaigns to understand a brand style and then apply that knowledge to help generate new ideas consistent with the style.

Gemini Flash is lighter, but packs a punch

While the first version of Gemini Flash was made for less demanding workloads, the newest version, 2.0 Flash, is now Google's flagship AI model. Google calls Gemini 2.0 Flash its AI model for the agentic era. The model can natively generate images and audio, in addition to text, and can use tools like Google Search and interact with external APIs.

The 2.0 Flash model is faster than Gemini's previous generation of models, and even outperforms some of the larger Gemini 1.5 models on benchmarks measuring coding and image analysis. You can try an experimental version of 2.0 Flash in the web version of Gemini or through Google's AI developer platforms, and a production version of the model should land in January.

An offshoot of Gemini Pro that's small and efficient, built for narrow, high-frequency generative AI workloads, Flash is multimodal like Gemini Pro, meaning it can analyze audio, video, images, and text (but it can only generate text). Google says that Flash is particularly well-suited for tasks like summarization and chat apps, plus image and video captioning and data extraction from long documents and tables.

Devs using Flash and Pro can optionally leverage context caching, which lets them store large amounts of information (e.g., a knowledge base or database of research papers) in a cache that Gemini models can quickly and relatively cheaply access.
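Context caching is exposed through the Gemini API. As a rough illustration only, here is a minimal sketch assuming the google-generativeai Python SDK; the file name research_papers.txt, the API key placeholder, and the pinned model version are assumptions for the example, not details confirmed by this article:

```python
import datetime

import google.generativeai as genai
from google.generativeai import caching

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

# Upload a large reference document once (hypothetical file name).
papers = genai.upload_file(path="research_papers.txt")

# Store it in an explicit cache so later prompts can reuse it without
# resending the full document each time.
cache = caching.CachedContent.create(
    model="models/gemini-1.5-flash-001",  # caching requires a pinned model version
    display_name="paper-cache",
    system_instruction="Answer questions using only the cached papers.",
    contents=[papers],
    ttl=datetime.timedelta(hours=1),
)

# Build a model bound to the cache and query it as usual.
model = genai.GenerativeModel.from_cached_content(cached_content=cache)
response = model.generate_content("Summarize the main findings across these papers.")
print(response.text)
```

Storing and reusing those cached tokens is what incurs the add-on fee noted next.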
Context caching is an additional fee on top of other Gemini model usage fees, however.

Gemini Nano can run on your phone

Gemini Nano is a much smaller version of the Gemini Pro and Ultra models, and it's efficient enough to run directly on (some) devices instead of sending the task to a server somewhere. So far, Nano powers a couple of features on the Pixel 8 Pro, Pixel 8, Pixel 9 Pro, Pixel 9, and Samsung Galaxy S24, including Summarize in Recorder and Smart Reply in Gboard.

The Recorder app, which lets users push a button to record and transcribe audio, includes a Gemini-powered summary of recorded conversations, interviews, presentations, and other audio snippets. Users get summaries even if they don't have a signal or Wi-Fi connection -- and in a nod to privacy, no data leaves their phone in the process.

Nano is also in Gboard, Google's keyboard replacement. There, it powers a feature called Smart Reply, which helps to suggest the next thing you'll want to say when having a conversation in a messaging app such as WhatsApp. In the Google Messages app on supported devices, Nano drives Magic Compose, which can craft messages in styles like "excited," "formal," and "lyrical."

Google says that a future version of Android will tap Nano to alert users to potential scams during calls. The new weather app on Pixel phones uses Gemini Nano to generate tailored weather reports. And TalkBack, Google's accessibility service, employs Nano to create aural descriptions of objects for low-vision and blind users.

How much do the Gemini models cost?

Gemini 1.0 Pro (the first version of Gemini Pro), 1.5 Pro, and Flash are available through Google's Gemini API for building apps and services -- all with free options. But the free options impose usage limits and leave out certain features, like context caching and batching. Gemini models are otherwise pay-as-you-go. Here's the base pricing -- not including add-ons like context caching -- as of September 2024:

Tokens are subdivided bits of raw data, like the syllables "fan," "tas," and "tic" in the word "fantastic"; 1 million tokens is equivalent to about 700,000 words. Input refers to tokens fed into the model, while output refers to tokens that the model generates. Ultra and 2.0 Flash pricing has yet to be announced, and Nano is still in early access.

What's the latest on Project Astra?

Project Astra is Google DeepMind's effort to create AI-powered apps and "agents" for real-time, multimodal understanding. In demos, Google has shown how the AI model can simultaneously process live video and audio. Google released an app version of Project Astra to a small number of trusted testers in December, but has no plans for a broader release right now.

The company would like to put Project Astra in a pair of smart glasses. Google also gave a prototype of some glasses with Project Astra and augmented reality capabilities to a few trusted testers in December. However, there's not a clear product at this time, and it's unclear when Google would actually release something like this. Project Astra is still just that, a project, and not a product. However, the demos of Astra reveal what Google would like its AI products to do in the future.

Apple has said that it's in talks to put Gemini and other third-party models to use for a number of features in its Apple Intelligence suite. Following a keynote presentation at WWDC 2024, Apple SVP Craig Federighi confirmed plans to work with models, including Gemini, but he didn't divulge any additional details.
[48]
Google unveils latest AI model - Gemini 2.0
SAN FRANCISCO (AFP) - Google announced the launch of Gemini 2.0, its most advanced artificial intelligence (AI) model to date, as the world's tech giants race to take the lead in the fast-developing technology. Chief executive officer Sundar Pichai said the new model would mark what the company calls "a new agentic era" in AI development, with AI models designed to understand and make decisions about the world around you. "Gemini 2.0 is about making information much more useful," Pichai said in the announcement, emphasising the model's enhanced ability to understand context, think multiple steps ahead and take supervised actions on behalf of users. The developments "bring us closer to our vision of a universal assistant," he added. The release sent shares in Google soaring by more than four percent on Wall Street, a day after the stock had already gained 3.5 percent following the release of a breakthrough quantum chip. The tech giants are furiously taking steps to release more powerful AI models despite their immense cost and some questions about their immediate usefulness to the broader economy. An AI "agent", the latest Silicon Valley trend, is a digital helper that is supposed to sense surroundings, make decisions, and take actions to achieve specific goals. The tech giants promise that agents will be the next stage of an AI revolution that was sparked by the 2022 launch of ChatGPT, which took the world by storm.
[49]
Google unveils Gemini 2.0 and next-gen TPU Trillium, ushering in the era of AI agents
Google has officially launched its upgraded AI model, Gemini 2.0, delivering enhanced performance and multimodal capabilities while paving the way for the future era of AI agents. Additionally, Google's sixth-generation Tensor Processing Unit (TPU) Trillium, first introduced at the I/O Developer Conference in May, is now operational to support Gemini 2.0's training and inference workloads.

AI agents: Redefining productivity and automation

Google CEO Sundar Pichai highlighted the company's shift toward developing agentic AI models. These models are designed to understand the surrounding environment, plan tasks across multiple steps, and take action under user supervision. According to reports from TechCrunch and VentureBeat, Google has unveiled three AI agent prototypes built on the Gemini 2.0 architecture: Project Astra, Project Mariner, and Jules, each tailored for distinct applications ranging from daily tasks to complex programming and web navigation.

Project Astra: General-purpose AI for seamless conversations

Project Astra can support multilingual conversations, access and integrate data from Google apps and tools like Search and Maps, and retain conversation history for seamless continuity. For example, users can share a screenshot of a reading list, activate the camera, and let Astra "see" nearby books to recommend the most suitable one as a gift.

Project Mariner, designed for developers and enterprise users, is a Chrome extension powered by Gemini 2.0, enabling automated web navigation. By capturing screenshots and processing them in the cloud, the AI agent can interpret and execute tasks, such as online shopping based on a list. Jaclyn Konzelmann, Director of Product Management at Google Labs, demonstrated how users can prompt Mariner to locate products, add them to carts, and complete purchases. While Mariner achieved an 83.5% task success rate on WebVoyager benchmarks, it still operates with a slight latency of about 5 seconds between mouse movements. Konzelmann emphasized that AI-driven web navigation represents a paradigm shift in user experience, potentially reducing the need to visit websites directly -- a development that may significantly impact online publishers and retailers.

Jules is designed to assist developers by analyzing codebases, suggesting repair plans, and executing fixes across multiple files. It also integrates directly with platforms like GitHub, resembling Microsoft's GitHub Copilot. Currently, Jules is available only to a select group of trusted testers, with broader access anticipated in early 2025.

Trillium: The engine behind Gemini 2.0

Trillium, a key player behind the scenes, has been instrumental in powering the training and inference of Gemini 2.0, as noted by CEO Pichai. The Trillium chip, compared to its predecessor, delivers a 4.7x performance boost, doubled HBM capacity and bandwidth, and a 67% energy efficiency improvement. Google has deployed over 100,000 Trillium chips within a single system, supported by its Jupiter network architecture. This configuration achieves a remarkable 13 petabytes per second of data transfer, facilitating the simultaneous utilization of hundreds of thousands of accelerators for a single training task.
Mark Lohmeyer, VP of Compute and AI Infrastructure at Google Cloud, highlighted that testing Llama 2 70B with Trillium showed a clear correlation between performance gains and the number of chips used.
[50]
Google Gemini 2.0 Flash shines with real-time conversation
Chocolate Factory's latest multimodal model aims to power more trusted AI agents

Google on Wednesday released Gemini 2.0 Flash, the latest addition to its AI model lineup, in the hope that developers will create agentic applications in AI Studio and Vertex AI.

AI agents are all the rage at the moment among makers of machine learning models because there's a presumed market for software-derived labor that's capable, compliant, and cheap - qualities not yet consistently evident in current AI models. AI agents consist of AI models that can accomplish multi-step tasks, as directed by complex prompts, generally in conjunction with external data sources and tools.

The pitch to shareholders of the AI-focused firms goes something like this: customers will be able to ask our AI agent to plan a vacation and - insert monetization strategy here - the agent will actually execute the various steps required, including making reservations and paying for transportation. We're not there yet because most people aren't ready to delegate purchasing authority or full application access to unreliable AI models. But the hope is that such concerns can be addressed to the point that people are willing to try it. And given the public's demonstrated risk tolerance for cryptocurrency investment and hands-free automated driving, that point isn't far off.

"The practical application of AI agents is a research area full of exciting possibilities," said Demis Hassabis, CEO of Google DeepMind, and Koray Kavukcuoglu, CTO of Google DeepMind, in a blog post provided to The Register. "We're exploring this new frontier with a series of prototypes that can help people accomplish tasks and get things done. These include an update to Project Astra, our research prototype exploring future capabilities of a universal AI assistant; the new Project Mariner, which explores the future of human-agent interaction, starting with your browser; and Jules, an AI-powered code agent that can help developers."

To realize its dreams of Jarvis - a reference to the personal assistant depicted in Marvel's Iron Man films - Google is aiming at software developers. Through its AI Studio and Vertex AI platforms, the biz is offering AI models that can be grounded - linked to specific sources of data to make model responses more accurate - and given access to specific functions and tools.

"AI Studio is really intended to be sort of the interface for developers to get access to Google's latest models," explained Logan Kilpatrick, product manager for AI Studio and Gemini API, during a media briefing. "You have all those sort of experimental models that we released there. You have all the production models. The intent is sort of get you interested in the capabilities that Gemini has to offer and then ultimately get you building with the Gemini API and actually like putting Gemini into your apps and projects."

Gemini 2.0 Flash arrives a year after Google's first Gemini model debuted. It joins a lineup that includes other Gemini family models: Gemini 1.0 Ultra, Gemini 1.5 Pro, Gemini 1.0 Pro, Gemini 1.5 Flash, and Gemini 1.0 Nano. Generally, the 1.5 versions are more capable than the 1.0 versions, and the larger models tend to perform better than the smaller ones (Ultra, Pro, Flash, and Nano, in order of size); the Chocolate Factory has published benchmarks that provide more details. Gemini 2.0 Flash is said to be twice as fast as 1.5 Pro, with better performance. Gemini 2.0 Flash brings some new capabilities to the table.
The model is multilingual and also multimodal - it can accept text, imagery, and audio as input and can respond in any of those modes. And it sports a multimodal live API - so it can engage in real-time conversation and image analysis. Plus the new model supports tool use, in the form of code execution and search, which provides access to recent information, calculation capabilities, and the ability to interact with data sources without extra setup. "The model is now able to natively output both audio and images, which will start off in an early access program but roll out more broadly over the next few months," said Kilpatrick. In conjunction with the debut of Gemini 2.0 Flash, Google is starting to roll out Jules, adding some "agentic data science capabilities" to Google Colab, and making its new model available within Gemini Code Assist, the company's AI coding extension for VS Code, IntelliJ PyCharm, and other IDEs. "Starting today [for trusted testers], you can offload Python and JavaScript coding tasks to Jules, an experimental AI-powered code agent that will use Gemini 2.0," said Shrestha Basu Mallick, group product manager for Gemini API, and Kathy Korevec, director of product for Google Labs, in a blog post provided to The Register. "Working asynchronously and integrated with your GitHub workflow, Jules handles bug fixes and other time-consuming tasks while you focus on what you actually want to build." Those not in the trusted tester program can sign up to try Jules in 2025. As a demonstration of Gemini 2.0 Flash, Basu Mallick played a game of 20 questions with the model by speaking to it and listening to its responses. She also asked it to count the number of fingers she was holding up in a video stream and to say what color her nails were painted. The model answered adequately in both cases, though we'd argue that "red" would have been a more accurate answer to the nail color query than "pink." That could just reflect differences in monitor color rendering, however. Basu Mallick also demonstrated how Gemini 2.0 Flash can handle a multistep prompt asking the model to identify the five longest movies by Denis Villeneuve, calculate their respective running times, then plot the data on a graph. The task involved having the model generate Python code and execute it in a sandbox to calculate the results. "This is the kind of complex prompt where you first have to solve the first part of the prompt and then the second part of the prompt," she explained. "Then I'm asking it to write some code to work out which has the longest and shortest runtimes and then do a plot." Another demonstration showed off Gemini 2.0 Flash's multimodal capabilities for generating recipes. The model was able to create visuals showing how ingredients might look in a frying pan to augment generated text instructions. The only thing missing was pricing information. "We aren't really saying pricing at this time," said Kilpatrick. "Developers will be able to use the multimodal live API and the 2.0 models for free through AI studio, and when we do a wider release early next year, we'll follow up with pricing." ®
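For developers who want to reproduce the kind of compute-heavy, multistep prompt demonstrated above, the built-in code execution tool can be enabled when the model is constructed. The following is a minimal sketch assuming the google-generativeai Python SDK; the experimental model name, the API key placeholder, and the runtime figures in the prompt are illustrative assumptions rather than details taken from the demo:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

# Enabling the code-execution tool lets the model write and run Python in a
# sandbox to answer the computational parts of a prompt.
model = genai.GenerativeModel("gemini-2.0-flash-exp", tools="code_execution")

prompt = (
    "Here are five films with illustrative runtimes in minutes: "
    "A=166, B=155, C=164, D=116, E=121. "
    "Write and run code to sort them by runtime, then report the longest, "
    "the shortest, and the average runtime."
)

response = model.generate_content(prompt)
# The response interleaves the generated code, its executed output, and a summary.
print(response.text)
```

The same pattern extends to plotting: the sandboxed code can generate a chart, which is returned alongside the text, roughly as in the Villeneuve-runtimes demonstration described above.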
[51]
Watch New Capabilities in Google's Project Astra With Gemini 2.0
Google DeepMind, the company's AI research wing, first unveiled Project Astra at I/O this year. Now, more than six months later, the tech giant announced new capabilities and improvements in the artificial intelligence (AI) agent. Drawing upon Gemini 2.0 AI models, it can now converse in multiple languages, access multiple Google platforms, and has improved memory. The tool is still in the testing phase, but the Mountain View-based tech giant stated that it is working to bring Project Astra to the Gemini app, Gemini AI assistant, and even form factors like glasses. Project Astra is a general-purpose AI agent that is similar in functionality to OpenAI's vision mode or the Meta Ray-Ban smart glasses. It can integrate with camera hardware to see the user's surroundings and process the visual data to answer questions about them. Additionally, the AI agent comes with limited memory that allows it to remember visual information even when it is not actively being shown via the camera. Google DeepMind highlighted in a blog post that ever since the showcase in May, the team has been working on improving the AI agent. Now, with Gemini 2.0, Project Astra has received several upgrades. It can now converse in multiple languages and mixed languages. The company said that it now has a better understanding of accents and uncommon words. The company has also introduced tool use in Project Astra. It can now draw upon Google Search, Lens, Maps, and Gemini to answer complex questions. For instance, users can show a landmark and ask the AI agent to show directions to their home, and it can recognise the object and verbally direct the user home. The memory function of the AI agent has also been upgraded. Back in May, Project Astra could only retain visual information from the last 45 seconds; that window has now been extended to 10 minutes of in-session memory. Additionally, it can also remember more past conversations to offer more personalised responses. Finally, Google claims that the agent can now understand language at the latency of human conversation, making interactions with the tool more human-like.
[52]
Gemini 2.0 launch puts Google on road to AI agents
Google Gemini 2.0 -- a major upgrade to the core workings of Google's AI that the company launched Wednesday -- is designed to help generative AI move from answering users' questions to taking action on its own, Google DeepMind CEO Demis Hassabis tells Axios. Why it matters: Google, like others in the industry, is heavily touting the potential of AI agents. But the technology needs a boost in performance and accuracy if it's going to be able to act reliably with less human supervision.
[53]
Google Rolls Out Faster Gemini AI Model to Power Agents
Google debuted a new version of its flagship artificial intelligence model that it said is twice as fast as its previous version and will power virtual agents that assist users. The new model, Gemini 2.0, can generate images and audio across languages, and can assist during Google searches and coding projects, the company said Wednesday. The new capabilities of Gemini "make it possible to build agents that can think, remember, plan, and even take action on your behalf," said Tulsee Doshi, a director of product management at the company, in a briefing with reporters.
[54]
Google touts 'new agentic era' with AI agents powered by Gemini 2.0
Google AI research lab DeepMind says its newly released artificial intelligence model, Gemini 2.0, will be the bedrock used to build more advanced AI agents. An AI agent powered by Gemini 2.0, released Dec. 11, can understand complex instructions, plan, reason, take action across websites and can even assist with video game strategy, Google DeepMind CEO Demis Hassabis and chief technology officer Koray Kavukcuoglu said in a Dec. 11 blog post. "The practical application of AI agents is a research area full of exciting possibilities," Hassabis and Kavukcuoglu said. "We're exploring this new frontier with a series of prototypes that can help people accomplish tasks and get things done." According to Hassabis and Kavukcuoglu, there are several experimental Gemini-powered AI assistant projects that all have a different function in mind. One, known as Deep Research, can help users explore complex topics by creating multistep research plans by searching the web and then generating a lengthy report on its findings. Project Astra, a universal AI assistant, is geared toward everyday tasks like providing recommendations and advice based on prompts supplied by the user, such as how to wash clothes or more information about a landmark. Project Mariner focuses on creating an AI agent that can take control of your Chrome browser, move the cursor, click buttons, fill out forms, and navigate websites. According to Hassabis and Kavukcuoglu, these projects are "still in the early stages of development," but they hope to make them "widely available in products in the future" after testing and more development. "It's still early, but Project Mariner shows that it's becoming technically possible to navigate within a browser, even though it's not always accurate and slow to complete tasks today, which will improve rapidly over time." Meanwhile, Project Jules is being developed as an assistant for developers that can integrate directly into a GitHub workflow and help with tasks such as coding and planning. Hassabis and Kavukcuoglu said they have also built agents using Gemini 2.0 for video games that can offer suggestions for the player on what to do next in real-time conversation and search for a "wealth of gaming knowledge" online. "We're collaborating with leading game developers like Supercell to explore how these agents work, testing their ability to interpret rules and challenges across a diverse range of games, from strategy titles to farming simulators," they said. In November, Marc Benioff, CEO of American cloud computing software firm Salesforce, said the future of AI lies in autonomous agents rather than large language models (LLMs). "I actually think we're hitting the upper limits of the LLMs right now," he said on The Wall Street Journal's "Future of Everything" podcast on Nov. 23. Nvidia is also focusing on positioning itself in front of the trend. "We see the number of AI native companies continue to grow. And of course, we're starting to see enterprise adoption of agentic AI really is the latest rage," Nvidia CEO Jensen Huang said in a Q3 earnings call in November. In addition, Hassabis and Kavukcuoglu say the team is "experimenting with agents that can help in the physical world" through robotics. Google's AI agents are being released to testers and developers only at this stage.
[55]
Google Launches Gemini 2.0 and Anthropic Rolls Out Claude 3.5 Haiku Amid OpenAI's Year-End Blitz - Decrypt
Google unleashed Gemini 2.0 this week, packing its latest AI model with autonomous capabilities and multimodal features. What's immediately noticeable in this release is that Google sees AI chatbots as evolving into AI agents -- customized software that uses generative AI to interact with users and understand and execute tasks in real time. "With new advances in multimodality -- like native image and audio output -- and native tool use, it will enable us to build new AI agents that bring us closer to our vision of a universal assistant," Google CEO Sundar Pichai said. The model builds upon Gemini 1.5's multimodal foundations with new native image generation and text-to-speech abilities, alongside improved reasoning skills. According to Google, the 2.0 Flash variant outperforms the previous 1.5 Pro model on key benchmarks while running at twice the speed. This model is currently available for users who pay for Gemini Advanced -- the paid subscription designed to compete against Claude and ChatGPT Plus. Those willing to get their hands dirty can access a more complete experience via Google AI Studio, though that interface is more complex than the simple, user-friendly UI that Gemini provides, and while more powerful, it is considerably slower. In our tests, we asked it to analyze a 74K-token document, and it took nearly 10 minutes to produce a response. The output, however, was accurate enough, without hallucinations. Longer documents of around 200K tokens (nearly 150,000 words) will take considerably longer to be analyzed, but the model is capable of doing the job if you are patient enough. Google also implemented a "Deep Research" feature, available now in Gemini Advanced, to leverage the model's enhanced reasoning and long-context capabilities for exploring complex topics and compiling reports. This lets users tackle different topics more in-depth than they would using a regular model designed to provide more straightforward answers. However, it's currently based on Gemini 1.5, and there's no timeline yet for a version based on Gemini 2.0. This new feature puts Gemini in direct competition with services such as Perplexity's Pro search, You.com's Research Assistant, and even the lesser-known BeaGo, all offering a similar experience. However, Google's service offers something different: before gathering information, it works out the best approach to the task and presents a plan to the user, who can edit it to include or exclude information, add more research materials, or extract specific details. Once the methodology has been set up, the user can instruct the chatbot to start its research. Until now, no AI service has offered researchers this level of control and customizability. In our tests, a simple prompt like "Research the impact of AI in human relationships" triggered an investigation of over a dozen reliable scientific or official sites, with the model producing a three-page document based on eight properly cited sources. Not bad at all. Google also shared a video showing off Project Astra, its experimental AI assistant powered by Gemini 2.0. Astra is Google's response to Meta AI: an AI assistant that interacts with people in real time, using the smartphone's camera and microphone as information inputs and providing responses in voice mode.
Google has given Project Astra expanded capabilities, including multilingual conversations with improved accent recognition; integration with Google Search, Lens, and Maps; an extended memory that retains 10 minutes of conversation context; long-term memory; and lower conversation latency through new streaming capabilities. Despite a tepid reception on social media -- Google's video has only gotten 90K views since launch -- the release of the new family of models seems to be getting decent traction among users, with a significant increase in web searches, especially considering it was announced during a major blackout of ChatGPT Plus. Google's announcement this week makes it clear that it's trying to compete against OpenAI to be the generative AI industry leader. Indeed, its announcement falls in the middle of OpenAI's "12 Days of Christmas" campaign, in which the company unveils a new product daily. Thus far, OpenAI has unveiled a new reasoning model (o1), a video generation tool (Sora), and a $200 monthly "Pro" subscription. Google also unveiled its new AI-powered Chrome extension, Project Mariner, which uses agents to navigate websites and complete tasks. In testing against the WebVoyager benchmark for real-world web tasks, Mariner achieved an 83.5% success rate working as a single agent, Google said. "Over the last year, we have been investing in developing more agentic models, meaning they can understand more about the world around you, think multiple steps ahead, and take action on your behalf, with your supervision," Pichai wrote in the announcement. The company plans to roll out Gemini 2.0 integration throughout its product lineup, starting with experimental access to the Gemini app today. A broader release will follow in January, including integration into Google Search's AI features, which currently reach over 1 billion users. Gemini 2's release comes as Anthropic quietly unveiled its latest update. Claude 3.5 Haiku is a faster version of its family of AI models that claims superior performance on coding tasks, scoring 40.6% on the SWE-bench Verified benchmark. Anthropic is still training its most powerful model, Claude 3.5 Opus, which is set to be released later in 2025 after a series of delays. Both Google's and Anthropic's premium services are priced at $20 monthly, matching OpenAI's basic ChatGPT Plus tier. Anthropic's Claude 3.5 Haiku proved to be much faster, cheaper, and more potent than Claude 3 Sonnet (Anthropic's medium-size model from the previous generation), scoring 88.1% on HumanEval coding tasks and 85.6% on multilingual math problems. The model shows particular strength in data processing, with companies like Replit and Apollo reporting significant improvements in code refinement and content generation. Claude 3.5 Haiku is cheap at $0.80 per million tokens of input. The company claims users can achieve up to 90% cost savings through prompt caching and an additional 50% reduction using the Message Batches API, positioning the model as a cost-effective option for enterprises looking to scale their AI operations and a very interesting option to consider versus OpenAI's o1-mini, which costs $3.00 per million input tokens.
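The long-document test described above can also be reproduced programmatically rather than through the AI Studio interface. A minimal sketch, assuming the google-generativeai Python SDK, the experimental 2.0 Flash model name, and a placeholder file long_report.pdf (none of which are specified in the article):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

# The File API accepts documents far larger than a single prompt string.
report = genai.upload_file(path="long_report.pdf")  # hypothetical file

model = genai.GenerativeModel("gemini-2.0-flash-exp")  # assumed experimental model name
response = model.generate_content(
    [report, "Summarize the main arguments and list the sources the report cites."]
)

print(response.text)
print(response.usage_metadata.prompt_token_count, "input tokens")
```

Printing the usage metadata gives a rough sense of how large the document was in tokens, which is useful when judging whether a response time like the ten minutes reported above is to be expected.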
[56]
Google's Gemini 2.0 May Signal the Future of AI Agents | PYMNTS.com
A subtle but potentially significant shift is occurring in artificial intelligence (AI). Machines are advancing beyond processing commands to undertaking tasks with more autonomy. Google's release of Gemini 2.0 may mark a turning point in this evolution, showcasing AI systems that can independently navigate complex tasks across multiple platforms. For example, Gemini 2.0 powers projects like Astra, a universal assistant for Android devices, and Mariner, an agent capable of autonomous web navigation. These developments suggest the system could transform user interactions and task automation. For businesses, such advancements indicate possibilities for AI to impact operations ranging from warehouse management to customer service. "Gemini 2.0 improves on previous AI systems by advancing the capabilities of autonomous decision-making through the integration of more sophisticated AI agents that leverage real-time data processing and adaptive learning models," Prashant Kelker, chief strategy officer, partner and lead consulting sourcing and transformation - Americas, with global technology research and advisory firm ISG, told PYMNTS. "As a result, enterprises will need to strengthen the cross-functional alignment between technology, business and compliance teams. As agentic AI goes into production, we are expecting cloud and edge computing capabilities to scale." The key innovation lies in Gemini 2.0's ability to handle multistep processes with reduced human oversight. Unlike traditional AI that responds to specific prompts, this system aims to autonomously coordinate across platforms, potentially managing inventory or processing orders. "Rather than completely redesigning their eCommerce systems, businesses will likely extend existing accessibility and structured data standards to create an 'AI-enhanced HTML' layer that sits between pure visual interfaces and full APIs," Dev Nag, CEO of QueryPal, a support automation company, told PYMNTS. A distinguishing feature of Gemini 2.0 is its unified approach to processing different types of information. While previous systems often required separate tools for handling text, images and audio, this new model reportedly integrates them -- a development seen as critical for real-world applications where data often spans multiple formats. For retailers and logistics companies, such advancements might manage supply chains, from tracking shipments to predicting inventory needs, while also handling customer interactions across multiple channels. Financial institutions might consider deploying it to enhance fraud detection systems, potentially allowing human analysts to focus on strategic tasks. With further development, Gemini's agentic approach could have massive usefulness for practical and inefficient consumer tasks, Kevin Green, COO of Hapax, an AI for the financial services industry, told PYMNTS. But, he said, there are a few important things to note. "First, agentic is not a tidal wave. It will not come in and impact every aspect of consumer life. Much of what we do is already incredibly efficient and will not be improved upon, or at least not improved upon quickly, by agents. "Take online shopping for example. While AI may allow us to more quickly find what we are looking for or better introduce us to new products, the act of purchasing on Amazon is already incredibly efficient so products like AI shoppers are not something I'm expecting to take hold." Gemini 2.0 could be an indicator of broader changes in business operations. 
The ability to process multiple data types and make decisions autonomously could have implications for retail and manufacturing industries. Some early adopters are already exploring applications. For instance, logistics companies are testing AI agents for tasks like tracking shipments and rerouting them based on real-time conditions and customer preferences. Similarly, Salesforce recently announced Agentforce 2.0, a platform designed to enhance sales, marketing and customer service through AI-driven solutions. Customer service departments are experimenting with agents capable of resolving complex support issues by accessing multiple systems without human intervention. However, these advancements raise concerns. As AI systems gain more autonomy, ensuring their security becomes critical. A compromised AI agent could disrupt supply chains or execute unauthorized financial decisions. The potential cost savings and efficiency gains are considerable but not guaranteed. By automating routine processes, companies might reduce operational overhead and improve response times, though the extent of these benefits remains to be seen. For instance, SoundHound AI's voice technology, used in automotive and restaurant sectors, highlights the interest in AI-driven solutions but also underscores the challenges in scaling such systems effectively. For businesses, the outlook is complex. Autonomous AI is no longer just a theoretical prospect -- it is beginning to influence how companies operate and compete. However, success will likely depend on balancing automation with appropriate human oversight. Companies starting small-scale implementations now may position themselves better as these technologies evolve.
[57]
Google Announces Second Generation of Gemini AI Models
Google announced the second generation of its Gemini conversational AI on Wednesday, making a new, experimental version of Flash, its smaller and cheaper AI, available to developers. A generally available version of the model, which will be able to process and output text, images and audio, will come in January, the company said. Google also announced updates to Project Astra, its AI assistant
[58]
Google Crushes '12 Days of OpenAI' With Just 1 Day of Gemini 2
All eyes are on Google. The tech giant has reportedly been stealing the spotlight from OpenAI. Even AI insider Jimmy Apples agrees. "Beat down," said Apples, sharing a meme featuring the faces of Google CEO Sundar Pichai and co-founder Larry Page morphed into Justice's Stress video, burning OpenAI (the car) in a chaotic celebration of dominance. Adding fuel to the fire, Google recently took the internet by storm with the latest release of the much-anticipated Gemini 2 - a first-of-its-kind model that supports multimodal inputs like images, video, and audio. OpenAI, on the other hand, is going through a brief turbulence as Sora and ChatGPT's website experienced an outage, compelling its team to have sleepless nights fixing them on the backdrop of the '12 Days of OpenAI' livestream sprint. As they say, after every storm comes a rainbow. OpenAI is hinting at dropping something big, maybe even an 'AGI bomb', at the end of the event. This comes amid the backdrop of OpenAI's executive accidentally revealing the 'Super Secret AGI' calendar on the fifth day of shipmas. "AGI is in the air," said OpenAI co-founder and president Greg Brockman, in a cryptic post on X, hinting at a likely release of GPT-4.5 or GPT-5 in the coming days. That explains why Google has been in a hurry to ship products as if there is no tomorrow. Meanwhile, Amazon also released Nova and plans to release any-to-any model sometime next year. To date, OpenAI has launched Sora, its video generation model, along with o1, o1 Pro, and Canvas, alongside supporting Apple Intelligence for free. One thing is surely becoming clearer: Google seems to be focusing more on the impact side of things and less on the hype - just like in the case of releasing quantum chip Willow, alongside the release of a playable 3D world model Genie 2, which has been suspiciously gaining praises from former OpenAI co-founder Elon Musk, who once warned about Google DeepMind's AI dominance. Despite the differences, OpenAI chief Sam Altman also lauded Google's quantum breakthrough. "Big congrats!" Needless to say, Google really showed the world how AI is really done in 2024. 'The OG of AI' - in all sense, and Pichai couldn't agree more. Speaking of impact, the tech giant recently released Multimodal Live API, which offers real-time audio and video-streaming input, as well as the capability to combine various tools. Since its launch, many in the developer community have been experimenting with it. "Gemini 2.0 real-time AI is absolutely wild! Watch how I use it as an AI research assistant by sharing my screen and asking it about an AI paper. Ten times your paper reading skills, or just let Gemini summarise key points. What an incredible time to be alive!" said Elvis Saravia, co-founder of DAIR.AI, in a post on X. "Google Gemini 2.0 Flash's multimodal feature is next-level! I can't believe Google rolled this out before OpenAI. It takes live feeds and answers in real time with almost no latency. Pretty mind-blowing," AI evangelist Ashutosh Shrivastava said. OpenAI might be feeling the pressure as it hasn't yet launched advanced voice mode (AVM) with vision capabilities. However, a recent video surfaced showing Brockman demonstrating ChatGPT with real-time voice and vision features, indicating that the release might be imminent. Similar to OpenAI's 'Operator' agent, which is set to roll out early next year, Google launched Project Mariner yesterday, challenging Microsoft's Copilot Vision and Anthropic's Computer Use. 
This AI agent, Mariner, can browse the internet and reason across everything on your browser screen, including pixels and web elements like text, code, images, and forms. It can type, scroll, or click in the active tab on your browser. "Book a flight from San Francisco to Berlin, departing on March 5 and returning on the 12. The era of being able to give a computer a fairly complex high-level task and have it go off and do a lot of the work for you is becoming a reality," said Jeff Dean, chief scientist at Google DeepMind. "I think it's the future of computer control, where systems can understand what's on your browser, the interface, and create a new way of using the web through a more intuitive UI," said Google DeepMind chief Demis Hassabis in a recent interview with The Rundown CEO Rowan Cheung. In addition to this, Google announced new updates to Project Astra, which it introduced earlier this year at Google I/O. Google described it as a universal digital assistant capable of understanding and interacting with the world in real-time through multiple modalities, including text, speech, images, and video. Hassabis explained that the future goal with Astra is to build an assistant that knows the preferences of the users and what they are trying to achieve. He said their long-term goal is for it to have infinite memory. This is similar to what Microsoft is trying to do with Copilot Vision. Mustafa Suleyman, chief of Microsoft AI, recently revealed that AI with "near-infinite memory that just doesn't forget" is coming in 2025 and will be "truly transformative". There is more. Google released Trillium, its sixth-generation tensor processing unit, and 'Deep Research', an AI research assistant feature in Gemini Advanced. Google also introduced Jules, a developer-focused agent that integrates with GitHub workflows to assist with coding tasks under supervision. Google currently deserves praise for its groundbreaking products and AI models, but equal credit goes to OpenAI for setting the benchmark in the AI landscape. With just seven days left in the '12 Days of OpenAI,' the ball is firmly in its court. OpenAI clearly knows what it's doing - its quirky sense of fashion, Christmas vibes, and dad jokes have kept the livestreams entertaining and well-deserving of all the spotlight. This is unlike anything the industry has seen before, sending shockwaves through competitors and allies alike.
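As a rough idea of how developers have been experimenting with the Multimodal Live API, here is a minimal, text-only sketch assuming the google-genai Python SDK and the experimental gemini-2.0-flash-exp model as documented around launch; the method names, config fields, and v1alpha API surface are assumptions about an early-access interface and may have changed since:

```python
import asyncio

from google import genai

# The Live API was exposed on the v1alpha surface at launch (assumption).
client = genai.Client(api_key="YOUR_API_KEY", http_options={"api_version": "v1alpha"})


async def main() -> None:
    # Open a live, bidirectional session. Audio and video streams use the same
    # session object; this sketch sticks to text for simplicity.
    config = {"response_modalities": ["TEXT"]}
    async with client.aio.live.connect(
        model="gemini-2.0-flash-exp", config=config
    ) as session:
        await session.send(
            input="In two sentences, explain what an AI agent is.",
            end_of_turn=True,
        )
        # Responses stream back incrementally as the model generates them.
        async for message in session.receive():
            if message.text:
                print(message.text, end="")


asyncio.run(main())
```

Swapping the response modality to audio, or attaching a camera or screen-share stream, is what enables the real-time screen-reading and low-latency voice demos described above.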
[59]
Google's Project Astra: The Next Big Leap in Generative AI
Google's Project Astra: Pioneering the Future of Generative AI Applications

Google DeepMind's Project Astra, unveiled during a live demo in London, is turning heads in the race to make generative AI a mass-market reality. Built on the powerful Gemini 2.0 multimodal language model, Astra embodies Google's vision of a "universal assistant" capable of seamlessly handling tasks across text, speech, image, and video.

Astra leverages Gemini 2.0's agent framework to perform tasks like answering queries, identifying objects, and even recalling past interactions. During the demo, Astra demonstrated impressive capabilities -- analyzing a cookbook recipe, selecting wine pairings, and identifying art replicas in an interactive gallery. Though still glitchy, its ability to adapt and learn from corrections highlights its potential as a personalized AI companion.

What sets Astra apart is its deep integration with Google's suite of tools like Search, Maps, and Lens. For example, it can retrieve door codes from emails, track bus routes in real time, or provide art history insights as you pass by a public sculpture. This merger of multimodal reasoning with real-world applications could position Astra as generative AI's long-awaited "killer app."

Beyond Astra, Google DeepMind's Gemini 2.0 has also powered other innovative products, including Mariner (a web-browsing assistant), Jules (a coding tool), and Gemini for Games, which offers in-game assistance. These advancements reflect a broader push to redefine what AI can achieve beyond traditional benchmarks.

However, challenges remain. Astra's performance, while enthralling, is inconsistent, and its lack of transparency about its inner workings raises questions about user trust and data privacy. Experts in AI research applaud the project's ambition, particularly in multimodal reasoning and memory recall, but acknowledge that the road to mass adoption is still long.

These challenges persist for now, but the project's potential is easy to see. If Astra continues on its current developmental path, it could lay the groundwork for the intuitive AI systems of the future and fundamentally change how people interact with technology. As Google DeepMind pushes toward new breakthroughs, Astra will be one to watch.
Google's Gemini 2.0 introduces advanced multimodal AI capabilities, integrating text, image, and audio processing with improved performance and versatility across various applications.
Google has introduced Gemini 2.0, a significant advancement in artificial intelligence that promises to revolutionize how we interact with technology. This latest iteration of Google's AI model brings enhanced multimodal capabilities, improved performance, and broader integration across Google's ecosystem [1][2].
Gemini 2.0 stands out for its ability to seamlessly process and generate multiple types of data, including text, images, audio, and video. Unlike its predecessors, which required converting non-text inputs into text for analysis, Gemini 2.0 can directly process native image and audio inputs. This approach eliminates information loss associated with translation, allowing for more nuanced understanding and interpretation of multimedia content [3][4].
The model demonstrates remarkable improvements across a range of tasks, including coding, image analysis, and multi-step reasoning.
A key feature of Gemini 2.0 is its agentic AI capabilities, allowing it to execute complex, multi-step tasks that require planning and decision-making. This is exemplified in projects like Project Astra, a universal assistant; Project Mariner, a browser-navigating agent; and Jules, a coding assistant for developers.
Gemini 2.0 is being deeply integrated across Google's product suite, including Search, Maps, and Workspace. This integration aims to provide a more unified and seamless user experience, enhancing productivity and collaboration in various professional settings [3][4].
The new model, particularly its Flash version, boasts significant performance enhancements: Google says Gemini 2.0 Flash runs at twice the speed of Gemini 1.5 Pro while outperforming it on key benchmarks.
Google is making Gemini 2.0 accessible through Google AI Studio, offering free credits for initial exploration. This allows developers and businesses to test the API's capabilities without significant upfront investment [2][5].
Gemini 2.0's versatility makes it suitable for a wide range of applications, from real-time conversational assistance and in-depth research to automated web navigation and coding support.
While some features are still in early access or experimental stages, the potential of Gemini 2.0 to transform industries and redefine AI-driven interactions is clear. As the technology continues to evolve, it is expected to unlock new possibilities in real-time problem-solving, creative content generation, and advanced data processing [2][3].
Despite its advancements, Gemini 2.0 faces some challenges: many capabilities remain experimental or limited to trusted testers, agentic tools such as Project Mariner are still slow and occasionally inaccurate, and pricing for the production models has yet to be announced.
As Google continues to refine and expand Gemini 2.0's capabilities, addressing these limitations will be crucial for its widespread adoption and impact across various sectors.
Reference
[1]
[2]
[4]