Curated by THEOUTPOST
On Fri, 11 Apr, 4:02 PM UTC
2 Sources
[1]
DeepMind CEO Demis Hassabis says Google will eventually combine its Gemini and Veo AI models | TechCrunch
In a recent appearance on "Possible," a podcast co-hosted by LinkedIn co-founder Reid Hoffman, Google DeepMind CEO Demis Hassabis said Google plans to eventually combine its Gemini AI models with its Veo video-generating models to improve the former's understanding of the physical world. "We've always built Gemini, our foundation model, to be multimodal from the beginning," Hassabis said, "and the reason we did that [is because] we have a vision for this idea of a universal digital assistant, an assistant that [...] actually helps you in the real world." The AI industry is moving gradually toward "omni" models, if you will -- models that can understand and synthesize many forms of media. Google's newest Gemini models can generate audio as well as images and text, while OpenAI's default model in ChatGPT can natively create images -- including, of course, Ghibli-style art. Amazon has also announced plans to launch an "any-to-any" model later this year. These omni models require a lot of training data -- images, videos, audio, text, and so on. Hassabis implied that the video data for Veo is coming mostly from YouTube, a platform that Google owns. "Basically, by watching YouTube videos -- a lot of YouTube videos -- [Veo 2] can figure out, you know, the physics of the world," Hassabis said. Google previously told TechCrunch its models "may be" trained on "some" YouTube content in accordance with its agreement with YouTube creators. Reportedly, Google broadened its terms of service last year in part to allow the company to tap more data to train its AI models.
[2]
YouTube is quietly training Google's next AI brain
Google DeepMind CEO Demis Hassabis revealed plans to eventually fuse the company's Gemini AI with its Veo video generator, aiming to teach the AI more about the physical world, during a recent appearance on the Possible podcast. Hassabis explained the strategy aligns with their vision for a "universal digital assistant" capable of aiding users in real-world scenarios. "We've always built Gemini, our foundation model, to be multimodal from the beginning," he stated on the podcast co-hosted by Reid Hoffman. This move reflects a broader industry shift towards versatile "omni" models. Google's latest Gemini versions already handle audio, image, and text generation, while rivals like OpenAI enable image creation in ChatGPT, and Amazon intends to launch an "any-to-any" model. Developing these comprehensive models demands vast datasets spanning video, images, audio, and text. Hassabis hinted that the video data fueling Veo largely originates from YouTube, a Google-owned platform. He elaborated that by processing extensive YouTube content, Veo learns about real-world physics. "[Veo 2] can figure out, you know, the physics of the world," Hassabis commented regarding the model watching "a lot of YouTube videos." Google previously acknowledged to TechCrunch its models "may be" trained on "some" YouTube content, consistent with agreements with creators. Reports suggest Google updated its terms of service last year, potentially expanding access to data for AI training purposes.
Google DeepMind CEO Demis Hassabis reveals plans to combine Gemini AI with Veo video generator, aiming to create a universal digital assistant with improved real-world understanding. The move highlights the industry trend towards versatile "omni" AI models.
Google DeepMind CEO Demis Hassabis has revealed that Google plans to eventually combine its Gemini AI models with its Veo video-generating models. The move aims to improve Gemini's understanding of the physical world, bringing Google closer to its vision of a "universal digital assistant" 1.
During an appearance on the "Possible" podcast, co-hosted by LinkedIn co-founder Reid Hoffman, Hassabis explained, "We've always built Gemini, our foundation model, to be multimodal from the beginning, and the reason we did that [is because] we have a vision for this idea of a universal digital assistant, an assistant that [...] actually helps you in the real world" 2.
The planned merger of Gemini and Veo reflects a broader industry trend towards developing versatile "omni" models capable of understanding and synthesizing multiple forms of media. These advanced AI systems can process and generate various types of content, including text, images, audio, and video 1.
Google's latest Gemini models already demonstrate multimodal capabilities, generating audio, images, and text. Similarly, OpenAI's ChatGPT now includes native image creation features. Amazon has also announced plans to launch an "any-to-any" model later this year, further highlighting the industry's direction 1.
A key part of Google's strategy involves using YouTube as a major source of training data for its AI models. Hassabis suggested that Veo 2, the latest iteration of Google's video-generating model, learns about real-world physics by processing large amounts of YouTube content 2.
"Basically, by watching YouTube videos -- a lot of YouTube videos -- [Veo 2] can figure out, you know, the physics of the world," Hassabis explained 1. This approach allows the AI to gain a deeper understanding of real-world dynamics and interactions.
Google's use of YouTube content for AI training raises questions about data usage and creator agreements. The company has previously stated that its models "may be" trained on "some" YouTube content, in accordance with its agreements with creators 1.
Reports suggest that Google broadened its terms of service last year, potentially to allow for expanded use of data in AI model training. This move highlights the ongoing debate surrounding data privacy and the ethical use of user-generated content in AI development 2.
The planned integration of Gemini and Veo models represents a significant step towards creating more sophisticated and versatile AI systems. By combining language understanding with visual comprehension, Google aims to develop AI assistants that can better interact with and understand the physical world 1 2.
This advancement could lead to more intuitive and capable AI applications across various sectors, from personal assistance to industrial automation. However, it also underscores the need for continued discussions on data privacy, ethical AI development, and the potential societal impacts of increasingly advanced AI systems.
Google introduces Veo 2, its AI-powered video generation model, to Gemini Advanced subscribers, marking a significant step in the competitive AI video creation landscape.
28 Sources
Google has announced the integration of its Gemini AI team into DeepMind, continuing its efforts to streamline AI research and development. This move aims to accelerate AI innovation and maintain Google's competitive edge in the rapidly evolving field.
9 Sources
An APK teardown of the Google app suggests that Gemini, Google's AI assistant, might soon be able to generate videos. This potential feature could significantly expand Gemini's capabilities beyond its current text and image generation abilities.
2 Sources
Google DeepMind unveils Gemini Robotics, an AI model that enables robots to perform complex tasks with improved generalization, adaptability, and dexterity. The technology shows promise in creating more intuitive and capable robots for various applications.
30 Sources
Google CEO Sundar Pichai outlines an aggressive plan to scale Gemini AI in 2025, aiming to outpace rivals like OpenAI and establish dominance in the AI market amid increasing competition and regulatory scrutiny.
8 Sources