Google unveils Gemini Omni at I/O, a multimodal AI that generates videos from any input

Reviewed by Nidhi Govil

Google introduced Gemini Omni at its I/O developer conference, marking a significant step toward multimodal AI that can create videos from text, images, and audio. The AI content generation tool includes digital avatar capabilities and SynthID watermarking to combat deepfakes, but raises concerns about AI slop flooding social media platforms.

Google AI Takes Multimodal Video Generation to New Heights

Google unveiled Gemini Omni at its I/O developer conference, introducing a multimodal AI model that CEO Sundar Pichai says will be able to "create anything from any input." [1]

The launch represents a concrete step toward Google's three-year-old vision of building a single neural network trained on text, image, audio, and video that can generate content in any format. Unlike simple stitching tools, Gemini Omni reasons across all inputs to produce consistent outputs that demonstrate understanding of physics, culture, history, and science.

Source: ET

Filling the Void Left by OpenAI Sora

The timing of Google's announcement is notable, as it arrives after OpenAI discontinued both the Sora app and web experience last month to redirect AI computing power elsewhere. [4]

Google is positioning Gemini Omni as more than just an update to its existing Veo video model. Nicole Brichtova, Google DeepMind director of product management, emphasized that "it's the next step towards the progression of combining the intelligence of Gemini with the rendering capabilities of our media models." The tool can generate video from text and images while incorporating advanced physics capabilities that accurately simulate forces like gravity, kinetic energy, and fluid dynamics. [3]

Digital Avatars Enable Video Cloning Capabilities

One of the most intriguing and controversial features allows users to create AI-generated video clips with digital avatars that look and sound like themselves. [4]

To prevent deepfakes, users must complete a dedicated onboarding process that involves recording themselves speaking a series of numbers. The avatar is then stored for future use, enabling creators to generate videos without appearing on camera themselves. Google is framing this as a tool for reimagining personal photos or videos by adding fictional AI elements, which might help it sidestep the kind of legal battles that plagued OpenAI's Sora. [4]

Source: Lifehacker

SynthID Watermarking Addresses Authenticity Concerns

All videos created with Gemini Omni will include Google's SynthID digital fingerprinting technology, allowing users to verify whether content was generated via Gemini products. Google is also adding Content Credentials verification across its Gemini app to show whether content was created with AI or a camera, and whether it's been edited with AI. [2]

This comes as CNET research found that 51% of US adults believe we need better AI labels online, and 94% believe they see AI-generated or altered content on social media. [2]

Only 44% say they can confidently distinguish real content from AI-generated photos and videos.

Gemini Omni Flash Launches Across Multiple Platforms

The first model in the family, Gemini Omni Flash, rolled out to the Gemini app, YouTube Shorts, and AI creative studio Google Flow. Flash can render 10 seconds of video, which Brichtova clarified isn't a model limitation but rather a decision based on getting it into more hands and anticipating that most users won't want much longer videos yet. Longer video durations are planned for the near future. During a media briefing, DeepMind chief technologist Koray Kavukcuoglu demonstrated how Omni could quickly render a claymation explainer video about protein folding from a simple prompt, complete with accurate scientific voice-over.

Source: CNET

Enterprise and Creative Applications on the Horizon

While Google is pitching Gemini Omni Flash as primarily a consumer tool for creating personalized content, the enterprise implications are substantial. Google will make Gemini Omni available via API in the coming weeks, enabling developers and enterprise customers to build custom integrations. [5]

The model's text-rendering capabilities could prove particularly valuable for advertising, allowing marketers to place products or slogans seamlessly into generated videos. An end-to-end multimodal workflow could transform how advertisers and filmmakers approach content creation.

Growing Skepticism About AI Content Generation

Despite Google's technical achievements, consumer sentiment reveals significant hesitancy toward AI-generated content. CNET found that only 11% of people say AI content is useful, informative, or entertaining, while 21% believe there should be a total ban on AI-generated content on social media. [2]

Critics argue that between Nano Banana Pro and Gemini Omni, Google appears to be creating a paradox: the same tech giant providing tools to create AI-generated content is also developing tools to verify it. [2]

The concern is that Gemini Omni will simply add to the growing volume of AI slop flooding social media feeds.

Path Toward Artificial General Intelligence

Google considers Gemini Omni a critical step toward building artificial general intelligence and world models that can accurately simulate reality. [4]

Pichai explained that "with world models, AI is moving from predicting text to simulating reality. Gemini Omni is the next step in that direction." The long-term vision extends beyond video generation to include generating images from audio or audio from video. Google is also working on an even more powerful Omni Pro model for future release. [5]

As the technology advances, questions remain about how society will navigate the tension between creative possibilities and concerns about authenticity, privacy, and the proliferation of synthetic media.

© 2026 TheOutpost.AI All rights reserved