12 Sources
[1]
Google's AI videos get a big upgrade with Veo 3.1
It's getting harder to know what's real on the Internet, and Google is not helping one bit with the announcement of Veo 3.1. The company's new video model allegedly offers better audio and realism, along with greater prompt accuracy. The updated video AI will be available throughout the Google ecosystem, including the Flow filmmaking tool, where the new model will unlock additional features. And if you're worried about the cost of conjuring all these AI videos, Google is also adding a "Fast" variant of Veo.

Veo made waves when it debuted earlier this year, demonstrating a staggering improvement in AI video quality just a few months after Veo 2's release. It turns out that having all that video on YouTube is very useful for training AI models, so Google is already moving on to Veo 3.1 with a raft of new features. Google says Veo 3.1 offers stronger prompt adherence, which results in better video outputs and fewer wasted compute cycles. Audio, which was a hallmark feature of the Veo 3 release, has allegedly gotten better, too.

Veo 3's text-to-video was limited to 720p landscape output, but there's an ever-increasing volume of vertical video on the Internet. So Veo 3.1 can produce both landscape (16:9) and portrait (9:16) video. Google previously said it would bring Veo video tools to YouTube Shorts, which uses a vertical format like TikTok's. The release of Veo 3.1 probably opens the door to fulfilling that promise. You can bet Veo videos will show up more frequently on TikTok as well now that the output fits the format. This release also keeps Google in its race with OpenAI, which recently released a Sora iPhone app with an impressive new version of its video-generating AI.

A focus on filmmakers

The Veo 3.1 model will be available across Google's AI ecosystem. You'll be able to create content with Veo 3.1 and Veo 3.1 Fast via the Gemini app, and developers will have access in Vertex AI and through the Gemini API.
Using the Fast variant will help keep costs down when paying per token. Presumably, users of the Gemini app will get more Fast video generations -- we've asked Google about limits and will report if we hear back.

Veo is the underlying model in Google's Flow filmmaking tool, and it's getting a few new capabilities thanks to the updated model. The Ingredients to Video, Frames to Video, and Extend features are now all compatible with generated audio. So you can upload multiple images as a reference or use images as a starting or end point while also adding custom audio to the clip. These same capabilities are offered in the API, and the Gemini app continues to accept reference images for Veo outputs. The app doesn't get all the Flow features, though.

There are a couple of entirely new video features coming with Veo 3.1, too. Google says Veo 3.1 is better able to replicate the look of a video while making "precision" edits. So you'll be able to add an object to a clip while keeping the rest of it unchanged (more or less). Likewise, you can remove an element without changing the rest of the scene. Adding objects will be available in Flow and the API immediately. Removing objects won't be available in Flow just yet, but Google says that will be coming soon. The new video model begins rolling out today, so keep a skeptical eye out when scrolling vertical videos.
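For developers, the practical upshot is that Veo 3.1 is callable like any other Gemini API model. The sketch below assembles parameters for a text-to-video request; the `veo-3.1-generate-preview` model ID, the "fast" naming scheme, and the config shape are assumptions modeled on the google-genai SDK's existing Veo interface, not details confirmed by the article.

```python
# Hypothetical request builder for a Veo 3.1 generation call via the
# Gemini API. Model IDs and parameter names here are assumptions, not
# confirmed by Google's announcement.

def build_veo_request(prompt, model="veo-3.1-generate-preview",
                      aspect_ratio="16:9", fast=False):
    """Assemble parameters for a text-to-video request."""
    if aspect_ratio not in ("16:9", "9:16"):   # landscape or portrait only
        raise ValueError("Veo 3.1 outputs 16:9 or 9:16 video")
    if fast:
        # Assumed naming convention for the cheaper Fast variant
        model = model.replace("generate", "fast-generate")
    return {"model": model, "prompt": prompt,
            "config": {"aspect_ratio": aspect_ratio}}

# The real call would look roughly like this (needs an API key), since
# video generation is a long-running operation that must be polled:
#   from google import genai
#   client = genai.Client()
#   op = client.models.generate_videos(**build_veo_request("a cowboy at dusk"))
#   while not op.done:
#       op = client.operations.get(op)

req = build_veo_request("a cowboy at dusk", fast=True)
print(req["model"])  # veo-3.1-fast-generate-preview
```

The builder is pure so it can be unit-tested without credentials; the commented polling loop shows the general shape of the asynchronous flow.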
[2]
Google releases Veo 3.1, adds it to Flow video editor | TechCrunch
Google launched its new video model Veo 3.1 with improved audio output, granular editing controls, and better image-to-video output. It said that Veo 3.1 builds on May's Veo 3 release, generates more realistic clips, and adheres to prompts better. The model allows users to add an object to a video and have it blend into the clip's style, Google said. Soon, users will be able to remove an existing object from a video in Flow, too. Veo 3 already has edit features such as adding reference images to drive a character, providing the first and last frame to generate a clip using AI, and the ability to extend an existing video based on the last few frames. With Veo 3.1, Google is adding audio to all of these features to make the clips more lively. The company is rolling out the model to its video editor Flow and the Gemini app, along with the Vertex AI and Gemini APIs. It said that since Flow's launch in May, users have created more than 275 million videos on the app.
[3]
Google Drops New Veo 3 AI Video Model Amid Sora Hype
Google wants you to take a break from OpenAI's Sora and try out its new AI video model. The newest version of its flagship AI video generator is here, named Veo 3.1, the company announced Wednesday. Veo 3.1 is available now for paying Gemini users and through Flow, the Gemini API and Vertex AI. The new version of Veo will have some features you may recognize if you've used Flow, Google's AI filmmaking program. Ingredients to Video, which first debuted on Flow, will let you upload separate assets that Veo combines in the final video. You'll also be able to add objects to existing assets, with the much-needed ability to remove objects coming soon. You can now also give Veo a starting and ending still shot, and it will generate an AI transition to blend the two images into a short video clip. Short clips can now be extended to over a minute long, another way to smooth out transitions between clips. These transition tools will be helpful for creators, as AI video has previously relied on many hard jump cuts between short clips. Google's Veo 3 dropped earlier this year at its I/O developers conference and quickly found fans. It was the first AI video generator to include native, AI-generated synchronized audio. Google has been investing heavily in generative media this year. Its nano banana AI image model quickly gained popularity. But Google's dominance has been challenged by OpenAI. The ChatGPT maker dropped a new version of its AI video generator, Sora, and created a TikTok-like social media app. Sora has been the topic of much debate, with enthusiasts eagerly snatching up invite codes and, more concerningly, experts worried about its ability to create convincing deepfakes and further fill the internet with AI slop. AI video generators like Veo and Sora highlight the controversial role generative AI plays in creative industries.
Many videographers, filmmakers and creators are concerned about how AI is trained on their existing material and deployed by studios and streamers. While hotly contested debates rage on, many artists and authors are taking AI companies to court over alleged copyright infringement and other intellectual property issues. (Disclosure: Ziff Davis, CNET's parent company, in April filed a lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)
[4]
Google's Veo 3.1 can turn separate images into a single video
Once upon a time, animators had to painstakingly work frame-by-frame, stitching together long strings of still images to create the illusion of motion. Today, they only need to upload a few images, and AI will do the rest. On Wednesday, Google DeepMind released its latest video-generating AI model, Veo 3.1, available now in Flow, Vertex AI, the Gemini API, the Gemini App, and Vids. The company also released a smaller, less powerful version of the model called Veo 3.1 Fast. Veo 3.1 specializes in blending disparate images into natural-looking videos, significantly reducing the time and resources that have historically been required for video production. Amazon also recently debuted an AI tool that allows brands to generate short video ads from still images of products in a matter of seconds. Google's new model arrives less than four months after the public launch of its predecessor, Veo 3, which quickly became a hit because of its ability to generate video with synchronized audio. Google also later upgraded that model with the ability to generate short videos from a single image. Veo 3.1 also comes with that feature and more. According to a promotional deck from Google shared with ZDNET, the model "offers richer audio and enhanced realism that captures true to life textures." It also has a more sophisticated "understanding of storytelling, cinematic styles, and character interactions," the company wrote. Veo 3.1 blends multiple images to create a single, natural-looking video, like an AI blender that takes separate assets and combines them into a single visual smoothie.
An image of a woman's face, another of a collection of clothing grouped together, and a third of an ornate-looking room could, for example, prompt the model to create a short video clip of the woman wearing the pictured clothes and strolling through the room (no obviously detectable extra fingers included). More interestingly, you can upload images which, at first glance, you'd never expect could be brought together in any kind of comprehensible way. This is where the "creativity" (to use a loaded term) of Veo 3.1 shines brightest. In one demo provided by Google, an image of a decorated Christmas tree behind a pair of sliding doors and another of a psychedelic mixture of colors -- resembling a collection of various paint colors blended together -- produce a video of the doors sliding open to release a flood of multicolored, Christmas ornament-sized balls, like a Surrealist reimagining of the blood-filled elevator in The Shining. Veo 3.1 also allows users to upload just two images -- the first and last in a sequence -- and the model will automatically fill in the intermediary blank spot with video. In one demo video, for example, Google shows an image of an old, rustic barn, with low sunlight pouring through the entryway, and another of a cowboy astride a horse, which appears to be casually trotting through tall grass. Veo 3.1 combines these two images by panning the camera through the barn's doorway until all we see is the (now actually moving) cowboy. The first and last image feature is available now on Flow, Vertex AI, and the Gemini API, but not the Gemini App. In that demo video and in others provided by Google, both the first and last images have similar lighting and artistic aesthetics.
Uploading two images that are completely distinct from and unrelated to one another -- a black and white image of a Ferrari paired with a color pencil sketch of an orange tree, say -- will yield less predictable results. Veo 3.1 also comes with a new scene extension feature, through which users can easily lengthen their AI-generated video clips, along with another capability that allows them to add or remove visual elements to and from existing videos.
[5]
Google's Veo 3.1 is better at generating videos from images
Google has released a new update to its Veo AI video generation model that should make it do a better job of sticking to prompts and converting images into videos. Veo 3.1 is available to try today through Google's Gemini API and is now also powering the company's Flow video editor. Veo 3.1 builds on the new capabilities Google introduced with the launch of Veo 3 at Google I/O 2025. The new model offers better "prompt adherence," according to Google, and should have an easier time creating videos based on the image "ingredients" you upload alongside your written prompt. Veo 3.1 also makes it possible to convert images to video and generate audio at the same time, a capability that wasn't available with Veo 3. In Flow, Veo 3.1 supports a new feature that gives you finer control over the videos you generate. With what Google calls "Frame to Video," Flow lets you upload a first and last frame, and then generates the video in between. Adobe Firefly, which is powered by Veo 3, offers a similar feature, but Flow will be able to pull it off and create audio at the same time. Those added audio skills will also apply to the video editor's ability to extend clips and insert objects into existing footage. Based on the samples Google's shared, videos generated with Veo 3.1 still have an uncanny quality that seems to vary greatly depending on the prompt and subject. Even if it's missing some of the realism of OpenAI's Sora 2, though, the company's decision to try to make Veo more useful to people who actually work with video rather than a source of social media spam is a welcome move.
[6]
Google's filmmaking tool is getting a power-up from the rollout of Veo 3.1
With these tools, you'll be able to more precisely edit your AI-generated videos. Veo 3 was first introduced earlier this year, around Google I/O in May. The AI video generator, capable of producing clips from text and images, powers the company's AI filmmaking tool, known as Flow. Google has now rolled out a newer version of the model, Veo 3.1, which brings some improvements to Flow, like stronger prompt adherence and improved audiovisual quality, and adds some new editing capabilities. In addition to being available in Flow, Veo 3.1 can also be accessed through the Gemini API or Vertex AI. The new video AI model will also be available in the Gemini app.
[7]
Google announces Veo 3.1 and updates Flow with more controls, tools
Google today announced Veo 3.1 as its latest video generation model, with Flow getting a number of updates to take advantage of the latest capabilities. Compared to Veo 3, which was announced at I/O 2025 in May, this new version offers richer audio and "enhanced realism that captures true-to-life textures." Veo 3.1 has a deeper understanding of storytelling, cinematic styles, and character interactions to give you more narrative control. The image-to-video capability benefits from improved audio-visual quality and better follows your prompt. Veo 3.1 and Veo 3.1 Fast are available in the Gemini app, as well as the Gemini API and Vertex AI, to power text-to-video and image-to-video for horizontal (16:9) and vertical (9:16) outputs. Meanwhile, Google is updating the Flow filmmaking tool to take advantage of Veo 3.1. Audio generation is coming to the Ingredients to Video, Frames to Video, and Extend features. Flow is also getting new editing capabilities. You can insert elements like objects, characters, and details, with Google handling shadows, scene lighting, and other complex details to make everything look natural. Coming soon is the ability to remove objects and characters from a scene, with Flow working to reconstruct the background and surroundings to make the edit seamless.
[8]
Google releases new AI video model Veo 3.1: what it means for enterprises
As expected after days of leaks and rumors online, Google has unveiled Veo 3.1, its latest AI video generation model, bringing a suite of creative and technical upgrades aimed at improving narrative control, audio integration, and realism in AI-generated video. While the updates expand possibilities for hobbyists and content creators using Google's online AI creation app, Flow, the release also signals a growing opportunity for enterprises, developers, and creative teams seeking scalable, customizable video tools. The quality is higher, the physics better, the pricing the same as before, and the control and editing features more robust and varied. My initial tests showed it to be a powerful and performant model that immediately delights with each generation. However, the look is more cinematic, polished, and a little more "artificial" by default than rivals such as OpenAI's new Sora 2, released late last month, which may or may not be what a particular user is going after (Sora excels at handheld and "candid" style videos).

Expanded Control Over Narrative and Audio

Veo 3.1 builds on its predecessor, Veo 3 (released back in May 2025), with enhanced support for dialogue, ambient sound, and other audio effects. Native audio generation is now available across several key features in Flow, including "Frames to Video," "Ingredients to Video," and "Extend," which give users the ability to, respectively: turn still images into video; use items, characters, and objects from multiple images in a single video; and generate longer clips than the initial 8 seconds, to more than 30 seconds or even 1+ minutes when continuing from a prior clip's final frame. Before, you had to add audio manually after using these features. This addition gives users greater command over tone, emotion, and storytelling -- capabilities that have previously required post-production work.
In enterprise contexts, this level of control may reduce the need for separate audio pipelines, offering an integrated way to create training content, marketing videos, or digital experiences with synchronized sound and visuals. Google noted in a blog post that the updates reflect user feedback calling for deeper artistic control and improved audio support. Gallegos emphasizes the importance of making edits and refinements possible directly in Flow, without reworking scenes from scratch.

Richer Inputs and Editing Capabilities

With Veo 3.1, Google introduces support for multiple input types and more granular control over generated outputs. The model accepts text prompts, images, and video clips as input, and also supports:

* Reference images (up to three) to guide appearance and style in the final output
* First and last frame interpolation to generate seamless scenes between fixed endpoints
* Scene extension that continues a video's action or motion beyond its current duration

These tools aim to give enterprise users a way to fine-tune the look and feel of their content -- useful for brand consistency or adherence to creative briefs. Additional capabilities like "Insert" (add objects to scenes) and "Remove" (delete elements or characters) are also being introduced, though not all are immediately available through the Gemini API.

Deployment Across Platforms

Veo 3.1 is accessible through several of Google's existing AI services:

* Flow, Google's own interface for AI-assisted filmmaking
* Gemini API, targeted at developers building video capabilities into applications
* Vertex AI, where enterprise integration will soon support Veo's "Scene Extension" and other key features

Availability through these platforms allows enterprise customers to choose the right environment -- GUI-based or programmatic -- based on their teams and workflows.

Pricing and Access

The Veo 3.1 model is currently in preview and available only on the paid tier of the Gemini API.
The cost structure is the same as Veo 3, the preceding generation of AI video models from Google:

* Standard model: $0.40 per second of video
* Fast model: $0.15 per second

There is no free tier, and users are charged only if a video is successfully generated. This model is consistent with previous Veo versions and provides predictable pricing for budget-conscious enterprise teams.

Technical Specs and Output Control

Veo 3.1 outputs video at 720p or 1080p resolution, with a 24 fps frame rate. Duration options include 4, 6, or 8 seconds from a text prompt or uploaded images, with the ability to extend videos up to 148 seconds (more than two and a half minutes) when using the "Extend" feature. New functionality also includes tighter control over subjects and environments. For example, enterprises can upload a product image or visual reference, and Veo 3.1 will generate scenes that preserve its appearance and stylistic cues across the video. This could streamline creative production pipelines for retail, advertising, and virtual content production teams.

Initial Reactions

The broader creator and developer community has responded to Veo 3.1's launch with a mix of optimism and tempered critique -- particularly when comparing it to rival models like OpenAI's Sora 2. Matt Shumer, founder of AI company Otherside AI/Hyperwrite and an early adopter, described his initial reaction as "disappointment," noting that Veo 3.1 is "noticeably worse than Sora 2" and also "quite a bit more expensive." However, he acknowledged that Google's tooling -- such as support for references and scene extension -- is a bright spot in the release. Travis Davids, a 3D digital artist and AI content creator, echoed some of that sentiment. While he noted improvements in audio quality, particularly in sound effects and dialogue, he raised concerns about limitations that remain in the system.
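The pricing and the 148-second cap are easy to sanity-check with a little arithmetic. The sketch below uses the per-second rates quoted above; the Extend model (each pass continues from the final second of an 8-second segment, netting +7 seconds) is an assumption, though it does reproduce the documented maximum: 8 + 20 × 7 = 148.

```python
# Rough cost and duration math for Veo 3.1 via the Gemini API.
# Rates come from the article; the overlap-based Extend model is an
# assumption consistent with the 148-second cap (8 + 20 * 7 = 148).

RATES = {"standard": 0.40, "fast": 0.15}  # USD per second of output

def clip_cost(seconds, variant="standard"):
    """Cost of a successfully generated clip; failed generations aren't billed."""
    return round(seconds * RATES[variant], 2)

def extended_length(extensions, base=8, segment=8, overlap=1):
    """Total length after n Extend passes, each reusing the last second."""
    return base + extensions * (segment - overlap)

print(clip_cost(8))             # 3.2  -- one 8s clip on the standard model
print(clip_cost(8, "fast"))     # 1.2  -- same clip on the Fast variant
print(extended_length(20))      # 148  -- the documented maximum duration
```

At these rates, a fully extended 148-second standard-model video would run about $59.20, which helps explain the repeated "deep pockets" framing in the reviews below.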
These include the lack of custom voice support, an inability to select generated voices directly, and the continued cap at 8-second generations -- despite some public claims about longer outputs. Davids also pointed out that character consistency across changing camera angles still requires careful prompting, whereas other models like Sora 2 handle this more automatically. He questioned the absence of 1080p resolution for users on paid tiers like Flow Pro and expressed skepticism over feature parity. On the more positive end, @kimmonismus, an AI newsletter writer, stated that "Veo 3.1 is amazing," though still concluded that OpenAI's latest model remains preferable overall. Collectively, these early impressions suggest that while Veo 3.1 delivers meaningful tooling enhancements and new creative control features, expectations have shifted as competitors raise the bar on both quality and usability.

Adoption and Scale

Since launching Flow five months ago, Google says over 275 million videos have been generated across various Veo models. The pace of adoption suggests significant interest not only from individuals but also from developers and businesses experimenting with automated content creation. Thomas Iljic, Director of Product Management at Google Labs, highlights that Veo 3.1's release brings capabilities closer to how human filmmakers plan and shoot. These include scene composition, continuity across shots, and coordinated audio -- all areas that enterprises increasingly look to automate or streamline.

Safety and Responsible AI Use

Videos generated with Veo 3.1 are watermarked using Google's SynthID technology, which embeds an imperceptible identifier to signal that the content is AI-generated. Google applies safety filters and moderation across its APIs to help minimize privacy and copyright risks. Generated content is stored temporarily and deleted after two days unless downloaded.
For developers and enterprises, these features provide reassurance around provenance and compliance -- critical in regulated or brand-sensitive industries.

Where Veo 3.1 Stands Among a Crowded AI Video Model Space

Veo 3.1 is not just an iteration on prior models -- it represents a deeper integration of multimodal inputs, storytelling control, and enterprise-level tooling. While creative professionals may see immediate benefits in editing workflows and fidelity, businesses exploring automation in training, advertising, or virtual experiences may find even greater value in the model's composability and API support. The early user feedback highlights that while Veo 3.1 offers valuable tooling, expectations around realism, voice control, and generation length are evolving rapidly. As Google expands access through Vertex AI and continues refining Veo, its competitive positioning in enterprise video generation will hinge on how quickly these user pain points are addressed.
[9]
Google Veo 3.1 launches: See the upgrades
While OpenAI's Sora 2 has been getting all the attention recently, we found in our comparison that Google Veo 3 is a more capable video generation model altogether. Now, however, it looks like Google might be widening the gap with the Wednesday launch of Google Veo 3.1. On top of the basic Veo 3.1 model, there's also a new Veo 3.1 Fast model, which is basically a lighter-weight version of Veo 3.1. They're both now available in Gemini, the Vertex AI platform, and Google Flow, the AI video editing tool that's designed around generating AI videos and editing them together. So what's actually new with Google Veo 3.1? As you might expect, given the name, it's not necessarily a massive upgrade over Veo 3, which has only been out for a few months now. According to Google, Veo 3.1 offers richer audio as well as better narrative comprehension, which should help individual clips make more sense in the context of a longer video. Last but not least, Google says Veo 3.1 offers enhanced realism when it comes to true-to-life textures. Some of the features available with Veo have been improved too, notably by adding audio capabilities. For example, Ingredients to Video is a feature that lets users upload reference images for Veo to use when creating a video. Users could upload a picture of a character and a location, and have Veo generate a video of the character in that location. This feature is available with audio in Gemini, the Vertex AI platform, and Google Flow. Another feature that's been upgraded with audio is Scene Extension, which essentially allows users to extend a video clip. It's available with audio in the Gemini API, but not the Gemini app. It's also available in Flow. Another feature is First and Last Frame, which allows users to upload a static image of the first and final frame in a video and have Veo generate a transition between those two frames.
It can be used in the Gemini API, Vertex AI, and Flow. Additionally, Google is adding some precision features to Flow that could help users refine existing video clips. These include the ability to insert or remove objects in a way that looks realistic and natural, though we have yet to test it. Veo 3.1 is now available in Google Flow, Gemini, and Vertex AI. The other features vary in how widely available they are. You'll need a Google AI Pro subscription to use the new models.
[10]
Google Flow AI video creator adds creative controls that make Sora look tame
Merely days after OpenAI released its Sora 2 AI video generator app, which quickly went viral for some pretty morbid reasons, Google is giving a lift to its own AI video tools. The company's Flow video creation platform is getting upgraded to the new Veo 3.1 AI video model, with a bunch of new tricks in tow. The next-gen video AI model is now available to users in the Gemini app, as well, starting today.

What's new?

One of the biggest upgrades this time around is the ability to add audio, alongside a bunch of new creative controls in the Flow suite. For example, the new "Ingredients to video" system lets users upload multiple images to ensure that they can achieve just the right style and character control in the scene. The overarching idea is that instead of describing every minute detail, users can take the easier route. For example, you can upload the image of a person, pick another image with the background of your choice, and a third image of a costume for the character. The AI will combine them all. It's not just convenient, but also offers far more granular control over the videos you want to create, instead of dealing with hit-or-miss AI prompts. It's almost like creating hybrid emojis using the Emoji Kitchen system in Google's Gboard app.

A whole new level of creative control

In addition to controlling the scene elements, Flow is also getting a new "Frames to video" feature that lets users create a scene by simply supplying the start and end frames. Once the two images are uploaded, the AI will automatically stitch them together and create a seamless video out of it. Next, we have the new "Extend" tool. As the name makes abundantly clear, it comes in handy for prolonging a scene without having to write a fresh prompt or add another image as the source material. "Flow creates a new video based on the final second of your original clip," per Google.
And finally, we have the new "Insert" feature, which lets users add anything to a video naturally without disturbing the background continuity. Soon, Google will also let Flow users remove objects from videos, somewhat like the Magic Eraser feature in the Google Photos app. As far as the Veo 3.1 model goes, it adds "richer audio, more narrative control, and enhanced realism that captures true-to-life textures" to videos created in Flow.
[11]
Google Unveils Veo 3.1 to Rival OpenAI's Sora 2 -- But Does It Deliver? - Decrypt
Google positions Veo as a professional-grade alternative in the crowded AI video market. Google released Veo 3.1 today, an updated version of its AI video generator that adds audio across all features and introduces new editing capabilities designed to give creators more control over their clips. The announcement comes as OpenAI's competing Sora 2 app climbs app store charts and sparks debates about AI-generated content flooding social media. The timing suggests Google wants to position Veo 3.1 as the professional alternative to Sora 2's viral social feed approach. OpenAI launched Sora 2 on September 30 with a TikTok-style interface that prioritizes sharing and remixing. The app hit 1 million downloads within five days and reached the top spot in Apple's App Store. Meta took a similar approach with its own AI-video-powered social feed. Users can now create videos with synchronized ambient noise, dialogue, and Foley effects using "Ingredients to Video," a tool that combines multiple reference images into a single scene. The "Frames to Video" feature generates transitions between a starting and ending image, while "Extend" creates clips lasting up to a minute by continuing the motion from the final second of an existing video. New editing tools let users add or remove elements from generated scenes with automatic shadow and lighting adjustments. The model generates videos in 1080p resolution at horizontal or vertical aspect ratios. The model is available through Flow for consumer use, the Gemini API for developers, and Vertex AI for enterprise customers.
The AI video generation market has become crowded in 2025, with Runway's Gen-4 model targeting filmmakers, Luma Labs offering fast generation for social media, Adobe integrating Firefly Video into Creative Cloud, and updates from xAI, Kling, Meta, and Google targeting realism, sound generation, and prompt adherence. But how good is it? We tested the model, and these are our impressions. If you want to try it, you'd better have some deep pockets. Veo 3.1 is currently among the most expensive video generation models, on par with Sora 2 and behind only Sora 2 Pro, which costs more than twice as much per generation. Free users receive 100 monthly credits to test the system, which is enough to generate around five videos per month. Through the Gemini API, Veo 3.1 costs approximately $0.40 per second of generated video with audio, while a faster variant called Veo 3.1 Fast costs $0.15 per second. For those willing to use it at that price, here are its strengths and weaknesses. Veo 3.1 is a definite improvement over its predecessor. The model handles coherence well and demonstrates a better understanding of contextual environments. It works across different styles, from photorealism to stylized content. We asked the model to blend a scene that started as a drawing and transitioned into live-action footage. It handled the task better than any other model we tested. Without any reference frame, Veo 3.1 produced better results in text-to-video mode than it did using the same prompt with an initial image, which was surprising. The tradeoff is movement speed. Veo 3.1 prioritizes coherence over fluidity, making it challenging to generate fast-paced action. Elements move more slowly but maintain consistency throughout the clip. Kling still leads in rapid movement, although it requires more attempts to achieve usable results. Veo built its reputation on image-to-video generation, and the results still deliver -- with caveats. This appears to be a weaker area in the update.
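One back-of-envelope check on the free tier: 100 monthly credits yielding "around five videos" implies roughly 20 credits per generation. The sketch below makes that inference explicit; the per-video credit cost is derived from the quoted figures, not published by Google.

```python
# Free-tier arithmetic inferred from the article: 100 credits per month,
# "around five videos" -- so roughly 20 credits per generated clip.
# The 20-credit figure is an inference, not an official number.

CREDITS_PER_VIDEO = 100 // 5  # inferred: ~20 credits per generation

def free_videos_per_month(credits=100):
    """How many clips the free allowance covers at the inferred rate."""
    return credits // CREDITS_PER_VIDEO

print(free_videos_per_month())     # 5 -- matches the article's estimate
print(free_videos_per_month(60))   # 3 -- a partially spent month
```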
When using different aspect ratios as starting frames, the model struggled to maintain the coherence levels it once had. If the prompt strays too far from what would logically follow the input image, Veo 3.1 finds a way to cheat: it generates incoherent scenes or clips that jump between locations, setups, or entirely different elements. This wastes time and credits, since these clips can't be edited into longer sequences because they don't match the format. When it works, the results look fantastic. Getting there is part skill, part luck -- mostly luck.

The elements-to-video feature works like inpainting for video, letting users insert or delete elements from a scene. Don't expect it to maintain perfect coherence or use your exact reference images, though. For example, the video below was generated using these three references and the prompt: a man and a woman stumble upon each other while running in a futuristic city, where a Bitcoin sign hologram is rotating. The man tells the woman, "QUICK, BITCOIN CRASHED! WE MUST BUY MORE!!"

As you can see, neither the city nor the characters are actually there. The characters wear the reference outfits and the city resembles the one in the image, but the results convey the idea of the reference elements rather than reproducing them. Veo 3.1 treats uploaded elements as inspiration rather than strict templates. It generates scenes that follow the prompt and include objects that resemble what you provided, but don't waste time trying to insert yourself into a movie -- it won't work.

A workaround: use Nanobanana or Seedream to upload your elements and generate a coherent starting frame first, then feed that image to Veo 3.1, which will produce a video where characters and objects show minimal deformation throughout the scene.

Audio is Google's selling point. Veo 3.1 handles lip sync better than any other model currently available. In text-to-video mode, it generates coherent ambient sound that matches scene elements.
The dialogue, intonation, voices, and emotions are accurate and beat competing models. Other generators can produce ambient noise, but only Sora, Veo, and Grok can generate actual words. Of those three, Veo 3.1 requires the fewest attempts to get good results in text-to-video mode.

Image-to-video with dialogue is where things fall apart. It suffers from the same issues as standard image-to-video generation: Veo 3.1 prioritizes coherence so heavily that it ignores prompt adherence and reference images. For example, this scene was generated using the reference shown in the elements-to-video section, and our test produced a completely different subject than the reference image. The video quality was excellent -- intonation and gestures were spot-on -- but it wasn't the person we uploaded, making the result useless.

Sora's remix feature is the best choice for this use case. The model may be censored, but its image-to-video capabilities, realistic lip sync, and focus on tone, accent, emotion, and realism make it the clear winner. Grok's video generator comes in second: it respected the reference image better than Veo 3.1 and produced superior results. Here is one generation using the same reference image and prompt. If you don't want to deal with Sora's social app or lack access to it, Grok might be your best option. It's also largely uncensored, though moderated, so if you need that particular approach, Musk has you covered.
[12]
Google Launches Veo 3.1 and Upgrades Its Flow AI Filmmaking Tool
The new Veo 3.1 model is already live in the Gemini app, and the upgraded Flow AI tool is available starting today.

Google has upgraded its powerful Veo 3 video generation model and launched Veo 3.1, which delivers enhanced realism, better textures and lighting, and cinematic sound design. It's part of an update to Flow, Google's AI filmmaking tool, which brings granular scene editing tools, the ability to insert or remove objects, and longer, seamless clips.

The Flow AI tool and the Veo 3 AI model were launched five months ago at Google I/O 2025. Since then, video creators have asked Google to bring more artistic control and audio across all features. Now, Google has launched an update to Flow with Veo 3.1, introducing many of those creative capabilities. Users have greater narrative and visual control and can generate rich audio with improved realism. The Veo 3.1 AI model has better prompt adherence and follows storytelling intent much more accurately.

You can also now use multiple reference images to define characters, objects, and style; the Flow AI tool combines those images to create a cohesive scene, and the generated video now includes audio as well. Next, you can provide a starting and ending image, and Flow creates a seamless video between the two scenes. Plus, you can extend an existing clip into longer shots of up to a minute or more -- great for a continuous shot -- using the last frame of your clip to continue the motion. Finally, you can add new elements or objects into a scene, and Flow adjusts lighting and shadows automatically to improve realism. You can remove unwanted objects or characters from a scene as well.

You can use the new Veo 3.1 model in the Gemini app and Flow, starting today.
Google's latest AI video model, Veo 3.1, introduces enhanced features for video generation, including improved audio, better prompt adherence, and new editing capabilities. The update aims to revolutionize video creation across Google's ecosystem.
Google has unveiled its latest AI video model, Veo 3.1, marking a significant advancement in the realm of AI-generated content. This update builds upon the success of Veo 3, which was released earlier this year, and introduces a range of new features and improvements aimed at enhancing the quality and versatility of AI-generated videos [1].
Veo 3.1 boasts several key improvements over its predecessor. The new model offers stronger prompt adherence, resulting in better video outputs and fewer wasted compute cycles. Audio quality, a hallmark feature of Veo 3, has been further enhanced in this update [1]. The model now supports both landscape (16:9) and portrait (9:16) video formats, catering to the increasing demand for vertical video content on platforms like YouTube Shorts and TikTok [1].
One of the most notable additions to Veo 3.1 is its ability to blend multiple images into a single, cohesive video. This feature, called 'Ingredients to Video,' allows users to upload separate assets that Veo will combine into a final video [3][4]. The model also introduces a 'Frames to Video' feature, enabling users to provide starting and ending frames, with Veo 3.1 generating the transition between them [3][5]. Additionally, Veo 3.1 allows for the extension of short clips to over a minute in length, providing smoother transitions between scenes [3].
Veo 3.1 is being integrated across Google's AI ecosystem. It will be available through the Gemini app, Vertex AI, and the Gemini API [2]. The model also powers Google's Flow video editor, where it unlocks additional features such as adding objects to existing assets and generating audio simultaneously with video conversion [5].
The release of Veo 3.1 comes at a time of intense competition in the AI video generation space, with companies like OpenAI also making significant strides with their Sora model [3]. This advancement in AI-generated video technology has sparked discussions about its potential impact on creative industries, with concerns raised about copyright infringement and the changing landscape of video production [3].
Google has also announced a 'Fast' variant of Veo 3.1, aimed at reducing costs for users paying per token [1]. The company plans to introduce additional features in the near future, including the ability to remove objects from videos in the Flow editor [2]. As AI-generated video technology continues to evolve, it promises to revolutionize content creation, making sophisticated video production more accessible to a wider range of users. However, it also raises important questions about the authenticity of online content and the future of traditional video production methods.
Summarized by Navi