Runway, the leading artificial intelligence video platform, has released a new "turbo" version of its Gen-3 model, allowing for the rapid creation of video from images.
Gen-3 launched earlier this month as a text-to-video model, added image-to-video support soon after and now, with Turbo, generation is dramatically faster.
To use Turbo, simply select it from the model list in the video creation tool, add an image (I've found Midjourney works well here) and, optionally, a text prompt describing the camera movement and character motion.
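If you'd rather script this than click through the web tool, here's a minimal sketch using the runwayml Python SDK. The model id "gen3a_turbo", the parameter names and the polling pattern are my assumptions based on Runway's public API and may not match exactly:

```python
# A minimal sketch of driving Gen-3 Turbo programmatically.
# Assumptions: the runwayml SDK, the "gen3a_turbo" model id and these
# parameter names; check Runway's API docs for the exact shapes.
import time
from runwayml import RunwayML

client = RunwayML()  # assumed to read the RUNWAYML_API_SECRET env variable

task = client.image_to_video.create(
    model="gen3a_turbo",                         # the Turbo variant of Gen-3
    prompt_image="https://example.com/oak.png",  # e.g. a hosted Midjourney render
    prompt_text="The camera slowly spirals up the ancient oak at dawn.",
    duration=10,                                 # clips are 5 or 10 seconds
)

# Poll until the render finishes; Turbo typically returns in seconds.
while True:
    result = client.tasks.retrieve(task.id)
    if result.status in ("SUCCEEDED", "FAILED"):
        break
    time.sleep(2)

print(result.status, getattr(result, "output", None))  # output holds the video URL(s)
```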
In my testing, Turbo went from an initial prompt to a fully rendered ten-second video in just 15 seconds, with no noticeable drop in quality. We have near real-time AI video.
I wrote five Midjourney prompts to create a variety of scenes to use as starting images, then used the Runway prompt guide to write an accompanying motion prompt for each.
All of the clips here are ten seconds long. Videos can be five or ten seconds, but there is no way to extend beyond that initial generation. You could grab the last frame of a generated video and use it as the input for a new ten-second clip (a quick sketch of that trick follows below), but I've kept things simple here.
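For anyone who does want to chain clips together, here's an illustrative way to pull the final frame out of a clip with OpenCV; the file names are placeholders:

```python
# Extract the last frame of a Gen-3 clip so it can be fed back to
# Runway as the starting image for the next clip.
# Requires opencv-python; file names are illustrative.
import cv2

cap = cv2.VideoCapture("gen3_clip.mp4")
total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))

# Reported frame counts can be off by a frame or two for some codecs,
# so step back until a frame actually decodes.
frame = None
for idx in range(total - 1, max(total - 5, 0), -1):
    cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
    ok, frame = cap.read()
    if ok:
        break
cap.release()

if frame is not None:
    cv2.imwrite("last_frame.png", frame)  # use this as the next image input
```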
Runway says more is coming as a result of Turbo, including further improvements to the model, new control mechanisms and the possibility of real-time interactivity.
Midjourney prompt: "A massive, gnarled ancient oak tree standing alone in a misty meadow at dawn, with its twisted roots exposed and branches reaching out like arms."
Runway motion prompt: "The camera starts at the base of the ancient oak, slowly spiraling upwards to reveal the full height of the tree against the backdrop of a misty dawn. The focus is on the intricate details of the bark, roots, and branches as the sun begins to rise."
This prompt tests the ability of both Midjourney and Runway to handle complex textures and slow, gradual camera movement. I think it handled both well.
Midjourney prompt: "A vibrant village market bustling with activity, featuring vendors selling colorful fruits, vegetables, and flowers, with people of all ages interacting under a bright, sunny sky."
Runway motion prompt: "The camera moves through the lively village market, capturing the energetic interactions of people bartering and laughing. The focus shifts between vendors displaying their goods and customers browsing, emphasizing the market's vibrant atmosphere."
Here we're seeing whether the AI can handle a dynamic, human-centric scene with lots of movement and interaction. It also needs to maintain the look of the starting image throughout the clip.
Midjourney prompt: "A young woman recording a vlog in a cozy, well-lit room filled with plants, books, and soft decor, with a ring light and camera set up in front of her."
Runway motion prompt: "The camera follows the influencer as she moves around her cozy room, adjusting the lighting and camera, then begins recording her vlog. The shot focuses on her facial expressions and the warm, inviting atmosphere of the space."
With this prompt, we're testing whether the AI can simulate human expression and whether Runway can handle hand motion. It was good but not perfect, with an element of unreality.
Midjourney prompt: "A scenic train ride through a mountainous landscape during the golden hour, with passengers gazing out the window at the breathtaking view."
Runway motion prompt: "The camera starts inside the train, focusing on passengers looking out the window with golden hour lighting, then shifts to a view outside, capturing the beautiful mountainous scenery as the train glides through the landscape during the golden hour."
This test requires Runway to create a transition from one shot, inspired by the image, to the next while maintaining the same feel. It came close, but I think the result would have been better as a text-to-video generation without the starting image.
Midjourney prompt: "A vibrant outdoor music festival at dusk, with a large crowd of people dancing, colorful lights illuminating the stage, and a band performing energetically."
Runway motion prompt: "The camera sweeps over the energetic crowd at a music festival, capturing the lively dancing and flashing stage lights as the band performs. The focus moves from the stage to the crowd, highlighting the collective excitement and energy of the event."
Finally, we want to see how Runway handles a complex, high-energy scene with multiple points of motion. It did a great job, although the dancers all looked a little similar.
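For reference, all five tests are easy to run as a batch against the hypothetical client from the earlier sketch; the motion prompts are abbreviated here for space and the image names are placeholders:

```python
# Hypothetical batch run of the five tests above, reusing `client`
# from the earlier sketch. Prompts abbreviated; images are placeholders
# (the API likely expects hosted URLs or uploaded assets, not local paths).
pairs = [
    ("ancient_oak.png", "The camera starts at the base of the ancient oak..."),
    ("village_market.png", "The camera moves through the lively village market..."),
    ("vlogger_room.png", "The camera follows the influencer as she moves..."),
    ("train_ride.png", "The camera starts inside the train..."),
    ("music_festival.png", "The camera sweeps over the energetic crowd..."),
]

tasks = [
    client.image_to_video.create(
        model="gen3a_turbo",   # assumed Turbo model id
        prompt_image=image,    # the Midjourney starting frame
        prompt_text=motion,    # the matching Runway motion prompt
        duration=10,
    )
    for image, motion in pairs
]
```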
Being able to rapidly generate a video is a significant change for Runway. It also points to a potential future higher-resolution mode, where quick draft generations could be upscaled.
AI video has come a very long way in just a year. We've reached a point where a short film assembled from ten-second clips can look almost real, and every new model generation improves image and motion realism.
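As a rough illustration of that assembly step, ten-second clips can be joined losslessly with ffmpeg's concat demuxer (ffmpeg installed separately; file names are placeholders):

```python
# Stitch several ten-second Gen-3 clips into one short film using
# ffmpeg's concat demuxer. Assumes ffmpeg is on the PATH and all clips
# share the same codec and resolution; file names are illustrative.
import subprocess

clips = ["scene1.mp4", "scene2.mp4", "scene3.mp4"]

# The concat demuxer reads a text file listing the inputs in order.
with open("clips.txt", "w") as f:
    f.writelines(f"file '{c}'\n" for c in clips)

subprocess.run(
    ["ffmpeg", "-f", "concat", "-safe", "0", "-i", "clips.txt",
     "-c", "copy", "short_film.mp4"],  # stream copy, so no re-encoding
    check=True,
)
```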
Turbo simply makes the whole process faster, allowing for rapid iteration, which is useful because the ratio of unusable to usable clips is still about 5:1.