Say hello to a new wave of text-to-video AI apps. Yes, I know you're only just getting used to images, but do try and keep up. OpenAI's Sora came and went in a flash, KlingAI is blowing everyone's minds and RunwayML continues to delight with Gen-3. But now there's a new kid on the block. Hotshot.co is a new AI text-to-video generator, and it's free and rather good.
The service is the brainchild of Aakash Sastry, former CEO of Itsme, which was an avatar maker as recently as last year. But time moves on, pivots are pivots, and now he's turned his attention to wowing us with cool five- to ten-second video clips created from thin air. I gave it a quick once-over and it works well.
There's a basic plan that provides three free 5-second video generations a day, which is generous, even more so since you don't have to sign up to start creating. Hotshot is unusual in this regard, since almost every other AI app demands your identity and email address first.
The bad news is that, because of this free access, wait times are currently horrendous. My first attempt informed me that it would take one hour (yes, really) to complete. In fact, the real wait was considerably longer. My first video appeared after three hours, and the second will apparently be finished sometime next July. Just kidding, but you get what I mean.
Of course, you can slash this wait time by upgrading to a paid plan, which starts at $29 a month for up to 100 generations, or 29 cents a clip if you use the full allowance. The paid plans also offer an extra range of aspect ratios, including square and portrait, as well as early access to 'new features'. Note: RunwayML comes in at $15 a month with 4K resolution exports, so this could be a hard sell for Hotshot in the long run.
My first prompt was deliberately difficult: "an anthropomorphic polar bear dressed like a hip-hop star is sitting at a pavement cafe table in Paris drinking a coffee. He is sitting opposite a young lady wearing a business suit who is talking earnestly to him about something important."
The service then enhanced my prompt to: "An anthropomorphic polar bear in a flashy hip-hop outfit, including a large gold chain, is seated at a café table in Paris, sipping coffee and interacting with a young lady in a business suit. The polar bear looks contemplative, holding a coffee cup, while the young lady, gesturing animatedly, appears to be delivering a serious message. The backdrop features iconic Parisian architecture and a bustling, blurred street scene."
The results, when they eventually arrived, were fascinating, and prove just how far AI video has come since the early days. Although there were a number of glitches in the video, mainly a brown bear instead of a polar bear and the usual weirdness with hands and faces, the overall coherence and understanding of the subject matter were bang on target.
The same was true of my second generation, a lazy cat.
The implications of this level of quality from a second-tier service are huge. Once better models arrive that can deliver more accurate detail, general AI video generation is destined to go ballistic, and in some ways it already has. Interestingly, the company is offering an API, so we may see the Hotshot tech powering other applications soon.