Is Vidu 1.5 the Breakthrough Generative AI Needed to Conquer Hollywood?
Nov. 14, 2024, Published 1:28 a.m. ET
From digital twins to game design development, generative AIs have made some promising progress in the media and entertainment industry, but we’re seeing a turning point. And what Shengshu is bringing to the table just might upend or empower filmmakers and Hollywood, depending on which side you’re on.
The film industry has dabbled in, or at least attempted to use, generative video. But there are caveats. When you generate a video, the output is often not what you expected — it's jumpy and obviously looks like AI.
A sneak peek at OpenAI's Sora showed just how tricky visual consistency is to deliver. It's not uncommon for the sizes and appearances of objects to change throughout a generated video, so post-production on AI-generated footage remains a practical necessity. And AI-generated videos haven't quite been able to replicate the natural look and feel of light interacting with a generated character.
Vidu 1.5 is a major update to Shengshu's video generator, and this time it's all about putting more control in the user's hands. Camera angles, character actions, and subtle expressions no longer produce jarring results. The final footage from Vidu 1.5 looks and feels like conventional video from start to finish, with fewer of the abrupt jumps and unwanted transitions typical of generative video today. It's as if you're in the director's seat directing live actors — except it's all done with text inputs instead of expensive camera equipment.
In one example, Vidu 1.5's new Multiple-Entity Consistency feature — which Shengshu says is the first of its kind — can merge separate images, even ones completely unrelated to each other. Take, for instance, a profile shot of Elon Musk, a second image of a rose-print shirt, and a third of a moped. Uploading these three images results in a cohesive video of Elon, dressed in a rose-print button-down shirt, enjoying a moped ride.
Or, in another scenario using its Multiple-Angle Consistency function, three images of a single subject — say, a model — can be uploaded from differing angles. The result? Vidu 1.5 is scarily accurate at predicting what the model might look like from any angle. Normally, the intricate details of a dress would easily throw off an AI. Vidu 1.5, on the other hand, generates a video in which the model walks, and can even turn a full 360 degrees, without breaking visual continuity — with minimal distortion to the natural flow of the dress and her facial expressions.
You can even direct the cinematography with text inputs — zooming, panning, tilting, or rotating the generated footage — and now output high-resolution footage in 720p or 1080p.
The 1.5 update also adds support for animators and 2D artists with Expanded Animation Styles and special effects — great for artists specializing in Japanese fantasy and hyper-realistic anime genres. Vidu 1.5 offers enhanced flame visuals and dynamic lighting that more accurately renders light against shadow.
Building on an already powerful multimodal text- and image-to-video model, Vidu's new upgrade is about helping anyone create. The idea isn't to replace human creativity with AI tools, but to blend technical precision with creative potential, ultimately making high-quality video production more accessible to the masses.
In fact, the firm claims the new upgrade can significantly reduce the need for manual editing. Ultimately, it addresses a major, persistent problem with generative video — the lack of a consistent look and feel. That could mean time-strapped creatives get more time for concept ideation instead of endless hours of post-production. Dare we say, Vidu's users can practically create Hollywood-style clips without a Hollywood-size budget.