Summary
- ChatGPT Plus subscribers get 5-second Sora taste tests, not full 20-second clips.
- Sora’s videos remain hit-or-miss due to coherence issues and trouble visualizing prompts.
- Sora’s video technology feels less mature than image generation, lagging behind Veo 2.
After months of teasing, OpenAI’s Sora video generation technology is available to the public. I spent some time playing with this much-anticipated tech, and honestly came away a little underwhelmed.
ChatGPT Plus Subscribers Get a 5-Second Taste Test of Sora
Like every other ChatGPT Plus subscriber, once Sora rolled out to the public, I got access to make my own videos. However, this is more like a taste test than the real deal. People who specifically pay for Sora can make clips up to 20 seconds long and access the higher 1080p resolution. Plus subscribers get 5-second clips at up to 720p quality.
All you have to do is put your prompt into the text box, and a few seconds later you have a video clip, which is pretty much how Midjourney and other AI image generators work from the user’s perspective.
Even Short Clips Are Very Hit-and-Miss
One of the main reasons that the “full” Sora experience is limited to 20 seconds is that this technology still has significant coherence issues. The longer the video goes on, the more mistakes it makes and the more weird tangents it takes.
That issue aside, it had a hard time visualizing what I put in my prompts. For example, I asked it for a clip of a starship going into warp, which is a pretty common sci-fi trope.
Well, it’s sort of what I had in mind, but I wouldn’t put that in my half-baked YouTube talking head video.
At other times, it’s pretty spot-on, such as when I asked for a spinning chrome HTG logo.
The last bit of trouble Sora currently has is with any sort of physics. I’ve seen plenty of videos featuring animals that just don’t move in a believable way, and when I asked for something simple (a ball bearing running on a rail), it gave me this strange video.
Even when videos are visually perfect, it’s usually the motion that gives them away as AI-generated clips.
Sora Feels Much Less Mature Than Image Generation
I don’t want to create the impression that Sora isn’t impressive. It’s a major achievement, but actually using it feels like the early days of image generation. This wouldn’t be so apparent if not for Google’s precisely timed announcement of Veo 2.
The videos from that system look so much better than Sora’s, particularly when it comes to getting the physics of moving objects right.
Just check out this official compilation from Google.
While one might argue these are cherry-picked, a few YouTubers have had access to Veo 2, and the opinion seems to be that Veo 2 comes out on top by quite a margin.
For Now, It’s Just a Fun Toy
Getting to play around with Sora for a bit thanks to a subscription I already have was fun, but I certainly wouldn’t want to pay the $200-a-month fee for this product in its current state. You’d be much better off simply subscribing to a stock video service.
Looking at what Google’s cooked up, and considering that there are other competitors in this space like HeyGen and Runway ML, I expect updates and improvements to be fast and frequent, if for no other reason than that OpenAI has been relentless in improving ChatGPT.
I still see a medium-term future where AI video generation will be capable of so much more, allowing longer-form content to be generated with precise prompt adherence and elements within a scene to be edited. However, that day is still likely a few years away, and for now it’s an interesting if impractical curiosity.