While the AI image generators built into chatbots might have been grabbing most of the attention recently, the dedicated AI image engine Midjourney has been quietly improving and evolving since its launch three years ago. Now, it also features a video model.
According to Midjourney, this is another step toward an AI tool capable of simulating 3D worlds in real time. The V1 video model has been released with that ultimate goal in mind, though it’s going to take a while to get there.
The AI video maker in Midjourney works a little differently from other generators. You start with an image—either AI-generated or one you already have—and Midjourney creates a five-second animation from it. These short clips can then be extended, four seconds at a time, up to four times in total, for a maximum of 21 seconds per clip.
As usual with Midjourney, this content creation will cost you time (the Midjourney version of credits): A second of video costs the same as an image generation, and Midjourney plans start at $10 a month and go up from there.
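To make the math concrete, here’s a quick back-of-the-envelope sketch in Python, using only the figures quoted above (the five-second base clip, four-second extensions, the four-extension limit, and the one-second-of-video-costs-one-image pricing):

```python
# Clip-length and cost math for Midjourney's V1 video model,
# based only on the figures quoted in this article.
BASE_SECONDS = 5       # length of the initial animation
EXTENSION_SECONDS = 4  # each extension adds this much
MAX_EXTENSIONS = 4     # extensions allowed per clip

max_length = BASE_SECONDS + EXTENSION_SECONDS * MAX_EXTENSIONS
print(f"Longest possible clip: {max_length} seconds")  # 21 seconds

# One second of video costs the same as one image generation, so a
# full-length clip uses roughly 21 image generations' worth of time.
print(f"Full-length clip = {max_length} image-generation equivalents")
```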
Creating videos in Midjourney
To create a video in Midjourney, you first need to create an image through the web interface. Enter your prompt in the box at the top, using the sliders button to the right to set some of the options, such as the aspect ratio. Be as precise as you can in your prompt, then hit Enter (or click the send icon) to run it.
As usual, Midjourney presents you with several results from your prompt, together with options for building on them. These now include four animation options for creating a video. Your first decision is whether to go with Auto (Midjourney chooses the motion that’s added) or Manual (you describe the motion you want).
Your second decision is whether to go with Low Motion (movement is kept to a minimum) or High Motion (everything in the frame moves, and glitches are more likely). Once you’ve made your pick, you can edit your prompt again (if you’ve chosen Manual), and the video is created. As with images, you’ll see multiple variations presented.
Click on any of the generated videos, and you’ll see the same four animation options, only now they’re for extending the video further, which you can do up to four times in total. You can mix auto and manual sections, and low-motion and high-motion sections, to build up the clip you’re looking for.
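These choices all live in Midjourney’s web interface rather than in anything you script, but purely as a way to think about the workflow, here’s a hypothetical Python sketch that models a clip as a plan of segments, each with its own Auto/Manual and Low/High Motion choice (the names and structure are invented for illustration):

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical model of a Midjourney clip plan; these options are
# buttons in the web UI, not a real Midjourney API.
@dataclass
class Segment:
    mode: str                     # "auto" or "manual"
    motion: str                   # "low" or "high"
    prompt: Optional[str] = None  # motion description, manual mode only

# The initial five-second clip plus two four-second extensions.
plan = [
    Segment("manual", "high", "a flying car descends through the skyline"),
    Segment("auto", "low"),
    Segment("manual", "low", "the camera settles on a rooftop garden"),
]

BASE_SECONDS, EXTENSION_SECONDS = 5, 4
length = BASE_SECONDS + EXTENSION_SECONDS * (len(plan) - 1)
print(f"Planned clip: {len(plan)} segments, {length} seconds")  # 13 seconds
```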
You’ll find the options for downloading your video above the prompt box, on the right: You can download the raw video or a version optimized for social media (which combats some of the compression that happens when you post videos to those platforms). You can start again by clicking on the original prompt, then making changes to it.
Midjourney is an impressive AI image generator, and its videos reach the same standard. I tried creating a sci-fi cityscape and a natural landscape animation, and the end results were mostly consistent and logical, while closely following the prompt instructions. Some of the typical quirks of AI-generated video are here, like weird physics, but even at this early stage, the V1 model is polished and capable.
You can see both the limitations and advantages of the Midjourney approach in these clips: Each four-second segment flows smoothly into the next, but working in four-second bursts doesn’t leave you much room to develop a scene, and as the video progresses, you tend to lose some of the detail and richness of the original image.
Comparing Midjourney to Sora and Veo
If you’re paying OpenAI $20 or more a month for ChatGPT, then you also have access to Sora. Like Midjourney, Sora lets you start videos from an image (either AI-generated or otherwise), or with a fresh prompt.
I got Sora to build on the futuristic sci-fi city and animated landscape images I’d created in Midjourney, and got mixed results. The scenes felt more engaging, but there were more oddities in them, such as unnatural movements and glitchy backgrounds (especially with the animation, which got really weird).
You can use Sora to generate videos up to 20 seconds in length, but there’s less control over how a scene progresses than there is with Midjourney: You basically just enter your prompt and then take whatever you get back. For casual projects, at least, Midjourney feels like the more accessible tool, capable of more realistic results.
I also tried creating the same scenes with Google’s Veo 2, via the Flow online app. Like Midjourney, Flow lets you base your videos on images and extend scenes while maintaining consistency (you don’t get the same features with Veo 2 in the Gemini app). Overall, I’d say this got me the results closest to what I was looking for, though there were still some inconsistencies and oddities.
You can see that the flying car does descend in a believable way through the cityscape, and the prompt instructions are followed closely. As for the animation, flying across a cartoon-ish landscape, the results from Google Flow and Veo 2 were the best of the bunch—though again you can see that you gradually lose some of the richness and detail present in the original image.
If your AI filmmaking ambitions are a bit grander, Google’s tools might be the best fit, though again, there’s a cost: Video generation and access to Flow will set you back $20 or more a month. You can also pay $250 a month for the Google AI Ultra plan, which gets you extended access to the more advanced Veo 3 model, complete with sound (though Veo 3 can’t yet make videos based on a static image).
While this isn’t the biggest sample size, the quality of the Midjourney clips is clear to see, and the approach to video making is straightforward and intuitive. Google Veo 2 remains a better choice for overall quality, while for now Sora remains rather chaotic and unpredictable. You’re going to have to spend a lot more time with the OpenAI model to end up with passable results.