Meta has introduced a family of foundation models capable of creating realistic-looking videos, rivaling OpenAI’s Sora and Google’s Veo in the emerging generative AI video competition. Two new models were revealed on Oct. 4:
- The 30B-parameter Movie Gen Video.
- The 13B-parameter Movie Gen Audio.
Both are based on Meta’s Llama 3 model. The tech giant expects to embed Movie Gen into Instagram in 2025.
What is the Movie Gen family of models?
The Movie Gen models are text-to-video and text-to-audio generative AI models. Meta claims Movie Gen can create videos up to 16 seconds long. By comparison, OpenAI’s Sora, which is not yet available to the public, can generate one-minute videos with multiple scenes, and Google’s Veo, available to select creators, can create videos about a minute long.
Movie Gen is controlled using natural language. Users describe the scene they want to see, including individual elements and the overall tone. They can also edit existing videos with text prompts, such as adding or removing parts of a scene.
The personalization aspect was enabled by “post-training procedures,” Meta said. These procedures focused the AI such that it “maintains the identity of the person while following the text prompt.” This allows users to place themselves — or someone else — into a custom-made scene.
In its initial reveal, Meta appears to be targeting Movie Gen primarily at content creators. The goal is “to help people express themselves in new ways and to provide opportunities to people who might not otherwise have them,” Meta stated in a blog post.
Lights, action, and sound
Movie Gen Audio can create music or sound effects for videos “up to several minutes long,” according to Meta’s research paper. The audio is generated at 48 kHz and can either match the action seen on screen or serve as a soundtrack.
Meta points to Llama 3 to tackle security and deepfake concerns
For businesses, rapidly generating AI-created videos could significantly reduce the time required to produce both internal and external content. On the other hand, using AI-generated content, especially without attribution, can create confusion among audiences and erode trust, as evidenced by a recent study published in the Journal of Hospitality Marketing and Management.
Perhaps in an effort to address these trust concerns, Meta added a watermark to Movie Gen’s output: a transparent “sparkle” graphic, often used to indicate AI-generated content, sits in the lower left corner of the videos.
Security and the use of generative AI to create disturbing, harmful, or misleading content are concerns, especially for business use cases where a company’s reputation could be at stake. In the announcement of Movie Gen, Meta linked to a September report on safeguarding its AI models, including the Llama 3 family. The report details the safeguards those models include against inappropriate content and notes that generated images carry both visible and invisible watermarks.