OpenAI Releases New AI Model That Generates 60-Second Videos from a Single Sentence!
From holding seamless conversations and writing code to passing Google engineering interviews, OpenAI’s generative AI has already demonstrated many capabilities. Now it has mastered a new skill: video creation. OpenAI’s latest AI model, “Sora,” lets users generate realistic videos up to one minute long from just a single sentence.
“Introducing Sora, our text-to-video model. Sora can generate videos up to a minute long while maintaining visual quality and adherence to the user’s prompt,” OpenAI stated on its official website.
The videos Sora generates are strikingly realistic, showcasing OpenAI’s latest video generation technology.
AI-generated video is not new: tech giants such as Google and Meta, as well as startups such as Pika Labs, have already released video generation tools. What sets Sora apart is its exceptional realism.
According to Wired, other AI video generation models have not reached this level of realism, and Sora’s videos are also longer than those produced by other models.
According to OpenAI’s official website, Sora can generate complex scenes with multiple characters, specific types of motion, and accurate details. The model understands not only the objects mentioned in a prompt but also how those objects exist in the physical world, producing strikingly realistic results.
Sora also has a deep understanding of language, allowing it to interpret prompts accurately and generate compelling characters. It can create multiple shots within a single video while keeping the characters and visual style consistent.
OpenAI has also showcased numerous demonstration videos on its website. One clip, for example, shows a woman walking down a street in Tokyo. Its prompt reads:
“A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about.”
Although this one-minute video has some flaws, such as inconsistent sign text and road layouts and unnaturally smooth pedestrian movement, it looks highly realistic at first glance. A viewer focused on the fashionable woman might not immediately realize that the entire video was generated by AI.
Sora is not limited to realistic modern scenes; it can also imitate the look of historical footage. Given the prompt “Historical footage of California during the gold rush,” for example, Sora produces a video with a vintage, nostalgic feel. On closer inspection, however, inconsistencies remain in details such as the layout of the buildings.
OpenAI acknowledges that the current model may struggle to accurately simulate the physics of complex scenes and may not understand specific instances of cause and effect. For instance, if Sora generates a video of a person eating a cookie, the person might take a bite, yet the cookie may show no bite mark afterward. Sora may also confuse left and right and struggle to precisely depict events that unfold over time.
OpenAI has not said exactly how long it takes to generate such realistic videos, only that it takes roughly as long as it would to “go out and have a meal.”
Sora also has some capabilities that have not yet been publicly demonstrated, such as generating videos from still images, filling in missing frames of existing videos, and extending videos. OpenAI researcher Bill Peebles said, “This is a really cool way to enhance storytelling abilities. You can visualize an idea and make it a reality.”
For now, Sora is unlikely to revolutionize the film industry: because each generated clip differs from the others, 120 one-minute clips cannot simply be strung together into a coherent movie. For short-video platforms such as TikTok, however, it could be a disruptive new tool, allowing ordinary people to generate high-quality short videos with AI.
The general public will still have to wait to use Sora! OpenAI is working with various stakeholders to address safety concerns.
Still, with video generation this realistic, what happens if bad actors use it to create fake news? This is one reason OpenAI has not released Sora to the public. Currently, the model is available only to red teamers (experts who test a system by simulating attacks) and a select few visual artists, designers, and filmmakers.
OpenAI says it is developing tools to detect misleading content and plans to embed metadata in generated videos, similar to the metadata it already adds to images generated with DALL-E. OpenAI also says it will apply usage policies similar to those of DALL-E 3, rejecting prompts that request celebrity likenesses or violent, sexual, or hateful content.
OpenAI says it is engaging policymakers, educators, and artists around the world to understand their concerns and identify positive use cases. “Despite extensive research and testing, we cannot predict all of the beneficial ways people will use our technology, nor all the ways people will abuse it,” the company wrote on its official website. “That’s why we believe that learning from real-world use is a critical component of creating and releasing increasingly safe AI systems over time.”
Sources:
OpenAI, Wired, The Verge