OpenAI Surges Ahead with New Video Generation Model
The rapid pace of AI development shows no sign of slowing. OpenAI has just unveiled a preview of one of the year's most anticipated advances in the field.
Meet Sora, OpenAI's new generative video model. Sora can create realistic and imaginative scenes from text prompts, producing videos up to a minute long while maintaining visual quality and staying faithful to the user's input. According to OpenAI, though, Sora is more than just a text-to-video model.
We’re teaching AI to understand and simulate the physical world in motion, with the goal of training models that help people solve problems that require real-world interaction.
What does it look like?
OpenAI has shared a selection of sample videos, so let's take a look. The close-up of a man shows a range of facial expressions, just as the prompt describes, and the quality and level of detail are striking. It clearly outpaces video generation rivals such as Runway, Google, and Meta, whose models produce shorter, lower-quality clips by comparison.
Sora is able to generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background. The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world.
That's OpenAI's claim, and another remarkable sample video backs it up with high-definition visuals and intricate detail. Look closely, though, and the moving cars that pass behind trees do not reappear accurately, despite OpenAI's assertion that Sora handles occlusions well.
Here is one more hand-picked video showcasing the model's ability to understand the physical world.
Weaknesses?
OpenAI acknowledges that the current model has weaknesses, which suggests it could be some time before it is generally available.
The current model has weaknesses. It may struggle with accurately simulating the physics of a complex scene, and may not understand specific instances of cause and effect. For example, a person might take a bite out of a cookie, but afterward, the cookie may not have a bite mark. The model may also confuse spatial details of a prompt, for example, mixing up left and right, and may struggle with precise descriptions of events that take place over time, like following a specific camera trajectory.
The release date?
We may not learn more for a while. Today's unveiling of Sora is a technology preview, and the company says it has no immediate plans to release it publicly. Instead, OpenAI is beginning to share the model with third-party safety evaluators today.
Today, Sora is becoming available to red teamers to assess critical areas for harms or risks. We are also granting access to a number of visual artists, designers, and filmmakers to gain feedback on how to advance the model to be most helpful for creative professionals. We’re sharing our research progress early to start working with and getting feedback from people outside of OpenAI and to give the public a sense of what AI capabilities are on the horizon.