New AI Video Generator is Sora-level

AI Search
6 Jun 2024 20:23

TLDR: A new AI video generator, Kling, has been announced, rivaling Sora in realism and coherence. It can generate 2-minute, 1080p videos at 30 fps, and its demos are impressive, from the detail of a parrot's feathers to a chef chopping onions, with minor flaws in straight-line objects. Kling combines a diffusion model with a Transformer architecture, promising more consistent and context-aware video generation. However, access is currently limited to those in China with a Chinese phone number.

Takeaways

  • 😲 A new video generator called Kling has been announced, and it is claimed to be as realistic as Sora.
  • 🎥 Kling can generate videos up to 2 minutes long at 30 frames per second in Full HD (1080p).
  • 📹 The video generator produces high-quality results with realistic movements and details, such as a boy eating a cheeseburger and a man eating noodles.
  • 🌟 Kling's technology includes a 3D spatial-temporal joint attention mechanism for better modeling of complex motion.
  • 🤖 The tool uses a diffusion and Transformer architecture, combining the strengths of both for improved context understanding and video coherence.
  • 🎨 It can simulate physical properties of the real world, generating videos that adhere to the laws of physics.
  • 🔍 However, there are some flaws, such as inconsistencies in straight-line objects like highway signs and guitar strings.
  • 📡 Kling also features 3D face and body reconstruction technology, allowing users to apply dance movements to a single photo.
  • 🚫 Kling is not yet publicly available; interested users can join a waitlist, but doing so may require a Chinese phone number or location.
  • 🌐 The company's website is in Chinese, and the Kuaiying app is used to access the video generation feature.
  • 📊 The video concludes with a comparison of Kling to other video generators like Sora and Google's Veo, and a call for public opinion on the new technology.

Q & A

  • What is the new video generator being discussed in the script?

    -The new video generator is called Kling. It is presented as producing highly realistic video results comparable to Sora.

  • What are the capabilities of the Kling video generator?

    -Kling can generate up to 2 minutes of video at 30 frames per second and Full HD (1080p) resolution. It uses a 3D spatial-temporal joint attention mechanism and a diffusion Transformer architecture for better context understanding and realistic results.

  • How does the Kling video generator handle high movement or high action videos?

    -Kling can generate high-movement, high-action videos with considerable freedom, unlike some older video generators, which are mostly limited to panning or zooming over a nearly static scene.

  • What are some of the examples given to showcase the realism of Kling?

    -Examples include a closeup of a bright blue parrot, a Chinese boy eating a cheeseburger, a man eating noodles with chopsticks, a boy riding a bike through changing seasons, and a giant panda playing guitar by a lake.

  • What are some of the flaws or inconsistencies found in the Kling video generator?

    -Some flaws include straight-line objects like highway signs and bars not appearing straight, slight inconsistencies in hand movements, and the night sky in one example not appearing realistic.

  • How does Kling compare to other video generators like Runway, Pika Labs, and Sora?

    -Kling is said to be more advanced, producing more consistent and realistic results, especially with high movement and action. It also seems to understand real-world physics better than some of the older generators like Runway and Pika Labs.

  • What is the significance of the 'diffusion Transformer architecture' used in Kling?

    -The diffusion Transformer architecture combines a diffusion model with a Transformer, the architecture used in large language models. This gives Kling better context understanding and more consistent results.

  • How can one access the Kling video generator?

    -At the time of the video, Kling is not publicly available. Interested users can join a waitlist through a Chinese app called Kuaiying, but it requires a Chinese phone number or location.

  • What is the sponsor mentioned in the script and what do they offer?

    -The sponsor is Brilliant, which offers interactive lessons in math, data analysis, programming, and AI. It is designed to help users learn effectively with a hands-on problem-solving approach.

  • What are some of the creative examples generated by Kling?

    -Creative examples include a macro-lens shot of a volcano erupting in a coffee cup and a man and woman walking under a starry sky with stars moving in the background, showcasing Kling's ability to create effects even without specific training data.

  • What is the main limitation for accessing Kling as mentioned in the script?

    -The main limitation is that Kling is not yet open to the public: it is only accessible by joining a waitlist via the Kuaiying app, which requires a Chinese phone number or location.

Outlines

00:00

🚀 Introduction to Kling: A New Video Generator Competitor

The script introduces Kling, a new video generator that rivals Sora in realism and coherence. It highlights Kling's ability to generate up to 2 minutes of Full HD video at 30 frames per second. The narrator showcases various examples of Kling's output, emphasizing the realism in details such as a parrot's feathers, a boy eating a cheeseburger, and a man eating noodles with chopsticks. The script also notes Kling's capacity for generating high-movement, high-action videos, which sets it apart from older generators like Runway and Pika Labs.

05:02

🔍 Analyzing Kling's Realism and Flaws in Generated Videos

This section analyzes Kling's video outputs, pointing out the realism in scenarios such as a panda playing guitar, a rabbit in a café, and an angelfish in an underwater habitat. It also addresses flaws in straight-line elements like highway signs and guitar strings, suggesting that Kling, like other diffusion models, struggles to keep such structures consistent. The overall impression, however, is of high-quality video generation that respects real-world physics and lighting.

10:03

🌟 Sponsor Highlight: Brilliant Interactive Learning Platform

The script transitions to a sponsored segment promoting 'Brilliant', an interactive learning platform offering lessons in math, data analysis, programming, and AI. The narrator discusses the platform's effectiveness, highlighting its first principles approach and hands-on problem-solving. The sponsor's offer includes a free 30-day trial and a 20% discount on an annual premium subscription for viewers.

15:05

🎥 Exploring Kling's Advanced Features and Limitations

The script returns to discussing Kling, focusing on its multi-resolution support and the seamless transitions it can create in long video generations. It acknowledges some inconsistencies in straight-line elements and leaves, particularly in longer videos. The narrator explains Kling's use of a 3D spatial-temporal joint attention mechanism for better motion modeling and adherence to physical laws, suggesting that it combines the strengths of diffusion and Transformer models for superior context understanding and video generation.

20:06

🚧 Accessing Kling and Its Future Prospects

The final section addresses the current limitations in accessing Kling, noting that it is not yet publicly available but has a waitlist for those with a Chinese phone number or location. The narrator explains how to join the waitlist through the Kuaiying app and expresses frustration at the lack of public access to these advanced video generators. The script concludes with a teaser for future updates on Kling's availability and an invitation for viewer feedback in the comments section.

🔗 Additional Resources for AI Tools and Job Opportunities

In the closing remarks, the narrator mentions a website built for discovering AI tools and job opportunities in AI, machine learning, and data science, directing viewers to check out 'ai-search' for more information.

Keywords

💡Sora

Sora is OpenAI's highly realistic video generator, previously featured on the channel. The video uses it as a benchmark to illustrate the capabilities of the new video generator, Kling. For instance, the script says that Kling produces results that are 'really comparable to Sora', indicating a high level of realism and coherence in the generated videos.

💡Kling

Kling is the new video generator announced in the video, which is claimed to produce highly realistic results. It is positioned as a competitor to Sora and can generate videos up to 2 minutes long at 30 frames per second in Full HD. The script provides examples of videos generated by Kling, emphasizing the realism and detail in scenarios like a closeup of a parrot's feathers or a boy eating a cheeseburger.

💡Video Generator

A video generator is software or a system that creates video content from given prompts or scripts. In the context of the video, the term describes advanced AI technologies like Kling, Sora, and Google's Veo, which can generate highly realistic and coherent video content. The script discusses the capabilities and output quality of these video generators, such as the ability to generate videos with high movement and action.

💡Realism

Realism, in the context of the video script, refers to the lifelike quality and authenticity of the video content generated by AI systems. The script frequently comments on the realism of the videos produced by Kling, noting the accurate depiction of details such as light reflections, fur texture, and the physical movements of subjects in the videos.

💡Coherence

Coherence in the video script denotes the logical and consistent nature of the video content generated by the AI. It is mentioned in relation to how well the generated videos follow the prompts and maintain a believable sequence of events or scenes. For example, the script praises Kling for producing coherent videos where elements like the changing seasons or the movement of objects are consistently portrayed.

💡Diffusion Model

A diffusion model is a generative model that learns to turn random noise into an image or video by removing the noise step by step. The script mentions that Kling, along with other video generators like Sora and Google's Veo, likely uses a diffusion model combined with a Transformer architecture. This combination is said to improve context understanding and consistency, making the generated content more realistic and coherent.

💡Transformer Architecture

The Transformer architecture is a model used in AI, particularly in large language models, that is known for its ability to handle long-range dependencies and context. In the script, it is mentioned that Kling uses a diffusion model combined with a Transformer architecture, which contributes to its advanced capabilities in generating videos with better context understanding and fewer inconsistencies.

💡3D Spatial Temporal Joint Attention Mechanism

This term from the script refers to a specific AI technique used by Kling to better model complex spatial-temporal motion in video generation. It suggests that Kling can generate video content that adheres to the laws of physics and captures the dynamics of movement more effectively than other video generators.
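The generator's actual attention implementation is not described in the video, so the following is only a generic sketch of what "joint" space-time attention usually means: every patch of every frame is flattened into a single token sequence, so each token can attend to tokens in other positions and other frames at once. The function name `joint_attention` is hypothetical, and for brevity the queries, keys, and values are the raw tokens rather than learned projections.

```python
import math


def joint_attention(video):
    """Self-attention over all space-time tokens of a tiny video.

    video[t][y][x] is a feature vector (a list of floats) for the patch
    at frame t, row y, column x. This is an illustrative assumption,
    not the generator's real code.
    """
    # Flatten time, height and width into ONE sequence of tokens,
    # so attention is computed jointly across space and time.
    tokens = [vec for frame in video for row in frame for vec in row]
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)

    outputs = []
    for query in tokens:
        # Scaled dot-product score against every token in every frame.
        scores = [scale * sum(a * b for a, b in zip(query, key)) for key in tokens]
        # Numerically stable softmax over the whole space-time sequence.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted average of all tokens (values are the tokens themselves).
        outputs.append(
            [sum(w * tok[i] for w, tok in zip(weights, tokens)) for i in range(dim)]
        )
    return outputs
```

A factorized alternative would attend within each frame first and across frames second; attending over the flattened sequence is more expensive but lets a single step relate any patch to any other patch at any time.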

💡Full HD

Full HD, or 1080p, is the resolution at which Kling can generate videos. It indicates that the video generator can produce detailed video content at 1920x1080 pixels, which is considered standard for high-definition video.
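To put the quoted limits in perspective, the arithmetic for a maximum-length clip can be worked out directly. The sketch below only multiplies the figures stated in the video (1920x1080, 30 frames per second, 2 minutes); the function name `clip_stats` is made up for illustration.

```python
def clip_stats(width=1920, height=1080, fps=30, seconds=120):
    """Return frame and pixel counts for a clip with the given settings."""
    frames = fps * seconds                 # 30 fps * 120 s
    pixels_per_frame = width * height      # Full HD frame size
    return {
        "frames": frames,
        "pixels_per_frame": pixels_per_frame,
        "total_pixels": frames * pixels_per_frame,
    }


stats = clip_stats()
print(stats["frames"])            # 3600
print(stats["pixels_per_frame"])  # 2073600
print(stats["total_pixels"])      # 7464960000
```

So a full 2-minute clip is 3,600 frames of roughly 2 million pixels each, which is why keeping objects consistent across the whole clip is a hard problem.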

💡Aspect Ratio

Aspect ratio in the video script refers to the proportional relationship between the width and height of the video frame. The script mentions that Kling can generate videos in various aspect ratios such as horizontal, square, or vertical, providing flexibility in the output format to suit different viewing preferences or requirements.

💡Waitlist

In the context of the video script, a waitlist is a queue of people interested in trying out or accessing a new product or service, in this case the Kling video generator. The script explains that interested users can join the waitlist through the Kuaiying app, but access is currently limited to those with a Chinese phone number or location.

Highlights

A new video generator, Kling, has been announced, competing with Sora in realism and coherence.

Kling can generate up to 2 minutes of Full HD video at 30 frames per second.

The video generator produces highly realistic results, comparable to Sora.

Examples include a closeup of a parrot, a boy eating a cheeseburger, and a man eating noodles with chopsticks.

Kling handles high movement and action in videos, unlike some older video generators.

The tool can generate videos with changing seasons, maintaining consistency and realism.

Kling uses a 3D spatial-temporal joint attention mechanism for better motion modeling.

The video generator is based on a diffusion and Transformer architecture, combining the strengths of both models.

Kling can simulate the physical properties of the real world, generating videos that conform to the laws of physics.

From a single photo, Kling can generate vivid singing and dancing videos.

The video generator is currently not available for public use but has a waitlist.

To join the waitlist, users need a Chinese phone number or location.

Kling's capabilities are showcased through a series of demo videos.

Some flaws are noted, such as inconsistencies with straight-line objects like highway signs.

The video generator understands and reflects real-world physics, such as light reflections.

Kling's realism is compared to Sora's, with some suggesting it may be superior.

The video concludes with a call to action for viewers to share their thoughts on the new video generator.