New AI Video Generator is Sora-level
TLDR: A new AI video generator, Kling, has been announced, rivaling Sora in realism and coherence. It can generate videos up to 2 minutes long at 1080p and 30 fps, and the showcased examples are impressive, from a parrot's feathers to a chef chopping onions, with minor flaws in how it renders straight-line objects. Kling combines a diffusion model with a Transformer architecture, promising more consistent and context-aware video generation. However, access is currently limited to users in China or with a Chinese phone number.
Takeaways
- 😲 A new video generator called Kling has been announced, which is claimed to be as realistic as Sora.
- 🎥 Kling can generate videos up to 2 minutes long at 30 frames per second in Full HD (1080p).
- 📹 The video generator produces high-quality results with realistic movements and details, such as a boy eating a cheeseburger and a man eating noodles.
- 🌟 Kling's technology includes a 3D spatial-temporal joint attention mechanism for better modeling of complex motions.
- 🤖 The tool uses a diffusion and Transformer architecture, combining the strengths of both for improved context understanding and video coherence.
- 🎨 It can simulate physical properties of the real world, generating videos that adhere to the laws of physics.
- 🔍 However, there are some flaws, such as inconsistencies in linear objects like highway signs and guitar strings.
- 📡 Kling also features 3D face and body reconstruction technology, allowing users to apply dance movements to a single photo.
- 🚫 Currently, Kling is not publicly available; interested users can join a waitlist but may need a Chinese phone number or location.
- 🌐 The company's website is in Chinese, and the video generation feature is accessed through the Kuaiying app.
- 📊 The video concludes with a comparison of Kling to other video generators like Sora and Google's Veo, and asks viewers for their opinion on the new technology.
Q & A
What is the new video generator being discussed in the script?
-The new video generator being discussed is called Kling, which is claimed to produce highly realistic video results comparable to Sora.
What are the capabilities of the Kling video generator?
-Kling can generate up to 2 minutes of video at 30 frames per second and Full HD (1080p) resolution. It uses a 3D spatial-temporal joint attention mechanism and a diffusion Transformer architecture for better context understanding and realistic results.
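To put the stated limits in perspective, the arithmetic can be sketched in a few lines of Python. The duration, frame rate, and resolution are the figures quoted in the video; the 1920x1080 pixel dimensions are the standard meaning of Full HD, and the variable names are purely illustrative.

```python
# Figures quoted in the video: up to 2 minutes, 30 fps, Full HD (1080p).
DURATION_SECONDS = 2 * 60
FRAMES_PER_SECOND = 30
WIDTH, HEIGHT = 1920, 1080  # standard Full HD dimensions (assumes 16:9 output)

total_frames = DURATION_SECONDS * FRAMES_PER_SECOND
pixels_per_frame = WIDTH * HEIGHT
total_pixels = total_frames * pixels_per_frame

print(f"frames in a maximum-length clip: {total_frames}")
print(f"pixels per frame: {pixels_per_frame:,}")
print(f"pixels across the whole clip: {total_pixels:,}")
```

A maximum-length clip therefore contains 3,600 frames, all of which have to stay consistent with one another.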
How does the Kling video generator handle high movement or high action videos?
-Kling can generate videos with a lot of movement and action, with considerable freedom in what the scene does, unlike some older video generators, which are mostly limited to a stationary scene with panning or zooming.
What are some of the examples given to showcase the realism of Kling?
-Examples include a closeup of a bright blue parrot, a Chinese boy eating a cheeseburger, a man eating noodles with chopsticks, a boy riding a bike through changing seasons, and a giant panda playing guitar by a lake.
What are some of the flaws or inconsistencies found in the Kling video generator?
-Some flaws include issues with linear objects like highway signs and bars not appearing straight, slight inconsistencies with hand movements, and the night sky in one example not appearing realistic.
How does Kling compare to other video generators like Runway, Pika Labs, and Sora?
-Kling is said to be more advanced, producing more consistent and realistic results, especially with high movement and action. It also seems to understand real-world physics better than some of the older generators like Runway and Pika Labs.
What is the significance of the 'diffusion Transformer architecture' used in Kling?
-The diffusion Transformer architecture is significant because it combines a diffusion model with a Transformer, the type of model used in large language models. This gives Kling better context understanding and more consistent results.
How can one access the Kling video generator?
-At the time of the video, Kling is not publicly available. Interested users can join a waitlist through a Chinese app called Kuaiying, but it requires a Chinese phone number or location.
What is the sponsor mentioned in the script and what do they offer?
-The sponsor mentioned is Brilliant, which offers interactive lessons in math, data analysis, programming, and AI. It is designed to help users learn effectively with a hands-on problem-solving approach.
What are some of the creative examples generated by Kling?
-Creative examples include a macro-lens shot of a volcano erupting in a coffee cup and a man and woman walking under a starry sky with stars moving in the background, showcasing Kling's ability to create effects it is unlikely to have seen directly in its training data.
What is the main limitation for accessing Kling as mentioned in the script?
-The main limitation is that Kling is not yet available to the public: access is only possible by joining a waitlist via the Kuaiying app, which requires a Chinese phone number or location.
Outlines
🚀 Introduction to Kling: A New Video Generator Competitor
The script introduces Kling, a new video generator that rivals Sora in realism and coherence. It highlights Kling's ability to generate up to 2 minutes of Full HD video at 30 frames per second. The narrator showcases various examples of Kling's output, emphasizing the realism in details such as a parrot's feathers, a boy eating a cheeseburger, and a man eating noodles with chopsticks. The script also notes Kling's capacity for generating high movement and action videos, which sets it apart from older generators like Runway and Pika Labs.
🔍 Analyzing Kling's Realism and Flaws in Generated Videos
This section analyzes Kling's video outputs, pointing out the realism in various scenarios such as a panda playing guitar, a rabbit in a café, and an angelfish in an underwater habitat. It also addresses some flaws observed in linear elements like highway signs and guitar strings, suggesting that Kling, like other diffusion models, struggles to generate consistent straight-line structures. However, the overall impression is one of high-quality video generation that respects real-world physics and lighting.
🌟 Sponsor Highlight: Brilliant Interactive Learning Platform
The script transitions to a sponsored segment promoting 'Brilliant', an interactive learning platform offering lessons in math, data analysis, programming, and AI. The narrator discusses the platform's effectiveness, highlighting its first principles approach and hands-on problem-solving. The sponsor's offer includes a free 30-day trial and a 20% discount on an annual premium subscription for viewers.
🎥 Exploring Kling's Advanced Features and Limitations
The script returns to discussing Kling, focusing on its multi-resolution support and the seamless transitions it can create in long video generations. It acknowledges some inconsistencies in linear elements and leaves, particularly in longer videos. The narrator explains Kling's use of a 3D spatial-temporal joint attention mechanism for better motion modeling and adherence to physical laws, suggesting that it combines the strengths of diffusion and Transformer models for superior context understanding and video generation.
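The idea behind joint spatial-temporal attention can be illustrated with a toy example. The sketch below is not the generator's actual implementation, which has not been published in detail; it assumes a single attention head, random projection weights, and a tiny grid of video tokens, and every function and variable name is hypothetical. What it shows is the general technique: tokens from all frames and all spatial positions are flattened into one sequence, so each token can attend to every other token across both space and time.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def joint_spacetime_attention(video_tokens, w_q, w_k, w_v):
    """Single-head attention over all (frame, row, column) tokens at once.

    video_tokens has shape (T, H, W, D): T frames, an H x W grid of
    patches per frame, and a D-dimensional embedding per patch.
    """
    t, h, w, d = video_tokens.shape
    # Flatten time and space into one sequence of T*H*W tokens.
    tokens = video_tokens.reshape(t * h * w, d)
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    # Every token scores every other token, regardless of frame or position.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    out = weights @ v
    return out.reshape(t, h, w, -1), weights

# Toy input: 4 frames, a 3x3 patch grid, 8-dimensional embeddings.
rng = np.random.default_rng(0)
T, H, W, D = 4, 3, 3, 8
tokens = rng.standard_normal((T, H, W, D))
w_q, w_k, w_v = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))

out, weights = joint_spacetime_attention(tokens, w_q, w_k, w_v)
print(out.shape)      # (4, 3, 3, 8)
print(weights.shape)  # (36, 36): each of the 36 tokens attends to all 36
```

Because the attention matrix grows with the square of the number of tokens, real systems typically work on compressed latent representations rather than raw pixels; that detail is a general observation about this family of models, not a claim about this particular product.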
🚧 Accessing Kling and Its Future Prospects
The final section addresses the current limitations in accessing Kling, noting that it is not yet publicly available but has a waitlist for those with a Chinese phone number or location. The narrator guides potential users on how to join the waitlist through the Kuaiying app and expresses frustration at the lack of public access to these advanced video generators. The script concludes with a teaser for future updates on Kling's availability and an invitation for viewer feedback in the comments section.
🔗 Additional Resources for AI Tools and Job Opportunities
In the closing remarks, the narrator mentions a website built for discovering AI tools and job opportunities in AI, machine learning, and data science, directing viewers to check out 'ai-search' for more information.
Keywords
💡Sora
💡Kling
💡Video Generator
💡Realism
💡Coherence
💡Diffusion Model
💡Transformer Architecture
💡3D Spatial-Temporal Joint Attention Mechanism
💡Full HD
💡Aspect Ratio
💡Waitlist
Highlights
A new video generator, Kling, has been announced, competing with Sora in realism and coherence.
Kling can generate up to 2 minutes of Full HD video at 30 frames per second.
The video generator produces highly realistic results, comparable to Sora.
Examples include a closeup of a parrot, a boy eating a cheeseburger, and a man eating noodles with chopsticks.
Kling handles high movement and action in videos, unlike some older video generators.
The tool can generate videos with changing seasons, maintaining consistency and realism.
Kling uses a 3D spatial-temporal joint attention mechanism for better motion modeling.
The video generator is based on a diffusion and Transformer architecture, combining the strengths of both models.
Kling can simulate the physical properties of the real world, generating videos that conform to the laws of physics.
With a single photo, Kling can generate vivid singing and dancing videos.
The video generator is currently not available for public use but has a waitlist.
To join the waitlist, users need a Chinese phone number or location.
Kling's capabilities are showcased through a series of demo videos.
Some flaws are noted, such as inconsistencies with linear objects like highway signs.
The video generator understands and reflects real-world physics, such as light reflections.
Kling's realism is compared to Sora's, with some suggesting it may be superior.
The video concludes with a call to action for viewers to share their thoughts on the new video generator.