Chinese Company Unveils SORA Competitor - "Vidu" AI Video Generator

AI Search
28 Apr 2024 11:37

TLDR

A Chinese company, Shengshu Technology, has announced a new AI video generator, Vidu, positioned as a competitor to Sora. Vidu is built on a self-developed architecture called Universal Vision Transformer (U-ViT), which combines diffusion and Transformer models with the aim of producing more coherent and accurate video. The company claims that Vidu can generate a 16-second 1080p video clip with a single click. The results in Vidu's show reel look impressive and more realistic than current competitors such as Runway and Pika, but the comparison with Sora is not definitive because Sora has not yet been released. The core technology of U-ViT was reportedly proposed by Vidu's team before Sora's model architecture. Interested users can apply to use Vidu through the company's website, shengshu-ai.com. Vidu and other recent announcements from China highlight how global the competition in AI has become.

Takeaways

  • 🎉 A Chinese company, Shengshu Technology, has announced a new AI video generator called Vidu, which it claims is a competitor to Sora.
  • 🚀 Vidu can reportedly generate a 16-second 1080p video clip with a single click and is built on a self-developed architecture called Universal Vision Transformer (U-ViT).
  • 🤖 The architecture combines two AI models: diffusion and Transformer, which is considered an advancement in generative AI.
  • 📈 Vidu's core technology was first proposed by its research team in September 2022, predating Sora's model architecture.
  • 👀 The Transformer model is known for its ability to understand context, which should theoretically improve the coherence of generated videos.
  • 🆚 A side-by-side comparison with SORA shows that while Vidu produces high-quality videos, there are some inconsistencies and lower resolution compared to SORA.
  • 🌟 Vidu's video generation capabilities are showcased in its show reel, which includes realistic hand movements and detailed imagery.
  • 📹 Vidu's videos, while impressive, do have some noticeable flaws, such as inconsistent transformations and inaccuracies in object representation.
  • 📝 The article from Global Times suggests that Vidu can output 1080p videos, which were not fully showcased in the provided examples.
  • 🌐 Interested users can apply to use Vidu through the website shengshu-ai.com, where they can leave their contact information for further assistance.
  • 📈 There has been a surge of AI advancements from China recently, with new language models and robots being unveiled, indicating a competitive edge in the global AI race.
  • 💬 The video concludes by encouraging viewers to share their thoughts on Vidu and whether they plan to apply for access, fostering a sense of community and engagement.

Q & A

  • What is the name of the Chinese company that announced the AI video generator?

    -The Chinese company that announced the AI video generator is Shengshu Technology (Shengshu AI).

  • What is the name of the AI video generator Shengshu announced?

    -The name of the AI video generator is 'Vidu'.

  • What is the Universal Vision Transformer (U-ViT)?

    -The Universal Vision Transformer (U-ViT) is a self-developed visual transformation model architecture that integrates two text-to-video AI models, diffusion and Transformer, and is presented as the next step in generative AI.

  • How does the Vidu AI video generator compare to Sora in terms of video quality?

    -While Vidu generates high-quality videos and has some advantages such as generating hands well, it is not yet on par with Sora in terms of video quality and consistency, as shown in the side-by-side comparisons.

  • What are some limitations of the stable diffusion model?

    -Some limitations of the stable diffusion model include its inability to generate text very well and its difficulty in understanding context or following more complicated prompts.

  • How does the Transformer model contribute to the improvement of the diffusion model?

    -The Transformer model, known for its ability to understand context, is merged with the diffusion model to create more coherent and accurate videos or images.

  • What is the significance of the merger between the Transformer model and the diffusion model?

    -The merger is significant because it is considered the next step in generative AI, potentially overcoming the limitations of the diffusion model alone.

  • What is the role of the Institute of AI at Tsinghua University in the development of U-ViT?

    -The Institute of AI at Tsinghua University took part in the research and development of U-ViT. Its vice dean, Zhu Jun, is also chief scientist at Shengshu.

  • How can one apply to use the Vidu AI video generator?

    -To apply to use the Vidu AI video generator, one can visit the website shengshu-ai.com and fill out the application form with a name, phone number, and company name. A marketing consultant is then expected to get in touch.

  • What are some other recent advancements in AI from China?

    -Other recent advancements from China include the launch of SenseNova 5.0 by the Chinese company SenseTime, which the company claims outperforms GPT-4 Turbo on nearly all benchmarks, and the unveiling of the S1 robot by the company Astribot.

  • How does the global AI community view the competition between Vidu and Sora?

    -The global AI community generally views competition positively as it drives innovation and improvement in AI technology. The emergence of Vidu as a potential close competitor to Sora is seen as a positive development.

  • What is the current resolution capability of the Vidu AI video generator?

    -The Vidu AI video generator is reportedly capable of outputting videos in 1080p resolution, although the examples shown in the video were in 720p.

Outlines

00:00

🚀 Introduction to Shengshu's AI Video Generator

The video introduces a new AI video generator developed by a Chinese company called Shengshu. The generator, named Vidu, is claimed to be a competitor to the AI tool Sora. The video showcases a show reel from Shengshu and discusses the capabilities of Vidu, which is built on a self-developed architecture called Universal Vision Transformer (U-ViT). U-ViT combines the strengths of the diffusion model and the Transformer model, which is known for its context understanding capabilities. The video also compares Vidu with Sora, noting that while Vidu has some advantages, it does not yet appear to be on par with Sora, which itself has not been released.

05:01

📊 Comparative Analysis of Vidu and Sora

The second part provides a side-by-side comparison of Vidu's and Sora's video generation capabilities. It discusses the quality and realism of the videos produced by both AI tools. The video points out some flaws in Vidu's output, such as inconsistencies in hair transformation and the disappearance of elements like a green leaf. It also contrasts Vidu's generated videos with Sora's, noting that Vidu's show reel is not in full HD, so its details look less crisp than Sora's. This part also covers how to access Vidu through the website shengshu-ai.com and the process of applying for its use.

10:03

🌏 Global AI Competition and Recent Chinese Innovations

The final part of the video shifts the focus to the broader context of global AI competition. It emphasizes recent AI advancements from China, including a new language model and the fast S1 robot from a company called Astribot. The speaker expresses excitement about the unveiling of Vidu and its potential to be a close competitor to Sora. The speaker also encourages viewers to share their thoughts on Vidu and whether they will apply for access. The video concludes with a call to action for viewers to like, share, subscribe, and stay tuned for more content.

Keywords

💡AI Video Generator

An AI video generator is a type of software that uses artificial intelligence to automatically create videos based on certain inputs or prompts. In the context of the video, the AI video generator named 'Vidu' is presented as a competitor to 'SORA', claiming to produce high-quality video clips with a single click.

💡Universal Vision Transformer (U-ViT)

U-ViT refers to a self-developed visual transformation model architecture that integrates two AI models: diffusion and Transformer. It is central to the Vidu AI video generator's technology and is intended to produce more coherent and accurate videos by combining the strengths of both models.

💡Diffusion Model

The diffusion model is a type of generative model used in AI for creating new data samples similar to the training data. It is one of the components merged with the transformer model in the Vidu's architecture, aiming to overcome limitations of previous models in generating text and understanding context.
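As a rough illustration of what "diffusion" means here, the sketch below shows the forward (noising) step used in standard DDPM-style diffusion models: clean data is blended with Gaussian noise according to a schedule, and a network is later trained to reverse that corruption. This is a generic textbook formulation, not Vidu's actual code; the linear schedule, the step count of 1000, and all function names are assumptions chosen for illustration.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_1..beta_T, evenly spaced (an illustrative choice)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar_t is the running product of (1 - beta_s) for s <= t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(x0, t, alpha_bars, rng):
    """Sample x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    signal = math.sqrt(alpha_bars[t])
    noise = math.sqrt(1.0 - alpha_bars[t])
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in x0]

rng = random.Random(0)                      # fixed seed so runs are repeatable
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
pixels = [0.5, -0.25, 1.0, 0.0]             # hypothetical pixel values
slightly_noisy = add_noise(pixels, 10, alpha_bars, rng)
mostly_noise = add_noise(pixels, 999, alpha_bars, rng)
```

Early steps keep most of the original signal, while the last step is almost pure noise. Generation runs the process in reverse, starting from noise and denoising step by step.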

💡Transformer Model

The transformer model is a machine learning architecture that is particularly good at understanding context and generating text. It was introduced in the paper 'Attention Is All You Need' by researchers at Google. In the video, it is highlighted as a key component in Vidu's technology that helps to improve the coherence of generated videos.
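The "context understanding" attributed to Transformers comes from self-attention, in which every token's output is a weighted mix of all tokens in the sequence. The sketch below is a minimal pure-Python version of scaled dot-product self-attention. It is a simplification made for illustration: real Transformers apply learned query, key, and value projections and use multiple heads, all of which are omitted here, and the toy token vectors are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Scaled dot-product attention with queries = keys = values = tokens."""
    dim = len(tokens[0])
    outputs, weights = [], []
    for query in tokens:
        # Similarity of this token to every token, scaled by sqrt(dim).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        row = softmax(scores)
        weights.append(row)
        # Output is the attention-weighted average of all token vectors.
        outputs.append(
            [sum(w * value[i] for w, value in zip(row, tokens)) for i in range(dim)]
        )
    return outputs, weights

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # three toy 2-d tokens
mixed, weights = self_attention(tokens)
```

Each row of `weights` shows how much one token draws on every other token, which is the mechanism that lets a model relate distant parts of a prompt or, in video models, distant patches and frames.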

💡Stable Diffusion

Stable Diffusion is an AI model that has been mentioned as having limitations, such as not being able to generate text very well or follow complicated prompts. It is based on the diffusion model and is used as a comparative reference to show how Vidu's technology aims to be a step ahead.

💡Shengshu AI

Shengshu AI (Shengshu Technology) is the company behind the development of the Vidu AI video generator. It is presented as a Chinese company that has unveiled a new technology in the field of generative AI, aiming to compete with other models such as Sora.

💡Generative AI

Generative AI refers to the branch of artificial intelligence that involves creating new content, such as images, videos, or text, that did not exist before. The video discusses how merging the diffusion and transformer models represents a next step in generative AI.

💡Runway and Pika

Runway and Pika are mentioned as the two best video generators currently available. They are used as a benchmark to compare the capabilities of the new Vidu AI video generator, with the suggestion that Vidu may outperform these existing solutions.

💡WeChat

WeChat is a Chinese multi-purpose messaging, social media, and mobile payment app. In the video, it is mentioned as the source of the Vidu showreel, indicating the platform's role in sharing and promoting the new technology.

💡Resolution

Resolution in the context of the video refers to the quality of the video output. The video discusses the difference in resolution between Vidu's and SORA's video clips, noting that the lower resolution of Vidu's example might affect the perceived quality.
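To put the 720p versus 1080p gap in numbers, the short sketch below compares pixels per frame. It assumes the usual 16:9 frame sizes (1280×720 and 1920×1080), which the video does not state explicitly.

```python
# Assumed 16:9 frame sizes for each resolution label.
RESOLUTIONS = {"720p": (1280, 720), "1080p": (1920, 1080)}

def pixels_per_frame(label):
    """Total pixel count of one frame at the given resolution label."""
    width, height = RESOLUTIONS[label]
    return width * height

ratio = pixels_per_frame("1080p") / pixels_per_frame("720p")
print(f"720p:  {pixels_per_frame('720p'):,} pixels per frame")
print(f"1080p: {pixels_per_frame('1080p'):,} pixels per frame")
print(f"1080p has {ratio:.2f}x the pixels of 720p")
```

Under these assumptions a 1080p frame carries 2.25 times as many pixels as a 720p frame, which is one reason a 720p show reel can look less crisp next to full HD footage.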

💡Competition in AI

The video emphasizes the importance of competition in the field of AI, suggesting that the unveiling of Vidu as a competitor to SORA is a positive development. It highlights the potential for innovation and improvement driven by companies vying to outperform each other.

Highlights

Shengshu, a Chinese company, has announced a new AI video generator called Vidu, which is positioned as a competitor to Sora.

Shengshu claims that Vidu is on par with Sora and can generate a 16-second 1080p video clip with a single click.

The technology behind Vidu is based on a self-developed visual transformation model architecture called Universal Vision Transformer (U-ViT).

U-ViT merges the diffusion and Transformer models, which is considered the next step in generative AI.

The Transformer model is known for its ability to understand context, which should make the generated content more coherent.

Zhu Jun, Vice Dean of the Institute of AI at Tsinghua University, claims that Vidu's core technology was proposed before Sora's model architecture.

Vidu's show reel demonstrates the AI's ability to generate realistic videos, including detailed elements like hands.

Comparisons between Vidu and Sora's videos show that while Vidu is impressive, Sora's results appear higher quality and more realistic.

Vidu's video generation has some noticeable flaws, such as inconsistencies in elements like hair and leaves.

The resolution of Vidu's videos is lower than Sora's, with the Vidu show reel only in 720p.

Vidu can output 1080p videos, as mentioned in the Global Times article.

To apply for access to Vidu, one can fill out a form on the website shengshu-ai.com.

The application process does not specify eligibility and requires basic contact information.

China has been making significant strides in the AI space with recent advancements in language models and robotics.

The unveiling of Vidu adds competition to the AI video generation market, which is beneficial for innovation.

The presenter expresses enthusiasm for competition and the potential of Vidu as a close competitor to Sora.

The presenter encourages viewers to share their thoughts on Vidu and whether they will apply for access.