Today Generative AI is Just a Parlor Trick

Mike Zyda
3 Mar 2024 12:31

TLDR: In his presentation, Mike Zyda discusses the current state of generative AI, describing it as a 'parlor trick' because it is limited to creating images and short, non-interactive videos. He highlights the potential of AI in the entertainment industry but emphasizes the need for significant development in areas such as performance, interface, art style control, and scalability. Zyda calls for investment in tool-building to integrate generative AI with game development and filmmaking, aiming for applications that are practically useful rather than merely visually appealing.

Takeaways

  • 🎨 Generative AI is currently seen as a novelty, often referred to as a 'parlor trick', with its main use being the creation of artistic images and short, photorealistic videos.
  • 🖼️ The technology behind generative AI, such as DALL·E and Google's Gemini, allows users to input text descriptions and receive computed images that match their requests.
  • 📉 The term 'dead cat bounce', used humorously in the talk as an image prompt, is actually a stock market term for a temporary recovery in share prices, a meaning the generated images do not reflect.
  • 🚀 OpenAI's Sora can generate short videos, though they are limited to 30 seconds or less and require detailed textual descriptions.
  • 🎥 The speaker challenges the capabilities of generative AI by suggesting it should be able to create a complete Pixar film like 'Toy Story' to truly impress.
  • 📝 The speaker desires improvements in generative AI, such as faster computation times, better interfaces, art style control, and the ability to interact with generated content.
  • 🔍 There is a need for assurance that the art styles generated do not infringe on copyrights or trademarks to avoid legal issues.
  • 🎨 The speaker envisions a future where generative AI can consistently apply a specific art style to a series of images based on a provided example.
  • 🌐 The potential scalability of large language models is questioned, with the speaker suggesting a test would be to generate a full-length film like 'Toy Story'.
  • 🔧 The speaker calls for significant investment in tool-building to make generative AI more useful for interactive games and filmed entertainment industries.

Q & A

  • What is the main argument Mike Zyda presents about generative AI in his presentation?

    -Mike Zyda argues that while generative AI is impressive and trendy for its ability to create images and short videos, it is currently more of a parlor trick than a practical tool, especially for interactive experiences and filmed entertainment.

  • What is the significance of the term 'dead cat bounce' in the context of the presentation?

    -In the presentation, 'dead cat bounce' is used as an example to illustrate the capabilities of generative AI models like DALL·E and Midjourney. It also serves as a metaphor for the temporary excitement around generative AI without substantial practical applications.

  • How does Mike Zyda describe the current limitations of generative AI in creating content?

    -Mike Zyda points out that generative AI can create short, 17-second videos and images based on text descriptions, but it struggles with creating longer, more detailed, and interactive content, such as a full-length Pixar film.

  • What are some improvements Mike Zyda suggests for generative AI to be more useful in the entertainment industry?

    -Mike Zyda suggests improvements such as faster computation times, better interfaces beyond text input, art style control, assurance that generated styles are not copyrighted, and the ability to interact with generated 3D objects and scenes.

  • What is the importance of scalability in the context of generative AI, according to Mike Zyda?

    -Scalability is important because it would allow generative AI to handle more complex tasks, such as generating a full-length film like Toy Story. Mike Zyda suggests that a test of scalability would be to see if a large language model could recreate the film in a reasonable amount of compute time.

  • How does Mike Zyda envision the future integration of generative AI with game development and filmmaking?

    -Mike Zyda envisions a future where generative AI researchers work closely with game developers and filmmakers to create tools that can produce useful content for interactive games and feature films, rather than just short, non-interactive videos.

  • What is the role of AI in user interfaces according to Mike Zyda's perspective?

    -Mike Zyda believes that AI should be integrated into user interfaces so that it can gauge the user's level of expertise and then guide, teach, or get out of the way as needed.

  • What are the challenges Mike Zyda identifies in the development of generative AI tools?

    -The challenges include the need for significant investment in tool building, the complexity of current tools like Autodesk Maya, and the reluctance of the gaming and film industries to fund the hard work required to build such development environments.

  • How does Mike Zyda propose to address the issue of art style consistency in generative AI content creation?

    -Mike Zyda suggests that users should be able to provide a digital version of a concept artist's work to the generative AI, instructing it to create subsequent images in the style of the provided artwork, ensuring a consistent artistic look.

  • What is the significance of the Oxford Languages webpage reference in the presentation?

    -The reference to the Oxford Languages webpage is used to highlight the discrepancy between the generative AI's interpretation of 'dead cat bounce' and its actual meaning, emphasizing the limitations of AI in understanding context and nuances.

Outlines

00:00

🎤 Introduction to Generative AI

Mike Zyda introduces the topic of generative AI, highlighting its current capabilities in creating artistic images and short, photorealistic videos without human artist interaction. He discusses the popularity of generative AI, mentioning models like DALL·E, Midjourney, and Google's Gemini, and their limitations, comparing them to a parlor trick. The talk outlines the potential and challenges of generative AI in the context of interactive experiences and filmed entertainment.

05:01

📝 Enhancing Generative AI for Practical Use

The speaker outlines desired improvements for generative AI, such as faster computation for concept art, a better interface beyond text input, art style control, and the ability to maintain consistency in artistic style. He also emphasizes the need for assurance that the chosen art style is not copyrighted and the potential for integrating real concept art into AI-generated content. He notes that current AI-generated videos lack interactivity and argues that the entertainment industry needs a more advanced tool for world-building.

10:02

🚀 The Future of Generative AI in Entertainment

Mike Zyda concludes by discussing the potential of generative AI to move beyond being a novelty and become a practical tool for creating interactive games and feature films. He calls for investment in tool development and the integration of AI researchers with game developers and filmmakers. The speaker mentions the need for a high-level world-building tool that simplifies the process of assigning physics properties and emotional states to characters. He also references a paper on S by Steven Levy and acknowledges the AI and generative AI researchers who contribute to the field.

Keywords

💡Generative AI

Generative AI refers to artificial intelligence systems capable of creating new content, such as images or videos, based on given input or learned patterns. In the video, it is described as a technology that can imagine artistic images and short, photorealistic videos without human artist interaction. The speaker criticizes its current capabilities as a mere 'parlor trick' due to its limitations in creating complex, interactive content like full-length films.

💡Parlor Trick

A parlor trick is a simple, often deceptive, act or device intended to impress or entertain. In the context of the video, the speaker uses this term to express skepticism about the practical applications of generative AI, suggesting that while it may create visually appealing outputs, it does not yet have the capability to be a transformative tool in the entertainment industry.

💡DALL·E

DALL·E is a generative AI model developed by OpenAI, known for creating images from textual descriptions. The video mentions DALL·E as one of the best-known models for imagining images, illustrating the technology's ability to interpret text and generate corresponding visuals, such as an image of a 'dead cat bounce'.

💡Google Gemini

Google Gemini is another generative AI model mentioned in the video, which, like DALL·E, can generate images based on text descriptions. It represents the ongoing development and competition in the field of AI-generated imagery.

💡OpenAI Sora

OpenAI Sora is a generative AI model that has recently been developed to create short, detailed, photorealistic videos. The video script highlights the limitations of Sora by challenging it to generate a full-length Pixar film like 'Toy Story', which would require a significant leap in AI capabilities.

💡Photorealistic

Photorealistic refers to images or videos that are so realistic in appearance that they could be mistaken for photographs or live footage. The video discusses the current limitations of generative AI in producing truly photorealistic content, especially in the context of short videos.

💡3D Objects and Scenes

In the context of the video, 3D objects and scenes refer to the virtual elements created for use in video games, films, and other digital media. The speaker expresses a desire for generative AI to be able to create and render these elements more efficiently and interactively, which would enhance their utility in the entertainment industry.

💡Art Style Control

Art style control pertains to the ability to dictate the visual aesthetic or artistic style of generated content. The video emphasizes the need for generative AI to offer users more control over the style of their creations, allowing for consistency and personalization in artistic output.

💡Workspace Tool

A workspace tool, as discussed in the video, is an interface that allows users to interact with and manipulate their generative AI creations. The speaker envisions a tool that would enable real-time editing and rendering of concepts, streamlining the creative process and improving efficiency.

💡Scalability

Scalability in the context of AI refers to the ability of a system or model to handle increased demand or complexity without a significant degradation in performance. The video raises questions about the scalability of generative AI models, particularly in terms of processing time and computational resources required to create more complex content.
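
As a rough back-of-the-envelope illustration of the scalability question, the sketch below estimates how many short clips would be needed to cover a feature film and how much generation time that might imply. It assumes Toy Story's theatrical runtime of 81 minutes and the 30-second clip limit cited in this summary; the per-clip generation time and the number of attempts per clip are purely hypothetical placeholders, not figures from the talk or from any vendor.

```python
import math


def clips_needed(film_minutes: float, clip_seconds: float) -> int:
    """Number of fixed-length clips needed to cover a film's runtime."""
    return math.ceil(film_minutes * 60 / clip_seconds)


def total_compute_hours(
    film_minutes: float,
    clip_seconds: float,
    minutes_per_clip: float,
    attempts_per_clip: int = 1,
) -> float:
    """Total generation time in hours, ignoring editing and continuity work."""
    clips = clips_needed(film_minutes, clip_seconds)
    return clips * attempts_per_clip * minutes_per_clip / 60


TOY_STORY_MINUTES = 81  # theatrical runtime of Toy Story
CLIP_SECONDS = 30       # clip length limit cited in this summary

# Hypothetical values, chosen only to make the arithmetic concrete.
MINUTES_PER_CLIP = 10   # assumed generation time for one clip
ATTEMPTS_PER_CLIP = 5   # assumed number of tries before a usable clip

if __name__ == "__main__":
    n = clips_needed(TOY_STORY_MINUTES, CLIP_SECONDS)
    hours = total_compute_hours(
        TOY_STORY_MINUTES, CLIP_SECONDS, MINUTES_PER_CLIP, ATTEMPTS_PER_CLIP
    )
    print(f"{n} clips, about {hours:.0f} hours of generation time")
```

Even under these optimistic assumptions, the estimate leaves out consistency of characters, art style, and story across clips, which is the harder part of the challenge the speaker raises.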

💡World Building Tool

A world building tool is a software application used to create and develop virtual environments, typically for video games or virtual reality experiences. The speaker calls for the development of such tools that integrate AI to assist in the creation of interactive and immersive worlds, making the process more accessible and efficient.

Highlights

Generative AI is currently seen as a parlor trick, with its main application being the creation of artistic images and short, photorealistic videos.

Generative AI sits on top of a large language model and generates images based on textual descriptions.

Popular generative AI models for images include DALL·E, Midjourney, and Google's Gemini.

The image models miss the actual meaning of 'dead cat bounce', which is a stock market term.

OpenAI's Sora can generate short videos, but they are limited to 30 seconds or less.

The speaker challenges generative AI by asking for a complete Pixar film like Toy Story to be generated.

The speaker suggests improvements for generative AI, such as performance enhancement, better interfaces, and art style control.

The need for assurance that chosen art styles are not copyrighted is highlighted.

The concept of using a workspace tool for iterative concept art development is proposed.

The speaker emphasizes the desire for 3D object and character interaction within the generated scenes.

The potential scalability of large language models is questioned, with the example of generating a full-length Pixar film.

The need for investment in tool-building for generative AI to be useful in interactive and filmed entertainment industries is discussed.

The speaker calls for a world-building tool that integrates AI to assist in the creation of game worlds and characters.

The importance of assigning physics properties and emotional states to characters in a high-level, user-friendly way is mentioned.

The speaker concludes that generative AI needs to move beyond being a parlor trick to become a practical tool for the entertainment industry.

The speaker suggests embedding generative AI researchers with game developers and filmmakers for more practical applications.

The speaker's paper draft is available on their webpage, and they invite others to contribute to the games column.