A 100T Transformer Model Coming? Plus ByteDance Saga and the Mixtral Price Drop

AI Explained
18 Dec 2023 12:05

TLDR: The video discusses OpenAI employees debunking the GPT 4.5 rumors, the potential of Etched AI's custom chip for accelerating Transformer models, the competitive pricing of mixture-of-experts models such as Mixtral, and the implications of ByteDance secretly using OpenAI's technology. It also features an interview with Sébastien Bubeck on the future of AI models and their capabilities.

Takeaways

  • 🚫 OpenAI employees have denied the existence of GPT 4.5, dismissing the model's 'GPT 4.5 turbo' self-reports as a consistent hallucination.
  • 🤔 There is speculation about whether performance would improve if 'GPT 4.5 turbo' were added to system prompts.
  • 💡 Etched AI claims to have built the world's first Transformer supercomputer, designed specifically for running Transformer models.
  • 🌟 Etched AI's custom chip, code-named Sohu, is said to significantly outperform Nvidia's H100 in tokens-per-second inference.
  • 🎓 Harvard dropouts are raising millions to design an AI accelerator chip focused on large language model acceleration.
  • 📈 The rapid evolution of AI workloads poses a risk to Etched AI's specialization in Transformers, but the co-founders are confident in their choice.
  • 💰 Potential benefits of specialized hardware like Etched AI's chip include 140 times the throughput per dollar and real-time interactions with models.
  • 📉 Mistral AI's Mixtral, an 8x7 billion parameter model, is open-sourced and competitively priced, with costs dropping dramatically shortly after its announcement.
  • 🤖 ByteDance is reportedly using OpenAI's technology to develop a competing AI model, in violation of OpenAI's terms of service.
  • 🌐 ByteDance's head of research, Quanquan Gu, hints that a super strong model more powerful than Gemini is expected to arrive soon, and says the ambition goes beyond merely catching up with GPT-4.
  • 🖼️ Preview images for Midjourney V6 show an increased level of photorealism, indicating advancements in AI-generated imagery.

Q & A

  • What are the GPT 4.5 rumors that were circulating on Twitter and how did OpenAI personnel respond to them?

    -The GPT 4.5 rumors were about a supposed new version of the OpenAI model with enhanced capabilities. OpenAI personnel, including Will Depue and Sam Altman, denied the existence of GPT 4.5. Altman humorously responded 'nah' to a query about a leak, and an undercover OpenAI employee emphasized that there is no version 4.5 and advised people to be more resistant to AI hype.

  • What is the significance of Etched AI's Transformer chip and how does it differ from existing technology?

    -Etched AI's system is a new type of supercomputer designed specifically to run the Transformer architecture, which is used in large language models and other AI applications. Unlike general-purpose GPUs, which are optimized for Transformers through software, the Etched AI chip has the Transformer architecture etched into its silicon, so that every transistor can be optimized for Transformer computations. This potentially offers significant performance improvements and cost savings for AI tasks.

  • What was the strategic move made by the creators of the Etched AI chip regarding its software stack?

    -The creators plan to fully open-source the software stack. This strategy is aimed at drawing users to depend on the new hardware, which is specialized for running Transformers, potentially unlocking significant gains in performance if the architecture remains dominant in the AI space.

  • What is the current status and potential impact of Mixtral, the 8x7 billion parameter model?

    -Mixtral, the 8x7 billion parameter model, is currently open-sourced and is reported to match or beat the performance of GPT-3.5 and Gemini Pro in benchmarks. Its price has dramatically decreased, indicating a trend where AI models are not only improving in performance but also becoming more affordable, raising questions about the future cost-effectiveness of AI capabilities.

  • What was Sébastien Bubeck's perspective on the potential capabilities of a 13 billion parameter model?

    -Sébastien Bubeck, one of the lead authors of Sparks of AGI and the Phi series of models, said it is an open question what capabilities a 13 billion parameter model will achieve. However, based on the performance improvements seen at 1 billion and 3 billion parameters, and his knowledge of big models like GPT-4, he believes there is room for the reasoning part to be enabled at 13 billion parameters.

  • What is the current situation regarding ByteDance's use of OpenAI's technology?

    -ByteDance, the multi-hundred-billion-dollar company behind TikTok, is reportedly using OpenAI's technology to develop a competing AI model, code-named Project Seed. This is in direct violation of OpenAI's terms of service, which prohibit using model outputs to develop competing AI models. Internal documents confirm that ByteDance has relied on OpenAI's API during nearly every phase of development, and employees are aware of the implications, even planning to whitewash evidence of this use.

  • What did Quanquan Gu, the head of research at ByteDance, say about the future of AI models?

    -Quanquan Gu expressed uncertainty about GPT-5 but mentioned that a super strong model, more powerful than Gemini, is expected to arrive soon. He was referring to ByteDance's Project Seed when talking about open-sourcing model weights and stated that they don't intend to just catch up with GPT-4 but aim to outpace GPT-5.

  • What is the significance of the price reduction for the Mixtral model?

    -The price reduction for the Mixtral model, from $2 to as low as 27 cents per 1 million tokens, indicates a rapid decrease in the cost of accessing AI models. This trend suggests that the affordability of AI technology is improving dramatically, which could have significant implications for the widespread adoption and application of AI.

  • What is the potential impact of Etched AI's Transformer chip on the AI industry?

    -If successful, Etched AI's chip could revolutionize the AI industry by offering a claimed 140x gain in throughput per dollar compared to existing technologies like Nvidia's H100. This could enable real-time interactions with AI models at a very low cost, significantly improving the user experience and expanding the possibilities for AI applications.

  • What are the implications of the rapid decrease in the cost of AI models like Mixtral and the potential release of more powerful models?

    -The rapid decrease in cost and the potential for more powerful models suggest that by the end of 2024, we could see significant advancements in AI capabilities at a much lower cost. This could lead to broader access to advanced AI technologies and potentially transformative changes in various industries.

  • What is the significance of the photorealism in the preview images for Midjourney V6?

    -The added level of photorealism in the preview images for Midjourney V6 signifies a potential breakthrough in AI-generated imagery. While there is still a slight smoothness to the images, their realistic quality, especially when upscaled with an AI tool like Magnific, indicates advancements in AI's ability to create highly detailed and believable visual content.

Outlines

00:00

🗣️ GPT 4.5 Rumors and OpenAI Denials

The paragraph discusses the widespread rumors about GPT 4.5 on social media and the subsequent denials from OpenAI employees. It covers the model's hallucinated claims about its own version and the speculation about a potential release. Notable OpenAI members such as Will Depue and Sam Altman dismissed the rumors and emphasized that there has been no GPT 4.5 release. An OpenAI employee also shared their perspective on the low expectations for GPT 4.5 and the importance of focusing on significant improvements in AI models.

05:00

🚀 Introducing Etched AI and its Potential Impact

This paragraph introduces Etched AI, a new company claiming to have developed the world's first Transformer supercomputer, designed specifically for running the Transformer architecture. The company says it has etched the Transformer architecture onto a chip, which it claims outperforms NVIDIA's H100 in tokens-per-second inference, suggesting potential for real-time interaction with AI models. The paragraph gives the background of the company's founding by two Harvard dropouts, along with their decision to bet heavily on the Transformer architecture for large language model acceleration. The potential benefits of this specialized hardware are discussed, including significant throughput improvements and cost-effective real-time interactions with AI models.

10:00

📉 Mixtral Price Spiral and Industry Implications

The paragraph covers Mixtral, an 8x7 billion parameter model that is open-sourced and competitive with GPT-3.5 in performance and price. The rapid decrease in pricing for access to the model is highlighted, with multiple providers significantly reducing their rates within a short period. The implications of this price spiral for the future of AI and the potential for accessible, high-level reasoning capabilities at a lower cost are discussed. An interview with Sébastien Bubeck, one of the lead authors of Sparks of AGI and the Phi series of models, is mentioned, where he shares his views on the potential for reasoning capabilities in AI models and the focus on scientific quests rather than fitting models onto phones.
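To put the price spiral in numbers, here is a minimal Python sketch using the figures quoted in the Q & A above ($2 down to 27 cents per 1 million tokens). The function names and the 10-million-token workload are illustrative assumptions, not anything stated in the video.

```python
def price_drop(old_price: float, new_price: float) -> float:
    """Fractional reduction when going from old_price to new_price."""
    return (old_price - new_price) / old_price


def cost_for_tokens(price_per_million: float, tokens: int) -> float:
    """Cost in dollars of processing `tokens` tokens at a per-million price."""
    return price_per_million * tokens / 1_000_000


# Prices per 1 million tokens, as quoted in the summary.
old_price, new_price = 2.00, 0.27

# Hypothetical workload, chosen only for illustration.
workload_tokens = 10_000_000

print(f"Reduction: {price_drop(old_price, new_price):.1%}")
print(f"Old cost:  ${cost_for_tokens(old_price, workload_tokens):.2f}")
print(f"New cost:  ${cost_for_tokens(new_price, workload_tokens):.2f}")
```

On these figures the drop is roughly 86.5%, so the same workload costs a little over one eighth of what it did at the original price.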

🤫 ByteDance's Secretive Use of OpenAI Tech

This paragraph reveals that ByteDance, the company behind TikTok, has been secretly using OpenAI's technology to develop a competing AI model, code-named Project Seed. This is in violation of OpenAI's terms of service, which prohibit using model outputs to develop competing AI models. Internal documents confirm the reliance on OpenAI's API throughout Project Seed's development phases. The awareness of ByteDance employees about the implications and their plans to cover up the evidence are mentioned. The paragraph concludes with OpenAI banning ByteDance from ChatGPT due to potential data theft allegations.

📸 Preview of Midjourney V6 and Realism Advancements

The paragraph provides a sneak peek at the advancements in Midjourney V6, noting the increased level of photorealism in the images generated by the model. While there is still a slight smoothness to the images, upscaling them with tools like Magnific results in exceptionally realistic representations, particularly of human subjects. The viewer is invited to share their thoughts on the developments and the video ends with a thank you note for watching.

Keywords

💡GPT 4.5

GPT 4.5 is the rumored new version of OpenAI's Generative Pre-trained Transformer model. The video notes that OpenAI employees have denied its existence, indicating that the rumors may be unfounded or premature. This concept is central to the video as it addresses the hype and speculation surrounding AI advancements.

💡Transformers

The Transformer is a deep learning model architecture that is foundational to many AI applications, particularly in natural language processing. The video discusses a company, Etched AI, that claims to have created a supercomputing chip specifically designed for running Transformer models. This chip, code-named Sohu, is said to significantly outperform existing technologies like NVIDIA's H100 in terms of processing speed, enabling real-time interactions with AI models.

💡AI Hype

AI Hype refers to the exaggerated or speculative discussions surrounding artificial intelligence advancements. The video addresses the AI hype by debunking rumors about GPT 4.5 and discussing the realistic expectations and developments in the field. It highlights the need for a balanced view that distinguishes between confirmed advancements and speculative rumors.

💡OpenAI

OpenAI is an artificial intelligence research organization known for developing and releasing significant AI models like GPT (Generative Pre-trained Transformer). The video discusses interactions with OpenAI employees regarding the GPT 4.5 rumors and their stance on the company's AI development progress. OpenAI's role is pivotal as it is often the source of groundbreaking AI models that influence the industry.

💡Etched AI

Etched AI is a company that claims to have developed the world's first Transformer supercomputer, designed specifically to run the Transformer architecture. The company's approach involves etching the Transformer architecture directly onto a chip, which it believes can significantly enhance the performance of AI models, particularly in terms of real-time interaction capabilities.

💡Real-time Interaction

Real-time interaction refers to the ability of a system to respond to user inputs immediately, without noticeable delay. In the context of AI, this capability is crucial for creating seamless and natural conversations with AI models. The video highlights the potential of Etched AI's chip to enable real-time interactions with AI models by greatly increasing processing speed.
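The link between tokens-per-second throughput and perceived delay can be sketched with simple arithmetic. The Python below is an illustration only: the two throughput values and the 500-token reply length are hypothetical placeholders, not measurements of the H100 or of any Etched AI hardware.

```python
def generation_time(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to generate num_tokens at a steady decoding rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# Hypothetical reply length and decoding rates, chosen only for illustration.
reply_tokens = 500
rates = {"slower accelerator": 50.0, "faster accelerator": 5000.0}

for label, rate in rates.items():
    seconds = generation_time(reply_tokens, rate)
    print(f"{label}: {seconds:.2f} s for {reply_tokens} tokens")
```

At the slower hypothetical rate a full reply takes several seconds, while at the faster one it arrives in a fraction of a second, which is the regime where voice agents and side-by-side comparison of many outputs start to feel instantaneous.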

💡Open Sourcing

Open sourcing refers to the practice of making a product's software freely available for others to view, use, modify, and distribute. In the context of the video, open sourcing the software stack is seen as a strategic move by Etched AI to attract users to depend on their new hardware. This approach can foster a community around the technology and drive rapid innovation and adoption.

💡Mixtral

Mixtral is an 8x7 billion parameter model developed by Mistral AI. The model is open-sourced and is said to match or surpass the performance of GPT-3.5 not only in benchmarks but also in real-world applications. The significance of Mixtral in the video is its competitive pricing, which has seen dramatic reductions, indicating a trend towards more accessible and affordable AI technologies.
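The video does not explain how a mixture-of-experts model works internally, so the following is only a generic sketch of the idea: a router scores several expert sub-networks and only the top-scoring few are run for a given input. The toy experts, the router scores, and the choice of k=2 out of 8 experts are assumptions made for illustration, not a description of Mixtral's actual implementation.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_scores, k=2):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)[:k]
    weights = softmax([router_scores[i] for i in ranked])
    return list(zip(ranked, weights))


def moe_forward(x, experts, router_scores, k=2):
    """Run only the selected experts and mix their outputs by router weight."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_scores, k))


# Eight toy experts: expert number n simply multiplies its input by n.
experts = [lambda x, scale=scale: scale * x for scale in range(1, 9)]

# Hypothetical router scores for one input token.
scores = [0.1, 2.0, -1.0, 0.5, 1.0, 0.0, -0.3, 0.2]

print(top_k_route(scores))
print(moe_forward(1.0, experts, scores))
```

Because only k of the experts run per input, the compute per token is well below what the total parameter count suggests, which is one reason such models can be served cheaply.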

💡Parameter

In the context of AI models, a parameter is a value that is learned during the training process and used to make predictions or generate outputs. The number of parameters in a model is often indicative of its complexity and capacity for understanding and generating language. The video discusses models with billions of parameters and the potential for even larger models in the future.
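As a small worked example of what counting parameters means, the Python sketch below tallies the learned weights and biases in a stack of fully connected layers. The layer sizes are arbitrary toy values chosen for illustration and do not correspond to any model mentioned in the video.

```python
def dense_layer_params(n_in: int, n_out: int) -> int:
    """A dense layer learns one weight per input-output pair plus one bias per output."""
    return n_in * n_out + n_out


def mlp_params(layer_sizes: list[int]) -> int:
    """Total learned parameters in a stack of dense layers with the given widths."""
    return sum(dense_layer_params(a, b)
               for a, b in zip(layer_sizes, layer_sizes[1:]))


# Toy network: 512 inputs, one hidden layer of 2048 units, 512 outputs.
toy_sizes = [512, 2048, 512]
print(f"{mlp_params(toy_sizes):,} learned parameters")
```

Even this tiny network has about two million parameters; the models discussed in the video are thousands of times larger.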

💡Sébastien Bubeck

Sébastien Bubeck is one of the lead authors of 'Sparks of AGI' and the Phi series of models. He is an influential figure in the AI community, and his insights are sought after for understanding the potential and future direction of AI technology. In the video, his views on the capabilities of future AI models and the focus on reasoning rather than just scaling up model size are highlighted.

💡ByteDance

ByteDance is a multinational technology company known for developing popular platforms like TikTok. The video discusses ByteDance's secretive use of OpenAI's technology to develop a competing AI model, which is against OpenAI's terms of service. This highlights the competitive landscape in the AI industry and the lengths companies may go to gain an edge.

💡Midjourney V6

Midjourney V6 is a version of the Midjourney AI image-generation model. The preview shown in the video indicates a new release with an improved level of photorealism in the images it generates, suggesting advancements in AI's ability to create visually convincing content.

Highlights

Rumors about GPT 4.5 are circulating on Twitter, but denials have come from three OpenAI employees.

Will Depue, a senior member of OpenAI, dismissed the GPT 4.5 turbo discoveries as a consistent hallucination.

Sam Altman of OpenAI humorously responded to a GPT 4.5 leak inquiry with a simple 'Nah'.

Roon, an undercover OpenAI employee, emphasized the lack of a GPT 4.5 release and advised developing resistance to AI hype.

An OpenAI employee commented on the low expectations for GPT 4.5 and shared insights on testing new models.

Etched AI claims to have built the world's first Transformer supercomputer, designed specifically for running Transformers.

Etched AI's custom chip, code-named Sohu, is said to outperform Nvidia's H100 on tokens per second, enabling real-time interaction.

Harvard dropouts raised millions to design an AI accelerator chip focused on large language model acceleration.

The co-founders of the chip company are betting on Transformers and large language models for their AI accelerator chip.

Etched AI's specialization could lead to a 140 times increase in throughput per dollar compared to Nvidia H100.

Etched AI's hardware could enable real-time voice agents and rapid comparison of model outputs.

Etched AI etches the Transformer architecture onto a chip, optimizing every transistor for Transformer computations.

The software stack for Etched AI's supercomputers is planned to be fully open-sourced.

Mistral AI's mixture-of-experts model, Mixtral, is an 8x7 billion parameter model that is open-sourced and cost-effective.

There has been a significant price drop for access to the Mixtral model, indicating a rapid decrease in the cost of AI models.

Sébastien Bubeck, lead author of Sparks of AGI, is uncertain about GPT 5 but expects powerful models to arrive soon.

ByteDance, the company behind TikTok, is secretly using OpenAI tech to build a competitor, violating OpenAI's terms of service.

OpenAI banned ByteDance from ChatGPT due to potential data theft.

Preview images for Midjourney V6 show an increased level of photorealism in AI-generated images.