A 100T Transformer Model Coming? Plus ByteDance Saga and the Mixtral Price Drop
TLDR: The video discusses the debunking of GPT-4.5 rumors by OpenAI employees, the potential of Etched's custom chip for AI acceleration, the competitive pricing of mixture-of-experts models, and the implications of ByteDance secretly using OpenAI's technology. It also features an interview with Sebastian Bubeck on the future of AI models and their capabilities.
Takeaways
- OpenAI employees have denied the existence of GPT-4.5, dismissing the rumors as a consistent hallucination.
- Speculation that performance improves when 'GPT-4.5 turbo' is added to the system prompt.
- Etched claims to be building the world's first Transformer supercomputer, designed specifically for running Transformer models.
- Etched's custom chip, code-named Sohu, is said to significantly outperform Nvidia's H100 in tokens-per-second inference.
- The Harvard-dropout founders are raising millions to design an AI accelerator chip focused on large language model acceleration.
- The rapid evolution of AI workloads poses a risk to Etched's bet on the Transformer architecture, but the co-founders are confident in their choice.
- Potential benefits of specialized hardware like Etched's include 140 times the throughput per dollar and real-time interaction with models.
- Mistral's Mixtral 8x7B mixture-of-experts model is open-sourced and competitively priced, with access costs dropping dramatically shortly after its announcement.
- ByteDance is reportedly using OpenAI's technology to develop a competing AI model, in violation of OpenAI's terms of service.
- ByteDance's head of research, Quangang, hints at a super strong model more powerful than Gemini arriving soon, and at an ambition that goes beyond catching up with GPT-4.
- Preview images for Midjourney V6 show an increased level of photorealism, indicating advances in AI-generated imagery.
Q & A
What are the GPT-4.5 rumors that circulated on Twitter, and how did OpenAI personnel respond to them?
-The GPT-4.5 rumors concerned a supposed new version of OpenAI's model with enhanced capabilities. OpenAI personnel, including Will Depue and Sam Altman, denied the existence of GPT-4.5: Altman humorously responded 'nah' to a question about a leak, and an undercover OpenAI employee emphasized that there is no version 4.5 and advised people to become more resistant to AI hype.
What is the significance of Etched's Transformer supercomputer and how does it differ from existing technology?
-Etched's machine is a new type of supercomputer designed specifically to run the Transformer architecture, which underlies large language models and many other AI applications. Unlike general-purpose GPUs that are optimized through software, Etched's chip has the Transformer architecture etched into its silicon, allowing every transistor to be optimized for Transformer computations and potentially offering significant performance improvements and cost savings for AI inference.
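To make the throughput-per-dollar claim concrete, here is a minimal back-of-the-envelope sketch. The H100 throughput and price figures are placeholder assumptions chosen only for illustration, and the 140x multiplier is simply the figure claimed in the video, not a measured result.

```python
# Back-of-the-envelope illustration only: none of these numbers are measured.
# The H100 throughput and price below are placeholder assumptions, and the
# 140x multiplier is the gain claimed in the video for the specialized chip.

def tokens_per_second_per_dollar(tokens_per_second: float, unit_price_usd: float) -> float:
    """Cost-efficiency metric: sustained decode throughput divided by hardware price."""
    return tokens_per_second / unit_price_usd

h100_tokens_per_second = 1_000.0   # assumed sustained tokens/s for one H100 (placeholder)
h100_price_usd = 30_000.0          # assumed price per H100 card (placeholder)
claimed_gain = 140.0               # throughput-per-dollar multiplier claimed in the video

h100_efficiency = tokens_per_second_per_dollar(h100_tokens_per_second, h100_price_usd)
specialized_efficiency = h100_efficiency * claimed_gain

print(f"H100 (assumed):        {h100_efficiency:.4f} tokens/s per dollar")
print(f"Specialized (claimed): {specialized_efficiency:.4f} tokens/s per dollar")
```

Whatever the real baseline numbers turn out to be, the point of the claim is the multiplier: at anything like 100-140x, per-token serving costs fall far enough to make always-on, real-time interaction with large models economical.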
What strategic move did Etched make regarding its software stack?
-Etched plans to fully open-source the software stack. This strategy is aimed at drawing users to depend on the new hardware, which is specialized for running Transformers, potentially unlocking significant performance gains if the architecture remains dominant in AI.
What is the current status and potential impact of the Mixtral 8x7B model?
-Mixtral 8x7B is open-sourced and is reported to match or beat GPT-3.5 and Gemini Pro on benchmarks. The price of hosted access has dropped dramatically, pointing to a trend where AI models are not only improving in performance but also becoming more affordable, and raising questions about the future cost of AI capabilities.
What was Sebastian Bubeck's perspective on the potential capabilities of a 13 billion parameter model?
-Sebastian Bubeck, one of the lead authors of Sparks of AGI and of the Phi series of models, said it is an open question what capabilities a 13 billion parameter model will reach. However, based on the performance improvements seen at 1 billion and 3 billion parameters, and his knowledge of big models like GPT-4, he believes there is room for reasoning to be enabled at 13 billion parameters.
What is the current situation regarding ByteDance's use of OpenAI's technology?
-ByteDance, a multi-hundred billion dollar company behind TikTok, is reportedly using OpenAI's technology to develop a competing AI model, code-named Project Seed. This is in direct violation of OpenAI's terms of service, which prohibit using model outputs to develop competing AI models. Internal documents confirm that ByteDance has relied on OpenAI's API during nearly every phase of development, and employees are aware of the implications, even planning to whitewash evidence of this use.
What did Quangang, the head of research at ByteDance, say about the future of AI models?
-Quangang expressed uncertainty about GPT-5 but mentioned that a super strong model, more powerful than Gemini, is expected to arrive soon. He was referring to ByteDance's Project Seed when talking about open-sourcing model weights and stated that they don't intend to just catch up with GPT-4 but aim to outpace GPT-5.
What is the significance of the price reduction for the Mixtral model?
-The price reduction for the Mixtral model, from $2 to as low as 27 cents per 1 million tokens, reflects a rapid decrease in the cost of accessing AI models. This trend suggests that the affordability of AI technology is improving dramatically, which could have significant implications for the widespread adoption and application of AI.
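As a rough illustration of what that reported price drop means for a fixed workload, the sketch below compares monthly costs at the two quoted prices; the 500-million-token monthly volume is an arbitrary assumption made for the example.

```python
# Illustration only: the two per-token prices are the ones quoted in the video
# ($2.00 and $0.27 per 1M tokens); the monthly token volume is an assumption.

def workload_cost_usd(tokens: int, price_per_million_usd: float) -> float:
    """Cost of processing `tokens` tokens at a given price per 1 million tokens."""
    return tokens / 1_000_000 * price_per_million_usd

monthly_tokens = 500_000_000  # assumed workload: 500M tokens per month

launch_cost = workload_cost_usd(monthly_tokens, 2.00)   # launch-week pricing
lowest_cost = workload_cost_usd(monthly_tokens, 0.27)   # lowest reported pricing

print(f"At $2.00 / 1M tokens: ${launch_cost:,.2f} per month")
print(f"At $0.27 / 1M tokens: ${lowest_cost:,.2f} per month")
print(f"Reduction: {(1 - lowest_cost / launch_cost) * 100:.1f}%")
```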
What is the potential impact of Etched's hardware on the AI industry?
-If successful, Etched's hardware could revolutionize the AI industry by offering a roughly 100x gain in throughput per dollar over existing hardware like Nvidia's H100. This could enable real-time interaction with AI models at very low cost, significantly improving the user experience and expanding the possibilities for AI applications.
What are the implications of the rapid decrease in the cost of AI models like Mixtral and the potential release of more powerful models?
-The rapid decrease in cost and the potential for more powerful models suggest that by the end of 2024 we could see significant advances in AI capabilities at a much lower cost. This could lead to broader access to advanced AI technologies and potentially transformative changes across industries.
What is the significance of the photorealism in the preview images for Midjourney V6?
-The added level of photorealism in the preview images for Midjourney V6 signals a potential breakthrough in AI-generated imagery. While there is still a slight smoothness to the images, the realistic quality, especially when upscaled with AI tools like Magnific, indicates advances in AI's ability to create highly detailed and believable visual content.
Outlines
GPT-4.5 Rumors and OpenAI Denials
The paragraph covers the widespread rumors about GPT-4.5 on social media and the subsequent denials from OpenAI employees. It highlights reports of the model hallucinating that it is GPT-4.5 and the speculation about a potential release. Notable OpenAI members, including Will Depue and Sam Altman, weighed in, dismissing the rumors and emphasizing that no GPT-4.5 has been released. An OpenAI employee also shared their perspective on the low expectations for GPT-4.5 and the importance of focusing on significant improvements in AI models.
Introducing Etched and its Potential Impact
This paragraph introduces Etched, a new company claiming to be building the world's first Transformer supercomputer, designed specifically for running the Transformer architecture. The company has reportedly etched the Transformer architecture onto a chip, which it claims outperforms Nvidia's H100 in tokens-per-second inference, suggesting the potential for real-time interaction with AI models. The company's founding by two Harvard dropouts is covered, along with their decision to bet heavily on the Transformer architecture for large language model acceleration. The potential benefits of this specialized hardware are discussed, including major throughput improvements and cost-effective real-time interaction with AI models.
Mixtral Price Spiral and Industry Implications
The paragraph covers Mixtral, an open-sourced 8x7B mixture-of-experts model that is competitive with GPT-3.5 in both performance and price. The rapid decrease in pricing for access to the model is highlighted, with multiple providers significantly cutting their rates within a short period. The implications of this price spiral for the future of AI, and the potential for accessible, high-level reasoning capabilities at lower cost, are discussed. An interview with Sebastian Bubeck, one of the lead authors of Sparks of AGI and of the Phi series of models, is mentioned, in which he shares his views on the potential for reasoning capabilities in small models and the focus on the scientific quest rather than on fitting models onto phones.
ByteDance's Secretive Use of OpenAI Tech
The final paragraph reveals that ByteDance, the company behind TikTok, has been secretly using OpenAI's technology to develop a competing AI model, code-named Project Seed. This is in violation of OpenAI's terms of service, which prohibit using model outputs to develop competing AI models. Internal documents confirm the reliance on OpenAI's API throughout Project Seed's development phases. The awareness of ByteDance employees about the implications and their plans to cover up the evidence are mentioned. The paragraph concludes with OpenAI banning ByteDance from ChatGPT due to potential data theft allegations.
Preview of Midjourney V6 and Realism Advancements
The paragraph offers a sneak peek at the advancements in Midjourney V6, noting the increased level of photorealism in the images the model generates. While there is still a slight smoothness to the images, upscaling them with tools like Magnific produces exceptionally realistic results, particularly for human subjects. Viewers are invited to share their thoughts on these developments, and the video ends with a thank-you note for watching.
Keywords
GPT-4.5
Transformers
AI Hype
OpenAI
Etched AI
Real-time Interaction
Open Sourcing
Mixtral
Parameter
Sebastian Bubeck
ByteDance
Midjourney V6
Highlights
Rumors about GPT-4.5 are circulating on Twitter, but denials have come from three OpenAI employees.
Will Depue, a senior member of OpenAI, dismissed the 'GPT-4.5 turbo' discoveries as a consistent hallucination.
Sam Altman humorously responded to a GPT-4.5 leak inquiry with a simple 'Nah'.
Roon, an undercover OpenAI employee, emphasized that there has been no GPT-4.5 release and advised developing resistance to AI hype.
An OpenAI employee commented on the low expectations for GPT-4.5 and shared insights on how new models are tested.
Etched claims to be building the world's first Transformer supercomputer, designed specifically for running Transformers.
Etched's custom chip, code-named Sohu, is said to outperform Nvidia's H100 on tokens per second, enabling real-time interaction.
Harvard dropouts raised millions to design an AI accelerator chip focused on large language model acceleration.
The co-founders of the chip company are betting on Transformers and large language models for their AI accelerator chip.
Etched's specialization could deliver 140 times the throughput per dollar of Nvidia's H100.
Etched's hardware could enable real-time voice agents and rapid comparison of model outputs.
The Transformer architecture is etched onto the chip, optimizing every transistor for Transformer computations.
The software stack for Etched's supercomputers is planned to be fully open-sourced.
Mistral's mixture-of-experts model, Mixtral 8x7B, is open-sourced and cost-effective.
There has been a significant price drop for access to the Mixtral model, indicating a rapid decrease in the cost of AI models.
Sebastian Bubeck, lead author of Sparks of AGI, is uncertain about GPT-5 but expects powerful models to arrive soon.
ByteDance, the company behind TikTok, is secretly using OpenAI tech to build a competitor, violating OpenAI's terms of service.
OpenAI banned ByteDance from ChatGPT due to potential data theft.
Preview images for Midjourney V6 show an increased level of photorealism in AI-generated images.