[AI] OpenAI Releases Its Newest Small Model, GPT-4o mini | Replaces GPT-3.5 | Beats GPT-4 | 60% Cheaper | Mistral Releases Small Model NeMo | Surpasses Mistral 7B

大飞说科技
18 Jul 2024 · 08:30

TLDR: OpenAI has released GPT-4o mini, a model that beats GPT-4 on chat tasks at a price more than 60% lower, immediately replacing GPT-3.5 Turbo. It scores 82% on MMLU, supports a 128K-token context window, and is efficient enough for a wide range of tasks. Commercial pricing is highly competitive: 15 cents per million input tokens and 60 cents per million output tokens. Meanwhile, MistralAI and NVIDIA released Mistral NeMo, a 12B small model that handles multilingual tasks with substantially improved performance and uses a new tokenizer, Tekken, with markedly better compression efficiency.

Takeaways

  • 🆕 OpenAI has released the GPT-4o mini model, outperforming GPT-4 on chat tasks at a much lower price.
  • 🚀 GPT-4o mini scores 82% on the MMLU benchmark and beats GPT-4 in chat performance.
  • 💰 GPT-4o mini's commercial pricing is sharply lower, with token costs more than 60% below GPT-3.5 Turbo's.
  • 📈 GPT-4o mini supports a 128K-token context window, with a knowledge cutoff of October 2023.
  • 🌐 An improved tokenizer makes GPT-4o mini more economical at handling non-English text.
  • 🔍 GPT-4o mini surpasses other small models in multimodal reasoning and textual intelligence.
  • 📊 In math and coding, GPT-4o mini scores highly on the MGSM and HumanEval benchmarks.
  • 🔒 OpenAI built safety measures into the model from the start of development, verifying them through automated and human evaluation.
  • 🔧 Mistral has released the NeMo small model, which supports multilingual tasks and compresses text efficiently across many languages.
  • 📝 The releases of GPT-4o mini and Mistral NeMo point toward smaller, more efficient, and safer AI models.
  • 🔮 Industry observers predict much smaller yet capable, reliable models to come; today's models are as large as they are because of waste in the training process.

Q & A

  • What is the significance of the release of GPT-4o mini by OpenAI?

    -The release of GPT-4o mini is significant as it immediately replaces the previous GPT-3.5 Turbo model, offering improved performance and a substantial reduction in cost, making it more accessible for a wider range of applications.

  • What is the MMLU score of GPT-4o mini, and how does it compare to GPT-4 in terms of chat performance?

    -GPT-4o mini has an MMLU score of 82%, which is superior to GPT-4 in terms of chat performance, indicating its enhanced capabilities in language understanding and interaction.

  • How much cheaper is GPT-4o mini compared to GPT-3.5 Turbo and GPT-4 in terms of commercial use pricing?

    -GPT-4o mini is more than 60% cheaper than GPT-3.5 Turbo and 96%-97% cheaper than GPT-4o, with a commercial price of 15 cents per million input tokens and 60 cents per million output tokens.
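
    These per-token prices turn into a simple cost estimator. The sketch below uses the announced GPT-4o mini prices; the workload figures (request count, tokens per request) are illustrative assumptions, not from the announcement.

    ```python
    # Cost estimate at GPT-4o mini's announced prices:
    # $0.15 per 1M input tokens, $0.60 per 1M output tokens.

    INPUT_PRICE_PER_M = 0.15   # USD per million input tokens
    OUTPUT_PRICE_PER_M = 0.60  # USD per million output tokens

    def cost_usd(input_tokens: int, output_tokens: int) -> float:
        """Total API cost in USD for a given token workload."""
        return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
             + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

    # Hypothetical workload: 1M requests, each with 2,000 input tokens
    # and 500 output tokens.
    total = cost_usd(1_000_000 * 2_000, 1_000_000 * 500)
    print(f"${total:,.2f}")  # 2e9 input + 5e8 output tokens -> $600.00
    ```

    At these prices, input and output happen to contribute equally to the total for this workload.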

  • What are the key features of GPT-4o mini in terms of context window and tokenizer improvements?

    -GPT-4o mini features a 128K token context window and an improved tokenizer based on GPT-4o, which makes it more efficient and cost-effective in handling non-English texts.

  • When is OpenAI expected to release the voice modality for GPT-4o mini?

    -OpenAI plans to release a voice modality test version for GPT-4o mini in late July, with public access permissions to be opened at a later date.

  • How does GPT-4o mini perform in comparison to other small models in terms of multimodal reasoning and text intelligence?

    -GPT-4o mini outperforms other small models in multimodal reasoning and text intelligence, as evidenced by its higher scores in benchmarks like MMLU, MGSM, and MMMU.

  • What is the token generation speed of GPT-4o mini, and how does it compare to other models in terms of inference efficiency?

    -GPT-4o mini generates 183 tokens per second, making it the fastest model on the list and 18 tokens per second faster than the second-fastest model, Gemini 1.5 Flash.
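
    The throughput figure translates directly into streaming latency for a response. A minimal sketch, where the 500-token response length is an illustrative assumption:

    ```python
    # At GPT-4o mini's reported throughput of 183 tokens/second, the time
    # to stream a response is simply tokens / throughput.

    GPT4O_MINI_TPS = 183.0  # reported tokens per second

    def streaming_time_s(output_tokens: int, tps: float = GPT4O_MINI_TPS) -> float:
        """Seconds to stream a response of the given length."""
        return output_tokens / tps

    # Hypothetical 500-token response:
    print(f"{streaming_time_s(500):.1f} s")  # 500 / 183 ≈ 2.7 s
    ```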

  • What safety measures has OpenAI implemented in GPT-4o mini from the beginning of its development?

    -OpenAI has implemented safety measures from the start of GPT-4o mini's development, including filtering out unwanted information during pre-training and using techniques like RLHF to align model behavior with desired strategies.

  • What is the significance of the release of Mistral NeMo 12B by MistralAI and NVIDIA?

    -The release of Mistral NeMo 12B marks a new small model that can be customized and deployed for a variety of tasks. Because it was trained with quantization in mind, it supports efficient FP8 inference, and its new tokenizer, Tekken, further improves efficiency.

  • How does the Tekken tokenizer compare to previous tokenizers in terms of compression efficiency for different languages?

    -The Tekken tokenizer shows higher compression efficiency in about 85% of languages compared to Llama 3's tokenizer and offers significant improvements in languages like Korean and Arabic, with efficiency increases of up to 2x and 3x respectively.
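
    These "x-times" figures compare token counts for the same text: a tokenizer compresses better when it needs fewer tokens. A minimal sketch of that comparison, where the token counts are made up purely to illustrate the ~2x claim for Korean:

    ```python
    # Tokenizer compression efficiency compared as token counts for the
    # same input text: fewer tokens means better compression. The counts
    # below are hypothetical, chosen to mirror the ~2x claim for Korean.

    def efficiency_gain(baseline_tokens: int, candidate_tokens: int) -> float:
        """How many times fewer tokens the candidate tokenizer needs."""
        return baseline_tokens / candidate_tokens

    llama3_tokens = 600  # assumed count for one Korean passage, Llama 3 tokenizer
    tekken_tokens = 300  # assumed count for the same passage, Tekken

    print(f"{efficiency_gain(llama3_tokens, tekken_tokens):.0f}x fewer tokens")
    ```

    Fewer tokens per passage also means more text fits in the same context window and each request costs less.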

  • What is Andrej Karpathy's perspective on the future of large language model development?

    -Andrej Karpathy predicts that we will see very small but highly capable and reliable models in the future. He suggests that the current large size of language models is due to wasteful training processes, and that models must first become large before they can be effectively reduced in size through automated assistance and improved training data.

Outlines

00:00

🚀 Launch of GPT-4o Mini Model

OpenAI has announced the 'Mini' version of the GPT-4o model, which is now available to replace the previous GPT-3.5 Turbo. This new model is accessible to free users and offers significant cost reductions compared to previous models, with a commercial price of 15 cents per million input tokens and 60 cents per million output tokens. The GPT-4o mini model achieves an 82% score on the MMLU benchmark, outperforming GPT-4 in chat-related tasks. It features a 128K token context window and an improved tokenizer for handling non-English text. The model is capable of performing a wide range of tasks, including applications requiring multiple model calls, large context transfers, and real-time text interactions. Currently, the API supports text and visual inputs, with plans to introduce voice and other modalities later. The model also excels in text intelligence, multimodal reasoning, and function calling, scoring higher than other small models in various benchmarks. OpenAI CEO Sam Altman emphasizes the model's affordability and the company's commitment to safety and reliability, with built-in security measures and continuous improvements.

05:03

🌐 Emergence of Smaller AI Models and Their Impact

The script discusses the release of the GPT-4o mini model by OpenAI and its implications for the AI industry. It highlights the model's cost-effectiveness and its potential to replace larger models in various applications. Additionally, MistralAI and NVIDIA have introduced the Mistral NeMo 12B model, which is customizable and supports tasks like chatbots, multilingual tasks, programming, and summarization. The Mistral NeMo model is optimized for FP8 inference and shows significant performance improvements over other models like Gemma 2 9B and Llama 3 8B. It also uses a new tokenizer, Tekken, which is more efficient in compressing natural language text and source code across multiple languages. The script concludes with a discussion on the future of AI models, with predictions about the development of smaller but highly capable models. It also mentions the upcoming release of Meta's 400B parameter Llama 3 model and the anticipation for GPT-5, indicating a competitive landscape in the AI industry.


Keywords

💡GPT-4o mini

GPT-4o mini is the latest small model released by OpenAI. It outperforms the earlier GPT-3.5 Turbo, beats GPT-4 on chat tasks, and scores 82% on MMLU. Its commercial pricing is far lower than previous models', making it a highly cost-effective new option.

💡MMLU

MMLU (Massive Multitask Language Understanding) is a benchmark for evaluating a language model's performance across many language-understanding tasks. In the video, GPT-4o mini scores 82% on MMLU, demonstrating strong language-understanding ability.

💡Context window

The context window is the amount of surrounding text a model can take into account at once. GPT-4o mini has a 128K-token context window, letting it process much longer texts and broadening the range and accuracy of its applications.

💡Commercial pricing

Commercial pricing is what the model costs to use in commercial applications. GPT-4o mini costs 15 cents per million input tokens and 60 cents per million output tokens, more than 60% cheaper than previous models and even 96%-97% cheaper than GPT-4o.

💡RLHF

RLHF (Reinforcement Learning from Human Feedback) is a technique that uses human feedback to steer a model via reinforcement learning. OpenAI used RLHF in training GPT-4o mini to improve the accuracy and reliability of its responses.

💡Multimodal reasoning

Multimodal reasoning is a model's ability to handle tasks involving several data types, such as text, images, and audio. Beyond its strong text performance, GPT-4o mini also does well on multimodal reasoning benchmarks, scoring 59.4%.

💡Tekken tokenizer

Tekken is a new tokenizer designed to compress natural-language text and source code more efficiently. The Mistral NeMo model uses Tekken, which is 30%-300% more efficient than previous tokenizers across many languages.

💡FP8 inference

FP8 inference runs a model using 8-bit floating-point arithmetic. Mistral NeMo was trained with quantization in mind, so it can run FP8 inference without losing accuracy, improving compute efficiency.
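
To make "8-bit floating point" concrete, here is a rough sketch of the rounding that the E4M3 FP8 format applies to a value. It handles normal numbers only; subnormals, NaN handling, saturation, and the per-tensor scaling that real FP8 inference kernels use are all omitted, and none of those details come from the video.

```python
import math

def quantize_e4m3(x: float) -> float:
    """Round x to the nearest FP8 E4M3 value (normals only; an
    illustrative sketch, not a production implementation)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = abs(x)
    e = math.floor(math.log2(mag))
    e = max(-6, min(8, e))          # E4M3 normal exponent range
    m = mag / 2.0 ** e              # mantissa in [1, 2)
    m = round(m * 8) / 8            # 3 mantissa bits -> steps of 1/8
    if m == 2.0:                    # rounding carried into next exponent
        m, e = 1.0, e + 1
    return sign * m * 2.0 ** e

print(quantize_e4m3(3.3))  # -> 3.25 (nearest representable value)
```

With only 3 mantissa bits, the relative rounding error for normal values is at most 1/16 (6.25%), which is why quantization-aware training matters for keeping accuracy at this precision.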

💡Safety mitigations

Safety mitigations are protections built into a model during development. GPT-4o mini ships with the same built-in safety mitigations as GPT-4o, including the instruction hierarchy method, improving its safety and reliability.

💡Mistral NeMo

Mistral NeMo is a new small model released by MistralAI and NVIDIA. It supports a 128K-token context window, outperforms the earlier Mistral 7B, and does well on a range of benchmark tasks, making it a strong choice for multilingual applications.

Highlights

OpenAI has released a new 'Mini' version of the GPT-4o model, immediately replacing the previous GPT-3.5 Turbo.

GPT-4o mini is now available for free users and has scored 82% on the MMLU benchmark, outperforming GPT-4 in chat capabilities.

The commercial price for GPT-4o mini is significantly lower than previous models, with a 60% reduction compared to GPT-3.5 Turbo and a 96%-97% decrease from GPT-4o.

OpenAI's CEO, Sam Altman, described the cost of accessing intelligence as 'too cheap to meter'.

GPT-4o mini has a context window of 128K tokens and knowledge up to October 2023, with improved tokenization for non-English texts.

The model is capable of performing a wide range of tasks with low cost and latency, such as handling multiple model calls or real-time text interactions.

GPT-4o mini's API currently only supports text and vision, with plans to include voice and other modalities later in July.

GPT-4o mini has shown superior performance in text intelligence and multimodal reasoning compared to other small models.

In mathematical and coding capabilities, GPT-4o mini scores higher than previous models on the MGSM and HumanEval benchmarks.

The model demonstrates strong performance in multimodal reasoning with a score of 59.4% on the MMMU benchmark.

GPT-4o mini has the highest inference efficiency among its peers, generating 183 tokens per second.

OpenAI has partnered with collaborators to test GPT-4o mini, finding it superior in tasks such as data extraction from receipts or generating email responses.

The model includes built-in safety measures from the beginning of its development, with continuous reinforcement throughout the process.

GPT-4o mini is the first model to apply OpenAI's instruction hierarchy method, enhancing its resistance to jailbreaks, prompt injections, and other security threats.

OpenAI plans to roll out fine-tuning for GPT-4o mini in the coming days; the model itself is already available to ChatGPT users.

Mistral AI and NVIDIA have also released a new small model, Mistral NeMo 12B, which is customizable and supports various tasks.

Mistral NeMo supports a 128K-token context window and is a drop-in replacement for systems using Mistral 7B, with improved performance across multiple benchmarks.

The new tokenizer Tekken, used in Mistral NeMo, offers higher compression efficiency for natural language text and source code across many languages.

Andrej Karpathy predicts a future of very small but highly capable and reliable models, as the competition in model scale moves in the opposite direction.

The steep price drop has made it economically feasible to run inference over every word spoken or heard in the US in a 24-hour period for under $200,000.