NEW Mixtral 8x22b Tested - Mistral's New Flagship MoE Open-Source Model
TLDR: The video covers the testing of Mistral's new 8x22b-parameter MoE open-source model, a significant upgrade from the previous 8x7b model. Both the base model and a fine-tuned version, Karasu Mixtral 8x22b, are evaluated on tasks including coding, game playing, logic and reasoning, and problem-solving. The model shows promising results, particularly in coding and logic puzzles, though it falls short in some areas, such as the Snake game and a complex math problem. The video concludes by encouraging viewers to look forward to future fine-tuned versions of the model.
Takeaways
- 🚀 Introduction of Mistral's new 8x22b parameter MoE (Mixture of Experts) open-source model, a significant upgrade from the previous 8x7b model.
- 💻 The new model was released with no accompanying information, only a torrent link, in Mistral's typically mysterious fashion.
- 📊 The base 8x22b model is not fine-tuned, but a fine-tuned version called 'Karasu Mixtral 8x22b' is available for chat applications.
- 🔍 Testing of the model is conducted using Informatic.ai, a platform that offers free access to the latest models, including the 8x22b.
- 📝 Successful completion of a basic programming task: writing a Python script to output numbers 1 to 100.
- 🎮 The model passes the Snake Game challenge, albeit with a minor issue where the snake can pass through walls but correctly ends the game upon self-collision.
- 📈 The model demonstrates a good understanding of logic and reasoning, such as explaining the transitive property in speed comparison and solving a drying time problem with proportional reasoning.
- 🧐 Uncensored capabilities are partially present; the model can be nudged to provide certain information but still maintains some restrictions.
- 📊 The model makes a mistake in a basic math problem, initially providing an incorrect answer but correcting it after a step-by-step explanation.
- 🔢 A logic and reasoning problem about the number of killers in a room is incorrectly solved, showing a misunderstanding of the scenario.
- 📋 The model correctly creates JSON data for a given set of people with different attributes.
- 🤔 A logic problem involving a marble in a cup placed inside a microwave is incorrectly solved, showing a lack of understanding of gravity's role in the situation.
- 🏆 Overall, the Karasu fine-tuned version of the 8x22b model performs well, and there is anticipation that future fine-tuned versions could outperform even the best 8x7b fine-tunes.
Q & A
What is the new model released by Mistral AI?
-The new model released by Mistral AI is an 8x22b parameter MoE (Mixture of Experts) open-source model, named Mixtral 8x22b v0.1.
How does the new Mixtral 8x22b model compare to the previous version in terms of parameters?
-The new Mixtral 8x22b model uses eight experts of 22 billion parameters each, up from the eight experts of 7 billion parameters each in the previous 8x7b model.
What is the fine-tuned version of the Mixtral 8x22b model called?
-The fine-tuned version of the Mixtral 8x22b model is called Karasu Mixtral 8x22b.
Which platform was used to run inference on the Karasu Mixtral 8x22b model?
-The platform used to run inference on the Karasu Mixtral 8x22b model is Informatic.ai.
What was the first test performed with the new model?
-The first test performed with the new model was to write a Python script to output numbers 1 to 100.
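The video doesn't show the exact code the model produced, but a minimal version of such a script (one plausible form) would be:

```python
# Generate the numbers 1 through 100 and print them one per line.
numbers = list(range(1, 101))
print("\n".join(str(n) for n in numbers))
```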
How did the model handle the task of writing a Snake game in Python?
-The model was able to write a basic version of the Snake game in Python, but it had some issues such as the snake going through walls and not ending the game when leaving the window.
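To illustrate the bug described here, the sketch below (not the video's code; grid size and function names are hypothetical) shows the core movement logic of a Snake game, where wrapping at the walls lets the snake pass through them while self-collision is still detected correctly:

```python
GRID = 20  # hypothetical board size


def step(snake, direction, wrap_walls=True):
    """Advance the snake one cell; return (snake, game_over).

    wrap_walls=True reproduces the buggy behavior from the video: the
    snake passes through walls instead of the game ending there.
    """
    head_x, head_y = snake[0]
    dx, dy = direction
    new_head = (head_x + dx, head_y + dy)
    if wrap_walls:
        # Buggy behavior: wrap around instead of ending the game.
        new_head = (new_head[0] % GRID, new_head[1] % GRID)
    elif not (0 <= new_head[0] < GRID and 0 <= new_head[1] < GRID):
        return snake, True  # correct behavior: hitting a wall ends the game
    if new_head in snake:
        return snake, True  # self-collision always ends the game
    return [new_head] + snake[:-1], False
```

With `wrap_walls=True`, moving the head off the left edge places it on the right edge instead of ending the game, matching the issue the video observed.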
What was the result of the model's attempt to answer a logic and reasoning question about drying shirts?
-The model provided a correct answer by using simple proportion, stating that it would take 16 hours for 20 shirts to dry under similar conditions.
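The proportional reasoning can be written out explicitly. The summary only records the 16-hour answer; the premise below (5 shirts drying in 4 hours) is the common form of this prompt and is inferred from that answer:

```python
# Rate-based (proportional) reasoning for the drying question.
# Premise numbers are assumed: 5 shirts take 4 hours.
base_shirts, base_hours = 5, 4
target_shirts = 20
hours = base_hours * target_shirts / base_shirts
print(hours)  # 16.0
```

Note that if the shirts all dry in parallel (e.g. laid out in the sun), the trick answer is simply the original 4 hours; the video's rubric accepts the proportional answer.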
How did the model perform on the task of creating JSON for given people information?
-The model successfully created a JSON structure with the provided information about Mark, Joe, and Sam, including their names, ages, and genders.
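A structure like the one the model produced might look like the following; the summary names Mark, Joe, and Sam but does not record their ages or genders, so those values are placeholders:

```python
import json

# Placeholder attribute values: only the names come from the summary.
people = [
    {"name": "Mark", "age": 30, "gender": "male"},
    {"name": "Joe", "age": 25, "gender": "male"},
    {"name": "Sam", "age": 28, "gender": "female"},
]
payload = json.dumps(people, indent=2)
print(payload)
```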
What was the model's performance on the hardest logic and reasoning problem involving a marble in a cup inside a microwave?
-The model's response was incorrect as it suggested the marble would still be on the table, which is not the case since the marble would remain inside the cup placed inside the microwave.
How did the model handle the task of providing 10 sentences ending with the word 'Apple'?
-The model failed to provide sentences that ended with the word 'Apple', but it did include the word 'Apple' in every sentence.
What was the model's response to the question about digging a 10-ft hole with 50 people?
-The model correctly calculated that it would take approximately 6 minutes for 50 people to dig a 10-ft hole, assuming no limitations on space or equipment.
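The 6-minute figure implies the prompt's single-digger time was 5 hours (the standard form of this question; the premise is not stated in the summary). Under the idealized assumption of perfectly parallel work:

```python
# Idealized person-hours calculation. The 5-hour single-digger premise
# is inferred from the 6-minute answer, not stated in the summary.
single_digger_hours = 5
workers = 50
minutes = single_digger_hours * 60 / workers
print(minutes)  # 6.0
```

In practice 50 people cannot all fit around one 10-ft hole, which is the kind of real-world caveat the video credits the model for acknowledging.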
Outlines
🚀 Testing the New Mixtral 8x22B Model
The paragraph discusses the testing of a newly released massive open-source model, Mixtral 8x22B, an upgrade from the previous 8x7B model. The author expresses excitement about this mixture-of-experts model. The testing uses a fine-tuned chat version, Karasu Mixtral 8x22B, from Lightblue, and the author runs inference for free on Informatic.ai, comparing it with other models. The tests include writing a Python script, the Snake game, and assessing the model's uncensored capabilities and logical reasoning. The paragraph concludes with the author's positive impressions of the model's performance, despite some minor issues.
🧠 Logical Reasoning and Problem-Solving
This paragraph delves into the model's performance on logical reasoning and problem-solving tasks. It covers a variety of problems, including drying shirts, determining speed via the transitive property, simple math, complex math, planning and prediction, and a classic logic puzzle involving killers in a room. The author evaluates the model's responses, noting both successes and failures. The paragraph highlights the model's correct use of the transitive property to deduce who is faster and its incorrect solution to the killer problem, which is contrasted with the previous Mixtral 8x7b's correct answer. The model's performance on creating JSON and answering a physics-related logic question is also discussed, with mixed results.
🎯 Final Assessment and Predictions
The final paragraph assesses the model's performance on creating sentences ending with a specific word and on the nuances of a digging task involving multiple people, and ends with an overall evaluation of the Mixtral 8x22B model. The model's failure to correctly complete the 'Apple' sentences is noted, though it did include the word 'Apple' in every sentence. The paragraph also covers a logical, well-explained answer to the digging task, demonstrating the model's understanding of proportional work. The author reflects on the model's overall performance, noting its impressive capabilities and potential for future improvement through fine-tuning.
Mindmap
Keywords
💡Mixture of Experts (MoE)
💡Open-Source Model
💡Parameter
💡Fine-Tuned Version
💡Informatic.ai
💡Quantized
💡Snake Game
💡Censored
💡Logic and Reasoning
💡JSON
💡Physics
Highlights
Mistral AI has released a new flagship MoE open-source model, an 8x22b parameter model.
The new model is a massive open-source mixture of experts model, a significant upgrade from the previous 8x7b model.
Mistral AI announced the model with a unique approach, simply dropping a torrent link with no additional information.
The base model is not fine-tuned, but a fine-tuned version called Karasu Mixtral 8x22b is available for chat.
Informatic.ai is used to run inference on the model, offering a free platform for testing the latest models.
The model passed the test of writing a Python script to output numbers 1 to 100.
The model demonstrated the ability to write a game of Snake in Python, albeit with some issues.
The model showed uncensored behavior when pushed, indicating potential for further refinement.
The model provided a logical and correct answer to a drying time problem involving shirts.
The model correctly applied the transitive property to determine speed among three individuals.
The model made a mistake in a simple math problem, initially providing an incorrect answer but correcting it with step-by-step explanation.
The model failed to accurately predict the word count in a response to a prompt.
The model incorrectly reasoned in the 'killers in a room' logic problem, showing a misunderstanding of the scenario.
The model successfully created JSON for given data about people, demonstrating understanding of data structuring.
The model provided a nuanced explanation for a scenario involving John, Mark, a ball, a box, and a basket.
The model attempted a challenging task of generating sentences ending with the word 'Apple', but did not fully meet the criteria.
The model gave a correct and detailed explanation for the time it would take 50 people to dig a 10-ft hole.
Overall, the Karasu fine-tuned version of the 8x22b model performed very well, showing potential for further improvement through fine-tuning.