Grok-1 is Open Source | All you need to know!!!
TLDR: Elon Musk's xAI has open-sourced Grok-1, a 314 billion parameter AI model, larger than anticipated, released uncensored under the Apache 2.0 license. The pre-trained base model, from an October 2023 training run, requires significant storage and compute to run. It outperforms several other models on common benchmarks and supports 8-bit quantization. Detailed instructions for running the model are provided on GitHub, along with a code of conduct that simply asks users to 'be excellent to each other'.
Takeaways
- 🚀 Elon Musk's xAI has open-sourced Grok and its weights, releasing the large language model (LLM) that powers the Grok chatbot on the X platform.
- 🌐 The weights were released as a torrent file, similar to the approach other AI labs have taken.
- 📈 The model, named Grok-1, is a 314 billion parameter mixture-of-experts model, significantly larger than expected.
- 🗓️ The model was pre-trained in October 2023, and the release is the base model, not a fine-tuned version.
- 📜 Grok-1 is released under the Apache 2.0 license, allowing commercial use, and the base model ships uncensored.
- 🔢 For each token, only two out of eight experts are active, so approximately 86 billion parameters are active per token (a back-of-envelope sketch follows this list).
- 🧠 The model features 64 layers and 48 attention heads for queries, with an internal embedding size of 6,144.
- 🛠️ It uses rotary positional embeddings (RoPE) and supports 8-bit quantization as well as activation sharding.
- 💾 Downloading the model requires approximately 300 GB of storage space, and running it in 4-bit precision requires around 320 GB of VRAM.
- 🏢 Companies may find the model particularly useful: its size and uncensored nature allow further fine-tuning on good datasets for improved reasoning capabilities.
- 📊 On four benchmarks (GSM8K, MMLU, HumanEval, and MATH), Grok-1 outperforms both the 70 billion parameter LLaMA 2 model and GPT-3.5.
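As a rough sanity check on the ~86 billion active-parameter figure, here is a back-of-envelope calculation in Python. The split between shared parameters (attention, embeddings, norms) and expert feed-forward parameters is an illustrative assumption, not an official breakdown from xAI.

```python
# Back-of-envelope estimate of active parameters per token for a
# top-2-of-8 mixture-of-experts model with 314B total parameters.
# The shared/expert split below is an illustrative assumption, not an xAI figure.
total_params   = 314e9   # all parameters in the released checkpoint
experts        = 8       # experts per MoE layer
active_experts = 2       # experts actually used for each token

shared_params = 11e9                          # assumed attention + embedding + norm params
expert_params = total_params - shared_params  # assumed expert feed-forward params

active = shared_params + expert_params * active_experts / experts
print(f"~{active / 1e9:.0f}B parameters active per token")  # ~87B, close to the quoted 86B
```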
Q & A
What did Elon Musk announce a week ago?
-Elon Musk announced that xAI would open source Grok and release the weights of Grok-1, the LLM powering the Grok chatbot on the X platform.
How were the weights of Grok-1 released?
-The weights were released as a torrent file, similar to how Mistral AI released their models.
What is the size of the Grok-1 model?
-The Grok-1 model is a 314 billion parameter mixture-of-experts model.
When was the base model of Grok-1 trained?
-The base model was trained in October 2023.
What does the Grok-1 model's release under Apache 2.0 allow for?
-The release under Apache 2.0 allows for commercial use of the model.
What is the significance of the model's 64 layers and 48 attention heads for queries?
-The 64 layers and 48 query attention heads reflect the model's depth and its capacity for large-scale computation and long-range context handling.
How much storage space is needed to download the Grok-1 model?
-Around 300 GB of storage space is needed to download the model.
What hardware is recommended to run the Grok-1 model in 4-bit precision?
-To run the model in 4-bit precision, around 320 GB of VRAM is needed; for 8-bit, a DGX H100 with eight H100 GPUs, each with 80 GB of VRAM, is recommended.
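A quick way to see why hardware of this class is needed is to count bytes for the weights alone; activations, the KV cache, and framework overhead come on top, which is why the figures quoted above are higher than these raw numbers.

```python
# Rough memory footprint of 314B weights alone at different precisions.
# Activations, KV cache, and runtime overhead are extra.
total_params = 314e9

for precision, bytes_per_param in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    print(f"{precision}: ~{total_params * bytes_per_param / 1e9:.0f} GB of weights")

# int8 -> ~314 GB of weights, which is why a DGX H100 node with
# 8 x 80 GB GPUs (640 GB total) is recommended for 8-bit inference.
```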
How can one access and run the Grok-1 model?
-The model can be run by cloning the released GitHub repository, installing the required packages with pip, placing the downloaded weights in the checkpoint directory, and running run.py with Python.
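For orientation, here is the workflow just described written out step by step, followed by a small pre-flight check; the checkpoint directory name is an assumption based on the description above and may differ from the repository's actual layout.

```python
# Workflow described above (shell commands shown as comments):
#
#   git clone https://github.com/xai-org/grok-1
#   cd grok-1
#   pip install -r requirements.txt
#   # place the ~300 GB of downloaded weights in the checkpoint directory
#   python run.py
#
# Pre-flight check: confirm the checkpoint directory exists and actually
# holds the expected hundreds of gigabytes before launching run.py.
from pathlib import Path

ckpt_dir = Path("checkpoints")  # assumed directory name; follow the repo's README
if ckpt_dir.is_dir():
    size_gb = sum(f.stat().st_size for f in ckpt_dir.rglob("*") if f.is_file()) / 1e9
    print(f"{ckpt_dir}/ holds ~{size_gb:.0f} GB (expect roughly 300 GB)")
else:
    print(f"{ckpt_dir}/ not found; download the weights before running run.py")
```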
What does the code of conduct advise?
-The code of conduct advises users to 'be excellent to each other,' implying a collaborative and respectful approach to working with the model.
How does the Grok-1 model perform on benchmarks?
-Grok-1 outperforms the 70 billion parameter LLaMA 2 model and GPT-3.5 on four benchmarks: GSM8K, MMLU, HumanEval, and MATH.
What potential benefits does the Grok-1 model offer for companies?
-Grok-1's large size and uncensored nature offer potential benefits for companies, since the base model can be further fine-tuned on good datasets to achieve better results in reasoning and other AI tasks.
Outlines
🚀 Open Sourcing of Grok and the Grok-1 LLM
Elon Musk announced the open sourcing of Grok, and its large language model (LLM), Grok-1, has now been released as promised. The model has 314 billion parameters and is a mixture-of-experts model; the pre-trained base model was released under the Apache 2.0 license, allowing commercial use. It was trained in October 2023 and is uncensored. Technical details include 64 layers, 48 attention heads for queries, an internal embedding size of 6,144, and support for 8-bit quantization and activation sharding. The context window supports up to 8,192 tokens. Downloading and running the model requires significant storage and compute: roughly 300 GB of storage space, around 320 GB of VRAM for 4-bit operation, and a DGX H100 with 8 H100 GPUs (80 GB of VRAM each) for 8-bit operation. A GitHub repository with a JAX example and instructions for running the model was provided, along with a code of conduct that simply asks users to 'be excellent to each other'.
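To make the mixture-of-experts routing concrete, here is a toy top-2 gating sketch in NumPy. It illustrates the general technique only; the function names, dimensions, and expert layers are invented for this example and are not xAI's implementation.

```python
# Toy top-2-of-8 expert routing (illustration only, not xAI's code).
import numpy as np

def top2_moe(token, gate_w, expert_ws):
    """Route one token vector through the two highest-scoring of eight experts."""
    logits = token @ gate_w                          # one gating score per expert
    top2 = np.argsort(logits)[-2:]                   # indices of the two chosen experts
    gate = np.exp(logits[top2]) / np.exp(logits[top2]).sum()  # renormalised gate weights
    # Only the two selected experts run for this token, which is why only a
    # fraction of the total parameters (~86B of 314B in Grok-1) is active.
    return sum(w * np.tanh(token @ expert_ws[i]) for w, i in zip(gate, top2))

rng = np.random.default_rng(0)
d, n_experts = 16, 8                                 # tiny toy dimensions
token     = rng.normal(size=d)
gate_w    = rng.normal(size=(d, n_experts))
expert_ws = rng.normal(size=(n_experts, d, d))
print(top2_moe(token, gate_w, expert_ws).shape)      # (16,)
```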
Keywords
💡Open Source
💡Grok
💡Weights
💡Torrent File
💡Mixture-of-Experts Model
💡Pre-trained Model
💡Apache 2.0
💡Commercial Purposes
💡GitHub Repo
💡JAX
💡Checkpoint Directory
Highlights
Elon Musk announced the open sourcing of Grok, and it has been released today.
The release includes the weights of Grok-1, the LLM powering the Grok chatbot on the X platform.
The weights are distributed as a torrent file, similar to the approach taken by Mistral AI.
The model is a 314 billion parameter mixture-of-experts model, which is larger than expected.
The released model is the pre-trained version, trained in October 2023, and is uncensored.
The model is released under the Apache 2.0 license, allowing for commercial use.
Each token utilizes only two out of eight experts, resulting in approximately 86 billion parameters being active per token.
The model features 64 layers and 48 attention heads for queries, with an internal embedding size of 6,144.
The model uses rotary positional embeddings (RoPE), and its tokenizer vocabulary is similar in size to GPT-4's.
Support for 8-bit quantization and activation sharding is included in the model (a generic quantization sketch follows this list).
The maximum sequence length, or context window, is 8,192 tokens.
Downloading the model requires approximately 300 GB of storage space.
Running the model in 4-bit precision requires around 320 GB of VRAM.
For 8-bit operation, a DGX H100 with eight H100s, each having 80 GB of VRAM, is needed.
A GitHub repository has been released with a JAX example for running the model.
To run the model, clone the repo, install required packages, and ensure all files are in the checkpoint directory.
The model's code of conduct emphasizes being excellent to each other.
On four benchmarks, Grok-1 outperforms both the 70 billion parameter LLaMA 2 model and GPT-3.5.
The model's large size and uncensored nature make it a potentially powerful option for companies with good datasets for further fine-tuning.
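The video does not describe Grok-1's exact quantization scheme, so the sketch below shows generic absmax 8-bit weight quantization in NumPy, the basic idea behind shrinking a checkpoint of this size.

```python
# Generic absmax 8-bit weight quantization (illustration; not Grok-1's actual scheme).
import numpy as np

w = np.random.default_rng(0).normal(size=(4, 4)).astype(np.float32)  # a toy weight matrix

scale = np.abs(w).max() / 127.0           # map the largest magnitude to the int8 range
q = np.round(w / scale).astype(np.int8)   # stored weights: 1 byte each instead of 2-4
w_hat = q.astype(np.float32) * scale      # dequantised on the fly when computing

print("max abs reconstruction error:", float(np.abs(w - w_hat).max()))
```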