* This blog post is a summary of this video.

Examining Google's New AI Model Gemini: Capabilities and Comparison


Introducing Gemini: Google's Latest AI Model

Google has officially launched its newest AI model called Gemini. Developed by DeepMind, Gemini is being positioned as Google's answer to models like GPT-4 from OpenAI. There has been lots of excitement and discussion around Gemini's capabilities across social media platforms.

Gemini is described as the 'most capable AI model' DeepMind has created so far. It has been designed from the ground up to handle multimodal data, including text, images, video, audio and even code. This could give Gemini an edge in certain applications over models like GPT-4, which were built primarily around text.

Gemini's Versions: Nano, Pro and Ultra

Three versions of Gemini have been announced: Gemini Nano, Gemini Pro and Gemini Ultra. Gemini Nano is meant for on-device deployment on phones and other mobile applications. Gemini Pro is roughly comparable to GPT-3.5 Turbo in capability. Gemini Ultra will be the full-scale version intended to truly rival GPT-4, but it is not publicly available yet.

Gemini Pro has already been integrated into Google's conversational AI agent, Bard, so Bard is getting its 'biggest upgrade yet' with Gemini under the hood. That said, bold launch claims have not always held up in the past; OpenAI's big claims about ChatGPT did not fully materialize either, so the actual performance of Bard with Gemini still needs to be evaluated.

Built for Multimodality From the Ground Up

As mentioned earlier, Gemini has been designed for multimodal inputs: text, images, video, audio and code. This builds on DeepMind's experience with vision-language models such as Flamingo, whose work informed the visual encoders used in Gemini's pretraining. The ability to understand multiple data formats could give Gemini an edge over models that focus primarily on text. However, multimodality also makes a model harder to evaluate properly, so it remains to be seen whether Gemini lives up to its promise in real-world usage across different tasks.
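
To make the multimodal design concrete, here is a minimal sketch of how a vision encoder's output can be projected into a language model's token-embedding space and interleaved with text. The class name, dimensions and tensors are illustrative assumptions for this post, not Gemini's actual architecture.

```python
# Minimal sketch of multimodal input fusion: project image features into the
# language model's embedding space and interleave them with text embeddings.
# All names and dimensions here are illustrative assumptions, not Gemini's design.
import torch
import torch.nn as nn

class ImageToTokenProjector(nn.Module):
    """Maps vision-encoder features into the LLM's token-embedding dimension."""
    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(vision_dim, llm_dim)

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # image_features: (batch, num_patches, vision_dim)
        return self.proj(image_features)

# Toy example: 16 image patch features plus 8 text-token embeddings.
projector = ImageToTokenProjector()
image_features = torch.randn(1, 16, 1024)   # stand-in for a vision encoder's output
text_embeddings = torch.randn(1, 8, 4096)   # stand-in for embedded text tokens

image_tokens = projector(image_features)                             # (1, 16, 4096)
fused_sequence = torch.cat([image_tokens, text_embeddings], dim=1)   # (1, 24, 4096)
print(fused_sequence.shape)
```

The fused sequence can then be fed to a decoder the same way a plain text sequence would be, which is the general pattern vision-language models use to share one backbone across modalities.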

Key Technical Specifications and Benchmarks

While there is still much we don't know about Gemini's architecture and datasets, DeepMind has revealed some key technical details in their paper:

Context Length of 32k

Somewhat disappointingly, Gemini has a context length of 32,768 tokens. This is far lower than models like Claude, which can handle 100,000 or more tokens of context. A longer context window would have let Gemini take in more material at once, potentially helping on long documents and complex reasoning tasks. The relatively small context length appears to stem from DeepMind opting for a decoder-only architecture rather than an encoder-decoder, a choice that favors simplicity but limits context size.
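
To make the 32k limit concrete, here is a small sketch of how a long document could be split into chunks that fit the context window. It uses tiktoken's cl100k_base encoding purely as a stand-in tokenizer, since Gemini's tokenizer and exact token counts are not public.

```python
# Sketch: split a long document into chunks that fit a 32,768-token context window.
# tiktoken's cl100k_base encoding is used only as a stand-in tokenizer;
# Gemini's actual tokenizer and token counts will differ.
import tiktoken

MAX_CONTEXT_TOKENS = 32_768
RESERVED_FOR_RESPONSE = 2_048  # leave room for the model's output

def chunk_document(text: str,
                   chunk_tokens: int = MAX_CONTEXT_TOKENS - RESERVED_FOR_RESPONSE):
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    # Decode fixed-size token windows back into text chunks.
    return [enc.decode(tokens[i:i + chunk_tokens])
            for i in range(0, len(tokens), chunk_tokens)]

chunks = chunk_document("some very long report ... " * 10_000)
print(len(chunks), "chunk(s)")
```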

Coding Ability Benchmarks

DeepMind evaluated Gemini Pro on benchmarks such as HumanEval and natural-language-to-code tests. On HumanEval, Gemini scored 74.4%, ahead of GPT-3.5's 72%. On natural-language-to-code generation the gain was smaller: 74.9% versus GPT-3.5's 73.9%. More testing is certainly warranted, but for now Gemini appears to offer only modest improvements in coding ability over the GPT-3.5 series. The Ultra version could do considerably better when it is released.
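
For readers unfamiliar with HumanEval, the benchmark scores code generation by functional correctness: a completion counts as a pass only if it runs and satisfies the task's unit tests. The toy example below illustrates that pass/fail idea with a hard-coded "model completion"; it is not the official evaluation harness.

```python
# Simplified illustration of HumanEval-style scoring: a solution counts as a pass
# only if it executes and satisfies the task's unit tests. This is not the official
# harness, and the "model completion" below is hard-coded for demonstration.

PROMPT = '''
def add(a: int, b: int) -> int:
    """Return the sum of a and b."""
'''

MODEL_COMPLETION = "    return a + b\n"   # pretend this came from the model

TESTS = """
assert add(2, 3) == 5
assert add(-1, 1) == 0
"""

def passes_tests(prompt: str, completion: str, tests: str) -> bool:
    namespace = {}
    try:
        exec(prompt + completion + tests, namespace)  # run solution plus tests together
        return True
    except Exception:
        return False

print("pass" if passes_tests(PROMPT, MODEL_COMPLETION, TESTS) else "fail")
```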

How Gemini Compares to Models Like GPT-4

Comparison on Human Eval and Other Tests

As discussed in the previous section, Gemini Pro achieves slightly better scores than GPT-3.5 on tests like HumanEval and natural-language-to-code generation. However, it is important not to read too much into these early benchmark results. The benchmarks in use today have been around for years with little modification, so models may effectively be 'overfitting' to them without corresponding improvements in real-world performance. More rigorous testing methodologies, and much more analysis, are needed before declaring any model superior or claiming that Gemini overtakes GPT-4.

Gemini Ultra to Rival GPT-4 in the Future

DeepMind itself acknowledges that Gemini Ultra, scheduled for release next year, will be the true competitor to GPT-4 in capability. Gemini Pro is more comparable to the GPT-3.5 series than to the latest GPT-4. For now, Gemini Ultra remains shrouded in mystery, but given the resources Google has poured into Gemini's development, Ultra certainly warrants anticipation and could set new records.

Training and Architecture Behind Gemini

DeepMind trained the Gemini models on Google's own TPUv4 and TPUv5e hardware. Supervised fine-tuning and reinforcement learning from human feedback (RLHF) were then used to specialize the models.
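
DeepMind has not published its fine-tuning code, but RLHF pipelines typically begin by training a reward model on pairs of human-preferred and rejected responses. The snippet below sketches the standard pairwise preference loss with toy scores, purely as an illustration of the idea rather than DeepMind's actual setup.

```python
# Sketch of the pairwise preference loss commonly used to train RLHF reward models:
# the reward for the human-preferred response should exceed the reward for the
# rejected one. Toy scalar scores stand in for a real reward model's outputs.
import torch
import torch.nn.functional as F

def preference_loss(chosen_rewards: torch.Tensor,
                    rejected_rewards: torch.Tensor) -> torch.Tensor:
    # -log(sigmoid(r_chosen - r_rejected)), averaged over the batch
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

chosen = torch.tensor([1.2, 0.7, 2.0])    # reward-model scores for preferred answers
rejected = torch.tensor([0.3, 0.9, 1.1])  # scores for the answers humans rejected
print(preference_loss(chosen, rejected))  # loss shrinks as chosen scores pull ahead
```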

Gemini's visual encoders build on DeepMind's earlier Flamingo work, which was trained on a huge corpus of unlabeled images and videos from the web. That foundation is intended to give Gemini stronger multimodal understanding than text-only systems.

While detailed information is lacking, DeepMind states that Gemini was pre-trained on massive datasets encompassing text, images, videos, audio and source code. This focus on multimodality is a key strength of Gemini versus other LLMs.

Final Thoughts and Excitement Around Gemini's Launch

There is tremendous interest and discussion happening around Gemini across social media and developer circles. Google is marketing Gemini as a major advance in conversational AI.

However, we advise tempering expectations until thorough, rigorous testing is conducted by third parties on real-world tasks. Progress in AI has taught us that marketing claims often don't translate neatly into practical gains.

Nonetheless, Gemini represents DeepMind and Google's full-fledged entry into the large language model race. Given the resources and talent behind it, Gemini is an impressive engineering effort and a significant step in multimodal understanding. We look forward to watching its impact unfold as broader access and independent testing arrive.

FAQ

Q: What company developed the Gemini AI model?
A: Gemini was developed by DeepMind, a Google-owned AI lab.

Q: What tasks is Gemini good at?
A: Gemini is built to handle multimodal data well, including text, images, video, audio and code.

Q: Is Gemini better than GPT-4?
A: It's unclear if Gemini is better than GPT-4 yet. Gemini Ultra may rival GPT-4 when it releases next year.

Q: What hardware was used to train Gemini?
A: Google Tensor Processing Units (TPUs) were used to train Gemini.

Q: How large is Gemini's context length?
A: Gemini has a context length of 32,768 tokens (32k), which is smaller than some other recent models.

Q: Can I use Gemini now in apps like ChatGPT?
A: Partly. Gemini Pro already powers Bard, and Google plans to roll out the more capable Gemini Ultra across its products as testing continues.

Q: Is Gemini open source?
A: No, Gemini is a proprietary model owned by Google and DeepMind.

Q: What benchmarks was Gemini tested on?
A: Gemini was tested on benchmarks such as HumanEval, MMLU (massive multitask language understanding), and natural-language-to-code tests.

Q: How did Gemini compare to GPT-3 in tests?
A: In initial tests, Gemini slightly outperformed GPT-3.5, but more testing is still needed.

Q: When will Gemini Ultra be released?
A: Google has indicated that Gemini Ultra is expected in 2024, at which point fuller comparisons with GPT-4 will be possible.