* This blog post is a summary of this video.

AI Showdown: Evaluating Top Chatbots on Image Recognition and Generation

Comparing Chatbot Image Recognition Capabilities

In this blog post, we compare the image recognition capabilities of several leading chatbots: ChatGPT, Bing Chat, Perplexity AI, Claude, Google Bard, and Pi from Inflection AI. Because these are language models designed primarily for text, their ability to analyze images directly varies widely.

We uploaded sample images to each chatbot and prompted them to describe the image content. The results revealed stark differences between these AI assistants when presented with visual inputs.

ChatGPT Lacks Direct Image Recognition

When asked whether it can recognize images, ChatGPT responded that it cannot directly view or analyze them. As a text-based model without built-in computer vision, it could not interpret or describe visual content in our test.

Bing Provides Detailed Image Analysis

In contrast, Bing Chat showed strong image recognition. We uploaded a screenshot related to Chrome extensions, and Bing gave an accurate high-level description that correctly identified the image's topic. It also recommended top Chrome extensions for 2023 based on the image.

Perplexity AI and Claude Cannot Process Images

Neither Perplexity AI nor Claude could analyze images directly. When we uploaded the same Chrome extension screenshot, converted to a PDF, both chatbots responded that they could not access or interpret the file's contents. In our tests, both were limited to text input.

Google Bard Offers Mixed Image Recognition Accuracy

Google Bard can analyze images, but its interpretations were hit-or-miss. It correctly identified some basic elements in our test image, such as the iPhone and the ChatGPT app. However, it also invented details, such as a black shirt supposedly worn by the person in the picture. Its image recognition therefore remains unreliable.

Pi Also Fails at Recognizing Images

Rounding out the lineup, Pi from Inflection AI also failed when presented with an image to analyze. Without built-in computer vision, it cannot interpret or describe visual content.

Examining Chatbot Image Generation Skills

In addition to image recognition, we also tested these chatbots on image generation by prompting them to create an original image of a night sky with a full moon. Once again, there was significant variation in their creative abilities when presented with this visual challenge.

ChatGPT Cannot Directly Create Images

Unsurprisingly, ChatGPT reported that it cannot generate images directly. Without built-in image synthesis, it can only respond to text prompts with text.

Bing Chat Generates Relevant Images from Prompts

Bing Chat stood out from its peers by producing original images from text prompts. Asked to depict a night sky with a full moon, it generated several images that matched the prompt.

Perplexity AI, Claude, Google Bard Lack Image Creation

Like ChatGPT, Perplexity AI, Claude, and Google Bard fell short when asked to create new images. Without native image generation, none of them could fulfill the request.

Pi Falsely Claims Image Generation Ability

Inflection AI's Pi also failed to produce an image when asked. Worse, it initially claimed that it could generate graphics and then failed to follow through, so its replies misrepresented its limitations.

Bing Chat Emerges as Winner in Image Tasks

Based on these recognition and generation tests, Microsoft's Bing Chat stands apart. It described image content accurately and generated new images from text prompts alone. The other chatbots were far more limited when working with visual inputs.

FAQ

Q: Which chatbot is best at recognizing images?
A: Bing Chat provides the most detailed and accurate image recognition capabilities out of the chatbots tested.

Q: Can ChatGPT create images from text prompts?
A: No. As a language model, ChatGPT cannot generate images directly.

Q: Does any chatbot reliably create images?
A: Yes. Bing Chat uses OpenAI's DALL-E model to create images from text prompts, and in our test it produced images that closely matched the prompt.