* This blog post is a summary of a video.

Examining the Internet Access Capabilities of Top AI Chatbots

Introduction to Testing AI Chatbots

AI chatbots have advanced tremendously in recent years, with the release of large language models like GPT-3 and ChatGPT. These chatbots promise the ability to have natural conversations and provide intelligent responses on a wide range of topics.

A key question about these chatbots is: can they access the internet to provide real-time, up-to-date information in response to questions? Or are they limited to the data they were trained on?

To test this, we conducted an experiment asking two simple questions to the top AI chatbots:

  1. Who won the Ashes 2023? (The Ashes is a famous cricket series contested between England and Australia)

  2. What's the release date of WordPress 6.3?

The goal was to see which chatbots could provide the correct, current answers by accessing the internet, versus those that were limited by their training data.

We tested a range of popular AI chatbots including different versions of ChatGPT, Google's Bard, Perplexity AI, Anthropic's Claude, and Meta's Llama 2.

AI Chatbots Tested

Here are the different AI chatbots we tested in this experiment:

  • ChatGPT 3.5: The original version of ChatGPT, launched in November 2022. Free for all users.
  • ChatGPT 4 / ChatGPT Plus: GPT-4, released in March 2023 and available through the paid ChatGPT Plus subscription; meant to be more accurate.
  • ChatGPT Code Interpreter: A mode of ChatGPT that can write and execute code, aimed at programming tasks.
  • ChatGPT plugins such as WebPilot and VoxScript: Third-party plugins that extend ChatGPT's capabilities, including fetching information from the web.
  • Google Bard: Announced in February 2023 as Google's ChatGPT rival.
  • Perplexity AI: A commercial AI service focused on knowledge search.
  • Anthropic Claude: Launched in December 2022, with a focus on transparency and safety.
  • Meta Llama 2: Meta's open large language model for dialogue agents.

Overview of the Experiment

We asked each chatbot two simple questions:

  • Who won the Ashes 2023? (a major cricket series)
  • What is the release date for WordPress 6.3? (a recent version of the popular CMS)

We recorded whether each chatbot could provide the correct, up-to-date answer. Answering correctly requires the chatbot to search the internet in real time, since both events postdate most models' training data. A minimal sketch of how such a test could be scripted follows below.
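
To make the methodology concrete, here is a minimal sketch of how the two questions could be scripted against one of the chatbots using OpenAI's Python SDK. The model name and client setup are illustrative assumptions; in the experiment itself, the questions were simply typed into each chatbot's own interface.

```python
# Minimal sketch: send the two test questions to an OpenAI chat model and print
# the replies. Requires the `openai` package and an OPENAI_API_KEY env variable.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

QUESTIONS = [
    "Who won the Ashes 2023?",
    "What's the release date of WordPress 6.3?",
]

def ask(question: str, model: str = "gpt-3.5-turbo") -> str:
    """Send a single question and return the model's text reply."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

for q in QUESTIONS:
    print(f"Q: {q}")
    print(f"A: {ask(q)}\n")
```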

Testing ChatGPT Models

We first tested different versions and configurations of ChatGPT with the two questions:

ChatGPT 3.5 Results

ChatGPT 3.5 was unable to answer either question correctly. It responded that it does not have enough context to answer them. This is expected: ChatGPT 3.5's training data only extends to September 2021, so it cannot access more recent information.

ChatGPT 4 Results

Next, we tried GPT-4 via ChatGPT Plus, the latest version of ChatGPT at the time of testing, released in March 2023. However, the responses were the same: ChatGPT 4 was also unable to answer the questions, citing a lack of context and data.

ChatGPT Plugins Results

We tested two popular ChatGPT plugins that enhance its capabilities: WebPilot and VoxScript. WebPilot correctly answered both the WordPress release date and the Ashes 2023 winner questions by accessing current information from the web. VoxScript was inconsistent: it was unable to answer the Ashes 2023 question but provided the accurate WordPress release date. So while plugins enable ChatGPT to search the internet for current information, the results can vary. A sketch of the underlying mechanism follows below.
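
ChatGPT's actual plugin protocol is OpenAPI-based, but the ask-tool-answer loop that browsing plugins such as WebPilot rely on can be sketched with OpenAI's function-calling API. The `fetch_url` tool below is our own illustration, not WebPilot's real interface:

```python
# Rough sketch of the mechanism behind browsing plugins: the model is offered a
# web-fetch tool; when it calls the tool, the client runs the real HTTP request
# and feeds the page text back so the model can answer from current data.
import json
import urllib.request

from openai import OpenAI

client = OpenAI()

def fetch_url(url: str) -> str:
    """Download a page and return truncated text for the model to read."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")[:4000]

tools = [{
    "type": "function",
    "function": {
        "name": "fetch_url",
        "description": "Fetch the raw text content of a web page",
        "parameters": {
            "type": "object",
            "properties": {"url": {"type": "string"}},
            "required": ["url"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the release date of WordPress 6.3?"}]
reply = client.chat.completions.create(model="gpt-4", messages=messages, tools=tools)
msg = reply.choices[0].message

if msg.tool_calls:  # the model decided it needs to browse
    call = msg.tool_calls[0]
    page = fetch_url(json.loads(call.function.arguments)["url"])
    messages += [msg, {"role": "tool", "tool_call_id": call.id, "content": page}]
    reply = client.chat.completions.create(model="gpt-4", messages=messages, tools=tools)

print(reply.choices[0].message.content)
```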

Testing Google Bard

Next up was Google's new Bard chatbot. We asked Bard the same two questions.

In our tests, Bard provided inconsistent results:

  • It was able to correctly state Australia as the Ashes 2023 winner in the first test.


  • When we repeated the questions, it flipped - providing incorrect Ashes 2023 details but correct WordPress release date.

While Bard appears to have internet access, given how its responses changed between runs, its accuracy is clearly still very inconsistent.

Testing Perplexity AI

We also asked the Ashes and WordPress questions to the Perplexity AI service.

In the first test, Perplexity provided out-of-date, incorrect answers to both questions, claiming the Ashes series was still ongoing and giving the release date of an older WordPress version.

Repeating the questions produced the same results.

So Perplexity AI appears to have internet access for searching information, but it served irrelevant and inaccurate content in our tests.

Testing Anthropic Claude

Next, we tried Anthropic's Claude which markets itself as a safer AI assistant.

Claude responded that it does not have enough context to answer either the Ashes or WordPress release questions.

This indicates that Claude cannot yet search the internet for current information.

Testing Meta's Llama 2

Finally, we tested Meta's Llama 2 language model via the Perplexity AI service.

In both test cases, Llama 2 answered the Ashes and WordPress questions incorrectly.

It claimed England had won the Ashes and gave an outdated WordPress release date.

Testing again yielded the same results.

Llama 2 appears to guess or fabricate answers rather than admitting it cannot answer accurately, which leads to irrelevant responses. A sketch of how the same questions could be put to Llama 2 directly, outside Perplexity's interface, follows below.
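
For completeness, here is a minimal sketch of querying Llama 2 through Hugging Face transformers rather than Perplexity's hosted interface. The checkpoint name, prompt format, and generation settings are illustrative assumptions; the gated meta-llama weights require approval, and `device_map="auto"` needs the accelerate package installed.

```python
# Hedged sketch: run the same two test questions against Llama 2 locally via
# Hugging Face transformers. Assumes access to the gated meta-llama checkpoint
# and enough GPU/CPU memory for the 7B chat variant.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-2-7b-chat-hf",
    device_map="auto",
)

questions = [
    "Who won the Ashes 2023?",
    "What's the release date of WordPress 6.3?",
]

for q in questions:
    # Llama 2 chat models expect the [INST] ... [/INST] prompt format.
    prompt = f"[INST] {q} [/INST]"
    out = generator(prompt, max_new_tokens=128, do_sample=False)
    print(out[0]["generated_text"])
```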

Conclusion and Key Takeaways

Our testing revealed some key insights on AI chatbots' ability to access the internet:

  • Most chatbots, such as Claude and the base ChatGPT models, cannot yet accurately answer questions that require current information.

  • Some, like Perplexity, Bard, and Llama 2, guess at answers instead of admitting ignorance.

  • Plugins like WebPilot enable ChatGPT to search the web for the latest information.

  • Even the chatbots that do search the web still return responses containing inaccuracies.

As AI chatbots evolve, internet access will make them more useful for providing up-to-date responses. For now, however, they remain unreliable for certain types of factual questions, and human verification is still needed to ensure accuracy.

FAQ

Q: Which AI chatbots were tested in this experiment?
A: The AI chatbots tested were ChatGPT (GPT-3.5, GPT-4, and plugins), Google Bard, Perplexity AI, Anthropic Claude, and Meta's Llama 2.

Q: What were the two sample questions used to test internet access?
A: The two sample questions were: 1) Who won the Ashes 2023? 2) What's the release date of WordPress 6.3?

Q: Which AI chatbots demonstrated internet access capabilities?
A: Perplexity AI, Google Bard, and ChatGPT plugins such as VoxScript and WebPilot showed internet access capabilities.

Q: Were the AI chatbots with internet access able to provide fully accurate answers?
A: No, even the AI chatbots with internet access struggled to provide consistent and reliable answers to the sample questions.

Q: What was the overall conclusion of the experiment?
A: The conclusion was that while some AI chatbots can access the internet, their answers are still inconsistent and unreliable. Proper verification is still required rather than blindly relying on them.

Q: Which AI chatbot performed the best in the experiment?
A: Overall, the ChatGPT WebPilot plugin performed best, answering both questions correctly; every other chatbot with internet access missed at least one key fact.

Q: What factors limited the other AI chatbots' capabilities?
A: The base ChatGPT models (without plugins) were limited by the cutoff date of their training data, while Claude could not access the internet at all. Google Bard and Llama 2 returned inconsistent information even with internet access.

Q: What are the key takeaways from this experiment?
A: The key takeaways are: 1) internet access does not guarantee accuracy, 2) multiple AI tools should be tested before relying on any results, and 3) human verification of facts is still essential.

Q: Would the AI chatbots' capabilities be expected to improve in future?
A: Yes. As the models receive updated training over time and their access to internet resources expands, accuracy can be expected to improve.

Q: How frequently should the accuracy of AI chatbot capabilities be reassessed?
A: Accuracy and capabilities should be evaluated at least once every 3 months, or whenever a major change in training data, algorithms or architecture occurs for an AI model.