Why voice computers always fail

TechAltar
21 Nov 2023 17:07

TLDR

The video script discusses the struggles of voice-based computing platforms like Microsoft's Cortana and Amazon's Alexa, which have failed to meet expectations both financially and in consumer adoption. It introduces new devices from companies like Humane and Meta, highlighting their advanced features such as generative AI, built-in cameras, and wearable design. However, the script critically examines the claim that these devices could replace smartphones, pointing out the limitations of voice interfaces for complex tasks and privacy concerns. The video suggests that voice AI is better suited as a supplementary feature rather than a primary computing interface.

Takeaways

  • 📉 The voice-based computing industry has faced significant challenges, with platforms like Microsoft's Cortana shutting down and Amazon's Alexa incurring substantial losses.
  • 🚫 Despite massive investments and a decade of development, tech giants have yet to monetize voice-based platforms effectively or convince consumers to use them for complex tasks.
  • 🌟 New companies like Humane and Meta are emerging with claims that they will revolutionize voice computing and deliver on its long-awaited potential.
  • 🔄 The new generation of voice AI includes generative AI for smarter responses, built-in cameras for computer vision interactions, and wearable designs for reduced friction in use.
  • 📱 Humane's AI Pin and Meta's Ray-Ban glasses propose to replace traditional smartphones, but the concept faces skepticism due to practical limitations.
  • 🖥️ Setting up and managing complex tasks on voice-only devices is currently impractical, necessitating real screens and precise input methods.
  • 🔍 Voice computing lacks privacy, is inefficient for complex data input or review, and is generally less precise than text-based inputs.
  • 🚷 Public use of voice interfaces raises concerns about annoyance and privacy invasion as others may overhear personal information.
  • 🛍️ Shopping, finance, and productivity apps that rely on visuals and precise inputs are not well-suited for voice-only interactions.
  • 🎶 Existing voice assistants are already adept at simpler tasks like music playback and smart home control, which calls into question the need for a complete voice-based platform.
  • 💡 Alternative solutions for voice computing include integrating it as an additional feature to existing devices or enhancing it with screens and precise input methods.

Q & A

  • What happened to Microsoft's Cortana and Amazon's Alexa in terms of their performance in the voice-based computing market?

    -Microsoft's Cortana flopped and was shut down completely without a direct replacement, while Amazon's Alexa faced significant financial losses, amounting to $1 billion per year.

  • What was Jeff Bezos' vision for Alexa in the voice computing segment?

    -Jeff Bezos personally insisted on winning the voice computing segment and turning Alexa into a major platform to rival smartphones and computers.

  • How has Google's platform fared in the voice assistant market?

    -Google has had smaller publicly reported layoffs, but its platform has lost much of its initial momentum: neither Google nor its hardware partners have released a new device dedicated to the Google Assistant in almost three years.

  • What are the three major upgrades that the new generation of voice-based computers, like Humane's AI Pin and Meta's Ray-Ban glasses, has over previous voice assistants?

    -The new generation has real generative AI built in, making them smarter and more flexible; they have a camera for advanced computer vision to analyze what the camera sees; and they are wearable and always on, reducing the friction of interaction.

  • How does the AI in the Humane AI Pin and Meta Ray-Ban glasses analyze visual information?

    -The AI can analyze visual information through the built-in camera, enabling interactions like estimating the sugar content of fruit for diabetics or checking the protein content in food items.

  • What is the main criticism against the idea of voice-based computing platforms replacing smartphones, as proposed by companies like Humane?

    -The main criticism is that voice AI is not suitable as the primary interface for general computing needs, as it is not practical for many tasks such as managing emails, using social media, handling finance, and more, which require precise inputs and visual interfaces.

  • What are the three fundamental shortcomings of voice as an interface for computing?

    -Voice interfaces raise privacy and annoyance problems in public, voice is a slow, one-way communication channel compared with a screen, and human speech is often too incoherent and imprecise for the exact inputs computers require.

  • What are the two potential solutions proposed for the problems faced by voice-first interfaces?

    -One solution is to accept voice as a cool addition to existing devices rather than a replacement. The other is to add a good screen and precise input methods to voice-first devices, essentially reinventing the smartphone with voice capabilities.

  • What is the main argument for voice AI being integrated as an accessory rather than a primary interface?

    -Voice AI is best suited as an accessory because it is not practical for the majority of computing tasks due to the nature of voice communication being less precise and more cumbersome for complex tasks compared to visual and touch interfaces.

  • How does the speaker suggest the use of voice AI in the context of the Meta Ray-Ban glasses and Microsoft's HoloLens demo?

    -The speaker suggests that voice AI is well-suited as an additional capability for devices like the Meta Ray-Ban glasses and Microsoft's HoloLens, which serve as extensions of smartphones and provide new functionalities rather than trying to replace them entirely.

  • What is the alternative recommendation provided by the speaker for the holiday shopping list instead of the Humane AI pin?

    -The speaker recommends an iFixit set for the holiday shopping list, which allows people to fix their existing devices instead of buying new ones, reducing e-waste and giving control over their gadgets.

Outlines

00:00

📉 The Struggles of Voice Computing Platforms

This paragraph discusses the challenges faced by major tech companies in the voice computing sector. It highlights the failure of Microsoft's Cortana, which led to its complete shutdown, and Amazon's Alexa, which suffered significant financial losses. The paragraph also mentions Google's less publicized layoffs and Apple's Siri, which has seen minimal updates. Despite improvements over time, the voice assistant category has been a major disappointment, with no clear path to profitability or complex consumer interaction.

05:01

🚀 The Next Wave of Voice Computing

The paragraph introduces a new generation of voice-based computing devices, such as those from Humane and Meta, which promise significant upgrades over old voice assistants. These devices feature generative AI for smarter responses, built-in cameras for advanced computer vision, and a wearable design for constant accessibility. The speaker expresses excitement about the potential benefits these advancements could bring, especially for those with vision impairments or the elderly. However, concerns are raised about the companies' marketing strategies, which seem disconnected from practical realities.

10:02

📱 The AI Pin: Hype vs. Reality

This section critiques the AI pin by Humane, which is marketed as a revolutionary device and a potential replacement for smartphones. The paragraph points out the impracticality of using voice AI as the primary interface for complex computing tasks. It argues that setting up the device, managing privacy, and performing tasks like controlling cameras, using social media, viewing photos, handling finance, and productivity work are either impossible or highly impractical with voice commands alone. The speaker asserts that voice computing is not suitable for most smartphone needs and suggests that the AI pin is not a practical addition to one's holiday shopping list.

15:02

🛠️ Empowering Repair and Sustainability

The final paragraph shifts focus from voice computing to promoting repair and sustainability through iFixit's Black Friday and holiday deals. It emphasizes the value of repairing existing devices rather than purchasing new ones, reducing e-waste, and empowering consumers to maintain control over their gadgets. The speaker recommends iFixit's repair kits, which include high-quality tools and resources for a wide range of devices. The paragraph highlights iFixit's commitment to the right to repair and its role in providing a practical and environmentally friendly alternative to discarding broken electronics.

Keywords

💡Voice-based Computing

Voice-based Computing refers to the use of voice commands to interact with computers or devices. The video discusses the struggles and limitations of this technology, highlighting how major tech companies have not been able to make it a profitable or widely adopted platform. It mentions products like Microsoft's Cortana, Amazon's Alexa, Google Assistant, and Apple's Siri, which have seen limited success in this domain.

💡Generative AI

Generative AI refers to artificial intelligence systems that can create new content or data based on patterns they have learned. In the video, it is mentioned as a key feature of new voice-based computing devices, suggesting that these systems can understand and respond to more complex and open-ended queries, such as playing songs from a specific genre or movie.

💡Computer Vision

Computer Vision is a field of artificial intelligence that enables computers to interpret and understand visual information from the world, such as images or videos. In the context of the video, it is discussed as a capability that allows voice-based devices to analyze what their cameras see, enabling interactions like estimating the sugar content of food for diabetics or checking the grilling time for food.

💡Wearable Technology

Wearable technology refers to electronic devices or gadgets that are designed to be worn on the body. The video discusses how new voice-based computing devices are wearable and always on, suggesting that this feature reduces the friction of interaction and could make these devices more accessible and convenient for users, especially those with vision impairments or the elderly.

💡Smart Home Devices

Smart Home Devices are appliances, systems, or equipment that are integrated with advanced technology and can be controlled remotely or automatically. In the video, it is mentioned that existing voice assistants are already good at controlling smart home devices, indicating that this is a practical application of voice-based computing.

💡Privacy Concerns

Privacy Concerns refer to the potential risks or issues related to the unauthorized collection, use, or disclosure of personal information. The video raises privacy as a significant issue with voice-based computing, especially when it comes to speaking sensitive information aloud in public or having others overhear private conversations or data.

💡User Interface

User Interface (UI) is the space where interactions between humans and machines occur, including the design of screens, buttons, and the way users navigate and control a system. The video discusses the limitations of voice as a primary user interface, arguing that it lacks the precision and efficiency of visual interfaces and input methods like typing.

💡Productivity

Productivity refers to the efficiency and effectiveness with which tasks are completed. The video argues that voice-based computing is not suitable for productivity tasks, such as managing emails, editing documents, or handling spreadsheets, due to the lack of precision and the slow, one-way nature of voice communication.

💡Smartphones

Smartphones are mobile devices that combine the functions of a phone with those of a computer, offering a wide range of features and applications. The video discusses the idea that voice-based computing devices could replace smartphones, but ultimately argues against this notion, citing the limitations and impracticality of voice interfaces for the complex tasks that smartphones are used for.

💡iFixit

iFixit is a company that provides tools, parts, and guides for repairing various electronic devices. In the video, iFixit is mentioned as an alternative to purchasing new technology, promoting the idea of repairing existing devices instead. This approach aligns with the company's mission to reduce e-waste and empower consumers to maintain control over their gadgets.

💡Right to Repair

Right to Repair is a movement advocating for the legal right of consumers to repair their own electronic devices. The video references iFixit's role in supporting this movement, emphasizing the importance of allowing individuals to fix their devices instead of constantly buying new ones, which contributes to sustainability and consumer empowerment.

Highlights

Voice-based computing has faced challenges, with platforms like Microsoft's Cortana shutting down and Amazon's Alexa incurring significant losses.

Despite massive investments, tech giants have struggled to monetize voice-based platforms and expand their use beyond basic tasks.

New companies like Humane and Meta are emerging with claims of revolutionizing voice computing and creating the interface of the future.

The new generation of voice AI includes generative AI, allowing for more flexible and less specific commands.

Devices now incorporate advanced computer vision to analyze and interact with the environment through built-in cameras.

Wearable voice AI devices are designed to be always on, reducing friction and enabling constant interaction.

The AI pin from Humane aims to replace smartphones, running on an Android-based OS and having its own cellular connection.

The AI pin's interface relies heavily on voice, with a very basic projector for visual interaction.

Voice AI has limitations, such as being unsuitable for tasks requiring privacy, complex inputs, or visual elements.

The idea of voice being the primary interface for general computing is considered unrealistic and impractical.

Voice is not suitable for public use due to privacy concerns and the annoyance it may cause to others.

Voice input is a slow, one-way communication method compared to the efficiency of visual interfaces.

Human speech is often incoherent and imprecise, making it a poor method for precise computer inputs.

Voice-first interfaces may work for simple tasks but are not practical for complex computing needs.

Ray-Ban glasses from Meta serve as an accessory to smartphones, offering voice capabilities without replacing the device.

Microsoft's HoloLens demo showcased voice-controlled interfaces in industrial scenarios, providing additional capabilities rather than replacing existing tools.

The concept of voice AI is promising, but its application as a replacement for smartphones is misguided.

iFixit offers Black Friday deals for repairing existing devices, promoting sustainability and self-reliance in tech repair.

iFixit provides high-quality tools and repair guides, empowering users to fix their gadgets and reduce e-waste.