Voice Cloning in ElevenLabs vs. Descript

Excelerator
19 Oct 2023 07:25

TLDR: The video explores voice cloning technology by comparing two platforms, ElevenLabs and Descript. It walks through cloning a voice from uploaded or recorded audio and covers the recent improvements to both services. ElevenLabs now offers faster, better-quality voice cloning, while Descript has simplified its process and needs only about a minute of recording for setup. The video tests both by cloning the voice of 'Bob' and comparing the results. Both platforms have strengths and weaknesses, and both produce realistic AI voices with varying levels of success.

Takeaways

  • 🎤 Voice cloning technology allows users to record or upload audio for AI to learn their voice for future text-to-speech purposes.
  • 📱 ElevenLabs and Descript are two popular platforms offering voice cloning services, each with its own pricing and features.
  • 🚀 ElevenLabs has recently updated its voice cloning AI to be faster, easier, and of better quality.
  • 💰 To use voice cloning on ElevenLabs, a subscription of at least $5 per month is required.
  • 📊 For optimal results, ElevenLabs recommends uploading an audio file of at least one minute in length.
  • 🎧 After uploading, users can type text into the platform and generate audio that sounds as if they personally spoke the words.
  • 🌟 Descript's new AI speaker technology claims to significantly reduce the amount of audio needed for voice training, now requiring only a minute or two of recording.
  • 📝 Descript requires users to read a provided script for authorization and training purposes, indicating a focus on authenticity and voice match.
  • 🔄 Users may encounter limitations when uploading recordings that do not match the authorized script in Descript, emphasizing the platform's strict guidelines.
  • 💬 Both ElevenLabs and Descript have their strengths; ElevenLabs offers realistic AI voices at a low cost, while Descript provides additional features like video editing through text.
  • 🤔 The effectiveness of voice cloning technology is still evolving, with room for improvement in areas like natural speech patterns and emotional expressiveness.

Q & A

  • What is voice cloning and how does it work?

    -Voice cloning is a technology that lets users record or upload audio of their voice, which an AI system then learns. The AI can then generate text-to-speech output in the user's voice, as if the user had spoken the words themselves.

  • What are the minimum audio length requirements for ElevenLabs voice cloning?

    -ElevenLabs requires an audio file that is at least one minute long for voice cloning. It notes that going over five minutes does not provide additional benefits.

  • What is the pricing for using ElevenLabs' voice cloning feature?

    -To use the voice cloning feature in ElevenLabs, a user must subscribe to a plan costing at least $5 per month.

  • How does the new voice cloning AI from ElevenLabs differ from the previous version?

    -The new voice cloning AI from ElevenLabs is described as faster, easier to use, and better in quality than the previous version.

  • What is the process for using the voice cloning feature in Descript?

    -To use Descript's voice cloning, users record themselves reading a script provided by Descript, which the AI uses for both authorization and training. The AI speaker is ready to use once the voice is authorized.

  • What was the previous requirement for recording audio for Descript's voice cloning?

    -Prior to the update, Descript required users to record or upload approximately 30 minutes of audio for the voice cloning process.

  • How long does it typically take for Descript to process the voice cloning?

    -After recording and uploading the audio, Descript's voice cloning is usually ready within 24 hours.

  • What issue was encountered when trying to upload a different recording to Descript?

    -The recording could not be longer than 2 minutes, and it had to be a reading of the specific script provided by Descript for authorization and training. Any other recording resulted in an error.

  • What are some of the unique features offered by Descript?

    -Descript offers unique features such as editing videos by editing text and an eye contact editing tool, among other functionalities.

  • What are the general impressions of the voice cloning technology in ElevenLabs and Descript?

    -ElevenLabs may not perfectly replicate the user's voice, but it creates very realistic AI voices at a low cost. Descript offers additional features beyond voice cloning. The presenter found the voice output from both platforms usable and practical.

  • How can users access ElevenLabs and Descript for further exploration?

    -Users can access ElevenLabs and Descript through the links in the video description, and they are invited to try out both platforms.

  • What is the affiliate relationship mentioned in the script?

    -The presenter has an affiliate relationship with both ElevenLabs and Descript, meaning that if a viewer makes a purchase through the provided links, the presenter may receive a small commission.

Outlines

00:00

🎤 Voice Cloning Technology and ElevenLabs

This section introduces voice cloning, which lets users record audio or upload existing recordings so that an AI can learn their voice. The AI can then be used for text-to-speech, producing audio that mimics the user's voice. The section tests the technology with ElevenLabs, which has recently improved its voice cloning AI to be faster, easier, and better. The presenter uploads a 7-minute audio clip, names the cloned voice, and generates synthesized speech. The initial results are promising, though there are minor issues with pacing and emphasis. The section also compares the old and new voice cloning processes, highlighting the improvements in speed and quality.
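The length guidance quoted in the Q & A (at least one minute, no added benefit past five minutes) can be checked before uploading. The following is a minimal sketch using only Python's standard `wave` module. The function names, the labels returned, and the silent test file are assumptions made for illustration; none of this is part of ElevenLabs or Descript, and it only handles uncompressed WAV files.

```python
import io
import wave

# Thresholds taken from the guidance quoted in this summary:
# at least one minute of audio, and no extra benefit past five minutes.
MIN_SECONDS = 60.0
MAX_USEFUL_SECONDS = 300.0


def wav_duration_seconds(source):
    """Return the duration of a WAV file (path or file-like object) in seconds."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def classify_sample(seconds):
    """Classify a sample length against the one-to-five-minute guidance."""
    if seconds < MIN_SECONDS:
        return "too short"
    if seconds > MAX_USEFUL_SECONDS:
        return "longer than needed"
    return "ok"


def make_silent_wav(seconds, framerate=8000):
    """Build an in-memory mono 16-bit WAV of silence (hypothetical test data)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(framerate)
        wav.writeframes(b"\x00\x00" * int(seconds * framerate))
    buffer.seek(0)
    return buffer
```

Under these assumptions, a 7-minute clip like the one uploaded in the video would be classified as "longer than needed": it is accepted, but the extra two minutes are not expected to improve the clone.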

05:01

💬 Challenges and Comparisons with Descript's Voice Cloning

This section covers the presenter's experience with Descript's voice cloning, including the problems encountered when uploading a recording and the specific requirements of the training process. Unlike ElevenLabs, Descript requires the user to read a provided script for authorization and training. After the initial trouble with matching the authorization recording, the presenter tests Descript's voice with a longer script and compares the quality and naturalness of the synthesized voice on the two platforms. The section closes with the presenter's thoughts on potential uses of voice cloning and an invitation for viewers to try both platforms and share their opinions. The presenter also discloses an affiliate relationship with both services and offers links for further exploration.

Keywords

💡Voice Cloning

Voice cloning is a technology that allows the creation of a digital replica of a person's voice using artificial intelligence. In the video script, voice cloning is central as it explores how ElevenLabs and Descript enable users to clone their voices for text-to-speech applications. The user uploads an audio sample, and the AI learns to reproduce the voice, allowing typed text to be spoken in the cloned voice, which can enhance accessibility or entertainment.

💡Text to Speech

Text to Speech (TTS) is a form of speech synthesis that converts written text into spoken words. This technology is highlighted in the video when discussing how once a voice is cloned, the user can type words which are then articulated using the cloned voice. This allows for seamless and personalized audio content creation, especially useful in digital media, assistive technologies, and more.

💡AI Speakers

AI speakers are digital voices created by artificial intelligence that can read text aloud as if a real person were speaking. In the script, Descript's new AI speaker technology is tested, showing how quickly and easily users can clone their voices, with better speed and quality than previous versions. This advancement illustrates the ongoing evolution and utility of AI in voice synthesis.

💡Authorization

Authorization in the context of voice cloning involves obtaining the legal right to use particular voice data for cloning purposes. The script mentions the need to confirm authorization, ensuring that users have the rights to the voices they are cloning. This is crucial to address privacy and consent issues in the use of personal and potentially sensitive data.

💡Speech Synthesis

Speech synthesis is the artificial production of human speech. In the video, this process is used after cloning a voice to generate spoken output from text. The video showcases how speech synthesis can be used creatively, such as in producing voiceovers for videos, highlighting its potential in diverse applications from entertainment to educational content.

💡Gaps

In speech synthesis, 'gaps' refer to pauses or spacing issues between spoken words or sentences. The script critiques the unnatural gaps that occur in the synthesized speech, indicating areas where the technology might still need refinement to achieve more natural-sounding voice output.
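One way to make the "long gaps" critique measurable is to scan the generated audio for stretches of near-silence. The sketch below is an illustration only, not something shown in the video: it assumes the audio has already been decoded into a list of amplitude values between -1.0 and 1.0, and the function name, the silence threshold, and the minimum gap length are all hypothetical choices.

```python
def find_long_gaps(samples, sample_rate, silence_threshold=0.02, min_gap_seconds=0.6):
    """Return (start_seconds, end_seconds) for each quiet run of at least min_gap_seconds.

    samples: amplitude values in the range -1.0 to 1.0 (assumed already decoded).
    sample_rate: number of samples per second.
    """
    gaps = []
    run_start = None  # index where the current quiet run began, if any
    for index, value in enumerate(samples):
        quiet = abs(value) < silence_threshold
        if quiet and run_start is None:
            run_start = index
        elif not quiet and run_start is not None:
            # The quiet run just ended; keep it only if it was long enough.
            if (index - run_start) / sample_rate >= min_gap_seconds:
                gaps.append((run_start / sample_rate, index / sample_rate))
            run_start = None
    # Close a quiet run that extends to the end of the audio.
    if run_start is not None and (len(samples) - run_start) / sample_rate >= min_gap_seconds:
        gaps.append((run_start / sample_rate, len(samples) / sample_rate))
    return gaps
```

Note that leading and trailing silence is reported as well, so a real comparison between two generated clips would probably trim those first. The thresholds would also need tuning by ear, since a pause that sounds natural at the end of a sentence can sound odd mid-phrase.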

💡Waveform

A waveform is a visual representation of an audio signal showing the variation in amplitude over time. In the video, waveforms are mentioned as indicators that the voice cloning process is complete and the audio is ready to be played. This technical aspect is important for users to visually confirm the audio content before playback.

💡Subscription Plan

Subscription plan refers to the payment model required to access certain features in software applications. In ElevenLabs, voice cloning is available only on paid plans, not on the free version, as mentioned in the script. This reflects a common business model in software services where advanced features are monetized.

💡Ancient Olympics

The Ancient Olympics is used as an example in the voice cloning test in the script. A fun fact about athletes performing naked is narrated using the cloned voice to evaluate its naturalness and expressiveness. This use case shows how voice cloning can make educational content more engaging and accessible.

💡Gymnastics

Gymnastics, derived from Greek words meaning 'athletic training' and 'naked', is used in the script to assess the quality of speech synthesis. The example provides a context to test how well the AI handles complex and historically nuanced terms, evaluating the cloned voice's pronunciation and intonation.

Highlights

Voice cloning technology allows users to record or upload audio for AI to learn their voice for future text-to-speech purposes.

ElevenLabs and Descript are two popular platforms offering voice cloning services.

ElevenLabs has recently improved its voice cloning AI to be faster, easier, and better.

To use ElevenLabs for voice cloning, a plan costing at least $5 per month is required.

For ElevenLabs, an audio file of at least one minute is needed for voice cloning, with no additional benefit from going over five minutes.

Once the voice is cloned in ElevenLabs, users can type in text and generate audio that sounds as if they had spoken the words themselves.

Descript's new AI speaker technology claims to clone voices with just a minute of recording, significantly reducing the time and effort needed.

Descript requires users to read a provided script for authorization and training of the voice cloning AI.

The new voice cloning technology from Descript promises better quality than previous methods.

There are some issues with Descript's voice cloning, such as the inability to upload recordings not matching the provided script.

Both ElevenLabs and Descript offer realistic AI voices, although ElevenLabs is noted for its affordability.

Descript is known for its innovative features like editing video by editing text and its impressive eye contact technology.

The reviewer conducted tests with both platforms, comparing their voice cloning capabilities and ease of use.

The voice cloning results from both platforms have some minor issues, such as unnatural emphasis and long gaps between words.

Despite the limitations, both ElevenLabs and Descript provide useful tools for voice cloning with their respective strengths.

The reviewer encourages viewers to share their thoughts on the voice cloning technology and try the platforms if interested.

The reviewer provides affiliate links to both ElevenLabs and Descript for viewers to explore and potentially purchase services.