My Top 5 Open Source Text to Speech Softwares Starting off in 2024
TLDR: In this video, the creator reviews the top five open-source text-to-speech tools, sharing audio samples to demonstrate their capabilities. The tools discussed include Suno's Bark, VALL-E X, Whisper X, StyleTTS 2, and the favored Tortoise TTS. The creator highlights the strengths and limitations of each, noting that while some produce good outputs, none are perfect. Tortoise TTS stands out for its quality and its potential to improve with additional training and techniques like RVC, which yield a more natural and accurate voice output.
Takeaways
- 🔊 The video discusses the top five open-source text-to-speech (TTS) tools the narrator encountered over the past year.
- 🎤 The first TTS tool mentioned is Suno's Bark, which gained attention but hasn't been updated recently.
- 🗣️ VALL-E X is another TTS tool capable of voice cloning, but it's not considered the best due to its robotic tone.
- 🌟 The narrator tests various TTS tools, including using a preset voice from Elden Ring and a Japanese voice.
- 🎧 StyleTTS 2 is highlighted for its fantastic voice quality and active community support.
- 📝 Coqui TTS's XTTS is noted for its speed and quality, using HiFi-GAN and a different training approach.
- 🇯🇵 The narrator demonstrates the TTS software's ability to handle Japanese language, though pitch accuracy is a challenge.
- 🥇 Tortoise TTS is the narrator's favorite, offering the best quality and experience, especially when combined with RVC (Retrieval-based Voice Conversion).
- 📚 The video includes a demonstration of generating an audiobook using Tortoise TTS with RVC.
- 📌 The narrator emphasizes the importance of voice training for better TTS output quality.
- 📢 The video concludes with a recommendation for Tortoise TTS and a call to action for viewers to share their thoughts in the comments.
Q & A
What is the first open-source text-to-speech software mentioned in the transcript?
-The first open-source text-to-speech software mentioned is Suno's Bark.
What feature of Suno's Bark caught attention when it was released?
-Suno's Bark caught attention due to its curated example of a British accent voice.
How does the speaker describe the performance of Suno's Bark?
-The speaker describes Suno's Bark as good but not the best, and mentions that it hasn't been updated much because the developers' focus shifted to their Chirp feature.
What is VALL-E X's capability in relation to text-to-speech?
-VALL-E X can do voice cloning and produces decent text-to-speech outputs, though they are not always accurate.
How does the speaker evaluate the quality of the synthesized voice from Whisper X?
-The speaker finds the synthesized voice from Whisper X to be robotic and not very accurate.
What is the speaker's opinion on StyleTTS 2?
-The speaker hasn't trained any voices with StyleTTS 2 but finds the out-of-the-box voice to be fantastic.
What makes XTTS stand out according to the speaker?
-XTTS is considered better than Suno's Bark, with the speaker noting its quality and the fact that it uses HiFi-GAN instead of diffusion for the mel spectrogram.
Why does the speaker prefer Tortoise TTS?
-The speaker prefers Tortoise TTS due to their experience with it, its high quality, and the ability to maintain good intonation when used with RVC.
How does the speaker enhance the quality of Tortoise TTS?
-The speaker enhances the quality of Tortoise TTS by using it in conjunction with RVC, which improves the voice's naturalness and intonation.
What is the speaker's overall recommendation for the best open-source text-to-speech software?
-The speaker's overall recommendation is Tortoise TTS, especially when combined with RVC for voice cloning.
Outlines
🗣️ Open-Source Text-to-Speech Software Overview
The paragraph introduces five top open-source text-to-speech (TTS) software options encountered over the past year. It begins with a comparison of a British-accented version of the author's voice with their actual voice. The first software discussed is Suno's Bark, which gained attention but hasn't seen updates as the developers focus on a feature called Chirp. The author also mentions VALL-E X, which can clone voices but is not the best option. The discussion includes audio samples from different TTS software, including a comparison between pre-made and cloned voices, with a focus on the quality and accuracy of the outputs.
🎤 Detailed Reviews and Voice Cloning
This paragraph delves into the specifics of each TTS software. It starts with a brief mention of Whisper X and its robotic output. The author then discusses StyleTTS 2, highlighting a fantastic voice sample provided by a Discord member. The paragraph continues with Coqui TTS's XTTS, which the author has fine-tuned, and Tortoise TTS, which is the author's favorite due to their extensive experience with it. The author emphasizes the quality of Tortoise TTS, especially when combined with RVC (Retrieval-based Voice Conversion), and shares their enthusiasm for AI voice cloning. The paragraph concludes with a comparison of the native output and the improved output after RVC processing.
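The Tortoise TTS plus RVC setup described above is a two-stage pipeline: a TTS stage turns text into audio, and a voice conversion stage then reshapes that audio toward the target voice. The sketch below only illustrates that structure. Every name in it (`Audio`, `make_pipeline`, `placeholder_tts`, `placeholder_rvc`) is hypothetical, and both stages are stand-ins that do not call the real Tortoise TTS or RVC code.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Audio:
    samples: List[float]  # raw waveform samples
    sample_rate: int      # samples per second
    voice: str            # label for the voice the audio currently carries


# A synthesizer maps text to audio; a converter maps audio to audio.
Synthesizer = Callable[[str], Audio]
Converter = Callable[[Audio], Audio]


def make_pipeline(synthesize: Synthesizer, convert: Converter) -> Callable[[str], Audio]:
    """Chain a TTS stage into a voice conversion stage."""
    def run(text: str) -> Audio:
        raw = synthesize(text)   # stage 1: text -> speech
        return convert(raw)      # stage 2: speech -> converted speech
    return run


def placeholder_tts(text: str) -> Audio:
    # Stand-in for a TTS model: one silent sample per character (assumption).
    return Audio(samples=[0.0] * len(text), sample_rate=24000, voice="tts-clone")


def placeholder_rvc(audio: Audio) -> Audio:
    # Stand-in for voice conversion: keeps the waveform, relabels the voice.
    return Audio(samples=list(audio.samples),
                 sample_rate=audio.sample_rate,
                 voice="rvc-target")


narrate = make_pipeline(placeholder_tts, placeholder_rvc)
result = narrate("Hello")
print(result.voice, len(result.samples))
```

Keeping the two stages as separate callables mirrors the comparison in the video: the native output is what the first stage returns on its own, and the improved output is what comes out after the second stage.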
Keywords
💡Open-source text-to-speech software
💡Voice cloning
💡Hugging Face
💡StyleTTS 2
💡XTTS
💡Tortoise TTS
💡RVC (Retrieval-based Voice Conversion)
💡HiFi-GAN
💡Diffusion
💡Audiobook generation
Highlights
Suno's Bark is an open-source text-to-speech software that gained attention for its quality and capabilities.
Suno's Bark has not been updated recently, as the focus shifted to the Chirp feature.
VALL-E X is another text-to-speech software capable of voice cloning, but it's not the best and can be robotic.
The speaker tested VALL-E X with their own voice and found it decent but not perfect.
The speaker used a preset voice from Elden Ring, Melina, for a sample.
A Japanese voice preset was tested, and the speaker found it close to the original voice.
StyleTTS 2 was praised for its fantastic voice quality, even without personal training.
The speaker plans to train their own voice with StyleTTS 2 and share the results.
Coqui TTS's XTTS was tested and found to be faster and of good quality.
XTTS is considered better than Suno's Bark and VALL-E X, and it uses HiFi-GAN technology.
Tortoise TTS is the speaker's favorite, offering the best quality and experience.
Tortoise TTS, when pipelined with RVC, produces high-quality voice cloning.
The speaker has extensive experience with Tortoise TTS and recommends it for its intonation and quality.
The speaker used Tortoise TTS to create an audiobook with their voice clone.
The speaker also tested a Lex Fridman voice clone with Tortoise TTS, showcasing its versatility.
The speaker concludes that Tortoise TTS is the best among the tested open-source text-to-speech tools.
The speaker encourages viewers to share their thoughts and ask for more details in the comments.
The video is the first of the year 2024 for the speaker's channel, marking a new beginning.