Put FACES to your NotebookLM AI Podcast audio! 2 methods!

Bob Doyle Media
21 Sept 2024 · 13:45

TLDR: In this video, the creator shows how to enhance AI-generated podcasts made with Google's NotebookLM. They demonstrate two methods for adding realistic faces to the podcast audio so it stands out. First, they use Audacity to split the audio into separate male and female tracks and animate each voice with Hedra's facial animation platform. Second, they use Live Portrait technology to transfer the Hedra animations onto video of more realistic-looking people, enhancing the podcast's appeal.

Takeaways

  • 😀 The video discusses two methods to add faces to AI-generated podcast audio using NotebookLM from Google.
  • 🎤 The knowledge base for the AI chatbot is created by dictating a brain dump into a transcription service like Otter.
  • 🎧 The AI can generate a podcast from the uploaded knowledge base, which can then be edited and separated into different voice tracks.
  • 👥 Audacity, a free audio editor, is used to split the audio into male and female voice tracks for individual processing.
  • 💾 The separated audio tracks are exported individually to be used for facial animation.
  • 🎭 Hedra is introduced as a facial animation platform that animates a face based on the uploaded audio file.
  • 🖼️ An image is created in Hedra and then animated with the uploaded audio to produce a talking head video.
  • 🎥 The video is then edited to place the animated faces side by side, simulating a conversation.
  • 🤖 Live portrait technology is used to drive the facial animation of a still image or video with the movement from Hedra's animation.
  • 🔄 The process involves reversing and duplicating video clips to create a loop that matches the audio length.
  • 🗣️ Voice conversion is performed using services like ElevenLabs to give the AI-generated voices a more human and varied sound.

Q & A

  • What is NotebookLM?

    -NotebookLM is a service from Google that lets users upload their own knowledge base in text form, such as a book or a course, and creates an interactive chatbot that can converse about the uploaded material.

  • How can NotebookLM be used to create a podcast?

    -NotebookLM can generate an AI podcast in which two AI voices discuss the uploaded information in an excited manner, much like a real podcast.

  • What is the purpose of the video?

    -The video demonstrates how to enhance AI-generated podcasts by adding visual elements, such as faces for the voices, and how to change the voices of the podcast.

  • How does the process of creating a knowledge base for NotebookLM work?

    -The process involves uploading text-based material to NotebookLM, which then creates a chatbot that can discuss the material. In the video, the creator built a knowledge base by dictating a brain dump about the Bob Doyle Media channel into the Otter transcription service.

  • What is the role of Audacity in the podcast production process described?

    -Audacity is used as an audio editor to separate the AI-generated podcast voices into two separate tracks, making it easier to edit and prepare the audio for further processing.

  • What is Hedra and how is it used in the video?

    -Hedra is a facial animation platform that animates a face realistically based on an uploaded audio file. In the video, it is used to create animated faces that correspond to the separate audio tracks of the podcast.

  • How does the process of facial animation with Hedra work?

    -Hedra works by uploading an audio file and then animating a face, which can be created or uploaded as a photograph, to match the audio. The platform allows for customization and stylization of the face.

  • What is Live Portrait technology and how is it applied in the video?

    -Live Portrait technology is used to drive the facial animation of a still image or video using a video of a face talking or making expressions. In the video, it is applied to make more realistic-looking people react to the AI-generated podcast audio.

  • Why is it important to trim the video clips before using Live Portrait technology?

    -It is important to trim the video clips to remove any unnecessary movements, like smiling with teeth, because Live Portrait technology works best when there is minimal existing movement in the video that it is driving.

  • What is the final step described in the video for enhancing the AI podcast?

    -The final step is to use voice conversion software, like ElevenLabs, to change the voices in the AI podcast so they sound more distinctive and less like the default AI voices.

  • How does the process of voice conversion in ElevenLabs work?

    -In ElevenLabs, the original audio file is uploaded and a new voice is selected from a list of available options. The platform generates a new audio file with the converted voice, which can then be used in the final podcast production.

Outlines

00:00

🎙️ Creating AI Podcasts with Google's NotebookLM

The video begins with the creator admitting they forgot to hit record, so the initial minutes will be fabricated. The main focus is Google's NotebookLM, a service that lets users upload a knowledge base in text form, such as a book or a course, and generates an interactive chatbot from it. A standout feature is the AI-generated podcast, in which two AI voices discuss the uploaded material in an excited manner, much like a real podcast. The creator demonstrates how to enhance these podcasts by adding visual elements to the audio files. They create a sample notebook in NotebookLM called 'Bob Doyle Media', using the Otter transcription service to dictate a description of the channel's content. This information feeds the chatbot, which then generates a podcast. The creator plays a brief clip of the generated podcast and then shows how to edit the audio file in Audacity, a free and open-source audio editor, to separate the voices into different tracks for further enhancement.

05:02

👥 Adding Faces to AI Podcast Voices with Hedra

The creator describes how they used Hedra, a facial animation platform, to animate a face based on the audio file of the woman's voice from the podcast. They created an image of a woman in a podcast booth and uploaded the woman's audio file to Hedra, which generated a video of the animated face. The same process was applied to the man's voice, resulting in an animated video of him. The creator then placed the two videos side by side in a video editor, creating a visual representation of the AI-generated podcast. They also experimented with Live Portrait technology to animate still images or videos of more realistic-looking people, using the Hedra-generated animations as drivers for the facial movements. This involved finding suitable video clips of people with minimal facial movement, whose faces were then driven by the animations created in Hedra. The creator emphasizes the use of free and open-source tools to achieve these effects.
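
The side-by-side step was done in a video editor in the video. As an alternative, the same layout can be sketched with ffmpeg's `hstack` filter. This is a swapped-in technique, not the creator's method, and the function name and all file names below are hypothetical. The sketch only builds the command; it assumes ffmpeg is installed and that both clips have the same height, which `hstack` requires.

```python
from typing import List


def side_by_side_command(left: str, right: str, audio: str, output: str) -> List[str]:
    """Build an ffmpeg argument list that stacks two videos horizontally over one audio file."""
    return [
        "ffmpeg",
        "-i", left,    # input 0: first talking-head video
        "-i", right,   # input 1: second talking-head video
        "-i", audio,   # input 2: the full podcast audio
        # hstack places the two video streams next to each other (same height required)
        "-filter_complex", "[0:v][1:v]hstack=inputs=2[v]",
        "-map", "[v]",   # use the stacked video stream
        "-map", "2:a",   # use the audio from the third input
        "-shortest",     # stop when the shortest stream ends
        output,
    ]


# Hypothetical file names, for illustration only.
cmd = side_by_side_command("woman.mp4", "man.mp4", "podcast.wav", "podcast_faces.mp4")
print(" ".join(cmd))
```

To actually render the file, the list can be passed to `subprocess.run(cmd, check=True)`.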

10:03

🔊 Voice Conversion for AI Podcasts with ElevenLabs

In the final segment, the creator discusses voice conversion as a way to further enhance the AI-generated podcasts. They note that while the default voices are friendly, podcasts made with them can start to sound alike if the voices are overused. To address this, they use a service like ElevenLabs, which offers speech-to-speech conversion. The woman's voice from the podcast is converted to a different voice with an accent, while the man's voice is changed to a more distinct alternative. The creator demonstrates the process of uploading the original audio files, selecting new voices, and generating the converted speech. They then import these new audio files into their video project, mute the original tracks, and play the converted voices to show the final result. The creator encourages viewers to subscribe to the channel for more content on AI and creative techniques, ending the video with a humorous note about the pursuit of subscribers.

Keywords

💡NotebookLM

NotebookLM is a service from Google that allows users to upload their own knowledge base in text form, such as books or courses, and create an interactive chatbot that can discuss the uploaded material. In the video, it is used to generate an AI podcast featuring two people discussing the uploaded information in an excited and engaging manner, similar to a real podcast.

💡AI-generated podcast

An AI-generated podcast is a digital audio file created by artificial intelligence that simulates a conversation between two or more people discussing a particular topic. In the context of the video, the AI podcast is generated using NotebookLM and is intended to be similar in format to a traditional podcast but with the unique twist of AI-generated dialogue.

💡Otter transcription service

Otter is a transcription service that converts spoken language into written text. In the video, the creator uses Otter to create a knowledge base for NotebookLM by doing a 'brain dump' about the Bob Doyle Media channel, speaking about the channel's content and purpose.

💡Audacity

Audacity is a free, open-source audio editing software that allows users to manipulate audio files, including editing, cutting, and splitting tracks. In the video, Audacity is used to separate the AI-generated podcast into individual tracks for each speaker, making it easier to edit and prepare for further processing.
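
The track-splitting idea can be sketched in a few lines of Python. This is a minimal illustration, not what Audacity does internally: it assumes the speaker segments have already been labeled by hand (as the creator does by ear in the editor), and the function name, labels, and timings are hypothetical. Each speaker gets a full-length track that is silent wherever the other person talks, which keeps the two tracks in sync with each other.

```python
from typing import Dict, List, Tuple

# (speaker label, start second, end second); labels and times are hypothetical
Segment = Tuple[str, float, float]


def split_by_speaker(
    samples: List[int], sample_rate: int, segments: List[Segment]
) -> Dict[str, List[int]]:
    """Return one full-length track per speaker, silent where that speaker is not talking."""
    speakers = {label for label, _, _ in segments}
    # Start every track as pure silence with the same length as the source audio.
    tracks = {label: [0] * len(samples) for label in speakers}
    for label, start, end in segments:
        first = max(0, int(round(start * sample_rate)))
        last = min(len(samples), int(round(end * sample_rate)))
        # Copy only this speaker's samples into their own track.
        tracks[label][first:last] = samples[first:last]
    return tracks


# Toy example: 4 samples per second; the woman speaks for 1 s, then the man for 1 s.
mono = [1, 2, 3, 4, 5, 6, 7, 8]
tracks = split_by_speaker(mono, 4, [("woman", 0.0, 1.0), ("man", 1.0, 2.0)])
print(tracks["woman"])  # [1, 2, 3, 4, 0, 0, 0, 0]
print(tracks["man"])    # [0, 0, 0, 0, 5, 6, 7, 8]
```

Real audio would be read with something like the standard `wave` module first; the toy list of integers just stands in for the sample data.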

💡Facial animation

Facial animation refers to the process of creating or manipulating visual representations of faces, particularly in the context of moving images or videos. In the video, facial animation is achieved using a platform called Hedra, which animates a still image or video based on the audio file, creating a more engaging and realistic visual representation of the AI podcast.

💡Live portrait technology

Live portrait technology is a type of software that lets a video of a talking or emoting face drive the facial animation of a still image or another video. In the video, the Hedra-generated animations serve as the driving video, and their facial movements are transferred onto clips of more realistic-looking people to give the AI podcast more lifelike hosts.
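
Because the driven clip has to last as long as the audio, the video reverses and duplicates a short clip to build a loop. The arithmetic behind that step can be sketched as follows, assuming the loop is built from forward-plus-reversed copies so that each repeat ends on the frame where it started. The function name and the durations are hypothetical.

```python
import math


def pingpong_repeats(clip_seconds: float, audio_seconds: float) -> int:
    """Number of forward+reverse copies needed for the looped clip to cover the audio."""
    if clip_seconds <= 0:
        raise ValueError("clip_seconds must be positive")
    cycle = 2 * clip_seconds  # one forward pass plus one reversed pass
    return max(1, math.ceil(audio_seconds / cycle))


# Hypothetical numbers: a 5-second clip and a 62-second audio track.
repeats = pingpong_repeats(5.0, 62.0)
print(repeats, repeats * 2 * 5.0)  # 7 70.0
```

The looped video comes out slightly longer than the audio here (70 seconds against 62), so it would be trimmed to the audio length in the editor.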

💡Voice conversion

Voice conversion involves changing the characteristics of a voice in an audio file to make it sound like a different speaker. In the video, voice conversion is done using a service like ElevenLabs to give the AI-generated voices a more distinct and varied sound, making the podcast more engaging and less repetitive.

💡Hedra

Hedra is a facial animation platform mentioned in the video that allows users to upload an audio file and animate a face realistically. It can create animations based on either a created or uploaded photograph, with the option to stylize the animation. The platform is used to animate still images with the audio from the AI-generated podcast.

💡ComfyUI

ComfyUI is a node-based interface for building and running generative AI workflows. In the video, it is used to run the Live Portrait workflow, which applies facial animations to still images or videos to create more realistic and dynamic visual content.

💡Digital avatars

Digital avatars are virtual representations of people that can be used in various digital media, including podcasts and videos. In the video, the creator discusses the option of using third-party services that specialize in creating digital avatars for a similar effect as the facial animation techniques described.

Highlights

Introducing a method to add faces to AI-generated podcast audio using NotebookLM from Google.

NotebookLM allows uploading a knowledge base to create an interactive chatbot.

AI-generated podcasts can mimic a conversational format similar to human-hosted podcasts.

The process involves downloading the AI-generated podcast audio and editing it in Audacity.

Separating voices into different tracks to prepare for facial animation.

Using Hedra, a facial animation platform, to animate a face with the uploaded audio file.

Creating a realistic animated face that reacts to the audio while maintaining a podcast atmosphere.

Combining the animated faces with the original audio to produce a video.

Experimenting with Live Portrait technology to drive facial animations with a still image or video.

Using AI animation services to create a video of a person sitting at a microphone with minimal facial movement.

Applying the facial animation from Hedra to more realistic-looking people in a video.

The importance of selecting a video with minimal facial movement for effective Live Portrait technology.

Editing the video to create a loop that can be driven by the facial animation.

Using ComfyUI and a Live Portrait workflow to apply the facial animation to a video.

The process of voice conversion using platforms like ElevenLabs to give the AI-generated voices a human touch.

Selecting different voices for the AI-generated audio to avoid sounding generic.

Incorporating the converted voices back into the video project to complete the AI podcast with faces.

Encouraging viewers to subscribe to the channel for more innovative uses of AI.