Create AI Talking Avatar On Phone FREE - Google Colab - D-ID Alternative

Bahaa AI
26 Aug 2023, 05:24

TLDR: In this tutorial, learn how to create a free AI talking avatar on your phone as an alternative to D-ID. Start by selecting a clear, high-quality photo with the face looking towards the camera. Use the Motion Portrait app to animate the face, adjust its expressions, and record a 15-second video. Resize the video with easygif.com for faster processing. For the voiceover, use 11labs.io to convert text into a female voice audio file for your avatar. Combine the video and audio in Google Colab, following a step-by-step process that includes uploading both files and generating the final lip-synced video. The result is an engaging talking avatar that moves its eyes and lips and expresses emotions, with no paid services required.

Takeaways

  • 🖼️ Start by selecting a clear, high-quality photo of a face looking at the camera.
  • 📱 Use the Motion Portrait app, available for free on Google Play Store and App Store, to animate the face.
  • 👀 The app automatically detects eyes and lips and allows you to adjust facial expressions.
  • 🎥 Record a 15-second video using the app, which will then be saved in your phone's gallery.
  • 📏 Resize the video to 720 pixels wide using a tool like easygif.com for faster processing.
  • 🗣️ For voiceover, visit 11labs.io and use their speech synthesis feature to convert text into a female voice audio file.
  • 🔗 Download the voiceover file to your phone for later use.
  • 💻 Use Google Colab to combine the resized video with the voiceover for lip syncing.
  • 🔍 Follow the steps in Google Colab to upload and combine the video and audio files.
  • ⏳ Be patient during the video creation process as it takes time depending on the length of the voiceover.
  • 🎬 Once done, you can preview and download the final talking avatar video.
  • 📋 Disconnect and delete the runtime on Google Colab after you have finished to avoid unnecessary usage.

Q & A

  • What is the purpose of using stable diffusion on your phone?

    -Stable diffusion is used to create a high-quality image of a face that will be made to speak. It's a process that can be done on a phone and is used to prepare the face for further manipulation in the Motion Portrait app.

  • How can one obtain the stable diffusion image for free?

    -The user can refer to a previous video on the same channel that explains how to use stable diffusion on a phone for free.

  • What are the conditions that the face in the photo must meet for the Motion Portrait app to work properly?

    -The face in the photo must be clear and of high quality, with the face and eyes looking directly at the camera.

  • What is the Motion Portrait app and where can it be downloaded from?

    -The Motion Portrait app is a free application available on both the Google Play Store and the App Store, which can animate a still image of a face by tracking and moving the eyes and lips.

  • How long is the video that the Motion Portrait app creates?

    -The app creates a video that is 15 seconds long.

  • Why is it necessary to resize the video created by the Motion Portrait app?

    -The video resolution is initially too large, which can make the next steps take longer. Resizing the video to a smaller resolution, such as 720 pixels in width, makes the subsequent editing process more manageable.
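The same resize can also be done locally with ffmpeg instead of easygif.com. A minimal sketch, assuming ffmpeg is installed; the filenames are placeholders, not names used in the video:

```python
def scaled_height(orig_w: int, orig_h: int, target_w: int) -> int:
    """Height that preserves the aspect ratio, rounded to the nearest even
    number (most video codecs require even dimensions)."""
    h = round(orig_h * target_w / orig_w)
    return h if h % 2 == 0 else h + 1

def build_resize_cmd(src: str, dst: str, target_w: int = 720) -> list[str]:
    # scale=720:-2 tells ffmpeg to pick a matching even height automatically;
    # run with subprocess.run(build_resize_cmd(...), check=True).
    return ["ffmpeg", "-i", src, "-vf", f"scale={target_w}:-2", "-c:a", "copy", dst]

# The tutorial's video is 1027x1632; at 720 wide that becomes 720x1144.
print(scaled_height(1027, 1632, 720))  # 1144
print(" ".join(build_resize_cmd("avatar.mp4", "avatar_720.mp4")))
```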

  • What website is used to resize the video?

    -Easygif.com is used to resize the video by adjusting its resolution.

  • How is the voiceover for the talking avatar created?

    -The voiceover is created using 11labs.io, where text is written, a character voice is selected, and the text is converted into an audio file.
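For reference, 11labs.io (ElevenLabs) also exposes its speech synthesis through an HTTP API, so the same voiceover step can be scripted. A hedged sketch of the request shape; the voice ID and API key are placeholders, and the body accepts further optional fields:

```python
import json

API_BASE = "https://api.elevenlabs.io/v1"

def build_tts_request(voice_id: str, text: str, api_key: str):
    """Assemble the URL, headers, and JSON body for a text-to-speech call.
    Sending it (e.g. with requests.post) returns audio bytes to save as a file."""
    url = f"{API_BASE}/text-to-speech/{voice_id}"
    headers = {"xi-api-key": api_key, "Content-Type": "application/json"}
    body = {"text": text}  # optional extras: model_id, voice_settings, etc.
    return url, headers, json.dumps(body)

url, headers, body = build_tts_request("VOICE_ID", "Hello, I am your avatar.", "API_KEY")
print(url)
```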

  • What is the final step in creating the talking avatar video?

    -The final step involves using Google Colab to combine the resized video file with the voiceover audio file, creating a lip-synced talking avatar video.
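The Colab notebook is not named in the summary, but notebooks of this kind typically wrap a Wav2Lip-style inference script. A hypothetical sketch of what the combining cell runs; the script name and flags are assumptions, not the exact notebook:

```python
def build_lipsync_cmd(face_video: str, voice_audio: str, out_path: str) -> list[str]:
    """Command for a hypothetical Wav2Lip-style script: it takes the resized
    face video plus the voiceover audio and writes a lip-synced video."""
    # Sanity-check the file types before launching a long-running job.
    for path, exts in ((face_video, (".mp4",)), (voice_audio, (".mp3", ".wav"))):
        if not path.lower().endswith(exts):
            raise ValueError(f"unexpected file type: {path}")
    return ["python", "inference.py",
            "--face", face_video,
            "--audio", voice_audio,
            "--outfile", out_path]

print(" ".join(build_lipsync_cmd("avatar_720.mp4", "voiceover.mp3", "result.mp4")))
```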

  • How long does it take to create the final video in Google Colab?

    -The process takes time depending on the duration of the voiceover file used. Users must wait until the process is finished without interrupting it.

  • What should be done after the final video is created in Google Colab?

    -After the final video is created and downloaded, the runtime on Google Colab should be disconnected and deleted to free up resources.

  • Where can the final video be found after it is downloaded from Google Colab?

    -The final video can be found in the user's phone's media gallery.

Outlines

00:00

📱 Creating a Talking Avatar with Motion Portrait App

The first paragraph outlines the process of creating a talking avatar from a clear, high-quality photo of a face looking into the camera. It introduces the Motion Portrait app, available for free on both the Google Play Store and the App Store, which automatically detects the eyes and lips and lets users adjust facial expressions. It then covers recording a 15-second video and saving it, resizing the video with easygif.com, and preparing a voiceover with 11labs.io. Finally, the resized video and voiceover file are uploaded through a Google Colab link, which lip-syncs them into the final video, ready to download once processing completes.

05:02

🛠️ Disconnecting and Saving the Final Video

The second paragraph explains what to do after the final video is ready: disconnect the runtime in Google Colab by opening the 'Runtime' menu, selecting 'Disconnect and Delete Runtime', and confirming with 'Yes'. Once disconnected, the final video can be found in the user's media gallery, completing the process of creating a talking avatar.

Keywords

💡Stable Diffusion

Stable Diffusion is a machine learning model that generates images from textual descriptions. In the context of the video, it is used on a phone to create the image of a face, the crucial first step in making an AI talking avatar, demonstrating a mobile application of a complex AI process.

💡Motion Portrait App

The Motion Portrait App is a software application that enables users to animate still images, particularly focusing on facial features like eyes and lips. As per the script, this app is integral to the process of bringing the AI talking avatar to life by automatically detecting and animating the facial features of the prepared image, allowing for natural face movements.

💡Facial Expression

Facial expressions are the observable movements of facial muscles that convey a person's emotions or reactions. In the video's context, the Motion Portrait App allows users to modify the AI avatar's facial expression to reflect emotions like joy, sadness, or anger, adding a layer of realism and interactivity to the avatar.

💡Voiceover

A voiceover is a recording of a voice that is added to a video, typically to narrate or provide additional information. In the script, the voiceover is created using 11labs.io, where text is converted into an audio file. This voiceover is then synchronized with the AI talking avatar's lip movements to create a seamless talking effect.

💡Lip Syncing

Lip syncing is the process of matching an audio track, especially the speech, with the movements of the lips in a video or animation. The video demonstrates how to achieve lip syncing with the AI talking avatar by using a Google Colab script to combine the resized video file and the voiceover audio file, resulting in a realistic talking avatar.

💡Google Colab

Google Colab is a free, cloud-based notebook environment from Google, widely used for machine learning education and research. In the video it runs the script that combines the video and audio files into the final lip-synced result, showcasing its utility for content creation beyond traditional machine learning tasks.

💡EasyGIF.com

EasyGIF.com is an online service that allows users to create and edit GIFs and videos. In the video, it is used to resize the video of the AI talking avatar to a smaller resolution, which is necessary to make subsequent processing steps more manageable and efficient.

💡Resolution

Resolution refers to the number of pixels in an image or video, which determines its clarity and detail. The video script gives the original video resolution as 1027 by 1632, which is large enough to slow down processing. Resizing the video to a width of 720 pixels, as demonstrated, optimizes it for the subsequent steps.

💡Speech Synthesis

Speech synthesis is the artificial production of human-like speech. In the context of the video, 11labs.io is used for speech synthesis, allowing users to input text and select a character voice to generate an audio file. This synthesized voice is then used as the voiceover for the AI talking avatar.

💡Text-to-Speech

Text-to-speech (TTS) is a technology that converts written text into spoken words. It is a component of the speech synthesis process used in the video. The script details using 11labs.io to perform TTS, where the desired text is input, and a voice is selected to produce the voiceover audio.

💡Runtime Disconnection

Runtime disconnection refers to the act of ending a session or connection with a computing resource, such as a server or cloud-based platform. In the video, after the final video is downloaded from Google Colab, the user is instructed to disconnect and delete the runtime to free up resources and properly conclude the session.

Highlights

Create an AI talking avatar on your phone for free using Google Colab as an alternative to D-ID.

Use stable diffusion on your phone to create a high-quality image of a face that meets certain conditions.

The face in the photo must be clear, with eyes looking towards the camera.

Utilize the free Motion Portrait app available on Google Play Store and App Store to animate the face.

The app automatically identifies eyes and lips for natural facial movements.

Change facial expressions to joy, sadness, or anger within the app.

Record a video up to 15 seconds long with the app.

The resulting video can be saved to your phone's media gallery.

Resize the video to a smaller resolution using easygif.com for faster processing.

Prepare a voiceover by converting text to speech using 11labs.io with a female voice for the avatar.

Download the voiceover file to your phone for further use.

Combine the resized video and voiceover using Google Colab for lip syncing.

Follow a step-by-step process on Google Colab to create the final video.

The final video creation process may take time depending on the voiceover file duration.

Preview and download the final video from Google Colab.

Ensure to disconnect and delete the runtime on Google Colab after the process is complete.

The final talking avatar video will feature eye, lip, and facial expression movements.

This method provides a free way to create a talking avatar without using paid services like D-ID.

All steps and necessary links are provided in the video description for ease of use.