🐼 Stable Diffusion character design: drawing multi-angle portraits with ControlNet in SD WebUI and ComfyUI

氪學家
18 Sept 2023 11:13

TLDR: The video tutorial presents a method for AI character design that addresses the difficulty of keeping a character's appearance consistent across angles. It uses Stable Diffusion (SD), a diffusion-based image generator, and trains a LoRA on images generated entirely by SD. The process involves creating a sequence of multi-angle character images, training the LoRA on these images, and refining the results. The tutorial gives hands-on tips, addresses common problems with AI-generated images, and shows how to deploy the character design workflow in ComfyUI.

Takeaways

  • 🎨 The video presents a method for character design with AI painting, focused on keeping a character consistent across different angles.
  • 🤖 The method is based on an article by an overseas author, which addresses the challenge of training a LoRA on 100% virtual characters generated by SD (Stable Diffusion).
  • 📸 The process starts by creating a sequence of multi-angle character images with text-to-image generation; these images are then used to train the LoRA.
  • 📈 The article provides two templates: an OpenPose template with faces at 15 angles, and a 15-grid image used to separate the cells during generation.
  • 🖼️ Each generated image should be 256x256 pixels, and the combined sheet should be 1328x800 pixels, which is considered optimal for current hardware.
  • 💡 The video stresses the importance of the image sequence: it is what gets split into individual images and then used to train the LoRA.
  • 🔍 The video acknowledges that SD output is unpredictable, which can lead to degraded images or inconsistencies in the 15-grid layout.
  • 🛠️ The presenter shares personal fixes for these issues, such as adjusting the weight of the OpenPose control and repairing images with the tile model and upscaling algorithms.
  • 🎓 The video demonstrates the process hands-on with the Deliberate model and the latest version of SD, including downloading the templates and setting up the ControlNet units.
  • 🌐 The approach can be adapted to create different characters by changing the prompt while keeping the design consistent.
  • 🔗 The video mentions that the node system for character design is available on the presenter's ko-fi store, and that viewers can receive it for free by following the presenter's Twitter account.

Q & A

  • What is the main topic of the 28th episode of the SD series tutorial?

    -The main topic of the 28th episode of the SD series tutorial is character design using AI painting.

  • What is the significance of consistency in character design in AI painting?

    -Consistency in character design is crucial as it ensures that the character appears uniform across different angles and perspectives, which is currently a hot topic and considered the 'holy grail' in the AI community.

  • What is the traditional solution for maintaining character consistency in AI painting?

    -The traditional solution for maintaining character consistency in AI painting is through 3D modeling.

  • What is the new method introduced in the tutorial for character design using SD?

    -The new method introduced for character design using SD involves training Lora with a sequence of multi-angle images of a character created entirely through AI painting.

  • How many angles are included in the template provided by the article's author for generating different angles of a face?

    -The template provided by the article's author includes 15 angles for generating different angles of a face.

  • What is the minimum number of images required to train Lora?

    -The minimum number of images required to train Lora is 15.

  • What are the two main CONTROLNET models used in the character design process?

    -The two main ControlNet models used in the character design process are the OpenPose skeleton model and the lineart (line drawing) model.

  • What is the purpose of using the open pose skeletal model in the CONTROLNET?

    -The OpenPose skeleton model controls the direction of the face, so that each face is generated at the angle given by the corresponding skeleton in the template.

  • What is the role of the lineart model in the CONTROLNET?

    -The role of the lineart model in the CONTROLNET is to perform a grid split on the generated image, which helps in creating separate images for training Lora.

  • How can one improve the quality of images generated with Lora?

    -One can improve the quality of images generated with the LoRA by using high-quality source images from the AI painting process, optimizing the ControlNet settings, and upscaling and repairing images with models such as tile and ESRGAN 4X.

  • What is the issue that may arise when using the new method for character design, and how can it be addressed?

    -The issue that may arise when using the new method is image degradation, where parts of the generated image break down. This can be addressed by upscaling and repairing the image in the image-to-image (img2img) phase, and by adjusting the settings and weights of the ControlNet models.

  • How can the new character design method be applied in different scenarios?

    -The new character design method can be applied in different scenarios by changing the prompt words to generate characters with varying features, emotions, and accessories, as demonstrated in the tutorial with the 'one girl blue hair, smile white background, red head wear glasses' prompt.

Outlines

00:00

🎨 Introduction to Character Design with AI Painting

The video begins with an introduction to the 28th episode of the SD tutorial series, focusing on character design. The host discusses the widespread interest in AI painting since its inception and shares their experience with the topic. They mention a recent article by an international expert on using SD for character design, which they found useful and decided to share in a tutorial. The host emphasizes the importance of consistency in character design across different angles, traditionally achieved through 3D modeling. However, the article introduces a novel method using SD and Lora training, specifically for characters created entirely through AI-generated images. The process involves creating a sequence of multi-angle character images, training Lora with these images, and refining the results to achieve a consistent character design. The host also provides a link to the article in the video description and outlines the three main stages of the process: creating a sequence of character images, training Lora, and refining the results.

05:01

📚 Understanding the Role of Lora and Training Process

This section covers the specifics of LoRA training and the challenges associated with it. The host explains that LoRA training typically relies on real people or existing material, which does not suit characters that exist only as AI-generated images. The article's author addresses this by proposing a method to train a LoRA on 100% virtual characters. The process is divided into three stages: generating a sequence of multi-angle character images, training the LoRA on these images, and refining the model. The host emphasizes the importance of the image sequence, since it is split into individual images that become the training set. They also discuss practical aspects of training, such as the minimum number of images required and the recommended image size. The host shares their hands-on experience, including generating images on a Tencent Cloud instance with 16 GB of VRAM and the problems that can occur with less VRAM.
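The splitting step described above can be sketched as a small helper that computes one crop box per cell. This is only an illustration: the 5x3 arrangement and the even division of the 1328x800 sheet are assumptions (the article's template may instead place 256x256 cells with margins), and `grid_boxes` is a hypothetical name, not part of SD or any extension.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def grid_boxes(sheet_w: int, sheet_h: int, cols: int, rows: int) -> List[Box]:
    """Return crop boxes for an evenly divided cols x rows sheet, row by row."""
    boxes: List[Box] = []
    for r in range(rows):
        for c in range(cols):
            # Integer division keeps neighbouring boxes edge to edge
            # even when the sheet size is not a multiple of the grid.
            left = c * sheet_w // cols
            top = r * sheet_h // rows
            right = (c + 1) * sheet_w // cols
            bottom = (r + 1) * sheet_h // rows
            boxes.append((left, top, right, bottom))
    return boxes


# Assumed layout: 15 cells as 5 columns by 3 rows on the 1328x800 sheet.
boxes = grid_boxes(1328, 800, cols=5, rows=3)
```

Each tuple can be passed to an image library's crop function to save the 15 faces as separate training images.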

10:01

🖌️ Practical Demonstration of Character Design with SD

The host moves on to a practical demonstration of character design with SD. They guide the audience through the templates provided in the article: a 15-angle face template, and a 15-grid image that helps SD treat each cell as an individual image. The host explains how ControlNet is used for image segmentation, with the OpenPose model controlling facial direction and the lineart model controlling the grid. They also discuss using the tile model and upscaling algorithms to enhance image quality. The host generates a simple image with the 'one girl' prompt and shows how to upscale it with the tile model and an upscaling algorithm. They then explain in detail what each ControlNet model does and how it contributes to the final image quality.
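As a rough sketch of the upscaling arithmetic, the helper below picks the smallest integer factor that lifts a 256-pixel cell to a chosen training size and reports the resulting sheet size. The 512-pixel target is an assumption for illustration only (the summary does not state a target), and `upscale_plan` is a hypothetical name.

```python
import math
from typing import Tuple


def upscale_plan(cell: int, target: int, sheet: Tuple[int, int]) -> Tuple[int, Tuple[int, int]]:
    """Return the smallest integer factor reaching `target`, and the upscaled sheet size."""
    factor = max(1, math.ceil(target / cell))
    return factor, (sheet[0] * factor, sheet[1] * factor)


# Assumed target of 512 pixels per cell; the 1328x800 sheet comes from the tutorial.
factor, big_sheet = upscale_plan(cell=256, target=512, sheet=(1328, 800))

# A 4x upscaler corresponds to a 1024-pixel cell.
factor_4x, sheet_4x = upscale_plan(cell=256, target=1024, sheet=(1328, 800))
```

Larger factors multiply the pixel count quadratically, which is one reason VRAM becomes the limiting resource during repair and upscaling.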

🔄 Deployment and Optimization of Character Design System

The host introduces the deployment of the character design workflow in ComfyUI, which allows streamlined, batch processing of images. They explain that the system can output multiple sets of images and discuss the ControlNet models used for image control. The host also extends the 15-grid template to a 4-grid template and explains why two ControlNet OpenPose models are used: to avoid duplicated faces in the grid. They describe how the prompt affects the final image and emphasize keeping the seed value and the number of sampling steps consistent. The host also discusses how different models affect image quality, and offers suggestions for optimizing the training process and for common issues such as inaccurate face recognition and image degradation.
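The idea of varying only the prompt while holding the seed and sampling steps fixed can be sketched as follows. This is not the ComfyUI API: `RenderJob` and `build_jobs` are hypothetical names, and the seed and step values are placeholders rather than settings taken from the video.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RenderJob:
    prompt: str
    seed: int
    steps: int


def build_jobs(base_prompt: str, variations: List[str], seed: int, steps: int) -> List[RenderJob]:
    """Create one job per variation, sharing the same seed and step count."""
    jobs: List[RenderJob] = []
    for extra in variations:
        # An empty variation reproduces the base character unchanged.
        prompt = f"{base_prompt}, {extra}" if extra else base_prompt
        jobs.append(RenderJob(prompt=prompt, seed=seed, steps=steps))
    return jobs


jobs = build_jobs(
    "one girl, blue hair, white background",
    ["smile", "glasses", ""],
    seed=12345,  # placeholder value
    steps=20,    # placeholder value
)
```

Keeping everything except the prompt constant makes it easier to attribute a change in the output to the prompt change alone.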

Keywords

💡SD tutorial series

The SD tutorial series is a set of instructional videos on SD (Stable Diffusion), an AI image generation tool. The series teaches viewers how to use the tool effectively; this episode covers character design and the technical details of creating consistent images from different angles.

💡Character design

Character design is the process of creating the visual appearance and personality of a character used in various forms of media, such as video games, comics, or animation. In the context of the video, it refers to the application of AI technology, specifically the SD tool, to design characters with consistent features and poses from multiple angles.

💡Diffusion algorithm

Diffusion algorithms are a class of machine learning models used in generative models to create new data samples that resemble a given dataset. In the context of the video, the SD tool's image generation is based on a diffusion algorithm, which is a key technology behind the AI's ability to create images of characters from different perspectives.

💡LoRA

LoRA (Low-Rank Adaptation) is a lightweight fine-tuning add-on for an image generation model, not a standalone model. It is trained on a small image set so that the base model reproduces a specific character or style. The video discusses training a LoRA on images generated by SD so that the character design stays consistent across angles and poses.

💡Multi-angle character image sequence

A multi-angle character image sequence is a collection of images that depict a character from various perspectives. This sequence is crucial for training AI models like Lora to understand and generate images of characters consistently from different angles. The video outlines a method to create such a sequence using the SD tool for effective character design.

💡OpenPose skeleton map

The OpenPose skeleton map is a keypoint image that marks the position and orientation of a body or face. In the video, a template of face skeletons at 15 angles is supplied to ControlNet so that SD generates the character's face at each of those angles.

💡Tile model

The tile model in the video is a ControlNet model used to repair and enhance images. It conditions generation on the existing image so that detail is redrawn rather than simply stretched. This is particularly useful when upscaling images generated by SD, to improve their quality and resolve imperfections.

💡Upscaling algorithm

Upscaling algorithms are image processing techniques that increase the size of an image while maintaining or improving its quality. In the video, they are applied to the images generated by SD to bring them to a size suitable for training the LoRA, so that the details of the character design are preserved and enhanced.

💡ComfyUI

ComfyUI is the node-based interface used to deploy and run the character design workflow discussed in the video. It allows streamlined, batch processing of images, so users can output multiple sets of images from the node system built around the workflow.

💡Prompts

Prompt words, or prompts, are specific phrases or terms used to guide the AI in generating images. In the context of the video, these prompts are crucial for controlling the characteristics of the generated images, such as the emotions, attire, and other features of the characters.

💡Web UI

Web UI here refers to SD WebUI, the browser-based interface used to operate SD. Users enter prompts there and control the parameters of the image generation process.

💡Node system

The node system, in the context of the video, is the set of interconnected steps in the character design workflow that are managed in ComfyUI. Each node represents a stage or component of the process, such as image generation, upscaling, and repair, and the nodes are wired together to streamline the creation of character images.

Highlights

The introduction of a method for character design using AI painting, which has been widely discussed since the emergence of AI painting.

The reliance on traditional 3D modeling to ensure consistency of characters from different angles, and the exploration of new methods using SD (Stable Diffusion).

The article by a foreign expert that proposes a novel approach to training Lora with 100% virtual characters created by SD, instead of real-life figures.

The three-stage process involving creating a sequence of multi-angle character images, training Lora with these images, and refining the results.

The use of a 15-angle face template for generating open pose images and a 15-grid image for SD to recognize individual images.

The minimum requirement of 15 images for Lora training and the recommended image size of 256x256.

The practical application of tile models and upscaling algorithms to enhance image quality and size for Lora training material.

The acknowledgment of the author's innovative approach and generous sharing of knowledge.

The unpredictability of SD-generated images and the potential for image degradation, which is characteristic of SD rather than a flaw in the method.

The demonstration of the hands-on process using the Deliberate model and SD version 1.6.

The utilization of CONTROLNET to upload and process the template images for open pose recognition and grid segmentation.

The testing of the method with a simple prompt 'one girl' and the observation of the generated 15-angle facial images.

The repair of minor image flaws using tile models and upscaling techniques to achieve higher quality images for Lora training.

The explanation of the role of the two CONTROLNET models in the image generation process and the importance of controlling facial direction and details.

The deployment of the character design system in ComfyUI, which allows for streamlined operations and batch image output.

The extension of the method to a 4-grid image output by using two ControlNet OpenPose models to control facial orientation.

The emphasis on maintaining consistent prompt words to ensure uniformity in the final images and the impact of emotional prompts on the tone of the generated images.

The provision of solutions for common issues encountered during the image generation process, such as facial recognition inaccuracies and prompt word optimization.

The availability of the ComfyUI node system for purchase and the offer to receive it for free by following the creator's social media and providing an email address.