🐼 [AI E-commerce in Practice] How to Generate Images of the Same Character in the Same Street Scene from Different Angles with Stable Diffusion | Exclusive Method Revealed | SD E-commerce Applications, Advanced | Stable Diffusion Character Design
TLDR
The video outlines a method for generating multi-angle images of a consistent character and scene using ControlNet and AI models. It details a process that leverages Google Maps Street View for reference images, ControlNet for the street scene and character pose, and image-to-image optimization for realism. The tutorial aims to help create images for e-commerce or sequential storyboards, with a focus on keeping the character's depiction uniform and realistic across different perspectives.
Takeaways
- 🎨 Use a 15-panel grid (15宫格) combined with OpenPose to generate consistent facial images of a character from different angles.
- 🚀 Expand the technique to create multi-angle images of the same character in a consistent environment for various applications like e-commerce or storyboarding.
- 🌐 Use Google Maps Street View as a resource to obtain images of streets from different angles without physically visiting the location.
- 🖌️ Process the street view images with image editing tools like Fireworks or Photoshop to create a template for character generation.
- 📐 Keep the resolution at 1328×800 for optimal results when generating on Tencent Cloud with 16 GB of VRAM.
- 🖼️ Apply two ControlNet units (Lineart and OpenPose models) to generate the street scene and the character's pose, respectively.
- 🎭 Use a combination of ControlNet and the chosen AI model to refine and enhance the realism of the generated images.
- 🔍 Pay attention to details such as removing cluttered power lines from the line drawing to avoid clutter and preserve the final image's quality.
- 🔧 Offer multiple methods for facial repair and consistency, including high-definition facial repair, local redraw, and face-swapping plugins.
- 📈 Provide a step-by-step guide to achieve realistic multi-angle images, emphasizing the importance of each step in the process.
- 👍 Encourage viewer engagement by asking for likes and promising more tutorials upon reaching a certain threshold.
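The 1328×800 figure above also satisfies a general Stable Diffusion constraint: width and height should be multiples of 8, since the VAE downsamples by a factor of 8. As a small illustration (this helper is hypothetical, not from the video), an arbitrary screenshot size can be snapped to the nearest valid resolution:

```python
def snap_to_multiple_of_8(width: int, height: int) -> tuple[int, int]:
    """Round a target resolution to the nearest multiples of 8,
    as required by Stable Diffusion's 8x-downsampling VAE."""
    snap = lambda v: max(8, round(v / 8) * 8)
    return snap(width), snap(height)

# The video's recommended 1328x800 is already valid:
print(snap_to_multiple_of_8(1328, 800))  # (1328, 800)
# An arbitrary Street View screenshot size gets rounded:
print(snap_to_multiple_of_8(1330, 797))  # (1328, 800)
```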
Q & A
What is the main challenge addressed in the script?
-The main challenge addressed in the script is generating coherent, multi-angle images of the same character and scene in AI-generated art, specifically using Stable Diffusion (SD) models.
How does the video creator suggest using Google Maps for AI art generation?
-The video creator suggests using Google Maps' Street View feature to obtain images of streets from different angles without having to physically visit the locations, which can then be used as references for AI art generation.
What is the role of the 15-pose grid in the tutorial?
-The 15-panel grid, combined with OpenPose, is used to generate images of the same character from different angles, which helps maintain consistency in facial features across perspectives for later training of LoRA (Low-Rank Adaptation) models.
What are the two control nets mentioned in the script and what do they control?
-The two ControlNet units mentioned are the Lineart model and the OpenPose skeletal model. The Lineart model controls the street-scene content and camera angle, while the OpenPose model controls the generation of the character's body and pose in the scene.
How does the video creator propose to optimize the generated images?
-The video creator suggests optimizing the generated images by adding depth of field and repairing the facial features of each image individually to enhance realism and ensure consistency in the character's appearance across different angles.
What is the significance of the control net unit settings in the SD platform?
-The ControlNet unit settings in the SD platform are crucial for directing the AI to generate images with specific characteristics. Unit 0 might hold the line-drawing (Lineart) control net, while Unit 1 could hold the skeletal (OpenPose) control net, allowing precise control over the generation process.
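The two-unit setup can be sketched as an API request. This is a rough illustration only: the field names follow the AUTOMATIC1111 WebUI API convention (which the video does not explicitly use), and the model names are placeholders that will differ per installation.

```python
import json

# Sketch of a txt2img request with two ControlNet units, mirroring the
# Unit 0 / Unit 1 setup described above. Field names assume the
# AUTOMATIC1111 WebUI API; model filenames are placeholders.
payload = {
    "prompt": "1girl walking down a city street, photorealistic",
    "width": 1328,
    "height": 800,
    "alwayson_scripts": {
        "controlnet": {
            "args": [
                {   # Unit 0: line drawing controls the street scene
                    "module": "lineart_realistic",
                    "model": "control_v11p_sd15_lineart",  # placeholder name
                    "weight": 1.0,
                },
                {   # Unit 1: skeleton controls the character's pose
                    "module": "openpose",
                    "model": "control_v11p_sd15_openpose",  # placeholder name
                    "weight": 1.0,
                },
            ]
        }
    },
}

body = json.dumps(payload)
# requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", data=body)  # not executed here
```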
How does the video creator plan to address issues with the final image's realism?
-The video creator plans to address realism issues by using the '图生图' (image-to-image) feature to refine the generated images, focusing on making them more photorealistic and adding depth of field to enhance the sense of realism.
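At the image-to-image refinement stage, the key knob is denoising strength: low values preserve the draft's composition, high values repaint more aggressively. A minimal sketch of such a request, again assuming WebUI-style field names and illustrative values rather than the video's exact settings:

```python
# Minimal sketch of the image-to-image refinement pass described above.
# Field names assume the AUTOMATIC1111 WebUI img2img API; the values
# are illustrative, not the video's exact settings.
img2img_payload = {
    "init_images": ["<base64-encoded draft image>"],  # the rough multi-angle render
    "prompt": "photorealistic, depth of field",
    "denoising_strength": 0.4,  # low enough to keep composition, high enough to add realism
    "width": 1328,
    "height": 800,
}
```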
What are the potential solutions for maintaining facial consistency across different images?
-The potential solutions for maintaining facial consistency include high-resolution facial repair tools, the local redraw (inpainting) feature, face-swap plugins, or incorporating custom-trained LoRA models to keep the character's face uniform across different images.
How does the video creator suggest improving the similarity of the generated images to the reference images?
-The video creator suggests that more effort should be put into refining the line drawing stage, as the complexity and clutter in the original images can affect the final outcome. However, the focus is on achieving different angles of the same character and scene rather than exact replication.
What is the recommended resolution for the images when using the SD platform with 16G VRAM?
-The recommended resolution when using the SD platform with 16 GB of VRAM, as mentioned in the script, is 1328×800 pixels, considered the optimal resolution for the highest-quality output under those hardware conditions.
What is the significance of the '麦橘写实' (Mai Ju Realistic) model in this context?
-The '麦橘写实' model is a realistic-style AI model used in the script for generating photorealistic images. It is the model of choice for the demonstrations and tutorials provided in the video.
How does the video creator propose to handle images where the character's clothing appears inconsistent?
-The video creator suggests that if the clothing appears inconsistent across generated images, users can either try generating more images or specify the clothing details explicitly in the prompt to achieve more consistent results.
Outlines
🎨 Generating Consistent Multi-Angle Images
This paragraph introduces the challenge of maintaining character and scene consistency in Stable Diffusion (SD). The speaker recaps a previous video on using a 15-panel grid with OpenPose to generate consistent facial images from different angles, which is useful for training LoRA models. The speaker credits a foreign expert's blog post for the inspiration, then explains how to extend the concept to generate images of the same character in the same scene from different angles. The speaker's own additions include finding similar backgrounds from different angles and optimizing the generated images afterwards. The use of Google Street View for sourcing reference images is highlighted, and the speaker gives a practical demonstration of the process, including the use of ControlNet with the Lineart and OpenPose models.
🖌️ Preprocessing and Line Drawing for Image Generation
The speaker delves into the technical process of preprocessing images for line drawing using ControlNet. They explain the importance of resolution matching and the use of the Canny edge detection method. The speaker demonstrates how to upload a preprocessed image to ControlNet, adjust settings for resolution, and generate a line drawing. They also discuss the process of erasing the character area from the line drawing to prepare it for character generation in SD. The speaker provides tips on using templates for this process and mentions the use of Fireworks for image processing, similar to Photoshop.
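The preprocess-then-erase step can be sketched with Pillow. This is a stand-in only: FIND_EDGES is a crude substitute for the ControlNet Canny/Lineart preprocessor, the gray placeholder image stands in for a Street View screenshot, and the rectangle coordinates marking the character area are hypothetical (in practice the region is erased by hand in Fireworks or Photoshop).

```python
from PIL import Image, ImageDraw, ImageFilter, ImageOps

# Placeholder for a Street View screenshot at the working resolution.
street_photo = Image.new("RGB", (1328, 800), "gray")

# Crude edge extraction as a stand-in for the Canny/Lineart preprocessor,
# inverted to get dark lines on a white background.
edges = street_photo.convert("L").filter(ImageFilter.FIND_EDGES)
line_drawing = ImageOps.invert(edges)

# Erase the (hypothetical) area where the character will stand,
# so SD can generate the figure into the blank region.
draw = ImageDraw.Draw(line_drawing)
draw.rectangle((500, 200, 800, 780), fill="white")

line_drawing.save("lineart_template.png")
```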
🌟 Enhancing Realism and Facial Repair in Generated Images
In this paragraph, the speaker focuses on enhancing the realism of generated images and repairing facial details. They explain how ControlNet's Canny and Tile models are used to control the outline and color of the image, respectively. The speaker emphasizes avoiding overpainting and preserving the original composition. They also discuss several methods for facial repair, including high-definition processing, the local redraw (inpainting) function, and face-swapping plugins. The speaker suggests using custom-trained LoRA models for facial consistency and invites viewers to like the video for more tutorials on AI and image processing.
Keywords
💡ControlNet
💡OpenPose
💡Lineart
💡Google Maps Street View
💡AI Image Generation
💡E-commerce
💡Storyboard
💡Photorealistic
💡Depth of Field
💡Facial Repair
💡LoRA
Highlights
The speaker introduces a method for generating consistent multi-angle images of characters and scenes, addressing a common issue in Stable Diffusion (SD).
A previous tutorial video demonstrated the use of a 15-panel grid (15宫格) combined with OpenPose for generating consistent facial images of a character from different angles.
The current session focuses on extending the previous method to generate images of the same character in the same scene from different camera angles.
The speaker acknowledges the contribution of an unnamed foreign expert whose blog post inspired the method.
The method involves using two ControlNet units: one for the street scene (Lineart model) and one for the character (OpenPose skeletal model).
The speaker claims that their approach for generating multi-angle images is an exclusive first release on the internet.
Google Maps' Street View is recommended as a tool for obtaining images of streets from different angles without the need for physical photography.
The process includes finding suitable street views, capturing screenshots, and using image editing tools to prepare the images for the SD software.
The speaker provides a detailed demonstration of the image processing steps, including the use of Fireworks or Photoshop for image editing.
ControlNet's Unit 0 and Unit 1 settings are discussed, with the Lineart model preferred for the line-drawing unit.
The importance of adjusting the control net's settings to match the resolution of the input images is emphasized.
The speaker explains how to use the skeletal diagram (骨骼图) to create a template for generating the character in the scene.
The process of erasing the character area in the lineart to allow for character generation is described.
The final step uses the 图生图 (image-to-image) feature to optimize the generated images and add realism.
The speaker addresses the issue of image similarity and suggests that while the final images may not closely resemble the Google Street View images, they achieve the goal of showing the same character in different angles.
The speaker offers three methods for facial repair and unification of the character's face across images.
The tutorial concludes with a call to action for viewers to like the video for more content and a promise of future AI-related tutorials.