🐼 [AI E-commerce in Practice] How to Use Stable Diffusion to Generate Images of the Same Character and Street Scene from Different Angles (Exclusive Workflow) | Advanced SD E-commerce Applications | Stable Diffusion Character Design

氪學家
16 Oct 2023 · 13:50

TL;DR: The video outlines a method for generating multi-angle images of a consistent character and scene using ControlNet and AI models. It leverages Google Maps Street View for reference images, two ControlNet units for the street scene and the character's pose, and an image-to-image pass for realism. The tutorial targets e-commerce imagery and sequential storyboards, with a focus on achieving uniformity and realism in the character's depiction across different perspectives.

Takeaways

  • 🎨 Use a 15-grid (15宫格) template combined with OpenPose to generate consistent facial images of a character from different angles.
  • 🚀 Expand the technique to create multi-angle images of the same character in a consistent environment for various applications like e-commerce or storyboarding.
  • 🌐 Use Google Maps Street View as a resource to obtain images of streets from different angles without physically visiting the location.
  • 🖌️ Process the street view images with image editing tools like Fireworks or Photoshop to create a template for character generation.
  • 📐 Maintain a 1328×800 resolution for optimal results when using Tencent Cloud's 16 GB VRAM for image generation.
  • 🖼️ Apply two ControlNet units (Lineart and OpenPose models) to generate the street scene and the character pose respectively.
  • 🎭 Use a combination of control nets and AI models to refine and enhance the realism of the generated images.
  • 🔍 Pay attention to details such as removing stray power lines from the line art to avoid clutter and ensure the quality of the final image.
  • 🔧 Offer multiple methods for facial repair and consistency, including high-definition facial repair, local redraw, and face-swapping plugins.
  • 📈 Provide a step-by-step guide to achieve realistic multi-angle images, emphasizing the importance of each step in the process.
  • 👍 Encourage viewer engagement by asking for likes and promising more tutorials upon reaching a certain threshold.
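The resolution takeaway can be made concrete: a Street View screenshot rarely arrives at exactly 1328×800, so it has to be scaled to fit that budget while keeping its aspect ratio and staying divisible by 8 (a Stable Diffusion latent-space constraint). A minimal helper, offered as a sketch (the function name and rounding policy are mine, not from the video):

```python
def fit_resolution(src_w: int, src_h: int,
                   max_w: int = 1328, max_h: int = 800,
                   multiple: int = 8) -> tuple[int, int]:
    """Scale (src_w, src_h) to fit inside max_w x max_h, preserving
    the aspect ratio and rounding each side down to a multiple of 8,
    since SD latent dimensions must be divisible by 8."""
    scale = min(max_w / src_w, max_h / src_h)
    w = round(src_w * scale) // multiple * multiple
    h = round(src_h * scale) // multiple * multiple
    return w, h
```

For a 1920×1080 screenshot this yields 1328×744, which stays inside the 16 GB VRAM budget mentioned above.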

Q & A

  • What is the main challenge addressed in the script?

    -The main challenge addressed in the script is generating coherent and multi-angled images of the same character and scene in the context of AI-generated art, specifically using SD (Stable Diffusion) models.

  • How does the video creator suggest using Google Maps for AI art generation?

    -The video creator suggests using Google Maps' Street View feature to obtain images of streets from different angles without having to physically visit the locations, which can then be used as references for AI art generation.

  • What is the role of the 15-pose grid in the tutorial?

    -The 15-pose grid, combined with OpenPose, is used to generate images of the same character's face from different angles, which helps maintain consistent facial features across perspectives and provides material for later training of LoRA (Low-Rank Adaptation) models.

  • What are the two control nets mentioned in the script and what do they control?

    -The two control nets mentioned are the Lineart model and the OpenPose skeletal model. The Lineart model controls the street-scene content and camera angle, while the OpenPose model controls the generation of the character's figure within the scene.

  • How does the video creator propose to optimize the generated images?

    -The video creator suggests optimizing the generated images by adding depth of field and repairing the facial features of each image individually to enhance realism and ensure consistency in the character's appearance across different angles.

  • What is the significance of the control net unit settings in the SD platform?

    -The ControlNet unit settings in the SD platform direct the AI to generate images with specific characteristics: Unit 0 holds the line-drawing (Lineart) control net and Unit 1 the skeletal (OpenPose) control net, allowing precise, independent control over the scene and the character.

  • How does the video creator plan to address issues with the final image's realism?

    -The video creator plans to address realism issues by using the '图生图' (image-to-image) feature to refine the generated images, focusing on making them more photorealistic and adding depth of field to enhance the sense of realism.

  • What are the potential solutions for maintaining facial consistency across different images?

    -The potential solutions for maintaining facial consistency include high-resolution facial repair tools, local redraw (inpainting) features, face-swap plugins, or a custom-trained LoRA model to keep the character's face uniform across different images.

  • How does the video creator suggest improving the similarity of the generated images to the reference images?

    -The video creator suggests that more effort should be put into refining the line drawing stage, as the complexity and clutter in the original images can affect the final outcome. However, the focus is on achieving different angles of the same character and scene rather than exact replication.

  • What is the recommended resolution for the images when using the SD platform with 16G VRAM?

    -The recommended resolution when using the SD platform with 16 GB of VRAM, as mentioned in the script, is 1328×800 pixels, which the speaker considers optimal for the highest-quality output under those hardware conditions.

  • What is the significance of the '麦橘写实' (Maiju realistic) model in this context?

    -The '麦橘写实' model is a specific AI model used in the script for generating realistic images. It is mentioned as the model of choice for the demonstrations and tutorials provided in the script.

  • How does the video creator propose to handle images where the character's clothing appears inconsistent?

    -The video creator suggests that if the clothing appears inconsistent across generated images, users can either try generating more images or specify the clothing details explicitly in the prompt to achieve more consistent results.
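The two-unit ControlNet setup described in the answers above can also be driven through the Automatic1111 WebUI's txt2img HTTP API. The payload below is a sketch under that assumption: the model filenames and base64 placeholders are illustrative, not taken from the video, and must match the ControlNet models actually installed.

```python
payload = {
    "prompt": "1girl walking down a city street, photorealistic",
    "negative_prompt": "lowres, bad anatomy",
    "width": 1328,   # the 16 GB VRAM sweet spot from the tutorial
    "height": 800,
    "alwayson_scripts": {
        "controlnet": {
            "args": [
                {   # Unit 0: line art of the street (character area erased)
                    "enabled": True,
                    "module": "none",  # line art is already preprocessed
                    "model": "control_v11p_sd15_lineart",   # placeholder name
                    "image": "<base64 of the edited line art>",
                },
                {   # Unit 1: OpenPose skeleton placing the character
                    "enabled": True,
                    "module": "none",
                    "model": "control_v11p_sd15_openpose",  # placeholder name
                    "image": "<base64 of the skeleton image>",
                },
            ]
        }
    },
}
```

The payload would then be POSTed to the WebUI's `/sdapi/v1/txt2img` endpoint; the same two-unit arrangement can be reproduced manually in the WebUI interface.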

Outlines

00:00

🎨 Generating Consistent Multi-Angle Images

This paragraph introduces the challenge of maintaining character and scene consistency in Stable Diffusion (SD). The speaker recaps a previous video on using a 15-grid method with OpenPose to generate consistent facial images from different angles, which is useful for training LoRA models. Crediting a foreign expert's blog post for the inspiration, the speaker explains how to apply the concept to generating images of the same character in the same scene from different angles, adding their own contributions: finding similar backgrounds from different angles and optimizing the generated images afterward. The use of Google Street View for sourcing images is highlighted, and the speaker provides a practical demonstration of the process, including the use of ControlNet and OpenPose models.
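Once a 15-grid sheet of faces has been generated, the individual crops still have to be cut out before they can serve as LoRA training images. A minimal sketch of that cropping step (the 3×5 layout and evenly divisible cell sizes are my assumptions; the video performs the grid generation itself inside SD):

```python
import numpy as np

def split_grid(sheet: np.ndarray, rows: int = 3, cols: int = 5) -> list[np.ndarray]:
    """Cut a grid sheet into rows*cols equally sized cells, row by row.

    `sheet` is an (H, W[, C]) image array whose height and width are
    assumed to divide evenly by `rows` and `cols`.
    """
    h, w = sheet.shape[0] // rows, sheet.shape[1] // cols
    return [sheet[r * h:(r + 1) * h, c * w:(c + 1) * w]
            for r in range(rows) for c in range(cols)]
```

Each crop would then be saved and captioned as one LoRA training sample.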

05:00

🖌️ Preprocessing and Line Drawing for Image Generation

The speaker delves into the technical process of preprocessing images for line drawing using ControlNet. They explain the importance of resolution matching and the use of the Canny edge detection method. The speaker demonstrates how to upload a preprocessed image to ControlNet, adjust settings for resolution, and generate a line drawing. They also discuss the process of erasing the character area from the line drawing to prepare it for character generation in SD. The speaker provides tips on using templates for this process and mentions the use of Fireworks for image processing, similar to Photoshop.
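The erasing step the speaker performs in Fireworks can also be done programmatically: on a white-on-black line drawing, blanking a rectangle leaves ControlNet with no edges there, so SD is free to paint the character into that area. A numpy sketch under those assumptions (the coordinates are placeholders for wherever the character should stand):

```python
import numpy as np

def erase_region(lineart: np.ndarray, top: int, left: int,
                 bottom: int, right: int, fill: int = 0) -> np.ndarray:
    """Return a copy of a single-channel line drawing with the given
    rectangle blanked out. For white-on-black line art (the polarity
    Canny/Lineart preprocessors emit), filling with 0 removes the
    lines; invert the fill value for black-on-white line art."""
    out = lineart.copy()
    out[top:bottom, left:right] = fill
    return out
```

The edited array would then be saved and uploaded as the Unit 0 control image, exactly as the manually erased Fireworks export is in the video.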

10:01

🌟 Enhancing Realism and Facial Repair in Generated Images

In this paragraph, the speaker focuses on enhancing the realism of generated images and repairing facial details. They explain the process of using ControlNet's Canny and Tile models to control the outline and color of the image, respectively. The speaker emphasizes the importance of avoiding overpainting and maintaining the original composition. They also discuss various methods for facial repair, including high-definition processing, local redraw functionality, and face-swapping plugins. The speaker suggests using models trained with LoRA for facial consistency and invites viewers to like the video for more tutorials on AI and image processing.
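The image-to-image refinement pass maps naturally onto the WebUI's img2img API as well. A sketch of such a payload, assuming a typical Automatic1111 setup (the denoising value is my suggestion for preserving the composition, not a number from the video):

```python
img2img_payload = {
    "init_images": ["<base64 of the txt2img result>"],
    # Prompt terms that push toward photorealism and depth of field,
    # matching the optimization goals described in the tutorial.
    "prompt": "photorealistic, depth of field, bokeh background",
    # Low enough to keep the composition, high enough to re-render
    # detail; 0.3-0.5 is a common range for this kind of polish pass.
    "denoising_strength": 0.4,
    "width": 1328,
    "height": 800,
}
```

The payload would be POSTed to `/sdapi/v1/img2img`; for the facial repair step specifically, the WebUI's inpaint mode (local redraw) restricts the same mechanism to a masked face region.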

Keywords

💡Controlnet

Controlnet is a tool used in the process of generating images with AI, specifically for controlling certain aspects of the generated content. In the video, it is used to control the street scene content and angles, as well as the generation of the character's main body within the scene. The video mentions using two controlnets, one for lineart models and another for openpose skeletal models, to create a cohesive and multi-angle image of a character in a consistent environment.

💡Openpose

Openpose is a pose estimation technology that identifies the key points of a human body in an image. In the context of the video, it is used as a controlnet to generate the main body of a character from a skeletal model. This allows for the creation of images with characters in various poses within a consistent street scene.

💡Lineart

Lineart refers to the creation of images using lines to define the shapes and contours of objects. In the video, a lineart model is used as one of the controlnets to control the content and angles of the street scenes. This helps in generating a consistent background for the character images from different angles.

💡Google Maps Street View

Google Maps Street View is a feature of Google Maps that allows users to view and navigate through street-level imagery of various locations. In the video, it is used to find reference images of streets for creating backgrounds in the AI-generated images. The diversity of locations and scenery available on Google Maps Street View makes it a valuable resource for generating diverse and realistic street scenes.

💡AI Image Generation

AI Image Generation refers to the process of creating images using artificial intelligence algorithms. In the video, this process involves using controlnets and models like Openpose and Lineart to generate images of a character in a consistent environment from different angles. The goal is to create a series of images that can be used for various applications, such as e-commerce or creating continuous storyboards.

💡E-commerce

E-commerce, or electronic commerce, refers to the buying and selling of goods or services using the internet. In the video, the technique of generating multi-angle images of a character in a consistent environment is suggested as a potential application for e-commerce, where such images could be used for product display or marketing purposes.

💡Storyboard

A storyboard is a sequence of images or illustrations displayed in a comic strip-like format, often used to visualize a series of events in a film, animation, or video production. In the video, the technique of generating consistent character and scene images from different angles is suggested as a method for creating a continuous storyboard, which could be useful for planning out visual narratives.

💡Photorealistic

Photorealistic refers to images or visual effects that are so realistic that they could be mistaken for photographs. In the video, the term is used to describe the goal of making the AI-generated images look as realistic as possible, achieved by adding elements like depth of field to mimic the blur effect found in real photographs.

💡Depth of Field

Depth of field is a photographic term that describes the range of distance within a scene that appears acceptably sharp. It is an important aspect of photography that adds a sense of realism to images by blurring the out-of-focus areas. In the video, depth of field is used as a technique to make the AI-generated images more realistic by simulating the way a camera focuses on the subject and blurs the background.

💡Facial Repair

Facial repair refers to the process of enhancing or correcting the appearance of a face in an image. In the context of the video, it is one of the methods mentioned for improving the quality and realism of the AI-generated images, particularly focusing on the character's face to ensure consistency and a more polished look.

💡LORA

LoRA (Low-Rank Adaptation) is a technique for adapting an AI model so that it generates images consistent with a specific style or subject. In the video, it is mentioned as a way to achieve facial consistency in AI-generated images: by training the model on images of a specific person or character, the generated faces remain uniform across different images.

Highlights

The speaker introduces a method for generating consistent multi-angle images of characters and scenes, addressing a common issue in Stable Diffusion (SD).

A tutorial video was previously published demonstrating the use of a 15-grid (15宫格) template combined with OpenPose for generating consistent facial images of a character from different angles.

The current session focuses on extending the previous method to generate images of the same character in the same scene from different camera angles.

The speaker acknowledges the contribution of an unnamed foreign expert whose blog post inspired the method.

The method involves using two control nets, one for the street scene (lineart model) and another for the character (OpenPose skeletal model).

The speaker claims that their approach for generating multi-angle images is an exclusive first release on the internet.

Google Maps' Street View is recommended as a tool for obtaining images of streets from different angles without the need for physical photography.

The process includes finding suitable street views, capturing screenshots, and using image editing tools to prepare the images for the SD software.

The speaker provides a detailed demonstration of the image processing steps, including the use of Fireworks or Photoshop for image editing.

ControlNet's Unit 0 and Unit 1 settings are discussed, with the Lineart model being preferred for the line-art control net.

The importance of adjusting the control net's settings to match the resolution of the input images is emphasized.

The speaker explains how to use the skeletal diagram (骨骼图) to create a template for generating the character in the scene.

The process of erasing the character area in the lineart to allow for character generation is described.

The final step involves using the 图生图 (image-to-image) feature to optimize and add realism to the generated images.

The speaker addresses the issue of image similarity and suggests that while the final images may not closely resemble the Google Street View images, they achieve the goal of showing the same character in different angles.

The speaker offers three methods for facial repair and unification of the character's face across images.

The tutorial concludes with a call to action for viewers to like the video for more content and a promise of future AI-related tutorials.