Stable Diffusion Fix Hands Without ControlNet and Inpainting (Easy) | SDXL FREE! (Automatic1111)

Xclbr Xtra
23 Apr 2024 · 06:50

TLDR: In this video tutorial, the presenter explains a straightforward method for generating realistic hand images using the RealVisXL model without resorting to ControlNet or inpainting. The process involves two models: RealVisXL for the initial generation and DreamShaper Turbo for refinement. The presenter emphasizes using full-body prompts so that hands are actually visible rather than cropped out of close-up shots. Various settings are discussed, including the Midjourney Mimic LoRA for an aesthetic feel, sampling steps, and the CFG scale. The final result is a more realistic image with properly formed hands, suitable for most non-professional use cases.

Takeaways

  • 🎨 **Using the RealVisXL Model**: The RealVisXL model can generate decent hands, but on its own the results may not feel as realistic as desired.
  • 🖌️ **Avoiding Detailer**: Initially, a detailer is not used as the image will be upscaled later, where the detailer can be more effectively applied.
  • 📸 **Full Body Focus**: The focus is on capturing a full body shot to ensure hands are visible and not just a close-up.
  • 🔍 **Midjourney Mimic**: The Midjourney Mimic LoRA is applied at a weight of 0.5 to give an aesthetic feel without the effect being too strong.
  • 🚫 **Negative Prompting**: Basic negative prompting is used to avoid NSFW content, blurriness, and bad hands.
  • 🔢 **Sampling Steps and Settings**: 50 sampling steps with the DPM++ 3M SDE Exponential sampler are used for the initial model, with the CFG scale set between 6 and 7.
  • 🔄 **Upscaling with a Different Model**: After initial generation, the image is upscaled using a different model for better detail.
  • 🔧 **Turbo Model for Realism**: The DreamShaper Turbo model is employed to enhance the realism of the image, particularly the skin texture.
  • 📈 **CFG Scale Adjustment**: The CFG scale is adjusted to 1 for the turbo model, which is found to be sufficient for the desired output.
  • 🖋️ **ADetailer for Finishing Touches**: The ADetailer extension is enabled, automatically refining details such as faces in the final image.
  • 📈 **Comparing Results**: Comparing the original and the improved images shows a significant enhancement in realism and detail.

Q & A

  • What is the main focus of the video?

    -The main focus of the video is to demonstrate how to generate proper and non-disfigured hands using a Stable Diffusion model without the use of ControlNet or inpainting.

  • Which model does the speaker recommend for generating hands?

    -The speaker recommends using the RealVisXL model for generating decent hands.

  • What is the issue with the real V SDXL model according to the speaker?

    -The issue with the RealVisXL model is that its output does not feel as realistic as the speaker would like.

  • What are the two models used in the process described in the video?

    -The two models used in the process are the RealVisXL model for the initial generation and the DreamShaper Turbo model for upscaling and improving the image.

  • What is the purpose of using the mid Journey mimic setting?

    -The Midjourney Mimic LoRA gives the generated image an aesthetic feel; it is kept at a weight of 0.5 so that the effect is not too strong.

  • What is the significance of the sampling steps in the process?

    -The sampling steps, set to 50 in the video, are used to determine the number of iterations the model goes through to generate the image, affecting the quality and detail of the output.

  • What is the role of the CFG scale in the process?

    -The CFG scale, set between 6 and 7 in the video, controls how closely the generated image follows the prompt, balancing prompt adherence against creative freedom.

  • Why is the speaker not using a detailer in the initial prompt?

    -The speaker is not using a detailer in the initial prompt because the image will be upscaled later, and the detailer will be used in that stage for enhancing facial features.

  • What is the benefit of using the dream shipper turbo model for upscaling?

    -The DreamShaper Turbo model is used for upscaling because it makes the image more realistic and improves the quality of the generated hands.

  • What is the speaker's opinion on the final output of the hands generated by the process?

    -The speaker believes that the final output of the hands is mostly okay, with some hands looking good and others not as much, but overall providing a decent figure for most use cases.

  • How does the speaker suggest further improving the generated image?

    -The speaker suggests that for further improvement, one can try inpainting on the generated image, but notes that this is not necessary for non-professional use.
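The CFG scale discussed in the Q&A is shorthand for classifier-free guidance: at each denoising step, the model's unconditional prediction is extrapolated toward its prompt-conditioned prediction. A minimal sketch of the formula, using made-up toy arrays in place of real noise predictions:

```python
import numpy as np

def cfg_guidance(noise_uncond, noise_cond, cfg_scale):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward the prompt-conditioned prediction."""
    return noise_uncond + cfg_scale * (noise_cond - noise_uncond)

# Toy predictions: unconditional all zeros, conditional all ones.
u = np.zeros(3)
c = np.ones(3)
print(cfg_guidance(u, c, 7.0))  # -> [7. 7. 7.]; higher scale follows the prompt harder
print(cfg_guidance(u, c, 1.0))  # -> [1. 1. 1.]; scale 1 keeps only the conditional output
```

This is why CFG 1 is enough for the turbo pass: turbo models are distilled to follow the prompt without strong guidance, so extra extrapolation mainly adds artifacts.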

Outlines

00:00

🎨 Creating Realistic Hands in Art 🎨

The first paragraph introduces the video's main objective: a simple method for generating realistic hands without complex techniques like ControlNet or inpainting. The presenter uses the RealVisXL model to generate decent hands but acknowledges that its output lacks realism on its own. The proposed solution uses two models, starting with a normal prompt and then upscaling with a second model. The presenter also stresses framing a full-body view so the hands appear in context. Technical details include the Midjourney Mimic LoRA at a weight of 0.5, negative prompting to avoid NSFW and blurry images, 50 sampling steps with the DPM++ 3M SDE Exponential sampler, the batch count, and a CFG scale of 6 to 7. The paragraph concludes with enabling the integrated Self-Attention Guidance feature in the Forge web UI.
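The first-stage settings described above can be sketched as a payload for the Automatic1111/Forge web API's `/sdapi/v1/txt2img` endpoint. This is a minimal sketch, not the presenter's exact setup: the LoRA tag, resolution, and example prompt are illustrative assumptions.

```python
import json

def build_txt2img_payload(prompt: str) -> dict:
    """First pass: full-body prompt on the base SDXL checkpoint."""
    return {
        # Full-body framing so the hands are actually in frame.
        "prompt": f"{prompt}, full body <lora:midjourney_mimic:0.5>",
        "negative_prompt": "nsfw, blurry, bad hands",
        "steps": 50,                                  # 50 sampling steps
        "sampler_name": "DPM++ 3M SDE Exponential",   # sampler named in the video
        "cfg_scale": 6.5,                             # between 6 and 7
        "width": 832,                                 # portrait SDXL resolution (assumed)
        "height": 1216,
        "batch_size": 1,
    }

payload = build_txt2img_payload("photo of a person standing in a park")
# Against a local Automatic1111/Forge instance you would POST it:
#   requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload)
print(json.dumps(payload, indent=2))
```

The checkpoint itself (here, the RealVisXL-style base model) is selected separately, e.g. via the UI or the options endpoint, which is why it does not appear in the payload.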

05:02

🚀 Enhancing Image Quality with Turbo Models 🚀

The second paragraph details the process of enhancing image quality and realism using a turbo model. After the initial generation, the presenter runs the image through the DreamShaper Turbo model with the same aesthetic settings as before, but with far fewer sampling steps and a much lower CFG scale. The focus is on achieving a more realistic skin texture, which is often a challenge in AI-generated images. The presenter finds that turbo models make the skin look less plasticky and more realistic. The paragraph concludes with the presenter's satisfaction with the results, noting that while inpainting could further improve the images, the presented method is sufficient for most non-professional use cases.
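The second-stage refinement can be sketched as an `/sdapi/v1/img2img` payload for the same web API. The video specifies eight steps, CFG 1, and a batch count of four; the denoising strength here is an assumed value, since img2img requires one and the summary does not state it.

```python
import json

def build_img2img_payload(init_image_b64: str, prompt: str) -> dict:
    """Second pass: refine the first image with a turbo checkpoint
    (selected separately in the UI) at a very low CFG."""
    return {
        "init_images": [init_image_b64],  # base64-encoded first-stage image
        "prompt": prompt,
        "steps": 8,                       # turbo models need very few steps
        "cfg_scale": 1,                   # CFG 1 is enough for turbo models
        "denoising_strength": 0.4,        # assumed; tune for how much change you allow
        "n_iter": 4,                      # batch count of four
    }

payload = build_img2img_payload("<base64 image>", "photo of a person, full body")
# Against a local instance you would POST it:
#   requests.post("http://127.0.0.1:7860/sdapi/v1/img2img", json=payload)
print(json.dumps(payload, indent=2))
```

A lower denoising strength preserves the pose and hand layout from the first pass while still letting the turbo model rework skin texture, which matches the video's goal of fixing realism without disturbing the hands.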

Keywords

💡Stable Diffusion

Stable Diffusion is a term referring to a type of generative model used in machine learning, specifically for creating images from textual descriptions. In the context of the video, it is the core technology that the presenter is utilizing to generate images of hands without distortion.

💡Hands

The term 'hands' in this video refers to the human body part that the presenter is focusing on improving in the generated images. The goal is to create images with properly formed hands, avoiding common issues such as extra or missing fingers.

💡ControlNet

ControlNet is a technique used in image generation to control the output more precisely. The presenter mentions that they will not be using ControlNet in their process, indicating a simpler approach to generating images.

💡Inpainting

Inpainting is a process in image editing where missing or damaged parts of an image are filled in. The video suggests that this technique will not be used, aiming for proper hand generation directly through the Stable Diffusion model.

💡RealVisXL Model

The RealVisXL model is an SDXL checkpoint within the Stable Diffusion ecosystem that the presenter says can generate decent hands. It is the first of the two models used in the process described in the video.

💡Midjourney Mimic

Midjourney Mimic is a LoRA applied during generation to give the images a Midjourney-like aesthetic. It is used at a weight of 0.5 to avoid making the effect too strong, per the presenter's preference.

💡Negative Prompting

Negative prompting is a technique where the generator is instructed to avoid certain features or elements in the generated image. In the video, it is used to prevent unwanted elements such as NSFW content or blurry images.

💡Sampling Steps

Sampling steps refer to the number of iterations or steps taken in the image generation process to refine the output. The presenter uses 50 sampling steps for their model to achieve better image quality.

💡Denoising

Denoising is the iterative process by which a diffusion model turns noise into a clear image; in img2img, the denoising strength controls how much the second model is allowed to change the input. The presenter uses a turbo model in this second pass to enhance the image's realism.

💡CFG Scale

CFG stands for 'classifier-free guidance', and the CFG scale is a parameter controlling how strongly the generated image follows the prompt. The presenter adjusts this scale to balance prompt adherence with the desired aesthetic.

💡Image-to-Image

Image-to-Image is a process where an existing image is used as a starting point to generate a new image with certain modifications or enhancements. The presenter uses this technique to improve the realism of the hands in the generated images.

💡Aesthetic Feel

The term 'aesthetic feel' refers to the visual appeal or artistic quality of the generated images. The presenter aims to achieve an aesthetically pleasing result by adjusting various settings in the image generation process.

Highlights

The video demonstrates a simple method to generate proper hands without using complex tools like ControlNet or inpaint techniques.

The process is suitable for creating normal poses with decent hands, avoiding extra fingers or disfigurements.

The Real V SDXL model is mentioned as capable of generating decent hands but may lack realism.

Two models are used in the process to enhance the realism of the generated hands.

The importance of a full body shot is emphasized for better visibility of hands.

The Midjourney LoRA, also known as Midjourney Mimic, is used for an aesthetic feel at a weight of 0.5.

Negative prompting includes avoiding NSFW content, blurry images, and bad hands.

For the RealVisXL model, 50 sampling steps and the DPM++ 3M SDE Exponential sampler are used.

The CFG scale is set between 6 and 7 for the initial model, since the image will be upscaled later.

The integrated Self-Attention Guidance feature is enabled for better results.

Forge web UI is used, which includes all necessary features without the need for additional extensions.

Clip skip is set to 2 for better detail in the generated images.

The generated images are assessed for hand quality, with reruns performed if necessary.

Image-to-image processing is used with the DreamShaper Turbo model for further enhancement.

The settings for the turbo model include eight steps, an SDE Karras sampler, the scale increased to two, and a batch count of four.

The CFG scale for the turbo model is set to 1, which is found to be sufficient for realism.

The ADetailer extension is enabled for additional automatic detailing.

The final images showcase more realistic hands compared to the initial generation.

Inpainting can be used for further improvement, but the method covers most use cases effectively.

The video concludes by summarizing the process and its practical applications.