Get Consistent Character and Styles with Dalle-3

Quick Start Creative
20 Oct 2023 15:04

TLDR: In this video, the host discusses using custom instructions with Dalle-3 to create consistent characters and styles in comic art. Inspired by a video from Jil Gilberry, the host applies custom instructions to image generation with Dalle-3. The process involves describing the characters, choosing an art style, and giving Dalle-3 a simple prompt for a two-minute action comic story. The host emphasizes the importance of a 'North Star', which covers the main character, the superhero look, and the art style, to guide the AI's output. Despite some challenges, such as Dalle-3 dropping the character from one of the panels, the host was pleased with the results. The video concludes with the host noting that while Dalle-3 is impressive and imaginative, fine-tuning in software like Photoshop may still be necessary for polished results. The host plans to use the generated panels for a voice-over project.

Takeaways

  • 🎨 Use custom instructions in Dalle-3 to create consistent characters and styles in generated images.
  • 📚 Follow the YouTuber Jil Gilberry for inspiration on custom instructions for Dalle-3.
  • 🔗 Check out the linked video and custom instructions used for Dalle-3 to see the type of results achieved.
  • 🖌 Customize the output by editing the generated images to better fit the desired comic style.
  • 🚫 Be less descriptive when positioning characters to allow for more variety in their poses in Dalle-3.
  • 👥 Emphasize key character traits and art style to maintain consistency across different images.
  • 📝 Provide a clear and simple prompt to Dalle-3, including character descriptions and a direction for the story.
  • ⏱ Limit the story length to keep the output focused and prevent tangents.
  • 🧩 Paste the first four panels of the story to Dalle-3 and select the best options for the sequence.
  • 🛠️ Edit the final images in Photoshop or similar software for fine-tuning and to correct any inconsistencies.
  • 📖 Convert the panel descriptions into a narration for voice-over work to complete the storytelling process.
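The panel-batching takeaway can be sketched in Python. This is a minimal illustration under stated assumptions, not the host's actual workflow: the function name `batch_panels`, the default batch size, and the placeholder panel strings are hypothetical. The source only says that the first four panels were pasted to Dalle-3.

```python
from typing import List


def batch_panels(panels: List[str], size: int = 4) -> List[List[str]]:
    """Split panel descriptions into consecutive groups of `size`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [panels[i:i + size] for i in range(0, len(panels), size)]


# Hypothetical placeholder descriptions standing in for a 12-panel story.
panels = [f"Panel {n}: <description>" for n in range(1, 13)]
batches = batch_panels(panels)

# Each batch would be pasted into the chat as one message (an assumption).
first_message = "\n".join(batches[0])
```

Keeping batches small makes it easier to review each set of generated images and pick the best option before moving on.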

Q & A

  • What was the main topic of the video?

    -The main topic of the video was about using custom instructions to achieve consistent characters and styles in Dalle-3, an AI image generation tool.

  • Who is Jil Gilberry and why is he mentioned in the video?

    -Jil Gilberry is the host of a YouTube channel that focuses on AI and technology. He is mentioned because the video creator was inspired by a Gilberry video on using custom instructions with Dalle-3.

  • What is a custom instruction in the context of Dalle-3?

    -A custom instruction in Dalle-3 is a set of guidelines or rules that the AI follows to generate images. It includes a background and an output description to guide the AI's output.

  • Why is it important to be less descriptive when describing characters for the first time in Dalle-3?

    -Being less descriptive allows the AI more flexibility in positioning the characters. If a character is described in a specific stance or with hands in a certain position, Dalle-3 tends to carry that over in subsequent images.

  • What is the significance of the 'North Star' in the context of the video?

    -The 'North Star' refers to the key elements that the video creator insists must be present in the AI-generated images, such as the main character, the superhero look, and the art style. These elements serve as guiding principles throughout the creative process.

  • How did the video creator approach the issue of Dalle-3 dropping the character in some images?

    -The video creator re-generated the images using the character's description as a 'North Star', ensuring that the character was included in the output. They also suggested using editing software like Photoshop for fine-tuning the final images.

  • What was the video creator's final step in turning the generated panels into a story?

    -The final step was to instruct the AI to convert the panels into a narration, which could then be used for voice-over work in combination with the generated images.

  • Why did the video creator decide to limit the story to a two-minute duration?

    -Limiting the story to two minutes was a way to prevent the AI from going off on tangents and to focus on building out the character and the scenario in a concise manner.

  • What is the role of the art style in the process of generating images with Dalle-3?

    -The art style provides a framework for the visual aesthetics of the generated images. It helps to ensure consistency in the look and feel of the characters and scenes across different images.

  • How did the video creator handle the issue of Dalle-3 generating images with inappropriate content?

    -The video creator manually reviewed the generated images and excluded those with inappropriate content. They also suggested adding specific instructions to avoid generating such content in future attempts.

  • What is the video creator's overall satisfaction with the results from Dalle-3?

    -The video creator was generally pleased with the results, considering them to be about 80% of what they were aiming for. They acknowledged that the process is not perfect and that some manual editing might be necessary to achieve the desired outcome.

Outlines

00:00

🎨 Custom Instructions for Character Consistency in Dalle-3

The speaker welcomes the audience and refers back to a previous live session about using variables for character consistency in Dalle-3, an AI art tool. They want to revise that approach after watching a video by Jil (possibly a misspelling of 'Jill') Gilberry, whose YouTube channel they recommend. Inspired by Gilberry's use of custom instructions, the speaker adapts the technique for creating comics. They explain that custom instructions guide the AI's output through a background and an output description, much like setting rules. The speaker also describes converting a custom instruction written for text output into one suited to comic images. They note that when characters are described for the first time in Dalle-3, a less detailed description leaves room for more varied character positioning. The section ends with the speaker's satisfaction with the output after some editing, which included defining the main character and art style and adapting sample custom instructions.

05:03

📚 Crafting an Action Comic Story with Dalle-3

The speaker outlines how they created the story, starting with a simple prompt asking the AI to write a two-minute story about a character who begins his day at a coffee shop. The story quickly escalates into an action sequence involving an army in dark orange and a heroic transformation. It is meant to be descriptive and written in the style of an action comic book. The speaker appreciates the AI's output, which is laid out as a sequence of comic book panels. However, the generated images sometimes omit the main character, possibly because of certain triggers in the prompt. To mitigate this, they make the prompt language less graphic and bring it back in line with their 'North Star', the key elements that must stay consistent throughout the story. The speaker closes the section by discussing the need for post-generation editing in tools like Photoshop to fine-tune the images and match the intended character appearance.

10:04

🖌️ Diverse Representation and Finalizing the Comic

The speaker emphasizes the diversity in the AI-generated images, noting the variety of expressions and the mixture of relief and awe on the characters' faces. They mention that while Dalle-3 does an impressive job of creating imaginative and diverse content, manual adjustments are sometimes necessary. The speaker recounts generating the panels and adjusting the prompt language to steer the AI's output towards their vision. They are satisfied with several panels, particularly the camera angles and dramatic shots the AI produced. The speaker also covers the final step of converting the panels into a narration for voice-over work, which rounds out the comic creation process. They acknowledge that the process is not perfect and requires some manual editing, but they are content with reaching about 80% of what they were aiming for and plan to finish the project using ElevenLabs.

Keywords

💡Dalle-3

Dalle-3 is an AI system that generates images from text prompts. In the video, it is used to create consistent characters and styles in a comic book format. The script describes using Dalle-3 to generate images for a story and highlights its ability to follow custom instructions that keep character and style consistent.

💡Variables

In the context of the video, variables are used to maintain consistency in the characters generated by Dalle-3. The speaker discusses how they initially used variables to keep characters consistent throughout the comic book creation process. Variables act as placeholders that can be filled with specific details to control the output of the AI.
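The placeholder idea can be illustrated with a short Python sketch using the standard library's `string.Template`. The variable names (`hero`, `style`, `scene`) and the exact wording of the values are assumptions made for illustration; the source names the character and art style but does not give the host's actual variable text.

```python
from string import Template

# Hypothetical variable values; the source does not give the exact wording.
variables = {
    "hero": "Blake, who transforms into the Sentinel",
    "style": "modern Western comic style",
}

# Shared placeholders stay fixed; only the scene changes per panel.
panel_template = Template("$hero, drawn in a $style. Scene: $scene")


def fill_panel(scene: str) -> str:
    """Fill the shared placeholders plus the per-panel scene."""
    return panel_template.substitute(variables, scene=scene)


prompt_a = fill_panel("a quiet morning at a coffee shop")
prompt_b = fill_panel("an army marching through the city streets")
```

Because every prompt reuses the same `hero` and `style` text, the character and style wording stays identical from panel to panel.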

💡Custom Instructions

Custom instructions are specific rules or guidelines provided to the AI to guide its output. The video describes using custom instructions to direct Dalle-3 towards images that fit a particular style or narrative. They are crucial for achieving the desired outcome and are likened to giving the AI a set of rules to follow.

💡Comic Book Style

The term 'comic book style' refers to the visual and narrative style typical of comic books, which includes characteristics like panel layout, dialogue bubbles, and a specific art style. In the video, the speaker aims to generate images and narratives that adhere to a modern Western comic book style, which is a key aspect of the project's aesthetic.

💡Consistency

Consistency in this context means keeping characters and styles uniform throughout the generated content. The video emphasizes consistency so that characters look the same from image to image, which is achieved by using variables and custom instructions with Dalle-3.

💡Character Description

Character description involves detailing the visual and personality traits of characters in a narrative. The video script discusses the need to be less descriptive when positioning characters in Dalle-3 to allow for more variety in how they are depicted. However, the speaker was more descriptive with the Sentinel's suit to achieve a specific look.

💡Landscape Default

Landscape default likely refers to a setting or a default orientation for the generated images, possibly referring to the way they are arranged or the style of the background. The speaker mentions editing the landscape default, suggesting it is a parameter that can be adjusted in the AI's output settings.

💡North Star

In the video, 'North Star' is a metaphor for the core elements that must remain constant throughout the creative process. For the speaker, the main character, superhero look, and art style are the North Star of the comic—they are non-negotiable and guide the overall direction of the project.
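One way to picture the North Star is as a checklist applied to each panel prompt before it is sent. The sketch below is a hypothetical illustration, not a method shown in the video: the `NORTH_STAR` phrases, the function names, and the "Must include" wording are all assumptions. The source names only the three categories.

```python
from typing import Dict, List

# Hypothetical North Star phrases; the source names the three categories
# (main character, superhero look, art style) but not these exact phrases.
NORTH_STAR: Dict[str, str] = {
    "main character": "Blake",
    "superhero look": "Sentinel suit",
    "art style": "modern Western comic",
}


def missing_elements(prompt: str) -> List[str]:
    """Return the North Star categories whose phrase is absent from the prompt."""
    lowered = prompt.lower()
    return [
        name for name, phrase in NORTH_STAR.items()
        if phrase.lower() not in lowered
    ]


def reinforce(prompt: str) -> str:
    """Append any missing North Star phrases before re-generating."""
    missing = missing_elements(prompt)
    if not missing:
        return prompt
    extras = ", ".join(NORTH_STAR[name] for name in missing)
    return f"{prompt} Must include: {extras}."


draft = "Blake leaps across the rooftops at dusk."
fixed = reinforce(draft)
```

A simple substring check like this cannot tell whether the generated image actually contains the character; it only confirms that the prompt still mentions each element.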

💡Action Comic Book

An action comic book is a genre of comic books that emphasizes exciting action sequences and heroic characters. The video script describes the creation of a story in the style of an action comic book, with the character Blake transforming into the Sentinel to take action against an army in the city streets.

💡Panels

In comic books, panels are the individual frames that contain a portion of the story's visuals and often the dialogue. The video discusses how Dalle-3 structured the generated story into panels, which is the standard format for presenting content in comic books.

💡PG-13 Language

PG-13 language refers to dialogue or content that is suitable for viewers aged 13 and above, avoiding explicit language or themes. The video mentions specifying PG-13 language in the custom instructions so that the output adheres to content guidelines, although the speaker did not include this in their prompt and encountered an issue as a result.

💡Illustrator and Photoshop

Adobe Illustrator and Photoshop are professional graphic design and photo editing software, respectively. The video suggests using these tools to fine-tune the images generated by Dalle-3, indicating that while AI can create a rough draft, human creativity and manual editing are often necessary to achieve a polished final product.

Highlights

The presenter discussed using variables to achieve consistent character styles in Dalle-3.

A live show was conducted a few days prior to the current one, focusing on character consistency.

The presenter was inspired by a video from Jil Gilberry, a YouTuber with a channel dedicated to creative uses of AI.

Custom instructions were explored as a method to refine outputs from Dalle-3, particularly for creating comics.

The importance of being less descriptive when describing characters for the first time in Dalle-3 was emphasized.

The presenter's approach to describing characters involves giving them a unique style, such as a suit for the Sentinel.

The presenter provided a detailed description of the desired art style, which is a modern Western comic style.

Custom instructions were used to guide Dalle-3 in creating a consistent output, acting as a 'North Star' for the project.

The presenter shared a straightforward story prompt to test Dalle-3's capabilities in creating a two-minute action comic.

Dalle-3 was instructed to generate images in a comic book style, which it did successfully, producing 12 panels.

The presenter encountered an issue where Dalle-3 dropped a character from the generated images, requiring a re-generation.

Dalle-3 was praised for its ability to create diverse and expressive images, even when not explicitly directed to do so.

The presenter noted the need for post-processing in Photoshop or similar software for fine-tuning the generated images.

The final step was to convert the panels into a narration, which can then be used for voice-over work.

The presenter concluded by stating that the process is about 80% complete and will be further refined using ElevenLabs.

The importance of custom instructions and having a clear vision (North Star) for the desired outcome was emphasized as key to the process.

The presenter encouraged viewers to continue being creative and experimenting with Dalle-3.