* This blog post is a summary of this video.

Running Dolly Mini and Mega AI Image Generation Models Locally

Introduction to Dolly Mini and Mega AI Image Generation Models

Dolly Mini and Dolly Mega are open-source AI models that generate images from text descriptions. They are inspired by OpenAI's DALL-E models, which pioneered text-to-image generation. In this blog post, we provide an overview of the Dolly models, a setup guide for running them locally using Docker, and a demonstration of generating images with Dolly Mini and Mega.

Dolly models take a text prompt as input and output a corresponding synthetic image. Example prompts from the DALL-E 2 model include 'An astronaut playing basketball with cats in space, watercolor style' and 'An astronaut riding a horse as a pencil drawing'. While access to DALL-E 2 is currently limited, the Dolly models aim to provide a similar open-source alternative.

What are Dolly AI Models for Image Generation

Dolly Mini and Dolly Mega are deep neural network models trained on vast datasets to generate images from text descriptions. They use deep learning techniques such as attention and convolution to interpret text prompts and translate them into realistic-looking images. Dolly Mega is an improved version of Dolly Mini with better stability and image quality, owing to its larger model size and longer training. Both models can generate 512x512-pixel images given sufficient compute resources, such as multiple GPUs.

Limited Public Access to OpenAI's DALL-E Models

OpenAI's DALL-E and DALL-E 2 models pioneered text-to-image generation with remarkable results. However, public access to these models currently remains limited: one can request access to DALL-E through a waitlist, but few are admitted. To enable wider access and further research into generative AI, the open-source community created and released the Dolly models. While their results are not yet on par with DALL-E, efforts are underway to keep improving them.

Setup Guide to Run Dolly Mini and Mega Locally

To run the Dolly models locally, you need to set up a Docker environment on your system, preferably with a GPU. We provide step-by-step instructions below to build the Docker container and launch a Jupyter notebook for generating images from text prompts.

System Requirements - Docker and GPU

You need Docker installed on your system to build the image and run the Dolly models. An Nvidia GPU with CUDA-enabled drivers is highly recommended for reasonable inference times. The Dolly Mega model requires about 24GB of GPU memory, so a GPU like the RTX 3090 works well. If no GPU is available, you can still run Dolly Mini on a CPU; it will just be slower. Building the Docker image takes around 30 minutes, depending mostly on your internet bandwidth.
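To get an intuition for the 24GB figure, you can estimate the memory needed just to hold a model's weights from its parameter count. The parameter counts below are illustrative assumptions, not official figures for Dolly Mini/Mega, and note that actual inference needs additional memory for activations and intermediate buffers on top of the weights.

```python
# Rough GPU-memory estimate for loading a model's weights.
# Parameter counts below are assumptions for illustration only.

def weight_memory_gb(num_params: int, bytes_per_param: int = 4) -> float:
    """Approximate memory to hold the weights alone (float32 by default)."""
    return num_params * bytes_per_param / 1024 ** 3

# Hypothetical sizes: ~0.4B params (Mini) vs ~2.6B params (Mega).
for name, params in [("mini", 400_000_000), ("mega", 2_600_000_000)]:
    print(f"{name}: ~{weight_memory_gb(params):.1f} GB for weights alone")
```

The gap between the weights-alone estimate and the 24GB recommendation is the headroom inference actually needs.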

Build Docker Container and Launch Notebook

The Dolly Mini/Mega code repository contains Dockerfiles and scripts to build an image with the models and their dependencies installed. Once built, you can launch the container and expose ports to access the Jupyter notebook. The notebook contains code to download the models, load them onto GPUs using JAX, and define text prompts for image generation. Finally, it generates images by feeding the prompts to the models in an inference loop.
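The notebook's overall flow can be sketched as the loop below. `load_model` and `generate_image` are hypothetical stand-ins for the real JAX model calls, so this runs without any GPU and shows only the control flow, not actual generation.

```python
# Sketch of the notebook's inference loop. load_model and
# generate_image are placeholders for the real JAX calls.

def load_model(name):
    # Placeholder: the real notebook downloads weights here and
    # shards them across GPUs with JAX.
    return {"name": name}

def generate_image(model, prompt, seed):
    # Placeholder: returns a record instead of actual pixels.
    return {"model": model["name"], "prompt": prompt, "seed": seed}

model = load_model("dolly-mini")
prompts = ["A red rose blooming in the desert",
           "Two dogs playing frisbee in a park"]
images_per_prompt = 2

results = []
for prompt in prompts:
    for seed in range(images_per_prompt):
        results.append(generate_image(model, prompt, seed))

print(len(results))  # one entry per prompt/seed pair
```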

Generating Images using Dolly Mini and Mega Models

Provide Text Prompts as Input

The Dolly models need text prompts as input to generate corresponding images, for example "A red rose blooming in the desert" or "Two dogs playing frisbee in a park". The quality and clarity of the prompt affects the generated output. The notebook contains sample prompts, but you can customize them to your preferences; well-framed prompts generally lead to better results.
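One simple way to keep prompts well framed is to compose them from a subject plus an optional style suffix, mirroring examples like "... watercolor style". The helper below is illustrative and not part of the notebook.

```python
# Compose a prompt from a subject and an optional style suffix.
# The function name and template are illustrative assumptions.

def build_prompt(subject, style=""):
    prompt = subject.strip()
    if style:
        prompt += f", {style.strip()} style"
    return prompt

print(build_prompt("An astronaut riding a horse", "pencil drawing"))
# -> An astronaut riding a horse, pencil drawing style
```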

Customize Image Generation Parameters

The notebook exposes key parameters that affect image generation, including the number of images to generate per prompt and the guidance scale. The guidance scale controls how strongly the models adhere to the text description. You can tweak these parameters and rerun the inference loop to see their effect on the generated images. Finding the right parameters for your use case yields higher-quality, more diverse images.
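The guidance scale can be understood via classifier-free guidance, where the model's final prediction interpolates between an unconditional and a text-conditioned prediction. The scalars below stand in for model outputs, and whether Dolly's implementation uses this exact formula is an assumption.

```python
# Classifier-free guidance illustrated with scalars in place of
# model outputs: guided = uncond + scale * (cond - uncond).

def apply_guidance(uncond, cond, scale):
    return uncond + scale * (cond - uncond)

# scale = 1.0 reproduces the conditioned prediction;
# larger scales push further toward the text prompt.
print(apply_guidance(0.2, 0.8, 1.0))
print(apply_guidance(0.2, 0.8, 3.0))
```

Very large scales trade diversity for prompt adherence, which is why this is worth sweeping per use case.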

Save and Display High Quality AI Generated Images

Once image generation completes, the notebook displays all output images for the input prompts. The images are 512x512 PNGs with remarkable quality and fine detail. You can view any image individually, or save them locally or to cloud storage directly from the Jupyter notebook interface for further use.
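If you want to save images programmatically rather than through the notebook UI, a prompt-derived filename keeps outputs organized. The sketch below writes placeholder bytes; in practice you would pass the actual PNG data for each generated image.

```python
# Save generated images under prompt-derived filenames.
# The image bytes here are a placeholder, not a real PNG.
import re
from pathlib import Path

def slugify(prompt):
    """Turn a prompt into a safe filename stem."""
    return re.sub(r"[^a-z0-9]+", "-", prompt.lower()).strip("-")

def save_image(png_bytes, prompt, index, out_dir="outputs"):
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    path = out / f"{slugify(prompt)}-{index}.png"
    path.write_bytes(png_bytes)
    return path

p = save_image(b"\x89PNG...", "A red rose blooming in the desert", 0)
print(p.name)  # a-red-rose-blooming-in-the-desert-0.png
```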

Conclusion and Future Work on Dolly Models

Summarizing Capabilities of Dolly Mini and Mega

In conclusion, Dolly Mini and Dolly Mega enable reasonably good text-to-image generation that you can set up locally, even at this early stage. Open access allows researchers to experiment with and improve such models. While stability and coherence still need work in some cases, the overall results are impressive given only a text description as input. With active development ongoing, capabilities should continue to improve.

Upcoming Enhancements and Research Directions

Areas for improvement in the Dolly models include better handling of abstract concepts, integrating object detectors to improve image quality, and expanding domain coverage with more training data. Future research will likely focus on techniques such as diffusion models and large-scale transformer training to make the models more robust at converting text into photorealistic images.

FAQ

Q: What hardware is required to run Dolly models locally?
A: You need Docker installed and, ideally, a GPU; Dolly Mega needs about 24GB of VRAM. Nvidia GPUs work best, but a powerful CPU can also run Dolly Mini, just more slowly.

Q: How do I customize the image generation process?
A: You can customize parameters like number of images, guidance scale, etc. and provide your own text prompts to influence the AI generated images.

Q: Can anyone get access to the DALL-E 2 model?
A: Currently DALL-E 2 access is limited. You can apply to join the beta, but access is restricted. Dolly Mini and Mega let you experiment with similar capabilities.

Q: What improvements are expected in future for Dolly models?
A: The models are continuously being trained, so quality and capabilities will keep improving over time. Check back for updates as training progresses.

Q: What other AI models are similar to Dolly for image generation?
A: Other models like Imagen, Parti, and GLIDE offer similar text-to-image generation capabilities. Dolly focuses on high resolution photorealistic images.

Q: Can Dolly models run on a cloud GPU instance?
A: Yes, you can run the Docker container and notebooks on an Azure, AWS, or GCP cloud GPU instance instead of local hardware.

Q: What applications are best suited for Dolly models?
A: Dolly performs well for creative tasks like generating landscapes, drawings, album covers, etc. based on text prompts provided by user.

Q: What is the best way to save images created by Dolly?
A: You can right-click a generated image in the Jupyter notebook and open it in a new tab, then save it in your desired file format, such as JPG or PNG.

Q: Can Dolly models create images they have never seen before?
A: Yes, Dolly attempts to generate novel images from text prompts. Quality depends on how well the description maps to its training data.

Q: How are Dolly Mini and Mega different from DALL-E models?
A: While DALL-E focuses more on object generation, Dolly specializes in high-resolution scenic images like landscapes, drawings, album covers, etc.