
MAPS-research/sd-webui-ditail


sd-webui-ditail

The official implementation of the 'Diffusion Cocktail' (Ditail) extension for Automatic 1111 Webui.

Ditail offers a training-free method for novel image generation and fine-grained manipulation of content and style, enabling flexible integration of existing pre-trained diffusion models and LoRAs.

Two use cases of Ditail are as follows:

(a) Stylizing Real/Generated Images (SD Checkpoint + Optional LoRA)

Ditail Intro Figure

(b) Prompt-based Image Manipulation

Ditail Intro Figure


Install

You can find and install Ditail directly from the "Extensions" tab in the webui. The extension is tagged as script, tab, and editing.

Ditail Pipeline Illustration

OR

(Instructions adapted from Mikubill/sd-webui-controlnet.)

  1. Open the "Extensions" tab.
  2. Open the "Install from URL" sub-tab.
  3. Enter https://github.com/MAPS-research/sd-webui-ditail.git into "URL for extension's git repository".
  4. Press the "Install" button.
  5. Wait about 5 seconds; you should see the message "Installed into stable-diffusion-webui\extensions\sd-webui-ditail. Use Installed tab to restart".
  6. Go to the "Installed" tab, click "Check for updates", and then click "Apply and restart UI". (You can use the same method to update extensions later.)
  7. Completely restart the A1111 webui, including your terminal. (If you do not know what a "terminal" is, you can reboot your computer: turn it off and then on again.)
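The folder an extension installed from a URL lands in is derived from the repository URL. The sketch below is a simplified illustration, not the webui's own code: it assumes the webui follows git's default clone naming (the last path component of the URL with a trailing `.git` removed). `extension_dir_name` and `install_target` are hypothetical helpers.

```python
import os
from urllib.parse import urlparse


def extension_dir_name(repo_url: str) -> str:
    """Folder name a git URL would be cloned into (assumed git-style default).

    Takes the last path component of the URL and strips a trailing ".git".
    """
    path = urlparse(repo_url).path.rstrip("/")
    name = os.path.basename(path)
    if name.endswith(".git"):
        name = name[: -len(".git")]
    return name


def install_target(webui_root: str, repo_url: str) -> str:
    """Expected install location under the webui's `extensions` folder."""
    return os.path.join(webui_root, "extensions", extension_dir_name(repo_url))


if __name__ == "__main__":
    url = "https://github.com/MAPS-research/sd-webui-ditail.git"
    print(install_target("stable-diffusion-webui", url))
```

If the folder you find under `extensions` has a different name, the webui may use a different naming rule than the one assumed here.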

Hyperparameters Explanation

Illustration for Ditail Pipeline

Ditail Pipeline Illustration

In this example, the image is transformed from a photorealistic style to an anime style. Both the inversion prompt and generation prompt are set to ‘a glass of orange juice.’ You can optionally use different prompts for inversion and generation. For more details, refer to ‘Positive Inversion Prompt’ and ‘Negative Inversion Prompt’ below.
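The pipeline begins by running DDIM inversion on the content image with the inversion prompt. As a rough illustration, the sketch below shows the standard deterministic (eta = 0) DDIM update, which moves a latent toward a noisier level during inversion and back toward a cleaner level during sampling. It is not Ditail's code: latents are plain floats, the alpha-bar schedule is given, and `eps_fn` is a hypothetical stand-in for the UNet noise prediction conditioned on the inversion prompt. All function names are illustrative.

```python
import math


def predict_x0(x_t, eps, alpha_bar_t):
    """Estimate the clean latent from x_t and the predicted noise eps."""
    return (x_t - math.sqrt(1.0 - alpha_bar_t) * eps) / math.sqrt(alpha_bar_t)


def ddim_move(x_t, eps, alpha_bar_t, alpha_bar_target):
    """Deterministic DDIM update from noise level alpha_bar_t to alpha_bar_target.

    alpha_bar_target < alpha_bar_t adds noise (inversion);
    alpha_bar_target > alpha_bar_t removes noise (sampling).
    """
    x0 = predict_x0(x_t, eps, alpha_bar_t)
    return math.sqrt(alpha_bar_target) * x0 + math.sqrt(1.0 - alpha_bar_target) * eps


def ddim_invert(x0, eps_fn, alpha_bars):
    """Run DDIM inversion over a decreasing alpha_bar schedule.

    eps_fn(x, alpha_bar) stands in for the conditioned UNet prediction.
    Returns every intermediate latent, from clean to noisiest.
    """
    latents = [x0]
    x = x0
    for a_t, a_next in zip(alpha_bars[:-1], alpha_bars[1:]):
        x = ddim_move(x, eps_fn(x, a_t), a_t, a_next)
        latents.append(x)
    return latents
```

With a real UNet the noise is re-predicted at every step, so inversion followed by sampling only approximately reconstructs the input; with a constant `eps_fn` the round trip is exact, which makes the sketch easy to check. The intermediate latents are returned because a feature-injection pipeline generally needs the content image's trajectory at each step (an assumption of this sketch).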

Basic Options:

  • Content Image: The image to be manipulated. Ditail keeps the content and structure of this image while changing its style. Default: an example photo of a cocktail taken by the authors.
  • Source Checkpoint: The checkpoint used for DDIM inversion. A checkpoint whose style matches the content image is recommended. Default: None (uses the same checkpoint as the main UI).
  • Source VAE: The VAE used for DDIM inversion. A VAE that matches the source checkpoint is recommended. Default: None ('Automatic').
  • Positive Prompt Scaling (Alpha): The scaling factor for the positive prompt. The larger the value, the more of the content image's content and structure is preserved. Values between 3 and 7 are a good starting point. Default: 5.0.
  • Negative Prompt Scaling (Beta): The scaling factor for the negative prompt. The larger the value, the more of the content image's content and structure is preserved; however, a beta that is too large may produce odd colors. Default: 0.5.
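The basic options can be thought of as a small settings object with the documented defaults. This is a hypothetical container for illustration only (`DitailBasicOptions` and its methods are not part of the extension's API); the "None means fall back" rules and the 3 to 7 starting range for alpha come from the option descriptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DitailBasicOptions:
    """Hypothetical container for Ditail's basic options and their defaults."""
    content_image: Optional[str] = None      # path; the extension ships an example cocktail photo
    source_checkpoint: Optional[str] = None  # None -> same checkpoint as the main UI
    source_vae: Optional[str] = None         # None -> 'Automatic'
    alpha: float = 5.0                       # positive prompt scaling
    beta: float = 0.5                        # negative prompt scaling

    def resolve_checkpoint(self, main_ui_checkpoint: str) -> str:
        """Checkpoint used for DDIM inversion."""
        return self.source_checkpoint or main_ui_checkpoint

    def resolve_vae(self) -> str:
        """VAE used for DDIM inversion."""
        return self.source_vae or "Automatic"

    def hints(self) -> List[str]:
        """Soft hints based on the suggested starting range for alpha."""
        notes = []
        if not 3.0 <= self.alpha <= 7.0:
            notes.append("alpha outside the suggested 3-7 starting range")
        return notes
```

For example, `DitailBasicOptions(alpha=9.0).hints()` flags the unusual alpha, while the defaults produce no hints.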

Extra Options:

  • Positive Inversion Prompt: The positive prompt used for DDIM inversion. Default: None (uses the generation prompt from the main UI).
  • Negative Inversion Prompt: The negative prompt used for DDIM inversion. Default: None (uses the negative generation prompt from the main UI).
  • Convolutional Ratio: The fraction of sampling steps during which features from the content image are injected into the convolutional layers. Default: 0.8.
  • Attention Ratio: The fraction of sampling steps during which features from the content image are injected into the attention layers. Default: 0.5.
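The extra options boil down to two small rules: empty inversion prompts fall back to the main UI prompts, and each ratio selects a number of sampling steps that receive injected features. The sketch below illustrates both with hypothetical helpers. It assumes an empty or whitespace-only field counts as "None" and that injection covers the earliest (noisiest) steps, as in Plug-and-Play-style feature injection; Ditail's exact step selection may differ.

```python
from typing import Optional


def resolve_prompt(inversion_prompt: Optional[str], main_ui_prompt: str) -> str:
    """Fall back to the main UI prompt when the inversion prompt is left empty."""
    if inversion_prompt is None or not inversion_prompt.strip():
        return main_ui_prompt
    return inversion_prompt


def injection_steps(ratio: float, num_steps: int) -> int:
    """Number of sampling steps that receive injected features.

    The small epsilon guards against float error (0.29 * 100 == 28.999...).
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    return int(ratio * num_steps + 1e-9)


def injects_at(step_index: int, ratio: float, num_steps: int) -> bool:
    """Assumes injection covers the first (earliest, noisiest) sampling steps."""
    return step_index < injection_steps(ratio, num_steps)
```

Under these assumptions, with the default ratios and 20 sampling steps, convolutional features are injected during the first 16 steps and attention features during the first 10.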

Notes:

  • The Ditail plugin works in both txt2img and img2img modes; inpainting has not been tested yet.
  • When the Ditail plugin is enabled, the 'Sampling method' and 'Schedule type' in the main UI are disabled and set to 'DDIM' and 'Automatic', respectively.
  • The source image is automatically resized and center-cropped to match the 'width' and 'height' set in the main UI.
  • The number of DDIM inversion steps equals the 'Sampling steps' set in the main UI.
  • The 'Source checkpoint' must have the same architecture as the 'Stable Diffusion checkpoint' in the main UI. Only SD1.5 checkpoints are supported for now; stay tuned for updates.
  • To use LoRAs, add them to the main UI prompt as you normally would.
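The automatic resize-and-center-crop step can be illustrated with a common "cover, then crop" strategy: scale the image, keeping its aspect ratio, until it fully covers the target size, then cut out the centered window. This is an assumption about how the crop is computed (the extension's exact rounding may differ), and `resize_and_center_crop_box` is a hypothetical helper.

```python
from typing import Tuple


def resize_and_center_crop_box(src_w: int, src_h: int,
                               dst_w: int, dst_h: int) -> Tuple[int, int, int, int]:
    """Return (resized_w, resized_h, left, top) for cover-then-center-crop.

    The image is scaled to fully cover dst_w x dst_h while keeping its
    aspect ratio; the centered dst_w x dst_h window is then cropped out.
    """
    if src_w * dst_h >= src_h * dst_w:
        # Source is relatively wider: match heights, crop left and right.
        new_h = dst_h
        new_w = max(dst_w, round(src_w * dst_h / src_h))
    else:
        # Source is relatively taller: match widths, crop top and bottom.
        new_w = dst_w
        new_h = max(dst_h, round(src_h * dst_w / src_w))
    left = (new_w - dst_w) // 2
    top = (new_h - dst_h) // 2
    return new_w, new_h, left, top
```

With Pillow, the returned values would be applied as `img.resize((new_w, new_h)).crop((left, top, left + dst_w, top + dst_h))`. For example, a 1024x768 photo targeted at 512x512 is resized to 683x512 and cropped starting 85 pixels from the left.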

Known Issues

  • Only SD1.5 checkpoints are supported for now. We are working on supporting SDXL checkpoints.
  • If the lengths of the conditions (their chunk counts) do not match, the plugin will not work properly. Please keep each prompt within 75 tokens (a single prompt chunk) for now. (Thanks to @w-e-w for pointing out this issue.)
  • FP8 checkpoints are not supported yet.
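The prompt-length workaround can be checked before generating. The webui splits prompts into chunks of 75 tokens (CLIP's 77-token context minus the start and end tokens), and an empty prompt still occupies one chunk. The hypothetical helpers below flag any prompt that would spill into a second chunk. They assume the caller supplies token counts (for example, from the webui's token counter next to the prompt box) and that keeping every prompt, both generation and inversion, positive and negative, within one chunk avoids the mismatch.

```python
import math
from typing import Dict

CHUNK_TOKENS = 75  # tokens per prompt chunk in the A1111 webui


def chunk_count(num_tokens: int) -> int:
    """Number of 75-token chunks a prompt occupies (an empty prompt still uses one)."""
    return max(1, math.ceil(num_tokens / CHUNK_TOKENS))


def prompts_over_one_chunk(token_counts: Dict[str, int]) -> Dict[str, int]:
    """Return the prompts that would spill past a single chunk.

    token_counts maps a prompt label to its token count.
    """
    return {name: n for name, n in token_counts.items() if chunk_count(n) > 1}
```

An empty result means every prompt fits in a single chunk, which is the condition the workaround above asks for.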

TODO:

  • Fix prompt length mismatch issue
  • Add more options for resizing and cropping the source image
  • Support for batch count and batch size larger than 1
  • SDXL support
  • Flux support
  • FP8 checkpoints support
  • We are also developing an extension for ComfyUI. Stay tuned!

Acknowledgement

This work is supported in part by the Shanghai Frontiers Science Center of Artificial Intelligence and Deep Learning at NYU Shanghai, NYU Shanghai Boost Fund, and NYU HPC resources and services.

Citation

If you find our work helpful, please consider citing it as follows:

@article{liu2023ditail,
  title={Diffusion Cocktail: Mixing Domain-Specific Diffusion Models for Diversified Image Generations},
  author={Liu, Haoming and Guo, Yuanhe and Wang, Shengjie and Wen, Hongyi},
  journal={arXiv preprint arXiv:2312.08873},
  year={2023}
}
