Yujun Shi Chuhui Xue Jiachun Pan Wenqing Zhang Vincent Y. F. Tan Song Bai
This is a research project, NOT a commercial product.
We recommend running our code on an NVIDIA GPU under a Linux system; we have not yet tested other configurations.
To install the required libraries, simply run the following commands:
conda env create -f environment.yaml
conda activate dragdiff
Before running DragDiffusion, you might need to set up "accelerate" with the following command:
accelerate config
In all our experiments, we used a simple single-machine, single-GPU configuration for "accelerate" (no distributed training).
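The exact answers depend on your machine; the following is only a minimal sketch of a typical single-GPU setup (the prompts are paraphrased from "accelerate config", and the answers shown are assumptions rather than a prescribed configuration):

```bash
accelerate config
# Typical answers for a minimal single-GPU setup (adjust to your hardware):
#   Compute environment        -> This machine
#   Type of machine            -> No distributed training
#   Run training on CPU only?  -> NO
#   Mixed precision            -> fp16 (or "no" if your GPU does not support it)
```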
- To train a LoRA on our input image, we first put the image in a folder. Note that this folder should contain ONLY this one image.
- Then, we set "SAMPLE_DIR" and "OUTPUT_DIR" in the script "lora/train_lora.sh" to appropriate values. "SAMPLE_DIR" should be the directory containing our input image; "OUTPUT_DIR" should be the directory where we want to save the trained LoRA.
- Also, we need to set the option "--instance_prompt" in the script "lora/train_lora.sh" to an appropriate prompt. Note that this prompt does NOT have to be a complicated one. Examples of prompts (i.e., the prompts used in our Demo video) are given in "lora/samples/prompts.txt".
- Finally, after "lora/train_lora.sh" has been configured properly (a sketch of the edited lines is given below), run the following command to train a LoRA:
bash lora/train_lora.sh
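For concreteness, here is a minimal sketch of how the relevant lines of "lora/train_lora.sh" might look after editing. The directory names and the prompt below are placeholders, and the rest of the script (which passes these values to the training command) is assumed to be left unchanged:

```bash
# Sketch of an edited lora/train_lora.sh (placeholder values; variable names as described above)
SAMPLE_DIR="lora/samples/my_image"     # directory containing ONLY the one input image
OUTPUT_DIR="lora/lora_ckpt/my_image"   # directory where the trained LoRA will be saved

# ... further down in the script, the prompt option passed to the training command:
#   --instance_prompt="a photo of a mountain lake"   # a short description of the input image
```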
After training the LoRA, we can now run the following command to start the gradio user interface:
python3 drag_ui_real.py
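Assuming gradio's default settings (the UI script may configure the host or port differently), the interface is served locally and its address is printed to the terminal when it starts:

```bash
# Example terminal output when the UI starts (the port may differ on your machine):
#   Running on local URL:  http://127.0.0.1:7860
# Open this URL in a browser to perform the editing.
```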
Please refer to our Demo video to see how to do the "drag" editing.
The editing process consists of the following steps:
- Drop our input image into the left-most box.
- Draw a mask in the left-most box to specify the editable areas.
- Click handle and target points in the middle box. Also, you may reset all points by clicking "Undo point".
- Input "prompt" and "lora path". "lora path" is the directory storing our trained LoRA; "prompt" should be the same prompt we used to train our LoRA.
- Finally, click the "Run" button to run our algorithm. Edited results will be displayed in the right-most box.
Explanation of the parameters in the user interface:
| Parameter | Explanation |
| --- | --- |
| prompt | The prompt describing the user input image (this needs to be the same as the prompt used to train the LoRA). |
| lora_path | The path to the trained LoRA. |
| n_pix_step | Maximum number of motion-supervision steps. Increase this value if the handle points have not been "dragged" to the desired position. |
| lam | The regularization coefficient controlling how strongly the unmasked region is kept unchanged. Increase this value if the unmasked region has changed more than desired (does not need tuning in most cases). |
| n_actual_inference_step | Number of DDIM inversion steps performed (does not need tuning in most cases). |
This work is inspired by the amazing DragGAN. The LoRA training code is modified from an example in diffusers. Image samples are collected from Unsplash, Pexels, and Pixabay. Finally, a huge shout-out to all the amazing open-source diffusion models and libraries.
Code related to the DragDiffusion algorithm is under Apache 2.0 license.
@article{shi2023dragdiffusion,
title={DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing},
author={Shi, Yujun and Xue, Chuhui and Pan, Jiachun and Zhang, Wenqing and Tan, Vincent YF and Bai, Song},
journal={arXiv preprint arXiv:2306.14435},
year={2023}
}
- Upload the trained LoRAs of our examples
- Support inputs of arbitrary size
- Integrate LoRA training into the user interface
- Explore an alternative user interface with faster response
For any questions about this project, please contact Yujun ([email protected])