Diffusion Models are generative models, meaning that they are used to generate data similar to the data on which they are trained. Fundamentally, Diffusion Models work by destroying training data through the successive addition of Gaussian noise, and then learning to recover the data by reversing this noising process. After training, we can use the Diffusion Model to generate data by simply passing randomly sampled noise through the learned denoising process.
Diffusion models are inspired by non-equilibrium thermodynamics. They define a Markov chain of diffusion steps to slowly add random noise to data and then learn to reverse the diffusion process to construct desired data samples from the noise. Unlike VAE or flow models, diffusion models are learned with a fixed procedure and the latent variable has high dimensionality (same as the original data).
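As a rough illustration of the noising side described above, here is a minimal sketch in plain Python. It assumes the linear variance schedule and the closed-form sampling of a noised sample from the DDPM paper (Ho et al., 2020); the function names and the toy 4-value "image" are made up for this example and are not taken from this repository.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed DDPM defaults)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(x0, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0) in one shot; returns (x_t, noise)."""
    signal = math.sqrt(alpha_bars[t])       # how much of the data survives
    sigma = math.sqrt(1.0 - alpha_bars[t])  # how much Gaussian noise is mixed in
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [signal * x + sigma * e for x, e in zip(x0, noise)]
    return xt, noise

rng = random.Random(0)
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)

x0 = [1.0, -0.5, 0.25, 0.0]                       # toy stand-in for an image
x_early, _ = add_noise(x0, 10, alpha_bars, rng)   # still close to x0
x_late, _ = add_noise(x0, 999, alpha_bars, rng)   # almost pure noise
```

Because `alpha_bars` shrinks towards zero, late timesteps keep almost none of the original signal, which is why sampling can start from pure Gaussian noise.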
Looking more closely, a diffusion model consists of two processes, as shown in the image below:
- Forward process (with red lines).
- Reverse process (with blue lines).
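The reverse process can be sketched as a loop that starts from Gaussian noise and removes a little noise at every step. The sketch below assumes the DDPM ancestral sampling update with the variance choice sigma_t^2 = beta_t. The `predict_noise` function is a hypothetical placeholder for the trained network, so the output here is not a meaningful sample; only the structure of the loop is the point.

```python
import math
import random

def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Return (betas, alpha_bars) for a linear variance schedule (assumed)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    betas = [beta_start + i * step for i in range(num_steps)]
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return betas, alpha_bars

def predict_noise(xt, t):
    # Hypothetical placeholder: a real model would be a trained network
    # that estimates the noise contained in xt at timestep t.
    return [0.0] * len(xt)

def reverse_step(xt, t, betas, alpha_bars, rng):
    """One ancestral sampling step x_t -> x_{t-1}."""
    beta_t = betas[t]
    alpha_t = 1.0 - beta_t
    eps = predict_noise(xt, t)
    coef = beta_t / math.sqrt(1.0 - alpha_bars[t])
    mean = [(x - coef * e) / math.sqrt(alpha_t) for x, e in zip(xt, eps)]
    if t == 0:
        return mean  # no fresh noise is added on the final step
    sigma = math.sqrt(beta_t)  # assumed variance choice: sigma_t^2 = beta_t
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mean]

def sample(dim, betas, alpha_bars, rng):
    """Run the full reverse chain starting from pure Gaussian noise."""
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in reversed(range(len(betas))):
        x = reverse_step(x, t, betas, alpha_bars, rng)
    return x

betas, alpha_bars = make_schedule(50)
generated = sample(4, betas, alpha_bars, random.Random(0))
```

Swapping `predict_noise` for a trained noise-prediction model is what would turn this loop into an actual generator.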
There are two ways to run the project with an interface:
- Using Gradio: run this command
python sd_gradio.py
- Using FastAPI: enter the directory 'stable_diffusion_api', then run this command
uvicorn sd_api:app --reload
Both commands assume that you have already downloaded the checkpoints and that you are in the right directory. With the FastAPI option, the interface works like an engine: you create your project, select your hardware, and then reach the image generation page.
Note
Download vocab.json and merges.txt from huggingface/stable_diffusion.tokenizer, download v1-5-pruned-emaonly.ckpt from huggingface/stable_diffusion.checkpoints, and save them in the data folder.
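Before launching either interface, it can help to confirm that the downloads landed in the right place. The following is a small sketch that checks for the three files named in the note; it assumes the data folder is called `data` and sits in the current working directory, and the helper name `missing_files` is made up for this example.

```python
from pathlib import Path

# File names taken from the note above; the folder location is an assumption.
REQUIRED_FILES = ("vocab.json", "merges.txt", "v1-5-pruned-emaonly.ckpt")

def missing_files(data_dir="data"):
    """Return the required files that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]

if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing from the data folder:", ", ".join(missing))
    else:
        print("All tokenizer and checkpoint files are in place.")
```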
@misc{ho2020denoising,
title = {Denoising Diffusion Probabilistic Models},
author = {Jonathan Ho and Ajay Jain and Pieter Abbeel},
year = {2020},
eprint = {2006.11239},
archivePrefix = {arXiv},
primaryClass = {cs.LG}
}
@misc{rombach2021highresolution,
title = {High-Resolution Image Synthesis with Latent Diffusion Models},
author = {Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},
year = {2021},
eprint = {2112.10752},
archivePrefix = {arXiv},
primaryClass = {cs.CV}
}
@misc{https://doi.org/10.48550/arxiv.2204.11824,
doi = {10.48550/ARXIV.2204.11824},
url = {https://arxiv.org/abs/2204.11824},
author = {Blattmann, Andreas and Rombach, Robin and Oktay, Kaan and Ommer, Björn},
keywords = {Computer Vision and Pattern Recognition (cs.CV), FOS: Computer and information sciences},
title = {Retrieval-Augmented Diffusion Models},
publisher = {arXiv},
year = {2022},
copyright = {arXiv.org perpetual, non-exclusive license}
}