Commit
add DeciDiffusion notebook (#1379)
* add DeciDiffusion notebook

* text

* apply review comments
eaidova authored Oct 19, 2023
1 parent 55d9755 commit 3ca2bb4
Showing 9 changed files with 1,366 additions and 1 deletion.
1 change: 1 addition & 0 deletions .ci/ignore_convert_execution.txt
@@ -44,4 +44,5 @@ notebooks/255-mms-massively-multilingual-speech/255-mms-massively-multilingual-s
notebooks/256-bark-text-to-audio/256-bark-text-to-audio.ipynb
notebooks/257-llava-multimodal-chatbot/257-llava-multimodal-chatbot.ipynb
notebooks/258-blip-diffusion-subject-generation/258-blip-diffusion-subject-generation.ipynb
notebooks/259-decidiffusion-image-generation/259-decidiffusion-image-generation.ipynb
notebooks/404-style-transfer-webcam/404-style-transfer.ipynb
1 change: 1 addition & 0 deletions .ci/ignore_treon_docker.txt
@@ -26,6 +26,7 @@
256-bark-text-to-audio
257-llava-multimodal-chatbot
258-blip-diffusion-subject-generation
259-decidiffusion-image-generation
301-tensorflow-training-openvino
305-tensorflow-quantization-aware-training
404-style-transfer-webcam
1 change: 1 addition & 0 deletions .ci/ignore_treon_linux.txt
@@ -30,4 +30,5 @@
256-bark-text-to-audio
257-llava-multimodal-chatbot
258-blip-diffusion-subject-generation
259-decidiffusion-image-generation
404-style-transfer-webcam
1 change: 1 addition & 0 deletions .ci/ignore_treon_mac.txt
@@ -27,4 +27,5 @@
256-bark-text-to-audio
257-llava-multimodal-chatbot
258-blip-diffusion-subject-generation
259-decidiffusion-image-generation
404-style-transfer-webcam
3 changes: 2 additions & 1 deletion .ci/ignore_treon_win.txt
@@ -28,4 +28,5 @@
255-mms-massively-multilingual-speech
256-bark-text-to-audio
257-llava-multimodal-chatbot
258-blip-diffusion-subject-generation
258-blip-diffusion-subject-generation
259-decidiffusion-image-generation
5 changes: 5 additions & 0 deletions .ci/spellcheck/.pyspelling.wordlist.txt
@@ -101,6 +101,10 @@ deblurred
Deblurring
deblurring
deconvolution
decidiffusion
Deci
DeciDiffusion
DeciDiffusion's
deduplicated
DeepFloyd
DeepLabV
@@ -543,6 +547,7 @@ TorchMetrics
TorchScript
torchvision
TorchVision
transformative
TTS
tunable
tv
2 changes: 2 additions & 0 deletions README.md
@@ -34,6 +34,7 @@ Check out the latest notebooks that show how to optimize and deploy popular mode
| [Bark Text-to-Speech](notebooks/256-bark-text-to-audio/)<br> | Text-to-Speech generation using Bark and OpenVINO™ | <img src=https://github.com/openvinotoolkit/openvino_notebooks/assets/29454499/9a770279-0045-480e-95f2-1a2f2d0a5115 width=300>
| [LLaVA Multimodal Chatbot](notebooks/257-llava-multimodal-chatbot/)<br> | Visual-language assistant with LLaVA and OpenVINO™ | <img src=https://raw.githubusercontent.com/haotian-liu/LLaVA/main/images/llava_logo.png width=300>
| [BLIP-Diffusion - Subject-Driven Generation](notebooks/258-blip-diffusion-subject-generation)<br> | Subject-driven image generation and editing using BLIP Diffusion and OpenVINO™ | <img src="https://github.com/itrushkin/openvino_notebooks/assets/76161256/0ecf621f-b544-44ae-8258-8a49be704989" width=300 />
| [DeciDiffusion](notebooks/259-decidiffusion-image-generation/)<br> | Image generation with DeciDiffusion and OpenVINO™ | <img src=https://github.com/huggingface/optimum-intel/assets/29454499/cd734349-9954-4656-ab96-08a903e846ef width=300> |

## Table of Contents

@@ -193,6 +194,7 @@ Demos that demonstrate inference on a particular model.
| [256-bark-text-to-audio](notebooks/256-bark-text-to-audio)<br> | Text-to-Speech generation using Bark and OpenVINO™ | <img src=https://github.com/openvinotoolkit/openvino_notebooks/assets/29454499/9a770279-0045-480e-95f2-1a2f2d0a5115 width=225> |
| [257-llava-multimodal-chatbot](notebooks/257-llava-multimodal-chatbot)<br> | Visual-language assistant with LLaVA and OpenVINO™ | <img src=https://raw.githubusercontent.com/haotian-liu/LLaVA/main/images/llava_logo.png width=225> |
| [258-blip-diffusion-subject-generation](notebooks/258-blip-diffusion-subject-generation)<br> | Subject-driven image generation and editing using BLIP Diffusion and OpenVINO™ | <img src="https://github.com/itrushkin/openvino_notebooks/assets/76161256/0ecf621f-b544-44ae-8258-8a49be704989" width=225 /> |
| [259-decidiffusion-image-generation](notebooks/259-decidiffusion-image-generation)<br> | Image generation with DeciDiffusion and OpenVINO™ | <img src=https://github.com/huggingface/optimum-intel/assets/29454499/cd734349-9954-4656-ab96-08a903e846ef width=225> |

<div id='-model-training'></div>

Large diffs are not rendered by default.

58 changes: 58 additions & 0 deletions notebooks/259-decidiffusion-image-generation/README.md
@@ -0,0 +1,58 @@
# Image Generation with DeciDiffusion

DeciDiffusion 1.0 is a diffusion-based text-to-image generation model. While it maintains foundational architecture elements from Stable Diffusion, such as the Variational Autoencoder (VAE) and CLIP's pre-trained Text Encoder, DeciDiffusion introduces significant enhancements. The primary innovation is the substitution of U-Net with the more efficient U-Net-NAS, a design pioneered by Deci. This novel component streamlines the model by reducing the number of parameters, leading to superior computational efficiency.

The domain of text-to-image generation, with its transformative potential in design, art, and advertising, has captivated both experts and laypeople. This technology’s allure lies in its ability to effortlessly transform text into vivid images, marking a significant leap in AI capabilities. While Stable Diffusion’s open-source foundation has spurred many advancements, its heavy computational needs make practical deployment difficult, leading to notable latency and cost concerns in training and deployment. DeciDiffusion's greater computational efficiency makes for a smoother user experience and a reported reduction of nearly 66% in production costs.

More details about the model can be found in the [blog post](https://deci.ai/blog/decidiffusion-1-0-3x-faster-than-stable-diffusion-same-quality/) and the [model card](https://huggingface.co/Deci/DeciDiffusion-v1-0).

This tutorial shows how to convert and run DeciDiffusion using OpenVINO, making text-to-image generative applications more accessible and feasible.
It covers two approaches to image generation using an AI method called `diffusion`:

* `Text-to-image` generation to create images from a text description as input.
* `Text-guided Image-to-Image` generation to create an image from a text description and the semantics of an initial image.
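
The practical difference between the two modes can be illustrated with a small sketch. It assumes the convention common to Stable Diffusion-style image-to-image pipelines, where a `strength` value in [0, 1] decides how much of the denoising schedule is applied to the noised initial image. The function name `img2img_schedule` and the `strength` parameter are illustrative assumptions; this README does not state how the notebook implements this.

```python
from typing import Tuple


def img2img_schedule(num_inference_steps: int, strength: float) -> Tuple[int, int]:
    """Return (first_step_index, steps_to_run) for an image-to-image run.

    strength=1.0 runs the full schedule, which behaves like text-to-image
    because the initial image is fully replaced by noise. Lower values skip
    the early steps, so more of the initial image survives.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    steps_to_run = min(int(num_inference_steps * strength), num_inference_steps)
    first_step_index = max(num_inference_steps - steps_to_run, 0)
    return first_step_index, steps_to_run


if __name__ == "__main__":
    for strength in (1.0, 0.5, 0.0):
        first, run = img2img_schedule(30, strength)
        print(f"strength={strength}: start at step {first}, run {run} steps")
```

Under this assumption, text-to-image is the special case where the whole schedule runs from pure noise.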

The complete pipeline of this demo is shown below.

<p align="center">
<img src="https://user-images.githubusercontent.com/29454499/260981188-c112dd0a-5752-4515-adca-8b09bea5d14a.png"/>
</p>


In this demonstration, you type a text description (and provide an input image for Image-to-Image generation), and the pipeline generates an image that reflects the content of the input text.
Step by step, the diffusion process iteratively denoises the latent image representation while being conditioned on the text embeddings provided by the text encoder.
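
The iterative denoising described above can be sketched with a toy loop in plain Python. This is not the DeciDiffusion scheduler or U-Net-NAS: `toy_noise_predictor` is a hypothetical stand-in that treats the text embedding as the clean target, and the update rule is a simplification chosen only to show the shape of the loop.

```python
import random
from typing import Callable, List

Vector = List[float]


def toy_noise_predictor(latent: Vector, text_embedding: Vector) -> Vector:
    """Hypothetical stand-in for the noise-prediction network.

    It pretends the text embedding is the clean latent and reports
    everything else as noise. A real model is a trained neural network.
    """
    return [z - e for z, e in zip(latent, text_embedding)]


def denoise(
    text_embedding: Vector,
    num_steps: int = 20,
    seed: int = 0,
    predict_noise: Callable[[Vector, Vector], Vector] = toy_noise_predictor,
) -> Vector:
    """Start from random noise and remove a share of the predicted noise per step."""
    rng = random.Random(seed)
    latent = [rng.gauss(0.0, 1.0) for _ in text_embedding]  # pure noise
    for step in range(num_steps):
        noise = predict_noise(latent, text_embedding)  # conditioned on the text
        remaining = num_steps - step
        # Remove 1/remaining of the predicted noise; the last step removes all of it.
        latent = [z - n / remaining for z, n in zip(latent, noise)]
    return latent  # a real pipeline would now decode this latent with the VAE


if __name__ == "__main__":
    print(denoise([0.5, -1.0, 2.0], num_steps=10))
```

In the actual pipeline the final latent is passed to the VAE decoder to produce the image; the toy version simply returns it.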

The following image shows an example of the input sequence and corresponding predicted image.

**Input text:** Highly detailed realistic portrait of a grumpy small, adorable cat with round, expressive eyes

<p align="center">
<img src="https://github.com/huggingface/optimum-intel/assets/29454499/cd734349-9954-4656-ab96-08a903e846ef"/>
</p>

## Notebook Contents

This notebook demonstrates how to convert and run [DeciDiffusion](https://huggingface.co/Deci/DeciDiffusion-v1-0) using OpenVINO.

The notebook contains the following steps:

1. Convert PyTorch models to OpenVINO Intermediate Representation (IR) using the OpenVINO Converter Tool (OVC).
2. Prepare the inference pipeline.
3. Run the inference pipeline with OpenVINO.
4. Run an interactive demo for the DeciDiffusion model.
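
As a rough illustration of steps 1 to 3, the sketch below converts one PyTorch module to IR and compiles it. It uses the OpenVINO Python API (`openvino.convert_model`, `openvino.save_model`, `Core.compile_model`) rather than the `ovc` command line, and it assumes OpenVINO 2023.1 or newer. The component names, the file layout, and the helper functions are assumptions made for this example and are not taken from the notebook.

```python
from pathlib import Path
from typing import Dict

# Hypothetical component names, based on the parts this README mentions
# (CLIP text encoder, U-Net-NAS, VAE); the notebook may name them differently.
COMPONENTS = ("text_encoder", "unet", "vae_decoder", "vae_encoder")


def ir_paths(model_dir: Path) -> Dict[str, Path]:
    """Map each component name to the IR file it would be saved to."""
    return {name: model_dir / f"{name}.xml" for name in COMPONENTS}


def convert_if_missing(torch_module, example_input, xml_path: Path) -> Path:
    """Convert one PyTorch module to OpenVINO IR unless the file already exists."""
    if xml_path.exists():
        return xml_path
    import openvino as ov  # imported lazily so ir_paths works without OpenVINO

    xml_path.parent.mkdir(parents=True, exist_ok=True)
    # example_input is used to trace the PyTorch module during conversion.
    ov_model = ov.convert_model(torch_module, example_input=example_input)
    ov.save_model(ov_model, str(xml_path))
    return xml_path


def compile_for_device(xml_path: Path, device: str = "CPU"):
    """Load a saved IR file and compile it for the chosen inference device."""
    import openvino as ov

    return ov.Core().compile_model(str(xml_path), device)


if __name__ == "__main__":
    for name, path in ir_paths(Path("model")).items():
        print(f"{name} -> {path}")
```

Skipping conversion when the IR file already exists keeps notebook re-runs fast, since tracing and converting a diffusion model can take a while.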

The notebook also provides an interactive interface for image generation based on user input (text prompts and a source image, if required).

**Text-to-Image Generation Example**
![text2img.png](https://user-images.githubusercontent.com/29454499/260905732-f291d316-8835-4872-8d9b-8a1214448bfd.png)

**Image-to-Image Generation Example**
![img2img.png](https://user-images.githubusercontent.com/29454499/260905907-4b7835c6-1f63-4d00-a1ec-ccc4d7fca182.png)



## Installation Instructions

This is a self-contained example that relies solely on its own code.<br>
We recommend running the notebook in a virtual environment. You only need a Jupyter server to start.
For details, please refer to the [Installation Guide](../../README.md).
