Stable Diffusion Model

Introduction

Stable Diffusion is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input.

Details

The entry point to the functional_stable_diffusion model is UNet2DConditionModel in models/demos/wormhole/stable_diffusion/tt2/ttnn_functional_unet_2d_condition_model.py. The model picks up certain configs and weights from a Hugging Face pretrained model; we use the CompVis/stable-diffusion-v1-4 checkpoint from Hugging Face as our reference.
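
As a minimal sketch (assuming the diffusers package is installed), this is how the reference UNet, whose configuration and weights the ttnn functional model mirrors, can be loaded from Hugging Face:

```python
# Sketch only: load the reference UNet from the CompVis/stable-diffusion-v1-4 checkpoint.
# The ttnn functional model consumes the same config and pretrained weights.
import torch
from diffusers import UNet2DConditionModel

torch_unet = UNet2DConditionModel.from_pretrained(
    "CompVis/stable-diffusion-v1-4", subfolder="unet"
)
torch_unet.eval()

# Block layout, attention dimensions, etc., used to configure the ttnn model.
config = torch_unet.config
```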

Inputs

Inputs are provided from input_data.json by default. If you wish to change the inputs, provide a different path to test_demo. We do not recommend modifying the input_data.json file.

How to Run

To run the demo, make sure to build the project, activate the environment, and set the appropriate environment variables. For more information, refer to the installation and build guide.

Use pytest --disable-warnings --input-path="models/demos/wormhole/stable_diffusion/demo/input_data.json" models/demos/wormhole/stable_diffusion/demo/demo.py::test_demo to run the demo.

If you wish to run the demo with a different input, use pytest --disable-warnings --input-path="<address_to_your_json_file.json>" models/demos/wormhole/stable_diffusion/demo/demo.py::test_demo.

If you would like to run an interactive demo that prompts you for the input, use pytest models/demos/wormhole/stable_diffusion/demo/demo.py::test_interactive_demo.

Our second demo is designed to run on the poloclub/diffusiondb dataset; run it with pytest --disable-warnings models/demos/wormhole/stable_diffusion/demo/demo.py::test_demo_diffusiondb.

To run it for <num_prompts> samples with <num_inference_steps> denoising steps, use pytest --disable-warnings models/demos/wormhole/stable_diffusion/demo/demo.py::test_demo_diffusiondb[<num_prompts>-<num_inference_steps>].
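For example, to generate images for 2 prompts with 30 denoising steps (illustrative values), use pytest --disable-warnings models/demos/wormhole/stable_diffusion/demo/demo.py::test_demo_diffusiondb[2-30].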

Note: ttnn stable diffusion utilizes PNDMScheduler and requires num_inference_steps to be greater than or equal to 4 (see the PNDMScheduler reference in diffusers).
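
As a minimal sketch (assuming the diffusers package is installed), the scheduler used by the reference pipeline can be loaded and its timestep schedule generated as follows; the minimum of 4 steps is the demo's requirement noted above:

```python
# Sketch only: load the PNDM scheduler from the reference checkpoint and build
# the denoising timestep schedule for a given number of inference steps.
from diffusers import PNDMScheduler

scheduler = PNDMScheduler.from_pretrained(
    "CompVis/stable-diffusion-v1-4", subfolder="scheduler"
)
scheduler.set_timesteps(num_inference_steps=4)  # minimum supported by the ttnn demo
print(scheduler.timesteps)                      # tensor of denoising timesteps
```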

Metrics Interpretation

FID Score (Fréchet Inception Distance) evaluates the quality of generated images by measuring the similarity between their feature distributions and those of real images. A lower FID score indicates better similarity between generated and real images. For more information, refer to the FID Score documentation.
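
In its standard formulation, FID models the Inception features of real and generated images as Gaussians with means $\mu_r, \mu_g$ and covariances $\Sigma_r, \Sigma_g$, and computes the Fréchet distance between the two distributions:

$$
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \mathrm{Tr}\left(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\right)
$$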

CLIP Score measures the similarity between the generated images and the input prompts. Higher CLIP scores indicate better alignment between the generated images and the provided text prompts. For more information, refer to the CLIP Score documentation.
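
In its common formulation (e.g., the torchmetrics implementation), CLIP Score is the scaled cosine similarity between the CLIP image embedding $E_I$ and the text embedding $E_T$, clipped at zero:

$$
\mathrm{CLIPScore}(I, T) = \max\bigl(100 \cdot \cos(E_I, E_T),\ 0\bigr)
$$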