
Allow for using CPU if no CUDA device is detected #56

Open
wants to merge 7 commits into main

Conversation

@ModeratePrawn

Code from CompVis/latent-diffusion#123 applied to Stable Diffusion and tested on CPU.
It is slow, as expected, but works.

Allows for running on the CPU if no CUDA device is detected instead of just giving a runtime error.

This should allow more people to experiment even without owning an NVIDIA GPU.

@magnusviri

I have a pull request that looks like it changes the exact same lines as you, except my changes are for Apple Silicon GPU support. The code in my PR tries cuda, then mps (for Apple), then cpu. #47
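The cuda → mps → cpu preference order described above can be sketched as follows. This is a minimal stand-alone sketch, not the PR's actual code: the boolean flags stand in for `torch.cuda.is_available()` and `torch.backends.mps.is_available()` so it runs without torch installed.

```python
def pick_device(cuda_ok: bool, mps_ok: bool) -> str:
    """Return the first available backend in preference order:
    CUDA first, then Apple's Metal backend (mps), then plain CPU."""
    if cuda_ok:
        return "cuda"
    if mps_ok:
        return "mps"
    return "cpu"

# An NVIDIA machine keeps using CUDA; an Apple Silicon Mac gets mps;
# everything else falls back to the CPU instead of raising an error.
print(pick_device(False, True))
```

In the real scripts, the resulting string would be passed to `torch.device(...)` once, up front, instead of calling `model.cuda()` unconditionally in several places.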

@ModeratePrawn
Author

> I have a pull request that looks like it changes the exact same lines as you, except my changes are for Apple Silicon GPU support. The code in my PR tries cuda, then mps (for Apple), then cpu. #47

Thanks for the heads up! I would recommend your PR over mine, since it has more functionality. Stable Diffusion for all!

@fragmentshader2022

fragmentshader2022 commented Aug 23, 2022

This pull request will fix #62

@MojoJojo43

MojoJojo43 commented Aug 26, 2022

> Code from CompVis/latent-diffusion#123 applied to Stable Diffusion and tested on CPU. It is slow, as expected, but works.
>
> Allows for running on the CPU if no CUDA device is detected instead of just giving a runtime error.
>
> This should allow for more people to experiment even without owning an nvidia GPU

Are there any instructions on exactly how to accomplish this? Like a tutorial or something to follow?

@breadbrowser

just use this https://huggingface.co/spaces/stabilityai/stable-diffusion

@MojoJojo43

> just use this https://huggingface.co/spaces/stabilityai/stable-diffusion

Hi, yeah that's a really great option, albeit a limited one. Can't control the size of the images, and also can't use any adult terms because it is censored.

@breadbrowser

> > just use this https://huggingface.co/spaces/stabilityai/stable-diffusion
>
> Hi, yeah that's a really great option albeit a limited one. Can't control the size of the images and also can't use any adult terms because it is censored.

https://huggingface.co/spaces/Shuang59/Composable-Diffusion

@ModeratePrawn
Author

> Code from CompVis/latent-diffusion#123 applied to Stable Diffusion and tested on CPU. It is slow, as expected, but works.
> Allows for running on the CPU if no CUDA device is detected instead of just giving a runtime error.
> This should allow for more people to experiment even without owning an nvidia GPU
>
> Are there any instructions on exactly how to accomplish this? Like a tutorial or something to follow?

If you download my fork, you can just run the commands as is in the Readme, and it should detect if you have a GPU or not. If not, then it will switch to CPU and run the inference.

@MojoJojo43

MojoJojo43 commented Aug 27, 2022

> Code from CompVis/latent-diffusion#123 applied to Stable Diffusion and tested on CPU. It is slow, as expected, but works.
> Allows for running on the CPU if no CUDA device is detected instead of just giving a runtime error.
> This should allow for more people to experiment even without owning an nvidia GPU
>
> Are there any instructions on exactly how to accomplish this? Like a tutorial or something to follow?
>
> If you download my fork, you can just run the commands as is in the Readme, and it should detect if you have a GPU or not. If not, then it will switch to CPU and run the inference.

YOU DA MAN/WOMAN!!! Absolutely incredible work that you all have done here. Mind boggling for sure.

Quick question, I see that you guys/gals have coded a moderator to catch explicit outputs? [Safety Checker Module]
Is there any way to disable that?

My hat's off to you guys/gals.

I haven't got it running yet but fingers crossed! ;-)

@DasWookie

Verified this patch works. Painfully slow, but it works!

@MojoJojo43

MojoJojo43 commented Aug 28, 2022

> Code from CompVis/latent-diffusion#123 applied to Stable Diffusion and tested on CPU. It is slow, as expected, but works.
> Allows for running on the CPU if no CUDA device is detected instead of just giving a runtime error.
> This should allow for more people to experiment even without owning an nvidia GPU
>
> Are there any instructions on exactly how to accomplish this? Like a tutorial or something to follow?
>
> If you download my fork, you can just run the commands as is in the Readme, and it should detect if you have a GPU or not. If not, then it will switch to CPU and run the inference.

Hi, I am having one helluva time and am hoping you might be able to take a look to see what I am doing wrong. Here is what I have done and am doing:

  1. I open Ubuntu.
  2. I grab the forked repository git clone https://github.com/MojoJojo43/stable-diffusion-cpu.git
  3. I change directory: cd stable-diffusion/
  4. I create the environment: conda env create -f environment.yaml
  5. I activate ldm: conda activate ldm
  6. I pull the ckpt model: curl https://www.googleapis.com/storage/v1/b/aai-blog-files/o/sd-v1-4.ckpt?alt=media > sd-v1-4.ckpt
  7. I run a generic prompt: python scripts/txt2img.py --prompt "a photorealistic image of a lizard riding a snowboard through space" --plms --ckpt sd-v1-4.ckpt --skip_grid --n_samples 1

And after that it seems to get going but ultimately stops with an error that reads:

RuntimeError: No CUDA GPUs are available

I have been troubleshooting since 8am and it's now almost 12:00 am my time and I am still no closer to figuring this out lol.

Thanks in advance!!!

@ModeratePrawn
Author

Can you post the full output of the error you get after you run the inference script?

@DasWookie

There's a missed instance of

    model.cuda()

that also needs to be updated to

    if torch.cuda.is_available():
        model.cuda()

in img2img.
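The guard above, applied to a model object, can be sketched like this. `FakeModel` and `maybe_cuda` are illustrative names, not from the repo; the fake stands in for the real checkpoint so the sketch runs anywhere:

```python
class FakeModel:
    """Minimal stand-in for a torch module: tracks which device it is on."""
    def __init__(self):
        self.device = "cpu"

    def cuda(self):
        self.device = "cuda"
        return self


def maybe_cuda(model, cuda_available: bool):
    # Only move the model to the GPU when one is actually present,
    # instead of calling model.cuda() unconditionally.
    if cuda_available:
        model = model.cuda()
    return model


print(maybe_cuda(FakeModel(), False).device)
```

The unguarded `model.cuda()` is exactly what raises "No CUDA GPUs are available" on CPU-only machines, so every call site in both txt2img and img2img needs the check.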

@MojoJojo43

MojoJojo43 commented Aug 28, 2022

> Can you post the full output of the error you get after you run the inference script?

Hi there..... So, I have never used Ubuntu.......don't even really know anything about the windows cmd prompt other than it can do some nifty and nasty things to your pc :-) HAHA!

I brute forced my way out of all other possibilities and then started thinking about the env folder structures and whether or not I was in the right folders while executing all the commands. So I ended up removing EVERYTHING and starting from scratch.......

AND IT WORKS!!!!!! WHOOOOOOHOOOOOO!

It's about as slow as molasses but hey, just means I need to invest in a good GPU if I want to do this for really realz ;-)

I have one final question. My machine's GPU cannot be upgraded and is too old for Stable Diffusion. Would it be possible to use an eGPU? If so, what type should I be looking at to ensure compatibility? I'm not looking to spend over $500 on it either....the more affordable the better ;-)

Thanks for all the hard work and for being there to help out. Much appreciated!

@Apoo711

Apoo711 commented Aug 31, 2022

Hi, is it possible to run this on an Oracle Cloud Ampere A1 with Ubuntu as the OS? And if so, does having 4 cores speed it up in any way?

@Apoo711

Apoo711 commented Sep 1, 2022

Hello, when running the img2img script I get this error: https://blazebin.io/ishkwmjgmngo. But my txt2img script works great! So thank you very much for this fork and the time you spent on it!

SpandexWizard and others added 4 commits September 1, 2022 03:52
Made img2img.py CPU compatible as well.
Wong folder
Fix img2img.py for cpu code.
@ModeratePrawn
Author

ModeratePrawn commented Sep 1, 2022

I just merged an updated img2img.py, so it should work now. I forgot to modify that script originally. Thanks to everyone who pointed it out, and to SpandexWizard for applying the fix.
Someone please test and let me know.
The updated file is here: https://github.com/ModeratePrawn/stable-diffusion-cpu/blob/main/scripts/img2img.py, and I think this pull request should have picked up the new file as well.

added cpu support to img2img.py
@Zylann

Zylann commented Sep 5, 2022

My graphics card has CUDA, but doesn't have enough memory to run. I have plenty of RAM though. Would there be a way to choose to run on the CPU then?

@agajdosi

agajdosi commented Sep 6, 2022

@Zylann Set the env variable CUDA_VISIBLE_DEVICES="". To do that:

  • run export CUDA_VISIBLE_DEVICES="" and then run the commands you need (the setting is lost once the terminal is closed)
  • or prefix each command with CUDA_VISIBLE_DEVICES="", e.g. CUDA_VISIBLE_DEVICES="" python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms
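The same effect can be had from inside Python, assuming the variable is set before CUDA is first initialized (torch reads it lazily, on first CUDA use):

```python
import os

# An empty CUDA_VISIBLE_DEVICES hides all GPUs from the CUDA runtime,
# so device enumeration finds nothing and torch.cuda.is_available()
# reports False even on a machine that has a GPU.
os.environ["CUDA_VISIBLE_DEVICES"] = ""
print(os.environ["CUDA_VISIBLE_DEVICES"])
```

Note this only works if it runs before the first CUDA call in the process; once CUDA has initialized, changing the variable has no effect.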

@chipmunkofdoom2

@ModeratePrawn thanks for this. Works great on my system (AMD GPU, not Nvidia GPU). I just enter prompts like vanilla stable-diffusion and it defaults to CPU.

@Zylann

Zylann commented Sep 6, 2022

@agajdosi Oh ok. I just thought that instead of hardcoding a particular device or behavior in multiple places in the code, the choice could be made up-front in one place, which would make switching devices much easier. Thanks for the tip!

This PR also helped me tweak the code; it works well!
I'm wondering if switching to the diffusers lib would be better than using the original repo, though? (It looks easier to choose a device, and it allows choosing fp16.)

@Zylann Zylann mentioned this pull request Sep 7, 2022
@frob frob mentioned this pull request Sep 7, 2022
@bfung

bfung commented Sep 7, 2022

It was painful, but I've verified that this PR at commit d68cd0d merged into CompVis/main at 69ae4b3 works.

My laptop:

macOS Catalina 10.15.7
MacBook Pro (Retina, 15-inch, Early 2013)  <-- yes, almost 10 year old computer
Processor 2.7 GHz Quad-Core Intel Core i7
Memory 16 GB 1600 MHz DDR3
Graphics Intel HD Graphics 4000 1536 MB    <-- no GPU, old graphics card

And running through the tutorial:

$ python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms
...
/opt/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/autocast_mode.py:162: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
  warnings.warn('User provided device_type of \'cuda\', but CUDA is not available. Disabling')
Sampling:   0%|                                                                  | 0/2 [00:00<?, ?it/s]
Data shape for PLMS sampling is (3, 4, 64, 64)
Running PLMS Sampling with 50 timesteps
PLMS Sampler: 100%|█████████████████████████████████████████████████| 50/50 [1:54:07<00:00, 136.95s/it]
data: 100%|██████████████████████████████████████████████████████████| 1/1 [1:57:25<00:00, 7045.09s/it]
Sampling:  50%|██████████████████████████                          | 1/2 [1:57:25<1:57:25, 7045.09s/it]
Data shape for PLMS sampling is (3, 4, 64, 64)
Running PLMS Sampling with 50 timesteps
PLMS Sampler:   0%|                                                             | 0/50 [00:00<?, ?it/s]
PLMS Sampler: 100%|█████████████████████████████████████████████████| 50/50 [1:52:48<00:00, 135.38s/it]
data: 100%|██████████████████████████████████████████████████████████| 1/1 [1:55:53<00:00, 6953.26s/it]
Sampling: 100%|██████████████████████████████████████████████████████| 2/2 [3:53:18<00:00, 6999.18s/it]
Your samples are ready and waiting for you here:
outputs/txt2img-samples

Enjoy.

I ran this late last night, went to bed, and saw some astronauts on horses. QA'ed, it works!

Edit: added OS version

@magnusviri

@bfung will you try the https://github.com/lstein/stable-diffusion version and see if it works? We added a lot of macOS arm support and I'm curious if it works on your Intel also. What is your OS?

@bfung

bfung commented Sep 8, 2022

> @bfung will you try the https://github.com/lstein/stable-diffusion version and see if it works? We added a lot of macOS arm support and I'm curious if it works on your Intel also. What is your OS?

@magnusviri I'll take a look and give it a shot in the next few days. I was inspecting my laptop's hardware and noticed that it has an NVIDIA GeForce GT 650m 1GB in addition to Intel integrated graphics; curious to see whether the torch mps device would use the NVIDIA card to speed things up from a 6hr CPU run 😆 (it should...). I did a small test with the Metal framework and the default MTLDevice says NVIDIA, so in theory, it should.

The OS is macOS Catalina 10.15.7, which I also learned today that CUDA is off the table unless I downgrade to macOS High Sierra 10.13.

🤞 for mps.

@chipmunkofdoom2

chipmunkofdoom2 commented Sep 8, 2022

@ModeratePrawn is there any way to utilize more CPU cores? On Windows 11 21H2 I'm getting an average of 10s/it, which isn't too bad, but stable-diffusion only uses about ~50% of my CPU (12c/24t):

warnings.warn('User provided device_type of \'cuda\', but CUDA is not available. Disabling')
Sampling:   0%|                                                                                  | 0/1 [00:00<?, ?it/s]
Data shape for PLMS sampling is (1, 4, 64, 64)
Running PLMS Sampling with 50 timesteps
PLMS Sampler: 100%|████████████████████████████████████████████████████████████████████| 50/50 [08:08<00:00,  9.76s/it]
data: 100%|█████████████████████████████████████████████████████████████████████████████| 1/1 [08:13<00:00, 493.67s/it]
Sampling: 100%|█████████████████████████████████████████████████████████████████████████| 1/1 [08:13<00:00, 493.67s/it]
Your samples are ready and waiting for you here:
outputs/txt2img-samples

Enjoy.


If I can increase CPU utilization by about 50% (to about 75% total), I could get a roughly proportional decrease in run time.
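A hedged note, not from this PR: PyTorch's CPU path reads a few thread-count knobs, and whether raising them helps depends on the op mix (sticking at ~50% often means some ops are memory-bound or single-threaded). The usual levers are `torch.set_num_threads()` and the OMP/MKL environment variables; the latter must be set before torch is imported:

```python
import os

# Thread-count knobs honored by PyTorch's OpenMP/MKL CPU backends.
# They must be set before torch is imported to take effect.
threads = str(os.cpu_count() or 1)
os.environ["OMP_NUM_THREADS"] = threads
os.environ["MKL_NUM_THREADS"] = threads
print(threads)
```

Inside an already-running script, `torch.set_num_threads(os.cpu_count())` has the same intent, though it cannot un-limit ops that are inherently serial.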

@MojoJojo43

> @bfung will you try the https://github.com/lstein/stable-diffusion version and see if it works? We added a lot of macOS arm support and I'm curious if it works on your Intel also. What is your OS?
>
> @magnusviri I'll take a look and give it a shot in next few days. I was inspecting my laptop's hardware and noticed that it has a `NVIDIA GeForce GT 650m

From what I learned from my PC, your GT 650M only supports CUDA compute capability 3.0, and PyTorch dropped CUDA 3.0 support years ago, around PyTorch 0.3.1. I'm curious whether you will actually get your GPU to run Stable Diffusion.

@bfung

bfung commented Sep 13, 2022

Following up on my comment and to recap the information:

This PR works on CPU on a 10 year old mac

I have a ~10 year old mac laptop with the following specs:

macOS Catalina 
Version 10.15.7

MacBook Pro (Retina, 15-inch, Early 2013)
Processor 2.7 GHz Quad-Core Intel Core i7
Memory    16 GB 1600 MHz DDR3
Graphics  NVIDIA GeForce GT 650M 1 GB
          Intel HD Graphics 4000 1536 MB

GPU on 10 year old mac?

> will you try the https://github.com/lstein/stable-diffusion version and see if it works?

@magnusviri @MojoJojo43

With my old laptop hardware and OS version, the https://github.com/lstein/stable-diffusion version didn't work for me when using python3 scripts/dream.py --device mps. A couple of error messages appeared, but it looks like the errors are thrown from the underlying libraries. It's probably not worth the effort to try to support my very old laptop. I found several things:

  1. pytorch backend support
    Judging from a couple of different scripts, such as the examples in https://pytorch.org/docs/stable/notes/mps.html, torch.backends.mps is only available on macOS 12.3+, while I'm stuck on 10.15.
  2. I tried using https://github.com/geohot/tinygrad to swap out the pytorch backend for OpenCL, but again, my hardware/OS version is so old that some of the OpenCL stuff doesn't really work and would need a lot of coding to get working.
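The macOS 12.3+ constraint above is why a defensive probe is useful on mixed fleets. This is a sketch, not code from either repo; `types.SimpleNamespace` fakes old and new `torch.backends` modules so it runs without torch installed:

```python
from types import SimpleNamespace


def mps_supported(backends) -> bool:
    """torch.backends.mps only exists on newer torch builds (macOS 12.3+);
    probe with getattr so older installs don't crash on AttributeError."""
    mps = getattr(backends, "mps", None)
    return mps is not None and mps.is_available()


old_torch = SimpleNamespace()  # no mps attribute at all, like torch on 10.15
new_torch = SimpleNamespace(mps=SimpleNamespace(is_available=lambda: True))
print(mps_supported(old_torch), mps_supported(new_torch))
```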

On the bright side, it looks like other people have gotten the mps backend to work with lstein/stable-diffusion. For example:

Hope this info helps - cheers all.

enzymezoo-code added a commit to enzymezoo-code/stable-diffusion that referenced this pull request Sep 25, 2022
@ryanhugh

Can this be merged? support for MPS would be great too. thanks.

@magnusviri

> Can this be merged? support for MPS would be great too. thanks.

MPS support is in A1111, InvokeAI, ComfyUI, and others I'm sure. This CompVis repo is abandoned.

@ryanhugh

Oh my bad, I just Googled "stable diffusion github", this repo was the first result, so I tried to use it. I'll check out those - Thanks!


@svankamamidi

On my Windows 10 machine without a GPU, even after setting

(ldm) C:\text2img\stable-diffusion>set CUDA_VISIBLE_DEVICES=""

I still get the error below:

  • This IS expected if you are initializing CLIPTextModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing CLIPTextModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    Traceback (most recent call last):
      File "scripts/txt2img.py", line 352, in <module>
        main()
      File "scripts/txt2img.py", line 246, in main
        model = load_model_from_config(config, f"{opt.ckpt}")
      File "scripts/txt2img.py", line 64, in load_model_from_config
        model.cuda()
      File "C:\installed\miniconda3\envs\ldm\lib\site-packages\pytorch_lightning\core\mixins\device_dtype_mixin.py", line 127, in cuda
        return super().cuda(device=device)
      File "C:\installed\miniconda3\envs\ldm\lib\site-packages\torch\nn\modules\module.py", line 688, in cuda
        return self._apply(lambda t: t.cuda(device))
      File "C:\installed\miniconda3\envs\ldm\lib\site-packages\torch\nn\modules\module.py", line 578, in _apply
        module._apply(fn)
      File "C:\installed\miniconda3\envs\ldm\lib\site-packages\torch\nn\modules\module.py", line 578, in _apply
        module._apply(fn)
      File "C:\installed\miniconda3\envs\ldm\lib\site-packages\torch\nn\modules\module.py", line 578, in _apply
        module._apply(fn)
      [Previous line repeated 1 more time]
      File "C:\installed\miniconda3\envs\ldm\lib\site-packages\torch\nn\modules\module.py", line 601, in _apply
        param_applied = fn(param)
      File "C:\installed\miniconda3\envs\ldm\lib\site-packages\torch\nn\modules\module.py", line 688, in <lambda>
        return self._apply(lambda t: t.cuda(device))
      File "C:\installed\miniconda3\envs\ldm\lib\site-packages\torch\cuda\__init__.py", line 216, in _lazy_init
        torch._C._cuda_init()
    RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
