Can't compile SD2.1 VAE with Batch Input #11

Open
furkancoskun opened this issue May 22, 2023 · 3 comments
Labels: bug

furkancoskun commented May 22, 2023

I changed the batch size of the trace tensor inputs in the hf_pretrained_sd2_512_inference.ipynb notebook. The text encoder, UNet, and vae_post_quant_conv compiled successfully, but the VAE decoder did not.

batch=2

import torch_neuronx
from diffusers import StableDiffusionPipeline
import torch
import os, copy

COMPILER_WORKDIR_ROOT = 'sd2_compile_dir_512_batch2'
model_id = "stabilityai/stable-diffusion-2-1-base"

# Load the pipeline and keep only the VAE decoder for tracing
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float32)
decoder = copy.deepcopy(pipe.vae.decoder)
del pipe

# Example latent input with batch size 2 (the notebook default is 1)
decoder_in = torch.randn([2, 4, 64, 64])
decoder_neuron = torch_neuronx.trace(
    decoder,
    decoder_in,
    compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder'),
)

# Save the compiled decoder
decoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder/model.pt')
torch.jit.save(decoder_neuron, decoder_filename)

del decoder
del decoder_neuron

I get the following error message:

Fetching 13 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 13/13 [00:00<00:00, 234016.96it/s]
Selecting 161763 allocations
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
Selecting 137856 allocations
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
Selecting 272047 allocations
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
Selecting 52275 allocations
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
Selecting 318165 allocations
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
Selecting 8981 allocations
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
Selecting 323589 allocations
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
2023-05-22T06:51:17Z WARNING 28201 [SB_Allocator]: couldn't allocate every tensor in SB
2023-05-22T06:51:17Z WARNING 28201 [SB_Allocator]: disabling special handling of accumulation groups
Selecting 323589 allocations
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
Selecting 2233 allocations
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
Selecting 325190 allocations
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
2023-05-22T06:51:39Z FATAL 28201 [SB_Allocator]: couldn't allocate every tensor in SB and spilling can't help
2023-05-22T06:51:39Z FATAL 28201 [SB_Allocator]: 10 biggest memlocs:
2023-05-22T06:51:39Z FATAL 28201 [SB_Allocator]: mhlo_add_312_pftranspose_5198_i6_ReloadStore32338_ReloadStore166495 65536
2023-05-22T06:51:39Z FATAL 28201 [SB_Allocator]: mhlo_add_312_pftranspose_5198_i0_ReloadStore32560_Remat_166496 65536
2023-05-22T06:51:39Z FATAL 28201 [SB_Allocator]: mhlo_add_294_i5_ReloadStore32107_Remat_166430 65536
2023-05-22T06:51:39Z FATAL 28201 [SB_Allocator]: mhlo_add_259_i0 65536
2023-05-22T06:51:39Z FATAL 28201 [SB_Allocator]: mhlo_add_294_i7 65536
2023-05-22T06:51:39Z FATAL 28201 [SB_Allocator]: mhlo_add_294_i1 65536
2023-05-22T06:51:39Z FATAL 28201 [SB_Allocator]: mhlo_add_294_i6 65536
2023-05-22T06:51:39Z FATAL 28201 [SB_Allocator]: mhlo_add_294_i5_ReloadStore32107_Remat_166431 65536
2023-05-22T06:51:39Z FATAL 28201 [SB_Allocator]: mhlo_add_294_i7_ReloadStore32024_Remat_121920_Remat_166327 65536

I am using an inf2.8xlarge instance and have set up 100 GB of swap space. Any ideas on this batch-input compilation problem?

jyang-aws (Contributor) commented May 22, 2023

Hi furkancoskun,
Thanks for reporting the issue. We'll try to reproduce it and look into it.
Just to confirm, does the issue show up in the latest 2.10 Neuron SDK?
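(For anyone following along, one quick way to check which Neuron packages are installed; the distribution names below are the usual pip names and are an assumption here, not something stated in this thread:)

# Sketch: print installed Neuron package versions.
from importlib.metadata import version, PackageNotFoundError

for pkg in ("torch-neuronx", "neuronx-cc"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")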

jyang-aws added the bug label May 22, 2023
furkancoskun (Author) commented

Yes, the issue shows up in 2.10

aws-mvaria commented

Hi @furkancoskun, we have reproduced the issue and are currently looking at fixing it in a future release. In the meantime, you can continue to use batch=1.

If you are looking to use higher batch sizes to improve performance, note that our batch=1 configuration is expected to be performant. We will continue to improve batch=1 performance as well as add support for larger batch sizes in future releases.
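For reference, below is a minimal sketch of the batch=1 workaround: run the decoder compiled with a batch=1 example input and split larger batches along the batch dimension at inference time. The model path and helper name are illustrative assumptions (the path matches what the unmodified notebook writes), not something confirmed in this thread.

# Sketch only: assumes a VAE decoder traced with a batch=1 example input,
# saved by the unmodified notebook (the path below is an assumption).
import torch

decoder_neuron = torch.jit.load('sd2_compile_dir_512/vae_decoder/model.pt')

def decode_in_batch_1_chunks(latents):
    # latents: [N, 4, 64, 64]; the batch=1 compiled graph only accepts a
    # single sample, so decode each one separately and concatenate.
    images = [decoder_neuron(latents[i:i + 1]) for i in range(latents.shape[0])]
    return torch.cat(images, dim=0)

latents = torch.randn([2, 4, 64, 64])
images = decode_in_batch_1_chunks(latents)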
