Can't compile SD2.1 VAE with Batch Input #11

Open
furkancoskun opened this issue May 22, 2023 · 3 comments
Labels: bug

furkancoskun commented May 22, 2023

I changed the batch size of the trace tensor inputs in the hf_pretrained_sd2_512_inference.ipynb notebook. The text encoder, UNet, and vae_post_quant_conv compiled successfully, but the VAE decoder did not.

batch=2

import torch_neuronx
from diffusers import StableDiffusionPipeline
import torch
import os, copy

COMPILER_WORKDIR_ROOT = 'sd2_compile_dir_512_batch2'
model_id = "stabilityai/stable-diffusion-2-1-base"

# Load the pipeline and keep only the VAE decoder for tracing
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float32)
decoder = copy.deepcopy(pipe.vae.decoder)
del pipe

# Example latent input with batch size 2 (the notebook default is 1)
decoder_in = torch.randn([2, 4, 64, 64])
decoder_neuron = torch_neuronx.trace(
    decoder,
    decoder_in,
    compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder'),
)

# Save the compiled decoder
decoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder/model.pt')
torch.jit.save(decoder_neuron, decoder_filename)

del decoder
del decoder_neuron

I get the following error message:

Fetching 13 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 13/13 [00:00<00:00, 234016.96it/s]
Selecting 161763 allocations
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
Selecting 137856 allocations
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
Selecting 272047 allocations
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
Selecting 52275 allocations
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
Selecting 318165 allocations
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
Selecting 8981 allocations
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
Selecting 323589 allocations
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
2023-05-22T06:51:17Z WARNING 28201 [SB_Allocator]: couldn't allocate every tensor in SB
2023-05-22T06:51:17Z WARNING 28201 [SB_Allocator]: disabling special handling of accumulation groups
Selecting 323589 allocations
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
Selecting 2233 allocations
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
Selecting 325190 allocations
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
2023-05-22T06:51:39Z FATAL 28201 [SB_Allocator]: couldn't allocate every tensor in SB and spilling can't help
2023-05-22T06:51:39Z FATAL 28201 [SB_Allocator]: 10 biggest memlocs:
2023-05-22T06:51:39Z FATAL 28201 [SB_Allocator]: mhlo_add_312_pftranspose_5198_i6_ReloadStore32338_ReloadStore166495 65536
2023-05-22T06:51:39Z FATAL 28201 [SB_Allocator]: mhlo_add_312_pftranspose_5198_i0_ReloadStore32560_Remat_166496 65536
2023-05-22T06:51:39Z FATAL 28201 [SB_Allocator]: mhlo_add_294_i5_ReloadStore32107_Remat_166430 65536
2023-05-22T06:51:39Z FATAL 28201 [SB_Allocator]: mhlo_add_259_i0 65536
2023-05-22T06:51:39Z FATAL 28201 [SB_Allocator]: mhlo_add_294_i7 65536
2023-05-22T06:51:39Z FATAL 28201 [SB_Allocator]: mhlo_add_294_i1 65536
2023-05-22T06:51:39Z FATAL 28201 [SB_Allocator]: mhlo_add_294_i6 65536
2023-05-22T06:51:39Z FATAL 28201 [SB_Allocator]: mhlo_add_294_i5_ReloadStore32107_Remat_166431 65536
2023-05-22T06:51:39Z FATAL 28201 [SB_Allocator]: mhlo_add_294_i7_ReloadStore32024_Remat_121920_Remat_166327 65536

I am using an inf2.8xlarge instance and have set up 100 GB of swap space. Any ideas on this batch-input compilation problem?

jyang-aws (Contributor) commented May 22, 2023

Hi furkancoskun,
Thanks for reporting the issue. We'll try to reproduce it and look into it.
Just to confirm, does the issue show up in the latest 2.10 Neuron SDK?
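(For anyone following along, one quick way to check which Neuron packages are installed; the distribution names below are the usual pip names and are an assumption here, not something stated in this thread:)

# Sketch: print installed Neuron package versions.
from importlib.metadata import version, PackageNotFoundError

for pkg in ("torch-neuronx", "neuronx-cc"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")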

jyang-aws added the bug label May 22, 2023
furkancoskun (Author) commented

Yes, the issue shows up in 2.10

aws-mvaria commented

Hi @furkancoskun, we have reproduced the issue and are currently looking at fixing it in a future release. In the meantime, you can continue to use batch=1.

If you are looking to use higher batch sizes to improve performance, note that our batch=1 configuration is expected to be performant. We will continue to improve batch=1 performance as well as add support for larger batch sizes in future releases.
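For reference, below is a minimal sketch of the batch=1 workaround: run the decoder compiled with a batch=1 example input and split larger batches along the batch dimension at inference time. The model path and helper name are illustrative assumptions (the path matches what the unmodified notebook writes), not something confirmed in this thread.

# Sketch only: assumes a VAE decoder traced with a batch=1 example input,
# saved by the unmodified notebook (the path below is an assumption).
import torch

decoder_neuron = torch.jit.load('sd2_compile_dir_512/vae_decoder/model.pt')

def decode_in_batch_1_chunks(latents):
    # latents: [N, 4, 64, 64]; the batch=1 compiled graph only accepts a
    # single sample, so decode each one separately and concatenate.
    images = [decoder_neuron(latents[i:i + 1]) for i in range(latents.shape[0])]
    return torch.cat(images, dim=0)

latents = torch.randn([2, 4, 64, 64])
images = decode_in_batch_1_chunks(latents)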
