Lumina: RuntimeError: shape '[2, 4, 67, 2, 120, 2]' is invalid for input of size 259200 #10650

Open
nitinmukesh opened this issue Jan 25, 2025 · 1 comment · May be fixed by #10651
Labels
bug Something isn't working

Comments

nitinmukesh commented Jan 25, 2025

Describe the bug

If I use width=1920 and height=1080, an error is reported.

The documentation says both should be divisible by 8:
https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/lumina/pipeline_lumina.py

if height % 8 != 0 or width % 8 != 0:
    raise ValueError(f"height and width have to be divisible by 8 but are {height} and {width}.")

The following resolutions work: 1024x576 and 1024x2048.
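For context, a minimal sketch of the actual constraint, assuming the VAE's 8x spatial downsampling plus a patch size of 2 in the transformer's patch embedder (`check` is an illustrative helper, not a diffusers API):

```python
# Sketch: the VAE shrinks height/width by 8, then the patch embedder splits
# the latent grid into 2x2 patches, so height and width effectively must be
# divisible by 8 * 2 = 16 rather than just 8.
def check(width, height, vae_factor=8, patch_size=2):
    lat_h, lat_w = height // vae_factor, width // vae_factor
    return lat_h % patch_size == 0 and lat_w % patch_size == 0

print(check(1920, 1080))  # False: 1080 // 8 = 135, which is odd
print(check(1024, 576))   # True:  576 // 8 = 72
print(check(1024, 2048))  # True:  2048 // 8 = 256
```

This explains why 1920x1080 passes the documented divisible-by-8 check but still crashes, while 1024x576 and 1024x2048 work.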

Reproduction

import torch
from diffusers import LuminaText2ImgPipeline

pipe = LuminaText2ImgPipeline.from_pretrained(
    "Alpha-VLLM/Lumina-Next-SFT-diffusers", torch_dtype=torch.bfloat16
)
# Enable memory optimizations.
pipe.enable_model_cpu_offload()
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()
prompt = "Upper body of a young woman in a Victorian-era outfit with brass goggles and leather straps. Background shows an industrial revolution cityscape with smoky skies and tall, metal structures"
image = pipe(
    prompt=prompt,
    width=1920,
    height=1080
).images[0]

Logs

File "C:\aiOWN\diffuser_webui\lumina.py", line 11, in <module>
    prompt = "Upper body of a young woman in a Victorian-era outfit with brass goggles and leather straps. Background shows an industrial revolution cityscape with smoky skies and tall, metal structures"
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\diffusers\pipelines\lumina\pipeline_lumina.py", line 846, in __call__
    noise_pred = self.transformer(
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\accelerate\hooks.py", line 175, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\diffusers\models\transformers\lumina_nextdit2d.py", line 310, in forward
    hidden_states, mask, img_size, image_rotary_emb = self.patch_embedder(hidden_states, image_rotary_emb)
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\diffusers\models\embeddings.py", line 609, in forward
    x = x.view(batch_size, channel, height_tokens, patch_height, width_tokens, patch_width).permute(
RuntimeError: shape '[2, 4, 67, 2, 120, 2]' is invalid for input of size 259200
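The numbers in the error are consistent with an odd latent height: `1080 // 8 = 135` latent rows, and integer division by `patch_size=2` silently drops one row (`135 // 2 = 67`), so the target shape's element count no longer matches the tensor. A quick check (the shape labels are my reading of the `view()` call in `embeddings.py`, not authoritative):

```python
import math

# Target shape from the traceback:
# [batch, channel, height_tokens, patch_height, width_tokens, patch_width]
target = [2, 4, 67, 2, 120, 2]

# Latent tensor for a 1920x1080 image: 1080 // 8 = 135, 1920 // 8 = 240
actual = 2 * 4 * 135 * 240

print(math.prod(target))  # 257280
print(actual)             # 259200
```

257280 != 259200, hence the `view()` failure.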

System Info

- 🤗 Diffusers version: 0.33.0.dev0
- Platform: Windows-10-10.0.26100-SP0
- Running on Google Colab?: No
- Python version: 3.10.11
- PyTorch version (GPU?): 2.5.1+cu124 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Huggingface_hub version: 0.27.1
- Transformers version: 4.48.1
- Accelerate version: 1.4.0.dev0
- PEFT version: not installed
- Bitsandbytes version: 0.45.0
- Safetensors version: 0.5.2
- xFormers version: not installed
- Accelerator: NVIDIA GeForce RTX 4060 Laptop GPU, 8188 MiB
- Using GPU in script?: <fill in>
- Using distributed or parallel set-up in script?: <fill in>

Who can help?

No response

@nitinmukesh nitinmukesh added the bug Something isn't working label Jan 25, 2025
@victolee0

I think the check_inputs function is incorrect. Since the VAE reduces the image size by a factor of 8 and patch_size=2, the height and width must be divisible by 16.
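A hedged sketch of what the corrected validation could look like; the names and defaults here are illustrative, and the actual fix is in the linked PR:

```python
# Illustrative version of the check: with vae_scale_factor = 8 and
# patch_size = 2, the effective divisor is 16, not 8.
def check_inputs(height, width, vae_scale_factor=8, patch_size=2):
    divisor = vae_scale_factor * patch_size  # 16
    if height % divisor != 0 or width % divisor != 0:
        raise ValueError(
            f"height and width have to be divisible by {divisor} "
            f"but are {height} and {width}."
        )

check_inputs(1088, 1920)  # OK: both divisible by 16
```

Under this check, a near-1080p request would need to be rounded to, e.g., 1920x1088.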

@victolee0 victolee0 linked a pull request Jan 25, 2025 that will close this issue