if height % 8 != 0 or width % 8 != 0:
    raise ValueError(f"height and width have to be divisible by 8 but are {height} and {width}.")
The following resolutions work: 1024x576 and 1024x2048.
Reproduction
import torch
from diffusers import LuminaText2ImgPipeline
pipe = LuminaText2ImgPipeline.from_pretrained(
    "Alpha-VLLM/Lumina-Next-SFT-diffusers", torch_dtype=torch.bfloat16
)
# Enable memory optimizations.
pipe.enable_model_cpu_offload()
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()
prompt = "Upper body of a young woman in a Victorian-era outfit with brass goggles and leather straps. Background shows an industrial revolution cityscape with smoky skies and tall, metal structures"
image = pipe(
    prompt=prompt,
    width=1920,
    height=1080,
).images[0]
Logs
  File "C:\aiOWN\diffuser_webui\lumina.py", line 11, in <module>
    prompt = "Upper body of a young woman in a Victorian-era outfit with brass goggles and leather straps. Background shows an industrial revolution cityscape with smoky skies and tall, metal structures"
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\diffusers\pipelines\lumina\pipeline_lumina.py", line 846, in __call__
    noise_pred = self.transformer(
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\accelerate\hooks.py", line 175, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\diffusers\models\transformers\lumina_nextdit2d.py", line 310, in forward
    hidden_states, mask, img_size, image_rotary_emb = self.patch_embedder(hidden_states, image_rotary_emb)
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\diffusers\models\embeddings.py", line 609, in forward
    x = x.view(batch_size, channel, height_tokens, patch_height, width_tokens, patch_width).permute(
RuntimeError: shape '[2, 4, 67, 2, 120, 2]' is invalid for input of size 259200
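The numbers in this RuntimeError can be reproduced with plain arithmetic. The sketch below is not code from diffusers. It assumes the VAE downsamples by a factor of 8 and that the patch size is 2 (both stated in the comment in this thread), and it reads the batch size of 2 and the 4 latent channels off the shape in the error message.

```python
# Assumptions, not library code: values taken from the error message and the thread.
vae_scale_factor = 8          # assumed VAE downsampling factor
patch_size = 2                # assumed patch size of the patch embedder
batch_size, channels = 2, 4   # leading dimensions of the shape in the error

height, width = 1080, 1920

# Latent resolution after the VAE: 135 x 240.
latent_h = height // vae_scale_factor
latent_w = width // vae_scale_factor

# Number of elements the latent tensor actually holds.
actual_numel = batch_size * channels * latent_h * latent_w

# Token counts come from integer division, so the odd latent height loses a row.
height_tokens = latent_h // patch_size   # 67, although 135 / 2 = 67.5
width_tokens = latent_w // patch_size    # 120

# Number of elements the requested view shape would need.
requested_numel = (
    batch_size * channels * height_tokens * patch_size * width_tokens * patch_size
)

print(actual_numel)     # 259200, the "input of size" in the error
print(requested_numel)  # 257280, what shape [2, 4, 67, 2, 120, 2] requires
```

The mismatch comes entirely from the height: 1080 / 8 = 135 is odd, so it cannot be split into patches of height 2.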
System Info
- 🤗 Diffusers version: 0.33.0.dev0
- Platform: Windows-10-10.0.26100-SP0
- Running on Google Colab?: No
- Python version: 3.10.11
- PyTorch version (GPU?): 2.5.1+cu124 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Huggingface_hub version: 0.27.1
- Transformers version: 4.48.1
- Accelerate version: 1.4.0.dev0
- PEFT version: not installed
- Bitsandbytes version: 0.45.0
- Safetensors version: 0.5.2
- xFormers version: not installed
- Accelerator: NVIDIA GeForce RTX 4060 Laptop GPU, 8188 MiB
- Using GPU in script?: <fill in>
- Using distributed or parallel set-up in script?: <fill in>
Who can help?
No response
I think the check_inputs function is incorrect. Since the VAE reduces the image size by a factor of 8 and patch_size=2, the height and width must be divisible by 16.
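A minimal sketch of what a stricter check could look like, assuming the reasoning above holds (VAE factor 8 times patch size 2 gives 16). The function names `check_resolution` and `round_up_to_multiple` are hypothetical helpers for illustration, not part of diffusers, and this is not the actual fix applied to `check_inputs`.

```python
def check_resolution(height: int, width: int, vae_scale_factor: int = 8, patch_size: int = 2) -> None:
    """Hypothetical stricter check: sizes must be divisible by vae_scale_factor * patch_size."""
    multiple = vae_scale_factor * patch_size  # 16 under the stated assumptions
    if height % multiple != 0 or width % multiple != 0:
        raise ValueError(
            f"height and width have to be divisible by {multiple} but are {height} and {width}."
        )


def round_up_to_multiple(value: int, multiple: int = 16) -> int:
    """Hypothetical helper: round a size up to the next multiple (ceiling division)."""
    return -(-value // multiple) * multiple


# The resolutions reported as working pass the stricter check.
check_resolution(576, 1024)
check_resolution(2048, 1024)

# The failing resolution is rejected up front instead of crashing inside the transformer.
try:
    check_resolution(1080, 1920)
    rejected = False
except ValueError:
    rejected = True

# A possible workaround on the caller side: request 1920x1088 and crop afterwards.
safe_height = round_up_to_multiple(1080)
safe_width = round_up_to_multiple(1920)
print(rejected, safe_width, safe_height)
```

Rounding up and cropping is only one option; rounding down to 1920x1072 would also satisfy the divisibility requirement.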
Describe the bug
With width=1920 and height=1080, the pipeline raises an error.
The documentation says both values only need to be divisible by 8:
https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/lumina/pipeline_lumina.py