
Broken Colab Demo #187

Open
ProjectRobal opened this issue Jul 13, 2024 · 2 comments

Comments

ProjectRobal commented Jul 13, 2024

Hello, I wanted to test metavoice using the official Colab demo, but I am getting this error during inference:

using dtype=float16
Fetching 6 files: 100% 6/6 [00:00<00:00, 293.22it/s]
number of parameters: 14.07M
2024-07-13 18:39:34 | INFO | DF | Loading model settings of DeepFilterNet3
2024-07-13 18:39:34 | INFO | DF | Using DeepFilterNet3 model at /root/.cache/DeepFilterNet/DeepFilterNet3
2024-07-13 18:39:34 | INFO | DF | Initializing model deepfilternet3
2024-07-13 18:39:34 | INFO | DF | Found checkpoint /root/.cache/DeepFilterNet/DeepFilterNet3/checkpoints/model_120.ckpt.best with epoch 120
2024-07-13 18:39:34 | INFO | DF | Running on device cuda:0
2024-07-13 18:39:34 | INFO | DF | Model loaded
Using device=cuda
Loading model ...
using dtype=float16
Time to load model: 12.20 seconds
Compiling...Can take up to 2 mins.

TorchRuntimeError Traceback (most recent call last)
in <cell line: 7>()
5 torch._dynamo.config.suppress_errors = True
6
----> 7 tts = TTS()

89 frames
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/utils.py in run_node(tracer, node, args, kwargs, nnmodule)
1762 try:
1763 if op == "call_function":
-> 1764 return node.target(*args, **kwargs)
1765 elif op == "call_method":
1766 return getattr(args[0], node.target)(*args[1:], **kwargs)

TorchRuntimeError: Failed running call_function (*(FakeTensor(..., device='cuda:0', size=(2, 16, s0, 128)), FakeTensor(..., device='cuda:0', size=(2, 16, 2048, 128), dtype=torch.float16), FakeTensor(..., device='cuda:0', size=(2, 16, 2048, 128), dtype=torch.float16)), **{'attn_mask': FakeTensor(..., device='cuda:0', size=(1, 1, s0, 2048), dtype=torch.bool), 'dropout_p': 0.0}):
Expected query, key, and value to have the same dtype, but got query.dtype: float key.dtype: c10::Half and value.dtype: c10::Half instead.

from user code:
File "/content/metavoice-src/metavoice-src/fam/llm/fast_inference_utils.py", line 131, in prefill
logits = model(x, spk_emb, input_pos)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/content/metavoice-src/metavoice-src/fam/llm/fast_model.py", line 160, in forward
x = layer(x, input_pos, mask)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/content/metavoice-src/metavoice-src/fam/llm/fast_model.py", line 179, in forward
h = x + self.attention(self.attention_norm(x), mask, input_pos)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/content/metavoice-src/metavoice-src/fam/llm/fast_model.py", line 222, in forward
y = F.scaled_dot_product_attention(q, k, v, attn_mask=mask, dropout_p=0.0)

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information

Moreover, I get the same error when I try to run the model locally following the instructions in the README.
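The root cause is visible in the last frame of the traceback: F.scaled_dot_product_attention requires q, k, and v to share a dtype, and here the query is float32 while the compiled KV cache is float16. A minimal standalone repro (not metavoice code; the shapes are arbitrary) — upcasting k/v is the safe direction on CPU, while casting q down with q.half() instead matches the fp16 cache:

```python
import torch
import torch.nn.functional as F

q = torch.randn(1, 2, 4, 8)                       # float32 query
k = torch.randn(1, 2, 4, 8, dtype=torch.float16)  # float16 key
v = torch.randn(1, 2, 4, 8, dtype=torch.float16)  # float16 value

# Mixed dtypes are rejected, matching the error in the traceback above.
try:
    F.scaled_dot_product_attention(q, k, v)
except RuntimeError as e:
    print("dtype mismatch:", e)

# Making the dtypes agree fixes it.
out = F.scaled_dot_product_attention(q, k.float(), v.float())
print(out.dtype)
```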


zapakh commented Jul 31, 2024

I encountered the same error trying to use fast_inference.py locally in WSL. I bypassed it by replacing q with q.half() on line 222 of fast_model.py. However, this causes an AssertionError later on, due to another dtype mismatch:

Traceback (most recent call last):
  File "/mnt/f/usr/ai/metavoice-src/fam/llm/fast_inference.py", line 203, in <module>
    tts = tyro.cli(TTS)
  File "/.../pypoetry/virtualenvs/fam-r1NpSZZz-py3.10/lib/python3.10/site-packages/tyro/_cli.py", line 217, in cli
    return run_with_args_from_cli()
  File "/.../metavoice-src/fam/llm/fast_inference.py", line 101, in __init__
    self.model, self.tokenizer, self.smodel, self.model_size = build_model(
  File "/.../metavoice-src/fam/llm/fast_inference_utils.py", line 375, in build_model
    y = generate(
  File "/.../.pypoetry/virtualenvs/fam-r1NpSZZz-py3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/.../metavoice-src/fam/llm/fast_inference_utils.py", line 211, in generate
    next_token = prefill(model, prompt.view(1, -1).repeat(2, 1), spk_emb, input_pos, **sampling_kwargs)
...
  File "/.../pypoetry/virtualenvs/fam-r1NpSZZz-py3.10/lib/python3.10/site-packages/torch/_inductor/lowering.py", line 2105, in slice_scatter
    assert x.get_dtype() == src.get_dtype()
torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
LoweringException: AssertionError: 
  target: aten.slice_scatter.default
  args[0]: TensorBox(StorageBox(
    InputBuffer(name='arg175_1', layout=FixedLayout('cuda', torch.float16, size=[2, 16, 2048, 128], stride=[4194304, 262144, 128, 1]))
  ))
  args[1]: TensorBox(StorageBox(
    ComputedBuffer(name='buf8', layout=FlexibleLayout('cuda', torch.float32, size=[2, 16, 2048, 128], stride=[4194304, 262144, 128, 1]), data=Pointwise(
      'cuda',
      torch.float32,
      def inner_fn(index):
          i0, i1, i2, i3 = index
          tmp0 = ops.load(arg175_1, i3 + 128 * i2 + 262144 * i1 + 4194304 * i0)
          tmp1 = ops.to_dtype(tmp0, torch.float32)
          return tmp1
      ,
      ranges=[2, 16, 2048, 128],
      origin_node=index_put,
      origins={index_put}
    ))
  ))
  args[2]: 1
  args[3]: 0
  args[4]: 9223372036854775807

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
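This second LoweringException points at the KV-cache write: the cache buffer (arg175_1) is float16 while the values being scattered into it are float32, so inductor's slice_scatter lowering asserts on the dtype mismatch. A hedged sketch of the pattern and a possible fix — this mirrors the usual gpt-fast style cache, with illustrative names, not the exact code in fam/llm/fast_model.py — is to cast incoming k/v to the cache dtype before the indexed write:

```python
import torch

# Hypothetical KV cache; the class and field names are illustrative.
class KVCache(torch.nn.Module):
    def __init__(self, batch, heads, seq_len, head_dim, dtype=torch.float16):
        super().__init__()
        shape = (batch, heads, seq_len, head_dim)
        self.register_buffer("k_cache", torch.zeros(shape, dtype=dtype))
        self.register_buffer("v_cache", torch.zeros(shape, dtype=dtype))

    def update(self, input_pos, k_val, v_val):
        # Cast to the cache dtype first: writing float32 values into a
        # float16 buffer is what trips inductor's slice_scatter assertion.
        k_val = k_val.to(self.k_cache.dtype)
        v_val = v_val.to(self.v_cache.dtype)
        self.k_cache[:, :, input_pos] = k_val
        self.v_cache[:, :, input_pos] = v_val
        return self.k_cache, self.v_cache

cache = KVCache(2, 16, 2048, 128)
pos = torch.arange(4)
k, v = cache.update(pos, torch.randn(2, 16, 4, 128), torch.randn(2, 16, 4, 128))
print(k.dtype)
```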

@chinmaya-growexxer

I am also getting the same error while testing on Google Colab. If you have any solution, please advise.
Error:
1901 try:
1902 if op == "call_function":
-> 1903 return node.target(*args, **kwargs)
1904 elif op == "call_method":
1905 return getattr(args[0], node.target)(*args[1:], **kwargs)

TorchRuntimeError: Failed running call_function (*(FakeTensor(..., device='cuda:0', size=(2, s6, s0, (s5//s6))), FakeTensor(..., device='cuda:0', size=(2, 16, 2048, 128), dtype=torch.float16), FakeTensor(..., device='cuda:0', size=(2, 16, 2048, 128), dtype=torch.float16)), **{'attn_mask': FakeTensor(..., device='cuda:0', size=(1, 1, s0, 2048), dtype=torch.bool), 'dropout_p': 0.0}):
Expected query, key, and value to have the same dtype, but got query.dtype: float key.dtype: c10::Half and value.dtype: c10::Half instead.

from user code:
File "/content/metavoice-src/fam/llm/fast_inference_utils.py", line 131, in prefill
logits = model(x, spk_emb, input_pos)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/content/metavoice-src/fam/llm/fast_model.py", line 160, in forward
x = layer(x, input_pos, mask)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/content/metavoice-src/fam/llm/fast_model.py", line 179, in forward
h = x + self.attention(self.attention_norm(x), mask, input_pos)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/content/metavoice-src/fam/llm/fast_model.py", line 222, in forward
y = F.scaled_dot_product_attention(q, k, v, attn_mask=mask, dropout_p=0.0)

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
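Both Colab tracebacks are the same eager dtype check surfacing through torch.compile's fake-tensor tracing, which is why the message shows FakeTensor arguments. The failure can be reproduced standalone (a sketch, not metavoice code; in practice the exception is torch._dynamo.exc.TorchRuntimeError wrapping the eager RuntimeError):

```python
import torch
import torch.nn.functional as F

def attn(q, k, v):
    return F.scaled_dot_product_attention(q, k, v)

compiled = torch.compile(attn)

q = torch.randn(1, 2, 4, 8)                       # float32, like the traced query
k = torch.randn(1, 2, 4, 8, dtype=torch.float16)
v = torch.randn(1, 2, 4, 8, dtype=torch.float16)

# Tracing propagates fake tensors through the graph and hits the
# same dtype check before any kernel is compiled.
try:
    compiled(q, k, v)
except Exception as e:
    print(type(e).__name__)
```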
