
Error when trying to use it in a one hour video #22

Open
vreabernardo opened this issue Oct 19, 2024 · 3 comments

@vreabernardo

Error transcribing chunk 25 in video.mp4
The length of decoder_input_ids, including special start tokens, prompt tokens, and previous tokens, is 2, and max_new_tokens is 512. Thus, the combined length of decoder_input_ids and max_new_tokens is: 514. This exceeds the max_target_positions of the Whisper model: 448. You should either reduce the length of your prompt, or reduce the value of max_new_tokens, so that their combined length is less than 448.
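The numbers in the error are easy to verify: Whisper's decoder supports at most 448 target positions, and the 2 tokens already in `decoder_input_ids` plus the requested 512 new tokens overshoot that cap. A quick sanity check of the arithmetic (constant names here are illustrative, not taken from the library):

```python
# Arithmetic behind the error message (values taken from the log above).
MAX_TARGET_POSITIONS = 448  # Whisper decoder's positional embedding limit
decoder_input_len = 2       # special start/prompt tokens in decoder_input_ids
max_new_tokens = 512        # default passed by vid2cleantxt

combined = decoder_input_len + max_new_tokens
print(combined)                          # 514 -> exceeds 448, generation fails

# Headroom actually available for newly generated tokens:
print(MAX_TARGET_POSITIONS - decoder_input_len)  # 446
```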

@echo-lalia

This error is also happening for me. I tried it with a venv using the quick start guide and the example video, and am getting the exact same error messages.

I also tried the linked Colab notebook, and got the same error. Here is the full information that gets printed in the Colab doc:

/usr/local/lib/python3.10/dist-packages/neuspell/seq_modeling/sclstmbert.py:23: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  checkpoint_data = torch.load(os.path.join(checkpoint_path, "model.pth.tar"), map_location=map_location)

transcribing...: 100%
 1/1 [00:02<00:00,  2.17s/it]
Creating .wav audio clips: 100%
 8/8 [00:00<00:00, 169.84it/s]
Transcribing video: 100%
 8/8 [00:00<00:00, 28.34it/s]

/usr/local/lib/python3.10/dist-packages/vid2cleantxt/transcribe.py:306: UserWarning: Error transcribing chunk 0 - see log for details
  warnings.warn(f"Error transcribing chunk {i} - see log for details")
/usr/local/lib/python3.10/dist-packages/vid2cleantxt/transcribe.py:306: UserWarning: Error transcribing chunk 1 - see log for details
  warnings.warn(f"Error transcribing chunk {i} - see log for details")
/usr/local/lib/python3.10/dist-packages/vid2cleantxt/transcribe.py:306: UserWarning: Error transcribing chunk 2 - see log for details
  warnings.warn(f"Error transcribing chunk {i} - see log for details")
/usr/local/lib/python3.10/dist-packages/vid2cleantxt/transcribe.py:306: UserWarning: Error transcribing chunk 3 - see log for details
  warnings.warn(f"Error transcribing chunk {i} - see log for details")
/usr/local/lib/python3.10/dist-packages/vid2cleantxt/transcribe.py:306: UserWarning: Error transcribing chunk 4 - see log for details
  warnings.warn(f"Error transcribing chunk {i} - see log for details")
/usr/local/lib/python3.10/dist-packages/vid2cleantxt/transcribe.py:306: UserWarning: Error transcribing chunk 5 - see log for details
  warnings.warn(f"Error transcribing chunk {i} - see log for details")
/usr/local/lib/python3.10/dist-packages/vid2cleantxt/transcribe.py:306: UserWarning: Error transcribing chunk 6 - see log for details
  warnings.warn(f"Error transcribing chunk {i} - see log for details")
/usr/local/lib/python3.10/dist-packages/vid2cleantxt/transcribe.py:306: UserWarning: Error transcribing chunk 7 - see log for details
  warnings.warn(f"Error transcribing chunk {i} - see log for details")

SC_pipeline - transcribed audio: 100%
 1/1 [00:00<00:00, 30.69it/s]

And, the resulting text files are empty.

@echo-lalia

Based on the log messages, I was able to find a quick fix.
Since I don't know what caused this to break in the first place, I'm worried the fix may be papering over the real issue. But this change works for me:

In vid2cleantxt/transcribe.py, line 236, change:

    chunk_max_new_tokens=512,

to:

    chunk_max_new_tokens=446,

This stops the above error, and allows the transcription to complete successfully.
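A slightly more general variant of this fix would derive the cap from the model's `max_target_positions` instead of hardcoding 446, so it would keep working if the checkpoint or prompt length changes. The helper below is a hypothetical sketch, not part of vid2cleantxt:

```python
def clamp_max_new_tokens(requested: int,
                         max_target_positions: int = 448,
                         decoder_input_len: int = 2) -> int:
    """Clamp max_new_tokens so that decoder_input_len + max_new_tokens
    never exceeds the model's max_target_positions (448 for Whisper).

    clamp_max_new_tokens is a hypothetical helper for illustration.
    """
    ceiling = max_target_positions - decoder_input_len
    return min(requested, ceiling)

# With the values from this issue:
print(clamp_max_new_tokens(512))  # 446, matching the hardcoded fix above
```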

@pszemraj
Owner

Hey, thanks for reporting this and the PR. I'll give it a look over the next few days. It's definitely possible some code got shifted around in transformers, as it's been a while since I updated this.

will report back here and on the PR once I have a chance to look at it!
