
Error when trying to use it in a one hour video #22

Open
vreabernardo opened this issue Oct 19, 2024 · 3 comments

@vreabernardo

Error transcribing chunk 25 in video.mp4
The length of decoder_input_ids, including special start tokens, prompt tokens, and previous tokens, is 2, and max_new_tokens is 512. Thus, the combined length of decoder_input_ids and max_new_tokens is: 514. This exceeds the max_target_positions of the Whisper model: 448. You should either reduce the length of your prompt, or reduce the value of max_new_tokens, so that their combined length is less than 448.
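The numbers in the error are easy to verify: Whisper's decoder supports at most 448 target positions, and the 2 tokens already in `decoder_input_ids` plus the requested 512 new tokens overshoot that cap. A quick sanity check of the arithmetic (constant names here are illustrative, not taken from the library):

```python
# Arithmetic behind the error message (values taken from the log above).
MAX_TARGET_POSITIONS = 448  # Whisper decoder's positional embedding limit
decoder_input_len = 2       # special start/prompt tokens in decoder_input_ids
max_new_tokens = 512        # default passed by vid2cleantxt

combined = decoder_input_len + max_new_tokens
print(combined)                          # 514 -> exceeds 448, generation fails

# Headroom actually available for newly generated tokens:
print(MAX_TARGET_POSITIONS - decoder_input_len)  # 446
```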

@echo-lalia

This error is also happening for me. I tried it with a venv using the quick start guide and the example video, and am getting the exact same error messages.

I also tried the linked Colab notebook, and got the same error. Here is the full information that gets printed in the Colab doc:

/usr/local/lib/python3.10/dist-packages/neuspell/seq_modeling/sclstmbert.py:23: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  checkpoint_data = torch.load(os.path.join(checkpoint_path, "model.pth.tar"), map_location=map_location)

transcribing...: 100%
 1/1 [00:02<00:00,  2.17s/it]
Creating .wav audio clips: 100%
 8/8 [00:00<00:00, 169.84it/s]
Transcribing video: 100%
 8/8 [00:00<00:00, 28.34it/s]

/usr/local/lib/python3.10/dist-packages/vid2cleantxt/transcribe.py:306: UserWarning: Error transcribing chunk 0 - see log for details
  warnings.warn(f"Error transcribing chunk {i} - see log for details")
/usr/local/lib/python3.10/dist-packages/vid2cleantxt/transcribe.py:306: UserWarning: Error transcribing chunk 1 - see log for details
  warnings.warn(f"Error transcribing chunk {i} - see log for details")
/usr/local/lib/python3.10/dist-packages/vid2cleantxt/transcribe.py:306: UserWarning: Error transcribing chunk 2 - see log for details
  warnings.warn(f"Error transcribing chunk {i} - see log for details")
/usr/local/lib/python3.10/dist-packages/vid2cleantxt/transcribe.py:306: UserWarning: Error transcribing chunk 3 - see log for details
  warnings.warn(f"Error transcribing chunk {i} - see log for details")
/usr/local/lib/python3.10/dist-packages/vid2cleantxt/transcribe.py:306: UserWarning: Error transcribing chunk 4 - see log for details
  warnings.warn(f"Error transcribing chunk {i} - see log for details")
/usr/local/lib/python3.10/dist-packages/vid2cleantxt/transcribe.py:306: UserWarning: Error transcribing chunk 5 - see log for details
  warnings.warn(f"Error transcribing chunk {i} - see log for details")
/usr/local/lib/python3.10/dist-packages/vid2cleantxt/transcribe.py:306: UserWarning: Error transcribing chunk 6 - see log for details
  warnings.warn(f"Error transcribing chunk {i} - see log for details")
/usr/local/lib/python3.10/dist-packages/vid2cleantxt/transcribe.py:306: UserWarning: Error transcribing chunk 7 - see log for details
  warnings.warn(f"Error transcribing chunk {i} - see log for details")

SC_pipeline - transcribed audio: 100%
 1/1 [00:00<00:00, 30.69it/s]

And, the resulting text files are empty.

@echo-lalia

Based on the log messages, I was able to find a quick fix.
Since I don't know what caused this to break in the first place, I'm worried the fix may be papering over the real issue. But this change works for me:

In vid2cleantxt/transcribe.py, line 236, change:

    chunk_max_new_tokens=512,

to:

    chunk_max_new_tokens=446,

This stops the above error, and allows the transcription to complete successfully.
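A slightly more general variant of this fix would derive the cap from the model's `max_target_positions` instead of hardcoding 446, so it would keep working if the checkpoint or prompt length changes. The helper below is a hypothetical sketch, not part of vid2cleantxt:

```python
def clamp_max_new_tokens(requested: int,
                         max_target_positions: int = 448,
                         decoder_input_len: int = 2) -> int:
    """Clamp max_new_tokens so that decoder_input_len + max_new_tokens
    never exceeds the model's max_target_positions (448 for Whisper).

    clamp_max_new_tokens is a hypothetical helper for illustration.
    """
    ceiling = max_target_positions - decoder_input_len
    return min(requested, ceiling)

# With the values from this issue:
print(clamp_max_new_tokens(512))  # 446, matching the hardcoded fix above
```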

@pszemraj
Owner

Hey, thanks for reporting this and the PR. I'll give it a look over the next few days. It's definitely possible some code got shifted around in transformers, as it's been a while since I updated this.

will report back here and on the PR once I have a chance to look at it!
