Replies: 1 comment
-
I think if you use the tokenizer with predict_timestamps=True during training, it should be like this:
-
I have been experimenting with the helpful tutorial on tuning Whisper with PEFT:
peft/examples/int8_training/peft_bnb_whisper_large_v2_training.ipynb
I have come across something suspect: when I use the evaluation code in the notebook to evaluate my tuned model, I get much better results than when I evaluate the same model through a pipeline. I get a WER of 0.34 for the former versus 0.47 for the latter.
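For reference, the pipeline-based evaluation I compared against looks roughly like the sketch below. It is a minimal sketch of my setup, not code from the notebook: `processor`, `model`, `test_dataset`, and the "sentence" column name are placeholders for whatever you use locally.

```python
# Rough sketch of the pipeline evaluation path (placeholder names, not from the notebook).
import evaluate
from transformers import pipeline

wer_metric = evaluate.load("wer")

asr = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    device=0,
)

predictions, references = [], []
for sample in test_dataset:
    # Assumes the audio array is already at the feature extractor's sampling rate
    # (16 kHz for Whisper).
    out = asr(sample["audio"]["array"])
    predictions.append(out["text"])
    references.append(sample["sentence"])

print("WER:", wer_metric.compute(predictions=predictions, references=references))
```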
I have tried all sorts of things, but I have narrowed the difference down to the following argument passed when calling model.generate():
decoder_input_ids=batch["labels"][:, :4].to("cuda")
I believe the first 3 elements of the labels are the usual special IDs specifying task, language, etc., and they are the same for all inputs. However, the 4th element differs for each input, and I believe it is actually the first token of the ground-truth transcript. Therefore the model is cheating by being given a prompt from the reference text.
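Here is a minimal sketch of what I mean, using one batch from the eval dataloader. The names (`processor`, `model`, `batch`) and the `max_new_tokens` value are placeholders from my setup, and the token interpretation in the comments is my reading of the labels, not something the notebook states.

```python
import torch

# In my batches, the first three label tokens are identical for every example
# (presumably <|startoftranscript|>, language, task), while the fourth already
# differs per example, i.e. it looks like the first transcript token.
print(processor.tokenizer.convert_ids_to_tokens(batch["labels"][0, :4].tolist()))
print(processor.tokenizer.convert_ids_to_tokens(batch["labels"][1, :4].tolist()))

with torch.no_grad():
    # Variant used in the notebook's evaluation: the 4th label token is handed
    # to the decoder, so generation starts with a one-token head start from the
    # ground truth.
    leaky_ids = model.generate(
        input_features=batch["input_features"].to("cuda"),
        decoder_input_ids=batch["labels"][:, :4].to("cuda"),
        max_new_tokens=255,
    )

    # What I would have expected: keep only the special-token prefix, so nothing
    # from the reference transcript reaches the decoder.
    no_leak_ids = model.generate(
        input_features=batch["input_features"].to("cuda"),
        decoder_input_ids=batch["labels"][:, :3].to("cuda"),
        max_new_tokens=255,
    )
```

If my reading of the labels is right, slicing to `:3` keeps only the tokens that are identical across examples, so the comparison with the pipeline becomes fair.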
@pacman100 Does this sound right? I am not sure if I have interpreted the input IDs correctly. If not, what is the reason for keeping the first 4 tokens from the labels during inference?
Thanks in advance for any responses!