-
I am manipulating the audio and adding two-second breaks to improve the transcription of podcasts. This causes Whisper with stable-ts to ignore the last part of this audio file (from 05:40 onward). The transcription also has some mistakes before that point. Base Whisper, however, transcribes the file correctly. Maybe I should enable silence suppression. Any ideas?
My code for reference:
|
-
Thank you for the quick fix, the video recording above looks great. However, I am somehow still getting the same results with the latest build (280999c). The same happens when I set beam_size to None. Is this prompt correct?
|
-
I am now getting some new errors with another file when silence is suppressed. It works with
|
suppress_silence=True is the default. The timestamp decoding logic was not properly implemented for beam search, but it should work properly in 280999c.

shifted_audio.mp4

The 2nd is with beam_size=5. The 3rd is with beam_size=None (i.e. greedy search). It appears that silence suppression works better with greedy search.
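
For anyone who wants to repeat the beam_size=5 versus beam_size=None comparison on their own file, here is a minimal sketch. It is not the code used for the recording. It assumes stable-ts is installed and that `stable_whisper.load_model` and `model.transcribe(..., suppress_silence=..., beam_size=...)` behave as described in the README. The helper names, the variant labels, and the default model size are all made up for illustration.

```python
from typing import Any, Callable, Dict

# The two settings discussed above; both keep silence suppression on.
# The labels "beam_search" and "greedy" are illustrative names only.
DECODING_VARIANTS: Dict[str, Dict[str, Any]] = {
    "beam_search": {"suppress_silence": True, "beam_size": 5},
    "greedy": {"suppress_silence": True, "beam_size": None},
}


def compare_decoding(transcribe: Callable[..., Any], audio_path: str) -> Dict[str, Any]:
    """Run the same file through each variant and collect the results by label."""
    return {
        label: transcribe(audio_path, **options)
        for label, options in DECODING_VARIANTS.items()
    }


def run_with_stable_ts(audio_path: str, model_name: str = "base") -> Dict[str, Any]:
    """Assumption: stable-ts is installed and provides load_model / model.transcribe."""
    # Imported lazily so compare_decoding itself has no hard dependency.
    import stable_whisper

    model = stable_whisper.load_model(model_name)
    return compare_decoding(model.transcribe, audio_path)
```

Calling `run_with_stable_ts("shifted_audio.mp4")` would return one result per variant, so the segment timestamps near the inserted breaks can be checked side by side.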