Python data manipulation #110

pokemp · 2023-03-22T10:28:24Z


First of all, thank you very much for this amazing version of whisper.

I wanted to ask about the word_level feature, which produces an SRT file with timecodes for each word.

If I want to format the output myself, is the logic the same as in the original Whisper?

Is every word a segment?
Is the timecode in for each word segment['start'], and the timecode out segment['end']?

Thank you very much for your time.

jianfch · 2023-03-22T16:11:52Z

By default it shows both word-level and segment-level timing, so you just need to disable one to show only the other.
To show word-level only:

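import stable_whisper
model = stable_whisper.load_model('base')  # any Whisper model size works here
result = model.transcribe('audio.mp3')
result.to_srt_vtt('output.srt', segment_level=False)  # word-level timing only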
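
If you would rather format the word timings yourself, here is a minimal sketch of the idea. It assumes the result has been converted to plain dicts shaped like original Whisper's word-timestamp output, where each segment carries a 'words' list and each word has 'word', 'start', and 'end' keys. The helper names srt_timestamp and words_to_srt and the sample data are hypothetical, not part of the library.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(segments: list[dict]) -> str:
    """Build SRT text with one cue per word.

    Assumption: each segment dict has a 'words' list, and each word dict
    has 'word', 'start', and 'end' keys.
    """
    blocks = []
    index = 1
    for segment in segments:
        for word in segment.get("words", []):
            text = word["word"].strip()
            if not text:
                continue  # skip empty tokens
            start = srt_timestamp(word["start"])
            end = srt_timestamp(word["end"])
            blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
            index += 1
    return "\n".join(blocks)


# Hypothetical sample data standing in for a real transcription result.
sample_segments = [
    {
        "start": 0.0,
        "end": 1.0,
        "text": " Hello world",
        "words": [
            {"word": " Hello", "start": 0.0, "end": 0.5},
            {"word": " world", "start": 0.5, "end": 1.0},
        ],
    }
]

srt_text = words_to_srt(sample_segments)
```

In this sketch the per-word timecodes come from each word's own 'start' and 'end', while the segment-level 'start' and 'end' cover the whole phrase.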
