In the line `chunk = int(0.25 * args.sample_rate)` the chunk length is set to 0.25 seconds, but the comment says 0.2 seconds; that is all, nothing more.
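For reference, the line in question with the comment corrected to match the code (`args.sample_rate` is assumed to be 16000 Hz, as in the LibriSpeech recipe):

```python
sample_rate = 16000  # stands in for args.sample_rate
chunk = int(0.25 * sample_rate)  # 0.25 seconds -> 4000 samples
```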
I understand that increasing tail_padding solves the problem of frames lost during decoding. But the hardcoded length of 0.3 seconds does not fit longer decoding chunks. For example, with chunk_size = 64, a chunk corresponds to 128 feature frames; adding encoder.pad_length gives T = 141 frames. That is much more than the hardcoded 30 frames of tail_padding, so the last real (not padded) frames are lost. My suggestion is to make tail_padding depend on, and stay consistent with, chunk_size and encoder.pad_length.
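A minimal sketch of what a chunk-size-aware tail padding could look like. The 10 ms frame shift, 16 kHz sample rate, and subsampling factor of 2 are assumptions about the zipformer frontend (consistent with 64 chunks covering 128 feature frames); `pad_length` would come from the exported encoder:

```python
def tail_padding_samples(chunk_size: int, pad_length: int,
                         frame_shift_ms: float = 10.0,
                         sample_rate: int = 16000) -> int:
    """Number of padding samples so that the last decoding window
    (2 * chunk_size feature frames plus the encoder's pad_length)
    is fully covered. Illustrative sketch, not the script's code."""
    t = 2 * chunk_size + pad_length  # feature frames per decoding step
    return int(t * frame_shift_ms / 1000 * sample_rate)
```

With chunk_size = 64 and pad_length = 13 (141 = 128 + 13, as in the example above) this yields 1.41 s of tail padding instead of the hardcoded 0.3 s.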
Hi,
I trained a streaming zipformer transducer on my data and converted the model to TorchScript (JIT) with export.py, using specific values of chunk_length and left_context_frames. Then I ran streaming decoding with jit_pretrained_streaming.py, and it seems this script does not decode the final part of the audio.
First of all, there is a misprint in
icefall/egs/librispeech/ASR/zipformer/jit_pretrained_streaming.py
Line 218 in f84270c
Next, features are generated chunk by chunk and are decoded whenever the following condition is satisfied:
icefall/egs/librispeech/ASR/zipformer/jit_pretrained_streaming.py
Line 234 in f84270c
But if, after the last call to greedy_search, this condition is no longer satisfied, all remaining computed features stay unprocessed.
As a result, the decoding hypotheses are sometimes truncated.
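One way to avoid losing the tail, sketched below: after the main loop, pad the leftover features up to one full window of T frames and run one more decoding step. The variable names are illustrative, not the script's actual identifiers; the padding value log(1e-10) is the one icefall commonly uses for fbank features.

```python
import math

import torch

LOG_EPS = math.log(1e-10)  # padding value commonly used for fbank features

def pad_tail_to_window(features: torch.Tensor, num_processed: int,
                       t: int) -> torch.Tensor:
    """Pad the unprocessed tail of `features` (shape [num_frames, feat_dim])
    so the remaining frames fill one full window of `t` frames.
    Illustrative helper, not part of the actual script."""
    remaining = features.size(0) - num_processed
    if remaining <= 0 or remaining >= t:
        return features  # nothing left over, or a full window still remains
    pad = torch.full((t - remaining, features.size(1)), LOG_EPS)
    return torch.cat([features, pad], dim=0)
```

After this padding, a loop condition of the form `num_processed + t <= features.size(0)` holds once more, so the last real frames are fed to the encoder instead of being dropped.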