Randomly getting error while generating word timestamps #59
You can try adjusting the loop in `align_words`:

```python
for start_seq, req_idx in start_seq_wise_req.items():
    # adding adjusted_num_frames
    adjusted_num_frames = [min(frame, MAX_TEXT_TOKEN_LENGTH) for frame in seq_lens[req_idx].detach().cpu().numpy()]
    res = self.aligner_model.align(
        ctranslate2.StorageView.from_array(features[req_idx]),
        start_sequence=list(start_seq),
        text_tokens=[text_tokens[_] for _ in req_idx],
        num_frames=adjusted_num_frames,
        median_filter_width=7
    )
```

and adjusting `data_collate_fn`:

```python
def data_collate_fn(self, batch):
    # adding max_seq_len_samples
    max_seq_len_samples = MAX_TEXT_TOKEN_LENGTH * (HOP_LENGTH * INPUT_STRIDE)
    if self.use_dynamic_time_axis:
        max_len = min(max([_[3] for _ in batch]) + self.dta_padding, N_SAMPLES, max_seq_len_samples)
    else:
        max_len = min(N_SAMPLES, max_seq_len_samples)
```

Let me know if that fixes anything @rahulmate
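The core of the suggested fix is a clamp: Whisper's decoder defines position encodings only up to a fixed maximum (448, per the error message), so every frame/length value handed to the aligner has to be capped at that limit. A minimal sketch of the idea, assuming `MAX_TEXT_TOKEN_LENGTH` is 448 (the name `clamp_num_frames` is hypothetical, for illustration only):

```python
MAX_TEXT_TOKEN_LENGTH = 448  # assumed value, matching the limit in the error message

def clamp_num_frames(seq_lens, max_len=MAX_TEXT_TOKEN_LENGTH):
    """Cap each sequence length so the aligner never indexes a position
    beyond the model's position-encoding table."""
    return [min(int(n), max_len) for n in seq_lens]

print(clamp_num_frames([120, 454, 448]))
```

Any value past the limit (such as the 454 in the traceback) is pulled back to 448; values already inside the table pass through unchanged.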
Thanks @aleksandr-smechov, the changes in the `align_words` function solved the issue. I haven't run a benchmark yet, but I will run one to check the timestamps. For the changes in `data_collate_fn`, I was getting an error with the TensorRT model tensor.
See shashikg#59 (comment) Error: No position encodings are defined for positions >= 448, but got position 454
For me, the above didn't solve anything. The issue I'm facing is that the model (large-v3) is hallucinating and repeating some phrases, which then increases the length of the chunk/tokens. Large-v2 didn't have this problem with this specific audio, but it did with some files that were fine with large-v3. Overall, I would say the TensorRT-LLM backend is showing more hallucinations than CTranslate2 is.
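A common heuristic for flagging this kind of repetitive hallucination is a compression-ratio check on the transcript: repeated phrases compress far better than natural speech. This is a sketch of that idea (openai/whisper uses a similar check with a default threshold of 2.4; the function name here is illustrative, not part of WhisperS2T's API):

```python
import zlib

def compression_ratio(text: str) -> float:
    """Ratio of raw to zlib-compressed length.

    Highly repetitive (likely hallucinated) output compresses well and
    yields a high ratio; varied natural text stays near 1.0."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))
```

Segments whose ratio exceeds the threshold could then be re-decoded with a different temperature or simply flagged for review.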
Code:

```python
model = whisper_s2t.load_model(model_identifier="large-v2", asr_options={'word_timestamps': True}, backend='TensorRT-LLM')
files = ['output.wav']
lang_codes = ['en']
tasks = ['transcribe']
initial_prompts = [None]
out = model.transcribe_with_vad(files,
                                lang_codes=lang_codes,
                                tasks=tasks,
                                initial_prompts=initial_prompts,
                                batch_size=16)
```
For the above code, it sometimes throws the error below for the same file. Is there any explanation for this?
```
RuntimeError                              Traceback (most recent call last)
Cell In[15], line 10
      8 initial_prompts = [None]
      9 start = time.time()
---> 10 out = model.transcribe_with_vad(files,
     11                                 lang_codes=lang_codes,
     12                                 tasks=tasks,
     13                                 initial_prompts=initial_prompts,
     14                                 batch_size=16)
     15 end = time.time()
     16 print(f"batch :: {16} time:: {end-start}")

File ~/temp_triton/triton_env/lib/python3.10/site-packages/torch/utils/_contextlib.py:115, in context_decorator..decorate_context(*args, **kwargs)
    112 @functools.wraps(func)
    113 def decorate_context(*args, **kwargs):
    114     with ctx_factory():
--> 115         return func(*args, **kwargs)

File ~/temp_triton/triton_env/lib/python3.10/site-packages/whisper_s2t/backends/init.py:171, in WhisperModel.transcribe_with_vad(self, audio_files, lang_codes, tasks, initial_prompts, batch_size)
    169 for signals, prompts, seq_len, seg_metadata, pbar_update in self.data_loader(audio_files, lang_codes, tasks, initial_prompts, batch_size=batch_size):
    170     mels, seq_len = self.preprocessor(signals, seq_len)
--> 171     res = self.generate_segment_batched(mels.to(self.device), prompts, seq_len, seg_metadata)
    173     for res_idx, _seg_metadata in enumerate(seg_metadata):
    174         responses[_seg_metadata['file_id']].append({**res[res_idx],
    175                                                     'start_time': round(_seg_metadata['start_time'], 3),
    176                                                     'end_time': round(_seg_metadata['end_time'], 3)})

File ~/temp_triton/triton_env/lib/python3.10/site-packages/whisper_s2t/backends/tensorrt/model.py:248, in WhisperModelTRT.generate_segment_batched(self, features, prompts, seq_lens, seg_metadata)
    246 text_tokens = [[_t for _t in x[0] if t < self.tokenizer.eot]+[self.tokenizer.eot] for x in result]
    247 sot_seqs = [tuple([-4:]) for _ in prompts]
--> 248 word_timings = self.align_words(features, texts, text_tokens, sot_seqs, seq_lens, seg_metadata)
    250 for _response, _word_timings in zip(response, word_timings):
    251     _response['word_timestamps'] = _word_timings

File ~/temp_triton/triton_env/lib/python3.10/site-packages/whisper_s2t/backends/tensorrt/model.py:200, in WhisperModelTRT.align_words(self, features, texts, text_tokens, sot_seqs, seq_lens, seg_metadata)
    198 token_alignments = [[] for _ in seg_metadata]
    199 for start_seq, req_idx in start_seq_wise_req.items():
--> 200     res = self.aligner_model.align(ctranslate2.StorageView.from_array(features[req_idx]),
    201                                    start_sequence=list(start_seq),
    202                                    text_tokens=[text_tokens[_] for _ in req_idx],
    203                                    num_frames=list(seq_lens[req_idx].detach().cpu().numpy()),
    204                                    median_filter_width=7)
    206     for _res, _req_idx in zip(res, req_idx):
    207         token_alignments[_req_idx] = _res

RuntimeError: No position encodings are defined for positions >= 448, but got position 454
```
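The error happens intermittently because the decoder occasionally emits more tokens than the model's position-encoding table holds (448 positions), typically when it hallucinates repetitions; the aligner then tries to index position 454 and fails. A hedged sketch of a guard that would keep the token list inside the table (`truncate_tokens` is a hypothetical helper, and the `eot` id is caller-supplied, not assumed):

```python
MAX_POSITIONS = 448  # limit named in the error message

def truncate_tokens(tokens, eot, max_positions=MAX_POSITIONS):
    """Truncate a token sequence to the position-encoding limit,
    preserving the end-of-text token so alignment still terminates."""
    if len(tokens) <= max_positions:
        return tokens
    return tokens[: max_positions - 1] + [eot]
```

Applying such a guard before calling the aligner trades the crash for silently dropped tail tokens, so it is a mitigation rather than a fix for the underlying hallucination.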