Describe the bug
While trying to generate confidence scores with timestamps using an RNN transducer model (stt_en_conformer_transducer_large) with buffering, there is a type mismatch error on this line:
for ts, te in zip(hyp.timestep, hyp.timestep[1:] + [len(hyp.frame_confidence)])
Sample:: 100%|██████████| 1/1 [00:00<00:00, 11618.57it/s]
[NeMo W 2024-12-03 14:55:42 rnnt_decoding:1184] Specified segment seperators are not in supported punctuation {"'"}. If the seperators are not punctuation marks, ignore this warning. Otherwise, specify 'segment_gap_threshold' parameter in decoding config to form segments.
<class 'dict'>
Backend macosx is interactive backend. Turning interactive mode on.
Error executing job with overrides: ['model_path=null', 'pretrained_name=stt_en_conformer_transducer_large', 'audio_dir=/Users/aanchan/work/podcast_transcription_using_nemo/test', 'output_filename=/Users/aanchan/work/podcast_transcription_using_nemo/test_rnn_t_f1.json', 'total_buffer_in_secs=4.0', 'chunk_len_in_secs=1.6', 'model_stride=4', 'batch_size=32', 'merge_algo=lcs', 'lcs_alignment_dir=$PWD/lcs']
Traceback (most recent call last):
File "/Users/aanchan/work/podcast_transcription_using_nemo/env_nemo_1/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
_run_app(
File "/Users/aanchan/work/podcast_transcription_using_nemo/env_nemo_1/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app
run_and_report(
File "/Users/aanchan/work/podcast_transcription_using_nemo/env_nemo_1/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report
raise ex
File "/Users/aanchan/work/podcast_transcription_using_nemo/env_nemo_1/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
return func()
File "/Users/aanchan/work/podcast_transcription_using_nemo/env_nemo_1/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda>
lambda: hydra.run(
File "/Users/aanchan/work/podcast_transcription_using_nemo/env_nemo_1/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run
_ = ret.return_value
File "/Users/aanchan/work/podcast_transcription_using_nemo/env_nemo_1/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value
raise self._return_value
File "/Users/aanchan/work/podcast_transcription_using_nemo/env_nemo_1/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job
ret.return_value = task_function(task_cfg)
File "/Users/aanchan/work/podcast_transcription_using_nemo/rnnt_timestamps.py", line 301, in main
hyps = get_buffered_pred_feat_rnnt(
File "/Users/aanchan/work/podcast_transcription_using_nemo/env_nemo_1/lib/python3.10/site-packages/nemo/collections/asr/parts/utils/transcribe_utils.py", line 95, in get_buffered_pred_feat_rnnt
hyp_list = asr.transcribe(tokens_per_chunk, delay)
File "/Users/aanchan/work/podcast_transcription_using_nemo/env_nemo_1/lib/python3.10/site-packages/nemo/collections/asr/parts/utils/streaming_utils.py", line 1309, in transcribe
self.infer_logits()
File "/Users/aanchan/work/podcast_transcription_using_nemo/env_nemo_1/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/Users/aanchan/work/podcast_transcription_using_nemo/env_nemo_1/lib/python3.10/site-packages/nemo/collections/asr/parts/utils/streaming_utils.py", line 1081, in infer_logits
self._get_batch_preds()
File "/Users/aanchan/work/podcast_transcription_using_nemo/env_nemo_1/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/Users/aanchan/work/podcast_transcription_using_nemo/env_nemo_1/lib/python3.10/site-packages/nemo/collections/asr/parts/utils/streaming_utils.py", line 1148, in _get_batch_preds
best_hyp, _ = self.asr_model.decoding.rnnt_decoder_predictions_tensor(
File "/Users/aanchan/work/podcast_transcription_using_nemo/env_nemo_1/lib/python3.10/site-packages/nemo/collections/asr/parts/submodules/rnnt_decoding.py", line 569, in rnnt_decoder_predictions_tensor
hypotheses = self.compute_confidence(hypotheses)
File "/Users/aanchan/work/podcast_transcription_using_nemo/env_nemo_1/lib/python3.10/site-packages/nemo/collections/asr/parts/submodules/rnnt_decoding.py", line 688, in compute_confidence
for ts, te in zip(hyp.timestep, hyp.timestep[1:] + [len(hyp.frame_confidence)]):
TypeError: unhashable type: 'slice'
From the debugger it looks like hyp.timestamp is a dict, and the zip should really be happening over hyp.timestamp['timestamp'], which happens to be a PyTorch tensor. Slicing a dict, e.g. hyp.timestamp[1:], appears to be what goes wrong.
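As a sanity check, slicing a plain Python dict raises exactly this error; below is a minimal, NeMo-independent sketch of the suspected failure mode (the dict contents are made up for illustration and are not the actual hypothesis object):

# Slicing a dict calls dict.__getitem__ with a slice object as the key,
# which raises TypeError: unhashable type: 'slice'.
timestamp = {"timestamp": [0, 3, 7]}  # hypothetical contents for illustration

try:
    _ = timestamp[1:]  # analogous to hyp.timestep[1:] in rnnt_decoding.py
except TypeError as err:
    print(err)  # unhashable type: 'slice'

# Slicing the underlying sequence instead behaves as expected:
print(timestamp["timestamp"][1:])  # [3, 7]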
Steps/Code to reproduce bug
from nemo.collections.asr.parts.utils.asr_confidence_utils import (
    ConfidenceConfig,
    ConfidenceConstants,
    ConfidenceMethodConfig,
    ConfidenceMethodConstants,
)

confidence_cfg = ConfidenceConfig(
    preserve_frame_confidence=True,  # Internally set to true if preserve_token_confidence == True
                                     # or preserve_word_confidence == True
    preserve_token_confidence=True,  # Internally set to true if preserve_word_confidence == True
    preserve_word_confidence=True,
    aggregation="prod",  # How to aggregate frame scores to token scores and token scores to word scores
    exclude_blank=False,  # If true, only non-blank emissions contribute to confidence scores
    tdt_include_duration=False,  # If true, calculate duration confidence for the TDT models
    method_cfg=ConfidenceMethodConfig(  # Config for per-frame scores calculation (before aggregation)
        name="max_prob",  # Or "entropy" (default), which usually works better
        entropy_type="gibbs",  # Used only for name == "entropy". Recommended: "tsallis" (default) or "renyi"
        alpha=0.5,  # Low values (<1) increase sensitivity, high values decrease sensitivity
        entropy_norm="lin",  # How to normalize (map to [0,1]) entropy. Default: "exp"
    ),
)
Change the decoding strategy and attach the confidence config to the RNNT decoding config being used (see the sketch below).
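A rough sketch of that step is below. It assumes asr_model is the loaded stt_en_conformer_transducer_large model; the confidence_cfg and compute_timestamps field names are assumptions about the RNNT decoding config and may need adjusting for the installed NeMo version (the buffered script in the actual run builds its config through Hydra overrides instead):

from omegaconf import OmegaConf, open_dict
import nemo.collections.asr as nemo_asr

# Hypothetical setup: load the pretrained transducer model used above
asr_model = nemo_asr.models.ASRModel.from_pretrained("stt_en_conformer_transducer_large")

# Attach the confidence config to the model's decoding config and re-apply
# the decoding strategy. Field names (confidence_cfg, compute_timestamps)
# are assumptions, not verified against this NeMo version.
decoding_cfg = asr_model.cfg.decoding
with open_dict(decoding_cfg):
    decoding_cfg.confidence_cfg = OmegaConf.structured(confidence_cfg)
    decoding_cfg.compute_timestamps = True  # assumed flag for timestamp output
asr_model.change_decoding_strategy(decoding_cfg)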
A helpful guide on how to craft a minimal bug report: http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports
An example code file and input are in this Google Drive folder.
Expected behavior
The expected output was a JSON file with timestamps written out.
Environment overview (please complete the following information)
The environment is a local laptop installation; no docker pull & docker run commands were used.

Environment details
If an NVIDIA docker image is used, you don't need to specify these.
Otherwise, please provide:
Additional context
This was run on a CPU, not a GPU.