CHANGELOG

2.17.5

  • added dynamic_heads to transcribe() and align() (32235fa)
  • added pipeline_kwargs to load_hf_whisper() (024d7dc)
  • added "large-v3-turbo" and "turbo" to HF_MODELS (024d7dc)
  • updated Whisper requirement to >=20230314,<=20240930 (453013c, df8dace)
  • updated Whisper compatibility warning message (453013c, df8dace)
  • updated compatibility with Whisper v20240930 (df8dace)
  • updated align() and transcribe_stable() compatibility with latest Faster-Whisper commit (024d7dc)
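
The Whisper requirement above is a closed range of date-style version numbers. As a rough illustration only, a range check of that shape could look like the sketch below. The names `MIN_VERSION`, `MAX_VERSION`, and `is_supported` are hypothetical and are not taken from the library, whose actual compatibility check may work differently.

```python
# Hypothetical bounds mirroring the requirement ">=20230314,<=20240930".
MIN_VERSION = 20230314
MAX_VERSION = 20240930


def is_supported(version: str) -> bool:
    """Return True if a date-style version string falls inside the range."""
    # Anything that is not purely numeric (e.g. a dev build tag) is rejected
    # here; this is an assumption of the sketch, not documented behavior.
    if not version.isdigit():
        return False
    return MIN_VERSION <= int(version) <= MAX_VERSION


print(is_supported("20231117"))
print(is_supported("20241001"))
```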

2.17.4

  • deprecated vad_onnx (b309530)
  • added optional dependencies for Faster Whisper and Hugging Face (c541169)
  • added nonspeech_skip (888181f)
  • fixed #393 (1ee47ce)
  • fixed stabilization.utils.mask2timing() to handle edge cases (e0e7183)
  • fixed suppress_silence=False performing unnecessary compute when vad=True (888181f)
  • fixed typos in docstrings (e0e7183)
  • updated refine() docstring in README (3bc76b9)
  • updated vad to accept a dict of keyword arguments for loading VAD (b309530)

2.17.3

  • added pad() to result.WhisperResult (689fe5e)
  • added newline to merge_by_gap() and merge_by_punctuation() (689fe5e)
  • fixed verbose for adjust_by_silence() (f53f2ee)
  • fixed adjustment progress bar in non_whisper.transcribe_any() (48d70a8)
  • fixed error from using tag/--tag when output format is VTT and word_level=True (3997ef1)
  • fixed segment merging methods not working when the result contains only segment-level timestamps (689fe5e)
  • updated merge_by_gap() and merge_by_punctuation() docstrings with newline (3ab74e7)

2.17.2

  • changed SRT to start from index 1 (9f8db52)
  • changed reset() to be consistent for results produced by all transcribe() variants (864b76c)
  • fixed #357 (98923ea)
  • fixed refine() not working when verbose is not True (864b76c)
  • fixed progress bar warning for refine() (864b76c)
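
For context on the SRT change above: SubRip cues are conventionally numbered from 1. The following is a minimal, self-contained sketch of that numbering, not the library's own writer. The helper names `format_srt_time` and `segments_to_srt` and the `(start, end, text)` tuple layout are assumptions made for illustration.

```python
def format_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from (start, end, text) tuples, numbering cues from 1."""
    blocks = []
    # start=1 is the point of this sketch: the first cue gets index 1, not 0.
    for index, (start, end, text) in enumerate(segments, start=1):
        timing = f"{format_srt_time(start)} --> {format_srt_time(end)}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    # Cues are separated by a blank line.
    return "\n".join(blocks)


print(segments_to_srt([(0.0, 1.5, "Hello"), (1.5, 3.0, "world")]))
```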

2.17.1

  • fixed #353 (66f8d13)
  • fixed align() error when audio segment contains no detectable nonspeech/silent sections (6d9a1ef)
  • fixed gap_padding causing unpredictable gaps or delays in the final timestamps for align() (6d9a1ef)
  • updated align() (6d9a1ef)

2.17.0

  • added min_silence_dur to align() and all variants of transcribe() (e2f9458)
  • added pad_or_trim() to whisper_compatibility (c4d42f2)
  • changed align() to ignore compatibility issues for Faster-Whisper models (c4d42f2)
  • changed align() to prioritize new timestamps within rounding error (5ca7ca5)
  • changed align() to prioritize timestamps that least overlap nonspeech timings (e2f9458)
  • changed silence suppression to be less aggressive (e2f9458)
  • changed silence suppression to treat nonspeech sections that overlap a word as individual sections (5ca7ca5)
  • dropped Whisper dependency for stable-ts-whisperless (c4d42f2)
  • fixed result.WordTiming.suppress_silence() by undoing changes in e2f9458 (0546d76)
  • fixed discrepancy between text and output for align() (e2f9458)
  • changed default of align() to presplit=False on faster-whisper models (850a19f)
  • updated README.md with setup instructions for stable-ts-whisperless (c4d42f2)
  • updated use_word_position=True to also take into account the index of each word (5ca7ca5)

2.16.0

  • deprecated suppress_attention (5513609)
  • deprecated ts_num and ts_noise (5513609)
  • added noisereduce as a supported denoiser (03bb83b)
  • added engine to load_model() (5513609)
  • added extra_models to align() and transcribe() (5513609)
  • added presplit and gap_padding to align() (5513609)
  • fixed docstring of adjust_by_silence() (5513609)
  • fixed dfnet denoiser model to use specified device (5513609)
  • fixed error from progress=True when denoiser='noisereduce' (5513609)
  • fixed incorrect titles when downloading audio with yt-dlp (5513609)
  • changed 'demucs' and 'dfnet' denoisers to denoise in 2 channels when stream=False (5513609)
  • improved word timing by making gap_padding more effective (5513609)

2.15.11

  • fixed inaccurate progress bar in result.WhisperResult.suppress_silence() (ad013d7)
  • replaced update_all_segs_with_words() in refine() with reassign_ids() (ad013d7)
  • updated --align to treat the argument as plain-text if the argument starts with 'text=' (ad013d7)

2.15.10

  • added --persist / -p to CLI (177bcc4)
  • added suppress_attention to transcribe() and align() for original Whisper (177bcc4)
  • fixed align() failing to predict nonspeech timings after skipping a nonspeech section (424f484)
  • fixed typo (#324) (dbee5c5)

2.15.9

  • changed WhisperResult to allow initialization without data (00ad4b4)
  • fixed Segment.copy() failing to initialize WordTiming when new_words=None and copy_words=False (00ad4b4)
  • fixed WhisperResult.duration to return 0.0 if result contains no segments (00ad4b4)
  • fixed WhisperResult.has_words to return False if result contains no segments (00ad4b4)

2.15.8

  • fixed Whisper.fill_in_gaps() (cbbad76)
  • removed end >= start requirement for Segment (cbbad76)
  • updated warning message for out of order timestamps (cbbad76)

2.15.7

  • deprecated Segment.update_seg_with_words() and WhisperResult.update_all_segs_with_words() (ff89e53)
  • changed start, end, text, tokens of Segment to properties (ff89e53)
  • deprecated and replaced WordTiming.round_all_timestamps() with round_ts=True at initialization (ff89e53)
  • added progress bar for timestamp adjustments (ff89e53)
  • sped up splitting and merging of segments (ff89e53)
  • removed redundant parts of the default regrouping algorithm (ff89e53)

2.15.6

  • added pipeline to stable_whisper.load_hf_whisper() (c356491)
  • changed language, task, batch_size to optional parameters for WhisperHF.transcribe() (c356491)
  • fixed English models not working for WhisperHF (c356491)
  • fixed get_device() for 'mps' (53272cb)

2.15.5

  • WhisperHF.transcribe() can now take generation parameters supported by Transformers (133f323)
  • added logic to replace None timestamps returned by Hugging Face Whisper models (8bbe0c5)
  • changed whisper_word_level.hf_whisper.load_hf_pipe() model loading method (a684fb4)

2.15.4

2.15.3

  • added support for Whisper on Hugging Face Transformers (9197b5c)
  • fixed non-speech suppression not working properly for transcribe_any() (9197b5c)

2.15.2

  • changed default to dtype=numpy.int32 for all Numpy int arrays (3886bc6)

2.15.1

  • removed shell=True in .audio.utils.get_metadata() (e8f72a3)

2.15.0

  • added "「" to prepend_punctuations and "」" to append_punctuations (9968a45)
  • added AudioLoader class for handling general audio loading (9968a45)
  • added NonSpeechPredictor class for handling non-speech detection (9968a45)
  • added default.py to hold global default states (9968a45)
  • added failure_threshold to align() (9968a45)
  • added stream to functions that use AudioLoader internally (9968a45)
  • added progress bars for VAD and Demucs operations (9968a45)
  • changed text normalization for align() (6d0746c)
  • changed WhisperResult to ignore segments with no words (6d0746c)
  • changed nonspeech_error default from 0.3 to 0.1 for all functions (9968a45)
  • changed nonspeech_skip default from 3.0 to 5.0 for align() (9968a45)
  • changed use_word_position behavior (9968a45)
  • changed to load Demucs into cache for reuse by default (9968a45)
  • deprecated and replaced demucs and demucs_options with denoiser and denoiser_options (9968a45)
  • dropped ffmpeg-python dependency (9968a45)
  • dropped dependencies: more-itertools, transformers (9968a45)
  • fixed align() producing empty word slices (6d0746c)
  • fixed refine() exceeding the max token count (#297) (f6d61c2)
  • fixed issues in transcribe_any() caused by unspecified samplerate (9968a45)
  • fixed vad=True causing first word of segment to be grouped with previous segment (9968a45)
  • refactored audio.py, stabilization.py, whisper_word_level.py into subpackages (9968a45)
  • removed demucs_output (9968a45)

2.14.4

  • added output_demo.mp4 (395c8a9)
  • fixed align() throwing UnsortedException (f9ca03b)
  • fixed original_split=True failing when there is more than one consecutive newline (f9ca03b)
  • fixed align() IndexError (#292 (comment)) (f9ca03b)

2.14.3

  • added trust_repo=True for loading Silero-VAD (a6b2b05)
  • added 'master' to the branch for loading Silero-VAD (a6b2b05)
  • fixed align() failing for faster whisper with certain languages (677f233)
  • fixed result.WhisperResult.apply_min_dur() and result.Segment.apply_min_dur() to work as intended (be2985e)
  • removed resampling_method="kaiser_window" for all calls of torchaudio.functional.resample() (a6b2b05)

2.14.2

  • updated align() logic (738fd98)
  • added nonspeech_skip to align() (738fd98)
  • added show_unsorted to result.WhisperResult.__init__() and result.WhisperResult.raise_for_unsorted() (738fd98)
  • added use_word_position to methods that support non-speech/silence suppression (738fd98)
  • fixed result.WhisperResult.force_order() to handle data with multiple consecutive unsorted timestamps (738fd98)
  • fixed empty segment removal to work as intended for result.WhisperResult (ef0a87e)
  • updated README.md to directly include the docstrings instead of hyperlinks (738fd98)
  • updated result.save_as_json() to include ensure_ascii=False as default (738fd98)
  • added kwargs to result.save_as_json() (738fd98)
  • updated demo videos (3524aa2)
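
The `ensure_ascii=False` default mentioned above matters mostly for non-ASCII transcripts. The sketch below uses only the standard `json` module to show the difference; the `result` dictionary is made-up sample data and does not reflect the library's real JSON layout.

```python
import json

# Made-up sample data standing in for a transcription result.
result = {"text": "「こんにちは」", "language": "ja"}

# Default behavior: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(result)

# With ensure_ascii=False the characters are written as-is, which keeps
# the saved file human-readable.
readable = json.dumps(result, ensure_ascii=False)

print(escaped)
print(readable)
```

Both strings decode back to the same data; only the on-disk readability differs.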

2.14.1

  • fixed result.WhisperResult.force_order() causing IndexError (0430a31)
  • updated README.md (bc4601f)

2.14.0

  • added nonspeech_sections property to result.WhisperResult (191674b)
  • added nonspeech_error for silence suppression (191674b)
  • changed min_word_dur behavior for silence suppression (191674b)
  • changed silence suppression behavior (191674b)
  • updated README.md (191674b)

2.13.7

  • fixed result.WhisperResult.split_by_punctuation() not working if min_words/min_chars/min_dur are unspecified (d51edb6)

2.13.6

  • added show_regroup_history() to result.WhisperResult (df4a199)
  • added new attribute, regroup_history, to .result.WhisperResult (df4a199)
  • added min_words, min_chars, min_dur to result.WhisperResult.split_by_punctuation() (df4a199)
  • updated README.md (e86c571)

2.13.5

  • added get_content_by_time() to result.WhisperResult (900797a)
  • added get_result() to result.Segment (900797a)
  • added get_segment() to result.WordTiming (900797a)
  • added text_output.result_to_txt()/result.WhisperResult.to_txt() (900797a)
  • added editing methods to result.WhisperResult: remove_word(), remove_segment(), remove_repetition(), remove_words_by_str(), fill_in_gaps() (900797a)
  • added editing methods to list of 'method keys' in result.WhisperResult.regroup() (900797a)
  • changed result.Segment.to_display_str() to enclose segment text in double quotes (900797a)
  • implemented __getitem__ and __delitem__ for result.Segment and result.WhisperResult (900797a)
  • updated docstrings of whisper_word_level.load_model() and whisper_word_level.load_faster_whisper() (900797a)

2.13.4

  • added result.WhisperResult.split_by_duration() (71b9f1f)
  • fixed newline=True for result.WhisperResult._split_segments() (71b9f1f)
  • fixed docstring of result.WhisperResult.split_by_length() (71b9f1f)
  • updated Whisper to v20231117 (71b9f1f)

2.13.3

  • added --faster_whisper, -fw to CLI (a038ad1)
  • added --locate, -lc to CLI (a038ad1)
  • changed alignment.align() to be compatible with faster-whisper (a038ad1)
  • changed verbose behavior for alignment.locate() (a038ad1)
  • fixed inconsistent syntax and typo in docstrings (a038ad1)
  • removed assertions for checking timestamp order when using __add__() with result.Segment or result.WordTiming (a038ad1)

2.13.2

  • added newline to split_by_gap(), split_by_punctuation(), split_by_length() (b336735)
  • added progress_callback to whisper_word_level.load_faster_whisper.faster_transcribe() (b336735)
  • fixed #241 (5c512a1)
  • refactored _COMPATIBLE_WHISPER_VERSIONS, _required_whisper_ver, warn_compatibility_issues() (b336735)
  • updated README.md (3dfbd72)
  • updated --model for CLI to be compatible with checkpoint paths (b336735)
  • updated merge_all_segments() with faster logic (b336735)
  • updated verbose for .whisper_word_level.load_faster_whisper.faster_transcribe() (b336735)
  • updated whisper version to v20231106 (b336735)

2.13.1

  • added avg_prob_threshold to whisper_word_level.transcribe_stable() (58ece35)
  • added fast_mode to alignment.align() (58ece35)
  • added utils.UnsortedException (eb00d29)
  • added word_dur_factor and max_word_dur to alignment.align() (58ece35)
  • changed check_sorted for result.WhisperResult to also accept a path (eb00d29)
  • changed clip_start default to None for result.WhisperResult.clamp_max() (58ece35)
  • corrected docstrings of suppress_silence and suppress_word_ts (58ece35)
  • fixed timing.find_alignment_stable() returning negative timestamps (58ece35)

2.13.0

  • added alignment.locate() (a777206)
  • added utils.format_timestamp() and utils.make_safe() (a777206)
  • added utils.safe_print() (a777206)
  • added demucs, demucs_options, only_voice_freq to alignment.refine() (a777206)
  • added to_display_str() to result.Segment (a777206)
  • added demucs_options to whisper_word_level.load_faster_whisper.faster_transcribe() (a777206)
  • updated --output / -o (a777206)
  • changed audio to always be expected to be 16kHz when it is a torch.Tensor or numpy.ndarray (a777206)
  • fixed alignment.align() failing if text is a result.WhisperResult without tokens (a777206)
  • fixed original_split=True by replacing line breaks with space (97a316d)
  • fixed result_to_ass() failing to return to base color when using tag (83ae509)
  • improved efficiency of segment splitting for alignment.align() when original_split=True (a777206)
  • refactored the audio preprocessing into audio.prep_audio() (a777206)
  • removed _is_whisper_repo_version from utils.py (a777206)
  • renamed original_spit to original_split for alignment.align() (a777206)
  • set action="extend" for all CLI keyword arguments that take multiple values (a777206)
  • changed demucs to also accept a Demucs model instance (a777206)
  • deprecated time_scale, input_sr, demucs_output, demucs_device (a777206)
  • updated docstrings (a777206)

2.12.3

  • updated alignment.align() to raise warning on failure (b9ac041)
  • changed language into a required parameter (b9ac041)
  • fixed alignment.align() endlessly looping (b9ac041)

2.12.2

  • added original_spit to alignment.align() (45bd3bc)
  • ignore DecodingOptions for alignment (1fb3009)

2.12.1

  • changed abs_dur_change default to None (dd1452e)
  • changed abs_prob_decrease default to 0.5 (dd1452e)
  • changed alignment.refine() to allow durations to increase (dd1452e)
  • changed rel_prob_decrease default to 0.3 (dd1452e)
  • changed rel_rel_prob_decrease to optional (dd1452e)
  • changed the usage of original probability in alignment.refine() (dd1452e)
  • fixed CLI not using decode_options (9aba3dc)
  • fixed adjust_by_silence() throwing TypeError (92d51b9)
  • updated README.md (3643092)

2.12.0

  • added --align to CLI (c90ff06)
  • added alignment.refine() for refining timestamps (138cb6b)
  • added --refine and --refine_option to CLI (138cb6b)
  • added segment_id and id to result.WordTiming (138cb6b)
  • added description to transcription progress bar (138cb6b)
  • fixed align() not working when text is a result.WhisperResult (138cb6b)
  • fixed transcribe() throwing error if suppress_silence=False (138cb6b)
  • updated README.md (c90ff06)

2.11.7

  • fixed --debug not showing the first option (857df9a)
  • fixed demucs and only_voice_freq for transcribe_stable() (7f62a9d)
  • fixed demucs for transcribe_minimal() (857df9a)
  • fixed only_voice_freq for transcribe_minimal() (7f62a9d)
  • fixed progress bar for faster-whisper (7f62a9d)
  • updated transcribe_minimal() to accept more options (857df9a)
  • updated transcribe_stable() for faster-whisper models to accept more options (7f62a9d)

2.11.6

2.11.5

  • added 'us' as method key to WhisperResult.regroup() (da33bf5)
  • added --demucs_option, --model_option, --transcribe_option, --save_option to CLI (da33bf5)
  • added --transcribe_method to CLI (da33bf5)
  • added Segment.words_by_lock(), WhisperResult.all_words_by_lock() (da33bf5)
  • added strip to WhisperResult.lock() (e98c3d6)
  • fixed docstring of WhisperResult.lock() (05bba74)
  • improved --debug for CLI (da33bf5)
  • improved even_split=True for WhisperResult.split_by_length() (da33bf5)
  • updated docstring of WhisperResult.split_by_length() (da33bf5)

2.11.4

  • added lock() to WhisperResult (384fc3c)
  • added 'l' as method key to WhisperResult.regroup() (384fc3c)
  • added progress bar to transcription with faster-whisper (5ac6f5e)
  • updated --output_format to accept multiple formats (384fc3c)
  • updated WhisperResult.reset() to match its initialization (384fc3c)
  • updated regroup() to parse regroup_algo into dict (384fc3c)

2.11.3

  • added check_sorted to WhisperResult (4054ca1)
  • added check_sorted to transcribe_any() (07eaf9e)
  • added round_all_timestamps() to result.Segment and result.WordTiming (4a7e52b)
  • changed default to word_timestamps=True for faster_transcribe() (4a7e52b)
  • changed raise_for_unsorted() logic (4a7e52b)
  • fixed WhisperResult.force_order() to work as intended (4a7e52b)

2.11.2

  • fixed condition_on_previous_text (641cce7)
  • updated Whisper version to v20230918 (641cce7)

2.11.1

2.11.0

  • added Whisper.adjust_by_result() (6da3dd8)
  • added alignment.align() (6da3dd8)
  • added load_faster_whisper() (6da3dd8)
  • fixed encode_video_comparison() unable to encode more than two subtitle files (6da3dd8)
  • fixed verbose not working for transcribe_minimal() (6da3dd8)
  • refactored compatibility warning into warn_compatibility_issues() in utils.py (6da3dd8)
  • refactored post-inference silence suppress into WhisperResult.adjust_by_silence() (6da3dd8)

2.10.1

  • added demucs_options to transcribe() (91cf2b1)
  • added ignore_compatibility to transcribe() (91cf2b1)
  • changed compatibility warning to distinguish between mismatched version number and repo version (91cf2b1)
  • changed heuristic for identifying Whisper version number to avoid false positives (91cf2b1)

2.10.0

  • added transcribe_minimal() (ef8a7f1)
  • added force_order to result.WhisperResult (ef8a7f1)
  • added max_instant_words to transcribe() (ef8a7f1)
  • added progress_callback to transcribe() (ef8a7f1)
  • changed default to clip_start=True for WhisperResult.clamp_max() (ef8a7f1)
  • added logic to check if the installed Whisper version is compatible (e53f4be)
  • fixed tag for result_to_ass() to work as intended (ea8cac8)

2.9.0

  • added logic to ensure ascending timestamps in result.WhisperResult (fd78cd7)
  • updated default regroup algorithm (fd78cd7, 77dcfdf)
  • updated long form transcription logic (fd78cd7)
  • fixed skipping words (77dcfdf)
  • avoid computing higher temperatures on no_speech segments (fd78cd7)
  • removed any segments that contain only punctuation (fd78cd7)
  • removed segments with 50%+ instantaneous words (fd78cd7)
  • updated README.md (f5b4c22)

2.8.1

  • allow regroup_algo to be bool for regroup() (4984163)

2.8.0

  • added even_split to split_by_length() (7b867d6)
  • changed default behavior of split_by_length() (7b867d6)
  • changed default to verbose=False for clamp_max() (7b867d6)

2.7.2

  • ignore min_word_dur when missing words timestamps (e93c280)
  • fixed min_word_dur not working for word timestamps (e93c280)

2.7.1

  • added verbose to clamp_max() (70f092f)
  • fixed typo in examples\non-whisper.ipynb (70f092f)

2.7.0

  • added clamp_max() to WhisperResult and WordTiming (bfe93ab)
  • added cm as method key for clamp_max() (bfe93ab)
  • added non_whisper.transcribe_any() (789bb54)
  • changed default to suppress_ts_tokens=False (789bb54)
  • fixed hyperlinks in README.md not linking to the latest commit (87636ef)
  • fixed incorrect line numbers for docstring hyperlinks (52b8b7a)

2.6.4

  • fixed --regroup default (af5579e)

2.6.3

  • added string form custom regrouping algorithm (cc352cd)

2.6.2

2.6.1

2.6.0

  • added support for TSV output format (d30d0d1)
  • changed default VTT and ASS output to use more efficient formats (d30d0d1)
  • fixed non-VAD suppression not working properly (d30d0d1)
  • improved language detection (d30d0d1)

2.5.3

2.5.2

2.5.1

  • added logic for loading audio with yt-dlp (8960922)
  • added only_ffmpeg to transcribe() and CLI (8960922)
  • added shell=True to subprocess call (a8df3b5)

2.5.0

  • added classes: SegmentMatch and WhisperResultMatches (1eabb37)
  • added fallback logic to word alignment (1eabb37)
  • added find() to result.WhisperResult (1eabb37)
  • added suppress_ts_tokens and gap_padding to transcribe() and CLI (1eabb37)
  • added shell=True to is_ytdlp_available() (d2b7f3f)
  • fixed NaN values in the logits (1eabb37)

2.4.1

  • added result_to_any() (eab8319)
  • changed rtl to reverse_text (eab8319)

2.4.0

  • added offset_time() to WhisperResult, Segment, WordTiming (1447a66)
  • added support for audio as URLs (1447a66)
  • fixed language detection for English models (1447a66)

2.3.1

  • added split_callback (44af5c4)
  • changed parameters of split_callback (c003ce4)
  • corrected the docstring for rtl (169e014)
  • fixed punctuation split/merge to work as intended (a84a346)

2.3.0

  • added regrouping list (a0021bd)
  • added --max_chars and --max_words to CLI (f913d6f)
  • added rtl #116 (f913d6f)
  • corrected VAD pytorch requirement (60f668d)
  • fixed visualize_suppression() error when max_width=-1 (918e3ba)
  • fixed out of range error (918e3ba)

2.2.0

  • added merge_all_segments() to result.WhisperResult (7c69535)
  • added split_by_length() to result.WhisperResult (7c69535)

2.1.3

  • fixed transcription logic (d44d287)

2.1.2

2.1.1

  • added mel_first (8fa5670)
  • fixed min_dur being applied on words when segments contain no words (8fa5670)
  • updated regroup demo video (e9932fe)

2.1.0

2.0.4

  • fixed timestamps jumping backwards (26918d5)

2.0.3

  • changed default strip=True for result_to_srt_vtt() (ce4c7b3)
  • keep segments if segment has no words from the start (6ccfa17)
  • improved stabilization.audio2loudness() efficiency (db99d6b)
  • fixed regroup=True when word_timestamps=False (6ccfa17)
  • fixed word_level=False failing output when word_timestamps=False (ce4c7b3)
  • fixed ASS output formatting (ce4c7b3)
  • updated README.md (f9f7c51)

2.0.2

  • fixed wav2mask() when suppress_silence=True (e884e38)
  • fixed typo (58006ec)

2.0.1

2.0.0

  • added segment-level and word-level support to SRT/VTT/ASS outputs (2248087)
  • added result.WhisperResult (2248087)
  • added Silero VAD support (2248087)
  • added visualize_suppression() (2248087)
  • added regrouping methods (2248087)
  • changed python requirement from 3.7+ to 3.8+ (2248087)
  • improved non-VAD suppression (2248087)
  • improved word-level timestamp reliability (2248087)
  • updated README.md (eb5e68c)