Zeus-Labs/whisperx-replicate-speaklabs

Building & Updating Pipeline

  1. Create and activate a Python 3.12 virtual environment: python3 -m venv myenv and source myenv/bin/activate.
  2. Run the commands in the build.sh script, except for the last line (the cog run python line).
  3. Run sudo cog push r8.im/vishjain/whisperx-replicate-speaklabs-v2 to build the image and push it to the Replicate repository.

whisperX on Replicate

This repo is the codebase behind the following Replicate models, which we use at Upmeet:

  • victor-upmeet/whisperx: if you don't know which model to use, use this one. It runs on low-cost hardware, which suits most cases.
  • victor-upmeet/whisperx-a40-large: if you run into memory issues with the previous model, consider this one. Memory issues can occur with long audio files when performing alignment and/or diarization.
  • victor-upmeet/whisperx-a100-80gb: if you run into memory issues with the previous models, consider this one. Memory issues can occur with long audio files when performing alignment and/or diarization.
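
As a minimal sketch of how one might pick between the models above and call one from Python: the model names come from the list, but the escalation order, the helper names (choose_model, build_input, transcribe), and the input field names (audio_file, align_output, diarization) are assumptions for illustration. Check each model's input schema on Replicate before relying on them. The call goes through replicate.run from the replicate Python client and expects REPLICATE_API_TOKEN to be set; depending on the client version, the model reference may need an explicit version suffix.

```python
from typing import Any, Dict

# Model names as listed above; the escalation order is an assumption.
MODELS = [
    "victor-upmeet/whisperx",
    "victor-upmeet/whisperx-a40-large",
    "victor-upmeet/whisperx-a100-80gb",
]


def choose_model(memory_failures: int = 0) -> str:
    """Start on the low-cost model and step up after each memory failure."""
    index = min(max(memory_failures, 0), len(MODELS) - 1)
    return MODELS[index]


def build_input(audio_url: str, align_output: bool = False,
                diarization: bool = False) -> Dict[str, Any]:
    """Assemble the prediction input (field names are assumed, not confirmed)."""
    return {
        "audio_file": audio_url,
        "align_output": align_output,
        "diarization": diarization,
    }


def transcribe(audio_url: str, memory_failures: int = 0, **options: bool) -> Any:
    """Run a prediction; needs the `replicate` package and an API token."""
    import replicate  # imported lazily so the helpers above work without it

    return replicate.run(
        choose_model(memory_failures),
        input=build_input(audio_url, **options),
    )
```

Calling transcribe("https://example.com/audio.wav", align_output=True) would start on the low-cost model; passing memory_failures=1 or higher moves to the larger hardware variants.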

Model Information

WhisperX provides fast automatic speech recognition (70x realtime with large-v3) with word-level timestamps and speaker diarization.

Whisper is an ASR model developed by OpenAI, trained on a large dataset of diverse audio. While it produces highly accurate transcriptions, the corresponding timestamps are at the utterance level, not per word, and can be inaccurate by several seconds. OpenAI’s Whisper does not natively support batching, but WhisperX does.
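
To illustrate why word-level timestamps matter, here is a small sketch. The segment layout (a list of segments, each with start, end, text, and a words list carrying per-word start and end times in seconds) is an assumption about the shape of aligned output, and the sample values are invented for illustration only.

```python
from typing import Any, Dict, List

# Invented sample: one utterance with per-word timings in seconds.
segments: List[Dict[str, Any]] = [
    {
        "start": 0.0,
        "end": 2.4,
        "text": "hello there everyone",
        "words": [
            {"word": "hello", "start": 0.1, "end": 0.5},
            {"word": "there", "start": 0.6, "end": 1.0},
            {"word": "everyone", "start": 1.6, "end": 2.3},
        ],
    }
]


def words_between(segs: List[Dict[str, Any]], t0: float, t1: float) -> List[str]:
    """Return the words whose time span lies fully inside [t0, t1]."""
    return [
        w["word"]
        for seg in segs
        for w in seg.get("words", [])
        if w["start"] >= t0 and w["end"] <= t1
    ]


print(words_between(segments, 0.0, 1.0))  # ['hello', 'there']
```

With utterance-level timestamps only, the same query could return the whole 2.4-second segment or nothing, since there would be no way to tell where each word falls inside it.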

The model used for transcription is large-v3 from faster-whisper.

For more information about WhisperX, including implementation details, see the WhisperX GitHub repo.

Citation

@misc{bain2023whisperx,
      title={WhisperX: Time-Accurate Speech Transcription of Long-Form Audio}, 
      author={Max Bain and Jaesung Huh and Tengda Han and Andrew Zisserman},
      year={2023},
      eprint={2303.00747},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}
