The Massively Multilingual Speech (MMS) project expands speech technology from about 100 languages to over 1,000 by building a single multilingual speech recognition model supporting over 1,100 languages (more than 10 times as many as before), language identification models able to identify over 4,000 languages (40 times more than before), pretrained models supporting over 1,400 languages, and text-to-speech models for over 1,100 languages. The MMS model was proposed in Scaling Speech Technology to 1,000+ Languages. The models and code are originally released here. There are the different models open sourced in the MMS project: Automatic Speech Recognition (ASR), Language Identification (LID) and Speech Synthesis (TTS).
This notebook demonstrates how to convert and run ASR and LID models using OpenVINO.
The tutorial consists of the following steps:
- Install and import prerequisite packages
- Download pretrained model and processor
- Prepare an example audio using https://huggingface.co/datasets/multilingual_librispeech
- Make inference with the original model
- Convert models to OpenVINO IR model and make inference
- Interactive demo with gradio
This is a self-contained example that relies solely on its own code.
We recommend running the notebook in a virtual environment. You only need a Jupyter server to start.
For details, please refer to Installation Guide.