This prototype demonstrates the potential of local AI models for speech-to-text transcription, offering a cost-effective and privacy-friendly solution. Running directly in the browser, it eliminates the need for complicated setups or expensive services. However, transcription can be slow when using larger models.
Transcribe is based on Whisper Web, built with Transformers.js, using ONNX Whisper models from Hugging Face. Whisper is an open-source speech recognition model developed by OpenAI.
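To illustrate how this works under the hood, here is a minimal sketch of running an ONNX Whisper model through the Transformers.js pipeline API. This is not the app's actual code; the package import, the audio URL, and the chunk length are illustrative assumptions.

```ts
import { pipeline } from "@huggingface/transformers";

// Create a speech recognition pipeline backed by an ONNX Whisper model.
// The model files are downloaded from Hugging Face and cached by the browser.
const transcriber = await pipeline(
  "automatic-speech-recognition",
  "onnx-community/whisper-tiny"
);

// Transcribe audio from a URL; longer recordings are processed in 30-second chunks.
const output = await transcriber("https://example.com/audio.wav", {
  chunk_length_s: 30,
});

console.log(output);
```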
Live Demo: https://stekhn.github.io/transcribe/
- Clone the repository: `git clone git@github.com:stekhn/transcribe.git`
- Install dependencies: `npm install`
- Start the development server: `npm run dev`
- Build the website: `npm run build`
The project requires Node.js to run locally. The development server runs on http://localhost:5173/transcribe/.
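The port and the `/transcribe/` sub-path suggest a Vite development server with a custom base path. Below is a sketch of what that configuration typically looks like, assuming the project uses Vite; the actual `vite.config.ts` may differ.

```ts
import { defineConfig } from "vite";

// Serve and build the app under the /transcribe/ sub-path,
// matching the GitHub Pages URL of the repository.
export default defineConfig({
  base: "/transcribe/",
});
```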
Firefox users might need to set `dom.workers.modules.enabled` to `true` in `about:config` to enable Web Workers. Check out this issue for more details.
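That setting controls module Web Workers, so the app presumably loads its worker as an ES module. A minimal sketch of the pattern, with a hypothetical worker file name and message shape:

```ts
// Main thread: spawn the transcription worker as an ES module worker,
// which is what the Firefox setting above enables.
const worker = new Worker(new URL("./worker.ts", import.meta.url), {
  type: "module",
});

// Send raw audio samples to the worker and log the transcript it returns.
worker.postMessage({ audio: new Float32Array(16000) });
worker.onmessage = (event: MessageEvent<{ text: string }>) => {
  console.log(event.data.text);
};
```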
Configure the most important settings in the `./src/config.ts` file.
Update the list of available Whisper models and the default model:
export const DEFAULT_MODEL = "onnx-community/whisper-tiny";
export const MODELS: { [key: string]: number } = {
  "onnx-community/whisper-tiny": 120,
  "onnx-community/whisper-base": 206,
  "onnx-community/whisper-small": 586,
};
The numeric value is the size of the model in megabytes. Models must be provided as ONNX files. You can find suitable ONNX Whisper models on Hugging Face. Optimum is a useful tool for converting models to ONNX, and the ONNX community provides tutorials on creating ONNX models from various machine learning frameworks.
Small warning: Using very large models (> 500 MB) will likely lead to memory issues.
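As an example, registering another converted model only requires extending the record and, optionally, changing the default. The model ID and size below are hypothetical placeholders; check the actual size of the ONNX files on Hugging Face.

```ts
export const DEFAULT_MODEL = "your-namespace/whisper-base-german-onnx";

export const MODELS: { [key: string]: number } = {
  "onnx-community/whisper-tiny": 120,
  "onnx-community/whisper-base": 206,
  "onnx-community/whisper-small": 586,
  // Hypothetical fine-tuned checkpoint, converted to ONNX, ~210 MB download.
  "your-namespace/whisper-base-german-onnx": 210,
};
```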
Update the list of Whisper languages and the default language:
export const DEFAULT_LANGUAGE = "en";
export const LANGUAGES: { [key: string]: string } = {
  en: "english",
  fr: "french",
  de: "german",
  es: "spanish",
};
See the full list of languages supported by Whisper. Note, however, that less widely spoken languages are not well supported by the smaller Whisper models, resulting in poor speech recognition quality. For those languages, or if performance is key, you might want to look into training your own Distil-Whisper model.
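For orientation, the selected language ends up as an option on the pipeline call. A hedged sketch that wires the config constants into the Transformers.js pipeline shown earlier; the import path and audio URL are illustrative:

```ts
import { pipeline } from "@huggingface/transformers";
import { DEFAULT_LANGUAGE, DEFAULT_MODEL, LANGUAGES } from "./config";

const transcriber = await pipeline("automatic-speech-recognition", DEFAULT_MODEL);

// Transformers.js accepts the full language name, i.e. the value stored in LANGUAGES.
const output = await transcriber("https://example.com/audio.wav", {
  language: LANGUAGES[DEFAULT_LANGUAGE], // "english"
  task: "transcribe",
});

console.log(output);
```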
Create a production build of the web application: `npm run build`

Add the build folder `./dist` to Git: `git add dist -f`

Create a commit: `git commit -m "Add build"`

Push the local changes to GitHub: `git subtree push --prefix dist origin gh-pages`