This is a quick experiment to achieve near-realtime transcription using Whisper.
Install the requirements:
pip install -r requirements.txt
Run the script:
python openai-whisper-realtime.py
Dependencies:
- Python 3.7 or newer
- whisper
- sounddevice
- numpy
- asyncio (part of the Python standard library; listed because the script imports it)
A very fast CPU or GPU is recommended.
The system's default audio input is captured with Python, split into small chunks, and fed to OpenAI's original transcription function. The script tries (currently rather poorly) to detect word breaks and avoids splitting the audio buffer mid-word. Given how the model is designed, this isn't the most natural way to use it, but it seemed worth trying, and it works acceptably well.
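The core capture-and-transcribe loop might look roughly like the following. This is a minimal sketch, not the repository's actual code: the `base` model, the 2-second chunk size, and the ever-growing buffer (with no word-break splitting) are all illustrative assumptions.

```python
# Minimal sketch of a chunked near-realtime transcription loop.
# Model choice and chunk size are assumptions, not this repo's exact values.
import numpy as np
import sounddevice as sd
import whisper

SAMPLE_RATE = 16000   # Whisper models expect 16 kHz mono audio
CHUNK_SECONDS = 2     # size of each captured slice (illustrative)

model = whisper.load_model("base")
buffer = np.zeros(0, dtype=np.float32)

# Open the system's default input device in blocking mode.
with sd.InputStream(samplerate=SAMPLE_RATE, channels=1, dtype="float32") as stream:
    while True:
        # Read one chunk of audio and append it to the running buffer.
        chunk, _overflowed = stream.read(SAMPLE_RATE * CHUNK_SECONDS)
        buffer = np.concatenate([buffer, chunk.ravel()])
        # Re-transcribe everything accumulated so far; fp16=False keeps it CPU-safe.
        result = model.transcribe(buffer, fp16=False)
        print(result["text"])
```

Re-transcribing the whole buffer on every pass is what keeps partial results stable, at the cost of latency that grows with the buffer; the word-break detection described above is what bounds that growth by cutting the buffer at safe points.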
Ideas for improvement:
- Improve transcription performance
- Improve detection of word breaks or pauses, and split the buffer dynamically (a rough sketch of one approach follows this list)
- Refactor the code
- Clean up stdout output
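For the pause-detection item, one simple approach is to measure RMS energy over the tail of the buffer and only cut when it drops below a threshold. This is a hypothetical sketch; the threshold and window length below are guesses, not values taken from this script.

```python
# Hedged sketch of energy-based pause detection for buffer splitting.
import numpy as np

SAMPLE_RATE = 16000
SILENCE_THRESHOLD = 0.01   # RMS level treated as "silence" (assumption)
TAIL_SECONDS = 0.3         # how much of the buffer tail to inspect (assumption)

def ends_in_pause(buffer: np.ndarray) -> bool:
    """Return True if the last TAIL_SECONDS of audio look silent."""
    tail = buffer[-int(SAMPLE_RATE * TAIL_SECONDS):]
    if tail.size == 0:
        return False
    rms = np.sqrt(np.mean(np.square(tail)))
    return rms < SILENCE_THRESHOLD
```

In the loop above, the buffer would only be transcribed and flushed when `ends_in_pause(buffer)` returns True, so cuts land in pauses rather than mid-word.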