Speaker-blind speech recognition #144
base: develop
….tune. Fix major bug in Optimizer
Is this feature considered implemented?
@BlokusPokus it seemed to work last time I tried but I didn't merge because I wanted to include a faster implementation of Whisper and I needed to clean up the code. Feel free to try it out but it's a pretty old version of the library. I need to find some time to update this PR. If you feel like it, it would be an amazing contribution!
Yeah we definitely need a faster-whisper / WhisperLive implementation. WhisperLive also integrated VAD and I see it has some overlapping features.
Depends on #143
Adding a streaming ASR pipeline needed a big refactoring (that began with #143).
This PR continues this effort to allow a new type of pipeline that transcribes speech instead of segmenting it.
A default ASR model based on Whisper is provided, but the dependency is not mandatory.
Additional modifications were also needed to make Whisper compatible with batched inference.
Note that we do not condition Whisper on previous transcriptions here. I expected this to degrade transcription quality but I found it rather robust in my experiments with the microphone and spontaneous speech in various languages (English, Spanish and French).
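The PR description does not detail the batching changes. As a rough illustration of the kind of preparation batched inference needs, the sketch below brings audio chunks of different lengths to a common size so they can be stacked into one batch and transcribed independently of each other (no previous transcription is passed along). The helper names `pad_or_trim` and `make_batch` are hypothetical and are not taken from this PR; the 16 kHz sample rate and 30-second window are Whisper's standard input format.

```python
from typing import List, Sequence

SAMPLE_RATE = 16_000   # Whisper operates on 16 kHz audio
WINDOW_SECONDS = 30    # Whisper's fixed input window
WINDOW_SAMPLES = SAMPLE_RATE * WINDOW_SECONDS


def pad_or_trim(chunk: Sequence[float], length: int = WINDOW_SAMPLES) -> List[float]:
    """Zero-pad or truncate a chunk so that it has exactly `length` samples."""
    samples = list(chunk[:length])
    return samples + [0.0] * (length - len(samples))


def make_batch(
    chunks: Sequence[Sequence[float]], length: int = WINDOW_SAMPLES
) -> List[List[float]]:
    """Bring every chunk to the same length so the rows can be stacked.

    Each row is prepared on its own: nothing from one chunk (audio or text)
    is carried over to the next, so the rows can be processed in any order.
    """
    return [pad_or_trim(chunk, length) for chunk in chunks]


# Hypothetical usage with tiny chunks and a short target length.
short_batch = make_batch([[0.1] * 5, [0.2] * 12], length=8)
```

In this toy usage the first chunk is padded with zeros and the second is truncated, so both rows end up with 8 samples. A real pipeline would convert the equal-length rows to a tensor and compute spectrograms before decoding.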
The new Transcription pipeline can also use a segmentation model as a local VAD to skip non-voiced chunks. In my experiments, this worked better and faster than using Whisper's no_speech_prob.
Transcription is also compatible with diart.stream, diart.benchmark, diart.tune and diart.serve (hence diart.client too).
Still missing
Changelog
TBD