VAD quality #53
I'm consistently getting results that seem no better than simple audio level detection, even when using very good audio hardware with integrated noise cancellation. It might be worth integrating a newer VAD model into this repository. Upgrading to the latest WebRTC model is one option, but WebRTC is not a project focused solely on VAD; it also does AGC and many other things. If the goal is only to extract the VAD feature for Python, it might not make sense to keep basing this project on it.
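For anyone who wants to check the "no better than audio level detection" impression on their own recordings, here is a minimal sketch of such a baseline. It is not part of py-webrtcvad; the function names, the 16 kHz sample rate, and the RMS threshold of 500 are assumptions chosen for illustration. It expects 16-bit little-endian mono PCM, the same input format py-webrtcvad takes.

```python
import math
import struct

SAMPLE_RATE = 16000  # Hz; assumed for illustration
FRAME_MS = 30        # py-webrtcvad accepts 10, 20 or 30 ms frames


def frame_rms(frame: bytes) -> float:
    """Root-mean-square level of a 16-bit little-endian mono PCM frame."""
    n = len(frame) // 2
    if n == 0:
        return 0.0
    samples = struct.unpack("<%dh" % n, frame[: n * 2])
    return math.sqrt(sum(s * s for s in samples) / n)


def level_detect(pcm: bytes, threshold: float = 500.0,
                 sample_rate: int = SAMPLE_RATE, frame_ms: int = FRAME_MS):
    """Return one boolean per full frame: True if its RMS exceeds threshold.

    The threshold is a hypothetical value; tune it for your microphone.
    """
    frame_bytes = sample_rate * frame_ms // 1000 * 2  # 2 bytes per sample
    return [
        frame_rms(pcm[i:i + frame_bytes]) > threshold
        for i in range(0, len(pcm) - frame_bytes + 1, frame_bytes)
    ]


def tone(amplitude: int, n_samples: int, freq: float = 440.0,
         sample_rate: int = SAMPLE_RATE) -> bytes:
    """Synthetic sine tone as 16-bit PCM, used only to exercise the detector."""
    values = (
        int(amplitude * math.sin(2 * math.pi * freq * i / sample_rate))
        for i in range(n_samples)
    )
    return struct.pack("<%dh" % n_samples, *values)


if __name__ == "__main__":
    silence = b"\x00\x00" * 480          # one 30 ms frame of silence
    loud = tone(5000, 480)               # one 30 ms frame of a loud tone
    print(level_detect(silence + loud))  # one decision per frame
```

Running the same frames through `webrtcvad.Vad(mode).is_speech(frame, sample_rate)` and counting the frames where the two decisions differ would give a rough idea of how much the VAD adds over plain level detection. Note that a loud tone passes this baseline even though it is not speech, which is exactly the weakness a real VAD is supposed to address.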
I think the following paragraph from the CMU Sphinx project sums it up quite nicely in terms of what to expect:
On the face of it, this VAD model, like most others of its time, is okay at separating speech from stationary noise but has little power to determine whether a burst of a specific noise is speech or something else entirely. Under that interpretation, it is useful for quiet rooms and for cutting voice segments out of almost noise-free recordings, but not so much for detecting speech "in the wild".
Please see the benchmarks in #68.
The readme says:
However, I was unable to observe any promising accuracy at any aggression level (0-3). Is this statement based on any kind of benchmark or publication? Have you seen useful accuracy in your own setup using py-webrtcvad?