Comparing Speech Recognition Engines
Caleb Bassi edited this page Jan 1, 2021 · 1 revision
- is great for speech mode
- uses context-based inference
- has a large vocabulary
- supports a lot of languages
- is very lightweight on resources since it runs on a remote server
- proprietary
- isn't free
- uses a remote server
- requires an internet connection
- adds a decent amount of latency
- isn't great for command mode
- doesn't allow you to specify a command graph
- allows you to specify preferred phrases, which helps but isn't good enough
- uses context-based inference
- certain keywords are harder to pick up depending on the context
- gives non-deterministic results based on the context
- gets a little pricey with heavy use
- at the mercy of Google since it's service-based
An older but up-to-date speech recognition engine that is a DNN-HMM hybrid.
- there are a lot of models to choose from
- works great for command mode
- can dynamically set the grammar
- the medium-size models work pretty well for speech mode
- the medium-size models only use about 1 GB of memory
- the tiny models are missing too much vocabulary for speech mode
- the large models take up a lot of memory, roughly 2 to 3.5 GB
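
To make the "command graph" and "dynamically set the grammar" points above more concrete, here is a minimal sketch in plain Python. It is not the API of any engine discussed on this page: `build_graph`, `accepts`, and the example phrases are all hypothetical names chosen for illustration. It only shows the idea that, in command mode, an utterance is accepted solely when it follows a complete path through a fixed graph of allowed words, and that the graph can be swapped at runtime.

```python
# Hypothetical illustration only: these names do not come from any real engine.
END = None  # marker key meaning "a command may end at this node"


def build_graph(phrases):
    """Build a word-level trie (a simple command graph) from allowed phrases."""
    root = {}
    for phrase in phrases:
        node = root
        for word in phrase.split():
            node = node.setdefault(word, {})
        node[END] = True
    return root


def accepts(graph, utterance):
    """Return True only if the utterance is a complete path through the graph."""
    node = graph
    for word in utterance.split():
        if word not in node:
            return False
        node = node[word]
    return END in node


# Swapping graphs at runtime stands in for dynamically setting the grammar.
editor_commands = build_graph(["save file", "close file", "close window"])
browser_commands = build_graph(["new tab", "close tab"])
```

With a graph like this, the same input always gives the same accept or reject decision, which is the property the context-based approach lacks for command mode. A real engine would apply the constraint during decoding rather than on finished text.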
A newer DNN speech recognition engine from Facebook.
- seems a little unapproachable from an end-user perspective
- seems mainly tailored to researchers
- there doesn't seem to be a nice and simple Python API
- impractical to train models as a regular user since it needs a lot of GPUs
- very lightweight
- has a weird license
- doesn't give great results
A newer DNN speech recognition engine.
- doesn't give the best results
Deprecated in favor of using Kaldi with a lightweight model like one of these.
- proprietary
- only works on Windows and macOS
- it's possible to run it in a VM or on another computer and stream the results, but this can be difficult to set up