Comparing Speech Recognition Engines

Caleb Bassi edited this page Jan 1, 2021 · 1 revision

Google Cloud Speech-to-Text

Pros

is great for speech mode
- uses context-based inference
- has a large vocabulary
supports a lot of languages
uses very few local resources since recognition runs on a remote server

Cons

proprietary
isn't free
uses a remote server
- requires an internet connection
- adds a noticeable amount of latency
isn't great for command mode
- doesn't allow you to specify a command graph
  - allows you to specify preferred phrases, which helps but isn't good enough
- uses context-based inference
  - certain keywords have a harder time being picked up depending on the context
  - gives non-deterministic results based on the context
- gets a little pricey if you use it a lot
leaves you at the mercy of Google since it's service-based
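
The "preferred phrases" mentioned above are passed as phrase hints in the recognition config. As a rough sketch, assuming the v1 REST request shape (`config.speechContexts[].phrases`) and a hypothetical helper name `build_recognize_request`, the request body could be built like this. Check the field names against the current Google documentation before relying on them.

```python
import base64
import json


def build_recognize_request(audio_bytes, phrases, language_code="en-US"):
    """Build a JSON-serializable body for a speech:recognize request.

    Hypothetical helper for illustration; the encoding and sample rate
    below are assumptions about the audio being sent.
    """
    return {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": 16000,
            "languageCode": language_code,
            # Phrase hints only bias recognition toward these phrases.
            # They do not restrict the output to them, which is why this
            # falls short of a real command graph.
            "speechContexts": [{"phrases": list(phrases)}],
        },
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }


if __name__ == "__main__":
    silence = b"\x00\x00" * 160  # placeholder audio, not real speech
    body = build_recognize_request(silence, ["scroll up", "scroll down"])
    print(json.dumps(body["config"], indent=2))
```

Because the hints are only a bias, words outside the list can still be returned, so a command-mode client would still need to validate the transcript itself.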

Kaldi

An older but up-to-date speech recognition engine that is a DNN-HMM hybrid.

Pros

there are a lot of models to choose from
works great for command mode
- can dynamically set the grammar
the medium-size models work pretty well for speech mode
- the medium-size models only use about 1 GB of memory
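
To make the idea of a command grammar concrete, here is a toy sketch in plain Python. It is not Kaldi's API: `CommandGrammar` and `pick_command` are hypothetical names, and a real engine applies the grammar during decoding rather than filtering finished transcripts afterwards. The point is only that a grammar restricts output to valid command sequences and can be swapped at runtime.

```python
class CommandGrammar:
    """Toy finite-state command graph: state -> {word: next_state}."""

    def __init__(self, transitions, start, finals):
        self.transitions = transitions
        self.start = start
        self.finals = set(finals)

    def accepts(self, words):
        """Return True if the word sequence is a complete valid command."""
        state = self.start
        for word in words:
            nxt = self.transitions.get(state, {}).get(word)
            if nxt is None:
                return False
            state = nxt
        return state in self.finals


def pick_command(hypotheses, grammar):
    """Return the first hypothesis (best first) the grammar accepts, else None."""
    for text in hypotheses:
        if grammar.accepts(text.split()):
            return text
    return None


# A small grammar for an "editing" mode; another mode could use a
# different CommandGrammar instance, swapped in while the program runs.
editing = CommandGrammar(
    transitions={
        "start": {"delete": "verb", "select": "verb"},
        "verb": {"word": "end", "line": "end"},
    },
    start="start",
    finals=["end"],
)

# Hypothetical n-best list from a recognizer, best guess first.
hypotheses = ["the lead line", "delete line", "delete lime"]
print(pick_command(hypotheses, editing))  # prints "delete line"
```

With a grammar in place, the same audio always resolves to one of a known set of commands, which is the deterministic behavior that command mode needs.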

Cons

the tiny models are missing too much vocabulary for speech mode
the large models take up a lot of memory, roughly 2 to 3.5 GB

wav2letter

A newer DNN speech recognition engine from Facebook.

Pros

Cons

seems a little unapproachable from an end-user perspective
- seems mainly tailored to researchers
- there doesn't seem to be a nice and simple Python API
impractical to train models as a regular user since it needs a lot of GPUs

Picovoice Cheetah

Pros

very lightweight

Cons

has a weird license
doesn't give great results

Mozilla DeepSpeech

A newer DNN speech recognition engine.

Pros

Cons

doesn't give the best results

PocketSphinx

Deprecated in favor of using Kaldi with a lightweight model like one of these.

Dragon Dictation

Pros

Cons

proprietary
only works on Windows and macOS
- it's possible to run it in a VM or on another computer and stream the results, but this can be difficult to set up
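
As a rough illustration of what "streaming the results" could involve, the sketch below sends recognition results between two machines as newline-delimited JSON. Nothing here is Dragon's API: the message format and the names `send_result` and `read_results` are assumptions, and getting text out of Dragon on the remote side is the part that is hard to set up and is not shown.

```python
import json
import socket


def send_result(sock, text, final):
    """Send one recognition result as a newline-delimited JSON message."""
    message = json.dumps({"text": text, "final": final}) + "\n"
    sock.sendall(message.encode("utf-8"))


def read_results(sock):
    """Yield decoded results until the sender closes the connection."""
    with sock.makefile("r", encoding="utf-8") as stream:
        for line in stream:
            yield json.loads(line)


if __name__ == "__main__":
    # socketpair() stands in for a real TCP connection between the
    # machine running the recognizer and the machine consuming results.
    sender, receiver = socket.socketpair()
    send_result(sender, "open the", final=False)
    send_result(sender, "open the file", final=True)
    sender.close()
    for result in read_results(receiver):
        print(result)
    receiver.close()
```

In a real setup the sender would run inside the VM or on the other computer, and the receiver would feed each final result into the local command handling.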