Installation
Here you will find basic instructions for installing the CMUSphinx tools. Although this page overlaps in some respects with the official tutorial from the CMUSphinx team, our aim is not to duplicate its content but to expand on certain basic tasks and concepts. For fundamental information, please refer to their page, the relevant publications (Lamere et al. 2003; Huggins-Daines et al. 2006) and their SourceForge forum.
The instructions on this page are currently tested on Ubuntu 16.04 and macOS 10.14.
The CMUSphinx project consists of several tools with different uses and applications. As listed on the project website, they are:
- pocketsphinx: lightweight recognizer library written in C
- sphinxbase: support library required by pocketsphinx and sphinxtrain
- sphinx4: adjustable, modifiable recognizer written in Java
- sphinxtrain: acoustic model training tools
This wiki covers the installation of all of these tools except sphinx4.
For simple recognition tasks, pocketsphinx is the tool you need; it is packaged and easy to set up on multiple operating systems. For development and advanced use, you will need to download and compile sphinxbase. If you want to do development in Java, you will need sphinx4; and finally, for training acoustic models you will need sphinxtrain.
The next section explains how to set up pocketsphinx for casual users. Developers, however, can download these tools in several ways:
- From the CMUSphinx downloads page on SourceForge
- From the Subversion repository
- From the Git repository
The best option depends on the tool and on what you need it for. As of November 2018, the most up-to-date versions of the commonly used tools are on GitHub.
To start decoding speech, either directly from your microphone or from files, the first tool to download is pocketsphinx.
On Debian-based systems, you can install it using the apt package manager:
sudo apt-get install pocketsphinx
On macOS, you can install pocketsphinx using Homebrew:
brew install cmu-sphinx
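The two install commands above can be wrapped in a small helper that picks one from the operating system name reported by uname -s. This is only a sketch: the function name print_install_cmd is hypothetical, it prints the command instead of running it, and it assumes that any Linux host is Debian-based, which will not hold for every distribution.

```shell
# print_install_cmd: print (do not run) the pocketsphinx install command
# for an OS name; defaults to the output of `uname -s`.
# Hypothetical helper; assumes any Linux host is Debian-based (uses apt).
print_install_cmd() {
  os=${1:-$(uname -s)}
  case "$os" in
    Linux)  echo "sudo apt-get install pocketsphinx" ;;
    Darwin) echo "brew install cmu-sphinx" ;;   # macOS with Homebrew
    *)      echo "no install command known for: $os" >&2; return 1 ;;
  esac
}
```

After sourcing the function, print_install_cmd shows the command for the current machine, and sh -c "$(print_install_cmd)" runs it once you have checked it.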
pocketsphinx cannot decode by itself; it needs language-dependent resources, namely:
- Acoustic model
- Language model
- Phonetic lexicon (dictionary)
In practice, the language model and the lexicon are each a single file, whereas the acoustic model is a directory containing multiple files. Usually all three resources are distributed together.
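Before decoding, it can help to check that the three resources are where you expect them. The sketch below does that; check_resources is a hypothetical helper, and the test for an mdef (model definition) file inside the acoustic model directory assumes the usual layout of CMUSphinx acoustic models.

```shell
# check_resources: verify that the three resources pocketsphinx needs exist.
# Usage: check_resources <acoustic_model_dir> <language_model_file> <dictionary_file>
# Hypothetical helper; assumes the acoustic model directory contains an
# `mdef` (model definition) file, as CMUSphinx acoustic models usually do.
check_resources() {
  status=0
  if [ ! -d "$1" ]; then
    echo "missing acoustic model directory: $1" >&2; status=1
  elif [ ! -f "$1/mdef" ]; then
    echo "no mdef file in acoustic model directory: $1" >&2; status=1
  fi
  [ -f "$2" ] || { echo "missing language model file: $2" >&2; status=1; }
  [ -f "$3" ] || { echo "missing dictionary file: $3" >&2; status=1; }
  return "$status"
}
```

For example, with the Catalan models: check_resources ca-es/acoustic-model/ ca-es/language-model.lm.bin ca-es/pronounciation-dictionary.dict && echo "resources found"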
You can download our Catalan models from here. The CMUSphinx downloads page on SourceForge also provides models for several other languages.
Each language should have at least these three resources. The case for Spanish is as follows:
NOTE
On Debian-based systems, the English models can be installed with:
sudo apt-get install pocketsphinx-en-us
Nevertheless, if you want to follow the tutorial with the Spanish models, you can download them here; their directory structure is similar to that of our Catalan models.
To test the speech recognition capabilities, you can download speech files for a given language from the VoxForge website. Following the example above, the Spanish speech files can be found here and, more specifically, here.
NOTE
Each acoustic model is trained for a given audio sampling rate. The most common values are 16 kHz and 8 kHz. When decoding speech recordings, make sure the sampling rate of the file matches the one the acoustic model was trained on.
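One way to check the sampling rate of a recording from the shell is to read it from the WAV header. This is a minimal sketch under stated assumptions: wav_rate and check_rate are hypothetical helpers, and reading the rate at byte offset 24 only works for canonical PCM WAV files whose fmt chunk comes first, on a little-endian machine. If SoX is installed, soxi -r <wave_file> reports the rate more robustly, and sox can resample a file.

```shell
# wav_rate: print the sample rate stored in a WAV file header.
# Hypothetical helper; assumes a canonical PCM header in which the "fmt "
# chunk directly follows "RIFF....WAVE", so the rate is the 4-byte
# little-endian integer at byte offset 24, and a little-endian host
# (od -t u4 decodes in the host's byte order).
wav_rate() {
  od -An -t u4 -j 24 -N 4 "$1" | tr -d ' \n'
}

# check_rate: warn and fail if a file's rate differs from the model's rate.
# Usage: check_rate <wave_file> <expected_rate_hz>
check_rate() {
  rate=$(wav_rate "$1")
  if [ "$rate" != "$2" ]; then
    echo "$1: sampled at ${rate:-unknown} Hz, model expects $2 Hz" >&2
    # One possible fix, assuming SoX is installed (16 kHz, mono, 16-bit):
    #   sox in.wav -r 16000 -c 1 -b 16 out.wav
    return 1
  fi
}
```

For a model trained on 16 kHz audio, check_rate <wave_file> 16000 fails with a warning when the header reports another rate.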
Now we can test the pocketsphinx installation. To decode a file:
pocketsphinx_continuous -hmm <acoustic_model_path> -lm <language_model_file> -dict <dictionary_file> -infile <wave_file>
For example, with the Catalan models:
pocketsphinx_continuous -hmm ca-es/acoustic-model/ -lm ca-es/language-model.lm.bin -dict ca-es/pronounciation-dictionary.dict -infile scripts/test_wavs/test_ca-es.wav 2>/dev/null
The final 2>/dev/null redirects standard error, hiding the progress and log messages.
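To decode a whole directory of recordings (for example, a set of VoxForge files), the command can be wrapped in a loop. This is a minimal sketch: decode_dir and the DECODER override variable are hypothetical, and only the -hmm, -lm, -dict and -infile options come from the command above.

```shell
# decode_dir: run pocketsphinx_continuous on every .wav file in a directory,
# writing one transcript (.txt) and one log (.log) per file.
# Usage: decode_dir <wav_dir> <out_dir> <acoustic_model_dir> <lm_file> <dict_file>
# Hypothetical helper; DECODER may name another command (e.g. echo) for a dry run.
decode_dir() {
  wav_dir=$1 out_dir=$2 hmm=$3 lm=$4 dict=$5
  decoder=${DECODER:-pocketsphinx_continuous}
  status=0
  mkdir -p "$out_dir" || return 1
  for wav in "$wav_dir"/*.wav; do
    [ -e "$wav" ] || continue            # no match: the glob stays literal
    name=$(basename "$wav" .wav)
    # Hypotheses go to stdout, progress/log messages to stderr.
    if ! "$decoder" -hmm "$hmm" -lm "$lm" -dict "$dict" -infile "$wav" \
        > "$out_dir/$name.txt" 2> "$out_dir/$name.log"; then
      echo "decoding failed: $wav (see $out_dir/$name.log)" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example: decode_dir scripts/test_wavs transcripts ca-es/acoustic-model/ ca-es/language-model.lm.bin ca-es/pronounciation-dictionary.dict. Keeping standard error in a .log file next to each transcript, rather than sending it to /dev/null, preserves the messages you need when a file fails to decode.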