Installation

Col·lectivaT edited this page Nov 28, 2018 · 15 revisions

Here you will find basic instructions for installing the CMUSphinx tools. Although we overlap in some respects with the official tutorial from the CMUSphinx team, our aim is not to duplicate content but to expand on certain basic tasks and concepts. For fundamental information, please refer to their page, the relevant publications (Lamere et al. 2003, Huggins-Daines et al. 2006) and their SourceForge forum.

Requirements

The instructions on this page have been tested on Ubuntu 16.04 and macOS 10.14.

Tools primer

The CMUSphinx project consists of multiple tools with different uses and applications. As listed on their website, these tools are:

  • pocketsphinx — lightweight recognizer library written in C.
  • sphinxbase — support library required by pocketsphinx and sphinxtrain
  • sphinx4 — adjustable, modifiable recognizer written in Java
  • sphinxtrain — acoustic model training tools

In this wiki we discuss the installation steps for all of these tools except sphinx4.

For simple recognition tasks, pocketsphinx is the tool you need; it is packaged and easy to set up on multiple operating systems. For development and advanced use, you will need to download and compile sphinxbase. If you want to develop in Java, you will need sphinx4, and for training acoustic models you will need sphinxtrain.

In the next section, we explain how to set up pocketsphinx for casual users. For developers, however, there are multiple ways of downloading these tools.

The best way depends on the tool and on your needs. As of November 2018, the most up-to-date versions of the commonly used tools are on GitHub.

Basic installation

In order to start decoding speech either directly from your microphone or from files, the first tool to download is pocketsphinx.

Debian systems

On Debian-based systems, you can install it with the apt package manager:

sudo apt-get install pocketsphinx

Mac OS

You can install pocketsphinx using Homebrew:

brew install cmu-pocketsphinx
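
Whichever package manager you used, a quick sanity check is to confirm that the decoder binary ended up on your PATH. The sketch below assumes the package installs an executable named pocketsphinx_continuous (the command used in the examples on this page); have_tool is a hypothetical helper name introduced here, not part of CMUSphinx.

```shell
# Report whether a command-line tool is available on the PATH.
# have_tool is a hypothetical helper, not something CMUSphinx ships.
have_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: not found"
        return 1
    fi
}

# Example call (assumes the binary name used elsewhere on this page):
#   have_tool pocketsphinx_continuous
```

If the tool is reported as not found, the package may have installed the binary under a different name or outside your PATH.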

Additional resources

Models and dictionary

pocketsphinx cannot decode by itself; it needs language-dependent resources, namely:

  • Acoustic model
  • Language model
  • Phonetic lexicon (dictionary)

In practice, the language model and the lexicon are each a single file, whereas the acoustic model consists of multiple files; usually all the resources are distributed together.
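
As a rough aid, the following shell sketch checks that a model directory contains all three resources before you try to decode. It assumes the layout used by the Catalan example on this page (acoustic-model/, language-model.lm.bin, pronounciation-dictionary.dict); other model packages may name their files differently, and check_model_dir is a hypothetical helper.

```shell
# Check that a model directory holds the three resources pocketsphinx needs.
# File names follow the Catalan example on this page; adjust for other models.
# check_model_dir is a hypothetical helper name.
check_model_dir() {
    dir="$1"
    rc=0
    # The acoustic model is a directory of several files.
    [ -d "$dir/acoustic-model" ] || { echo "missing acoustic model"; rc=1; }
    # The language model and the dictionary are single files.
    [ -f "$dir/language-model.lm.bin" ] || { echo "missing language model"; rc=1; }
    [ -f "$dir/pronounciation-dictionary.dict" ] || { echo "missing dictionary"; rc=1; }
    return "$rc"
}
```

Calling check_model_dir ca-es prints one line per missing resource and returns a non-zero status if anything is absent.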

You can download our Catalan models from here. The CMUSphinx SourceForge downloads page also has models for some languages.

Each language should have at least the three resources. The case for Spanish is as follows:

NOTE

On Debian-based systems, English models can be installed with sudo apt-get install pocketsphinx-en-us

Nevertheless, if you want to follow the tutorial using the Spanish models, you can download them here, where you will find a directory structure similar to that of our Catalan models.

Transcribed audio files

To start testing the speech recognition capabilities, you can download speech files for a given language from the VoxForge website. Following the above example, the Spanish speech files can be found here and specifically here.

NOTE

Each acoustic model is trained for a given audio sampling rate. The most common values are 16 kHz and 8 kHz. When decoding speech recordings, make sure the sampling rate of the file matches the rate the acoustic model was trained for.
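
One way to check a recording before decoding is to read the sampling rate straight from the WAV header. The sketch below assumes a canonical RIFF/WAVE file whose fmt chunk directly follows the 12-byte RIFF header, so that the rate is the 32-bit little-endian integer at byte offset 24. It also assumes a little-endian machine, because od reads integers in host byte order. Files with extra chunks before fmt need a proper audio tool instead. wav_sample_rate is a hypothetical helper name.

```shell
# Print the sampling rate (in Hz) stored in a canonical WAV header.
# Assumption: the fmt chunk starts at byte 12, so the rate sits at offset 24.
# Assumption: little-endian host, since od uses host byte order.
wav_sample_rate() {
    # -An: no address column; -t u4: unsigned 4-byte integer;
    # -j 24: skip 24 bytes; -N 4: read 4 bytes.
    od -An -t u4 -j 24 -N 4 "$1" | tr -d '[:space:]'
}

# Example call:
#   [ "$(wav_sample_rate recording.wav)" = "16000" ] || echo "resample before decoding"
```

If the printed value does not match the rate of your acoustic model, resample the recording before decoding it.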

Examples

Now we can test the pocketsphinx installation. To test using a file:

pocketsphinx_continuous -hmm <acoustic_model_path> -lm <language_model_file> -dict <dictionary_file> -infile <wave_file>

Or simply for the Catalan case:

pocketsphinx_continuous -hmm ca-es/acoustic-model/ -lm ca-es/language-model.lm.bin -dict ca-es/pronounciation-dictionary.dict -infile scripts/test_wavs/test_ca-es.wav 2>/dev/null

The trailing 2>/dev/null redirects standard error to hide the progress log messages.
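
To decode several recordings with the same models, the command above can be wrapped in a loop. This sketch assumes the Catalan directory layout used above and a directory of .wav files. decode_dir and the DECODER variable are hypothetical names introduced here so that the decoder binary can be swapped out. Only the flags already shown on this page are used.

```shell
# Decode every .wav file in a directory with the same set of models.
# DECODER defaults to pocketsphinx_continuous; decode_dir is a hypothetical helper.
decode_dir() {
    model_dir="$1"
    wav_dir="$2"
    for wav in "$wav_dir"/*.wav; do
        # Skip the literal pattern when the directory has no .wav files.
        [ -e "$wav" ] || continue
        echo "== $wav"
        "${DECODER:-pocketsphinx_continuous}" \
            -hmm "$model_dir/acoustic-model/" \
            -lm "$model_dir/language-model.lm.bin" \
            -dict "$model_dir/pronounciation-dictionary.dict" \
            -infile "$wav" 2>/dev/null
    done
}

# Example call:
#   decode_dir ca-es scripts/test_wavs
```

Each file name is printed before its transcription so that the output can be matched back to the recordings.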