Example tasks

There are many things you can do with the pocketsphinx toolkit, and the CMUSphinx tutorial gives a good overview of them. Here are some practical examples that you can run from the command line.

The examples are in Catalan, and to run them you will need the Catalan models and some additional resources. If you want to try the models for Spanish, you can download the VoxForge models, together with some audio resources, here.

Long audio transcription

Many modern cloud-based Automatic Speech Recognition (ASR) services only accept audio files of limited length, often around 20 seconds. With the pocketsphinx toolkit, however, we can transcribe audio of any length without having to split it into segments first.

pocketsphinx_continuous -hmm ca-es/acoustic-model/ -lm ca-es/language-model.lm.bin -dict ca-es/pronunciation-dictionary.dict -infile wav/long/a01c8adcf0e8bfc137ee_0.000_64.032.wav 2>/dev/null
pocketsphinx_continuous -hmm ca-es/acoustic-model/ -lm ca-es/language-model.lm.bin -dict ca-es/pronunciation-dictionary.dict -infile wav/long/6f2a15ff600687b7973b_0.000_658.968.wav 2>/dev/null

Even when the file is very long, the audio is processed incrementally: it is split into segments, and the text of each segment is printed as soon as that segment has been decoded.
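To transcribe a whole folder of long recordings, the command above can be wrapped in a loop. The following is a minimal POSIX sh sketch, assuming the same model paths as in the commands above; the function name transcribe_all is hypothetical, and it writes each transcription to a .txt file next to the corresponding .wav file.

```shell
# transcribe_all DIR  (hypothetical helper)
# Transcribes every .wav file in DIR with the Catalan models used above and
# saves the text as <name>.txt next to each audio file.
transcribe_all() {
  for wav in "$1"/*.wav; do
    # If DIR has no .wav files, the glob stays unexpanded: skip it.
    [ -e "$wav" ] || continue
    # The log (stderr) is discarded; the transcription (stdout) is saved.
    pocketsphinx_continuous -hmm ca-es/acoustic-model/ \
      -lm ca-es/language-model.lm.bin \
      -dict ca-es/pronunciation-dictionary.dict \
      -infile "$wav" 2>/dev/null > "${wav%.wav}.txt"
  done
}
```

For example, transcribe_all wav/long would write wav/long/a01c8adcf0e8bfc137ee_0.000_64.032.txt, and so on for every file in that folder.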

Keyword spotting

Keyword spotting checks whether a given word or phrase has been uttered in an audio recording. It is relatively fast, and similar algorithms are used in modern voice-activation products.

Let's find out in which of the following files the word feminisme is uttered:

wav/vallet/0055826ff62602db7b94_48.300_54.620.wav
wav/vallet/01ea2f6494ed37b6fdbd_421.660_429.020.wav
wav/vallet/01ea2f6494ed37b6fdbd_438.220_452.500.wav
wav/vallet/6f2a15ff600687b7973b_527.580_541.660.wav
wav/vallet/6f2a15ff600687b7973b_513.220_527.300.wav
wav/vallet/0d59f21b1beb27dbea45_106.900_114.660.wav

Run the following command for each file, changing the -infile argument:

pocketsphinx_continuous -hmm ca-es/acoustic-model/ -keyphrase feminisme -dict ca-es/pronounciation-dictionary.dict -infile wav/vallet/0055826ff62602db7b94_48.300_54.620.wav 2>/dev/null
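Running this command by hand for each of the six files gets tedious. Below is a minimal POSIX sh sketch that loops over the files and prints the ones whose output contains the keyword. find_keyword is a hypothetical helper name, the model and dictionary paths are copied from the command above, and it assumes, as that command does, that pocketsphinx_continuous prints detected keywords on standard output and its log on standard error.

```shell
# find_keyword KEYWORD FILE...  (hypothetical helper)
# Prints every FILE in which pocketsphinx_continuous detects KEYWORD.
# Paths are the ones used in the command above; adjust them to your setup.
find_keyword() {
  kw="$1"
  shift
  for f in "$@"; do
    # The log goes to /dev/null; only the recognized keywords reach grep.
    if pocketsphinx_continuous -hmm ca-es/acoustic-model/ -keyphrase "$kw" \
         -dict ca-es/pronounciation-dictionary.dict -infile "$f" 2>/dev/null |
       grep -qw -- "$kw"; then
      echo "$f"
    fi
  done
}
```

For example, find_keyword feminisme wav/vallet/*.wav checks every file in wav/vallet, including the six listed above.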

Keyword lists and pronunciation dictionary

pocketsphinx can search for more than one keyword or key phrase at once using a keyword list file. To create one, open your favorite text editor and write each word or phrase you want to search for on a separate line.

In this example we want to look for both no and s'esverin. Instead of using a text editor, you can also generate the .kws file with the printf tool, if it is available on your system:

printf "no\ns'esverin\n" > vallet.kws
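If a keyword contains characters that printf treats specially in its format string (such as a backslash or %), it is safer to pass the keywords as arguments and let a '%s\n' format put each one on its own line. A small sketch; make_kws is a hypothetical helper name:

```shell
# make_kws OUTFILE KEYWORD...  (hypothetical helper)
# Writes each KEYWORD (a single word or a whole phrase) on its own line of
# OUTFILE. Arguments are printed literally, never interpreted as a format.
make_kws() {
  out="$1"
  shift
  printf '%s\n' "$@" > "$out"
}
```

Calling make_kws vallet.kws no "s'esverin" produces the same two-line file as the printf command above.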

Then pass this file to pocketsphinx_continuous with the -kws option, instead of a language model:

pocketsphinx_continuous -hmm ca-es/acoustic-model/ -kws vallet.kws -dict ca-es/pronounciation-dictionary.dict -infile wav/vallet/6f2a15ff600687b7973b_570.260_573.380.wav 2>/dev/null

This time pocketsphinx returns only no. The problem is not in the models or the tool; the cause is reported in an error message. If you run the same command without the 2>/dev/null at the end, you will see the following error:

ERROR: "kws_search.c", line 528: Word 's'esverin' in phrase 's'esverin' is missing in the dictionary

This happens because the file ca-es/pronounciation-dictionary.dict does not contain the word s'esverin. To search for it, we need to add it to the dictionary, spelled with the phonemes defined in the acoustic model. You can find a table here that shows which International Phonetic Alphabet (IPA) symbol corresponds to which CMUSphinx phoneme. This table only applies to our Catalan model; if you are using another language or model, the phoneme symbols that pocketsphinx uses will be different.
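Before running a search, you can check which words of a keyword list are missing from the dictionary. This is a minimal awk sketch, assuming the usual CMUSphinx dictionary layout (the word in the first column, with alternative pronunciations written as word(2), word(3), and so on); missing_words is a hypothetical helper name.

```shell
# missing_words DICT KWS_FILE  (hypothetical helper)
# Prints every word of KWS_FILE that has no entry in DICT, once each.
missing_words() {
  awk '
    # First file (the dictionary): remember each headword, dropping "(2)"-style suffixes.
    NR == FNR { w = $1; sub(/\([0-9]+\)$/, "", w); known[w] = 1; next }
    # Second file (the keyword list): check every word of every phrase.
    { for (i = 1; i <= NF; i++) if (!($i in known)) print $i }
  ' "$1" "$2" | sort -u
}
```

With the original dictionary, missing_words ca-es/pronounciation-dictionary.dict vallet.kws should list s'esverin, the word reported in the error above.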

Now add the word s'esverin to the dictionary file, followed by its phonetic transcription (each entry is one line: the word, then its phonemes separated by spaces), and run the command again. This time the output should contain both no and s'esverin.
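Appending the entry can also be scripted. The sketch below (add_word is a hypothetical helper name) adds a line with the word and its phonemes only if the word has no entry in the dictionary yet. The phonemes themselves still have to be taken from the table for the Catalan model; this sketch does not look them up.

```shell
# add_word DICT WORD PHONEME...  (hypothetical helper)
# Appends "WORD PHONEME PHONEME ..." to DICT unless WORD already has an entry.
add_word() {
  dict="$1"
  word="$2"
  shift 2
  # awk exits with status 0 only if some line's first column equals WORD.
  if awk -v w="$word" '$1 == w { found = 1 } END { exit !found }' "$dict"; then
    echo "$word is already in $dict" >&2
    return 0
  fi
  # Make sure the new entry starts on its own line.
  if [ -s "$dict" ] && [ -n "$(tail -c 1 "$dict")" ]; then
    echo >> "$dict"
  fi
  printf '%s %s\n' "$word" "$*" >> "$dict"
}
```

Usage: add_word ca-es/pronounciation-dictionary.dict "s'esverin" PH1 PH2 ..., where PH1 PH2 ... are placeholders for the CMUSphinx phonemes that correspond to səsβˈɛɾin in the table.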

If you completed this task successfully, congratulations: you have learned the basics of building a voice-activation front end.

NOTE

The IPA representation of s'esverin is səsβˈɛɾin.

There are several free and open-source tools for obtaining the phonetic transcription of a word; the most common one is espeak. The espeak command for this word is: espeak -vca s\'esverin --ipa -q
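To get IPA transcriptions for a whole keyword list at once, espeak can be called once per line. A minimal sketch using the same espeak options as above (-v ca is equivalent to -vca); ipa_for_kws is a hypothetical helper name. Because each word is passed as a quoted variable, the apostrophe in s'esverin does not need to be escaped.

```shell
# ipa_for_kws KWS_FILE  (hypothetical helper)
# Prints each line of KWS_FILE, a tab, and espeak's IPA transcription of it.
ipa_for_kws() {
  # "|| [ -n ... ]" also handles a last line without a trailing newline.
  while IFS= read -r phrase || [ -n "$phrase" ]; do
    [ -n "$phrase" ] || continue   # skip empty lines
    printf '%s\t%s\n' "$phrase" "$(espeak -v ca -q --ipa "$phrase")"
  done < "$1"
}
```

For example, ipa_for_kws vallet.kws prints one line for no and one for s'esverin, which you can then map to CMUSphinx phonemes with the table.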
