Example tasks
There are multiple things you can do with the pocketsphinx toolkit. There is a good overview as part of the CMUSphinx tutorial, but here are some practical examples that can be run from the command line.
The examples below are in Catalan, so to run them exactly as written you will need the Catalan models and additional resources. If you want to try the models for Spanish, you can download the voxforge models with some audio resources here.
Most modern cloud-based Automated Speech Recognition (ASR) solutions accept only audio files of limited length, usually around 20 seconds. With the pocketsphinx toolkit, however, we can transcribe audio of any length without the need for pre-segmentation.
pocketsphinx_continuous -hmm ca-es/acoustic-model/ -lm ca-es/language-model.lm.bin -dict ca-es/pronunciation-dictionary.dict -infile wav/long/a01c8adcf0e8bfc137ee_0.000_64.032.wav 2>/dev/null
pocketsphinx_continuous -hmm ca-es/acoustic-model/ -lm ca-es/language-model.lm.bin -dict ca-es/pronunciation-dictionary.dict -infile wav/long/6f2a15ff600687b7973b_0.000_658.968.wav 2>/dev/null
Even if the file is very long, the audio is processed gradually, and the text of each segment appears as soon as that segment finishes processing.
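Because the text arrives segment by segment, it can be watched and saved at the same time. The following is a minimal sketch, assuming the same model paths as in the commands above and that the transcription is written to standard output (the log goes to standard error); the function name transcribe is hypothetical.

```shell
# Hypothetical helper: transcribe one wav file, showing the text as it is
# produced and also saving it next to the audio file (sample.wav -> sample.txt).
# Usage: transcribe WAV_FILE
transcribe() {
    wav=$1
    pocketsphinx_continuous \
        -hmm ca-es/acoustic-model/ \
        -lm ca-es/language-model.lm.bin \
        -dict ca-es/pronunciation-dictionary.dict \
        -infile "$wav" 2>/dev/null | tee "${wav%.wav}.txt"
}
```

For example, transcribe wav/long/a01c8adcf0e8bfc137ee_0.000_64.032.wav would print the text and write it to a .txt file with the same base name.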
This functionality checks whether a word has been uttered in an audio file. It is relatively fast, and similar algorithms are used in modern voice activation products.
Let's find out in which of the files the word feminisme has been uttered.
wav/vallet/0055826ff62602db7b94_48.300_54.620.wav
wav/vallet/01ea2f6494ed37b6fdbd_421.660_429.020.wav
wav/vallet/01ea2f6494ed37b6fdbd_438.220_452.500.wav
wav/vallet/6f2a15ff600687b7973b_527.580_541.660.wav
wav/vallet/6f2a15ff600687b7973b_513.220_527.300.wav
wav/vallet/0d59f21b1beb27dbea45_106.900_114.660.wav
Each file can be checked using the command:
pocketsphinx_continuous -hmm ca-es/acoustic-model/ -keyphrase feminisme -dict ca-es/pronounciation-dictionary.dict -infile wav/vallet/0055826ff62602db7b94_48.300_54.620.wav 2>/dev/null
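To check all of the listed files in one go, the command can be wrapped in a loop. This is a minimal sketch, assuming the same model and dictionary paths as in the command above and that a detection is printed to standard output as a line containing the keyphrase; the function name find_keyphrase is hypothetical.

```shell
# Hypothetical helper: print every wav file in which the keyphrase is detected.
# Usage: find_keyphrase KEYPHRASE FILE...
find_keyphrase() {
    keyphrase=$1
    shift
    for wav in "$@"; do
        # Detections go to stdout; the log on stderr is discarded.
        hits=$(pocketsphinx_continuous \
            -hmm ca-es/acoustic-model/ \
            -keyphrase "$keyphrase" \
            -dict ca-es/pronounciation-dictionary.dict \
            -infile "$wav" 2>/dev/null)
        # Assumption: a detection shows up as output containing the keyphrase.
        if printf '%s\n' "$hits" | grep -q -F -- "$keyphrase"; then
            printf '%s\n' "$wav"
        fi
    done
}
```

It could then be called as find_keyphrase feminisme wav/vallet/*.wav, which prints one file name per line for the files where the word was found.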
pocketsphinx can look for more than one keyword phrase at once using a keyword list file. To create one, simply open your favorite text editor and write the words you want to search for, each on a separate line.
In this example we want to look for both the word no and the word s'esverin. To generate the kws file we can simply use the printf tool, if you have it:
printf "no\ns'esverin" > vallet.kws
and pass this file to pocketsphinx_continuous using the -kws option instead of a language model:
pocketsphinx_continuous -hmm ca-es/acoustic-model/ -kws vallet.kws -dict ca-es/pronounciation-dictionary.dict -infile wav/vallet/6f2a15ff600687b7973b_570.260_573.380.wav 2>/dev/null
This time pocketsphinx returned only no. The problem is not in the models or the tool; it is explained in the error output. If you run the same command without the 2>/dev/null at the end, you will see the error:
ERROR: "kws_search.c", line 528: Word 's'esverin' in phrase 's'esverin' is missing in the dictionary
This is because the file ca-es/pronounciation-dictionary.dict does not contain the word s'esverin. To run this search we need to add it, using the phonemes defined in the model. You can find a table here, which shows which International Phonetic Alphabet (IPA) character corresponds to which CMUSphinx phoneme. This table applies only to our Catalan model; if you are using any other language or model, the phoneme symbols that pocketsphinx uses will be different.
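Missing words like this can be caught before running the search by comparing the keyword list against the dictionary. This is a minimal sketch, assuming the usual CMUSphinx dictionary layout in which each line starts with the word followed by its phonemes; the function name missing_words is hypothetical.

```shell
# Hypothetical helper: list the words of a keyword list that have no entry
# in a pronunciation dictionary.
# Usage: missing_words KWS_FILE DICT_FILE
missing_words() {
    kws_file=$1
    dict_file=$2
    awk '
        # First file (the dictionary): remember the word in the first column.
        FILENAME == ARGV[1] { known[$1] = 1; next }
        # Second file (the keyword list): report every word not seen above.
        { for (i = 1; i <= NF; i++) if (!($i in known)) print $i }
    ' "$dict_file" "$kws_file"
}
```

Running missing_words vallet.kws ca-es/pronounciation-dictionary.dict prints each word that still needs a dictionary entry, and prints nothing when the list is fully covered.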
Now add the word s'esverin with its phonetic representation to the dictionary file and try the command again. You should end up with both no and s'esverin in the output.
If you were able to complete this task, congratulations: you have learned the basics of building a voice activation front-end.
NOTE
The IPA representation of s'esverin is səsβˈɛɾin. There are multiple free and open tools for getting the phonetic transcription of words. The most common one is espeak. The command in espeak would be:
espeak -vca s\'esverin --ipa -q
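When a keyword list contains several words, the same espeak call can be repeated for each of them. This is a minimal sketch that reuses the options from the command above; the function name ipa_for_keywords is hypothetical, and the IPA output still has to be mapped to the model's phonemes by hand.

```shell
# Hypothetical helper: print each word of a keyword list together with the
# IPA transcription produced by espeak (Catalan voice), separated by a tab.
# Usage: ipa_for_keywords KWS_FILE
ipa_for_keywords() {
    # Split phrases into single words; the last line may lack a newline.
    tr -s ' \t' '\n\n' < "$1" | while IFS= read -r word || [ -n "$word" ]; do
        [ -n "$word" ] || continue
        # stdin is redirected so espeak cannot consume the word list.
        printf '%s\t%s\n' "$word" "$(espeak -vca --ipa -q "$word" < /dev/null)"
    done
}
```

For example, ipa_for_keywords vallet.kws prints one line per keyword.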