A desktop application that adapts to the voice of a single user (personalized) or to an environment, and performs automatic speech recognition against a set of e-mails in English.
This project was developed as part of my undergraduate thesis on ASR at the School of Electrical and Computer Engineering of the Aristotle University of Thessaloniki, Greece.
It is a desktop application that can be used for automatic speech recognition. Concretely, using this application, one can:
- Provide sample recordings of one's voice that will be used to adapt the speech recognition engine to one's voice.
- Provide a set of e-mails that will be used as a search corpus during the recognition.
- Dictate any sequence of words from within the provided corpus, and get the written transcript as a result.
The speech recognition engine consists of two parts: the ASR part and the Correction part.
For the ASR part, CMUSphinx is used. The provided sample recordings are used to adapt the default acoustic model of CMUSphinx to the user's voice. The provided e-mails are used to create a language model and a dictionary for CMUSphinx.
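To illustrate the kind of information a language model and a dictionary are derived from, here is a minimal Python sketch that extracts a vocabulary and adjacent-word (bigram) counts from e-mail bodies. It is not the application's code, which is written in Java and relies on CMUSphinx for the actual models; the function name `corpus_statistics` and the tokenization rule are assumptions made for this example.

```python
import re
from collections import Counter


def corpus_statistics(emails):
    """Return (vocabulary, bigram_counts) for a list of e-mail bodies.

    Tokenization is an assumption for this sketch: lowercase runs of
    letters and apostrophes are treated as words.
    """
    vocabulary = set()
    bigram_counts = Counter()
    for body in emails:
        words = re.findall(r"[a-z']+", body.lower())
        vocabulary.update(words)
        # Count each pair of adjacent words within the same e-mail.
        bigram_counts.update(zip(words, words[1:]))
    return sorted(vocabulary), bigram_counts


emails = ["Meeting moved to Friday.", "See you Friday at the meeting."]
vocabulary, bigrams = corpus_statistics(emails)
```

The vocabulary is the word list a pronunciation dictionary would have to cover, and the bigram counts are the raw statistics an n-gram language model is estimated from.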
The Correction part is an algorithm designed to correct any errors in the output of the ASR part based on the corpus, the language model and the dictionary mentioned above.
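The idea of correcting a transcript against the corpus can be sketched as follows. This is only an illustration and not the algorithm used by the application: it assumes the dictated phrase has the same number of words as the ASR output and picks the corpus span with the highest character-level similarity. The function name `closest_corpus_span` is hypothetical.

```python
from difflib import SequenceMatcher


def closest_corpus_span(hypothesis, corpus_words):
    """Return the corpus word span most similar to the ASR hypothesis.

    Illustrative only: compares the hypothesis against every span of
    the same length and keeps the best character-level similarity.
    """
    hyp_words = hypothesis.lower().split()
    n = len(hyp_words)
    if n == 0 or n > len(corpus_words):
        # Nothing to compare against; return the input unchanged.
        return hypothesis
    target = " ".join(hyp_words)
    best_span, best_score = hyp_words, -1.0
    for start in range(len(corpus_words) - n + 1):
        span = corpus_words[start:start + n]
        score = SequenceMatcher(None, target, " ".join(span)).ratio()
        if score > best_score:
            best_span, best_score = span, score
    return " ".join(best_span)


corpus = "see you friday at the meeting".split()
corrected = closest_corpus_span("at the meating", corpus)
```

Here the misrecognized word "meating" is replaced because the span "at the meeting" is the closest match in the corpus.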
The application is written in Java and uses the JavaFX library. Development and testing were done on Ubuntu 14.04.
Prerequisites:
- Java 8
- JavaFX
- Ubuntu 14.04: The easiest way to install Java 8 and JavaFX is to install Oracle JDK. To do that, see this post.
- Ubuntu 16.04: You can install Oracle JDK the same way as on Ubuntu 14.04, or you can install OpenJDK and OpenJFX from the Ubuntu repositories:
apt-get install openjdk-8-jdk
apt-get install openjfx
- Python version 2.7 or greater. You can install Python with the following command:
apt-get install python
- The autoconf, libtool, bison, python-dev, swig and wget packages. You can install these packages with the following command:
apt-get install autoconf libtool bison python-dev swig wget
- Gradle version 2.4 or greater. Gradle is used to build the application from its sources. To have Gradle installed automatically, see the installation steps.
After you have installed all the prerequisites, install the application with the following steps:
- Clone the repository:
git clone https://github.com/gouzouni625/personalized_automatic_speech_recognition.git
- Run the setup.py script, which installs CMUSphinx. The installation is done entirely inside the personalized_automatic_speech_recognition directory; no files are created or changed anywhere else on your file system:
cd personalized_automatic_speech_recognition
./setup.py
The setup script looks for Java at /usr/lib/jvm/default-java. If that is not the location of your Java installation, pass the correct path as an argument to the setup script:
./setup.py --java-path /your/java/installation/path
If you installed Oracle JDK using a PPA, the Java path will probably be /usr/lib/jvm/java-8-oracle:
./setup.py --java-path /usr/lib/jvm/java-8-oracle
If you don't have Gradle installed, the setup.py script can install it for you (inside the directory of the cloned repository) when you pass the --gradle-install flag:
./setup.py --java-path /your/java/installation/path --gradle-install
When the setup script finishes, check the setup.log file to make sure everything was installed correctly.
- After that, you can run the application. A helper script is provided for this purpose. Simply run:
./start.sh
To get information about how to use the application, you can read the user guide.
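If you are unsure which value to pass to --java-path during setup, a small check like the following can help. It is a hypothetical helper, not part of setup.py; it only tests whether the two Java locations mentioned in the installation steps exist on your system and prints the matching setup command.

```python
import os


def find_java_home(candidates, is_dir=os.path.isdir):
    """Return the first candidate path that is an existing directory, else None."""
    for path in candidates:
        if is_dir(path):
            return path
    return None


# Paths mentioned in the installation steps; adjust the list for your system.
CANDIDATES = ["/usr/lib/jvm/default-java", "/usr/lib/jvm/java-8-oracle"]

java_home = find_java_home(CANDIDATES)
if java_home is None:
    print("No Java installation found at the known locations.")
else:
    print("./setup.py --java-path " + java_home)
```

The `is_dir` parameter exists only so the lookup can be exercised without touching the file system.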