A script that transcribes every MP3 file in a directory into a .txt file and saves the results in an output directory. It is especially useful for voice memos.
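The overall flow (find the MP3s, transcribe each one, write a .txt file to the output directory) can be sketched as follows. This is a minimal illustration, not the script's actual code. `transcribe_directory` is a hypothetical name, and `transcribe` is a placeholder for the real speech-to-text step. Output files are named after each MP3's stem only to keep the sketch simple. The real script takes the filename from a spoken title, as described further down.

```python
from __future__ import annotations

import tempfile
from pathlib import Path
from typing import Callable


def transcribe_directory(
    input_dir: str | Path,
    output_dir: str | Path,
    transcribe: Callable[[Path], str],
) -> list[Path]:
    """Transcribe every .mp3 in input_dir into a .txt file in output_dir.

    `transcribe` is a placeholder: it receives one MP3 path and returns text.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    # Sort for a deterministic order; match the extension case-insensitively
    # so files like "MEMO.MP3" are picked up too.
    for mp3 in sorted(Path(input_dir).iterdir()):
        if mp3.is_file() and mp3.suffix.lower() == ".mp3":
            # Assumption: name the output after the MP3's stem (the real
            # script derives the filename from a spoken title instead).
            txt = out / f"{mp3.stem}.txt"
            txt.write_text(transcribe(mp3), encoding="utf-8")
            written.append(txt)
    return written


if __name__ == "__main__":
    # Demo with a fake transcriber, so no audio tooling is needed.
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp, "memos"), Path(tmp, "transcripts")
        src.mkdir()
        (src / "memo1.mp3").write_bytes(b"")
        (src / "notes.wav").write_bytes(b"")  # ignored: not an MP3
        for path in transcribe_directory(src, dst, lambda p: f"text of {p.name}"):
            print(path.name, "->", path.read_text(encoding="utf-8"))
```

Passing the transcriber in as a function keeps the directory handling testable without ffmpeg or a speech model installed.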
To install this package, run the following from the project's root directory:
$ python3 setup.py install --user
You also need ffmpeg. On Debian or Ubuntu, install it with:
$ sudo apt install ffmpeg
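The README does not say exactly how ffmpeg is used. A common approach with Vosk is to have ffmpeg decode each MP3 into raw 16-bit mono PCM, because that is the kind of audio the recognizer consumes. The sketch below assumes that approach and a 16 kHz sample rate. The function names are hypothetical, and the ffmpeg flags are standard ffmpeg options.

```python
import subprocess

# Assumption: 16 kHz mono audio, a rate commonly used with Vosk models.
SAMPLE_RATE = 16000


def ffmpeg_decode_command(mp3_path, sample_rate=SAMPLE_RATE):
    """Build an ffmpeg command that decodes an MP3 to raw PCM on stdout."""
    return [
        "ffmpeg",
        "-loglevel", "quiet",     # suppress ffmpeg's progress output
        "-i", str(mp3_path),      # input file
        "-ar", str(sample_rate),  # resample to the model's rate
        "-ac", "1",               # downmix to a single (mono) channel
        "-f", "s16le",            # raw signed 16-bit little-endian samples
        "-",                      # write the result to stdout
    ]


def decode_mp3(mp3_path, sample_rate=SAMPLE_RATE) -> bytes:
    """Run ffmpeg and return the decoded PCM bytes (needs ffmpeg on PATH)."""
    result = subprocess.run(
        ffmpeg_decode_command(mp3_path, sample_rate),
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError if ffmpeg fails
    )
    return result.stdout
```

Streaming raw samples through stdout avoids writing temporary WAV files for every memo.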
The Vosk speech recognition API requires a model: a zip file, usually between 40 MB and 3 GB, that contains what the API needs to recognize speech. The Vosk project publishes a list of all models compatible with the API; choose one based on your use case.
After downloading your model of choice, unzip it and rename the extracted folder to "model". The script looks for a folder with exactly this name, so it will not find the model under any other name.
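The following sketch shows how a script might locate the renamed "model" folder and pass decoded audio through Vosk. It assumes the folder sits in the current working directory, which the README does not specify. `find_model_dir` and `transcribe_pcm` are hypothetical names. `Model`, `KaldiRecognizer`, `AcceptWaveform`, `Result`, and `FinalResult` are part of the `vosk` Python package. `vosk` is imported inside the function, so the folder check still runs even when the package is not installed.

```python
import json
from pathlib import Path


def find_model_dir(base_dir=".") -> Path:
    """Return the unzipped, renamed "model" folder, or fail with a clear message."""
    # Assumption: the folder lives in base_dir (the working directory by default).
    model_dir = Path(base_dir) / "model"
    if not model_dir.is_dir():
        raise FileNotFoundError(
            f"No Vosk model folder at {model_dir.resolve()}; "
            "unzip a model and rename the folder to 'model'."
        )
    return model_dir


def transcribe_pcm(pcm: bytes, model_dir: Path, sample_rate: int = 16000) -> str:
    """Feed raw 16-bit mono PCM to Vosk and return the recognized text."""
    # Lazy import: find_model_dir() works without vosk installed.
    from vosk import KaldiRecognizer, Model

    recognizer = KaldiRecognizer(Model(str(model_dir)), sample_rate)
    pieces = []
    chunk = 8000  # bytes per call (4000 16-bit samples)
    for start in range(0, len(pcm), chunk):
        # AcceptWaveform returns True once it has finalized an utterance.
        if recognizer.AcceptWaveform(pcm[start:start + chunk]):
            pieces.append(json.loads(recognizer.Result())["text"])
    # Flush whatever audio is still buffered at the end of the file.
    pieces.append(json.loads(recognizer.FinalResult())["text"])
    return " ".join(p for p in pieces if p)
```

Checking for the folder up front turns a missing or misnamed model into a readable error before any audio is processed.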
The program lets you define a keyword that marks the end of a spoken title. Everything you say before the keyword becomes the title of the .txt file. Choose a word that is uncommon, so it does not occur naturally in your memos (avoid words such as "the"), but also easy to say, so the speech recognizer transcribes it reliably.
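The title rule can be sketched as a split at the first occurrence of the keyword. This is an illustration, not the script's code. `split_title`, `title_to_filename`, and the example keyword "pineapple" are hypothetical. Three behaviors are also assumptions: the keyword itself is dropped, a memo without the keyword gets an empty title, and an empty title falls back to another name.

```python
from __future__ import annotations

import re


def split_title(transcript: str, keyword: str) -> tuple[str, str]:
    """Split a transcript at the first occurrence of the title keyword.

    Returns (title, body). Words before the keyword form the title and the
    keyword itself is dropped. If the keyword never appears, the title is
    empty and the whole transcript is the body (an assumption).
    """
    words = transcript.split()
    target = keyword.lower()
    for i, word in enumerate(words):
        if word.lower() == target:
            return " ".join(words[:i]), " ".join(words[i + 1:])
    return "", " ".join(words)


def title_to_filename(title: str, fallback: str) -> str:
    """Turn a spoken title into a safe .txt filename, or use the fallback."""
    # Keep letters, digits, underscores, hyphens and spaces; drop the rest
    # (for example "/" or "?", which are unsafe in filenames).
    cleaned = re.sub(r"[^\w\- ]", "", title).strip()
    return f"{cleaned or fallback}.txt"


title, body = split_title("shopping list pineapple eggs and milk", "pineapple")
print(title_to_filename(title, "memo1"), "|", body)
```

Comparing whole words, rather than searching for a substring, prevents a keyword such as "apple" from matching inside "pineapple".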
After downloading a model and renaming it, you need to edit a file named "info.json":
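A script can read a settings file like info.json and fail early when a value is missing. The key names below (`input_dir`, `output_dir`, `keyword`) are hypothetical placeholders, not the project's actual keys. Use whatever keys the project's own info.json contains. `parse_settings` and `load_settings` are also illustrative names.

```python
import json
from pathlib import Path

# Hypothetical key names, for illustration only.
REQUIRED_KEYS = ("input_dir", "output_dir", "keyword")


def parse_settings(text: str, required=REQUIRED_KEYS) -> dict:
    """Parse info.json contents and report any missing or empty values."""
    settings = json.loads(text)
    missing = [key for key in required if not settings.get(key)]
    if missing:
        raise ValueError(
            "info.json is missing or has empty values for: " + ", ".join(missing)
        )
    return settings


def load_settings(path="info.json", required=REQUIRED_KEYS) -> dict:
    """Read and validate a settings file from disk."""
    return parse_settings(Path(path).read_text(encoding="utf-8"), required)
```

Validating every required key at startup surfaces a half-edited info.json immediately, rather than partway through a batch of memos.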
Once the variables in info.json are set, run the program:
$ transcribe-mp3