Osprey is a cross-platform voice typing program that lets you control your computer and type using your voice. It can be used for coding, web browsing, dictation, or any other keyboard-driven task.
Osprey is built on top of Dragonfly and Kaldi Active Grammar, which uses Kaldi for speech recognition. Osprey is built entirely on open source software and runs entirely locally, without depending on any proprietary speech recognition APIs or engines. The available speech recognition models are also built from open datasets.
Osprey is command-based rather than dictation-based, which means users speak commands rather than free-form sentences. Each command has an associated action, such as a key press, that is executed when the command is spoken.
Osprey works by providing a Python API that allows users to specify voice commands, from which Osprey builds a strict and limited grammar describing the possible commands a user can say. This grammar is fed to a speech recognition engine, which transcribes microphone audio against it by trying to match the audio to one of the commands.
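As a rough illustration of this command-based model, the following Python sketch maps a few phrases to actions, treats the closed set of phrases as the grammar, and accepts an utterance only if it matches one of them exactly. It is not Osprey's API: the names press, COMMANDS, build_grammar, recognize, and handle are hypothetical, actions only return a description instead of pressing keys, and a real engine matches audio rather than an already-split list of words.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for an action: returns a description of what it
# would do instead of actually pressing keys.
Action = Callable[[], str]


def press(key: str) -> Action:
    """Build an action that reports the key combination it would press."""
    return lambda: "press " + key


# Hypothetical command table: spoken phrase -> action.
COMMANDS = {
    "save file": press("ctrl+s"),
    "new line": press("enter"),
    "go back": press("alt+left"),
}


def build_grammar(commands: Dict[str, Action]) -> List[List[str]]:
    """The grammar here is just the closed list of allowed word sequences."""
    return [phrase.split() for phrase in commands]


def recognize(words: List[str], grammar: List[List[str]]) -> Optional[str]:
    """Accept the utterance only if it exactly matches one allowed command."""
    for rule in grammar:
        if words == rule:
            return " ".join(rule)
    return None  # anything outside the grammar is rejected


def handle(words: List[str], commands: Dict[str, Action]) -> Optional[str]:
    """Recognize an utterance and run the matching command's action."""
    phrase = recognize(words, build_grammar(commands))
    return commands[phrase]() if phrase is not None else None
```

For example, handle(["save", "file"], COMMANDS) returns "press ctrl+s", while handle(["hello", "world"], COMMANDS) returns None because that phrase is not part of the grammar.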
Voice commands are specified in user-defined Python scripts, which Osprey loads from its config directory on startup. No commands are included by default; instead, an official starter pack of Osprey scripts is available at osprey-starter-pack and can be installed by cloning it into the config directory.
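The loading step can be pictured with the standard library's importlib. This is a minimal sketch under stated assumptions, not Osprey's actual loader: load_scripts is a hypothetical name, it only imports *.py files directly inside the given directory (whether Osprey also walks subdirectories, such as a cloned starter pack, is not stated here), and the location of the config directory is left to the caller.

```python
import importlib.util
from pathlib import Path
from types import ModuleType
from typing import List


def load_scripts(config_dir: Path) -> List[ModuleType]:
    """Import every *.py file directly inside config_dir, in sorted order."""
    modules = []
    for path in sorted(config_dir.glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, str(path))
        if spec is None or spec.loader is None:
            continue  # not importable as a Python module; skip it
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # runs the script's top-level code
        modules.append(module)
    return modules
```

Executing each script's top-level code is what gives a script the chance to register its commands, which is why the scripts are loaded once at startup.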
Requirements:
- Python 3.5+
- GTK3
- PortAudio
- Git
- A (decent) microphone
- A compatible Kaldi model that supports your language
- Around 0.5GB to 2GB of memory for the Kaldi model during runtime
- (optional) pipx for installation
- (preferable) some programming and Python experience
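Before installing, a quick way to see which of these requirements are already present is a small standard-library script such as the hypothetical check_requirements below. It is only a heuristic: ctypes.util.find_library may miss libraries installed in non-standard locations (for example, some Homebrew prefixes), the library names "portaudio" and "gtk-3" are assumptions, and it cannot check the microphone, the Kaldi model, or available memory.

```python
import shutil
import sys
from ctypes.util import find_library
from typing import Dict


def check_requirements() -> Dict[str, bool]:
    """Report which of the listed requirements appear to be available."""
    return {
        "python>=3.5": sys.version_info >= (3, 5),
        "git": shutil.which("git") is not None,
        # Library names are assumed; lookup follows the system's default search paths.
        "portaudio": find_library("portaudio") is not None,
        "gtk3": find_library("gtk-3") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print("{:<12} {}".format(name, "found" if ok else "MISSING"))
```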
Install using pipx with:
pipx install git+https://github.com/osprey-voice/osprey
Osprey can later be upgraded with:
pipx upgrade osprey
It's highly recommended to clone the osprey-starter-pack repo into the Osprey config directory to get some basic voice commands.
There are instructions for how to do this in that project's README.
Osprey depends on the following:
- portaudio
- gtk3
- python3
- git
On Linux, install the dependencies using your distro's package manager.
On macOS, install the dependencies using Homebrew with:
brew install portaudio gtk+3 python git
TODO
Download one of the Kaldi models provided by kaldi-active-grammar and extract it into the Osprey config directory.
Note: Kaldi models use a significant amount of memory at runtime, from about 0.5GB to 2GB depending on the size of the model, so choose your model accordingly. Larger models have larger vocabularies, which means they can recognize more words, so use the largest model that your machine's available memory allows.
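The rule of thumb above (use the largest model that fits) can be written as a small selection function. The model names and sizes below are made up to span the 0.5GB to 2GB range mentioned here, pick_model is a hypothetical helper, and the 1GB of headroom left for the rest of the system is an arbitrary assumption; in practice you would plug in the real model sizes and your machine's free memory.

```python
from typing import Dict, Optional


def pick_model(models: Dict[str, float], available_gb: float,
               headroom_gb: float = 1.0) -> Optional[str]:
    """Return the largest model whose runtime memory fits the budget.

    models maps a model name to its approximate runtime memory use in GB.
    headroom_gb is memory deliberately kept free for everything else
    (an assumed default, not an Osprey setting).
    """
    budget_gb = available_gb - headroom_gb
    fitting = {name: size for name, size in models.items() if size <= budget_gb}
    if not fitting:
        return None  # even the smallest model would not fit
    # Assumes a larger memory footprint means a larger vocabulary.
    return max(fitting, key=lambda name: fitting[name])


# Hypothetical model names and sizes, for illustration only.
MODELS = {"small-model": 0.5, "medium-model": 1.0, "large-model": 2.0}
```

With 2GB free, pick_model(MODELS, 2.0) returns "medium-model"; with 8GB free it returns "large-model"; with only 1GB free it returns None.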
In order to simulate keypresses in Wayland, the current user needs write access to /dev/uinput.
To enable this:
- Copy the 40-uinput.rules udev rule to /etc/udev/rules.d/ (requires sudo) to change the permissions on /dev/uinput so that members of the uinput group can write to it
- Run sudo groupadd uinput to create the uinput group
- Run sudo usermod -a -G uinput $USER to add the current user to the uinput group
- Then restart your computer so that the udev rule and the new group membership take effect
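After restarting, the result of these steps can be verified from Python. The sketch below is a hypothetical helper (uinput_problems), not part of Osprey; it assumes a Linux system (the grp module is Unix-only) and only checks what the steps above set up: that /dev/uinput is writable and that the current process belongs to the uinput group.

```python
import grp
import os
from typing import List


def uinput_problems(device: str = "/dev/uinput",
                    group: str = "uinput") -> List[str]:
    """Return human-readable problems; an empty list means the setup looks fine."""
    problems = []
    if not os.path.exists(device):
        problems.append(device + " does not exist (the uinput kernel module may not be loaded)")
    elif not os.access(device, os.W_OK):
        problems.append("the current user cannot write to " + device)
    try:
        gid = grp.getgrnam(group).gr_gid
    except KeyError:
        problems.append("group '" + group + "' does not exist")
    else:
        # Group changes from usermod only apply to new login sessions.
        if gid != os.getgid() and gid not in os.getgroups():
            problems.append("the current process is not in group '" + group + "'")
    return problems


if __name__ == "__main__":
    for problem in uinput_problems() or ["uinput setup looks fine"]:
        print(problem)
```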
Run osprey to start the daemon.
Osprey will load the Python scripts that it finds in the Osprey config directory.
For more info, check out the wiki.
A systemd service file, osprey.service, is provided for running Osprey as a systemd user service.
To set this up:
- Copy osprey.service to ~/.config/systemd/user/
- Run systemctl --user daemon-reload
From here, you can start the service by running:
systemctl --user start osprey
And you can enable the service to be started when you login with:
systemctl --user enable osprey
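The same setup can also be scripted. This minimal sketch (install_user_service is a hypothetical name, not something Osprey ships) copies the unit file and runs the same three systemctl commands shown above; by default it only returns the commands without touching the system, so they can be inspected first.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def install_user_service(unit_file: Path, dry_run: bool = True) -> List[List[str]]:
    """Install a systemd user unit, then start and enable it.

    With dry_run=True nothing is copied or executed; the commands that
    would run are returned instead.
    """
    name = unit_file.stem  # "osprey.service" -> "osprey"
    commands = [
        ["systemctl", "--user", "daemon-reload"],
        ["systemctl", "--user", "start", name],
        ["systemctl", "--user", "enable", name],
    ]
    if not dry_run:
        target_dir = Path.home() / ".config" / "systemd" / "user"
        target_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy(str(unit_file), str(target_dir / unit_file.name))
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop at the first failing command
    return commands
```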