This tool is designed to recognize speech in real-time, convert it to text, and automatically copy the text to the system clipboard. The tool leverages API services for speech recognition and uses Python libraries for audio capture and clipboard management.
Before you begin, ensure you have the following ready:
- Python 3.8 or higher: The tool is written in Python, so you'll need Python installed on your system.
- API Key: You will need an API key from a speech recognition service (e.g., OpenAI/Whisper API or a compatible ASR API, such as FunAudioLLM/SenseVoiceSmall at siliconflow or xinference). Make sure you have the necessary credentials.
You can install asr2clip
directly from PyPI using pip
or pipx
:
# Install using pip
pip install asr2clip
# Alternatively, install using pipx (recommended for isolated environments)
pipx install asr2clip
- Clone the repository (if applicable):
git clone https://github.com/Oaklight/asr2clip.git
cd asr2clip
- Install the required Python packages:
pip install -r requirements.txt
If you are using Conda, you can create an environment using the provided environment.yaml
file:
conda env create -f environment.yaml
conda activate asr
- Set up your API key:
- Create a
asr2clip.conf
file in the root directory of the project or in your~/.config/
directory. A sample fileasr2clip.conf.example
is provided. - Add your API key to the
asr2clip.conf
file in YAML format:
- Create a
api_key: your_api_key_here
api_base_url: https://api.openai.com/v1
model_name: whisper-1
- Note for Linux users:
If you are using
pyperclip
on Linux, make sure to installxclip
orxsel
. You can install them using the following commands:
sudo apt-get install xsel # Basic clipboard functionality, same to asr2clip
sudo apt-get install xclip # More advanced functionality, same to asr2clip
- Run the tool:
asr2clip
-
Start speaking:
- The tool will start capturing audio from your microphone.
- It will send the audio to the API for speech recognition.
- The recognized text will be automatically copied to your system clipboard.
-
Stop the tool:
- Press
Ctrl+C
to stop the tool.
- Press
- Transcribe from a file:
You can transcribe an audio file directly by specifying the file path. The tool supports any audio format that
pydub
can handle (e.g., MP3, WAV, FLAC, AAC):
asr2clip --input /path/to/audio/file.mp3
- Read audio data from stdin: You can also pipe audio data directly into the tool:
cat /path/to/audio/file.wav | asr2clip --stdin
- Set recording duration: You can specify the duration of the recording in seconds:
asr2clip --duration 10
-
Output to file or stdout: You can redirect the transcribed text to a file or stdout instead of copying it to the clipboard. Use the
-o
or--output
option:- Output to a file (automatically creates the file or directory):
asr2clip --output /path/to/output.txt
- Output to stdout:
asr2clip --output -
- Output to a file (automatically creates the file or directory):
-
Generate configuration template: Generate a configuration file template and exit:
asr2clip --generate_config
- Quiet mode: Disable logging output:
asr2clip --quiet
- Specify configuration file: Use a custom configuration file path:
asr2clip --config /path/to/config.conf
$ ./asr2clip.py --duration 5
Recording for 5 seconds...
Recording complete.
Transcribing audio...
Transcribed Text:
-----------------
1,2,3,3,2,1. This is the English test.
The transcribed text has been copied to the clipboard.
- Audio not captured: Ensure your microphone is properly connected and configured.
- API errors: Check your API key and ensure you have sufficient credits or permissions.
- Clipboard issues: Ensure
pyperclip
is correctly installed and compatible with your operating system. Linux users need to installxclip
orxsel
. - File output issues: If the specified output file path contains a nonexistent directory, the tool will automatically create it. Please ensure you have the necessary permissions.
If you would like to contribute to this project, please fork the repository and submit a pull request. We welcome any improvements or new features!
This project is licensed under the GNU Affero General Public License v3.0. See the LICENSE file for details.