This project combines OpenVoice TTS (Text-to-Speech) with a local LLM (Language Model) to create an interactive voice assistant. You can speak to the assistant using your microphone, and it will transcribe your speech, generate a response using the local LLM, and then speak the response back to you using OpenVoice TTS.
- Prerequisites
- Installation
- Usage
- Configuration
- Tips and Advanced Usage
- Windows Installation (VS Code)
- License
- Acknowledgements
- Contributing
- Contact
## Prerequisites

Before running the code, make sure you have the following dependencies installed:
- Python 3.7+
- PyTorch
- OpenAI Whisper
- OpenVoice
- Simpleaudio
- Sounddevice
- LM Studio (for running the local LLM):
  - Visit the LM Studio website and follow the installation instructions for your operating system.
  - Make sure to set up the LM Studio API key and endpoint URL.
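Once LM Studio's local server is running, the assistant talks to it through the OpenAI-compatible API it exposes. Below is a minimal connection sketch; the endpoint `http://localhost:1234/v1` and the placeholder key `"lm-studio"` are typical LM Studio defaults, and `"local-model"` stands in for whatever model you have loaded, so adjust all three to your setup.

```python
# Minimal sketch: talk to LM Studio's OpenAI-compatible local server.
# base_url, api_key, and model name below are assumptions / placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="local-model",  # placeholder for the model loaded in LM Studio
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```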
## Installation

1. Clone the repository:

   git clone https://github.com/your-username/openvoice-local-llm.git
2. Navigate to the project directory:

   cd openvoice-local-llm
3. Install the required dependencies using pip:

   pip install -r requirements.txt
4. Install OpenVoice:

   conda create -n openvoice python=3.9
   conda activate openvoice
   git clone [email protected]:myshell-ai/OpenVoice.git
   cd OpenVoice
   pip install -e .
5. Download and install the necessary models and checkpoints for OpenVoice and Whisper (a loading sketch follows these steps):

   - OpenVoice TTS: Download the checkpoint from here and extract it to the checkpoints folder.
   - Whisper: The model will be automatically downloaded when you run the code for the first time.
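As a sanity check that the checkpoint and Whisper model are in place, you can try a loading sketch along the lines of the OpenVoice demo. The checkpoint paths below assume the default English base-speaker layout inside the checkpoints folder and are assumptions, not the exact code in test_ai.py.

```python
# Minimal loading sketch (paths are assumptions; adjust to your checkpoints layout).
import torch
import whisper
from openvoice.api import BaseSpeakerTTS

device = "cuda" if torch.cuda.is_available() else "cpu"

# OpenVoice base speaker: config + checkpoint extracted into ./checkpoints
tts = BaseSpeakerTTS("checkpoints/base_speakers/EN/config.json", device=device)
tts.load_ckpt("checkpoints/base_speakers/EN/checkpoint.pth")

# Whisper downloads its weights on first use
stt = whisper.load_model("base")
```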
## Usage

1. Make sure your microphone is connected and properly configured.
2. Run the `test_ai.py` script:

   python test_ai.py
3. Press Enter to start recording your speech. Speak clearly into the microphone.
4. The assistant will transcribe your speech, generate a response using the local LLM, and then speak the response back to you using OpenVoice TTS (a rough sketch of this loop appears after these steps).
5. To exit the program, type 'quit' and press Enter.
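For orientation, here is a rough sketch of the record, transcribe, respond, and speak loop described above. It is not the exact code in test_ai.py: the fixed recording length, the LM Studio endpoint and key, and the model names are all assumptions, and the final TTS call is left as a comment.

```python
# Rough sketch of the interaction loop (not the exact test_ai.py code).
# Recording length, endpoint, key, and model names are assumptions.
import numpy as np
import sounddevice as sd
import whisper
from openai import OpenAI

SAMPLE_RATE = 16000   # Whisper expects 16 kHz mono audio
RECORD_SECONDS = 5    # fixed-length recording for simplicity

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
stt = whisper.load_model("base")

def record() -> np.ndarray:
    """Record a short utterance from the default microphone."""
    audio = sd.rec(int(RECORD_SECONDS * SAMPLE_RATE),
                   samplerate=SAMPLE_RATE, channels=1, dtype="float32")
    sd.wait()
    return audio.flatten()

while True:
    if input("Press Enter to record (or type 'quit'): ").strip().lower() == "quit":
        break
    text = stt.transcribe(record())["text"]
    print("You said:", text)
    reply = client.chat.completions.create(
        model="local-model",  # placeholder for the model loaded in LM Studio
        messages=[{"role": "user", "content": text}],
    ).choices[0].message.content
    print("Assistant:", reply)
    # Hand the reply to OpenVoice TTS here (see the Configuration section).
```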
## Configuration

You can customize the following parameters in the `test_ai.py` script:

- `base_url`: The URL of your local LLM server.
- `api_key`: The API key for your local LLM server.
- `config_path`: The path to the OpenVoice TTS configuration file.
- `checkpoint_path`: The path to the OpenVoice TTS checkpoint file.
- `whisper_model`: The name of the Whisper model to use for speech recognition.
- `sample_rate`: The sample rate for audio recording.
- `output_path`: The path where the generated audio files will be saved.
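For reference, these parameters might be grouped near the top of test_ai.py roughly as follows. Every value shown is an illustrative assumption (LM Studio defaults, the English base-speaker checkpoint layout, and the "base" Whisper model), not the shipped configuration.

```python
# Illustrative configuration block (all values are assumptions; adjust to your setup).
base_url = "http://localhost:1234/v1"                            # local LLM server (LM Studio default)
api_key = "lm-studio"                                            # placeholder key for the local server
config_path = "checkpoints/base_speakers/EN/config.json"         # OpenVoice TTS config
checkpoint_path = "checkpoints/base_speakers/EN/checkpoint.pth"  # OpenVoice TTS checkpoint
whisper_model = "base"                                           # Whisper model for speech recognition
sample_rate = 16000                                              # audio recording sample rate (Hz)
output_path = "outputs/response.wav"                             # where generated audio is saved
```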
## Tips and Advanced Usage

Please see demo_part1.ipynb for an example of how OpenVoice enables flexible style control over the cloned voice.

Please see demo_part2.ipynb for an example covering languages seen or unseen in the MSML training set.

We provide a minimalist local Gradio demo. Launch it with python -m openvoice_app --share.
The base speaker model can be replaced with any model (in any language and style) that the user prefers. Use the `se_extractor.get_se` function, as demonstrated in the demo, to extract the tone color embedding for the new base speaker.
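As a sketch based on the OpenVoice demo notebooks, extracting a tone color embedding for a new base speaker might look like the following; the converter checkpoint paths, the reference audio file, and the target_dir are assumptions to adapt to your setup.

```python
# Sketch of extracting a tone color embedding for a new base speaker
# (checkpoint paths and reference clip are assumptions).
import torch
from openvoice import se_extractor
from openvoice.api import ToneColorConverter

device = "cuda" if torch.cuda.is_available() else "cpu"

converter = ToneColorConverter("checkpoints/converter/config.json", device=device)
converter.load_ckpt("checkpoints/converter/checkpoint.pth")

# Any clean recording of the new base speaker works as the reference clip.
target_se, audio_name = se_extractor.get_se(
    "resources/example_reference.mp3", converter, target_dir="processed", vad=True
)
```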
There are many single or multi-speaker TTS methods that can generate natural speech and are readily available. By simply replacing the base speaker model with the model you prefer, you can push the speech naturalness to a level you desire.
## Windows Installation (VS Code)

Please use this guide if you want to install and use OpenVoice on Windows.
## License

This project is open-source and available under the MIT License.
## Acknowledgements

- OpenVoice - Open-source voice cloning project
- OpenAI Whisper - Automatic speech recognition (ASR) system
## Contributing

Contributions are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request.
## Contact

If you have any questions or inquiries, please contact the project maintainer, Preston McCauley, at [email protected].
Happy interacting with your voice assistant!