This project is a local talking LLM (Large Language Model) application that allows users to interact with an AI assistant through voice commands. The assistant uses Whisper for speech-to-text conversion, Ollama for generating responses, and Bark for text-to-speech synthesis.
- Speech-to-Text: Converts the user's voice input to text using Whisper.
- Text Generation: Generates responses using the Ollama language model.
- Text-to-Speech: Converts the generated text responses back to speech using Bark.
- Interactive Chat Interface: Displays the conversation in a chat bubble format for a more interactive experience.
- Python 3.8 or higher
- Node.js 14 or higher
- pip for Python package management
- npm for Node.js package management
- Clone the repository
git clone https://github.com/emilytin0206/audio-llm.git
- (Optional) Create a virtual environment; Miniconda3 works well, for example:
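(The environment name and Python version below are only examples, not project requirements.)
conda create -n talking-llm python=3.10
conda activate talking-llm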
- Install Python dependencies
pip install -r requirements.txt
- Run the backend server
python app.py
- Frontend setup
cd talking-llm-ui
npm install
npm start
- Recording: Press the record button and speak into your microphone.
- Transcription: The audio is sent to the backend server where Whisper transcribes it to text.
- Response Generation: The transcribed text is processed by Ollama to generate a response.
- Text-to-Speech: The response is converted back to speech using Bark and played back to the user.
- Chat Interface: The conversation is displayed in chat bubbles for easy readability.
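For reference, the sketch below shows how these steps could fit together on the backend, assuming the `openai-whisper`, `ollama`, and `bark` Python packages. The function name, file paths, and model choices (`"base"`, `"llama3"`) are illustrative and may differ from what `app.py` actually does.

```python
# Minimal end-to-end sketch of the voice pipeline (illustrative only;
# the real app.py may structure this differently).
import whisper                                                  # speech-to-text
import ollama                                                   # local LLM responses
from bark import SAMPLE_RATE, generate_audio, preload_models   # text-to-speech
from scipy.io import wavfile

# Model choices below are examples, not project defaults.
stt_model = whisper.load_model("base")
preload_models()  # downloads/caches Bark weights on first use

def answer_voice_query(input_wav: str, output_wav: str = "reply.wav") -> str:
    # 1. Transcribe the recorded audio to text with Whisper.
    text = stt_model.transcribe(input_wav)["text"]

    # 2. Generate a response with the Ollama language model.
    chat = ollama.chat(model="llama3", messages=[{"role": "user", "content": text}])
    reply = chat["message"]["content"]

    # 3. Synthesize the reply with Bark and save it for playback in the UI.
    audio = generate_audio(reply)
    wavfile.write(output_wav, SAMPLE_RATE, audio)
    return reply
```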