Welcome to SpeakEasy, a simple yet useful project that combines an off-the-shelf large language model (LLM), automatic speech recognition (ASR), and text-to-speech (TTS) to create an interactive spoken chatbot.
With SpeakEasy, you can improve your spoken English by holding conversations with the LLM. It runs in real time and offline on your laptop or PC.
Tested on an M2 MacBook Air (16 GB), and on a 13600KF + 4070Ti machine running Ubuntu and Windows.
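The way the three components fit together can be sketched as a single conversation turn. This is only an illustrative outline, not SpeakEasy's actual code: `transcribe`, `chat`, and `speak` are hypothetical callables standing in for whisper.cpp, chatglm.cpp, and EfficientSpeech, and the `(role, text)` history format is an assumption.

```python
def run_turn(audio, transcribe, chat, speak, history):
    """Run one turn: recorded speech -> text -> LLM reply -> spoken reply.

    transcribe, chat, and speak are hypothetical stand-ins for the ASR,
    LLM, and TTS components; history is a list of (role, text) pairs.
    """
    user_text = transcribe(audio)          # ASR: audio to text
    history.append(("user", user_text))
    reply = chat(history)                  # LLM: reply given the dialogue so far
    history.append(("assistant", reply))
    speak(reply)                           # TTS: read the reply aloud
    return reply
```

Passing the components in as plain callables keeps the loop independent of any particular backend, so each one can be swapped or stubbed out when testing.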
- Practice oral English through interactive conversations
- Real-time and offline functionality
- Uses the 4-bit quantized ChatGLM3 6B model for chat interactions
- Accelerated automatic speech recognition (ASR) with whisper.cpp
- Text-to-speech (TTS) conversion using EfficientSpeech
- Install chatglm.cpp by following its README
- Download the ChatGLM3 6B 4-bit model
- Install whisper.cpp; compiling it with BLAS / cuBLAS can speed up inference
- Install EfficientSpeech for real-time TTS
- Install requirements
pip install -r requirements.txt
- Start the local TTS service
cd examples/efficientspeech/ && sh es_tts_service.sh
- Modify the model path in the script, then run it:
vim examples/demo.sh
cd examples && sh demo.sh
- Press the SPACE key to start recording your voice, and press it again to stop.
- If your input device is not the built-in one, set --input-device to the index of your actual input device. Find the right index with:
import sounddevice as sd
print(sd.query_devices())  # prints every audio device with its index
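To list only the devices that can record, together with the index to pass to --input-device, a small helper along these lines may be handy. It is a sketch rather than part of SpeakEasy: the function names are made up for illustration, and it assumes the device entries returned by `sounddevice.query_devices()` carry `name` and `max_input_channels` fields.

```python
def input_device_indices(devices):
    """Return (index, name) pairs for devices with at least one input channel."""
    return [
        (index, dev["name"])
        for index, dev in enumerate(devices)
        if dev.get("max_input_channels", 0) > 0
    ]


def print_input_devices():
    """Print the index and name of each input-capable device."""
    import sounddevice as sd  # imported lazily so the helper above has no dependency

    for index, name in input_device_indices(sd.query_devices()):
        print(index, name)
```

Call `print_input_devices()` and use the number shown next to your microphone as the device index.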
- This project is greatly inspired by Chatglm.cpp.