This repository contains algorithms for real-time speaker recognition. Two implementations are provided: one based on a Gaussian Mixture Model (GMM) and one based on a Convolutional Neural Network (CNN). In the GMM implementation, a dynamic threshold can be used to improve recognition efficiency, at the cost of a sharp increase in training time.
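GMM-based speaker identification typically works like this: a sequence of feature frames is scored against each enrolled speaker's model, and the speaker with the highest average log-likelihood wins. The pure-Python sketch below shows only that scoring step and is not the code in this repository. The model representation (a list of weight/mean/variance components with diagonal covariances), the function names, and the toy numbers are all hypothetical. Feature extraction (e.g. MFCCs) is assumed to happen elsewhere. The optional `threshold` is a fixed rejection threshold used as a simple stand-in, not the repository's dynamic threshold.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical model format (not the repository's): a diagonal-covariance GMM
# stored as a list of (weight, means, variances) components.
Component = Tuple[float, Sequence[float], Sequence[float]]
GMM = List[Component]


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log-density of frame x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x: Sequence[float], gmm: GMM) -> float:
    """log p(x) = logsumexp_k(log w_k + log N(x; mu_k, var_k)), computed stably."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def average_score(frames: Sequence[Sequence[float]], gmm: GMM) -> float:
    """Mean per-frame log-likelihood of an utterance under one speaker model."""
    return sum(gmm_log_likelihood(f, gmm) for f in frames) / len(frames)


def identify(frames, models: Dict[str, GMM], threshold: Optional[float] = None):
    """Return (best_speaker or None, scores). The fixed threshold is only a
    stand-in; it does not reproduce the repository's dynamic threshold."""
    scores = {name: average_score(frames, gmm) for name, gmm in models.items()}
    best = max(scores, key=scores.get)
    if threshold is not None and scores[best] < threshold:
        return None, scores  # reject: no enrolled speaker scores high enough
    return best, scores


# Toy usage with made-up 2-D "feature frames" and two hypothetical speakers.
models = {
    "alice": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "bob": [(1.0, [5.0, 5.0], [1.0, 1.0])],
}
frames = [[0.1, -0.2], [0.9, 1.1], [0.4, 0.5]]
speaker, scores = identify(frames, models)
print(speaker)  # alice
```

The log-sum-exp trick in `gmm_log_likelihood` keeps the mixture sum numerically stable when individual component densities are very small, which is common with real high-dimensional features.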
To use the GMM implementation, first enroll your WAV files into a model file (model.out), then launch the Python script RTSP.py:
cd ./GMM
python3 speaker_recognition.py -t enroll -i ./path/to/wav_files_folder/* -m ./your-output-models/model.out
python3 RTSP.py
Once the model is loaded, a prediction is made every three seconds, for a total of 15 seconds. To change the total duration, modify the condition of the while loop on line 103 of RTSP.py (tmp < 5).
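The timing described above corresponds to a fixed number of loop iterations, each covering one three-second chunk. The total duration is therefore the loop bound times the chunk length (5 × 3 s = 15 s). The sketch below illustrates that relationship. `record_and_predict`, `run_session` and `loop_bound_for` are hypothetical names, not functions from RTSP.py, and the actual audio capture is left to a caller-supplied function.

```python
from typing import Callable, List

CHUNK_SECONDS = 3     # audio length per prediction, as described above
MAX_PREDICTIONS = 5   # loop bound, analogous to `tmp < 5` in RTSP.py


def run_session(record_and_predict: Callable[[int], str],
                chunk_seconds: int = CHUNK_SECONDS,
                max_predictions: int = MAX_PREDICTIONS) -> List[str]:
    """Call a hypothetical recorder/predictor once per chunk and collect labels."""
    predictions = []
    tmp = 0
    while tmp < max_predictions:
        # record_and_predict is assumed to capture `chunk_seconds` of audio
        # and return the predicted speaker label.
        predictions.append(record_and_predict(chunk_seconds))
        tmp += 1
    return predictions


def loop_bound_for(total_seconds: int, chunk_seconds: int = CHUNK_SECONDS) -> int:
    """Number of whole chunks (loop iterations) that fit in total_seconds."""
    return total_seconds // chunk_seconds
```

For example, `loop_bound_for(60)` returns 20, so a one-minute session would need the loop condition `tmp < 20`.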
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.