Automatic Speaker Recognition analysis with deep neural networks.

tojoos/SpeakerRecognition

Speaker Recognition using Deep Neural Networks

Introduction

This repository contains the code and results for my Master's thesis, which focuses on developing a robust speaker recognition system using deep neural networks. The goal of this project was to create a model capable of accurately identifying individuals based on their voice.

[Figure: methodology diagram]

Methodology

  • Data Preprocessing:
    • Conversion of raw audio signals into spectrograms using short-time Fourier transform (STFT).
  • Feature Extraction:
    • Experimentation with various feature extraction techniques, including:
      • Mel-Frequency Cepstral Coefficients (MFCC)
      • Spectral contrast
      • Mel spectrograms
    • Evaluation of different feature representations based on their ability to capture speaker-specific information.
  • Model Architecture:
    • Design of a custom convolutional neural network (BetterCNN) tailored for speaker recognition.
    • Comparison with the widely used ResNet50 architecture.
  • Experiments and Results:
    • Evaluation of the proposed model on multiple datasets (50_speakers, LibriSpeech, TIMIT).
    • Detailed analysis of the performance metrics (accuracy, F1-score).
    • Comparison of BetterCNN with ResNet50 and other baseline models.
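
The preprocessing step listed above (raw audio to spectrogram via STFT) can be sketched roughly as follows. This is a minimal illustration using only NumPy, not the code used in the thesis: the function name `stft_spectrogram`, the frame length of 512, the hop of 128, the Hann window, and the 16 kHz test tone are all assumptions chosen for the example.

```python
import numpy as np


def stft_spectrogram(signal, frame_length=512, hop_length=128):
    """Return a log-magnitude spectrogram of shape (n_frames, frame_length // 2 + 1).

    All parameter defaults are illustrative assumptions, not thesis settings.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size < frame_length:
        # Zero-pad short inputs so that at least one full frame exists.
        signal = np.pad(signal, (0, frame_length - signal.size))

    window = np.hanning(frame_length)
    n_frames = 1 + (signal.size - frame_length) // hop_length

    # Slice the signal into overlapping frames and apply the window to each.
    frames = np.stack([
        signal[i * hop_length: i * hop_length + frame_length] * window
        for i in range(n_frames)
    ])

    # Real-input FFT per frame, then magnitude in decibels.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return 20.0 * np.log10(magnitude + 1e-10)


# Usage: a one-second 1 kHz sine tone sampled at 16 kHz (synthetic test input).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

spec = stft_spectrogram(tone)
peak_bin = int(spec.mean(axis=0).argmax())
peak_hz = peak_bin * sample_rate / 512
print(spec.shape, peak_hz)
```

A mel spectrogram or MFCC representation would add a mel filterbank (and, for MFCC, a discrete cosine transform) on top of this output. Libraries such as librosa provide `librosa.feature.melspectrogram` and `librosa.feature.mfcc` for that purpose; whether this repository relies on them is not stated in this README.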

Results

Mel spectrograms were the most effective feature representation for speaker recognition in this study. The proposed BetterCNN model outperformed ResNet50 on all evaluated datasets, suggesting that it captures speaker-specific characteristics of speech more effectively.

  • Key findings:
    • BetterCNN achieved an F1-score of 96.11% and accuracy of 96.24% on the 50_speakers dataset.
    • On the LibriSpeech dataset, BetterCNN reached an accuracy of over 99.75%.
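
For readers unfamiliar with the two reported metrics, the sketch below shows how accuracy and F1-score can be computed for a multi-class speaker identification task. It is an illustration only: this README does not say which F1 averaging the thesis uses, so macro averaging is an assumption here, and the speaker labels and function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of utterances assigned to the correct speaker."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-speaker F1 scores (macro averaging is an assumption)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical predictions for six utterances from three speakers.
y_true = ["anna", "anna", "ben", "ben", "cara", "cara"]
y_pred = ["anna", "ben", "ben", "ben", "cara", "anna"]

print(f"accuracy = {accuracy(y_true, y_pred):.4f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.4f}")
```

Accuracy and macro F1 diverge when errors are unevenly distributed across speakers, which is one reason to report both.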

Conclusion

This research highlights the effectiveness of deep learning techniques for speaker recognition. The proposed BetterCNN model offers a promising approach for developing accurate and efficient speaker identification systems. Future work could explore:

  • Larger datasets: Training on more diverse and larger datasets.
  • Advanced architectures: Exploring more complex neural network architectures (e.g., transformers).
  • Multimodal approaches: Combining audio with other biometric modalities (e.g., facial images).
