This project classifies a dataset of 65,000 one-second audio utterances into 30 distinct words using a Gaussian Mixture Model (GMM). The task follows the competition guidelines and uses classical machine learning techniques. Python scripts preprocess the data, train the GMM, and classify the test audio files.
- Audio Classification: Utilizes a Gaussian Mixture Model (GMM) to classify 65,000 one-second audio utterances into 30 distinct words.
- Data Processing: Implements classical machine learning techniques for data preprocessing and model creation.
- Python Scripts: Python scripts handle preprocessing, training, and classification of the test audio files, with an emphasis on precision and scalability.
- Gaussian Mixture Model (GMM): Selected for its suitability in handling audio data and achieving high classification accuracy.
- Feature Extraction: Utilizes Mel-frequency cepstral coefficients (MFCCs) and their derivatives for capturing audio features.
- Data Handling: Detailed steps for data preprocessing, feature extraction, and model training are included in the provided Python scripts.
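The feature step described above can be sketched as follows. This is a minimal illustration, not the code in the provided scripts: the function names, the 16 kHz sample rate, the 13 coefficients, and the per-utterance mean/variance normalization are all assumptions. The derivatives here are plain finite differences via `np.gradient`, which is simpler than the regression-based deltas that audio libraries usually compute.

```python
import numpy as np

def add_deltas(mfcc):
    """Stack static MFCCs with first- and second-order time differences.

    mfcc: array of shape (n_frames, n_coeffs).
    Returns an array of shape (n_frames, 3 * n_coeffs).
    """
    delta = np.gradient(mfcc, axis=0)    # first derivative along time
    delta2 = np.gradient(delta, axis=0)  # second derivative along time
    return np.hstack([mfcc, delta, delta2])

def normalize(features, eps=1e-8):
    """Scale each feature column to zero mean and unit variance (assumed scheme)."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    return (features - mean) / (std + eps)

def extract_features(path, sr=16000, n_mfcc=13):
    """Hypothetical helper: load one utterance and return normalized features.

    librosa is imported lazily so the helpers above work without it.
    """
    import librosa
    y, _ = librosa.load(path, sr=sr)
    # librosa returns (n_mfcc, n_frames); transpose to (n_frames, n_mfcc)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T
    return normalize(add_deltas(mfcc))
```

With 13 coefficients, each frame becomes a 39-dimensional vector (static, first-derivative, and second-derivative parts).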
Kaggle_2.py
: Python script for preparing the audio data, including feature extraction and normalization. It is also used for training the Gaussian Mixture Model (GMM) on the preprocessed data.
script.py
: Script for classifying test audio files using the trained model.
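One common way to classify with GMMs is to fit one mixture per word and pick the word whose model gives an utterance the highest likelihood. The sketch below shows that scheme with scikit-learn's `GaussianMixture`. Whether the scripts above work this way is not stated, so the per-word design, the function names, the diagonal covariances, and the component count are assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_word_models(features_by_word, n_components=4, seed=0):
    """Fit one GMM per word (assumed design).

    features_by_word: dict mapping a word label to a list of arrays,
    each of shape (n_frames, n_features), one array per utterance.
    """
    models = {}
    for word, utterances in features_by_word.items():
        frames = np.vstack(utterances)  # pool frames from all utterances
        gmm = GaussianMixture(
            n_components=n_components,
            covariance_type="diag",  # assumed; cheaper than full covariances
            random_state=seed,
        )
        models[word] = gmm.fit(frames)
    return models

def classify(features, models):
    """Return the word whose GMM gives the highest mean frame log-likelihood."""
    scores = {word: gmm.score(features) for word, gmm in models.items()}
    return max(scores, key=scores.get)
```

Because `score` averages the log-likelihood over frames, utterances with different frame counts are compared on the same scale.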