Inspired by Le-LWTNet [4], which successfully applies a lifting wavelet transform to feature extraction, and by the strong performance of classic MFCCs on the same task [5], we propose combining these ideas with the filtering approaches presented in [3] and [2] to build a new general-purpose frontend for audio processing chains that can run in resource-constrained environments.
The license is in the LICENSE file at the root of this repository.
- Check out submodules:
git submodule update --init --recursive
- Set up and activate the Python environment
Hint: use pyenv to manage Python versions. Before running the next lines, select Python 3.10:
pyenv local 3.10
Follow the pyenv instructions to set up your shell (zsh, bash).
As far as I understand, the torch version depends on the CUDA version, which you can find by running nvidia-smi.
Our current machines run CUDA 11.8, so you should install the torch build compiled for that version:
python -m pip install torch==2.1.0+cu118 torchvision==0.16.0+cu118 torchaudio==2.1.0+cu118 --extra-index-url --no-cache-dir
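The mapping from the CUDA version to the `+cu118` suffix can be sketched as below. This is a minimal illustration, not part of the repository: the helper names `cuda_tag` and `pinned` are hypothetical, and the sample header line is an illustrative stand-in for real `nvidia-smi` output. Note that `nvidia-smi` reports the highest CUDA version the driver supports, so treat the result as a starting point.

```python
import re


def cuda_tag(nvidia_smi_output: str) -> str:
    """Return a wheel tag such as 'cu118' from nvidia-smi header text."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_output)
    if match is None:
        raise ValueError("no 'CUDA Version' field found in nvidia-smi output")
    major, minor = match.groups()
    return f"cu{major}{minor}"


def pinned(package: str, version: str, tag: str) -> str:
    """Build a pip requirement with a local version tag, e.g. torch==2.1.0+cu118."""
    return f"{package}=={version}+{tag}"


# Illustrative header line (assumption); in practice, capture the real output,
# for example with subprocess.run(["nvidia-smi"], capture_output=True, text=True).
header = "| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |"

tag = cuda_tag(header)
specs = [
    pinned("torch", "2.1.0", tag),
    pinned("torchvision", "0.16.0", tag),
    pinned("torchaudio", "2.1.0", tag),
]
print(" ".join(specs))
```

The printed requirement strings match the package pins used in the install command above.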
(It is unclear to me how to generate requirements.txt so that it selects the right torch version...)
The proper way:
pip-compile -o requirements.txt
python -m piptools compile
From [1]:
Now that you have a requirements.txt, you can use pip-sync to update your virtual environment to reflect exactly what's in there. pip-sync is meant to be used only with a requirements.txt generated by pip-compile.
pip-sync requirements.txt
To start a training run:
python wavefront/ -c wavefront/config_model_dataset.yaml
for example:
python wavefront/ -c wavefront/config_MFCCNet_ESC50.yaml
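The config naming pattern above (`config_model_dataset.yaml`) can be sketched as a small helper that assembles the training command for a given model and dataset. The function names `config_path` and `training_command` are hypothetical and not part of the repository; the sketch only builds the command and does not check that the config file exists.

```python
from pathlib import Path


def config_path(model: str, dataset: str, package_dir: str = "wavefront") -> Path:
    """Return the config file path following the config_<model>_<dataset>.yaml pattern."""
    return Path(package_dir) / f"config_{model}_{dataset}.yaml"


def training_command(model: str, dataset: str, package_dir: str = "wavefront") -> list:
    """Assemble the argument list for a training run (suitable for subprocess.run)."""
    cfg = config_path(model, dataset, package_dir).as_posix()
    return ["python", f"{package_dir}/", "-c", cfg]


cmd = training_command("MFCCNet", "ESC50")
print(" ".join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would launch the run from the repository root, assuming the environment is already activated.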
To monitor training progress, run in the shell:
tensorboard --bind_all --logdir lightning_logs
NOTE: To train on TIMIT, we preprocessed the dataset as proposed in [2].
[2] Mirco Ravanelli and Yoshua Bengio. Speaker recognition from raw waveform with SincNet. In 2018 IEEE Spoken Language Technology Workshop (SLT), pages 1021–1028, 2018.
[3] Neil Zeghidour, Olivier Teboul, Félix de Chaumont Quitry, and Marco Tagliasacchi. LEAF: A learnable frontend for audio classification, 2021.
[4] Junchao Fan, Shizhan Tang, Han Duan, Xiuli Bi, Bin Xiao, Weisheng Li, and Xinbo Gao. Le-LWTNet: A learnable lifting wavelet convolutional neural network for heart sound abnormality detection. IEEE Transactions on Instrumentation and Measurement, 72:1–14, 2023.
[5] Vibha Tiwari. MFCC and its applications in speaker recognition. International Journal on Emerging Technologies, 1(1):19–22, 2010.