Inspired by the Le-LWTNet proposal [4], which successfully applies a lifting wavelet transform for feature extraction, and by the strong performance of classic MFCCs on the same task [5], we combine these ideas with the filtering approaches presented in [2] and [3] to build a new general frontend for audio processing chains that can run in resource-constrained environments.
The license is contained in the LICENSE file, available in the root of this repository.
- Check out submodules:
git submodule update --init --recursive
- Set up and activate the Python environment
Hint: use pyenv to manage Python versions. Follow the instructions at https://github.com/pyenv/pyenv to set up your shell (zsh, bash), then run
pyenv local 3.10
before running the next lines.
Note that the torch build depends on the CUDA version, which you can find by running nvidia-smi.
In our case, the current machines run CUDA 11.8, so you should install torch compiled for that version:
python -m pip install torch==2.1.0+cu118 torchvision==0.16.0+cu118 torchaudio==2.1.0+cu118 --extra-index-url https://download.pytorch.org/whl/cu118 --no-cache-dir
(Note: it is not obvious how to generate requirements.txt from requirements.in so that it resolves the right torch build.)
The proper way:
pip-compile -o requirements.txt requirements.in
or
python -m piptools compile requirements.in
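One way to make pip-compile resolve the CUDA-specific wheels is to declare the extra index inside requirements.in itself. A minimal sketch for the CUDA 11.8 case (package versions copied from the install command above; adjust them to your setup):

```
# requirements.in -- sketch for CUDA 11.8; adjust the cu118 suffix and the
# index URL to match the CUDA version reported by nvidia-smi.
--extra-index-url https://download.pytorch.org/whl/cu118
torch==2.1.0+cu118
torchvision==0.16.0+cu118
torchaudio==2.1.0+cu118
```

Depending on your pip-tools version, you may need to pass --emit-index-url to pip-compile so that the index URL is written into the generated requirements.txt.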
From [1]: "Now that you have a requirements.txt, you can use pip-sync to update your virtual environment to reflect exactly what's in there. pip-sync is meant to be used only with a requirements.txt generated by pip-compile."
Do:
pip-sync requirements.txt
To run a training:
python wavefront/wavefront_main.py -c wavefront/config_model_dataset.yaml
For example:
python wavefront/wavefront_main.py -c wavefront/config_MFCCNet_ESC50.yaml
To log and inspect training progress, run in your shell:
tensorboard --bind_all --logdir lightning_logs
NOTE: To train on TIMIT, we preprocessed the dataset as proposed in [2].
[1] https://pypi.org/project/pip-tools/
[2] Mirco Ravanelli and Yoshua Bengio. Speaker recognition from raw waveform with SincNet. In 2018 IEEE Spoken Language Technology Workshop (SLT), pages 1021–1028, 2018. https://github.com/mravanelli/SincNet
[3] Neil Zeghidour, Olivier Teboul, Félix de Chaumont Quitry, and Marco Tagliasacchi. LEAF: A learnable frontend for audio classification. In International Conference on Learning Representations (ICLR), 2021.
[4] Junchao Fan, Shizhan Tang, Han Duan, Xiuli Bi, Bin Xiao, Weisheng Li, and Xinbo Gao. Le-LWTNet: A learnable lifting wavelet convolutional neural network for heart sound abnormality detection. IEEE Transactions on Instrumentation and Measurement, 72:1–14, 2023.
[5] Vibha Tiwari. MFCC and its applications in speaker recognition. International Journal on Emerging Technologies, 1(1):19–22, 2010.