This repository accompanies the paper Musical Tempo and Key Estimation using Convolutional Neural Networks with Directional Filters in order to improve reproducibility of the reported results.
If you just want to estimate tempo or key values using models from the paper, please take a look at the tempo-cnn and key-cnn repos. They host pre-trained models.
Unfortunately, because of size limitations imposed by GitHub as well as copyright issues, this repository does not contain all audio samples or extracted features. However, you can download the audio files from the sources listed below and extract the features yourself.
Download links:
- GTZAN
- Ballroom
- Extended Ballroom
- GiantSteps Key
- GiantSteps Tempo
- GiantSteps MTG Key and Tempo
- LMD Key and Tempo
Should you use any of the datasets in your academic work, please cite the corresponding publications.
All necessary ground truth annotations are in the annotations folder. For easy parsing they are formatted in a simple tab separated values (.tsv) format, with columns id \t bpm \t key \t genre \n. The class GroundTruth is capable of reading and interpreting these files.
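If you want to read the annotation files without the repository code, the column layout described above can be parsed with Python's standard library. The following is a minimal sketch, not the GroundTruth class itself: the function name read_annotations, the example rows, and the treatment of empty fields as missing annotations are assumptions made for illustration.

```python
import csv
import io


def read_annotations(tsv_file):
    """Read rows of id, bpm, key, genre from a tab-separated file object."""
    annotations = {}
    for row in csv.reader(tsv_file, delimiter="\t"):
        if not row or not row[0].strip():
            continue  # skip blank lines
        # Pad short rows so that missing trailing columns become empty strings.
        sample_id, bpm, key, genre = (row + [""] * 4)[:4]
        annotations[sample_id] = {
            # Assumption: an empty field means that the annotation is missing.
            "bpm": float(bpm) if bpm.strip() else None,
            "key": key.strip() or None,
            "genre": genre.strip() or None,
        }
    return annotations


# Hypothetical example rows; the real files are in the annotations folder.
example = io.StringIO("track_a\t120.0\tA minor\ttechno\ntrack_b\t\tC major\tpop\n")
annotations = read_annotations(example)
print(annotations["track_a"])
```

For a file on disk, pass an open file object instead, e.g. `open("annotations/key_train.tsv", newline="")`.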
In a clean Python 3.5/3.6 environment:
git clone https://github.com/hendriks73/directional_cnns.git
cd directional_cnns
python setup.py install
To extract features, you can use the code in feature_extraction.py or the command line script mentioned below.
Depending on how you define sample identifiers, you may need to make some manual adjustments.
The created .joblib files are simple dictionaries, containing strings as keys and spectrograms as values.
Note that the extracted spectrograms for the key and the tempo task differ (CQT vs Mel).
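To check what an extracted feature file contains, you can load it with joblib and look at the stored spectrograms. This is a small sketch under assumptions: the helper names load_features and summarize_features are hypothetical, and the spectrograms are assumed to be array-like objects with a shape attribute (for example NumPy arrays).

```python
def load_features(path):
    """Load a feature dictionary from a .joblib file."""
    # Third-party dependency, imported lazily so that the helper below
    # can be used without joblib being installed.
    import joblib

    return joblib.load(path)


def summarize_features(features):
    """Map each sample id to the shape of its spectrogram.

    Assumption: spectrograms expose a `shape` attribute; values that do
    not are reported as None.
    """
    return {
        sample_id: getattr(spectrogram, "shape", None)
        for sample_id, spectrogram in features.items()
    }
```

A call such as `summarize_features(load_features("features/giantsteps_key.joblib"))` should then list one shape per sample identifier, which makes it easy to confirm whether a file holds CQT or Mel spectrograms of the expected size.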
After installation, you may run the extraction using the following command line script:
directional_cnn_extraction -a AUDIO_FILES_FOLDER [-g GROUND_TRUTH.tsv]
The ground truth file is optional. If given, only files that also occur in the ground truth are added to the created feature .joblib files.
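The effect of the optional ground truth file can be illustrated as a simple filter over the audio files. The sketch below is not the extraction script's actual code: the function name select_audio_files is hypothetical, and it assumes that a sample identifier is the file name without its extension, which may differ from how you define identifiers.

```python
from pathlib import Path


def select_audio_files(audio_paths, ground_truth_ids=None):
    """Keep only paths whose identifier occurs in the ground truth, if given."""
    selected = {}
    for path in audio_paths:
        # Assumption: the sample identifier is the file name without extension.
        sample_id = Path(path).stem
        if ground_truth_ids is None or sample_id in ground_truth_ids:
            selected[sample_id] = path
    return selected


# Hypothetical file names, for illustration only.
paths = ["audio/track_a.mp3", "audio/track_b.wav", "audio/track_c.mp3"]
filtered = select_audio_files(paths, ground_truth_ids={"track_a", "track_c"})
unfiltered = select_audio_files(paths)
print(sorted(filtered))
```

Without a ground truth, every file is kept. With one, files that have no annotation are skipped before any features are computed.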
You can run the code either locally or on Google ML Engine.
Running this locally only makes sense on a GPU, and even then it will take a very long time.
To run the training/reporting locally, you can execute the script training.py or the command line script mentioned below with the following arguments (example for key):
--job-dir=./
--model-dir=./
--train-file=annotations/key_train.tsv --valid-file=annotations/key_valid.tsv
--test-files=annotations/giantsteps-key.tsv,annotations/gtzan_key.tsv,annotations/lmd_key_test.tsv
--feature-files=features/giantsteps_key.joblib,features/mtg_tempo_key.joblib,features/gtzan_key.joblib,features/lmd_key.joblib
After installation, you may run the training code using the following command line script:
directional_cnn_training [arguments]
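Because the test and feature file arguments are comma-separated lists, it can be convenient to assemble the command programmatically. The following sketch only builds and prints the command line for the key example above. The helper name build_training_command is hypothetical; the flags and file names are the ones listed in this README.

```python
import shlex


def build_training_command(job_dir, model_dir, train_file, valid_file,
                           test_files, feature_files):
    """Assemble the argument list for the directional_cnn_training script."""
    return [
        "directional_cnn_training",
        "--job-dir=" + job_dir,
        "--model-dir=" + model_dir,
        "--train-file=" + train_file,
        "--valid-file=" + valid_file,
        # Multiple files are passed as one comma-separated value.
        "--test-files=" + ",".join(test_files),
        "--feature-files=" + ",".join(feature_files),
    ]


command = build_training_command(
    job_dir="./",
    model_dir="./",
    train_file="annotations/key_train.tsv",
    valid_file="annotations/key_valid.tsv",
    test_files=[
        "annotations/giantsteps-key.tsv",
        "annotations/gtzan_key.tsv",
        "annotations/lmd_key_test.tsv",
    ],
    feature_files=[
        "features/giantsteps_key.joblib",
        "features/mtg_tempo_key.joblib",
        "features/gtzan_key.joblib",
        "features/lmd_key.joblib",
    ],
)
# Print a shell-safe version of the command for inspection.
print(" ".join(shlex.quote(part) for part in command))
```

To actually start the training from Python, the list can be handed to `subprocess.run(command, check=True)` once the package is installed and the feature files exist.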
To run the training/reporting remotely on Google ML Engine, you first need to sign up, upload all necessary feature and annotation files to Google storage, and then adapt the provided scripts trainandpredict_key_ml_engine.sh and trainandpredict_tempo_ml_engine.sh accordingly.
This repository is licensed under CC BY 3.0. For attribution, please cite:
Hendrik Schreiber and Meinard Müller, Musical Tempo and Key Estimation using Convolutional Neural Networks with Directional Filters, In Proceedings of the Sound and Music Computing Conference (SMC), Málaga, Spain, May 2019.