CS337 (AI-ML) course project | Music Transcription


tones2notes

Automatic Music Transcription (AMT) is the task of transcribing a given audio recording into a symbolic representation (musical notes or MIDI). In this project, the goal is to transcribe musical recordings into note events, each with a pitch, onset, offset, and velocity. The task is challenging due to the high polyphony of musical pieces and requires careful preprocessing of the audio files. We have implemented and evaluated deep learning models for music transcription. The architectural design of the models and the data processing techniques are based on this paper.
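For concreteness, each transcribed note can be thought of as a (pitch, onset, offset, velocity) tuple. The sketch below is our own illustration of such a note event; the class and field names are assumptions, not the repository's actual data structures:

```python
from dataclasses import dataclass

# Illustrative only: a minimal container for the note events an AMT model
# predicts. Names here are our own, not the repository's classes.
@dataclass
class NoteEvent:
    pitch: int      # MIDI pitch number (piano range is 21-108)
    onset: float    # onset time in seconds
    offset: float   # offset time in seconds
    velocity: int   # MIDI velocity, 0-127

    def duration(self) -> float:
        return self.offset - self.onset

# Example: middle C (MIDI 60) held for half a second at moderate loudness.
event = NoteEvent(pitch=60, onset=1.0, offset=1.5, velocity=80)
print(event.duration())  # 0.5
```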

Running Instructions

  • The dataset used is MAPS, which can be downloaded from here. After downloading it, store it in data/MAPS
  • Install the required Python packages:
    pip install -r requirements.txt 
    
  • Load the dataset, split it, and store it as .h5 binaries:
    python3 features.py --dir data/MAPS --workspace $(pwd)
    
  • Train the model (this covers both feature processing and training):
    python3 src/main.py train --model_type='CRNN_Conditioning' --loss_type='regress_onset_offset_frame_velocity_bce' --batch_size=8 --max_note_shift=0 --learning_rate=5e-4 --reduce_iteration=10000 --resume_iteration=0 --early_stop=50000 --workspace=$(pwd) --cuda
    
    We have implemented 3 models; choose model_type from ['CRNN', 'CCNN', 'CRNN_Conditioning']. There are also 2 loss functions available (regressed and non-regressed); refer to the comments in run.sh for more details. The trained model will be stored as checkpoints in the checkpoints folder, with training statistics in the statistics folder.
  • Infer the output probabilities on the test dataset and store them in the probs folder:
    python3 src/results.py infer_prob --model_type='CRNN_Conditioning' --checkpoint_path=$CHECKPOINT_PATH --dataset='maps' --split='test' --post_processor_type='regression'  --workspace=$WORKSPACE --cuda 
    
  • Evaluate on the test dataset:
    python3 src/results.py calculate_metrics --model_type='CRNN_Conditioning' --dataset='maps' --split='test' --post_processor_type='regression' --workspace=$WORKSPACE 
    
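The calculate_metrics step scores predicted notes against the ground truth. A common note-level criterion (used by tools such as mir_eval) counts a predicted note as correct if its pitch matches a reference note and its onset falls within a 50 ms tolerance. The sketch below is our own simplified illustration of this matching, not the repository's evaluation code:

```python
def note_f1(ref, est, onset_tol=0.05):
    """Greedy one-to-one matching of (pitch, onset) pairs.

    ref, est: lists of (midi_pitch, onset_seconds) tuples.
    A predicted note matches a reference note if pitches agree and
    onsets differ by at most onset_tol seconds (50 ms by default).
    """
    matched = 0
    used = [False] * len(ref)
    for pitch_e, onset_e in est:
        for i, (pitch_r, onset_r) in enumerate(ref):
            if not used[i] and pitch_r == pitch_e and abs(onset_r - onset_e) <= onset_tol:
                used[i] = True
                matched += 1
                break
    precision = matched / len(est) if est else 0.0
    recall = matched / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

ref = [(60, 1.00), (64, 1.50), (67, 2.00)]
est = [(60, 1.02), (64, 1.60)]   # second prediction is 100 ms late: no match
print(note_f1(ref, est))         # (0.5, 0.3333..., 0.4)
```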

Also, there are some result plots in notebooks/plots.ipynb, and a piano roll with the MIDI notes of a transcribed audio in transcription_plots.ipynb
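A piano roll is simply a time-by-pitch grid in which active notes are marked. The sketch below is our own minimal illustration of rasterising note events into such a grid (the frame rate and 88-key piano range are illustrative assumptions, not values taken from the repository):

```python
import numpy as np

def to_piano_roll(notes, frames_per_second=100, num_keys=88, lowest_pitch=21):
    """Rasterise (midi_pitch, onset_s, offset_s) note events into a binary
    [num_frames, num_keys] piano roll. Assumes the piano range A0 (21) to C8 (108)."""
    max_offset = max(offset for _, _, offset in notes)
    num_frames = int(np.ceil(max_offset * frames_per_second)) + 1
    roll = np.zeros((num_frames, num_keys), dtype=np.float32)
    for pitch, onset, offset in notes:
        key = pitch - lowest_pitch
        start = int(round(onset * frames_per_second))
        end = int(round(offset * frames_per_second))
        roll[start:end, key] = 1.0
    return roll

# Two overlapping notes: C4 for 0.5 s and E4 for 0.5 s starting 0.25 s later.
notes = [(60, 0.0, 0.5), (64, 0.25, 0.75)]
roll = to_piano_roll(notes)
print(roll.shape)       # (76, 88)
print(int(roll.sum()))  # 100 active frame-cells: 50 + 50
```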

Transcribing a Given Audio

python3 src/transcribe_and_play.py --audio_file <name of audio file>

It transcribes the given audio into MIDI using the best checkpointed model, generates the MIDI file, and also generates a video of the notes being played, corresponding to the MIDI, using the synthviz library. Note that transcription requires an ffmpeg backend and therefore does not work on gpu1.cse.iitb.ac.in unless you install ffmpeg with sudo permissions
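When a MIDI file is written, note times in seconds have to be converted to MIDI ticks at some tempo and resolution. The back-of-the-envelope sketch below is our own illustration of that conversion; the tempo and ticks-per-beat values are illustrative defaults, not the repository's settings:

```python
def seconds_to_ticks(seconds, bpm=120, ticks_per_beat=480):
    """Convert a time in seconds to MIDI ticks at a fixed tempo.

    One beat lasts 60 / bpm seconds and spans ticks_per_beat ticks.
    """
    seconds_per_beat = 60.0 / bpm
    return round(seconds / seconds_per_beat * ticks_per_beat)

# At 120 BPM a beat is 0.5 s, so 1.0 s = 2 beats = 960 ticks.
print(seconds_to_ticks(1.0))   # 960
print(seconds_to_ticks(0.25))  # 240
```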

Transcription Results

  • Piano roll comparison for an audio from the MAPS test dataset

  • Piano roll of the L theme (Death Note)

  • Für Elise. The original music is this

    fur_elise_transcripted.mp4
  • L theme (Death Note). The original music is this

    L_original_transcripted.mp4
  • Nezuko Theme (Demon Slayer). The original music is this

    nezuko_transcripted.mp4
  • A musical piece from Aajkal tere mere pyar ke charche, played on the accordion. The original audio is this

    aajkal_transcripted.mp4
  • Nagin. Notice the considerable noise caused by multiple instruments playing together (polyphonic music)

    Nagin_transcripted.mp4

References

  • Qiuqiang Kong, Bochen Li, Xuchen Song, Yuan Wan, and Yuxuan Wang. "High-resolution Piano Transcription with Pedals by Regressing Onset and Offset Times." arXiv preprint arXiv:2010.01815 (2020).
  • The bytedance and kong repositories, for data processing techniques and model architecture
  • Valentin Emiya, Nancy Bertin, Bertrand David, Roland Badeau. MAPS - A piano database for multipitch estimation and automatic transcription of music
  • This repository, for information about the datasets and for understanding the transcription pipeline
