GitHub - LqNoob/MelodyExtraction-MCDNN: ISMIR2016: Melody extraction on vocal segments using multi-column deep neural networks

Repository files:
  SAVE_RESULTS
  ex_training
  model
  viterbi
  Abstract_KON1.pdf
  MelodyExtraction_SCDNN.py
  README.txt
  VAD_DNN.py
  main.py
  making_multi_frame.py
  making_multi_frame_VAD.py
  melody_extraction_KON1.pdf
  myFeatureExtraction.py
  mySelect_weight.py
  pop1.wav
  viterbi.py

README.txt

============================================================
** Contact Info 
============================================================
Sangeun Kum <[email protected]>
Changheun Oh <[email protected]>
Juhan Nam <[email protected]>

Korea Advanced Institute of Science and Technology 

============================================================
** Description 
============================================================
This is our submission to the 2016 MIREX melody extraction task.
The algorithm is a classification-based approach using deep neural networks.
The file 'main.py' is the entry point for calling the algorithm.
It takes a parameter and the full path strings of the input file and the output file.
For details about the algorithm,
please see https://wp.nyu.edu/ismir2016/wp-content/uploads/sites/2294/2016/07/119_Paper.pdf

============================================================
** Platform and Requirements
============================================================
1. OS : LINUX 

2. Programming language : Python 2.7

3. Python Library : 
  1) Keras (Deep Learning library for Theano)
    >> http://keras.io/
  
  2) Theano (Backend of Keras)
    >> http://deeplearning.net/software/theano/install.html#install
    
  3) Librosa (for audio analysis such as loading, STFT, and resampling)
    >> http://librosa.github.io/librosa/

  4) ffmpeg 
    >> https://www.ffmpeg.org/
    >> to install : brew install ffmpeg

  5) Numpy, SciPy

4. Hardware
  1) GPU : GeForce GTX 980 
    >> https://developer.nvidia.com/cuda-toolkit

5. Expected runtime : 2~3 seconds/song 
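
The library requirements above can be checked before running the algorithm. The following is a minimal sketch, not part of the submission: the helper name find_missing_modules is hypothetical, and the import names in REQUIRED_MODULES are assumed to match the libraries listed above. It does not check for ffmpeg or the GPU.

```python
import importlib

# Import names assumed for the Python libraries listed in the requirements.
REQUIRED_MODULES = ["keras", "theano", "librosa", "numpy", "scipy"]


def find_missing_modules(module_names):
    """Return the names in module_names that cannot be imported."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            # The module is not installed (or not on the import path).
            missing.append(name)
    return missing
```

Calling find_missing_modules(REQUIRED_MODULES) returns an empty list when every listed library can be imported.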
     
============================================================
** Use 
============================================================
The algorithm is called as follows: 

(to call from the command line)
>>python main.py <parameter> <input path> <output path>
ex) >>python main.py 0.2 '/home/keums/Melody/dataset/adc2004_full_set/file/pop3.wav' './SAVE_RESULTS/pop3.txt'

or

(to call from the Python shell)
>>main(param = 0.2, PATH_LOAD_FILE='/home/keums/Melody/dataset/adc2004_full_set/file/pop4.wav', PATH_SAVE_FILE='./SAVE_RESULTS/pop4.txt')

** default param = 0.2
If the voice recall rate is low, increasing param may help (0 <= param <= 1).
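
When calling main.py from another script, it can help to check the parameter range first. The following is a minimal sketch, not part of the submission: the helper name build_command is hypothetical, and it only assembles the command-line form shown above after checking that 0 <= param <= 1.

```python
def build_command(param, input_path, output_path, script="main.py"):
    """Build the argument list for:
    python main.py <parameter> <input path> <output path>
    """
    param = float(param)
    # The README states the valid range as 0 <= param <= 1.
    if not 0.0 <= param <= 1.0:
        raise ValueError("param must satisfy 0 <= param <= 1, got %r" % param)
    return ["python", script, str(param), input_path, output_path]
```

The returned list can be passed to subprocess.call, for example subprocess.call(build_command(0.2, 'pop1.wav', './SAVE_RESULTS/pop1.txt')).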