The goal of this project is to learn how to apply machine learning techniques to music generation. In this project, I trained and deployed two RNN models with different configurations using a dataset of pop/electronic melodies. The piano melodies were extracted from songs in MIDI format and converted into note sequences using one-hot encoding. The trained models are capable of generating monophonic melodies given a primer melody. The coolest part of the project is interacting with the model through Magenta's MIDI interface in Ableton, which lets you generate AI music from melodies played in real time.
```bash
# Create a new environment for Magenta with Python 3.6.x as the interpreter
conda create --name magenta python=3.6
# Then activate it
conda activate magenta
# Then install Magenta 2.1.2 and its dependencies
pip install magenta==2.1.2 visual_midi tables
```
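A quick way to sanity-check the install is to load a MIDI file into a NoteSequence, the same representation the melodies are converted into later. This is only a minimal sketch; the file path is a placeholder.

```python
# Quick sanity check of the install, using the note_seq package that
# Magenta 2.x depends on. The MIDI path is a placeholder.
import note_seq

ns = note_seq.midi_file_to_note_sequence('data/example.mid')
print(len(ns.notes), 'notes spanning', ns.total_time, 'seconds')
```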
In this project I'll use the "Lakh MIDI Dataset v0.1" and matched content from the "Million Song Dataset."
I'll fetch each song's genre using the Last.fm API, as in the sketch below.
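Here is a rough sketch of how that lookup can work against Last.fm's track.getTopTags endpoint. The API key is a placeholder and the artist/title values are examples; real code would add error handling and rate limiting.

```python
# Sketch of a genre lookup via Last.fm's track.getTopTags endpoint.
# API_KEY is a placeholder; artist/title are example values.
import requests

API_KEY = 'YOUR_LASTFM_API_KEY'

def top_tags(artist, title):
    params = {
        'method': 'track.gettoptags',
        'artist': artist,
        'track': title,
        'api_key': API_KEY,
        'format': 'json',
    }
    resp = requests.get('http://ws.audioscrobbler.com/2.0/', params=params)
    resp.raise_for_status()
    tags = resp.json().get('toptags', {}).get('tag', [])
    return [t['name'].lower() for t in tags]

print(top_tags('Mark Ronson', 'Uptown Funk'))  # e.g. ['funk', 'pop', ...]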
- LMD-matched - A subset of 45,129 files from LMD-full which have been matched to entries in the Million Song Dataset.
- Match scores - A JSON file which lists the match confidence score for every match in LMD-matched and LMD-aligned (a loading sketch follows this list).
- Dataset not provided in this repo
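A minimal sketch of reading the match scores, assuming the layout documented on the LMD site: a dict mapping each MSD track ID to a dict of MIDI MD5 hashes and confidence scores.

```python
# Sketch of reading the LMD match scores, assuming the documented layout:
# {msd_track_id: {midi_md5: confidence_score, ...}, ...}
import json

with open('match_scores.json') as f:
    scores = json.load(f)

# Keep only the best-matching MIDI file for each MSD track.
best_match = {msd_id: max(md5s, key=md5s.get)
              for msd_id, md5s in scores.items()}
print(len(best_match), 'matched tracks')
```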
- Instrument Class of the entire dataset
- Extract only the Piano tracks of Pop and Electronic songs (a filtering sketch follows this list)
- Distribution of Piano lengths of Pop and Electronic Songs
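A rough sketch of the piano filter and length measurement, assuming pretty_midi for parsing. In General MIDI, programs 0-7 make up the piano class; the file path is a placeholder, and the genre filter is assumed to come from the Last.fm step above.

```python
# Sketch: keep only piano instruments (GM programs 0-7, non-drum) and
# measure a song's length for the distribution plot. Path is a placeholder.
import pretty_midi

def piano_tracks_and_length(midi_path):
    pm = pretty_midi.PrettyMIDI(midi_path)
    pianos = [inst for inst in pm.instruments
              if not inst.is_drum and 0 <= inst.program <= 7]
    return pianos, pm.get_end_time()

pianos, length = piano_tracks_and_length('data/example.mid')
print(len(pianos), 'piano track(s),', round(length, 1), 'seconds')
```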
- Melody RNN (basic configuration)
  - This configuration acts as a baseline for melody generation with an LSTM model. It uses basic one-hot encoding to represent extracted melodies as input to the LSTM. For training, all sequence examples are transposed to the MIDI pitch range [48, 84], and outputs will also be in this range. A small sketch of the encoding follows.
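A small sketch of that one-hot scheme, assuming the MelodyOneHotEncoding class from the note_seq package (the module path may differ slightly across versions).

```python
# Sketch of the basic one-hot scheme, assuming note_seq's
# MelodyOneHotEncoding (shipped as a Magenta dependency).
from note_seq import melody_encoder_decoder

# 36 pitches in [48, 84) plus two special events (no-event and note-off).
enc = melody_encoder_decoder.MelodyOneHotEncoding(min_note=48, max_note=84)

index = enc.encode_event(60)  # middle C -> its position in the one-hot vector
print(enc.num_classes, index, enc.decode_event(index))
```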
- Melody RNN (attention configuration)
  - Attention allows the model to more easily access past information without having to store that information in the RNN cell's state. This lets the model learn longer-term dependencies more easily, and results in melodies that have longer arching themes.
  - I'll generate melodies by priming both the baseline and attention models with the first 2.5 seconds of the main melody of "Uptown Funk" (see the generation sketch after this list).
  - Here you can see the primer MIDI and how the attention model was able to generate a longer arching theme from the primer.
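A hedged sketch of the priming step, following the pattern from Magenta's published example code. The bundle file, primer path, temperature, and generation length are assumptions; swapping 'attention_rnn' for 'basic_rnn' (with the matching bundle) reproduces the baseline run.

```python
# Sketch of priming a trained Melody RNN, following the pattern from
# Magenta's example code. Bundle and primer paths are placeholders.
import note_seq
from note_seq import sequences_lib
from note_seq.protobuf import generator_pb2
from magenta.models.melody_rnn import melody_rnn_sequence_generator
from magenta.models.shared import sequence_generator_bundle

bundle = sequence_generator_bundle.read_bundle_file('attention_rnn.mag')
generator = melody_rnn_sequence_generator.get_generator_map()['attention_rnn'](
    checkpoint=None, bundle=bundle)
generator.initialize()

# Keep only the first 2.5 seconds of the primer melody.
primer = note_seq.midi_file_to_note_sequence('uptown_funk_primer.mid')
primer = sequences_lib.extract_subsequence(primer, 0.0, 2.5)

# Ask the model to continue the primer for another 30 seconds.
options = generator_pb2.GeneratorOptions()
options.args['temperature'].float_value = 1.0
options.generate_sections.add(start_time=2.5, end_time=32.5)

sequence = generator.generate(primer, options)
note_seq.sequence_proto_to_midi_file(sequence, 'generated.mid')
```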