To continue test wav? #11

Open
lukezos opened this issue Mar 27, 2017 · 5 comments

lukezos commented Mar 27, 2017

First: your work is absolutely awesome! The fact that the model can generate a reasonable-sounding signal for many seconds (hundreds of thousands of samples) is impressive.

Second: my question.
I have successfully trained models on a music dataset. I would like to see (hear) the continuation of a given wav file generated by the model, basically to find out how well, and for how long, the model is able to continue the input sound.
Please give me a hint on how to do that.
Thanks!
Lukas

Cortexelus commented Mar 27, 2017 via email


lukezos commented Mar 27, 2017

Thank you; however, I think this will just generate a longer sequence initialised by the following (from two_tier.py):

# First half zero, others fixed random at each checkpoint
h0 = numpy.zeros(
    (N_SEQS - fixed_rand_h0.shape[0], N_RNN, H0_MULT * DIM),
    dtype='float32'
)
h0 = numpy.concatenate((h0, fixed_rand_h0), axis=0)

My point is instead to continue a "real" sequence (i.e. one taken from the test/validate/train npy file) and see how well the model is able to continue the current note, tempo, etc. (for a music dataset).
Should I just replace the above initialisation with feeding in a "real" sequence?

thanks,
Lukas

kundan2510 (Collaborator) commented

Basically, you'll use your audio to compute the hidden states of the RNN and then use them as the initial hidden state when you start generating.

This would amount to inserting a loop like https://github.com/soroushmehr/sampleRNN_ICLR2017/blob/master/models/three_tier/three_tier.py#L685 before that loop, except that samples here will contain the audio you already have (with all preprocessing applied) and you will omit this line (https://github.com/soroushmehr/sampleRNN_ICLR2017/blob/master/models/three_tier/three_tier.py#L702), i.e. you will not update your seeded audio, only collect the updated hidden states.

Then, you can use the generation loop with the new hidden states to generate audio.
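
A rough numpy sketch of this approach, purely to illustrate the control flow: step_fn below is a hypothetical stand-in for the compiled Theano frame-level function (the real one in three_tier.py takes previous samples and hidden states and returns updated ones), and names like seed_audio, DIM, and FRAME_SIZE are illustrative, not taken from the repo.

import numpy as np

# Hypothetical stand-in for the compiled frame-level function: it consumes one
# frame of samples plus the previous hidden state and returns (predictions, new_h0).
def step_fn(frame, h0):
    new_h0 = np.tanh(h0 + frame.mean(axis=1, keepdims=True))  # dummy state update
    predictions = np.zeros_like(frame)                         # discarded below
    return predictions, new_h0

N_SEQS, FRAME_SIZE, DIM = 4, 16, 512
seed_audio = np.random.randint(0, 256, size=(N_SEQS, 4000)).astype('float32')

# Warm up the hidden state on the seeded audio instead of starting from
# zeros / fixed random values.
h0 = np.zeros((N_SEQS, DIM), dtype='float32')
for start in range(0, seed_audio.shape[1] - FRAME_SIZE + 1, FRAME_SIZE):
    frame = seed_audio[:, start:start + FRAME_SIZE]
    # Unlike the generation loop, only the updated hidden state is kept;
    # the predictions are thrown away and the seed is never overwritten.
    _, h0 = step_fn(frame, h0)

# h0 now summarises the seed; pass it into the generation loop as the initial hidden state.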

Alternatively, you can concatenate your audio before the zeros in the samples array, but when running the generation loop you will not update the samples array for timesteps that correspond to the seeded audio.
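
A minimal sketch of this alternative, assuming a hypothetical generate_fn that stands in for one step of sample-level generation (the real code uses the compiled Theano functions and frame-level conditioning); seed_audio, LENGTH, and FRAME_SIZE are again illustrative names.

import numpy as np

# Hypothetical stand-in for one step of sample-level generation; here it just
# returns random 8-bit samples so the sketch runs end to end.
def generate_fn(context):
    return np.random.randint(0, 256, size=(context.shape[0],)).astype('int32')

N_SEQS, LENGTH, FRAME_SIZE = 4, 16000, 16
seed_audio = np.random.randint(0, 256, size=(N_SEQS, 4000)).astype('int32')
seed_len = seed_audio.shape[1]

# Samples buffer: the (preprocessed) seed goes first, zeros after it.
samples = np.zeros((N_SEQS, LENGTH), dtype='int32')
samples[:, :seed_len] = seed_audio

for t in range(FRAME_SIZE, LENGTH):
    context = samples[:, t - FRAME_SIZE:t]
    predicted = generate_fn(context)
    # Key point: only overwrite timesteps after the seed; for seeded timesteps
    # the real audio is kept and merely drives the model's state forward.
    if t >= seed_len:
        samples[:, t] = predicted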


lukezos commented Mar 29, 2017

Hi!

Thank you!
I went with the second option: "Alternatively, you can concatenate your audio before the zeros in the samples array, but when running the generation loop you will not update the samples array for timesteps that correspond to the seeded audio."

For best results, what should the length of the seeded audio be, with the default running parameters for the three_tier and two_tier models?

kundan2510 (Collaborator) commented

The longer, the better. In my opinion, around 1-2 seconds should be sufficient to capture the texture of the audio; however, it depends on many other things, like the kind of data the model was originally trained on.
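
For concreteness, since the samples buffer is indexed in samples rather than seconds, that recommendation translates roughly as follows (assuming 16 kHz audio, as used in the SampleRNN paper; adjust to your data's sample rate):

sample_rate = 16000                      # assumed 16 kHz audio
seed_seconds = 2
seed_len = seed_seconds * sample_rate    # 32000 samples of seeded audio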
