To continue test wav? #11

Open
lukezos opened this issue Mar 27, 2017 · 5 comments

lukezos commented Mar 27, 2017

First: your work is absolutely awesome! The fact that the model can generate a reasonable-sounding signal for many seconds (hundreds of thousands of samples) is impressive.

Second: my question.
I have successfully trained models on a music dataset. I would like to see (hear) the continuation of a given wav file generated by the model, basically to find out how well, and for how long, the model is able to continue the input sound.
Please give me a hint on how to do that.
Thanks!
Lukas

Cortexelus commented Mar 27, 2017 via email


lukezos commented Mar 27, 2017

Thank you; however, I think this will just generate a longer sequence initialised by the following (from two_tier.py):

# First half zero, others fixed random at each checkpoint
h0 = numpy.zeros(
    (N_SEQS - fixed_rand_h0.shape[0], N_RNN, H0_MULT * DIM),
    dtype='float32'
)
h0 = numpy.concatenate((h0, fixed_rand_h0), axis=0)

My point is instead to continue a "real" sequence (i.e. one taken from the test/validate/train npy file) and see how well the model is able to continue the current note, tempo, etc. (for a music dataset).
Should I just replace the above initialisation with feeding in a "real" sequence?

thanks,
Lukas

kundan2510 (Collaborator) commented

Basically, you'll use your audio to compute the hidden states of the RNN and then use them as the initial hidden state when you start generating.

This would amount to inserting a loop like https://github.com/soroushmehr/sampleRNN_ICLR2017/blob/master/models/three_tier/three_tier.py#L685 before that loop, except that samples here will contain the audio you already have (with all preprocessing applied) and you will omit this line (https://github.com/soroushmehr/sampleRNN_ICLR2017/blob/master/models/three_tier/three_tier.py#L702), i.e. you will not update your seeded audio, only collect the updated hidden states.

Then, you can use the generation loop with the new hidden states to generate audio.
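
A rough numpy sketch of this approach, purely to illustrate the control flow: step_fn below is a hypothetical stand-in for the compiled Theano frame-level function (the real one in three_tier.py takes previous samples and hidden states and returns updated ones), and names like seed_audio, DIM, and FRAME_SIZE are illustrative, not taken from the repo.

import numpy as np

# Hypothetical stand-in for the compiled frame-level function: it consumes one
# frame of samples plus the previous hidden state and returns (predictions, new_h0).
def step_fn(frame, h0):
    new_h0 = np.tanh(h0 + frame.mean(axis=1, keepdims=True))  # dummy state update
    predictions = np.zeros_like(frame)                         # discarded below
    return predictions, new_h0

N_SEQS, FRAME_SIZE, DIM = 4, 16, 512
seed_audio = np.random.randint(0, 256, size=(N_SEQS, 4000)).astype('float32')

# Warm up the hidden state on the seeded audio instead of starting from
# zeros / fixed random values.
h0 = np.zeros((N_SEQS, DIM), dtype='float32')
for start in range(0, seed_audio.shape[1] - FRAME_SIZE + 1, FRAME_SIZE):
    frame = seed_audio[:, start:start + FRAME_SIZE]
    # Unlike the generation loop, only the updated hidden state is kept;
    # the predictions are thrown away and the seed is never overwritten.
    _, h0 = step_fn(frame, h0)

# h0 now summarises the seed; pass it into the generation loop as the initial hidden state.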

Alternatively, you can concatenate your audio before the zeros in the samples array, but when running the generation loop you will not update the samples array for timesteps that correspond to the seeded audio.
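
A minimal sketch of this alternative, assuming a hypothetical generate_fn that stands in for one step of sample-level generation (the real code uses the compiled Theano functions and frame-level conditioning); seed_audio, LENGTH, and FRAME_SIZE are again illustrative names.

import numpy as np

# Hypothetical stand-in for one step of sample-level generation; here it just
# returns random 8-bit samples so the sketch runs end to end.
def generate_fn(context):
    return np.random.randint(0, 256, size=(context.shape[0],)).astype('int32')

N_SEQS, LENGTH, FRAME_SIZE = 4, 16000, 16
seed_audio = np.random.randint(0, 256, size=(N_SEQS, 4000)).astype('int32')
seed_len = seed_audio.shape[1]

# Samples buffer: the (preprocessed) seed goes first, zeros after it.
samples = np.zeros((N_SEQS, LENGTH), dtype='int32')
samples[:, :seed_len] = seed_audio

for t in range(FRAME_SIZE, LENGTH):
    context = samples[:, t - FRAME_SIZE:t]
    predicted = generate_fn(context)
    # Key point: only overwrite timesteps after the seed; for seeded timesteps
    # the real audio is kept and merely drives the model's state forward.
    if t >= seed_len:
        samples[:, t] = predicted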


lukezos commented Mar 29, 2017

Hi!

Thank you!
I went with the second option: "Alternatively, you can concatenate your audio before the zeros in the samples array, but when running the generation loop you will not update the samples array for timesteps that correspond to the seeded audio."

For best results, what should the length of the seeded audio be, with the default running parameters for the three_tier and two_tier models?

kundan2510 (Collaborator) commented

The longer, the better. In my opinion, around 1-2 seconds should be sufficient to capture the texture of the audio; however, it depends on many other things, like the kind of data the model was originally trained on.
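
For concreteness, since the samples buffer is indexed in samples rather than seconds, that recommendation translates roughly as follows (assuming 16 kHz audio, as used in the SampleRNN paper; adjust to your data's sample rate):

sample_rate = 16000                      # assumed 16 kHz audio
seed_seconds = 2
seed_len = seed_seconds * sample_rate    # 32000 samples of seeded audio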
