Understanding convolution kernels in dilation layers #392
Comments
Answering this for myself from looking through the literature: yes, it looks like there are in fact two distinct dilated convolutions passed to the gated activation unit; the original WaveNet paper diagrams appear misleading.
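A minimal sketch of that reading, written in PyTorch rather than the repository's TensorFlow (class and variable names are illustrative only): two separate dilated causal convolutions, one producing the filter and one the gate, combined by the gated activation unit.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedResidualSketch(nn.Module):
    """Illustrative only: two distinct dilated convolutions feed the gated activation."""
    def __init__(self, channels, dilation):
        super().__init__()
        # One dilated conv defines the 'filter' (tanh branch),
        # a second, separate dilated conv defines the 'gate' (sigmoid branch).
        self.filter_conv = nn.Conv1d(channels, channels, kernel_size=2, dilation=dilation)
        self.gate_conv = nn.Conv1d(channels, channels, kernel_size=2, dilation=dilation)
        self.dilation = dilation

    def forward(self, x):                  # x: (batch, channels, time)
        x = F.pad(x, (self.dilation, 0))   # left-pad so the convolution stays causal
        f = torch.tanh(self.filter_conv(x))
        g = torch.sigmoid(self.gate_conv(x))
        return f * g                       # gated activation unit
```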
@redwrasse, I agree that the original paper misses some details here and there. Take a look at (Gated) PixelCNN by WaveNet's main author (https://arxiv.org/pdf/1606.05328.pdf) and you will find that he "copies" the gated activation from there. Also, it seems like they stacked the filter and gate along the output channel dims to spare a conv1d. For the latter, have a look here.
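To illustrate the "stack along the output dims to spare a conv1d" point, a hedged sketch of what that fusion could look like (again PyTorch, names illustrative): a single dilated convolution emits 2×channels, which are then split into the filter and gate halves.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusedGatedSketch(nn.Module):
    """Illustrative only: one dilated conv produces both halves, split afterwards."""
    def __init__(self, channels, dilation):
        super().__init__()
        # A single conv with 2 * channels outputs replaces the two separate convs.
        self.conv = nn.Conv1d(channels, 2 * channels, kernel_size=2, dilation=dilation)
        self.dilation = dilation

    def forward(self, x):                  # x: (batch, channels, time)
        x = F.pad(x, (self.dilation, 0))   # causal left-padding
        out = self.conv(x)
        f, g = out.chunk(2, dim=1)         # split the channel dim into filter / gate
        return torch.tanh(f) * torch.sigmoid(g)
```

Mathematically this is the same computation as two separate convolutions; the fusion just saves one op call per layer.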
Thanks @cheind, I'll take a look. It's a side project I'd like to get back into.
@redwrasse, same for me :) I just figured that it works nicely on 2D images as well (without the special architecture of PixelCNN, just plain WaveNet with unrolled images). In addition, once you have the joint distribution the model estimates, you can start to query all kinds of things from the model (like, given a WaveNet conditioned on the speaker id, what is the probability that this speech was spoken by speaker X). In case you are interested, I have a fairly elaborate presentation + code here. The branch will be closed soon and merged to main, so I'll leave a permalink.
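On the "query the model" point, a hypothetical sketch of how one could score an utterance against different speaker conditionings; `model(x, speaker_id=...)` returning per-step log-probabilities is an assumed interface, not something from this repository.

```python
import torch

def most_likely_speaker(model, x, speaker_ids):
    """Hypothetical: pick the speaker whose conditioning gives the highest
    log p(x | speaker) under the model's autoregressive factorization.
    x is assumed to be a 1D LongTensor of quantized sample indices."""
    scores = []
    for s in speaker_ids:
        log_probs = model(x, speaker_id=s)      # assumed shape: (T, num_classes)
        targets = x[1:]                         # next-sample targets (class indices)
        ll = log_probs[:-1].gather(1, targets.unsqueeze(1)).sum()
        scores.append(ll)
    return speaker_ids[int(torch.stack(scores).argmax())]
```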
Hi @ibab,
I'm a bit late to the WaveNet paper implementation party, but I'm reading the paper and your code and trying to understand where the dilated convolution kernels appear. Your ASCII diagram shows:
The WaveNet paper diagram shows a single 'Dilated Conv' fed into both the tanh and sigmoid functions.
From your ASCII diagram and code (which agree), it seems there is in fact not one dilated convolution but two: one for the tanh (defining the 'filter'), and one for the sigmoid (defining the 'gate'). Is this correct, and is this what was intended in the WaveNet paper?
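For reference, the gated activation in the paper is written with two separate weight tensors per layer, z = tanh(W_{f,k} ∗ x) ⊙ σ(W_{g,k} ∗ x), where f denotes the filter, g the gate, and k the layer index; so the single 'Dilated Conv' box in the figure would cover two distinct sets of kernels.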
Additionally, could you give some justification for the parameter choices not mentioned in the paper?
Thanks in advance.