Understanding convolution kernels in dilation layers #392
Comments
Answering this for myself from looking through the literature: yes, it looks like there are in fact two distinct dilated convolutions passed to the gated activation unit; the original WaveNet paper diagrams appear misleading.
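A minimal sketch of that reading, written in PyTorch rather than the repository's TensorFlow (class and variable names are illustrative only): two separate dilated causal convolutions, one producing the filter and one the gate, combined by the gated activation unit.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedResidualSketch(nn.Module):
    """Illustrative only: two distinct dilated convolutions feed the gated activation."""
    def __init__(self, channels, dilation):
        super().__init__()
        # One dilated conv defines the 'filter' (tanh branch),
        # a second, separate dilated conv defines the 'gate' (sigmoid branch).
        self.filter_conv = nn.Conv1d(channels, channels, kernel_size=2, dilation=dilation)
        self.gate_conv = nn.Conv1d(channels, channels, kernel_size=2, dilation=dilation)
        self.dilation = dilation

    def forward(self, x):                  # x: (batch, channels, time)
        x = F.pad(x, (self.dilation, 0))   # left-pad so the convolution stays causal
        f = torch.tanh(self.filter_conv(x))
        g = torch.sigmoid(self.gate_conv(x))
        return f * g                       # gated activation unit
```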
@redwrasse, I agree that the original paper misses some details here and there. Take a look at (Gated) PixelCNN by WaveNet's main author (https://arxiv.org/pdf/1606.05328.pdf) and you will find that he "copies" the gated activation from there. Also, it seems like they stacked the filter and gate along the output channel dims to spare a conv1d. For the latter, have a look here.
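To illustrate the "stack along the output dims to spare a conv1d" point, a hedged sketch of what that fusion could look like (again PyTorch, names illustrative): a single dilated convolution emits 2×channels, which are then split into the filter and gate halves.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusedGatedSketch(nn.Module):
    """Illustrative only: one dilated conv produces both halves, split afterwards."""
    def __init__(self, channels, dilation):
        super().__init__()
        # A single conv with 2 * channels outputs replaces the two separate convs.
        self.conv = nn.Conv1d(channels, 2 * channels, kernel_size=2, dilation=dilation)
        self.dilation = dilation

    def forward(self, x):                  # x: (batch, channels, time)
        x = F.pad(x, (self.dilation, 0))   # causal left-padding
        out = self.conv(x)
        f, g = out.chunk(2, dim=1)         # split the channel dim into filter / gate
        return torch.tanh(f) * torch.sigmoid(g)
```

Mathematically this is the same computation as two separate convolutions; the fusion just saves one op call per layer.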
Thanks @cheind, I'll take a look. It's a side project I'd like to get back into.
@redwrasse, same for me :) I just figured that it works nicely on 2D images as well (without the special architecture of PixelCNN, just plain WaveNet with unrolled images). In addition, once you have the joint distribution the model estimates, you can start to query all kinds of things from the model (like, given a WaveNet conditioned on the speaker id, what is the probability that this speech was spoken by speaker X). In case you are interested, I have a fairly elaborate presentation + code here. The branch will be closed soon and merged to main, so I'll leave a permalink.
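On the "query the model" point, a hypothetical sketch of how one could score an utterance against different speaker conditionings; `model(x, speaker_id=...)` returning per-step log-probabilities is an assumed interface, not something from this repository.

```python
import torch

def most_likely_speaker(model, x, speaker_ids):
    """Hypothetical: pick the speaker whose conditioning gives the highest
    log p(x | speaker) under the model's autoregressive factorization.
    x is assumed to be a 1D LongTensor of quantized sample indices."""
    scores = []
    for s in speaker_ids:
        log_probs = model(x, speaker_id=s)      # assumed shape: (T, num_classes)
        targets = x[1:]                         # next-sample targets (class indices)
        ll = log_probs[:-1].gather(1, targets.unsqueeze(1)).sum()
        scores.append(ll)
    return speaker_ids[int(torch.stack(scores).argmax())]
```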
Hi @ibab,
I'm a bit late to the WaveNet paper implementation party, but I'm reading the paper and your code and trying to understand where the dilated convolution kernels appear. Your ASCII diagram shows:
The WaveNet paper diagram shows a single 'Dilated Conv' fed into both the tanh and sigmoid functions.
From your ASCII diagram and code (which agree), it seems there is in fact not one dilated convolution but two: one for the tanh (defining the 'filter'), and one for the sigmoid (defining the 'gate'). Is this correct, and is this what was intended in the WaveNet paper?
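For reference, the gated activation in the paper is written with two separate weight tensors per layer, z = tanh(W_{f,k} ∗ x) ⊙ σ(W_{g,k} ∗ x), where f denotes the filter, g the gate, and k the layer index; so the single 'Dilated Conv' box in the figure would cover two distinct sets of kernels.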
Additionally, could you give some justification for the parameter choices not mentioned in the paper?
Thanks in advance.