Reporting bugs/errors in `lecture3_flow_models_demos.ipynb` #7

vinsis · 2020-07-12T08:38:22Z

First of all thank you for making all of the lectures and other content public. This is really helpful.

I took a look at the demo implementations for lecture 3 and found some bugs which I am reporting here:

1. In `Demo 3`, the `.flow()` method in `class ConditionalMixtureCDFFlow(nn.Module):` has the following signature:

def flow(self, x, cond):

However when .flow() is called by .invert() method, the condition cond is not passed to .flow():

def invert(self, z, cond):
        # Find the exact x via bisection such that f(x) = z
        results = []
        for z_elem in z:
            def f(x):
                # SHOULD PASS `cond` in the line below
                return self.flow(torch.tensor(x).unsqueeze(0))[0] - z_elem
            x = bisect(f, -20, 20)
            results.append(x)
        return torch.tensor(results).reshape(z.shape)

2. In `Demo 4` the `.forward()` method of `MaskConv2d` never uses `cond` or `batch_size`:

class MaskConv2d(nn.Conv2d):
  def __init__(self, mask_type, *args, **kwargs):
    assert mask_type == 'A' or mask_type == 'B'
    super().__init__(*args, **kwargs)
    self.register_buffer('mask', torch.zeros_like(self.weight))
    self.create_mask(mask_type)

  def forward(self, input, cond=None):
    # batch_size AND cond ARE NEVER USED
    batch_size = input.shape[0]
    out = F.conv2d(input, self.weight * self.mask, self.bias, self.stride,
                   self.padding, self.dilation, self.groups)
    return out

So it has no effect when it is called by the .forward() method of AutoregressiveFlowPixelCNN like so:

      if isinstance(layer, MaskConv2d):
        out = layer(out, cond=cond)
      else:
        out = layer(out)

3. In `Demo 4` the `.nll()` method of AutoregressiveFlowPixelCNN does not take exponential of `log_prob` and use `weights` when calculating `log_det_jacobian`:

    loc, log_scale, weight_logits = torch.chunk(self.forward(x), 3, dim=1)
    weights = F.softmax(weight_logits, dim=1) #.repeat(1, 1, self.n_components, 1, 1)
    log_det_jacobian = Normal(loc, log_scale.exp()).log_prob(x.unsqueeze(1).repeat(1,1,self.n_components,1,1))
    return -log_det_jacobian.mean()

I think it should be something like:

log_det_jacobian = Normal(loc, log_scale.exp()).log_prob(x.unsqueeze(1).repeat(1,1,self.n_components,1,1)).exp() * weights

I actually have lots of questions about why .nll() is implemented the way it is. Why the need to .unsqueeze(1).repeat(...) rather than just multiplying it the standard way? Where is the base_dist that forces the output of the transformed variables to have a known distribution?

Looking at the .sample() method it seems the weights are used to select mean and var for a sample. But how are they learnt? Regardless of how it's being used, the weights are not used in the .nll() function and thus should not be calculated.

Please let me know if I am missing something here.

The text was updated successfully, but these errors were encountered:

vinsis · 2020-07-14T02:50:34Z

Where is the base_dist that forces the output of the transformed variables to have a known distribution?

I think I understand this part. Since z is uniform, its log_prob is a constant. Thus log x = constant + log_det_jacobian. Thus maximizing log x is the same as maximizing log_det_jacobian.

suryabulusu · 2020-09-11T19:45:11Z

Hi @vinsis - some thoughts on the bugs mentioned above.

I think cond can be used when you have labels, such as digit labels for MNIST. There is an example in Autoregressive models demo on this idea.
Yes, since log (Uniform) is constant, it is not considered in nll().
Sampling made no sense to me. What are we sampling here? Is it z or x? I'm assuming it is x we are sampling because the sampled images look very close to the original. Had we sampled z, it would (should!) look like points sampled from Unif(0, 1) X Unif(0, 1) but that's not the case here.
-- the function samples from torch.normal() instead of NormalCDF(). Why? Shouldn't we sample from MixtureCDFNormal()? And here, we get z, not x
-- to get x, we have to implement invert_flow() (and therefore, flow() in AutoRegressivePixelCNN class) and use scipy.optimise.bisect() on z obtained previously.
I'm definitely missing something here -- because the output of demo4_save_results() is strikingly good.

I think the model is just PixelCNN + Mixture of Gaussian. It's not a "flow" as such. In the loss function, log_det_jacobian is not really related to any Jacobian, but just nll( MixOfGaussian() ). If this is the case, the sampling makes sense too. There is no z.

In general, I am curious if flows are ever used to sample from original distribution at all! They are good for inference, and plotting pdf(x) (pdf(z) * det J). Maybe we should look at Inverse Autoregressive Flows?

Pinging @wilson1yan @alexlioralexli for help. Thanks!

wilson1yan · 2020-09-12T01:41:09Z

Yes, I believe the weights need to be accounted for when computing the log probability (using logsumexp would probably be the easiest / most numerically stable manner)

For 4., it is sampling from the learned distribution of x. You're right in that under this flow formulation (with z ~ Uniform([0,1}) + outputting parameters for a MixtureCDF), it's equivalent to optimizing maximum likelihood. So, you can sample by either sampling from an autoregressive model the normal way, or sample by sampling z ~ Uniform([0,1]) and then inverting f using bisection (like you mentioned).

Side-note: It is a valid flow, and the terms do relate to the actual jacobian diagonal terms, as the log_derivative of the CDF is the log of the PDF. Note that if our target z distribution was not uniform, and, say, normal, then it would not be equivalent to maximum likelihood anymore.

I'm not sure what you mean by original distribution - is that z or x? x is probably the more common one like samples shown in RealNVP / Glow / VideoFlow. But flows have other uses such as learning more complex distributions (i.e. Inverse Autoregressive Flows and Variational Lossy Autoencoder)

vinsis · 2021-02-09T17:09:11Z

The way log prob is calculated seems faulty to me @wilson1yan

After the step loc, log_scale, weight_logits = torch.chunk(self.forward(x), 3, dim=1), each of loc, log_scale and weight_logits has the shape (batch_size, n_components, num_channels, height, width)

But when calculating the log_prob, x is manipulated like so:

x.unsqueeze(1).repeat(1,1,self.n_components,1,1)

which makes the shape of x to (batch_size, 1, n_components * num_channels, height, width) (assuming initially x had shape (batch_size, num_channels, height, width))

This makes n_components multiplied twice in two dimensions due to broadcasting. I believe just the operation x.unsqueeze(1) should suffice.

I implemented this flow myself and am struggling to get decent results. So I won't be surprised if I missed something elementary.

Edit: Just ran the notebook myself and x.unsqueeze(1) indeed works. Initially I was getting poor results for another reason.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reporting bugs/errors in `lecture3_flow_models_demos.ipynb` #7

Reporting bugs/errors in `lecture3_flow_models_demos.ipynb` #7

vinsis commented Jul 12, 2020

vinsis commented Jul 14, 2020

suryabulusu commented Sep 11, 2020 •

edited

Loading

wilson1yan commented Sep 12, 2020 •

edited

Loading

vinsis commented Feb 9, 2021 •

edited

Loading

Reporting bugs/errors in lecture3_flow_models_demos.ipynb #7

Reporting bugs/errors in lecture3_flow_models_demos.ipynb #7

Comments

vinsis commented Jul 12, 2020

1. In Demo 3, the .flow() method in class ConditionalMixtureCDFFlow(nn.Module): has the following signature:

2. In Demo 4 the .forward() method of MaskConv2d never uses cond or batch_size:

3. In Demo 4 the .nll() method of AutoregressiveFlowPixelCNN does not take exponential of log_prob and use weights when calculating log_det_jacobian:

vinsis commented Jul 14, 2020

suryabulusu commented Sep 11, 2020 • edited Loading

wilson1yan commented Sep 12, 2020 • edited Loading

vinsis commented Feb 9, 2021 • edited Loading

Reporting bugs/errors in `lecture3_flow_models_demos.ipynb` #7

Reporting bugs/errors in `lecture3_flow_models_demos.ipynb` #7

1. In `Demo 3`, the `.flow()` method in `class ConditionalMixtureCDFFlow(nn.Module):` has the following signature:

2. In `Demo 4` the `.forward()` method of `MaskConv2d` never uses `cond` or `batch_size`:

3. In `Demo 4` the `.nll()` method of AutoregressiveFlowPixelCNN does not take exponential of `log_prob` and use `weights` when calculating `log_det_jacobian`:

suryabulusu commented Sep 11, 2020 •

edited

Loading

wilson1yan commented Sep 12, 2020 •

edited

Loading

vinsis commented Feb 9, 2021 •

edited

Loading