
Question about the shape of data in model #22

Closed
sIHURs opened this issue Jul 10, 2024 · 3 comments
sIHURs commented Jul 10, 2024

Hi,

This is my first project on point cloud deep learning, so there are many aspects I don't quite understand, especially about minkowski engine.

From the dataloader we get x_full with shape (B, N=180000, C=3) and x_part with shape (B, N=18000, C=3), and both are turned into ME.TensorField objects with shape (B x N, C). x_part should go into the condition encoder and produce a TensorField with shape (B x N, C=256): the feature dimension C is expanded to 256, but shouldn't the first dimension stay the same as the input x_part? However, when I tested the code (with initial weights and B=2), I got torch.Size([9598, 256]). I'm not sure whether my understanding is correct, or whether .sparse() changes the tensor's shape?

And the noise predictor should return a TensorField with shape (B x N, C=3), right? The forward pass seems to require a lot of memory; my computer's GPU cannot handle it, and running it on the CPU also fails, so I can't inspect the output. How much RAM is needed for the process?

thanks very much
yifan

@sIHURs sIHURs changed the title Question about the shape of data within through Unet Question about the shape of data in model Jul 10, 2024

nuneslu commented Jul 10, 2024

Hi,

You are right, the point cloud comes from the dataloader with the shape BxNxC, with B=2, N=180000 and C=3. However, MinkowskiEngine hashes the data and stores the batch dimension as one more coordinate, so the data takes the form NxC, with N being the total number of points across all the batched point clouds and C=4 (b, x, y, z, with b as the batch id).
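A minimal NumPy sketch of that batched coordinate layout (just for illustration; MinkowskiEngine builds this internally, e.g. via ME.utils.batched_coordinates):

```python
import numpy as np

# Two point clouds from a batch (B=2), with tiny N for illustration.
pc0 = np.random.rand(5, 3)  # (N0, 3) xyz
pc1 = np.random.rand(7, 3)  # (N1, 3) xyz

# Prepend the batch index as an extra coordinate: each row becomes (b, x, y, z).
coords = np.vstack([
    np.hstack([np.full((len(pc), 1), float(b)), pc])
    for b, pc in enumerate([pc0, pc1])
])

print(coords.shape)  # (12, 4): N = N0 + N1 points, C = 4 = (b, x, y, z)
```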

The reason why you've got torch.Size([9598, 256]) is due to the sparse() function. MinkowskiEngine voxelizes the data to compute the convolutions over it, so when using sparse() it converts the point cloud to a voxelized representation, where you may end up having fewer points than the original point cloud depending on your voxel resolution. From the voxelized data, you can return to the original point cloud shape using the slice() function.
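The drop from 180000 input points to 9598 voxels can be illustrated with a hand-rolled voxel hash (a NumPy sketch, not the actual MinkowskiEngine implementation): points that fall into the same voxel collapse to a single entry, which is why the first dimension shrinks after sparse().

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.uniform(0.0, 1.0, size=(10000, 3))  # a dense toy point cloud
voxel_size = 0.05

# Quantize each point to its voxel index, then keep unique voxels only.
voxel_idx = np.floor(points / voxel_size).astype(np.int64)
unique_voxels, inverse = np.unique(voxel_idx, axis=0, return_inverse=True)

print(points.shape[0], "->", unique_voxels.shape[0])  # fewer voxels than points
# `inverse` maps every original point to its voxel row; a slice()-style
# un-voxelization uses exactly this mapping to return per-point features.
```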

Lastly, regarding the GPU usage: yes, the point cloud processing takes a lot of memory. You can try to "fix" it by changing the following in the config.yaml file: reduce the batch_size to process fewer point clouds per batch, reduce the max_range to have smaller point clouds, and reduce the num_points to have fewer points per point cloud. Hopefully by changing those parameters you will be able to run the code.
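For example (the key names come from the comment above; the values here are purely illustrative, so check the defaults in the repository's config.yaml):

```yaml
# Memory-saving settings (illustrative values, tune for your GPU)
batch_size: 1        # fewer point clouds per batch
max_range: 30.0      # smaller crop of each scan
num_points: 90000    # fewer points per point cloud
```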


sIHURs commented Jul 10, 2024

Thank you very much for your quick reply and advice!

I have another question about the code in the pipeline:

def completion_loop(self, x_init, x_t, x_cond, x_uncond):
    self.scheduler_to_cuda()

    for t in tqdm.tqdm(range(len(self.dpm_scheduler.timesteps))):
        t = self.dpm_scheduler.timesteps[t].cuda()[None]

        noise_t = self.classfree_forward(x_t, x_cond, x_uncond, t)
        # recover the noise added in x_feats = scan + torch.randn(scan.shape, device=self.device)
        input_noise = x_t.F.reshape(t.shape[0], -1, 3) - x_init
        x_t = x_init + self.dpm_scheduler.step(noise_t, t, input_noise)['prev_sample']
        x_t = self.points_to_tensor(x_t)

        x_cond, x_uncond = self.reset_partial_pcd(x_cond, x_uncond)
        torch.cuda.empty_cache()

    return x_t.F.cpu().detach().numpy()

In this function, x_init is the partial point cloud sampled from the GT, with shape (180000, 3), and x_t is the point cloud obtained by adding noise to x_init. We get the predicted noise from noise_t = self.classfree_forward(x_t, x_cond, x_uncond, t), but I don't quite understand the rest. Do the following two lines perform the denoising operation in line 4 of the DDPM sampling algorithm?

[Screenshot: the DDPM sampling algorithm from the paper]


nuneslu commented Jul 10, 2024

The line input_noise = x_t.F.reshape(t.shape[0],-1,3) - x_init gives us the noise added to x_init. Then, self.dpm_scheduler.step(noise_t, t, input_noise)['prev_sample'] computes the noise at x_{t-1} given x_t and the initial noise added to x_init. This is the DDPM process implemented by Hugging Face, which also allows us to reduce the number of denoising steps during inference.

It is a bit different from the algorithm in the image you showed: in the original diffusion formulation, x_t is a mixed distribution between the target data and the sampled noise, whereas in our formulation we simply add the noise to each point as an offset instead of computing that mixed distribution. We explain this in more detail in the main paper.
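A toy NumPy sketch of the offset formulation (this is not the actual scheduler; the real x_{t-1} update is done by dpm_scheduler.step from Hugging Face's diffusers, and the 0.9 scale below is a made-up stand-in for one denoising step):

```python
import numpy as np

rng = np.random.default_rng(0)
x_init = rng.uniform(-1.0, 1.0, size=(100, 3))  # partial point cloud (toy size)
noise = rng.standard_normal(x_init.shape)       # per-point offset, not a mixture
x_t = x_init + noise                            # noising: x_feats = scan + randn

# Given x_t, the added noise is recovered exactly as in the pipeline:
input_noise = x_t - x_init

# A real scheduler step would predict the sample at t-1; here we just shrink
# the offset toward zero to mimic one denoising step (hypothetical factor).
prev_offset = 0.9 * input_noise  # stand-in for dpm_scheduler.step(...)['prev_sample']
x_prev = x_init + prev_offset    # points move back toward the partial scan
```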

@sIHURs sIHURs closed this as completed Jul 12, 2024
@sIHURs sIHURs reopened this Jul 12, 2024
@sIHURs sIHURs closed this as completed Jul 12, 2024