
Question about the shape of data in model #22

Closed
sIHURs opened this issue Jul 10, 2024 · 3 comments
sIHURs commented Jul 10, 2024

Hi,

This is my first project on point cloud deep learning, so there are many aspects I don't quite understand, especially about minkowski engine.

From the dataloader we get x_full with shape (B, N=180000, C=3) and x_part with shape (B, N=18000, C=3), and both are turned into ME.TensorField objects with shape (B x N, C). x_part should go into the condition encoder and produce a TensorField with shape (B x N, C=256): the feature dimension C is expanded to 256, but shouldn't the first dimension stay the same as the input x_part? However, when I tested the code (with initial weights and B=2), I got torch.Size([9598, 256]). I'm not sure whether my understanding is correct, or whether .sparse() changes the tensor's shape?

And the noise predictor should return a TensorField with shape (B x N, C=3), right? The forward pass seems to require a lot of memory; my computer's GPU cannot handle it, and running it on the CPU also fails, so I can't inspect the output. How much RAM is needed for the process?

thanks very much
yifan

@sIHURs sIHURs changed the title Question about the shape of data within through Unet Question about the shape of data in model Jul 10, 2024

nuneslu commented Jul 10, 2024

Hi,

You are right, the point cloud comes from the dataloader with the shape BxNxC, with B=2, N=180000 and C=3. However, MinkowskiEngine hashes the data and stores the batch dimension as one more coordinate, so the data takes the form NxC, with N being the total number of points across all the batched point clouds and C=4 (b, x, y, z, with b as the batch id).
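A minimal NumPy sketch of that batched coordinate layout (just for illustration; MinkowskiEngine builds this internally, e.g. via ME.utils.batched_coordinates):

```python
import numpy as np

# Two point clouds from a batch (B=2), with tiny N for illustration.
pc0 = np.random.rand(5, 3)  # (N0, 3) xyz
pc1 = np.random.rand(7, 3)  # (N1, 3) xyz

# Prepend the batch index as an extra coordinate: each row becomes (b, x, y, z).
coords = np.vstack([
    np.hstack([np.full((len(pc), 1), float(b)), pc])
    for b, pc in enumerate([pc0, pc1])
])

print(coords.shape)  # (12, 4): N = N0 + N1 points, C = 4 = (b, x, y, z)
```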

The reason why you've got torch.Size([9598, 256]) is due to the sparse() function. MinkowskiEngine voxelizes the data to compute the convolutions over it, so when using sparse() it converts the point cloud to a voxelized representation, where you may end up having fewer points than the original point cloud depending on your voxel resolution. From the voxelized data, you can return to the original point cloud shape using the slice() function.
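The drop from 180000 input points to 9598 voxels can be illustrated with a hand-rolled voxel hash (a NumPy sketch, not the actual MinkowskiEngine implementation): points that fall into the same voxel collapse to a single entry, which is why the first dimension shrinks after sparse().

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.uniform(0.0, 1.0, size=(10000, 3))  # a dense toy point cloud
voxel_size = 0.05

# Quantize each point to its voxel index, then keep unique voxels only.
voxel_idx = np.floor(points / voxel_size).astype(np.int64)
unique_voxels, inverse = np.unique(voxel_idx, axis=0, return_inverse=True)

print(points.shape[0], "->", unique_voxels.shape[0])  # fewer voxels than points
# `inverse` maps every original point to its voxel row; a slice()-style
# un-voxelization uses exactly this mapping to return per-point features.
```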

Lastly, regarding the GPU usage: yes, the point cloud processing takes a lot of memory. You can try to "fix" it by changing the following in the config.yaml file: reduce the batch_size to process fewer point clouds per batch, reduce the max_range to have smaller point clouds, and reduce the num_points to have fewer points per point cloud. Hopefully by changing those parameters you will be able to run the code.
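For example (the key names come from the comment above; the values here are purely illustrative, so check the defaults in the repository's config.yaml):

```yaml
# Memory-saving settings (illustrative values, tune for your GPU)
batch_size: 1        # fewer point clouds per batch
max_range: 30.0      # smaller crop of each scan
num_points: 90000    # fewer points per point cloud
```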


sIHURs commented Jul 10, 2024

Thank you very much for your quick reply and advice!

I have another question about the code in the pipeline:

def completion_loop(self, x_init, x_t, x_cond, x_uncond):
    self.scheduler_to_cuda()

    for t in tqdm.tqdm(range(len(self.dpm_scheduler.timesteps))):
        t = self.dpm_scheduler.timesteps[t].cuda()[None]

        noise_t = self.classfree_forward(x_t, x_cond, x_uncond, t)
        # recover the noise added in x_feats = scan + torch.randn(scan.shape, device=self.device)
        input_noise = x_t.F.reshape(t.shape[0], -1, 3) - x_init
        x_t = x_init + self.dpm_scheduler.step(noise_t, t, input_noise)['prev_sample']
        x_t = self.points_to_tensor(x_t)

        x_cond, x_uncond = self.reset_partial_pcd(x_cond, x_uncond)
        torch.cuda.empty_cache()

    return x_t.F.cpu().detach().numpy()

In this function, x_init is the partial point cloud sampled from the GT, with shape (180000, 3), and x_t is the point cloud obtained by adding noise to x_init. We get the predicted noise from noise_t = self.classfree_forward(x_t, x_cond, x_uncond, t), but I don't quite understand the rest. Do the following two lines perform the denoising operation in line 4 of the DDPM sampling algorithm?

[Screenshot: the DDPM sampling algorithm from the paper]


nuneslu commented Jul 10, 2024

The line input_noise = x_t.F.reshape(t.shape[0],-1,3) - x_init gives us the noise added to x_init. Then, self.dpm_scheduler.step(noise_t, t, input_noise)['prev_sample'] computes the noise at x_{t-1} given x_t and the initial noise added to x_init. This is the DDPM process implemented by Hugging Face, which also allows us to reduce the number of denoising steps during inference.

It is a bit different from the algorithm in the image you showed: in the original diffusion formulation, x_t is a mixed distribution between the target data and the sampled noise, whereas in our formulation we simply add the noise to each point as an offset instead of computing that mixed distribution. We explain this in more detail in the main paper.
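A toy NumPy sketch of the offset formulation (this is not the actual scheduler; the real x_{t-1} update is done by dpm_scheduler.step from Hugging Face's diffusers, and the 0.9 scale below is a made-up stand-in for one denoising step):

```python
import numpy as np

rng = np.random.default_rng(0)
x_init = rng.uniform(-1.0, 1.0, size=(100, 3))  # partial point cloud (toy size)
noise = rng.standard_normal(x_init.shape)       # per-point offset, not a mixture
x_t = x_init + noise                            # noising: x_feats = scan + randn

# Given x_t, the added noise is recovered exactly as in the pipeline:
input_noise = x_t - x_init

# A real scheduler step would predict the sample at t-1; here we just shrink
# the offset toward zero to mimic one denoising step (hypothetical factor).
prev_offset = 0.9 * input_noise  # stand-in for dpm_scheduler.step(...)['prev_sample']
x_prev = x_init + prev_offset    # points move back toward the partial scan
```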

@sIHURs sIHURs closed this as completed Jul 12, 2024
@sIHURs sIHURs reopened this Jul 12, 2024
@sIHURs sIHURs closed this as completed Jul 12, 2024