Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GPU normal estimation does not work with CUDA 12 #5846

Closed
jenadzitsiuk opened this issue Oct 16, 2023 · 14 comments
Closed

GPU normal estimation does not work with CUDA 12 #5846

jenadzitsiuk opened this issue Oct 16, 2023 · 14 comments
Labels

Comments

@jenadzitsiuk
Copy link

jenadzitsiuk commented Oct 16, 2023

PCL 1.13
Cuda 12.2
Ubuntu 22.04

Is there a way to get normal's curvature for GPU module, similar how its done on CPU? (there its stored in Normal::curvature)
This thread suggests to use data[3] from PointXYZ, hovewer it does not give me any results (zeros there)

pcl::gpu::NormalEstimation gpu_ne;
pcl::gpu::Feature::PointCloud gpu_cloud;
pcl::gpu::Feature::Normals gpu_normals;
gpu_cloud.upload(input_cloud->points);
gpu_ne.setInputCloud(gpu_cloud);
gpu_ne.setRadiusSearch(0.04f, 50);
gpu_ne.compute(gpu_normals);

std::vector<pcl::PointXYZ> downloaded_normals;
gpu_normals.download(downloaded_normals);

// Convert downloaded data to pcl::Normal format
pcl::PointCloud<pcl::Normal>::Ptr normals(new pcl::PointCloud<pcl::Normal>);
normals->resize(downloaded_normals.size());
for (size_t i = 0; i < downloaded_normals.size(); ++i)
{
    normals->points[i].normal_x = downloaded_normals[i].x;
    normals->points[i].normal_y = downloaded_normals[i].y;
    normals->points[i].normal_z = downloaded_normals[i].z;

    normals->points[i].curvature = downloaded_normals[i].data[3]; // looks so wrong.. https://github.com/PointCloudLibrary/pcl/issues/2419

    LOG_INFO("normal curvature " << downloaded_normals[i].data[3]);

}
@jenadzitsiuk jenadzitsiuk added the status: triage Labels incomplete label Oct 16, 2023
@mvieth
Copy link
Member

mvieth commented Oct 17, 2023

@jenadzitsiuk Are the normals okay? Is only the curvature zero?

@mvieth mvieth added module: gpu and removed status: triage Labels incomplete labels Oct 17, 2023
@jenadzitsiuk
Copy link
Author

jenadzitsiuk commented Oct 17, 2023

@mvieth oh, you are right.. I did not even check. All normals are (1, 0, 0). Looks like the above code did not work for normals computation.
I double-checked that the input_cloud has valid data.

And the following code works fine on CPU for that point cloud:

   pcl::NormalEstimation<pcl::PointXYZ, pcl::Normal> normal_estimation;
   normal_estimation.setInputCloud(input_cloud);
   pcl::PointCloud<pcl::Normal>::Ptr cloud_normals(new pcl::PointCloud<pcl::Normal>);
   pcl::search::KdTree<pcl::PointXYZ>::Ptr tree(new pcl::search::KdTree<pcl::PointXYZ>);
   normal_estimation.setSearchMethod(tree);
   normal_estimation.setRadiusSearch(0.04);  // cm
   normal_estimation.compute(*cloud_normals);

Also, this code confirmed my points uploaded fine to GPU:

    gpu_cloud.upload(input_cloud->points);

    // double check if points have data
    pcl::PointCloud<pcl::PointXYZ>::Ptr downloaded_cloud(new pcl::PointCloud<pcl::PointXYZ>);
    gpu_cloud.download(downloaded_cloud->points);
    for (size_t i = 0; i < downloaded_cloud->points.size(); ++i) {
       pcl::PointXYZ point = downloaded_cloud->points[i];
       std::cout << "Point " << i << ": x = " << point.x << ", y = " << point.y << ", z = " << point.z << std::endl;
    }

Anything I'm missing on GPU normals computation code?

@mvieth
Copy link
Member

mvieth commented Oct 17, 2023

It seems to work if you use setSearchSurface, for example like this:

pcl::gpu::Feature::PointCloud gpu_surface;
gpu_surface.upload(input_cloud->points);
gpu_ne.setSearchSurface(gpu_surface);

This might be a bug, will have to check

@jenadzitsiuk
Copy link
Author

jenadzitsiuk commented Oct 17, 2023

if I do both setSearchSurface and setInputCloud, I still get (1, 0, 0), (-1, 0, 0) normals.
If I do only setSearchSurface I get error about radiusSearch config.

To eliminate the possibility of bad input, I did this, full code:

pcl::PointCloud<pcl::PointXYZ>::Ptr generateSphere(float radius, int numPoints)
{
    pcl::PointCloud<pcl::PointXYZ>::Ptr sphere_cloud(new pcl::PointCloud<pcl::PointXYZ>);
    
    for (int i = 0; i < numPoints; ++i)
    {
        float theta = 2.0f * M_PI * (float)i / (float)numPoints;
        float phi = M_PI * (float)i / (float)numPoints;

        pcl::PointXYZ point;
        point.x = radius * sin(phi) * cos(theta);
        point.y = radius * sin(phi) * sin(theta);
        point.z = radius * cos(phi);

        sphere_cloud->points.push_back(point);
    }

    sphere_cloud->width = sphere_cloud->points.size();
    sphere_cloud->height = 1;
    sphere_cloud->is_dense = true;

    return sphere_cloud;
}
pcl::PointCloud<pcl::Normal>::Ptr computeNormalsOnGPU(const pcl::PointCloud<pcl::PointXYZ>::Ptr& input_cloud_not_used)
{
   pcl::gpu::NormalEstimation gpu_ne;
   pcl::gpu::Feature::PointCloud gpu_cloud;
   pcl::gpu::Feature::Normals gpu_normals;

   pcl::PointCloud<pcl::PointXYZ>::Ptr sphere_cloud = generateSphere(0.5, 5000);

   gpu_cloud.upload(sphere_cloud->points);
   pcl::gpu::Feature::PointCloud gpu_surface;
   gpu_surface.upload(sphere_cloud->points);
   gpu_ne.setSearchSurface(gpu_surface);

   gpu_ne.setInputCloud(gpu_cloud);
   gpu_ne.setRadiusSearch(0.04f, 50);
   gpu_ne.compute(gpu_normals);

   std::vector<pcl::PointXYZ> downloaded_normals;
   gpu_normals.download(downloaded_normals);

   pcl::PointCloud<pcl::Normal>::Ptr normals(new pcl::PointCloud<pcl::Normal>);
   normals->resize(downloaded_normals.size());
   for (size_t i = 0; i < downloaded_normals.size(); ++i)
   {
      normals->points[i].normal_x = downloaded_normals[i].x;
      normals->points[i].normal_y = downloaded_normals[i].y;
      normals->points[i].normal_z = downloaded_normals[i].z;

      normals->points[i].curvature =
        downloaded_normals[i].data[3];  // https://github.com/PointCloudLibrary/pcl/issues/2419

      if (i < 1000)
      {
         LOG_INFO("normal x " << downloaded_normals[i].x << " y " << downloaded_normals[i].y << " z "
                              << downloaded_normals[i].z);
         LOG_INFO("normal curvature " << downloaded_normals[i].data[3]);
      }
   }
   LOG_INFO("GPU normals size " << normals->size());
   return normals;
}

@mvieth does the code above work fine for you?

Side question: what does this max_result param mean in setRadiusSearch?

@jenadzitsiuk jenadzitsiuk reopened this Oct 17, 2023
@mvieth
Copy link
Member

mvieth commented Oct 18, 2023

Okay, something very weird is going on with the gpu octree. Unfortunately, I am not very familiar with it, so I will need some time for further testing. The problem might be somehow related to which GPU is used.
If the search surface does not have more than 4864 points, then it seems to work for me. No idea why specifically that number.

Side question: what does this max_result param mean in setRadiusSearch?

That is the maximum number of close points in the results for each query point. It is necessary to specify this because of the memory layout.

Possibly related: #5731

@jenadzitsiuk
Copy link
Author

Thanks for looking into this! Could you convert it to a bug, in order to address it? Thanks!

@larshg
Copy link
Contributor

larshg commented Oct 18, 2023

I just tested on windows, with CUDA 11.7.
And seems to work fine, however its not really a sphere that you generate:
This is by CPU normal estimation:
billede
And by GPU:
billede

I can try later on ubuntu with 12.2, when I get the time.

Edit:
I just set the inputcloud - internally it sets inputsurface to inputcloud, if not set.

Full code:


#include <pcl/gpu/containers/device_array.h>
#include <pcl/gpu/features/features.hpp>
#include <pcl/features/normal_3d.h>
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>

#include <pcl/visualization/pcl_visualizer.h>

#include <iostream>
#include <vector>

  pcl::PointCloud<pcl::PointXYZ>::Ptr
generateSphere(float radius, int numPoints)
{
  pcl::PointCloud<pcl::PointXYZ>::Ptr sphere_cloud(new pcl::PointCloud<pcl::PointXYZ>);

  for (int i = 0; i < numPoints; ++i) {
    float theta = 2.0f * M_PI * (float)i / (float)numPoints;
    float phi = M_PI * (float)i / (float)numPoints;

    pcl::PointXYZ point;
    point.x = radius * sin(phi) * cos(theta);
    point.y = radius * sin(phi) * sin(theta);
    point.z = radius * cos(phi);

    sphere_cloud->points.push_back(point);
  }

  sphere_cloud->width = sphere_cloud->points.size();
  sphere_cloud->height = 1;
  sphere_cloud->is_dense = true;

  return sphere_cloud;
}
pcl::PointCloud<pcl::Normal>::Ptr
computeNormalsOnGPU(const pcl::PointCloud<pcl::PointXYZ>::Ptr& sphere_cloud)
{
  pcl::PointCloud<pcl::Normal>::Ptr normals(new pcl::PointCloud<pcl::Normal>);
  pcl::NormalEstimation<pcl::PointXYZ,pcl::Normal> ne;
  ne.setInputCloud(sphere_cloud);
  ne.setRadiusSearch(0.04);
  ne.compute(*normals);

  return normals;
}

pcl::PointCloud<pcl::Normal>::Ptr
computeNormalsOnCPU(const pcl::PointCloud<pcl::PointXYZ>::Ptr& sphere_cloud)
{
  pcl::gpu::NormalEstimation gpu_ne;
  pcl::gpu::Feature::PointCloud gpu_cloud;
  pcl::gpu::Feature::Normals gpu_normals;

  gpu_cloud.upload(sphere_cloud->points);
  //pcl::gpu::Feature::PointCloud gpu_surface;
  //gpu_surface.upload(sphere_cloud->points);
  //gpu_ne.setSearchSurface(gpu_surface);

  gpu_ne.setInputCloud(gpu_cloud);
  gpu_ne.setRadiusSearch(0.04f, 50);
  gpu_ne.compute(gpu_normals);

  std::vector<pcl::PointXYZ> downloaded_normals;
  gpu_normals.download(downloaded_normals);

  pcl::PointCloud<pcl::Normal>::Ptr normals(new pcl::PointCloud<pcl::Normal>);
  normals->resize(downloaded_normals.size());
  for (size_t i = 0; i < downloaded_normals.size(); ++i) {
    normals->points[i].normal_x = downloaded_normals[i].x;
    normals->points[i].normal_y = downloaded_normals[i].y;
    normals->points[i].normal_z = downloaded_normals[i].z;

    normals->points[i].curvature =
        downloaded_normals[i]
            .data[3]; // https://github.com/PointCloudLibrary/pcl/issues/2419

    if (i < 1000) {
      PCL_INFO("normal x %f y %f z %f \n",downloaded_normals[i].x,
                               downloaded_normals[i].y,
                               downloaded_normals[i].z);
      PCL_INFO("normal curvature %f \n",downloaded_normals[i].data[3]);
    }
  }
  PCL_INFO("GPU normals size %u \n",normals->size());
  return normals;
}

int
main()
{
  pcl::PointCloud<pcl::PointXYZ>::Ptr sphere_cloud = generateSphere(0.5, 5000);

  auto cpuNormals = computeNormalsOnCPU(sphere_cloud);

  auto gpuNormals = computeNormalsOnGPU(sphere_cloud);

  for (int i = 0; i < cpuNormals->size(); i++) {
    std::cout << cpuNormals->points[i] << "\n";
    std::cout << gpuNormals->points[i] << "\n";
  }

  std::cout << cpuNormals->size() << "\n";
  std::cout << gpuNormals->size() << "\n";

  pcl::visualization::PCLVisualizer vis;
  vis.addCoordinateSystem();
  vis.addPointCloud(sphere_cloud, "just_points");
  //vis.addPointCloudNormals<pcl::PointXYZ,pcl::Normal>(sphere_cloud, cpuNormals,10,0.05f);
  vis.addPointCloudNormals<pcl::PointXYZ, pcl::Normal>(sphere_cloud, gpuNormals, 10, 0.05f);
  vis.spin();
}

@mvieth mvieth added the kind: bug Type of issue label Oct 19, 2023
@mvieth
Copy link
Member

mvieth commented Oct 19, 2023

@larshg Just a heads-up: computeNormalsOnGPU in your code is actually the CPU version, and vice-versa. Just wanted to mention it before it leads to confusion 😄

Here is code to generate a sphere:

  for(int i = 0; i < numPoints; ++i) {
    const float z = 2.0 * static_cast<float>(rand()) / RAND_MAX - 1.0; // Need z in [-1, 1]
    const float theta = 2.0f * M_PI * static_cast<float>(rand()) / RAND_MAX; // Need theta in [0, 2 pi]
    const float tmp = std::sqrt(1.0 - z*z);
    sphere_cloud->emplace_back(radius * tmp * std::cos(theta),
                               radius * tmp * std::sin(theta),
                               radius * z);
  }

On Ubuntu 23.04, Quadro RTX 3000, CUDA 11.8, Thrust 1.17.2, it works fine.
On Ubuntu 22.04, Quadro T2000, CUDA 12.2, Thrust 2.2.0, the error occurs.
This might also be related to #4700 , because when I remove the thrust::sort here, it works fine on 22.04 (otherwise the thrust_::sort_by_key used while building the octree returns garbage -- the 4864 points I mentioned in a previous comment may be a threshold when deciding between different sorting implementations)

@larshg
Copy link
Contributor

larshg commented Oct 20, 2023

Woops, thanks for the headsup 🤣
So could be a newly introduced bug or change in thrust?

@mvieth
Copy link
Member

mvieth commented Oct 20, 2023

Update:
On Ubuntu 22.04, Quadro T2000, CUDA 11.8, Thrust 1.15.1, succeeds
On Ubuntu 22.04, Quadro T2000, CUDA 12.0, Thrust 2.0.1, fails

So could be a newly introduced bug or change in thrust?

Yes, might be.
I also tried building PCL specifically for the cuda architecture I need (7.5, called 75 in CMake), but that didn't change anything.

@larshg
Copy link
Contributor

larshg commented Oct 20, 2023

I read in some of the other threads linked to, that if they compiled for a lower architecture, ie. 60, it still worked. But not all experienced the same.

@mvieth
Copy link
Member

mvieth commented Oct 23, 2023

Building for architecture 60 instead did not solve the problem for me.
So I think we can conclude that all Thrust 1.x.y versions work, and all Thrust 2.x.y versions fail (meaning that usually CUDA < 12.0 should work).
I think the problem only appears if the executable is linked to two libraries, where the first library uses thrust::sort (in PCL this is libpcl_gpu_features in uniq_inds.cu), and the second library uses thrust::sort_by_key (in PCL this is libpcl_gpu_octree in octree_builder.cu). Then some binary symbols seem to get mixed up during linking, some functions used inside thrust::sort_by_key throw Program hit cudaErrorMissingConfiguration (error 52) due to "__global__ function call is not configured" on CUDA API call to cudaGetLastError, and the whole sorting fails.
That would mean that this problem shouldn't appear on Windows, since MSVC uses hidden symbol visibility as default. @larshg If you have the time, it would be interesting if you could test this (CUDA >= 12.0)
I managed to create a Thrust-only minimal reproducible example: thrust_test.zip. I guess I should create an issue in the Thrust repo when I have time.
On the PCL side, #5779 could solve this in the long run. Removing thrust::sort in uniq_inds.cu is a workaround, but makes the computeUniqueIndices function and consequently pcl::gpu::FPFHEstimation return garbage.

@larshg
Copy link
Contributor

larshg commented Oct 23, 2023

I just updated CUDA to v12.3 and I can't reproduce any errors. Neither in the PCL code nor your thrust only example. CUDA 12.3 comes with Thrust 2.2.0.
Currently compiling with CUDA Architecture 6.1.
GPU:
GTX 1060

So it seems your assumptions is correct 👍

@mvieth
Copy link
Member

mvieth commented Nov 4, 2023

I created an issue for the Thrust developers, and apparently this was fixed very recently on their side. The fix was not yet included in the 2.2.0 release, but I would assume that it will be included in the next release.
So to summarise:

  • Using any CUDA 11 version works
  • Future CUDA versions will also work again, as soon as the Thrust fix is released and included in the CUDA toolkit
  • Removing thrust::sort in uniq_inds.cu is a workaround on the PCL side (for all CUDA and Thrust versions), if you don't need the computeUniqueIndices function or pcl::gpu::FPFHEstimation

@mvieth mvieth closed this as completed Nov 4, 2023
@mvieth mvieth changed the title Get normal curvature on GPU GPU normal estimation does not work with CUDA 12 Nov 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants