
Problem & questions #55

Open
Da1sypetals opened this issue Jul 27, 2024 · 6 comments

Comments

@Da1sypetals

Da1sypetals commented Jul 27, 2024

  1. This code fails to compile when I use spmv (solve works fine):
#include <iostream>
#include <Eigen/Eigen>
#include <muda/muda.h>
#include <muda/ext/linear_system.h>

using namespace muda;


void run_tests() {

    int N = 3;

    // define a N*N matrix A and b
    DeviceTripletMatrix<float, 1> A;
    DeviceDenseVector<float> b(N);
    DeviceDenseVector<float> x(N);
    DeviceDenseVector<float> y(N);

    // reserve for triplets
    A.resize_triplets(N * N);
    A.reshape(N, N);

    std::cout << "sizes:\n";
    std::cout << A.row_indices().size()
              << "  " << A.col_indices().size()
              << "  " << A.values().size() << std::endl;

    ParallelFor(256).apply(N * N, [row_idx = A.row_indices().viewer(),
            col_idx = A.col_indices().viewer(),
            val = A.values().viewer(),
            b = b.viewer(), N]__device__(int i)mutable {

        row_idx(i) = i % N;
        col_idx(i) = i / N;
        val(i) = static_cast<float>(i * i);

        if (i < N) {
            b(i) = static_cast<float>(i);
        }

    });

    LinearSystemContext ctx;

    DeviceCOOMatrix<float> A_coo;
    ctx.convert(A, A_coo);
    DeviceCSRMatrix<float> A_csr;
    ctx.convert(A_coo, A_csr);


    ctx.solve(x.view(), A_csr.cview(), b.cview());

    std::cout << "solve done\n";
    ctx.spmv(A_csr.cview(), x.cview(), y.view());

    Eigen::VectorXf hx(N);
    x.copy_to(hx);
    for (int i = 0; i < N; i++) {
        std::cout << hx.coeff(i) << "  ";
    }
    std::cout << std::endl;

}

int main() {
    run_tests();
    return 0;
}

Terminal output:

/mnt/a/dev/muda/muda-template/submodules/muda/src/muda/ext/linear_system/details/routines/spmv.inl(15): error: argument of type "const cusparseDnVecDescr *" is incompatible with parameter of type "cusparseDnVecDescr_t"
          detected during:
            instantiation of "void muda::LinearSystemContext::generic_spmv(const T &, cusparseOperation_t, cusparseSpMatDescr_t, const cusparseDnVecDescr *, const T &, cusparseDnVecDescr_t) [with T=float]"
/mnt/a/dev/muda/muda-template/submodules/muda/src/muda/ext/linear_system/details/routines/spmv/csr_spmv.inl(12): here
            instantiation of "void muda::LinearSystemContext::spmv(const T &, muda::CCSRMatrixView<T>, muda::CDenseVectorView<T>, const T &, muda::DenseVectorView<T> &) [with T=float]"
/mnt/a/dev/muda/muda-template/submodules/muda/src/muda/ext/linear_system/details/routines/spmv/csr_spmv.inl(18): here
            instantiation of "void muda::LinearSystemContext::spmv(muda::CCSRMatrixView<T>, muda::CDenseVectorView<T>, muda::DenseVectorView<T>) [with T=float]"
/mnt/a/dev/muda/muda-template/src/main.cu(101): here

/mnt/a/dev/muda/muda-template/submodules/muda/src/muda/ext/linear_system/details/routines/spmv.inl(20): error: argument of type "const cusparseDnVecDescr *" is incompatible with parameter of type "cusparseDnVecDescr_t"
          detected during:
            instantiation of "void muda::LinearSystemContext::generic_spmv(const T &, cusparseOperation_t, cusparseSpMatDescr_t, const cusparseDnVecDescr *, const T &, cusparseDnVecDescr_t) [with T=float]"
/mnt/a/dev/muda/muda-template/submodules/muda/src/muda/ext/linear_system/details/routines/spmv/csr_spmv.inl(12): here
            instantiation of "void muda::LinearSystemContext::spmv(const T &, muda::CCSRMatrixView<T>, muda::CDenseVectorView<T>, const T &, muda::DenseVectorView<T> &) [with T=float]"
/mnt/a/dev/muda/muda-template/submodules/muda/src/muda/ext/linear_system/details/routines/spmv/csr_spmv.inl(18): here
            instantiation of "void muda::LinearSystemContext::spmv(muda::CCSRMatrixView<T>, muda::CDenseVectorView<T>, muda::DenseVectorView<T>) [with T=float]"
/mnt/a/dev/muda/muda-template/src/main.cu(101): here

2 errors detected in the compilation of "/mnt/a/dev/muda/muda-template/src/main.cu".
gmake[2]: *** [CMakeFiles/hello_muda.dir/build.make:92: CMakeFiles/hello_muda.dir/src/main.cu.o] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:286: CMakeFiles/hello_muda.dir/all] Error 2
gmake: *** [Makefile:156: all] Error 2

I wonder whether the problem is in my Muda code or somewhere else.


  2. What is the usual (best-practice) approach for small-vector linear algebra on the device (e.g. float3, float3x3, dot and outer products)?
    Thanks a lot in advance!
@MuGdxy
Owner

MuGdxy commented Jul 27, 2024

  1. CUDA changed its API in 11.4; you may need to update to >= 11.6.
  2. I just use Eigen.

@Da1sypetals
Author

Da1sypetals commented Jul 27, 2024

I switched to CUDA 12.4, and now a runtime error occurs when converting the triplet sparse matrix to COO format.
Code:

int N = 3;

DeviceTripletMatrix<float, 1> A;
DeviceDenseVector<float> b(N);
DeviceDenseVector<float> x(N);
DeviceDenseVector<float> y(N);

A.resize_triplets(N * N);
A.reshape(N, N);

std::cout << "sizes:\n";
std::cout << A.row_indices().size()
          << "  " << A.col_indices().size()
          << "  " << A.values().size() << std::endl;

ParallelFor(256).apply(N * N, [row_idx = A.row_indices().viewer(),
        col_idx = A.col_indices().viewer(),
        val = A.values().viewer(),
        b = b.viewer(), N]__device__(int i)mutable {

    row_idx(i) = i % N;
    col_idx(i) = i / N;
    val(i) = static_cast<float>(i * i);

    if (i < N) {
        b(i) = static_cast<float>(i);
    }

});

std::cout << "Filled A and b\n";

LinearSystemContext ctx;

std::cout << "Context created\n";

DeviceCOOMatrix<float> A_coo;
ctx.convert(A, A_coo); // cuda error triggers here

Terminal output:

CUDA error at /mnt/a/dev/muda/muda-template/submodules/muda/src/muda/cub/device/device_merge_sort.h:21 code=222(cudaErrorUnsupportedPtxVersion) "cub::DeviceMergeSort::SortPairs( d_temp_storage, temp_storage_bytes, d_keys, d_items, num_items, compare_op, _stream, false)"
terminate called after throwing an instance of 'muda::cuda_error<cudaError>'
  what():  CUDA error at /mnt/a/dev/muda/muda-template/submodules/muda/src/muda/cub/device/device_merge_sort.h:21 code=222(cudaErrorUnsupportedPtxVersion)cub::DeviceMergeSort::SortPairs( d_temp_storage, temp_storage_bytes, d_keys, d_items, num_items, compare_op, _stream, false)
[1]    29733 IOT instruction  ./hello_muda

Is a specific CUDA version required? Could you please list your configurations or the version requirements?
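The two failures in this thread call for different version checks. The compile error came from the cuSPARSE API (the maintainer suggested CUDA >= 11.6), while code 222, cudaErrorUnsupportedPtxVersion, is documented by CUDA as PTX generated by a compiler newer than the installed driver's PTX JIT supports. A sketch of both checks, assuming a .cu file built with nvcc; cuda_version_code, cuda_major, cuda_minor, driver_covers_runtime and report_cuda_versions are hypothetical helpers, while cudaDriverGetVersion and cudaRuntimeGetVersion are real CUDA runtime calls that report versions as 1000 * major + 10 * minor.

```cpp
#include <cassert>
#include <cstdio>

#if defined(__CUDACC__)
#include <cuda_runtime.h>
#endif

// CUDA encodes versions as 1000 * major + 10 * minor (12.4 -> 12040).
constexpr int cuda_version_code(int major, int minor) { return 1000 * major + 10 * minor; }
constexpr int cuda_major(int code) { return code / 1000; }
constexpr int cuda_minor(int code) { return (code % 1000) / 10; }

// Compile-time guard (assumption: 11.6 is the minimum suggested in this thread).
// CUDART_VERSION is only defined when the CUDA runtime headers are included.
#if defined(CUDART_VERSION)
static_assert(CUDART_VERSION >= cuda_version_code(11, 6),
              "muda linear_system: CUDA toolkit >= 11.6 suggested");
#endif

// Conservative runtime check: an older driver cannot JIT PTX from a newer
// toolkit, which surfaces as cudaErrorUnsupportedPtxVersion.
constexpr bool driver_covers_runtime(int driver_code, int runtime_code) {
    return driver_code >= runtime_code;
}

#if defined(__CUDACC__)
// Print both versions; call this from main() before using muda.
inline void report_cuda_versions() {
    int driver = 0, runtime = 0;
    cudaDriverGetVersion(&driver);    // 0 if no driver is installed
    cudaRuntimeGetVersion(&runtime);
    std::printf("driver supports CUDA %d.%d, runtime is CUDA %d.%d\n",
                cuda_major(driver), cuda_minor(driver),
                cuda_major(runtime), cuda_minor(runtime));
    if (!driver_covers_runtime(driver, runtime))
        std::printf("driver is older than the runtime: consider updating the GPU driver\n");
}
#endif
```

On WSL the GPU driver is the Windows host driver, so nvidia-smi there reports the CUDA version that driver supports. If it is older than the toolkit, updating the driver is the direct fix; building SASS for the local GPU (for example configuring with -DCMAKE_CUDA_ARCHITECTURES=native, available since CMake 3.24, assuming the project does not override that variable) may also avoid the PTX JIT step.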

@MuGdxy
Owner

MuGdxy commented Jul 27, 2024

https://github.com/MuGdxy/muda-app/tree/linear_system

I tested your code in Debug and Release mode on the following platforms:

  • Windows

    • MSVC 19.39.33522.0/CUDA 12.3.52
  • Linux

    • GNU 11.4.0/CUDA 12.4.131

and did not get any errors.

@Da1sypetals
Author

Da1sypetals commented Jul 27, 2024

I cloned the repo you provided but got the same runtime error 😭
on WSL, using GNU 11.4.0, CUDA 12.4.99, and CMake 3.29.3.
CMake configure output:

-- The CXX compiler identification is GNU 11.4.0
-- The CUDA compiler identification is NVIDIA 12.4.99
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda-12.4/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found CUDAToolkit: /usr/local/cuda-12.4/targets/x86_64-linux/include (found version "12.4.99")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Configuring done (20.1s)
-- Generating done (0.1s)
-- Build files have been written to: /mnt/a/dev/muda/muda-app/build

It also failed with the same runtime error on a remote Arch Linux machine with GNU 13.2.0 and CUDA 12.5.82.

All commands I executed:

git clone git@github.com:MuGdxy/muda-app.git
cd muda-app
git submodule update --init
git checkout linear_system
mkdir build && cd build
cmake -S .. -B . -DCMAKE_BUILD_TYPE=Debug
cmake --build . --config Debug -j8

CMake configure output on the Arch Linux machine:

-- The CXX compiler identification is GNU 13.2.0
-- The CUDA compiler identification is NVIDIA 12.5.82
...
-- Found CUDAToolkit: /opt/cuda-12.5/targets/x86_64-linux/include (found version "12.5.82")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Configuring done (0.9s)
-- Generating done (0.0s)

@Da1sypetals
Author

Da1sypetals commented Jul 27, 2024

Trying the next option: container

@Da1sypetals
Copy link
Author

Finally, the problem was resolved with Docker. A starter project with Muda, SFML, and Eigen that stores and solves sparse matrices runs in the container without runtime errors. For now the image is built with docker commit; later I will create a Dockerfile for it.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants