
Matrix Multiplication example using MdSpan #2317

Merged

Conversation

@mehmetyusufoglu (Contributor) commented Jul 18, 2024

This matrix multiplication example uses the mdspan data structure both for passing data to and from the kernel and for conveniently filling the data on the host.

With mdspan, there is no need to pass the data size and pitch values to the kernel for each 2D array. Filling the host data is straightforward as well.

If these workshop examples are found to be useful for Alpaka users in general, this PR can be merged as well.

  • must be squashed before merge

example/matrixAddWithMdspan/CMakeLists.txt (outdated; resolved)
example/matrixAddWithMdspan/src/matrixAddMdSpan.cpp (outdated; resolved)
@@ -0,0 +1,53 @@
#
# Copyright 2023 Erik Zenker, Benjamin Worpitz, Jan Stephan
Member
Needs to be modified

Contributor Author

OK, changed to 2024 and added SPDX-License-Identifier: MPL-2.0.

example/matrixMulWithMdspan/src/matrixMulMdSpan.cpp (outdated; resolved)
SimeonEhrig previously approved these changes Jul 18, 2024
// Needed for running example for all backends available; one by one
#include <alpaka/example/ExecuteForEachAccTag.hpp>

#include <experimental/mdspan>
Contributor

Should this be wrapped in a check on the C++ standard, so that if we are using C++23 we take <mdspan>, and otherwise, if we are using the version that comes with alpaka, we take <experimental/mdspan>?

Member

I disagree, because the version shipped with alpaka differs from the standard implementation in an important point: the alpaka version uses operator() [1] to access elements, while the standard implementation uses operator[], since C++23 allows it to take more than one argument. Therefore we would need to add a preprocessor if-else or a wrapper function each time the memory is accessed.

[1] The experimental implementation also provides operator[] if we use C++23, but we also support C++17 and 20.

Contributor

Mhm, I see.

On the other hand, alpaka (the library) does not use the mdspan objects, it only returns them.

I think that, if somebody uses alpaka in a C++23 environment, they should get the official mdspan, not the experimental version (like we do for boost::atomic_ref vs std::atomic_ref).

Can we #if __cplusplus the examples to use () for C++17/C++20, and [] for C++23?

Member

I agree, but I think this can be a little bit more work.

The implementation we use creates a define MDSPAN_USE_BRACKET_OPERATOR. Then it looks like this: https://github.com/SimeonEhrig/MyCpp/blob/28d20632657697533184f95c883739718f8397ad/features/23/mdspan/hello_mspan/main.cpp#L44
I expect that this define will not be created by the standard library, so we would need to create it ourselves.

Another point is that our implementation annotates the functions with __host__ and __device__ to make them compatible with CUDA and HIP. At the moment, Nvidia does not provide an STL implementation and I'm not sure whether AMD does: https://en.cppreference.com/w/cpp/compiler_support#cpp23
But Clang already provides an implementation. In the worst case we would therefore detect the STL implementation of Clang, which may not be annotated, and try to compile it with the CUDA or HIP backend. In that case, we need a workaround selection mechanism for when the mdspan implementation is not annotated.

Long story short: I agree that we should be forward compatible, but I think we need to solve this in an extra PR.

Contributor

Long story short, I agree that we should be forward compatible but I think we need to solve this in an extra PR.

OK for me.

Contributor Author

OK, I will open a related issue so we don't forget, or @SimeonEhrig, could you open it, since you know the details?

Member

I opened it already: #2319

//! \param C Output matrix where the result of A * B will be stored
//! \param K The shared dimension between A and B
template<typename TAcc, typename MdSpan>
ALPAKA_FN_ACC void operator()(TAcc const& acc, MdSpan A, MdSpan B, MdSpan C, Idx K) const
Contributor

Shouldn't K be derived from A and B, with a check that the dimensions match?

Contributor Author

Good catch. Thanks done.

Comment on lines 132 to 162
// Wait for the kernel to finish
alpaka::wait(queue);

Contributor

Not needed.

Contributor Author

I will keep this and have added a comment saying "if the queue is blocking this is not needed."

@fwyzard (Contributor) Jul 19, 2024

No, this wait is never needed, even if the queue is not blocking.

Contributor

The wait below (after the memcpy) is indeed needed, and it should always be there.

Contributor Author

I meant the last wait (after the memcpy): if the queue is blocking, the host is already blocked by the memcpy, hence there is no need to wait after it? (And the queue is a blocking queue: using Queue = alpaka::Queue<Acc, alpaka::Blocking>;)

Contributor

Yes.

If a queue is non-blocking, it is necessary to do alpaka::wait(queue) after the memcpy and before accessing the data on the host.

If a queue is blocking, it is not necessary to do alpaka::wait(queue).
However, calling alpaka::wait(queue) should be cheap, though not completely free.
So maybe we can keep it anyway?

Contributor Author

OK, I used wait only after the memcpy statements and added no comments. Although the queue is blocking, as you said, it is not expensive.

@fwyzard (Contributor) commented Jul 18, 2024

If these workshop examples are found to be useful for the users of Alpaka in general; this PR can be merged as well.

I like the idea of having examples with mdspan, though I would prefer to see if the types can be used more explicitly.

@mehmetyusufoglu (Contributor Author) commented Jul 18, 2024

If these workshop examples are found to be useful for the users of Alpaka in general; this PR can be merged as well.

I like the idea of having examples with mdspan, though I would prefer to see if the types can be used more explicitly.

By "explicitly", do you mean using mdspan when passing to the kernel, and instead of auto in the line auto mdHostA = alpaka::experimental::getMdSpan(bufHostA);? I will try, but getMdSpan does a lot of work, it seems:

ALPAKA_FN_HOST static auto getMdSpan(TView& view)
{
    constexpr auto dim = Dim<TView>::value;
    using Element = Elem<TView>;
    auto extents = detail::makeExtents(view, std::make_index_sequence<dim>{});
    auto* ptr = reinterpret_cast<std::byte*>(getPtrNative(view));
    auto const strides = toArray(getPitchesInBytes(view));
    layout_stride::mapping<decltype(extents)> m{extents, strides};
    return std::experimental::mdspan<Element, decltype(extents), layout_stride, detail::ByteIndexedAccessor<Element>>{
        ptr,
        m};
}

@fwyzard (Contributor) commented Jul 18, 2024

By "Explicitly" you mean while passing to the kernel and instead of auto in the line auto mdHostA = alpaka::experimental::getMdSpan(bufHostA); using mdspan ?

No, I only meant that instead of

template<typename TAcc, typename MdSpan>
ALPAKA_FN_ACC void operator()(TAcc const& acc, MdSpan A, MdSpan B, MdSpan C) const { ... }

I would prefer something like

template<typename TAcc,
    typename T,
    typename Extents,
    typename LayoutPolicy,
    typename AccessorPolicy>
ALPAKA_FN_ACC void operator()(TAcc const& acc,
    mdspan<T, Extents, LayoutPolicy, AccessorPolicy> A,
    mdspan<T, Extents, LayoutPolicy, AccessorPolicy> B,
    mdspan<T, Extents, LayoutPolicy, AccessorPolicy> C) const
{ ... }

However, I acknowledge that this is a lot more verbose; maybe it will be easier to write once we enable C++20 and concepts.


//! Specialization for mdspan with four template arguments
template<typename ElementType, typename Extents, typename LayoutPolicy, typename AccessorPolicy>
struct isMdspan<std::experimental::mdspan<ElementType, Extents, LayoutPolicy, AccessorPolicy>> : std::true_type
Contributor

can you rename this to IsMdspan ?

and add

template <typename T>
inline constexpr bool is_mdspan = IsMdspan<T>::value;

?

Contributor Author

Done, thanks.

@mehmetyusufoglu (Contributor Author) commented Jul 19, 2024


Your suggestion is good and much more expressive, but for simplicity I pass TMdSpan and then check whether it is exactly std::experimental::mdspan.

template<typename TAcc, typename TMdSpan>
ALPAKA_FN_ACC void operator()(TAcc const& acc, TMdSpan A, TMdSpan B, TMdSpan C) const
{
    // compile time check
    static_assert(isMdspan<TMdSpan>::value, "The type TMdSpan should be an std mdspan");

    auto const i = alpaka::getIdx<alpaka::Grid, alpaka::Threads>(acc)[0];
    auto const j = alpaka::getIdx<alpaka::Grid, alpaka::Threads>(acc)[1];

    if(i < A.extent(0) && j < A.extent(1))
    {
        C(i, j) = A(i, j) + B(i, j);
    }
}

@fwyzard (Contributor) commented Jul 19, 2024

for simplicity i pass TMdSpan then check whether it is exactly std::experimental::mdspan.

Good idea, that's a good compromise.

@mehmetyusufoglu mehmetyusufoglu force-pushed the matrixExamplesMdSpan branch 2 times, most recently from cdda967 to e80097e Compare July 23, 2024 13:15
@psychocoderHPC (Member) left a comment

Please rebase to use getValidWorkDivKernel.

auto mdDevC = alpaka::experimental::getMdSpan(bufDevC);

// Let alpaka calculate good block and grid sizes given our full problem extent.
auto const workDiv = alpaka::getValidWorkDiv<Acc>(
Member

Please use getValidWorkDivKernel.

Contributor Author

Rebased, done, thanks.

@mehmetyusufoglu mehmetyusufoglu force-pushed the matrixExamplesMdSpan branch 3 times, most recently from d1d1c95 to aa6783e Compare July 25, 2024 10:19
@psychocoderHPC psychocoderHPC added this to the 1.2.0 milestone Jul 25, 2024
@psychocoderHPC (Member)

Discussed offline with @mehmetyusufoglu: we will remove the add example because it is 99.9% equal to the multiplication example, and IMO the code duplication is not required.

@mehmetyusufoglu mehmetyusufoglu changed the title [For workshop] Add 2 MdSpan examples: Matrix Addition and Multiplication [For workshop] Add a MdSpan example: Matrix Multiplication Jul 25, 2024
@mehmetyusufoglu mehmetyusufoglu changed the title [For workshop] Add a MdSpan example: Matrix Multiplication Matrix Multiplication example using MdSpan Jul 25, 2024
@mehmetyusufoglu mehmetyusufoglu force-pushed the matrixExamplesMdSpan branch 2 times, most recently from 693d146 to a3d1443 Compare July 25, 2024 12:46

std::cout << "Multiplication of matrices of size " << M << "x" << K << " and " << K << "x" << N << " using mdspan "
<< (success ? "succeeded" : "failed") << "!" << std::endl;
return EXIT_SUCCESS;
Member

If success is false, you should end with EXIT_FAILURE.

@mehmetyusufoglu (Contributor Author) Jul 25, 2024

OK, added. Thanks a lot. (Tested with ctest for both the true-positive and true-negative cases.) Other examples like convolution1d, 2d, and 2dWithMdspan are correct in this sense.

@psychocoderHPC psychocoderHPC merged commit d1cc2e0 into alpaka-group:develop Jul 26, 2024
22 checks passed
4 participants