ExtraBuffers: revamping of the idea #802

albestro · 2023-02-17T08:28:19Z

This revamps a concept we already tested in #386. Because the codebase has changed significantly in the meantime, and because that PR was only a feasibility test, we opted for the easier path of simply re-implementing the idea.

ExtraBuffers is a utility that comes in handy for the "reduce" pattern, i.e. when multiple tasks want to write to the same memory. ExtraBuffers gives you a set of "buffer" tiles that tasks can write to in parallel; the reduction is then performed on request.

Note
Unlike the original, this PR goes for a simpler implementation in which "partial computations" are stored exclusively in ExtraBuffers tiles, without using the tile into which the result will later be reduced.
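
As a rough illustration of the pattern described above, here is a minimal, library-free sketch. `Tile`, `ScratchBuffers`, `buffer_for` and `accumulate_contributions` are hypothetical stand-ins, not the DLA-Future API. Tasks run sequentially here; in the real code, tasks mapped to different buffer tiles could run concurrently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a tile: a flat array of values.
using Tile = std::vector<double>;

// Hypothetical, simplified analogue of ExtraBuffers (not the DLA-Future class).
class ScratchBuffers {
public:
  ScratchBuffers(std::size_t nbuffers, std::size_t tile_size)
      : buffers_(nbuffers, Tile(tile_size, 0.0)) {
    assert(nbuffers > 0);
  }

  // Task `i` writes to buffer `i % nbuffers`, so tasks mapped to different
  // buffers never touch the same memory.
  Tile& buffer_for(std::size_t i) { return buffers_[i % buffers_.size()]; }

  // On request, add every buffer into `out`. Partial results live only in the
  // buffers; `out` is not reset here, so its previous content is kept.
  void reduce(Tile& out) const {
    for (const Tile& buffer : buffers_) {
      assert(buffer.size() == out.size());
      for (std::size_t k = 0; k < out.size(); ++k)
        out[k] += buffer[k];
    }
  }

private:
  std::vector<Tile> buffers_;
};

// Each of `ntasks` tasks adds the value (task + 1) to every element of its buffer.
inline Tile accumulate_contributions(std::size_t ntasks, std::size_t nbuffers,
                                     std::size_t tile_size) {
  ScratchBuffers buffers(nbuffers, tile_size);
  for (std::size_t task = 0; task < ntasks; ++task)
    for (double& x : buffers.buffer_for(task))
      x += static_cast<double>(task + 1);

  Tile result(tile_size, 0.0);  // zeroed by the caller before reducing
  buffers.reduce(result);
  return result;
}
```

In this sketch the output tile is only touched by `reduce`, which mirrors the choice of keeping partial results exclusively in the buffer tiles.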

This PR also includes a first usage example: the W2 computation inside ReductionToBand.

TODO

Evaluate whether the reduction output tile has to be reset or not
Fix and improve the test
Fix GPU

albestro · 2023-02-22T17:53:39Z

include/dlaf/eigensolver/reduction_to_band/impl.h

+  ex::start_detached(tile::set0(dlaf::internal::Policy<B>(), w2.readwrite_sender(LocalTileIndex(0, 0))));
+  ex::start_detached(buffers.reduce(w2.readwrite_sender(LocalTileIndex(0, 0))));


Probably better to create a continuation with a single task?
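
To make the question concrete, here is a library-free sketch of what fusing the two steps into one callable could look like. `Tile` and `set0_and_reduce` are hypothetical names; this does not use the pika or DLA-Future sender machinery, and it is not what the PR currently does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Tile = std::vector<double>;  // hypothetical stand-in for a tile

// One callable that zeroes the output tile and then accumulates all buffers,
// instead of two separately scheduled tasks (set0, then reduce).
inline void set0_and_reduce(Tile& out, const std::vector<Tile>& buffers) {
  std::fill(out.begin(), out.end(), 0.0);
  for (const Tile& buffer : buffers) {
    assert(buffer.size() == out.size());
    for (std::size_t k = 0; k < out.size(); ++k)
      out[k] += buffer[k];
  }
}
```

Whether the zeroing belongs inside the reduction at all is the open point already listed in the TODO.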

albestro · 2023-02-22T17:57:10Z

include/dlaf/matrix/extra_buffers.h

+  [[nodiscard]] auto reduce(TileSender tile) {
+    namespace ex = pika::execution::experimental;
+
+    std::vector<ex::any_sender<pika::shared_future<matrix::Tile<const T, D>>>> buffers;


The type is quite cumbersome. @msimberg is probably aware of this and will say that it is something we will sort out once the tile sender mechanism is pika::future-free.

I don't know whether any workaround is available but, AFAIK, our unwrapping facilities currently do not unwrap a vector, so it is passed through as is, even when using our dlaf::internal::transform.

msimberg · 2023-02-23

I see your complaint, but typedef exists 😉 I see that complex type eventually becoming a helper typedef inside e.g. Matrix, or at least in the dlaf::matrix namespace, and then you'll be able to write std::vector<ReadOnlyTileSender> or something like that. Would that partially or completely take care of your concerns here? This is what I'm planning on doing in #766 (because currently it's also dealing with long unwieldy types).
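
A minimal sketch of the helper-alias idea, using only stand-in types. `Tile`, `SharedFuture`, `AnySender`, and `sum_values` are hypothetical placeholders for illustration; only the name `ReadOnlyTileSender` comes from the comment above, and its eventual definition in DLA-Future may well differ.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Everything below is a hypothetical stand-in, not the pika or DLA-Future types.
enum class Device { CPU, GPU };

template <class T, Device D>
struct Tile {
  T value;
};

template <class T>
struct SharedFuture {  // stand-in for pika::shared_future<T>
  T value;
  const T& get() const { return value; }
};

template <class T>
struct AnySender {  // stand-in for ex::any_sender<T>
  T payload;
};

// The helper alias: one short name for the long nested type.
template <class T, Device D>
using ReadOnlyTileSender = AnySender<SharedFuture<Tile<const T, D>>>;

// A signature written with the alias instead of the full spelling.
template <class T, Device D>
T sum_values(const std::vector<ReadOnlyTileSender<T, D>>& buffers) {
  T sum{};
  for (const auto& sender : buffers)
    sum += sender.payload.get().value;
  return sum;
}
```

The alias changes nothing about the underlying type; it only shortens declarations such as the `buffers` vector.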

albestro · 2023-02-22T18:00:01Z

include/dlaf/matrix/extra_buffers.h

+    return ex::when_all(std::move(tile), ex::when_all_vector(std::move(buffers))) |
+           dlaf::internal::transform(dlaf::internal::Policy<DefaultBackend_v<D>>(),
+                                     [](const matrix::Tile<T, D>& tile,
+                                        const std::vector<pika::shared_future<matrix::Tile<const T, D>>>&
+                                            buffers,
+                                        auto&&... ts) {
+                                       for (const auto& buffer : buffers) {
+                                         if constexpr (D == Device::CPU) {
+                                           static_assert(sizeof...(ts) == 0,
+                                                         "Parameter pack should be empty for MC.");
+                                           dlaf::tile::internal::add(T(1), buffer.get(), tile);
+                                         }
+#ifdef DLAF_WITH_GPU
+                                         else if constexpr (D == Device::GPU) {
+                                           dlaf::tile::internal::add(T(1), buffer.get(), tile, ts...);
+                                         }
+#endif
+                                         else {
+                                           DLAF_STATIC_UNIMPLEMENTED(T);
+                                         }
+                                       }
+                                     });
+  }


This is really not nice: 15 lines just to say

for (const auto& buffer : buffers) dlaf::tile::internal::add(T(1), buffer.get(), tile, <optional whip::stream_t>);

I know we could go with something like

dlaf::tile::internal::add(T(1), buffer.get(), tile, ts...);

but we would lose a bit of static checking.

Do we have any better alternative?
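
One possible middle ground, sketched under assumptions: keep the single forwarding call, but state the rule about the extra arguments once, in a `static_assert`. `Tile`, `Stream` (a stand-in for whip::stream_t), the two `add` overloads, `valid_extra_args`, and `reduce_into` are all hypothetical; this is not the DLA-Future implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-ins; none of these are the DLA-Future or whip types.
enum class Device { CPU, GPU };
using Tile = std::vector<double>;
struct Stream {  // stand-in for whip::stream_t
  int launches = 0;
};

// CPU flavour: out += alpha * in
inline void add(double alpha, const Tile& in, Tile& out) {
  assert(in.size() == out.size());
  for (std::size_t k = 0; k < out.size(); ++k)
    out[k] += alpha * in[k];
}

// "GPU" flavour: same arithmetic on the host, plus bookkeeping on the stream.
inline void add(double alpha, const Tile& in, Tile& out, Stream& stream) {
  add(alpha, in, out);
  ++stream.launches;
}

// Compile-time rule: CPU takes no extra arguments, GPU takes exactly one Stream.
template <Device D, class... Ts>
constexpr bool valid_extra_args() {
  if constexpr (D == Device::CPU)
    return sizeof...(Ts) == 0;
  else
    return sizeof...(Ts) == 1 && (std::is_same_v<std::decay_t<Ts>, Stream> && ...);
}

// Single forwarding call; the static check lives in one place.
template <Device D, class... Ts>
void reduce_into(Tile& tile, const std::vector<Tile>& buffers, Ts&&... ts) {
  static_assert(valid_extra_args<D, Ts...>(),
                "CPU takes no extra arguments; GPU takes exactly one stream.");
  for (const Tile& buffer : buffers)
    add(1.0, buffer, tile, ts...);
}
```

With this shape, calling `reduce_into<Device::CPU>(tile, buffers, stream)` fails to compile with the message above, so the static check is kept while the per-device branches disappear.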

albestro added this to the Optimizations milestone Feb 17, 2023

albestro self-assigned this Feb 17, 2023

albestro force-pushed the alby/new-extra-buffers branch from 343d087 to 4e52fcf on February 22, 2023 17:46

albestro added 2 commits February 22, 2023 18:51

basic implementation

e049ff7

first usage of extra buffers for W2 in red2band

1c06b55

(used by both local and dist)

albestro force-pushed the alby/new-extra-buffers branch from 4e52fcf to f2d49df on February 22, 2023 17:51

fix gpu and do not implicitly set0 the result tile

d9da590

not yet as continuation

albestro force-pushed the alby/new-extra-buffers branch from f2d49df to d9da590 on February 22, 2023 17:52

albestro commented Feb 22, 2023

simplify code for custom reduce kernel

0d18332

albestro added the Priority:on hold label Apr 8, 2024
