-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[cudadev] Macro based SoA (followup of #PR211) #287
base: master
Are you sure you want to change the base?
[cudadev] Macro based SoA (followup of #PR211) #287
Conversation
… trivially copyable. The requirement for trivially constructible prevented creating structures on the host side and then memory copying it to the device side.
The SoA store divides a user pre-allocated buffer in adjacent columns of values. The columns are bytes aligned to a setable alignment and their length is determined at run time. The SoA views allow grouping columns from multiple stores in a logical entity. For example when some columns are used on device-only and otheres transmitted to host. The two groups are on two different stores, but joined together in a logic view. Views can also provide access to a subset of a store.
…umns. The Eigen columns, present in the stores are explicitly not supported (assertion).
…e to const view. Create a const view type, and renamed the access method return type and name.
…SOAStore. The rename reflect the fact that the buffers are owned by the class.
The templation allows defining compile time alignment choice, alignement enforcement, and alignment hinting for the compiler.
The SoA macros now generate templates, adapted code and added using clauses to select the proper variant. Introduced separate, specifically const constViews as const views can be worked around by copying, and const correctness is insufficient. Applied the previous to the SiPixelROCsStatusAndMappingWrapper product. Moved alignment value from runtime to compile time parameter. Code simplifications.
Added views where necessary to replace store accesses. Renamed variables to "layout" in anticipation of the renaming of the stores concept to layout.
…aming from store to layout.
…sized arrays. The struct is passed around and mapped into a view only for use.
Added symbolic names for cache line sizes/default alignment.
Added more notes in the SoA.md plans section.
Views can now only be defined from layouts and view of the same size (if multiple).
62a86a2
to
a2643bf
Compare
@fwyzard, @makortel, this version does not contain all the intended features, but could be considered for merging into master. The missing features I intend to cover later are:
I tried merging with the current master and this compiles and runs fine. |
A silly thing: could you run it all through |
GENERATE_SOA_CONST_VIEW(DeviceConstViewTemplate, | ||
SOA_VIEW_LAYOUT_LIST(SOA_VIEW_LAYOUT(DeviceView, deviceView)), | ||
SOA_VIEW_VALUE_LIST( | ||
SOA_VIEW_VALUE(deviceView, moduleStart), // index of the first pixel of each module | ||
SOA_VIEW_VALUE(deviceView, clusInModule), // number of clusters found in each module | ||
SOA_VIEW_VALUE(deviceView, moduleId), // module id of each module | ||
|
||
// originally from rechits | ||
SOA_VIEW_VALUE(deviceView, clusModuleStart) // index of the first cluster of each module | ||
) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this "derived" from DeviceView
and not from DeviceLayout
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "idea" was that the class has its mutable view internally for initialization, and then down grades it to a const view for usage by the user in member function view()
. Both path are possible, but we need to use the type we will pass to the constructor. The "hard work" is in the layout constructor, and in both cases, we're copying pointers (and the compiler will optimize this copying things out).
That makes the name of the SOA_VIEW_LAYOUT_LIST
macro too restrictive, maybe. What about SOA_VIEW_DATA_SOURCE_LIST
(and accordingly for SOA_VIEW_LAYOUT
)?
723e70e
to
a839a44
Compare
a839a44
to
24926af
Compare
Re-enabled Eigen parts of the test.
Removed unused macro.
Eigen/Core should be included before the SoA headers to enable support.
…layout This also changes the function from a static to a non static one. A sample output is: SoA1LayoutTemplate(10000 elements, byte alignement= 128, @0x7f7a8d525080): sizeof(SoA1LayoutTemplate): 168 Column x at offset 0 has size 80000 and padding 0 Column y at offset 80000 has size 80000 and padding 0 Column z at offset 160000 has size 80000 and padding 0 Column sum at offset 240000 has size 80000 and padding 0 Column prod at offset 320000 has size 80000 and padding 0 Eigen value a at offset 400000 has dimension (3 x 1) and per column size 80000 and padding 0 Eigen value b at offset 640000 has dimension (3 x 1) and per column size 80000 and padding 0 Eigen value r at offset 880000 has dimension (3 x 1) and per column size 80000 and padding 0 Column color at offset 1120000 has size 20000 and padding 96 Column value at offset 1140096 has size 40000 and padding 64 Column py at offset 1180160 has size 80000 and padding 0 Column count at offset 1260160 has size 40000 and padding 64 Column anotherCount at offset 1300224 has size 40000 and padding 64 Scalar description at offset 1340288 has size 8 and padding 120 Scalar someNumber at offset 1340416 has size 4 and padding 124 Final offset = 1340544 computeDataSize(...): 1340544
The test validates multiple multiple SoA use cases, host and device side. It validates buffer sharing for multiple layouts. An an Alpaka native sub-buffer operation (device side memset) was used. This type of operations can be used in the future to copy partial columns. This version is a port of cms-patatrack#287 to Alpaka.
|
||
#include <cuda_runtime.h> | ||
|
||
class SiPixelClustersCUDA { | ||
public: | ||
SiPixelClustersCUDA() = default; | ||
GENERATE_SOA_LAYOUT(DeviceLayoutTemplate, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this be simplified using GENERATE_SOA_LAYOUT_VIEW_AND_CONST_VIEW
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, this code predates the introduction of this utility macro.
byteAlignment / sizeof(CPP_TYPE::Scalar);) \ | ||
if constexpr (alignmentEnforcement == AlignmentEnforcement::Enforced) \ | ||
if (reinterpret_cast<intptr_t>(BOOST_PP_CAT(NAME, _)) % byteAlignment) \ | ||
throw std::out_of_range("In layout constructor: misaligned column: " #NAME); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why out_of_range
for an alignment error ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not find a perfect match in std::exceptions
and felt this could be the closest (maybe std::length_error
? ). Otherwise, we could just create our own exceptions, but we mostly rely on std::
ones, don't we?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd go with a generic std::runtime_error
.
curMem += (((nElements_ * sizeof(CPP_TYPE::Scalar) - 1) / byteAlignment) + 1) * byteAlignment * \ | ||
CPP_TYPE::RowsAtCompileTime * CPP_TYPE::ColsAtCompileTime; \ | ||
BOOST_PP_CAT(NAME, Stride_) = (((nElements_ * sizeof(CPP_TYPE::Scalar) - 1) / byteAlignment) + 1) * \ | ||
byteAlignment / sizeof(CPP_TYPE::Scalar);) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happens if byteAlignment
is not a multiple of sizeof(CPP_TYPE::Scalar)
?
what happens if byteAlignment
is less than sizeof(CPP_TYPE::Scalar)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, bad things might happen. Usual values are powers of 2 and a typical byte alignment is bigger than a built-in scalar, but this is not guaranteed. I think asserting (compile-time) that both sizeof(CPP_TYPE::Scalar)
and byteAlignment
are powers of 2 and aligning the columns to the biggest of the two should forbid any unwanted scenario while remaining a reasonable constraint on the parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The alignment must always be a power of 2.
The size of a type can be anything, but c++ (almost) mandates that it is a multiple of its own alignment.
But "scalar" in this context does not necessarily mean a primitive type - it could also be an aggregate like class or an array (e.g. char name[1024]
).
Then the stride becomes 0, which I don't think is the intended behaviour ? (no it doesn't, it becomes nElements_
)
If sizeof(CPP_TYPE::Scalar)
is greater that the alignment, the stride should just be nElements_ * sizeof(CPP_TYPE::Scalar)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, shouldn't the stride simply be (((nElements_ * sizeof(CPP_TYPE::Scalar) - 1) / byteAlignment) + 1) * byteAlignment
, without the / sizeof(CPP_TYPE::Scalar)
part ?
Here are some ideas after reading most of the PR and the discussions of the past couple of days. My understanding is that the basic SoA-based structure would look similar to // define the SoALayout and SoAView types
GENERATE_SOA_LAYOUT_VIEW_AND_CONST_VIEW(
SoALayoutTemplate,
SoAViewTemplate,
SoAConstViewTemplate,
...
);
using SoALayout = SoALayoutTemplate<>;
using SoAView = SoAViewTemplate<>;
class SoA {
public:
SoA(size_t size) :
storage_{std::make_unique_ptr<std::byte[]>(SoALayout::computeDataSize(size))},
layout_{storage_.get(), size)},
view_{layout_}
{}
SoAView view() { return view_; } // return by value or by reference ?
SoAConstView view() const { return const_view(); }
SoAConstView const_view() const { return SoAConstView(layout_); } // can we create the const view from the view instead ?
private:
std::unique_ptr<std::byte[]> storage_; // use a different storage type for GPUs, etc.
SoALayout layout_;
SoAView view_;
}; Then, I would have suggestions to simplify the usage. Make the
|
I attempted to achieve this, but failed in the current 2 classes model as the This schema can be achieved nevertheless if we create a third type of generated class ( |
This PR follows up and replaces #211. Its goal is to implement #281.
The current status of SoA (with future plans) is described in SoA.md