-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Split reads and writes for certain accumulation patterns #329
Conversation
779e3be
to
de31d96
Compare
Documentation for this branch can be viewed at https://sites.ecmwf.int/docs/loki/329/index.html |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #329 +/- ##
==========================================
+ Coverage 95.36% 95.38% +0.01%
==========================================
Files 173 175 +2
Lines 36914 37039 +125
==========================================
+ Hits 35204 35330 +126
+ Misses 1710 1709 -1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
5eb2d3e
to
4974f6b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this is really neat (and many thanks to @lukasm91 for pointing this out in the first place!). However, the implementation appears a bit too complicated and costly and I'm suggesting an alternative in the comments. Please have a look if that makes sense.
|
||
.. code-block:: fortran | ||
|
||
!$loki split read-write |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd argue the "command" should maybe be one word, as this makes for a more intuitive grammar for the directives. I.e., in this case maybe use split-read-write
?
for region in FindNodes(ir.PragmaRegion).visit(routine.body): | ||
if is_loki_pragma(region.pragma, starts_with='split read-write'): | ||
|
||
# find assignments inside pragma region |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this requires quite a lot of IR traversals for something that should be doable in a single pass. For small regions that shouldn't make a big difference, but if there are many then it might pile up. I'm counting at least 6 passes over the region body (in a quick eyeball-estimate).
How about a single tree walk that in-place updates the assignments that need splitting, and collects all assignments at the end?
Drafting here a bit of pseudo code to illustrate what I mean:
class SplitReadWriteWalk(Transformer):
def __init__(self, **kwargs):
self.write_assignments = []
kwargs['inplace'] = True
super().__init__(**kwargs)
def visit_Loop(self, o, **kwargs):
# This collects the relevant loop dimensions in kwargs['dim_nest'] during the
# tree walk
dim = [d for d in self.dimensions if d.index == o.variable]
dim_nest = kwargs.pop('dim_nest', [])
return super().visit_Node(o, dim_nest=dim_nest + dim, **kwargs)
def visit_Assignment(self, o, **kwargs):
if isinstance(o.lhs, Array) and a.lhs.name in o.rhs:
parent_loop_dims = kwargs.pop('dim_nest', [])
# <do the logic for shape etc here>
tmp_var = <...>
self.write_assignments += [ir.Assignment(lhs=o.lhs, rhs=tmp_var)
o._update(lhs=tmp_var)
return o
for region in ...:
transformer = SplitReadWriteWalk()
region.body = transformer.visit(region.body)
if transformer.write_assignments:
region.body.append(transformer.write_assignments)
Not sure if I'm overlooking something, but it feels to me like that should be sufficient and much more efficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, tree-walk is definitely the way to do it! The use of FindScopes
to discover a parent loop nest felt a bit clunky to me too but the idea of implementing this as a visitor never occured to me. Many thanks for the suggestion 🙏
57068e3
to
f4f1496
Compare
Many thanks again @reuterbal for the great tip! I've now implemented the @mlange05 could you also please review the PR? Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great utility and very neatly done with the single-pass Transformer
. I've only left a few small nit-pick comments about docstrings (and an "out of interest" question), but otherwise this is ready, I think. Well done, and many thanks to both, @awnawab and @reuterbal
""" | ||
A :any:`Transformer` class to traverse the IR, in-place replace read-write | ||
assignments with reads, and build a transformer map for the corresponding writes. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nitpick: Could we please add the constructor argument to the docstring? This ones takes a lot of arguments, and their meaning is not immediately obvious without context.
item_filter = (ProcedureItem,) | ||
|
||
def __init__(self, dimensions): | ||
self.dimensions = as_tuple(dimensions) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here - could we please add the dimension
arg to the docstring? It's not immediately obvious why this is needed, and what dimension is required here from users.
@@ -186,7 +186,9 @@ def convert( | |||
mode = mode.replace('-', '_') # Sanitize mode string | |||
|
|||
# Write out all modified source files into the build directory | |||
file_write_trafo = FileWriteTransformation(builddir=build, mode=mode) | |||
file_write_trafo = scheduler.config.transformations.get('FileWriteTransformation', None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[no action] Out of interest, what corner case makes this necessary? Or is this just a piggy-backed clean-up step?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For ecWAM I need to enable the include_module_var_imports
option in the FileWriteTransformation
. I'm assuming global_var_offload
will eventually be removed as an argument from convert
as we further simplify loki_transform.py
, so I thought I might as well make the FileWriteTransformation
also configurable.
Many thanks @mlange05 for the review and the feedback 😄 The constructor arguments were indeed unclear, and I've now added documentation for them. |
a391277
to
e4180ae
Compare
e4180ae
to
ad1d51a
Compare
Thanks, this looks great now. Going in... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good now, going in.
Lukas M. demonstrated that for accumulation patterns of the type:
a compiler cannot rule out the possibility that
n1
andn2
do not in fact alias the same location. In such a case, it is unable to run the loads and stores out-of-order.This PR contributes a pragma assisted transformation to split the above into separate reads and writes, thereby removing the dependency between subsequent loads and stores and allowing the compiler to optimise more effectively.