introduce 'flatten_arrays()' (to overcome pointer hack) #199

Merged: 5 commits, Dec 20, 2023

MichaelSt98
Collaborator

Usable and tested as a drop-in replacement for the CLOUDSC Loki C transpilation.

  • Do we want to keep the original pointer hack as another option, or just replace it?

For example, it converts:

  INTEGER, INTENT(INOUT) :: x1(l1)
  INTEGER, INTENT(INOUT) :: x2(l2, l1)
  INTEGER, INTENT(INOUT) :: x3(l3, l2, l1)
  INTEGER, INTENT(INOUT) :: x4(l4, l3, l2, l1)
  DO i1=1,l1
    x1(i1) = ...
    DO i2=1,l2
      x2(i2, i1) = ...
      DO i3=1,l3
        x3(i3, i2, i1) = ...
        DO i4=1,l4
          x4(i4, i3, i2, i1) = ...
        END DO
      END DO
    END DO
  END DO

to:

  INTEGER, INTENT(INOUT) :: x1(l1)
  INTEGER, INTENT(INOUT) :: x2(l2*l1)
  INTEGER, INTENT(INOUT) :: x3(l3*l2*l1)
  INTEGER, INTENT(INOUT) :: x4(l4*l3*l2*l1)
  DO i1=1,l1
    x1(i1) = ...
    DO i2=1,l2
      x2(i2 + l2*(i1 + -1)) = ...
      DO i3=1,l3
        x3(i3 + l3*(i2 + l2*(i1 + -1) + -1)) = ...
        DO i4=1,l4
          x4(i4 + l4*(i3 + l3*(i2 + l2*(i1 + -1) + -1) + -1)) = ...
        END DO
      END DO
    END DO
  END DO
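
The index arithmetic above is the usual column-major linearization, applied Horner-style from the slowest dimension inward. As a minimal sketch in plain Python (the helper `flatten_index` is hypothetical and not part of Loki's API), the flattened 1-based index can be reproduced like this:

```python
def flatten_index(indices, shape):
    """Return the 1-based column-major flat index for 1-based `indices`.

    Hypothetical helper for illustration only; it mirrors expressions such as
    i3 + l3*(i2 + l2*(i1 - 1) - 1) from the example above.
    """
    if len(indices) != len(shape):
        raise ValueError('indices and shape must have the same rank')
    # Start with the last (slowest-varying) index ...
    flat = indices[-1]
    # ... and fold in the remaining dimensions from right to left.
    for idx, extent in zip(reversed(indices[:-1]), reversed(shape[:-1])):
        flat = idx + extent * (flat - 1)
    return flat
```

For `x2(i2, i1)` with shape `(l2, l1)`, `flatten_index((i2, i1), (l2, l1))` evaluates to `i2 + l2*(i1 - 1)`, which matches the generated expression.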

However, arrays with explicit lower bounds are currently not handled, e.g.:

  INTEGER, INTENT(INOUT) :: x4(0:l4, 0:l3, 0:l2, 0:l1)

This complicates things considerably. The open question is how to handle such arrays, especially with regard to the f2c transpilation: should this utility handle them itself, or is a separate utility/transformation that runs before flattening the arrays more appropriate?
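
One possible way to cover such range-based shapes, sketched here as an assumption rather than as what this PR implements, is to subtract each dimension's lower bound before linearizing and to use `upper - lower + 1` as the extent. The helper `flatten_index_with_bounds` below is hypothetical and written in plain Python:

```python
def flatten_index_with_bounds(indices, bounds):
    """Return the 1-based column-major flat index for arbitrary lower bounds.

    `bounds` holds one inclusive (lower, upper) pair per dimension, e.g.
    (0, l4) for a Fortran declaration 0:l4. Hypothetical helper, not Loki API.
    """
    if len(indices) != len(bounds):
        raise ValueError('indices and bounds must have the same rank')
    offset = 0   # zero-based offset into the flattened array
    stride = 1   # number of elements spanned by one step in this dimension
    for idx, (lower, upper) in zip(indices, bounds):
        if not lower <= idx <= upper:
            raise IndexError(f'index {idx} outside of range {lower}:{upper}')
        offset += (idx - lower) * stride
        stride *= upper - lower + 1
    return offset + 1
```

For `x2(0:l2, 0:l1)` this corresponds to the index `1 + i2 + (l2 + 1)*i1`; with bounds of the form `(1, n)` it reduces to the arithmetic shown in the converted example.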

Documentation for this branch can be viewed at https://sites.ecmwf.int/docs/loki/199/index.html

codecov bot commented Dec 15, 2023

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (2f5158a) 92.23% compared to head (b2fef9b) 92.23%.

Files Patch % Lines
loki/transform/transform_array_indexing.py 95.45% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #199      +/-   ##
==========================================
- Coverage   92.23%   92.23%   -0.01%     
==========================================
  Files          95       95              
  Lines       16972    17009      +37     
==========================================
+ Hits        15654    15688      +34     
- Misses       1318     1321       +3     
Flag Coverage Δ
lint_rules 96.21% <ø> (-0.01%) ⬇️
loki 92.21% <95.74%> (+<0.01%) ⬆️
transformations 91.40% <ø> (-0.06%) ⬇️


@reuterbal (Collaborator) left a comment

Thanks so much, this is a great addition and a clean replacement for the original pointer hack. I think there's no need to retain the original pointer variant.

I have left a few remarks throughout, minor things mostly. The testing could benefit from a sanity check on the IR level directly, and I've proposed a potential solution for range-based shapes.

Comment on lines 525 to 574
assert order in ['F', 'C']
if order == 'C':
array_map = {
var: var.clone(dimensions=new_dims(list(var.dimensions)[::-1], list(var.shape)[::-1]))
for var in FindVariables().visit(routine.body)
if isinstance(var, sym.Array) and var.shape and len(var.shape)
}
elif order == 'F':
array_map = {
var: var.clone(dimensions=new_dims(list(var.dimensions), list(var.shape)))
for var in FindVariables().visit(routine.body)
if isinstance(var, sym.Array) and var.shape and len(var.shape)
}

To improve our error reporting, I'd suggest not using assert here and instead raising an exception in an else branch:

Suggested change
- assert order in ['F', 'C']
- if order == 'C':
-     array_map = {
-         var: var.clone(dimensions=new_dims(list(var.dimensions)[::-1], list(var.shape)[::-1]))
-         for var in FindVariables().visit(routine.body)
-         if isinstance(var, sym.Array) and var.shape and len(var.shape)
-     }
- elif order == 'F':
-     array_map = {
-         var: var.clone(dimensions=new_dims(list(var.dimensions), list(var.shape)))
-         for var in FindVariables().visit(routine.body)
-         if isinstance(var, sym.Array) and var.shape and len(var.shape)
-     }
+ if order == 'C':
+     array_map = {
+         var: var.clone(dimensions=new_dims(list(var.dimensions)[::-1], list(var.shape)[::-1]))
+         for var in FindVariables().visit(routine.body)
+         if isinstance(var, sym.Array) and var.shape and len(var.shape)
+     }
+ elif order == 'F':
+     array_map = {
+         var: var.clone(dimensions=new_dims(list(var.dimensions), list(var.shape)))
+         for var in FindVariables().visit(routine.body)
+         if isinstance(var, sym.Array) and var.shape and len(var.shape)
+     }
+ else:
+     raise ValueError(f'Unsupported array order "{order}"')
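
A general Python reason for preferring an exception over `assert`: assert statements are stripped when the interpreter runs with `-O`, so the check would silently disappear, whereas an explicit `ValueError` always fires and names the offending value. A minimal standalone sketch (the function `check_order` is hypothetical and not part of Loki):

```python
def check_order(order):
    """Validate an array-order flag and return it unchanged.

    Hypothetical helper: raises ValueError for anything other than 'F' or 'C',
    and keeps doing so even when Python is started with -O.
    """
    if order not in ('F', 'C'):
        raise ValueError(f'Unsupported array order "{order}"')
    return order


try:
    check_order('X')
except ValueError as err:
    message = str(err)  # the message carries the rejected value
```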

array_map = {
var: var.clone(dimensions=new_dims(list(var.dimensions)[::-1], list(var.shape)[::-1]))
for var in FindVariables().visit(routine.body)
if isinstance(var, sym.Array) and var.shape and len(var.shape)

It's rather unconventional to increase indentation here, since you are not entering a new block/scope.

}
elif order == 'F':
array_map = {
var: var.clone(dimensions=new_dims(list(var.dimensions), list(var.shape)))

I'd avoid the conversion to list here and in the other uses. Instead, use a tuple for _dim and replace the call to extend in new_dims by new_dim += _dim, which allows you to work directly with the tuples.
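
As a rough illustration of that suggestion (the body of `new_dims` is not shown in this thread, so the helper `concat_dims` and its inputs are hypothetical), tuples can be accumulated with `+=` without converting to lists first:

```python
def concat_dims(groups):
    """Concatenate an iterable of tuples into a single tuple.

    Hypothetical stand-in for the accumulation step inside new_dims.
    """
    new_dim = ()
    for _dim in groups:
        # Tuples are immutable, so += rebinds new_dim to a new, longer tuple
        # instead of mutating in place as list.extend would.
        new_dim += _dim
    return new_dim


dims = concat_dims([('i1',), ('i2', 'i3')])
```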

@reuterbal (Collaborator) left a comment

Very nice, thanks for this! I've left a few housekeeping remarks but otherwise this looks very promising.

Note that you may have to rebase over main again to keep the regression tests from failing.

@MichaelSt98 MichaelSt98 force-pushed the nams_f2c_index_arithmetic branch 2 times, most recently from ff67104 to f70cc6e Compare December 19, 2023 16:42

@reuterbal (Collaborator) left a comment

Great, everything looks fantastic now.

@mlange05 do you want to confirm this is fine in offline regression tests?

@reuterbal reuterbal added the ready for merge This PR has been approved and is ready to be merged label Dec 20, 2023
@mlange05 (Collaborator) left a comment

Only had a brief look at the code changes, but played with the CLOUDSC regression test. All seems good, so GTG from me. :shipit:

@MichaelSt98
Collaborator Author

  • Old/"pointer cast hack" version
    • compiles and runs with Intel
    • won't compile with NVHPC
  • New/"index arithmetic" version
    • compiles and runs with Intel
    • compiles with NVHPC and runs for NPROMA=1,2
      • however, floating point exception for different NPROMA
      • works with arbitrary NPROMA compiling with -O0 instead of -O2 (haven't checked for -O1)

@reuterbal
Collaborator

    * however, floating point exception for different `NPROMA`
    * works with arbitrary `NPROMA` compiling with `-O0` instead of `-O2` (haven't checked for `-O1`)

This suggests to me that we see one of the recurring vectorization-related issues. Does this occur in the kernel itself or the validation routine?

@MichaelSt98
Collaborator Author

    * however, floating point exception for different `NPROMA`
    * works with arbitrary `NPROMA` compiling with `-O0` instead of `-O2` (haven't checked for `-O1`)

This suggests to me that we see one of the recurring vectorization-related issues. Does this occur in the kernel itself or the validation routine?

Yes ... it must be the kernel itself, as the rest (including the validation) is still the original Fortran?!

@reuterbal reuterbal merged commit 556db2e into main Dec 20, 2023
12 checks passed
@reuterbal reuterbal deleted the nams_f2c_index_arithmetic branch December 20, 2023 13:22