[LFRic] [PSyAD] Missing halo exchange and immediate cleaning of halos when using setval_random with redundant calculations #2805

Open

DrTVockerodtMO opened this issue Nov 27, 2024 · 13 comments

@DrTVockerodtMO (Collaborator) commented Nov 27, 2024

We are testing some LFRic branches that run the adjoint tests in parallel. We find that, when using setval_random, there is a missing halo exchange in the PSy-layer code after the field values have been randomised. This causes failures in adjoint tests that would otherwise pass if the halo were properly initialised.

With redundant computation, the halo exchange is missed because, after the field is randomised, the halo is marked dirty but then immediately marked clean. This does not happen when setval_random is used without redundant computation. There is also no halo exchange between setval_random and the forward kernels, which causes floating-point invalid errors when running in parallel.

Ultimately, we need a halo exchange between the use of setval_random and the forward kernel for the tests to work properly, whilst still randomising the annexed dofs.

As an example, looking at atlt_vorticity_advection_alg_mod we have the following invoke:

call invoke( setval_random(r_u), setval_x(r_u_input, r_u), &
             setval_random(wind), setval_x(wind_input, wind), &
             setval_random(vorticity), setval_x(vorticity_input, vorticity), &
             setval_random(ls_wind), setval_random(ls_vorticity), &
             tl_vorticity_advection_kernel_type(r_u, wind, vorticity, ls_wind, ls_vorticity, chi, panel_id, qr_xyoz), &
             x_innerproduct_x(r_u_inner_prod, r_u), &
             x_innerproduct_x(wind_inner_prod, wind), &
             x_innerproduct_x(vorticity_inner_prod, vorticity) )

which, when processed with the LFRic parallel optimisations and the options described above, yields

      !
      ! Set-up all of the loop bounds
      !
      loop0_start = 1
      loop0_stop = r_u_proxy%vspace%get_last_dof_halo(1)
      loop1_start = 1
      loop1_stop = r_u_input_proxy%vspace%get_last_dof_annexed()
      loop2_start = 1
      loop2_stop = wind_proxy%vspace%get_last_dof_halo(1)
      loop3_start = 1
      loop3_stop = wind_input_proxy%vspace%get_last_dof_annexed()
      loop4_start = 1
      loop4_stop = vorticity_proxy%vspace%get_last_dof_halo(1)
      loop5_start = 1
      loop5_stop = vorticity_input_proxy%vspace%get_last_dof_annexed()
      loop6_start = 1
      loop6_stop = ls_wind_proxy%vspace%get_last_dof_halo(1)
      loop7_start = 1
      loop7_stop = ls_vorticity_proxy%vspace%get_last_dof_halo(1)
      loop8_start = 1
      loop8_stop = ncolour
      loop9_start = 1
      loop10_start = 1
      loop10_stop = r_u_proxy%vspace%get_last_dof_owned()
      loop11_start = 1
      loop11_stop = wind_proxy%vspace%get_last_dof_owned()
      loop12_start = 1
      loop12_stop = vorticity_proxy%vspace%get_last_dof_owned()
      !
      ! Call kernels and communication routines
      !
      !$omp parallel default(shared), private(df)
      !$omp do schedule(static)
      DO df=loop0_start,loop0_stop
        CALL RANDOM_NUMBER(r_u_data(df))
      END DO
      !$omp end do
      !$omp end parallel
      !
      ! Set halos dirty/clean for fields modified in the above loop(s)
      !
      CALL r_u_proxy%set_dirty()
      CALL r_u_proxy%set_clean(1)
      !
      ! End of set dirty/clean section for above loop(s)
      !
      !$omp parallel default(shared), private(df)
      !$omp do schedule(static)
      DO df=loop1_start,loop1_stop
        r_u_input_data(df) = r_u_data(df)
      END DO
      !$omp end do
      !$omp end parallel
      !
      ! Set halos dirty/clean for fields modified in the above loop(s)
      !
      CALL r_u_input_proxy%set_dirty()
      !
      ! End of set dirty/clean section for above loop(s)
      !
      !$omp parallel default(shared), private(df)
      !$omp do schedule(static)
      DO df=loop2_start,loop2_stop
        CALL RANDOM_NUMBER(wind_data(df))
      END DO
      !$omp end do
      !$omp end parallel
      !
      ! Set halos dirty/clean for fields modified in the above loop(s)
      !
      CALL wind_proxy%set_dirty()
      CALL wind_proxy%set_clean(1)
      !
      ! End of set dirty/clean section for above loop(s)
      !
      !$omp parallel default(shared), private(df)
      !$omp do schedule(static)
      DO df=loop3_start,loop3_stop
        wind_input_data(df) = wind_data(df)
      END DO
      !$omp end do
      !$omp end parallel
      !
      ! Set halos dirty/clean for fields modified in the above loop(s)
      !
      CALL wind_input_proxy%set_dirty()
      !
      ! End of set dirty/clean section for above loop(s)
      !
      !$omp parallel default(shared), private(df)
      !$omp do schedule(static)
      DO df=loop4_start,loop4_stop
        CALL RANDOM_NUMBER(vorticity_data(df))
      END DO
      !$omp end do
      !$omp end parallel
      !
      ! Set halos dirty/clean for fields modified in the above loop(s)
      !
      CALL vorticity_proxy%set_dirty()
      CALL vorticity_proxy%set_clean(1)
      !
      ! End of set dirty/clean section for above loop(s)
      !
      !$omp parallel default(shared), private(df)
      !$omp do schedule(static)
      DO df=loop5_start,loop5_stop
        vorticity_input_data(df) = vorticity_data(df)
      END DO
      !$omp end do
      !$omp end parallel
      !
      ! Set halos dirty/clean for fields modified in the above loop(s)
      !
      CALL vorticity_input_proxy%set_dirty()
      !
      ! End of set dirty/clean section for above loop(s)
      !
      !$omp parallel default(shared), private(df)
      !$omp do schedule(static)
      DO df=loop6_start,loop6_stop
        CALL RANDOM_NUMBER(ls_wind_data(df))
      END DO
      !$omp end do
      !$omp end parallel
      !
      ! Set halos dirty/clean for fields modified in the above loop(s)
      !
      CALL ls_wind_proxy%set_dirty()
      CALL ls_wind_proxy%set_clean(1)
      !
      ! End of set dirty/clean section for above loop(s)
      !
      !$omp parallel default(shared), private(df)
      !$omp do schedule(static)
      DO df=loop7_start,loop7_stop
        CALL RANDOM_NUMBER(ls_vorticity_data(df))
      END DO
      !$omp end do
      !$omp end parallel
      !
      ! Set halos dirty/clean for fields modified in the above loop(s)
      !
      CALL ls_vorticity_proxy%set_dirty()
      CALL ls_vorticity_proxy%set_clean(1)
      !
      ! End of set dirty/clean section for above loop(s)
      !
      IF (chi_proxy(1)%is_dirty(depth=1)) THEN
        CALL chi_proxy(1)%halo_exchange(depth=1)
      END IF
      !
      IF (chi_proxy(2)%is_dirty(depth=1)) THEN
        CALL chi_proxy(2)%halo_exchange(depth=1)
      END IF
      !
      IF (chi_proxy(3)%is_dirty(depth=1)) THEN
        CALL chi_proxy(3)%halo_exchange(depth=1)
      END IF
      !
      IF (panel_id_proxy%is_dirty(depth=1)) THEN
        CALL panel_id_proxy%halo_exchange(depth=1)
      END IF
      !
      DO colour=loop8_start,loop8_stop
        !$omp parallel default(shared), private(cell)
        !$omp do schedule(static)
        DO cell=loop9_start,last_halo_cell_all_colours(colour,1)
          !
          CALL tl_vorticity_advection_code(nlayers, r_u_data, wind_data, vorticity_data, ls_wind_data, ls_vorticity_data, &
&chi_1_data, chi_2_data, chi_3_data, panel_id_data, ndf_w2, undf_w2, map_w2(:,cmap(colour,cell)), basis_w2_qr_xyoz, ndf_w1, &
&undf_w1, map_w1(:,cmap(colour,cell)), basis_w1_qr_xyoz, ndf_aspc9_chi, undf_aspc9_chi, map_aspc9_chi(:,cmap(colour,cell)), &
&basis_aspc9_chi_qr_xyoz, diff_basis_aspc9_chi_qr_xyoz, ndf_adspc3_panel_id, undf_adspc3_panel_id, &
&map_adspc3_panel_id(:,cmap(colour,cell)), np_xy_qr_xyoz, np_z_qr_xyoz, weights_xy_qr_xyoz, weights_z_qr_xyoz)
        END DO
        !$omp end do
        !$omp end parallel
      END DO
      !
      ! Set halos dirty/clean for fields modified in the above loop
      !
      CALL r_u_proxy%set_dirty()
      !
      ... ...

where I have omitted the inner-product calculations for brevity. Running this as-is causes a floating-point invalid error in the forward code.
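
To make the requested behaviour concrete, here is a minimal hand-written sketch (not PSyclone output) of the kind of sequence we are after for r_u, reusing the guarded-exchange pattern that PSyclone already emits above for chi and panel_id; the same would apply to wind, vorticity, ls_wind and ls_vorticity:

      ! Sketch only: after the redundant RANDOM_NUMBER loop over r_u, leave
      ! the halo marked dirty (each rank generates different random values,
      ! so the L1 halo cannot genuinely be declared clean) ...
      CALL r_u_proxy%set_dirty()
      ! ... and exchange before the forward (TL) kernel reads r_u:
      IF (r_u_proxy%is_dirty(depth=1)) THEN
        CALL r_u_proxy%halo_exchange(depth=1)
      END IF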

@arporter (Member):

setval_random by itself should not trigger a halo exchange; one should be performed before whatever reads the field that has been written to by that kernel. Is the metadata for the subsequent kernel correct? (Performing redundant computation in setval_random might then remove this subsequent halo exchange, BTW.)

@DrTVockerodtMO (Collaborator, Author):

> setval_random by itself should not trigger a halo exchange; one should be performed before whatever reads the field that has been written to by that kernel. Is the metadata for the subsequent kernel correct? (Performing redundant computation in setval_random might then remove this subsequent halo exchange, BTW.)

I believe the metadata is fine because the next parts of the invokes are the forward kernels. I just had a look at some PSy-layer code and remembered the actual problem, so I will edit the description above.

DrTVockerodtMO changed the title from "[LFRic] [PSyAD] Missing halo exchange when using setval_random" to "[LFRic] [PSyAD] Immediate cleaning of halos when using setval_random" on Nov 27, 2024
DrTVockerodtMO changed the title from "[LFRic] [PSyAD] Immediate cleaning of halos when using setval_random" to "[LFRic] [PSyAD] Immediate cleaning of halos when using setval_random with redundant calculations" on Nov 27, 2024
DrTVockerodtMO changed the title from "[LFRic] [PSyAD] Immediate cleaning of halos when using setval_random with redundant calculations" to "[LFRic] [PSyAD] Missing halo exchange and immediate cleaning of halos when using setval_random with redundant calculations" on Nov 27, 2024
@arporter (Member):

There must be something special going on here as, from a PSyclone point of view, there's nothing special about setval_random compared with the other setval_* builtins. They've been in use for a long time with no problems reported about missing halo exchanges. However, you mention 'annexed dofs', and that might be relevant. If you are running PSyclone with the 'annexed dofs' option set to True (https://psyclone.readthedocs.io/en/latest/dynamo0p3.html#annexed-dofs), then all builtins will (or should) compute out to the L1 halo to ensure that the annexed dofs are clean. Does that shed any light on the situation?
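
For context, the distinction shows up in the upper bound of the dof loop that PSyclone generates for a builtin. The following is an illustrative sketch only, using a hypothetical field proxy f_proxy (the three calls themselves all appear in the generated code above):

      ! Default: update owned dofs only
      loop_stop = f_proxy%vspace%get_last_dof_owned()
      ! With the annexed-dofs option: also (redundantly) update the annexed dofs
      loop_stop = f_proxy%vspace%get_last_dof_annexed()
      ! With redundant computation to depth 1: update out to the full L1 halo
      loop_stop = f_proxy%vspace%get_last_dof_halo(1)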

@arporter (Member) commented Nov 27, 2024

Is this a new issue or was it present in the 2.5.0 release do you know? (I could have broken something in the recent work I did on the halo-exchange logic.)

@DrTVockerodtMO (Collaborator, Author):

> Is this a new issue or was it present in the 2.5.0 release do you know? (I could have broken something in the recent work I did on the halo-exchange logic.)

I am not entirely certain, I'm afraid, sorry!

@arporter (Member):

YGM on Teams.

@TeranIvy (Collaborator):

Hi all! @DrTVockerodtMO, I am not sure which version of PSyclone you are using. Is it the test environment or 2.5.0? I modified the test environment for Josh's testing of the operator changes. If you are using the test environment, which LFRic branch are you using? In any case, I will reinstall the PSyclone test environment from master, and I would suggest double-checking that everything works.

@TeranIvy (Collaborator) commented Nov 27, 2024

@arporter, as far as I can see, the current LFRic Core and Apps trunks are fine with 2.5.0.

@arporter (Member):

Thanks Iva. From discussion with Terry, it seems the problem is that this is the first time anyone has tried to run the PSyAD test harness in parallel.

@DrTVockerodtMO (Collaborator, Author):

> Hi all! @DrTVockerodtMO, I am not sure which version of PSyclone you are using. Is it the test environment or 2.5.0? I modified the test environment for Josh's testing of the operator changes. If you are using the test environment, which LFRic branch are you using? In any case, I will reinstall the PSyclone test environment from master, and I would suggest double-checking that everything works.

This is not the test environment, no.

@arporter (Member) commented Nov 28, 2024

The metadata for matrix-vector is:

  type, public, extends(kernel_type) :: matrix_vector_kernel_type
    private
    type(arg_type) :: meta_args(3) = (/                                    &
         arg_type(GH_FIELD,    GH_REAL, GH_INC,  ANY_SPACE_1),             &
         arg_type(GH_FIELD,    GH_REAL, GH_READ, ANY_SPACE_2),             &
         arg_type(GH_OPERATOR, GH_REAL, GH_READ, ANY_SPACE_1, ANY_SPACE_2) &
         /)
    integer :: operates_on = CELL_COLUMN
  end type

so it does perform a GH_INC on a field on a continuous (or potentially continuous) function space. This requires the annexed dofs to be clean on entry. That would normally trigger a halo exchange but, when using COMPUTE_ANNEXED_DOFS (as the MO do by default), the annexed dofs are always assumed to be clean and no halo exchange is added. The annexed dofs are made clean by ensuring that the setval_* kernels always (redundantly) update them - note that this still leaves parts of the L1 halo dirty.

However, because matrix-vector operates on cells, it will still be accessing 'halo' dofs, even if the results are never used. Therefore, if those 'halo' dofs contain invalid data, we will get floating-point exceptions. The recommended way to ensure this doesn't happen is to always perform redundant computation to the L1 halo for all setval_* kernels. Once we see the generated PSy layer, we'll be able to see definitively whether this is being done.
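
Purely as an illustration of the 'accessing halo dofs' point (the names ncell_owned, ndf, map, lhs_data and increment below are hypothetical, not taken from the real kernel), a GH_INC update over owned cell columns looks schematically like this:

      ! Illustrative sketch only: a GH_INC update over owned cells still reads
      ! and writes annexed/halo dofs through the cell dofmap.
      DO cell = 1, ncell_owned
        DO df = 1, ndf
          ! For cells on the partition boundary, map(df,cell) can index an
          ! annexed or L1-halo dof; if that entry holds uninitialised data,
          ! the read on the right-hand side can raise a floating-point exception.
          lhs_data(map(df,cell)) = lhs_data(map(df,cell)) + increment(df,cell)
        END DO
      END DO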

@arporter (Member):

Which version of PSyclone are you using, @DrTVockerodtMO? I would have expected to see comments in the PSy layer identifying each of the builtins. The problem seems to be that some of the builtins are only doing redundant updates for the annexed dofs rather than for the whole L1 halo. This then means the GH_INC access in the kernel is accessing uninitialised data.

@DrTVockerodtMO (Collaborator, Author):

> Which version of PSyclone are you using, @DrTVockerodtMO? I would have expected to see comments in the PSy layer identifying each of the builtins. The problem seems to be that some of the builtins are only doing redundant updates for the annexed dofs rather than for the whole L1 halo. This then means the GH_INC access in the kernel is accessing uninitialised data.

I am using PSyclone 2.5.0.
