Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle collision in array wrapper (that breaks CI intermittently) #222

Open
smiet opened this issue Jul 1, 2024 · 0 comments
Open

Handle collision in array wrapper (that breaks CI intermittently) #222

smiet opened this issue Jul 1, 2024 · 0 comments

Comments

@smiet
Copy link
Contributor

smiet commented Jul 1, 2024

I am encountering a sporadic handle collision for the python-side cache of f90wrapped fortran arrays.

This is because the pywrap keeps a cache of accessed arrays that are indexed by handle, which is the memory location of the array. When arrays are deallocated and then re-allocated, especially on memory-constrained systems, there is a small but not insignificant chance of them being placed in the exact same location. This would happen sporadically in our CI runners resulting in very hard-to-diagnose failures, that were very hard to reproduce on our user systems.

The pywrap for an array resolves to:

    @property
    def <name>(self)
        """
        Defined at \
            <loc> \
            line xxx
        
        """
        array_ndim, array_type, array_shape, array_handle = \
            _spec_f90wrapped.f90wrap_<mod_name>__array__<name>(f90wrap.runtime.empty_handle)
        if array_handle in self._arrays:
            <name> = self._arrays[array_handle]
        else:
            <name> = f90wrap.runtime.get_array(f90wrap.runtime.sizeof_fortran_t,
                                    f90wrap.runtime.empty_handle,
                                    _spec_f90wrapped.f90wrap_<mod_name>__array__<name>)
            self._arrays[array_handle] = <name>
        return <name>

The f90wrapped code for the array looks like

subroutine f90wrap_allglobal__array__allrzrz(dummy_this, nd, dtype, dshape, dloc)
    use constants
    use typedefns
    use allglobal, only: allglobal_allrzrz => <NaMe>
    use, intrinsic :: iso_c_binding, only : c_int
    implicit none
    integer, intent(in) :: dummy_this(2)
    integer(c_int), intent(out) :: nd
    integer(c_int), intent(out) :: dtype
    integer(c_int), dimension(10), intent(out) :: dshape
    integer*8, intent(out) :: dloc
    
    nd = x
    dtype = y
    if (allocated(<array_name>)) then
        dshape(1:x) = shape(<array_name>)
        dloc = loc(<array_name>)
    else
        dloc = 0
    end if
end subroutine f90wrap_<mod_name>__array__<array_name>

Could a different handle generator be used than the pointer, like a hash (though this would change if the data in the array changes)? Or could the cache be invalidated if an array is deallocated?

Though such collisions are unlikely, they are more likely to pop up in CI runners because they are: a) more memory constrained and b) they run similar tests many times, de-alllocating and allocating the same arrays over and over again.

What will hopefully solve this for us is manually clearing the modules' cache when instantiating a class that uses the f90wrapped code with my_mod._arrays = {}, but the error we found seems to hint at a deeper issue

P.S. f90wrap is being used with much effect to wrap magneto hydrodynamic equilibrium codes and optimize stellarator fusion reactors, thanks for the great work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant