nc_create_par triggers HDF5 error stack message related to H5FDunregister #2990
Comments
Thanks for highlighting this, I haven't tested against HDF5 1.15.0 yet. Do you know offhand if this issue also happens when using HDF5 1.14.x or earlier? |
I have seen this before. I will take a look.
Ward, I hope you're not doing the release this weekend or early next week.
There are a few things I'm stuck on but they might be important enough to
get into the next release...
Ed
|
rc2 was ideally going to come out next week, after rc1 for netCDF-Fortran
today/early next week, but that has already slipped due to fires that
cropped up this morning. So no worry, no window for the v4.9.3 release
today or early next week.
|
I confirmed that it also happens with 1.14.4-3. |
I've walked back to old combinations of hdf5 and netcdf which definitely did not have this issue, but I am observing it now. I will continue to poke around. |
I am also seeing the issue in the pure-h5 test |
I'm not 100% certain it's just a change in HDF5. Testing against HDF5 On Ubuntu On Ubuntu This doesn't rule out 100% a change in HDF5, either, but I wonder about changes in the newer gcc/mpicc versions. The errors reported using gcc
|
What I have been seeing is these error messages in parallel I/O tests, but they do not error out, they just print. |
Interesting/frustrating. Something is going on, that much is certain. Let me double-check against the latest HDF5 again. |
For completeness' sake, the issue I'm seeing (with old versions like
|
@WardF @edhartnett Hi all, I believe what may be going on here is an ordering issue where HDF5 has already closed the ID used for the HTTP VFD before getting to the part where the
if (H5FD_HTTP_g && (H5Iis_valid(H5FD_HTTP_g) > 0))
    H5FDunregister(H5FD_HTTP_g);
https://github.com/Unidata/netcdf-c/blob/main/libhdf5/H5FDhttp.c#L259-L261
Alternatively, the code to unregister the ID could be moved to |
That sounds very plausible. When is H5FD_http_term called? |
That function should be called when HDF5 is terminating and closing IDs and gets around to releasing the ID that was registered for the VFD by the VFD itself. In that sense, calling https://github.com/HDFGroup/hdf5/blob/develop/src/H5FDmulti.c#L254-L261 |
Well there should be no reason we can't follow what other VFDs do. |
I believe it should be fine to just reset the |
I agree. There should be nothing particularly different in the H5FDhttp.c code. |
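To make the ordering concern above concrete, here is a toy model in plain C. It is only a sketch: it does not use HDF5, and every name in it (`toy_register`, `toy_unregister`, `toy_shutdown`, `driver_id_g`, and so on) is hypothetical. It assumes the library releases a driver's ID before calling that driver's terminate hook, and it assumes the "just reset" approach means clearing the driver's cached ID without unregistering it. It deliberately leaves out the `H5Iis_valid` guard and makes no claim about HDF5 internals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for a library ID registry. All names are hypothetical and
 * only mimic the shape of the calls discussed in this thread. */
#define MAX_IDS 8

typedef int toy_id_t;                /* 0 means "no ID" */
typedef void (*toy_term_fn)(void);   /* driver terminate hook */

static bool        id_live[MAX_IDS]; /* slot i is true while ID i is registered */
static toy_term_fn id_term[MAX_IDS]; /* hook to call when ID i is released */
static int         error_count;      /* number of diagnostics printed */
static toy_id_t    driver_id_g;      /* the driver's cached ID (assumption) */

/* Register a driver and return its ID, or 0 if the table is full. */
toy_id_t toy_register(toy_term_fn term)
{
    for (toy_id_t id = 1; id < MAX_IDS; id++) {
        if (!id_live[id]) {
            id_live[id] = true;
            id_term[id] = term;
            return id;
        }
    }
    return 0;
}

bool toy_is_valid(toy_id_t id)
{
    return id > 0 && id < MAX_IDS && id_live[id];
}

/* Unregistering a released ID does not abort; it only prints a message. */
void toy_unregister(toy_id_t id)
{
    if (!toy_is_valid(id)) {
        error_count++;
        fprintf(stderr, "toy error stack: cannot unregister ID %d\n", id);
        return;
    }
    id_live[id] = false;
}

/* Library shutdown: release each live ID, then call its terminate hook. */
void toy_shutdown(void)
{
    for (toy_id_t id = 1; id < MAX_IDS; id++) {
        if (id_live[id]) {
            id_live[id] = false;     /* the ID is gone before the hook runs */
            if (id_term[id])
                id_term[id]();
        }
    }
}

/* Variant A: the hook unregisters an ID the library already released. */
void term_with_unregister(void)
{
    toy_unregister(driver_id_g);
    driver_id_g = 0;
}

/* Variant B: the hook only forgets its cached ID. */
void term_reset_only(void)
{
    driver_id_g = 0;
}

/* Register a driver with the given hook, shut down, and return the
 * number of diagnostics that were printed along the way. */
int run_scenario(toy_term_fn term)
{
    error_count = 0;
    driver_id_g = toy_register(term);
    assert(driver_id_g != 0);
    toy_shutdown();
    return error_count;
}
```

In this model, `run_scenario(term_with_unregister)` prints one diagnostic and still completes, which matches the "prints but does not fail" symptom reported here, while `run_scenario(term_reset_only)` prints nothing. Whether the real driver behaves the same way depends on HDF5's actual shutdown order, which this sketch only assumes.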
So I believe I am seeing something different from what was originally reported by OP; I will try to sort it out, but it boils down to issues with mpich (gcc) 13.x+. Even after rolling back to old versions of hdf5 and netCDF that worked previously, I now get errors (not messages that can otherwise be ignored). Putting this information here in case it is connected in a way that is obvious to folk more familiar with MPI. Otherwise, I'll continue sorting it out on my end.
|
This was closed as part of the PR merges that went in; I recognize it may be premature and will re-open if folk are still observing this. We've incorporated #3012 which was testing for the issue (thanks @edwardhartnett) and #3013 which incorporated a fix (thanks @jhendersonHDF), after which the tests went from failing to passing. I'm still seeing MPI related issues with the most recent Ubuntu, but I will open a new issue. |
To report a non-security related issue, please provide:
the version of the software with which you are encountering an issue
v4.9.3-rc1
environmental information (i.e. Operating System, compiler info, java version, python version, etc.)
Frontier (ORNL), Cray clang version 17.0.0 (b59b7a8e9169719529cf5ab440f3c301e515d047)
a description of the issue with the steps needed to reproduce it
A call to
status = nc_create_par(output_path, NC_NETCDF4 | NC_CLOBBER, MPI_COMM_WORLD, MPI_INFO_NULL, &ncid);
is returning the HDF5 error message
It does not cause an error; it is just a distraction, since all the ranks print the message. It seems to be triggered by H5FD_http_finalize.