-
Notifications
You must be signed in to change notification settings - Fork 262
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support SWMR in HDF5 #1448
base: main
Are you sure you want to change the base?
Support SWMR in HDF5 #1448
Conversation
Current status:
|
My first thought is that performance needs to be checked with and without this. I am particularly concerned with the flushes. This would eleiminate any benefit from buffering writes... |
I should probably remove the flushing from NC4_put_vars, and leave the responsibility of flushing to the user. A user opening a file in swmr-mode, should make sure to call sync_netcdf4_file (or another suitable function) regularly instead. |
I think the idea is good, in general. Is there a way for this to be always-on without breaking anything else? In other words, does this have to be a mode flag - could it just be applied to every file? |
I would suggest that you build with --enable-benchmarks, and you can very easily see any performance impact. Another question is how does this interact with parallel I/O, if at all? |
@@ -50,6 +50,10 @@ nc4_create_file(const char *path, int cmode, size_t initialsz, | |||
NC_HDF5_FILE_INFO_T *hdf5_info; | |||
NC_HDF5_GRP_INFO_T *hdf5_grp; | |||
|
|||
#ifdef HAVE_H5PSET_LIBVER_BOUNDS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we ever not have H5PSET_LIBVER_BOUNDS?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was introduced in HDF5-1.8.0
high = H5F_LIBVER_V18; | ||
if ((cmode & NC_HDF5_SWMR)) { | ||
low = H5F_LIBVER_LATEST; | ||
high = H5F_LIBVER_LATEST; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this create files that are unreadable by older versions of HDF5?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that H5F_LIBVER_V18
was introduced in 1.10.2 so maybe the HAVE_H5PSET_LIBVER_BOUNDS
is checking for that version and not for existance of H5Pset_libver_bounds
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the files will be unreadable with 1.8.X clients. Here is a blurb from 1.10.2 release notes:
When it is used in application linked with HDF5 1.10.0, it will enable new chunk indexing for Single Writer/Multiple Reader (SWMR) access and Virtual Dataset storage.
What does this change mean to an HDF5 application?
When an HDF5 application linked with HDF5 1.10.2 specifies H5F_LIBVER_LATEST as a value for the “high” parameter, the application may produce files that are not compatible with the HDF5 1.8.* file format. For example, new chunk indexing will be used that was not known to HDF5 1.8.. This means that an application linked with HDF5 1.8. libraries may not be able to read such files.
When an HDF5 application linked with HDF5 1.10.2 specifies H5F_LIBVER_V18 as a value for the “high” parameter, the application will produce files fully compatible with HDF5 1.8., meaning that any application linked with the HDF5 1.8. libraries will be able to read such files.
if (nc_close(ncid2)) ERR; | ||
|
||
} | ||
SUMMARIZE_ERR; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent test, thanks!
SWMR does not work with a parallel writer or reader. Currently it is a serial writer, serial reader(s) only capability. However, I do see merit in supporting this as it is very confusing for people to not be able to look at their files as they are being written if in NetCDF4 format when they have had no issues doing this for NetCDF3 files... Hopefully HDF5 group will get parallel-writer, serial reader SWMR working at some point. I think it is planned for 2020 or 2021... |
I am thinking about the interactions of this with our internal metadata |
I believe we do not keep track of the maximum extent of unlimited vars. When we need to know, we check at that time. |
Another, perhaps even more important, feature of this is that it will allow the user to get the best HDF5 performance. For many users, inability to be read by 1.8 version of HDF5 would not be a big problem. The 1.10 series has been out for years and years now. Not many people should be stuck on 1.8. In light of that, perhaps the mode flag should be something like NC_HDF5_LATEST. |
Seems to be failing a test called tst_zero_len_var.sh. |
@edhartnett Agree that performance is (sometimes) better with 1.10.X, but there are also many instances where 1.8 is faster which is some of the reason why 1.8.X is still under development. THG is working on finding the performance regressions in 1.10. Also, there are several clients still using 1.8 (Paraview is/was until latest release) and many systems still ship with 1.8 libraries. I wish this weren't true but have tracked down many issues due to 1.10.X files not being usable in downstream applications. But, 1.10.X gives some opportunities for vastly improving parallel performance and I definitely advocate for the use of 1.10.X; just some caveats in doing so. |
Support for HDF5 1.10.x features that live in the I certainly support increasing the minimum required version, but I agree with @gsjaardema that there are still too many areas where 1.10.x is not on par with 1.8.x; we aren't ready to tell our community that they must switch. |
Thank you for all the positive response! If I then have understood correctly, I should add ifdefs such that the swmr-code is not compiled with hdf 1.8. And probably remove the call to flush after variable writes (but then I will need some help to expose H5DFlush to the user). Then run benchmarks. Am I on the right track? |
I think this would be good to merge, if it passes tests, which looks like they need to be re-run... |
A couple of things:
|
I haven't reviewed this yet, but it would need to be rebased against the current |
My only real objection is that it used up another mode flag. |
This would be a really useful feature. What's needed to get it over the line, just the option flag adding to |
I can confirm this would be a really useful feature, especially in HPC (i.e. supercomputer) applications. If many multiple processes can each open a file for reading, that would be a tremendous savings in complexity. |
I've merged in There's also some subtlety in how to use SWMR -- all the groups and variables must be defined before enabling it. With this PR, the only mechanism for enabling SWMR is through the file open flag Another option might be to not pass the flag to HDF5, but call Also, |
This PR is an attempt to add support for the SWMR feature found in HDF5 >= 1.10.
This should allow concurrent access to an hdf5-file from multiple processes, as long as there is a single writer process [1,2]. The usecase is quite common, where a simulation service is writing results to a file, and one or more "readers" wish to get the latest results from the same file. Without swmr-support, this usecase may lead to corrupt files. There have been some requests about this feature in the past [3,4].
The PR is not complete, but it would be very helpful with some feedback at this point. First of all, is SWMR-support something you would like to include in netcdf-c?
According to the SWMR programming model, there are only minor changes required in netcdf-c:
a) Create the file with H5F_LIBVER_LATEST
b) Calling H5F_start_swmr_write in the writer process
c) Open the file with flag H5F_ACC_SWMR_READ in the reader process
d) Calling H5Dflush regularly in the writer process
Looking forward to hear your opinion about this.
[1] https://support.hdfgroup.org/HDF5/docNewFeatures/SWMR/HDF5_SWMR_Users_Guide.pdf
[2] https://support.hdfgroup.org/HDF5/Tutor/swmr.html
[3] Unidata/netcdf4-python#862
[4] https://www.unidata.ucar.edu/support/help/MailArchives/netcdf/msg13717.html