Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hdf5 swm rmode #395

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

Hdf5 swm rmode #395

wants to merge 3 commits into from

Conversation

Thalos12
Copy link

The SWMR feature of HDF5 has been implemented in the HDFBackend and is activated setting the swmr argument to True. The process reading the file should not only use the SWMR mode but also set the environment variable HDF5_USE_FILE_LOCKING="FALSE".

Let me know if I should change anything I have added.

@Thalos12
Copy link
Author

I can see that the build of the documentation failed, but in the details I can only see a warning about the swmr.ipynb not being in any toctree. I am not familiar with sphinx, therefore I do not know how to fix this error...

@Thalos12
Copy link
Author

Nevermind, it was easy enough that I finally figured it out.

@dfm
Copy link
Owner

dfm commented Jul 15, 2021

@Thalos12: Thanks for this!! I'll need a couple of days to get to this, but as a zeroth order comment: how hard do you think it would be to add a unit test for this feature? Also it would be good to put the version requirements for h5py somewhere - what were they again?

@Thalos12
Copy link
Author

@dfm: Regarding the unit test, I have to say that I am not very experienced in writing it, but I will try my best nevertheless using those already present as examples. For the version requirements of h5py, there are none (explicitly), the requirements are on the underlying HDF5 library (>=1.10), and internally h5py already checks them, so i chose not to add duplicate code. I can mention these requirements in the tutorial notebook, if you think it would be a good idea! I can also add them to the HDFBackend docstring.

@dfm
Copy link
Owner

dfm commented Jul 15, 2021

Awesome! Yes - please add some comments about versions to the docstring and tutorial notebook.

For the unit test: no stress! I think it'll be tricky to write exactly the one we want (we'll probably need to launch multiple processes) so I'm happy to do that as I'm reviewing. Do you want to write one that just checks that the backend at least works with swm mode enabled?

@Thalos12
Copy link
Author

Thank you, I will add the requirements in both places then! Also, I suppose I should be able to make the test for the backend, I'll add a commit as soon as I have implemented it.

@znicholls
Copy link
Contributor

@dfm and @Thalos12 is there anything that can be done to help this? This feature looks super useful so I'd be glad to pitch in.

@Thalos12
Copy link
Author

Hi @znicholls, while making tests I discovered that the change I proposed did not work as I wanted. Going by memory, the problem is that the output hdf5 file is opened (for writing) and closed at every step of the chain (see emcee/backends/hdf.py). Consequently, crashes can still occur if a reader opens the file just before the writer, regardless of SWMR.
I could not find a way to fix this at the time, and I forgot to close the issue (@dfm: apologies, I should have done that).

To make SWMR works, it would be necessary to modify HDFBackend to keep the file open, if at all possible, and also implement some mechanism to safely close it when needed (chain ends, exception is raised, ...).
@dfm: please correct me if I'm wrong

@znicholls
Copy link
Contributor

Huh that's a shame. When I read your notebook, it seemed like the reader would sometimes crash, but the writer would never crash which still seemed like a win to me and something worth including. Maybe I misunderstood

@Thalos12
Copy link
Author

Thalos12 commented Mar 16, 2023

I believe that I did find that my solution sometimes failed, but I can't be sure it wasn't just my fault. I might try to dig up the code I used and see if I can reproduce the errors I remember I was having.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants