Is the OME hierarchy correct? Field of view placement in hierarchy #36
I was looking through the hierarchy structure of our zarr files now that we use them in the multiwell setup. The thing I'm not fully sure about: what is the subfolder just below the column level? It is marked in the file tree below as "<= What is this?".

According to the OME-NGFF spec, the level after the column is supposed to hold the fields of view (e.g. our original images). But our images are all at the lowest level. If we had fields of view at that level, we should have 36 entries there in the test case /data/active/fractal/3D/PelkmansLab/CardiacMultiplexing/Cycle1_subset/.

@mfranzon we should discuss this. Was there an issue with following the spec, or just a misunderstanding? I'm not sure whether this affects performance or whether it's just a standards-compliance question. Also, if we follow this logic, do we still get the correct arrangement of the fields of view (= sites?)? In either case, we should keep a copy of the code that creates the current output, in case we want to keep everything as one huge, single field of view :)
Though reading through the whole documentation, I couldn't figure out where one would specify the arrangement, or whether there is logic to handle cases like a search-first approach where sites aren't saved in a grid. Thus, the current approach of saving everything as a single field of view, where we define the actual placement ourselves, may have its benefits. => Discussion topic; may also be something to raise with the OME-Zarr devs.
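For reference, a sketch of the kind of hierarchy in question; the plate, row, and column names are illustrative, following the OME-NGFF plate layout rather than copied from our actual files:

```
plate.zarr/
└── B/                  # row
    └── 03/             # column (one well)
        └── 0/          # <= What is this?
            ├── 0       # image data (resolution levels)
            ├── 1
            └── ...
```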
@jluethi Exactly, that is the FOV folder. We have just one single field of view per well, which is the simplest case. But now the question is: do you think that using sites as FOVs would be feasible? To my understanding, considering the Yokogawa microscope, is a field of view a single site in our case, or a grid of sites? Probably @gusqgm could also have some suggestions on this.
Hey @mfranzon, you are right with your first assumption: one field of view of the Yokogawa means one XY site, with multiple slices in Z.
To avoid some confusion: this apparently works, even if it isn't the officially intended use case. But for search-first acquisitions, this may actually be the easiest way to do it. I don't know whether there are performance downsides to doing this, but we should keep it in mind and see whether it makes sense to go spec-compliant, at least for the grid case.
The discussion on how to show several fields of view is broader than this single issue, and it continues in ome/ome-zarr-py#200. Before closing this issue, I will store our (not yet committed) code to build the "correct" zarr file (with each field of view in a different folder) in a different branch. We'll see in the future whether it can be useful, through some of the options discussed in the ome-zarr-py issue.
@tcompa Sounds like a good plan! I'll be looking into ways we could enable this with the ome-zarr devs. But let's keep our single-FOV approach for the moment, then, and go back to the multi-FOV branch once that becomes viable to also visualize in napari :)
* In conversionTaskWrap.py, construct the list of unique sites and pass it to the slurm file (through the jinja template); see the sketch after this list.
* In yokogawa_to_zarr.py, populate all available sites/FOVs in the zarr file.
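A minimal sketch of the first step, assuming the usual Yokogawa F<index> filename token; the function name and glob pattern are illustrative, not the actual conversionTaskWrap.py code:

```python
import re
from pathlib import Path

def get_unique_sites(image_dir: str) -> list[str]:
    """Collect unique field-of-view (site) tokens from Yokogawa-style
    filenames, assuming an F<index> token such as ..._T0001F001..."""
    sites = set()
    for path in Path(image_dir).glob("*.tif"):
        match = re.search(r"F(\d{3})", path.name)
        if match:
            sites.add(match.group(1))
    return sorted(sites)

# The sorted list can then be handed to the jinja template that renders
# the slurm file, e.g. template.render(sites=get_unique_sites(image_dir))
```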
This is now done with commit 4738f1f (in the https://github.com/fractal-analytics-platform/mwe_fractal/tree/dev-multi-fov branch). The workflow constructs a list of sites (AKA fields of view) and populates the zarr hierarchy accordingly, instead of merging them all together. The dev-multi-fov branch will not be developed further at the moment, but we could revive it at a later stage. I'm closing this issue (on the zarr file generation), since what is left is more about visualization (see the issue on ome-zarr-py).
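To illustrate what "each field of view in a different folder" looks like at the zarr level, here is a minimal sketch using the NGFF well metadata; the paths and number of FOVs are made up, and this is not the actual branch code:

```python
import zarr

# Open (or create) one well group, e.g. row B, column 03
well = zarr.open_group("plate.zarr/B/03", mode="a")

# One subgroup per field of view, all listed in the well metadata
fov_paths = ["0", "1", "2", "3"]  # e.g. four sites in this well
well.attrs["well"] = {"images": [{"path": p} for p in fov_paths]}

for p in fov_paths:
    # Each FOV folder then holds its own multiscale image pyramid
    well.require_group(p)
```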
@tcompa We should generate some test multi-FOV OME-Zarr data to work on #66. What is the easiest way to get it: trying to run the old branch, or updating this branch to parsl so we can run it in the new setup? Once we fix #66, this should become a core part of the yokogawa_to_zarr workflow, so it will eventually need to be ported to the parsl tasks anyway. Depending on how much effort this is, we can do it now or once we have figured out the lazy loading.
Agreed, that's necessary to move forward.
I quickly ported the multi-fov parsing tasks (there are two tasks: one to create the zarr structure and one to write data) to parsl. Let's use/adapt that script (also with other datasets) to see whether the generated zarr seems correct. If so, we can then use it as a starting point for improvements of the reader.
That's great! We can use the single-well, 4-site case as our test case for loading wells lazily & loading multi-site wells when reading a plate :)
Some additional motivation to work in this direction: a single well from the FMI test dataset (5x4 sites, ~80 Z layers, 4 channels) takes ~36 GB, and at the moment several tasks (e.g. illumination correction) load data for the entire well. If some parameters increase just a little (9x8 sites, or a few more channels), we will quickly run into memory errors even when just loading a single well. (We are also looking into other options, like reading channel by channel, but with no luck so far.)

EDIT: part of the issue here is probably due to a wrong use of dask on our side.
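For context, this is roughly what the lazy, per-channel access pattern looks like with dask; the path and shapes below are made up for illustration:

```python
import dask.array as da

# Open the well image lazily; no pixel data is read yet
well = da.from_zarr("plate.zarr/B/03/0/0")  # axes (c, z, y, x)

# Slicing stays lazy: one channel, one site-sized XY region
channel = well[0]
site = channel[:, 0:2160, 0:2560]

# Data is only read (chunk by chunk) when compute() is called
result = site.mean().compute()
```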
@tcompa We can of course use dask's delayed reading feature more in places where the downstream processing can also work with that. If we have specific spots where memory needs escalate, that's a good way to work around them. We should still aim for single-site processing where possible, though. It will decrease the memory load, as described, and also allow for stronger parallelization, especially for smaller datasets with few wells. Also, it will make it much easier to have many jobs with "low" demands running, which are probably easier to keep in check.

It will also be important to be able to be channel-specific. For example, illumination correction should apply a different matrix to each channel (the correction profiles are slightly different from each other), and we will have experiments with 60 channels instead of 3. This bug also shows why explicit channel handling will be important: #61
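A minimal sketch of channel-specific illumination correction under these assumptions: per-channel flatfield profiles are available as 2D arrays, chunks span full XY planes, and the file names and function are hypothetical:

```python
import dask.array as da
import numpy as np

def correct(block: np.ndarray, flatfield: np.ndarray) -> np.ndarray:
    """Divide every Z slice in the block by the channel's 2D profile."""
    return (block / flatfield).astype(block.dtype)

well = da.from_zarr("plate.zarr/B/03/0/0")  # (c, z, y, x), lazy

# Hypothetical per-channel correction profiles, loaded from disk
profiles = {0: np.load("flatfield_c0.npy"), 1: np.load("flatfield_c1.npy")}

# Apply each channel's profile block-wise; still lazy until compute()
corrected = da.stack(
    [well[c].map_blocks(correct, flatfield=profiles[c]) for c in sorted(profiles)]
)
```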
Still, what I was observing (>60 GB loaded into memory) was due to the presence of an intermediate, non-lazy step.
Sounds good. Investigating the second issue, about channel access, will be important, because we will often want to load specific channels, and the user actually needs to specify which channel they want, e.g. based on the names we parse in initially. The illumination correction changes with the channel, registration will be based on specific channels, measurements will be made for specified channels only, etc. If that turns out to be a larger issue, let's open a separate issue to discuss it further :)
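One way this could work, assuming the channel names end up in the image's omero metadata (a sketch with made-up paths, not current code):

```python
import zarr

def channel_index(image_path: str, name: str) -> int:
    """Map a channel label to its index along the channel axis,
    reading the labels from the image's omero metadata."""
    attrs = zarr.open_group(image_path, mode="r").attrs
    labels = [ch["label"] for ch in attrs["omero"]["channels"]]
    return labels.index(name)

# e.g. select only the DAPI channel for a measurement task:
# dapi = well[channel_index("plate.zarr/B/03/0", "DAPI")]
```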
Testing the multi-FOV examples now: the single-well, 2x2 test dataset looks good! Nicely saved into 4 separate FOVs :)
Closing this issue, as the multi-FOV parsing now works as expected.