Is the OME hierarchy correct? Field of view placement in hierarchy #36

Closed
jluethi opened this issue May 9, 2022 · 17 comments
Assignees
Labels
discussion We should discuss specific design choices

Comments

@jluethi
Collaborator

jluethi commented May 9, 2022

I was looking through the hierarchy structure of our zarr files now that we use them in the multiwell setup. The one thing I'm not fully sure about:
What is the subfolder directly below the column level? It's the one marked in the file tree below with "<= What is this?".

Now, according to the OME-NGFF spec, below the column there are supposed to be fields of view (e.g. our original images). But our images are all at the lowest level. If we had fields of view at that level, we should have 36 entries there for the test case /data/active/fractal/3D/PelkmansLab/CardiacMultiplexing/Cycle1_subset/.

@mfranzon we should discuss this. Was there an issue with following this part of the spec, or just a misunderstanding? I'm not sure whether this affects performance or whether it's just a standards-compliance question.
Also, if we follow this logic, do we still get the correct arrangement of the fields of view (= sites)? In either case, we should keep a copy of the code that creates the current output, in case we want to keep combining everything into one huge, single field of view :)

.
└── 20200907-CardiomyocyteDifferentiation14-Cycle17.zarr
    └── B                 row
        ├── 03          column
        │   └── 0          <= What is this?
        │       ├── 0     pyramid level 0 
        │       │   ├── 0     channel 1
        │       │   ├── 1     channel 2
        │       │   └── 2     channel 3
        │       ├── 1     pyramid level 1
        │       │   ├── 0
        │       │   ├── 1
        │       │   └── 2
        │       ├── 2
        │       │   ├── 0
        │       │   ├── 1
        │       │   └── 2
        │       ├── 3
        │       │   ├── 0
        │       │   ├── 1
        │       │   └── 2
        │       └── 4
        │           ├── 0
        │           ├── 1
        │           └── 2
        └── 05
            └── 0          <= What is this?
                ├── 0
                │   ├── 0
                │   ├── 1
                │   └── 2
                ├── 1
                │   ├── 0
                │   ├── 1
                │   └── 2
                ├── 2
                │   ├── 0
                │   ├── 1
                │   └── 2
                ├── 3
                │   ├── 0
                │   ├── 1
                │   └── 2
                └── 4
                    ├── 0
                    ├── 1
                    └── 2

@jluethi changed the title from "Is the OME hierarchy correct?" to "Is the OME hierarchy correct? Field of view placement in hierarchy" on May 9, 2022
@jluethi
Collaborator Author

jluethi commented May 10, 2022

Though, reading through the whole documentation, I couldn't figure out where one would specify the arrangement of the fields of view, or whether there is logic to handle things like a search-first approach where sites aren't saved in a grid. Thus, the current approach of saving everything as a single field of view, where we define the actual placement ourselves, may have its benefits => discussion topic, and possibly something to raise with the OME-Zarr devs.
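
For reference, a minimal sketch (paths are illustrative, not taken from our data) of inspecting the well-level metadata with zarr-python; in OME-NGFF the well attributes list the fields of view as "images", but as far as I can tell they don't encode a spatial layout by themselves:

import zarr

# Hypothetical plate path; open read-only and look at the well attributes.
plate = zarr.open_group("20200907-CardiomyocyteDifferentiation14-Cycle17.zarr", mode="r")
well = plate["B/03"]
print(well.attrs.asdict().get("well", {}))
# e.g. {"images": [{"path": "0"}]} -- one entry per field of view, no positions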

@mfranzon
Collaborator

@jluethi exactly, that is the FOV folder. We have just one single field of view per well, which is the simplest case. But now the question is: do you think that using sites as FOVs would be feasible? To my understanding, for the Yokogawa microscope, is a field of view a single site in our case, or a grid of sites? Probably @gusqgm could also have some suggestions on this.

@gusqgm
Collaborator

gusqgm commented May 10, 2022

Hey @mfranzon, you are right with your first assumption: one field of view on the Yokogawa means one XY site, with multiple slices in Z.
This means that for the search-first case we would still have a per-well .zarr file, just that each FOV/site would carry its starting position within the overview space, right?

@jluethi
Collaborator Author

jluethi commented May 10, 2022

To avoid some confusion:
In microscopy nomenclature, each XY position would be a field of view (FOV) / site; FOV and site are equivalent terms for our use case. Thus, if we parsed according to the spec, we would expect e.g. 72 FOVs for my test case, not 1. What is actually implemented at the moment is combining all the FOVs into one, tiling them together correctly, and then saving this as a single FOV to Zarr.

This apparently works, even if it isn't the officially intended use case. But for the search-first case, this may actually be the easiest way to do it.

I don't know if there are performance downsides to doing this, but we should keep it in mind and see whether it makes sense to go spec-compliant, at least for the grid case.
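
As a rough sketch (not the actual task code) of what that combining amounts to: dask can tile a grid of lazily loaded per-site arrays into one well-sized array before it is written out as a single FOV. All names and shapes below are hypothetical placeholders.

import dask.array as da

# Hypothetical 2x2 grid of per-site arrays with shape (c, z, y, x); in the
# real pipeline these would be read lazily from the raw Yokogawa images.
site = da.zeros((3, 10, 2160, 2560), dtype="uint16", chunks=(1, 1, 2160, 2560))
sites_grid = [[site, site], [site, site]]

# da.block concatenates along the last two axes, i.e. tiles the FOVs in y/x,
# producing one well-sized array that is then saved as a single FOV.
well_image = da.block(sites_grid)   # shape (3, 10, 4320, 5120)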

@tcompa
Collaborator

tcompa commented May 20, 2022

The discussion on how to show several fields of view is broader than this single issue, and it continues in ome/ome-zarr-py#200.

Before closing this issue, I will store our (not yet committed) code to build the "correct" zarr file (with each field of view in a different folder) in a different branch. We'll see in the future whether it can be useful, through some of the options discussed in the ome-zarr-py issue.

@tcompa tcompa self-assigned this May 20, 2022
@jluethi
Collaborator Author

jluethi commented May 20, 2022

@tcompa Sounds like a good plan! I'll be looking into the ways we could enable this with the OME-Zarr devs. But let's keep our single-FOV approach for the moment then, and go back to the multi-FOV branch once it becomes viable to also visualize that in napari :)

tcompa added a commit that referenced this issue May 20, 2022
* In conversionTaskWrap.py, construct the list of unique sites, and pass it to the slurm file (through the jinja template);
* In yokogawa_to_zarr.py, populate all available sites/FOVs in the zarr file.
@tcompa
Collaborator

tcompa commented May 20, 2022

This is now done, with commit 4738f1f (in the https://github.com/fractal-analytics-platform/mwe_fractal/tree/dev-multi-fov branch). The workflow constructs a list of sites (AKA fields of view), and populates the zarr hierarchy accordingly, instead of merging them all together.
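
For illustration, a minimal hedged sketch (not the code from commit 4738f1f; paths, shapes and chunking are scaled-down placeholders) of the layout this produces: one group per site under the well, each holding its own pyramid levels.

import numpy as np
import zarr

well = zarr.open_group("plate.zarr/B/03", mode="a")          # hypothetical well path
for site_index in range(4):                                  # one group per FOV/site
    fov = well.require_group(str(site_index))
    level0 = np.zeros((3, 2, 216, 256), dtype="uint16")      # placeholder (c, z, y, x)
    fov.create_dataset("0", data=level0, chunks=(1, 1, 216, 256))  # pyramid level 0
    # higher pyramid levels ("1", "2", ...) would hold downsampled copies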

The dev-multi-fov branch will not be developed further at the moment, but we could revive it at a later stage.

I'm closing this issue (on the zarr file generation), since what is left is more about visualization (see issue on ome-zarr-py).

@tcompa tcompa closed this as completed May 20, 2022
Repository owner moved this from Todo to Done in Fractal Project Management May 20, 2022
@jluethi jluethi reopened this Jun 7, 2022
Repository owner moved this from Done to Todo in Fractal Project Management Jun 7, 2022
@jluethi
Collaborator Author

jluethi commented Jun 7, 2022

@tcompa We should generate some test multi-fov OME-Zarr data to work on #66.

What is the easiest way to get these: trying to run the old branch, or updating that branch to Parsl so that we can run it in the new setup? Once we fix #66, this should become a core part of the yokogawa_to_zarr workflow, so it will eventually need to be ported to the Parsl tasks anyway. Depending on how much effort that is, we can do this now or once we have figured out the lazy loading.

@tcompa
Collaborator

tcompa commented Jun 8, 2022

@tcompa We should generate some test multi-fov OME-Zarr data to work on #66.

Agreed, that's necessary to move forward.
I'll copy the dev-multi-fov version of the yokogawa_to_zarr task to a new yokogawa_to_zarr_multifov task (to be run within Parsl, like all the others), and then the old branch can be deleted.

@tcompa
Collaborator

tcompa commented Jun 8, 2022

I quickly ported the multi-FOV parsing tasks (there are two tasks: one to create the zarr structure and one to write the data) to main. I only tested them on a simple case (single well, 4 sites), which you can find in examples/example_uzh_1_well_2x2_sites_multifov.sh (with output in /data/active/fractal/tests/Temporary_data_UZH_1_well_2x2_sites_multifov).

Let's use/adapt that script (also with other datasets), to see whether the generated zarr seems correct. If so, we can then use it as a starting point for improvements of the reader.

@jluethi
Collaborator Author

jluethi commented Jun 8, 2022

That's great! We can use the single well, 4 sites case as our test case for loading wells lazily & loading multi-site wells when reading a plate :)

@tcompa
Collaborator

tcompa commented Jun 9, 2022

Some additional motivation to work in this direction:

A single well from the FMI test dataset (5x4 sites, ~80 Z layers, 4 channels) takes ~36 GB. And at the moment several tasks (e.g. illumination correction) load data for the entire well. If some parameters increase just a little (9x8 sites, or a few more channels), we will quickly run into memory errors even when just loading a single well.

(we are also looking into other options, like reading channel by channel, but with no luck at the moment)

EDIT: part of the issue here is probably due to an incorrect use of dask's delayed features, but loading only single sites would still be beneficial.

@jluethi
Collaborator Author

jluethi commented Jun 9, 2022

@tcompa We can of course use dask's delayed reading feature more in places where the downstream processing can also work with that. If we have specific spots where memory needs escalate, that's a good way to work around it.

We should still aim, wherever possible, for:

  1. Single-site processing
  2. Single-channel processing

Single-site processing will decrease memory load, as described, and also allow for stronger parallelization, especially for smaller datasets with few wells. It will also make it much easier to have many jobs with "low" demands running, which are probably easier to keep in check.

It will be important to be channel-specific as well, though. For example, illumination correction should apply a different matrix to each channel (the correction profiles differ slightly from each other), and we will have experiments with 60 channels instead of 3.
Thus, processing should not assume it can just load all channels at once: either we loop over channels (e.g. when saving a dataset), or we perform an operation on a given channel only.

This bug also shows why explicit channel handling will be important: #61
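
A sketch of the processing pattern I mean (function and variable names are hypothetical placeholders, not existing Fractal tasks); each site is assumed to be a lazily loaded (c, z, y, x) array:

def process_well(site_arrays, correction_matrices):
    """Process one well site by site and channel by channel."""
    for site in site_arrays:              # single-site processing
        for c in range(site.shape[0]):    # never load all channels at once
            channel = site[c]
            # apply the channel-specific correction (elementwise here; the
            # real illumination correction may differ)
            yield channel * correction_matrices[c]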

@tcompa
Collaborator

tcompa commented Jun 9, 2022

  1. Agreed, single-site processing is the way to go (and it will require only small changes to tasks). The bottleneck here is that we cannot yet visualize the results, but once there is progress on that side it will be quite a rapid change.
  2. Single-channel processing would be useful indeed. At the moment the (minor) issue is that .zarray sits at the pyramid-level level, which is one level above the channel level. Without that file, da.from_zarr doesn't like its arguments. I'm sure there should be a workaround (maybe just using the component argument will do), but we haven't tried it yet.

Still, what I was observing (>60 GB loaded into memory) was due to the presence of an intermediate .compute() on the dask arrays (within the illumination-correction task), which is to be avoided. Unfortunately, at the moment removing the intermediate compute leads to other problems (related to the map_blocks function that we use to apply the correction function to each image). This is work in progress right now, but there is no need to discuss it in this issue.
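
For the record, the kind of workaround I have in mind (untested, paths illustrative): point da.from_zarr at the group that actually contains .zarray via the component argument, then slice the channel axis lazily.

import dask.array as da

# The array (with its .zarray) lives at the pyramid-level group, so load
# that level lazily as a whole and slice out a single channel afterwards.
level0 = da.from_zarr("plate.zarr/B/03/0", component="0")  # (c, z, y, x), lazy
channel1 = level0[1]          # lazy view of a single channel
data = channel1.compute()     # data is only read from disk here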

@jluethi
Collaborator Author

jluethi commented Jun 9, 2022

Sounds good. Investigating the second point about channel access will be important, because we will often want to load specific channels, and the user actually needs to specify which channel they want, e.g. based on the names we parse in initially. The illumination correction changes with the channel, registration will be based on specific channels, measurements will be made for specified channels only, etc. If that turns out to be a larger issue, let's open a separate issue to discuss it further :)

@jluethi jluethi moved this from Todo to Backlog in Fractal Project Management Jun 13, 2022
@jluethi
Collaborator Author

jluethi commented Jun 14, 2022

Testing the multi-FOV examples now, the single well, 2x2 test dataset looks good! Nicely saved into 4 separate FOVs :)

@jluethi jluethi moved this from Backlog to On Hold in Fractal Project Management Jun 14, 2022
@jluethi
Collaborator Author

jluethi commented Jun 15, 2022

Closing this issue, as yokogawa_to_zarr_multifov solves it. Now we just need to build support in the plugin to be able to switch to this workflow (and make all the parallelization and processing easier to run per site).

@jluethi jluethi closed this as completed Jun 15, 2022
Repository owner moved this from On Hold to Done in Fractal Project Management Jun 15, 2022
@jluethi jluethi moved this from Done to Done Archive in Fractal Project Management Oct 5, 2022
jacopo-exact pushed a commit that referenced this issue Nov 16, 2022