
enhance + bugfix of images and labels elements #127

Merged Feb 24, 2023 · 28 commits

Conversation

@giovp (Member) commented Feb 5, 2023

This PR does the following:

Add support for coordinates in (multiscale)spatial-image.

Without coordinates, it doesn't make much sense to use an xarray.DataTree because many methods do not work across scales. The implementation is the following:

  • create a (multiscale)spatial-image from the parser
  • re-calculate coordinates based on this function

@singledispatch
def compute_coordinates(
    data: Union[SpatialImage, MultiscaleSpatialImage]
) -> Union[SpatialImage, MultiscaleSpatialImage]:
    ...

  • during IO, coordinates are not saved but re-calculated on read. This makes the coordinates consistent (also across downscaling methods from spatial-image).

One important thing to take into account is that the coordinates always refer to the implicit (pixel) coordinate system. They are nonetheless useful, as they unlock the power of xarray operations on data(trees).
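
The dispatch above is the real entry point; below is a minimal sketch of the idea in plain xarray, assuming implicit pixel coordinates along each spatial axis (the helper name _implicit_coords is illustrative, not part of the codebase):

import numpy as np
import xarray as xr

def _implicit_coords(sizes: dict) -> dict:
    # Implicit (pixel) coordinates: 0, 1, ..., n-1 along each spatial axis;
    # the channel axis "c" keeps no numeric coordinate.
    return {dim: np.arange(n) for dim, n in sizes.items() if dim != "c"}

# single-scale case: attach pixel coordinates to a (c, y, x) image
image = xr.DataArray(np.zeros((3, 64, 128)), dims=("c", "y", "x"))
image = image.assign_coords(_implicit_coords(dict(image.sizes)))
assert float(image.coords["x"][-1]) == 127.0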

Add support for 3D images

This was previously commented out due to issues with the spatial-image class; that was addressed in spatial-image/spatial-image#16. This class is also supported in the multiscale case.

Enhance schema for raster elements

In particular, it now performs correct validation of scales and re-implements the validation that "transform" is present in the right place in attrs.

It closes the issues listed here:

Open questions

One thing I'd like to address here, or potentially in a separate PR, is the use of name as an attribute in (multiscale)spatial-image. It's not a specific attribute of spatial-image but of xarray's DataArray and DataTree. For instance, in the case of a DataTree it's used to access the DataArray at the desired node, e.g.

img = Image2DModel(..., scale_factors=[2, 4])
img["scale0"]["image"]  # the name in this case is `image`
>>> DataArray ...

Right now, we are not very consistent about this throughout the repo. In particular, I think the transformations expect name="image", yet in many other parts of the repo that is not the case. Furthermore, if the user passes the name, the name is not saved (yet it is used in creating the spatial-image object).

It's also unclear how this interplays with the key of the image element in SpatialData, e.g.

img = Image2DModel(..., scale_factors=[2, 4], name="myimage")
sdata = SpatialData(images={"myrealimage": img})
sdata.images["myrealimage"]["scale0"]["image"]  # the name in this case is still `image`
>>> DataArray ...

I don't have any preference on how to handle this, but we should be consistent, so I suggest either of these two implementations:

  • either make name arbitrary, but then save and read it accordingly if specified by the user; transformations should also be agnostic to it.
  • do not allow passing name, and always use the default name "image" across the repo (so we can avoid saving/reading it); a sketch of this option follows below.
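
A minimal sketch of the second option (the helper name enforce_default_name is hypothetical; rename on a DataArray is the standard xarray way to change its name):

import numpy as np
import xarray as xr

DEFAULT_NAME = "image"  # the spatial-image default discussed above

def enforce_default_name(data: xr.DataArray) -> xr.DataArray:
    # Option 2 in a nutshell: ignore any user-supplied name so that
    # element["scale0"][DEFAULT_NAME] always resolves. Sketch only.
    return data.rename(DEFAULT_NAME)

arr = xr.DataArray(np.zeros((64, 128)), dims=("y", "x"), name="myimage")
assert enforce_default_name(arr).name == "image"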

@LucaMarconato (Member) commented Feb 6, 2023

I think it's a good idea to keep coords. But one important thing: we need to understand/decide what the relationship with transformations is. Scale and translations play together well, but when rotations are involved things get dirty.

I would start with this:

  • the coordinates we save are the ones for the local coordinate system. We gain no practical advantage for SpatialImage this way, but we can use, as you said, methods that work across scales in MultiscaleSpatialImage.

We could extend to this (not urgent):

  • the implicit coordinate system doesn't need to start from 0 and have units equal to pixels. This is still supported by the NGFF specs (and would address this issue: Currently considering only scale in the transformations for multiscales #125). Implications of this are:
    • when cropping or scaling we can sometimes update the xarray object instead of adding a transformation (unless the transformations are more complex, like a general affine; we can detect this)
    • the workflow becomes more intuitive to xarray users who already encode translations and scale into xarray objects (see the sketch after this list), but at the same time the information is sometimes saved inside the transformation classes and other times as coordinates (like in the cropping example just mentioned)
    • intrinsic coordinate systems become more powerful, but extrinsic coordinate systems remain important for aligning different objects, rotations, and multiple transformations.
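
For illustration, a sketch of how a translation + scale pair could live directly in the coordinates (the numbers are made up):

import numpy as np
import xarray as xr

# intrinsic coordinates encoding scale = 0.5 (e.g. microns/pixel) and a
# translation of 100.0 along x; the pixel data itself is untouched
scale, translation = 0.5, 100.0
img = xr.DataArray(np.zeros((64, 128)), dims=("y", "x"))
img = img.assign_coords(
    y=scale * np.arange(img.sizes["y"]),
    x=translation + scale * np.arange(img.sizes["x"]),
)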

@LucaMarconato (Member) commented Feb 10, 2023

Some considerations; I just found an implication for this PR.
In the nanostring cosmx dataset I need to flip each image and labels element on the y-axis. I could do it with an affine transformation, or I could do it by transposing the data with dask array (it's a lazy computation so it's fine). But another way, for xarray users, is to flip only the coordinates (see stackoverflow).
Without this PR the last method can't be applied because coordinates are stripped. With this PR we need to adjust the processing code so that it considers the coordinates (basically, we apply the NGFF transformations to the coordinates when aligning things).
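
A sketch of the coordinate-only flip mentioned above (standard xarray, no spatialdata API involved):

import numpy as np
import xarray as xr

img = xr.DataArray(np.arange(12).reshape(3, 4), dims=("y", "x"),
                   coords={"y": np.arange(3), "x": np.arange(4)})
# Reverse only the y coordinate labels; the (possibly lazy) pixel data is
# untouched, so nothing is computed. Label-based selection is now flipped:
flipped = img.assign_coords(y=img.y.values[::-1])
assert (flipped.sel(y=0).values == img.isel(y=-1).values).all()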

@LucaMarconato (Member):

Also, we could decide not to save the coordinates directly but to convert them to NGFF "dataset transformations" (the translation + scale pair present at each level in the multiscale). The NGFF specs say to apply the new transformations after applying the dataset transformations, so this would be equivalent to applying the NGFF transformations to the xarray coordinates.

In both cases (saving the coordinates directly or converting them to the NGFF dataset transformations), we can only support coordinates that are equivalent to a scale and a translation. Non-linear coordinate displacements (like 0, 1, 2, 3, 10) would break the interplay with the NGFF transformations.
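
A sketch of the affine-compatibility check this implies (the function name coords_to_ngff is illustrative):

import numpy as np

def coords_to_ngff(coords: np.ndarray) -> tuple:
    # Reduce a 1D coordinate vector to an NGFF (translation, scale) pair.
    # Only uniformly spaced coordinates are representable; something like
    # [0, 1, 2, 3, 10] is rejected, matching the caveat above.
    assert coords.size >= 2
    steps = np.diff(coords)
    if not np.allclose(steps, steps[0]):
        raise ValueError("coordinates are not uniformly spaced")
    return float(coords[0]), float(steps[0])

print(coords_to_ngff(np.array([100.0, 100.5, 101.0])))  # (100.0, 0.5)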

@giovp (Member, Author) commented Feb 13, 2023

In the nanostring cosmx dataset I need to flip each image and labels element on the y-axis. I could do it with an affine transformation, or I could do it by transposing the data with dask array (it's a lazy computation so it's fine). But another way, for xarray users, is to flip only the coordinates (see stackoverflow).

what do you need to flip them for?

Without this PR the last method can't be applied because coordinates are stripped. With this PR we need to adjust the processing code so that it considers the coordinates (basically, we apply the NGFF transformations to the coordinates when aligning things).

If by adjust you mean adjusting IO, then for sure we'll have to adapt.

In both cases (saving the coordinates directly or converting them to the NGFF dataset transformations), we can only support coordinates that are equivalent to a scale and a translation. Non-linear coordinate displacements (like 0, 1, 2, 3, 10) would break the interplay with the NGFF transformations.

That's a good point. Ideas could be:

  • only save the coordinates of the highest-resolution pyramid level, and infer the others at IO according to "scale" (I think only scale is relevant for our use of multiscale images). This might break the roundtrip, as numerical differences could pop up, so we'd maybe have to be less restrictive re IO?
  • save the coordinates separately for each group. Probably the best option, but we need to see how it interplays with the ome_zarr readers/writers.
  • simply use the to_zarr method of xarray and store any relevant ome-ngff metadata in the zarr metadata (sketched after this list). I think this would be the quickest option, but then we'd really drop any dependency on ome-zarr-py (except for the format), which could be positive or negative in different ways.
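
A sketch of the third option, assuming that plain xarray zarr IO plus hand-written attrs would suffice (the attrs key and its value here are illustrative, not the actual NGFF schema):

import numpy as np
import xarray as xr

ds = xr.Dataset({"image": (("y", "x"), np.zeros((64, 128)))})
# Stash whatever OME-NGFF metadata is needed in attrs and let xarray write
# the zarr store; this bypasses ome-zarr-py entirely (except for the format).
ds.attrs["multiscales"] = "...ome-ngff multiscales metadata..."
ds.to_zarr("element.zarr", mode="w")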

@LucaMarconato (Member):

Answering the first 2 points

what do you need to flip them for?

The tiles were mapped to the global space in the wrong order, so I flipped the y-axis of each tile. A better way would have been to adjust the mapping to the global space. We can do this (I'll write a TODO in the code).

If by adjust you mean adjusting IO, then for sure we'll have to adapt.

IO is not necessary; we need to make the processing methods (and the transformations, since the processing methods can be called both on the intrinsic space and on any transformed space) aware of the coordinates, since we can no longer work with pure pixel coordinates, only with xarray coordinates.

@LucaMarconato (Member) commented Feb 13, 2023

Regarding the third part, currently the first option is what is implemented. Referring to the screenshot in this discussion, I am basically assuming that the transformation corresponding to "path": "0" in "Coordinate transformations for the multiscale representation" is translation [0, 0] and scale [1, 1].

We could save the coordinates in that slot by deriving the NGFF transformation that produces the corresponding xarray coordinates. The ones for the lower scales can be derived from the first (or can also be computed from xarray). I save all the multiscale transformations to the file, but when I read I actually load just the top one and re-derive the others: as you pointed out, it should be the same up to some numerical precision errors.

So the interpretation of the xarray coordinates is that they describe the intrinsic coordinate system (we would never use the pixel space anymore), and the new transformation classes would always operate on the xarray coordinates.
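
A sketch of a transformation operating on the xarray coordinates rather than on pixel indices (the function name apply_scale is illustrative, not the spatialdata API):

import numpy as np
import xarray as xr

def apply_scale(img: xr.DataArray, scale: dict) -> xr.DataArray:
    # Rescale the coordinate labels, not the data: the intrinsic coordinate
    # system changes while the pixels stay untouched.
    return img.assign_coords(
        {dim: img.coords[dim].values * factor for dim, factor in scale.items()}
    )

img = xr.DataArray(np.zeros((4, 4)), dims=("y", "x"),
                   coords={"y": np.arange(4), "x": np.arange(4)})
scaled = apply_scale(img, {"y": 2.0, "x": 2.0})
assert float(scaled.coords["x"][-1]) == 6.0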

@giovp (Member, Author) commented Feb 13, 2023

So the interpretation of the xarray coordinates is that they describe the intrinsic coordinate system (we would never use the pixel space anymore), and the new transformation classes would always operate on the xarray coordinates.

👍

@LucaMarconato (Member):

The tiles were mapped to the global space in the wrong order, so I flipped the y-axis of each tile. A better way would have been to adjust the mapping to the global space. We can do this (I'll write a TODO in the code).

I checked; actually, flipping the images is correct because the points are aligned with the flipped images. If I change the global positioning of the FOVs, the points become wrong.

@giovp giovp marked this pull request as ready for review February 20, 2023 09:51
@giovp giovp changed the base branch from feature/transform_ergonomics to main February 20, 2023 09:56
@giovp (Member, Author) commented Feb 20, 2023

codecov in this repo is cursed.....

@giovp (Member, Author) commented Feb 20, 2023

@LucaMarconato since multiscale-spatial-image there is now an explicit check for scale factors in building the multiscale (I removed the previous check here).

This check is now catching bugs here

shapes = []
for level in range(len(data)):
    dims = data[f"scale{level}"].dims.values()
    shape = np.array([dict(dims._mapping)[k] for k in axes if k != "c"])
    shapes.append(shape)
multiscale_factors = []
shape0 = shapes[0]
for shape in shapes[1:]:
    factors = shape0 / shape
    factors - min(factors)
    # assert np.allclose(almost_zero, np.zeros_like(almost_zero), rtol=2.)
    try:
        multiscale_factors.append(round(factors[0]))
    except ValueError as e:
        raise e
# mypy thinks that schema could be ShapesModel, PointsModel, ...
I find reading those lines quite difficult, in particular there are things like

factors = shape0 / shape
factors - min(factors)

where I can't tell whether they are a bug or a leftover.
In general, there are also prints that should be logg.warning, and a few more comments on the reasoning would be helpful.
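
For reference, a plausible reading of what those lines may have intended (a sketch with example shapes, not the repo's actual code):

import numpy as np

shapes = [np.array(s, dtype=float) for s in [(256, 256), (128, 128), (64, 64)]]
shape0 = shapes[0]
multiscale_factors = []
for shape in shapes[1:]:
    factors = shape0 / shape
    # check the downscaling is (close to) isotropic before rounding
    if not np.allclose(factors, factors[0], rtol=0.01):
        raise ValueError(f"anisotropic downscaling factors: {factors}")
    multiscale_factors.append(round(float(factors[0])))
assert multiscale_factors == [2, 4]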

Do you mind taking a look? I fixed one, but there are still 3 failing:

FAILED tests/_core/test_transformations_on_elements.py::test_transform_labels_spatial_multiscale_spatial_image - ValueError: Scale factor 10 is incompatible with image shape (10, 148, 148) along dimension `z`.
FAILED tests/_core/test_transformations_on_elements.py::test_transform_elements_and_entire_spatial_data_object[full] - ValueError: Scale factor 10 is incompatible with image shape (10, 64, 128) along dimension `z`.
FAILED tests/_core/test_transformations_on_elements.py::test_transform_elements_and_entire_spatial_data_object[labels] - ValueError: Scale factor 10 is incompatible with image shape (10, 64, 128) along dimension `z`.

Meanwhile I'll go on and work on IO for channels with omero metadata.

@giovp giovp changed the title IO for coordinates of multiscale spatial image enhance + bugfix of images and labels elements Feb 20, 2023
@giovp (Member, Author) commented Feb 20, 2023

@scverse/spatialdata please refer to the header comment for the description of this PR

#127 (comment)

@kevinyamauchi (Collaborator) left a comment

Thanks, @giovp! The code changes look good to me. I am not sure about the other test, but the failing bounding box query tests suggest to me that something has changed in the indexing behavior. The bounding boxes are taken using image.sel (see here). Somehow, the shape of the returned image seems off. Happy to pair on this if you'd like!
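
For context, a small standalone example of label-based bounding-box selection; note that .sel with a slice is inclusive of the stop value, a classic source of off-by-one shape differences versus positional indexing:

import numpy as np
import xarray as xr

img = xr.DataArray(np.zeros((3, 64, 128)), dims=("c", "y", "x"),
                   coords={"y": np.arange(64), "x": np.arange(128)})
# Label-based slicing: both endpoints are included, so 10..20 yields 11 rows.
crop = img.sel(y=slice(10, 20), x=slice(30, 40))
assert crop.sizes["y"] == 11 and crop.sizes["x"] == 11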

@LucaMarconato (Member):
I don't have any preference on how to handle this, but we should be consistent, so I suggest either of these two implementations:

  • either make name arbitrary, but then save and read it accordingly if specified by the user; transformations should also be agnostic to it.
  • do not allow passing name, and always use the default name "image" across the repo (so we can avoid saving/reading it).

I am not sure which one I prefer, but I would start with the first approach, by making the transformations agnostic to that. I'll change the behavior.

@giovp (Member, Author) commented Feb 23, 2023

I am not sure which one I prefer, but I would start with the first approach, by making the transformations agnostic to that. I'll change the behavior.

Yeah, I think when I tried that, only the transformations didn't work; IO and models shouldn't rely on it. If so, I could quickly push a fix.

@LucaMarconato (Member):

@LucaMarconato since multiscale-spatial-image there is now an explicit check for scale factors in building the multiscale (I removed the previous check here). […]

Do you mind taking a look? I fixed one, but there are still 3 failing […]

Yeah, the code was very weird; I removed it altogether. Now I apply the transformation to each element of the multiscale and then assemble the MultiscaleSpatialImage object back together.
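
A minimal sketch of that approach, assuming the datatree API of the time (transform is any DataArray -> DataArray callable; the real implementation lives in spatialdata's _transform_elements.py):

from datatree import DataTree

def transform_multiscale(msi: DataTree, transform) -> DataTree:
    # Apply `transform` to every scale independently, then rebuild the tree.
    # Rebuilding a proper MultiscaleSpatialImage would go through its own
    # constructor; DataTree.from_dict keeps the sketch short.
    return DataTree.from_dict(
        {name: node.ds.map(transform) for name, node in msi.children.items()}
    )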

@codecov bot commented Feb 23, 2023

Codecov Report

Merging #127 (7a9f140) into main (4d9058c) will increase coverage by 0.06%.
The diff coverage is 85.90%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #127      +/-   ##
==========================================
+ Coverage   86.75%   86.82%   +0.06%     
==========================================
  Files          23       23              
  Lines        3277     3377     +100     
==========================================
+ Hits         2843     2932      +89     
- Misses        434      445      +11     
Impacted Files                              Coverage              Δ
spatialdata/_compat.py                      100.00% <ø>           (ø)
spatialdata/_io/format.py                    86.13% <75.00%>      (-2.10%) ⬇️
spatialdata/_core/models.py                  85.32% <77.41%>      (-0.78%) ⬇️
spatialdata/utils.py                         76.41% <85.00%>      (-1.85%) ⬇️
spatialdata/_core/core_utils.py              91.35% <88.17%>      (-1.04%) ⬇️
spatialdata/_io/write.py                     97.31% <88.88%>      (+1.26%) ⬆️
spatialdata/_core/_transform_elements.py     87.87% <91.30%>      (+0.05%) ⬆️
spatialdata/_core/_spatial_query.py          77.21% <100.00%>     (ø)
spatialdata/_io/read.py                      99.36% <100.00%>     (+1.89%) ⬆️
spatialdata/_core/transformations.py         93.92% <0.00%>       (+0.18%) ⬆️
... and 2 more

@LucaMarconato (Member) commented Feb 23, 2023

@giovp I reviewed the code and fixed the tests. There is only one comment that could require changes (the one about the name of the datatree nodes). Or we can also just merge and open an issue about that.

@giovp (Member, Author) commented Feb 24, 2023

@LucaMarconato I'll merge this. Note that the default name for the DataArray in the DataTree is now not what we had decided (im) but whatever spatial-image defaults to, which is "image".

I think the accessor would be really nice to have; I'll open an issue in spatial-image.

@giovp giovp merged commit 38c661c into main Feb 24, 2023
@giovp giovp deleted the models/images/add-coordinates branch February 24, 2023 12:01