
enhance + bugfix of images and labels elements #127

Merged Feb 24, 2023 · 28 commits

Conversation

@giovp (Member) commented Feb 5, 2023

This PR does the following:

Add support for coordinates in (multiscale)spatial-image.

Without coordinates, it doesn't make much sense to use an xarray.DataTree because many methods do not work across scales. The implementation is the following:

  • create a (multiscale)spatial-image from the parser
  • re-calculate coordinates based on this function

@singledispatch
def compute_coordinates(
    data: Union[SpatialImage, MultiscaleSpatialImage]
) -> Union[SpatialImage, MultiscaleSpatialImage]:
    ...

  • during IO, coordinates are not saved but re-calculated on read. This makes the coordinates consistent (also across downscaling methods from spatial-image).

One important thing to take into account is that the coordinates always refer to the implicit (pixel) coordinate system. They are nonetheless useful, as they unlock the power of xarray operations on data(trees).
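
The dispatch above is the real entry point; below is a minimal sketch of the idea in plain xarray, assuming implicit pixel coordinates along each spatial axis (the helper name _implicit_coords is illustrative, not part of the codebase):

import numpy as np
import xarray as xr

def _implicit_coords(sizes: dict) -> dict:
    # Implicit (pixel) coordinates: 0, 1, ..., n-1 along each spatial axis;
    # the channel axis "c" keeps no numeric coordinate.
    return {dim: np.arange(n) for dim, n in sizes.items() if dim != "c"}

# single-scale case: attach pixel coordinates to a (c, y, x) image
image = xr.DataArray(np.zeros((3, 64, 128)), dims=("c", "y", "x"))
image = image.assign_coords(_implicit_coords(dict(image.sizes)))
assert float(image.coords["x"][-1]) == 127.0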

Add support for 3D images

This was previously commented out due to issues with the spatial-image class; that was addressed in spatial-image/spatial-image#16. This class is also supported in the multiscale case.

Enhance schema for raster elements

In particular, it now performs correct validation of scales and re-implements the validation that "transform" is present in the right place in attrs.

It closes the issues listed here:

Open questions

One thing I'd like to address here, or potentially in a separate PR, is the use of name as an attribute in (multiscale)spatial-image. It's not a specific attribute of spatial-image but of xarray's DataArray and DataTree. For instance, in the case of a DataTree it's used to access the DataArray at the desired node, e.g.

img = Image2DModel(..., scale_factors=[2, 4])
img["scale0"]["image"]  # the name in this case is `image`
>>> DataArray ...

Right now, we are not very consistent about this throughout the repo. In particular, I think the transformations expect name="image", yet in many other parts of the repo that is not the case. Furthermore, if the user passes the name, the name is not saved (yet it is used in creating the spatial-image object).

It's also unclear how this interplays with the key of the image element in SpatialData, e.g.

img = Image2DModel(..., scale_factors=[2, 4], name="myimage")
sdata = SpatialData(images={"myrealimage": img})
sdata.images["myrealimage"]["scale0"]["image"]  # the name in this case is still `image`
>>> DataArray ...

I don't have any preference on how to handle this, but we should be consistent, so I suggest either of these two implementations:

  • either make name arbitrary, but then save and read it accordingly if specified by the user; transformations should also be agnostic to it.
  • do not allow passing name, and always use the default name "image" across the repo (so we can avoid saving/reading it); a sketch of this option follows below.
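
A minimal sketch of the second option (the helper name enforce_default_name is hypothetical; rename on a DataArray is the standard xarray way to change its name):

import numpy as np
import xarray as xr

DEFAULT_NAME = "image"  # the spatial-image default discussed above

def enforce_default_name(data: xr.DataArray) -> xr.DataArray:
    # Option 2 in a nutshell: ignore any user-supplied name so that
    # element["scale0"][DEFAULT_NAME] always resolves. Sketch only.
    return data.rename(DEFAULT_NAME)

arr = xr.DataArray(np.zeros((64, 128)), dims=("y", "x"), name="myimage")
assert enforce_default_name(arr).name == "image"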

@LucaMarconato (Member) commented Feb 6, 2023

I think it's a good idea to keep coords. But one important thing: we need to understand/decide what the relationship with transformations is. Scale and translations play together well, but when rotations are involved things get dirty.

I would start with this:

  • the coordinates we save are the ones for the local coordinate system. We gain no practical advantage for SpatialImage this way, but we can use, as you said, methods that work across scales in MultiscaleSpatialImage.

We could extend to this (not urgent):

  • the implicit coordinate system doesn't need to start from 0 and have units equal to pixels. This is still supported by the NGFF specs (and would address this issue: Currently considering only scale in the transformations for multiscales #125). Implications of this are:
    • when cropping or scaling we can sometimes update the xarray object instead of adding a transformation (unless the transformations are more complex, like a general affine; we can detect this)
    • the workflow becomes more intuitive to xarray users who already encode translations and scale into xarray objects (see the sketch after this list), but at the same time the information is sometimes saved inside the transformation classes and other times as coordinates (like in the cropping example just mentioned)
    • intrinsic coordinate systems become more powerful, but extrinsic coordinate systems remain important for aligning different objects, rotations, and multiple transformations.
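
For illustration, a sketch of how a translation + scale pair could live directly in the coordinates (the numbers are made up):

import numpy as np
import xarray as xr

# intrinsic coordinates encoding scale = 0.5 (e.g. microns/pixel) and a
# translation of 100.0 along x; the pixel data itself is untouched
scale, translation = 0.5, 100.0
img = xr.DataArray(np.zeros((64, 128)), dims=("y", "x"))
img = img.assign_coords(
    y=scale * np.arange(img.sizes["y"]),
    x=translation + scale * np.arange(img.sizes["x"]),
)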

@LucaMarconato (Member) commented Feb 10, 2023

Some considerations; I just found an implication for this PR.
In the nanostring cosmx dataset I need to flip each image and labels element on the y-axis. I could do it with an affine transformation, or I could do it by transposing the data with dask array (it's a lazy computation so it's fine). But another way, for xarray users, is to flip only the coordinates (see stackoverflow).
Without this PR the last method can't be applied because coordinates are stripped. With this PR we need to adjust the processing code so that it considers the coordinates (basically, we apply the NGFF transformations to the coordinates when aligning things).
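
A sketch of the coordinate-only flip mentioned above (standard xarray, no spatialdata API involved):

import numpy as np
import xarray as xr

img = xr.DataArray(np.arange(12).reshape(3, 4), dims=("y", "x"),
                   coords={"y": np.arange(3), "x": np.arange(4)})
# Reverse only the y coordinate labels; the (possibly lazy) pixel data is
# untouched, so nothing is computed. Label-based selection is now flipped:
flipped = img.assign_coords(y=img.y.values[::-1])
assert (flipped.sel(y=0).values == img.isel(y=-1).values).all()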

@LucaMarconato (Member):

Also, we could decide not to save the coordinates directly but to convert them to NGFF "dataset transformations" (the translation + scale pair present at each level in the multiscale). The NGFF specs say to apply the new transformations after applying the dataset transformations, so this would be equivalent to applying the NGFF transformations to the xarray coordinates.

In both cases (saving the coordinates directly or converting them to the NGFF dataset transformations), we can only support coordinates that are equivalent to a scale and a translation. Non-linear coordinate displacements (like 0, 1, 2, 3, 10) would break the interplay with the NGFF transformations.
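
A sketch of the affine-compatibility check this implies (the function name coords_to_ngff is illustrative):

import numpy as np

def coords_to_ngff(coords: np.ndarray) -> tuple:
    # Reduce a 1D coordinate vector to an NGFF (translation, scale) pair.
    # Only uniformly spaced coordinates are representable; something like
    # [0, 1, 2, 3, 10] is rejected, matching the caveat above.
    assert coords.size >= 2
    steps = np.diff(coords)
    if not np.allclose(steps, steps[0]):
        raise ValueError("coordinates are not uniformly spaced")
    return float(coords[0]), float(steps[0])

print(coords_to_ngff(np.array([100.0, 100.5, 101.0])))  # (100.0, 0.5)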

@giovp (Member, Author) commented Feb 13, 2023

In the nanostring cosmx dataset I need to flip each image and labels element on the y-axis. I could do it with an affine transformation, or I could do it by transposing the data with dask array (it's a lazy computation so it's fine). But another way, for xarray users, is to flip only the coordinates (see stackoverflow).

what do you need to flip them for?

Without this PR the last method can't be applied because coordinates are stripped. With this PR we need to adjust the processing code so that it considers the coordinates (basically, we apply the NGFF transformations to the coordinates when aligning things).

If by adjust you mean adjusting IO, then for sure we'll have to adapt.

In both cases (saving the coordinates directly or converting them to the NGFF dataset transformations), we can only support coordinates that are equivalent to a scale and a translation. Non-linear coordinate displacements (like 0, 1, 2, 3, 10) would break the interplay with the NGFF transformations.

That's a good point. Ideas could be:

  • only save the coordinates of the highest-resolution pyramid level, and infer the others at IO according to "scale" (I think only scale is relevant for our use of multiscale images). This might break the roundtrip, as numerical differences could pop up, so we'd maybe have to be less restrictive re IO?
  • save the coordinates separately for each group. Probably the best option, but we need to see how it interplays with the ome_zarr readers/writers.
  • simply use the to_zarr method of xarray and store any relevant ome-ngff metadata in the zarr metadata (sketched after this list). I think this would be the quickest option, but then we'd really drop any dependency on ome-zarr-py (except for the format), which could be positive or negative in different ways.
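
A sketch of the third option, assuming that plain xarray zarr IO plus hand-written attrs would suffice (the attrs key and its value here are illustrative, not the actual NGFF schema):

import numpy as np
import xarray as xr

ds = xr.Dataset({"image": (("y", "x"), np.zeros((64, 128)))})
# Stash whatever OME-NGFF metadata is needed in attrs and let xarray write
# the zarr store; this bypasses ome-zarr-py entirely (except for the format).
ds.attrs["multiscales"] = "...ome-ngff multiscales metadata..."
ds.to_zarr("element.zarr", mode="w")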

@LucaMarconato (Member):

Answering the first 2 points

what do you need to flip them for?

The tiles were mapped to the global space in the wrong order, so I flipped the y-axis of each tile. A better way would have been to adjust the mapping to the global space. We can do this (I'll write a TODO in the code).

If by adjust you mean adjusting IO, then for sure we'll have to adapt.

IO is not necessary; we need to make the processing methods (and the transformations, since the processing methods can be called both on the intrinsic space and on any transformed space) aware of the coordinates, since we can no longer work with pure pixel coordinates, only with xarray coordinates.

@LucaMarconato (Member) commented Feb 13, 2023

Regarding the third part, currently the first option is what is implemented. Referring to the screenshot in this discussion, I am basically assuming that the transformation corresponding to "path": "0" in "Coordinate transformations for the multiscale representation" is translation [0, 0] and scale [1, 1].

We could save the coordinates in that slot by deriving the NGFF transformation that produces the corresponding xarray coordinates. The ones for the lower scales can be derived from the first (or can also be computed from xarray). I save all the multiscale transformations to the file, but when I read I actually load just the top one and re-derive the others: as you pointed out, it should be the same up to some numerical precision errors.

So the interpretation of the xarray coordinates is that they describe the intrinsic coordinate system (we would never use the pixel space anymore), and the new transformation classes would always operate on the xarray coordinates.
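
A sketch of a transformation operating on the xarray coordinates rather than on pixel indices (the function name apply_scale is illustrative, not the spatialdata API):

import numpy as np
import xarray as xr

def apply_scale(img: xr.DataArray, scale: dict) -> xr.DataArray:
    # Rescale the coordinate labels, not the data: the intrinsic coordinate
    # system changes while the pixels stay untouched.
    return img.assign_coords(
        {dim: img.coords[dim].values * factor for dim, factor in scale.items()}
    )

img = xr.DataArray(np.zeros((4, 4)), dims=("y", "x"),
                   coords={"y": np.arange(4), "x": np.arange(4)})
scaled = apply_scale(img, {"y": 2.0, "x": 2.0})
assert float(scaled.coords["x"][-1]) == 6.0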

@giovp (Member, Author) commented Feb 13, 2023

So the interpretation of the xarray coordinates is that they describe the intrinsic coordinate system (we would never use the pixel space anymore), and the new transformation classes would always operate on the xarray coordinates.

👍

@LucaMarconato (Member):

The tiles were mapped to the global space in the wrong order, so I flipped the y-axis of each tile. A better way would have been to adjust the mapping to the global space. We can do this (I'll write a TODO in the code).

I checked; actually, flipping the images is correct because the points are aligned with the flipped images. If I change the global positioning of the FOVs, the points become wrong.

@giovp giovp marked this pull request as ready for review February 20, 2023 09:51
@giovp giovp changed the base branch from feature/transform_ergonomics to main February 20, 2023 09:56
@giovp (Member, Author) commented Feb 20, 2023

codecov in this repo is cursed.....

@giovp (Member, Author) commented Feb 20, 2023

@LucaMarconato since multiscale-spatial-image there is now an explicit check for scale factors in building the multiscale (I removed the previous check here).

This check is now catching bugs here

shapes = []
for level in range(len(data)):
    dims = data[f"scale{level}"].dims.values()
    shape = np.array([dict(dims._mapping)[k] for k in axes if k != "c"])
    shapes.append(shape)
multiscale_factors = []
shape0 = shapes[0]
for shape in shapes[1:]:
    factors = shape0 / shape
    factors - min(factors)
    # assert np.allclose(almost_zero, np.zeros_like(almost_zero), rtol=2.)
    try:
        multiscale_factors.append(round(factors[0]))
    except ValueError as e:
        raise e
# mypy thinks that schema could be ShapesModel, PointsModel, ...
I find reading those lines quite difficult, in particular there are things like

factors = shape0 / shape
factors - min(factors)

where I can't tell whether they are a bug or a leftover.
In general, there are also prints that should be logg.warning, and a few more comments on the reasoning would be helpful.
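
For reference, a plausible reading of what those lines may have intended (a sketch with example shapes, not the repo's actual code):

import numpy as np

shapes = [np.array(s, dtype=float) for s in [(256, 256), (128, 128), (64, 64)]]
shape0 = shapes[0]
multiscale_factors = []
for shape in shapes[1:]:
    factors = shape0 / shape
    # check the downscaling is (close to) isotropic before rounding
    if not np.allclose(factors, factors[0], rtol=0.01):
        raise ValueError(f"anisotropic downscaling factors: {factors}")
    multiscale_factors.append(round(float(factors[0])))
assert multiscale_factors == [2, 4]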

Do you mind taking a look? I fixed one, but there are still 3 failing:

FAILED tests/_core/test_transformations_on_elements.py::test_transform_labels_spatial_multiscale_spatial_image - ValueError: Scale factor 10 is incompatible with image shape (10, 148, 148) along dimension `z`.
FAILED tests/_core/test_transformations_on_elements.py::test_transform_elements_and_entire_spatial_data_object[full] - ValueError: Scale factor 10 is incompatible with image shape (10, 64, 128) along dimension `z`.
FAILED tests/_core/test_transformations_on_elements.py::test_transform_elements_and_entire_spatial_data_object[labels] - ValueError: Scale factor 10 is incompatible with image shape (10, 64, 128) along dimension `z`.

Meanwhile I'll go on and work on IO for channels with omero metadata.

@giovp giovp changed the title IO for coordinates of multiscale spatial image enhance + bugfix of images and labels elements Feb 20, 2023
@giovp (Member, Author) commented Feb 20, 2023

@scverse/spatialdata please refer to the header comment for the description of this PR

#127 (comment)

@kevinyamauchi (Collaborator) left a comment

Thanks, @giovp! The code changes look good to me. I am not sure about the other test, but the failing bounding box query tests suggest to me that something has changed in the indexing behavior. The bounding boxes are taken using image.sel (see here). Somehow, the shape of the returned image seems off. Happy to pair on this if you'd like!
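
For context, a small standalone example of label-based bounding-box selection; note that .sel with a slice is inclusive of the stop value, a classic source of off-by-one shape differences versus positional indexing:

import numpy as np
import xarray as xr

img = xr.DataArray(np.zeros((3, 64, 128)), dims=("c", "y", "x"),
                   coords={"y": np.arange(64), "x": np.arange(128)})
# Label-based slicing: both endpoints are included, so 10..20 yields 11 rows.
crop = img.sel(y=slice(10, 20), x=slice(30, 40))
assert crop.sizes["y"] == 11 and crop.sizes["x"] == 11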

@LucaMarconato (Member):
I don't have any preference on how to handle this, but we should be consistent, so I suggest either of these two implementations:

  • either make name arbitrary, but then save and read it accordingly if specified by the user; transformations should also be agnostic to it.
  • do not allow passing name, and always use the default name "image" across the repo (so we can avoid saving/reading it).

I am not sure which one I prefer, but I would start with the first approach, by making the transformations agnostic to that. I'll change the behavior.

@giovp (Member, Author) commented Feb 23, 2023

I am not sure which one I prefer, but I would start with the first approach, by making the transformations agnostic to that. I'll change the behavior.

Yeah, I think when I tried that, only the transformations didn't work; IO and models shouldn't rely on it. If so, I could quickly push a fix.

@LucaMarconato (Member):

@LucaMarconato since multiscale-spatial-image there is now an explicit check for scale factors in building the multiscale (I removed the previous check here). […]

Do you mind taking a look? I fixed one, but there are still 3 failing […]

Yeah, the code was very weird; I removed it altogether. Now I apply the transformation to each element of the multiscale and then assemble the MultiscaleSpatialImage object back together.
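
A minimal sketch of that approach, assuming the datatree API of the time (transform is any DataArray -> DataArray callable; the real implementation lives in spatialdata's _transform_elements.py):

from datatree import DataTree

def transform_multiscale(msi: DataTree, transform) -> DataTree:
    # Apply `transform` to every scale independently, then rebuild the tree.
    # Rebuilding a proper MultiscaleSpatialImage would go through its own
    # constructor; DataTree.from_dict keeps the sketch short.
    return DataTree.from_dict(
        {name: node.ds.map(transform) for name, node in msi.children.items()}
    )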

@codecov bot commented Feb 23, 2023

Codecov Report

Merging #127 (7a9f140) into main (4d9058c) will increase coverage by 0.06%.
The diff coverage is 85.90%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #127      +/-   ##
==========================================
+ Coverage   86.75%   86.82%   +0.06%     
==========================================
  Files          23       23              
  Lines        3277     3377     +100     
==========================================
+ Hits         2843     2932      +89     
- Misses        434      445      +11     
Impacted Files                              Coverage              Δ
spatialdata/_compat.py                      100.00% <ø>           (ø)
spatialdata/_io/format.py                    86.13% <75.00%>      (-2.10%) ⬇️
spatialdata/_core/models.py                  85.32% <77.41%>      (-0.78%) ⬇️
spatialdata/utils.py                         76.41% <85.00%>      (-1.85%) ⬇️
spatialdata/_core/core_utils.py              91.35% <88.17%>      (-1.04%) ⬇️
spatialdata/_io/write.py                     97.31% <88.88%>      (+1.26%) ⬆️
spatialdata/_core/_transform_elements.py     87.87% <91.30%>      (+0.05%) ⬆️
spatialdata/_core/_spatial_query.py          77.21% <100.00%>     (ø)
spatialdata/_io/read.py                      99.36% <100.00%>     (+1.89%) ⬆️
spatialdata/_core/transformations.py         93.92% <0.00%>       (+0.18%) ⬆️
... and 2 more

@LucaMarconato (Member) commented Feb 23, 2023

@giovp I reviewed the code and fixed the tests. There is only one comment that could require changes (the one about the name of the datatree nodes). Or we can also just merge and open an issue about that.

@giovp (Member, Author) commented Feb 24, 2023

@LucaMarconato I'll merge this. Note that the default name for the DataArray in the DataTree is now not what we had decided (im) but whatever spatial-image defaults to, which is "image".

I think the accessor would be really nice to have; I'll open an issue in spatial-image.

@giovp giovp merged commit 38c661c into main Feb 24, 2023
@giovp giovp deleted the models/images/add-coordinates branch February 24, 2023 12:01