SpatialData lacks Observation Window Metadata #458

Closed
MobiTobi opened this issue Feb 15, 2024 · 7 comments

@MobiTobi
Collaborator

tl;dr:
Observation windows are a crucial piece of spatial metadata.
It should be possible to associate SpatialData objects and elements with observation windows.
Polygon elements can describe observation windows, but violate FAIR principles.


Before I provide two examples of why observation window metadata is crucial for the analysis of spatial data, I want to point out that other communities learned this lesson a long time ago [1].

[The Observation Window] is an essential component of a point pattern object

Let's avoid the pitfalls they stumbled into a long time ago and steal their wisdom  :^)

No Data != No Measurement

The observation window allows us to distinguish between the absence of measured data points and the absence of measurement.

For example, let's try to estimate the density of a group of cells.
Estimates based on the area of the data bounding box and on the area of the microscopy slide differ substantially:
[Image: bb2]
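
To make the difference concrete, here is a minimal sketch in plain Python. The cell coordinates and the 100 x 100 window are made-up values for illustration, not data from this issue.

```python
# Hypothetical cell centroids; only one corner of the slide contains cells.
cells = [(10.0, 10.0), (20.0, 15.0), (30.0, 40.0), (50.0, 30.0)]


def bounding_box_area(points):
    """Area of the axis-aligned bounding box spanned by the points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


# Assumed observation window: the whole imaged region, a 100 x 100 square.
window_area = 100.0 * 100.0

# Density relative to the data-derived bounding box.
density_bbox = len(cells) / bounding_box_area(cells)
# Density relative to the area that was actually measured.
density_window = len(cells) / window_area

print(f"bounding box: {density_bbox:.5f} cells per unit area")
print(f"window:       {density_window:.5f} cells per unit area")
```

With these numbers the bounding-box estimate is roughly eight times higher than the window-based one, purely because the empty but measured area is ignored.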

Bounding Boxes estimated from data are ill-defined

Without an observation window, users can fall back on spatialdata.get_extent to estimate a bounding box from the data.
Unfortunately, the result depends on the coordinate system:
[Image: bb]
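
The effect can be reproduced without spatialdata. The sketch below computes a plain axis-aligned bounding box (it does not call get_extent) for four made-up points before and after a 45 degree rotation; the helper names are illustrative.

```python
import math


def extent_area(points):
    """Area of the axis-aligned bounding box of the points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


def rotate(points, angle_deg):
    """Rotate points counter-clockwise around the origin."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


# Corners of a 4 x 2 rectangle, standing in for the outermost data points.
corners = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]

area_original = extent_area(corners)
area_rotated = extent_area(rotate(corners, 45.0))

print(area_original, area_rotated)
```

The same four points span a bounding box of area 8 in one coordinate system and about 18 in the rotated one, although the measured region itself did not change.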

Workarounds

With the current implementation I see two possible workarounds, both suffering from the same major drawbacks:

  • Coordinate transformations applied to the data are not automatically applied to the observation window, thereby negating the main strength of spatialdata.
  • The metadata ends up in an unexpected place, which clashes with the F (findable) in F.A.I.R.

In the first workaround users can store observation windows as a polygon element.
Does the polygon apply to the complete spatialdata object?
Or just to one or multiple elements? And if so, to which elements?

Alternatively, users can store anything in the .uns attribute of the element table, resulting in a clear association between observation window and element, but a bad user interface.

What can we do about it?

The obvious solution would be to add a window or observation_window attribute to spatialdata objects and their elements.
This would be easy to use and obey the "findable" requirement.
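
As a rough sketch of what such an attribute could look like, the following uses a purely hypothetical `WindowedPoints` container. Neither the class nor its `observation_window` field nor the `transform` method exist in spatialdata; they only illustrate the idea that data and window travel together.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Point = Tuple[float, float]


@dataclass
class WindowedPoints:
    """Hypothetical element type; not part of the spatialdata API."""

    points: List[Point]
    # Polygon vertices of the window, in the same coordinate system as the points.
    observation_window: List[Point]

    def transform(self, fn: Callable[[Point], Point]) -> "WindowedPoints":
        # The same transformation is applied to data and window,
        # so the two cannot drift apart.
        return WindowedPoints(
            points=[fn(p) for p in self.points],
            observation_window=[fn(v) for v in self.observation_window],
        )


def shift(p: Point) -> Point:
    """Example transformation: a simple translation."""
    return (p[0] + 100.0, p[1] - 50.0)


element = WindowedPoints(
    points=[(1.0, 2.0)],
    observation_window=[(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)],
)
moved = element.transform(shift)
print(moved.points, moved.observation_window)
```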

For now we should give users a heads-up about observation windows in the user manual.
The spatialdata target audience are biologists without formal training in spatial analysis.
For me it would not have been obvious at all to explicitly mark the observation window before sharing data with others.

What do you think about making observation windows first class metadata in spatialdata?

Do you have a good idea of how we could get there?

Footnotes

  1. Baddeley, Rubak, Turner. Spatial point patterns: methodology and applications with R, CRC press, 2015.

@kevinyamauchi
Collaborator

Hello @MobiTobi ! Thank you for writing out this issue. I have a few clarifying questions:

  • Is your primary concern that there should be some standardized metadata that specifies which polygons are "observation windows"?
  • How do you imagine the user using the "observation windows"? Are there details about it that are tied to specific analyses? What information is required for an "observation window" to be useful?
  • Is there an implementation of the "observation window" that you like in another library/format that you can link to? It would be helpful to have an example to better understand the intent/use case.

In the first workaround users can store observation windows as a polygon element.
Does the polygon apply to the complete spatialdata object?
Or just to one or multiple elements? And if so to which elements?

I'm not sure I understand these questions. Can you please clarify?

@MobiTobi
Collaborator Author

Is your primary concern that there should be some standardized metadata that specifies which polygons are "observation windows"?

I think that there should be standardized observation window metadata for elements.
It signifies where the element is defined, if you think of the element as a map from R^2 to some value.

Is there an implementation of the "observation window" that you like in another library/format that you can link to?

A shapely Polygon or MultiPolygon would be enough to model it.

Can you please clarify?

Sure :) I see how I could have phrased it better.

With the questions I wanted to point out the ambiguities that come with storing windows on the same level as the observation data.
Let's say you read a spatialdata object from a supplementary file, collaborator or data repository.
They included one or multiple shape elements which annotate the observation window(s).
Which shape describes which element? Without outside information you have to resort to guesswork.

How do you imagine the user using the "observation windows"?
Are there details about it that are tied to specific analyses?

It tells you which subset of the plane is the domain where the data was observed and is valid to include in subsequent analyses. The information needs to come from the experimentalists. It's basically an outline of the measurement area with holes for invalid regions.
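
A minimal sketch of "outline with holes" as a filter on observations, written in plain Python with a ray-casting test. The window, the hole and the points are made up, and the function names are illustrative.

```python
def point_in_ring(point, ring):
    """Ray-casting test; ring is a list of (x, y) vertices, implicitly closed."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def in_window(point, exterior, holes=()):
    """True if the point lies inside the outline and outside every hole."""
    if not point_in_ring(point, exterior):
        return False
    return not any(point_in_ring(point, hole) for hole in holes)


# Assumed window: a 10 x 10 measurement area with one invalid 2 x 2 region.
exterior = [(0, 0), (10, 0), (10, 10), (0, 10)]
holes = [[(4, 4), (6, 4), (6, 6), (4, 6)]]

observations = [(1, 1), (5, 5), (12, 3), (8, 2)]
valid = [p for p in observations if in_window(p, exterior, holes)]
print(valid)
```

Points exactly on a boundary are not handled specially here. A shapely Polygon constructed with a shell and holes, combined with its contains method, should cover the same use case.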

For many spatial omics datasets the observation window is obvious.
For example, the [codeluppi osmFISH](https://www.nature.com/articles/s41592-018-0175-z/figures/1) data shows clear tiling and you can intuit which part is outside and which part is inside the observation window.
But even for the very clear codeluppi data you can fall into the trap of including the stripped area (marked with a star) in your analysis, and only exclude it manually after seeing that it behaves quite differently from the rest of the data.

In a perfect world everyone uses spatialdata :^) and a mistake like that will be impossible, because the metadata makes it obvious that it is outside of the regular osmFISH measurements.

@LucaMarconato
Member

LucaMarconato commented Feb 19, 2024

Hi @MobiTobi, thanks for reporting this and for the explanation. Unfortunately the data from many commercial technologies doesn't come with clear information on the observation window, so I would not add this as a standardized field. I think instead the information on the observation window should live at a different layer, more focused on metadata and qc, than the storage format used by SpatialData.

@melonora is working on this direction, so this is something that can be considered in the future in that context.

@LucaMarconato added and then removed the wontfix label on Feb 19, 2024
@LucaMarconato
Member

A few more comments.

Without an observation window users can fall back on spatialdata.get_extent to estimate a bounding box from the data.
Unfortunately the results depend on the coordinate system:

As you observed, the result of get_extent() depends on the coordinate system, and this is particularly pronounced when rotations are involved. So if an observation window is needed, it should indeed not be inferred from get_extent().

I'd also consider discussing this in the context of the NGFF data specification. Please see here two related discussions (even if not exactly on this topic): ome/ngff#31 ome/ngff#133.

@melonora
Collaborator

Hi @MobiTobi,

As @LucaMarconato mentioned, I am indeed working in this direction. In particular, I am working on schemas that would make it possible to extract this information and store it in, for example, a SQL database. There are a couple of things that are required to make this work in a truly FAIR manner. If you would like, I would be happy to set up a call to discuss.

@aeisenbarth
Contributor

We currently use shapes elements for the regions of measurement (or regions of interest), annotated with IDs in the table, which other elements then reference. Raster images inherently have bounds, but even there the relevant measurement is often only within a region of interest.

  • If NGFF provides a way to reference elements (path to another "shapes" element and index to a single shape item) we would be happy to use that.

  • If such metadata is kept as extra properties like AnnData uns, there are some peculiarities: subset/query/resampling operations may have to update the metadata to keep it consistent. (Photo manipulation software frequently messes up region of interest and panorama metadata.) And for aggregation, merging of uns needs specific handling.
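
The referencing approach described above can be sketched in a few lines of plain Python. The element names, the region IDs and the `window_for` helper are all hypothetical and only illustrate an explicit element-to-window association; they do not reflect an actual spatialdata or NGFF layout.

```python
# Hypothetical regions of measurement, keyed by ID (polygon vertices).
windows = {
    "roi_1": [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)],
    "roi_2": [(20.0, 0.0), (30.0, 0.0), (30.0, 10.0), (20.0, 10.0)],
}

# Each element references the ID of the region it was measured in.
element_to_window = {
    "transcripts": "roi_1",
    "cell_boundaries": "roi_1",
    "image_b": "roi_2",
}


def window_for(element_name):
    """Resolve the region of measurement for an element, without guesswork."""
    window_id = element_to_window.get(element_name)
    if window_id is None:
        raise KeyError(f"no observation window recorded for {element_name!r}")
    return windows[window_id]


print(window_for("transcripts"))
```

Any subset or crop operation on an element would also have to update the referenced region, which is the consistency concern raised in the second bullet.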

@LucaMarconato
Member

Thank you for the discussion. I will close the issue as developing a specification for the observation window is not in scope of spatialdata (but would probably fall within the NGFF scope, so you are encouraged to continue the discussion there https://github.com/ome/ngff).

@LucaMarconato closed this as not planned on Jul 10, 2024