Tkakar/cat 1015 document containers #142

Merged
merged 9 commits into from
Jan 22, 2025
18 changes: 16 additions & 2 deletions containers/anndata-to-ui/README.md
@@ -1,4 +1,18 @@
# anndata-to-ui

This container saves [an AnnData store](https://anndata.readthedocs.io/en/latest/anndata.read_h5ad.html) in `zarr` format for viewing in the browser. It also
selects an approriate subset of genes to be used for visualization.
This container saves [an AnnData store](https://anndata.readthedocs.io/en/latest/anndata.read_h5ad.html) in `zarr` format for viewing in the browser. The `zarr` format is used for its scalability, performance, and flexibility. The container also selects an appropriate subset of genes to be used for visualization.

## Input
The input to the container is an [AnnData file in h5ad format](https://anndata.readthedocs.io/en/latest/anndata.read_h5ad.html).


## Output
The output is the converted `zarr` store.


## Normalization
All data from the input is scaled to [zero-mean unit-variance](https://github.com/hubmapconsortium/salmon-rnaseq/blob/master/bin/analysis/scanpy_entry_point.py#L47) `TODO: update line number in the link`.
The `X` is replaced with the log-normalized raw counts to be visualized by Vitessce.
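
The difference between the two representations can be illustrated with a minimal, standard-library-only sketch. It is not the container's code: the toy count matrix, the `target_sum` of 10,000, the use of `log(1 + x)`, and the population standard deviation are all assumptions made for illustration. It shows that log-normalized values stay non-negative, while scaled values are centered on zero and so include negatives.

```python
import math
from statistics import mean, pstdev


def log_normalize(counts, target_sum=1e4):
    """Scale one cell's raw counts to target_sum, then apply log(1 + x).

    target_sum is an assumed convention, not taken from the container.
    """
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [math.log1p(c * target_sum / total) for c in counts]


def scale_gene(values):
    """Zero-mean unit-variance scaling of one gene across all cells."""
    mu = mean(values)
    sigma = pstdev(values)  # population std; an assumption of this sketch
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Toy raw counts: rows are cells, columns are genes (hypothetical data).
raw_counts = [[0, 3, 7], [2, 2, 6], [5, 0, 5]]

# Log-normalize each cell (row).
log_norm = [log_normalize(cell) for cell in raw_counts]

# Scale each gene (column) across cells, then transpose back to rows.
scaled_columns = [scale_gene(list(col)) for col in zip(*log_norm)]
scaled = [list(row) for row in zip(*scaled_columns)]
```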

## Example
An example of a HuBMAP dataset that uses this container for data conversion is `HBM856.HVWM.567`.
tkakar marked this conversation as resolved.
2 changes: 1 addition & 1 deletion containers/anndata-to-ui/context/main.py
@@ -53,7 +53,7 @@ def main(input_dir, output_dir):

# All data from secondary_analysis is scaled at the moment to zero-mean unit-variance
# https://github.com/hubmapconsortium/salmon-rnaseq/blob/master/bin/analysis/scanpy_entry_point.py#L47
# We currently cannot visaulize this in Vitessce so we replace `X` with the log-normalized raw counts:
# We currently cannot visualize this in Vitessce so we replace `X` with the log-normalized raw counts:
# https://github.com/hubmapconsortium/salmon-rnaseq/commit/9cf1dd4dbe4538b565a0355f56399d3587827eff
# Ideally, we should be able to manage the `layers` and `X` simultaneously in `zarr` but currently we cannot:
# https://github.com/theislab/anndata/issues/524
15 changes: 15 additions & 0 deletions containers/h5ad-to-arrow/README.md
@@ -2,3 +2,18 @@

This container translates [anndata's h5ad](https://anndata.readthedocs.io/en/latest/anndata.read_h5ad.html) to [Apache Arrow](https://arrow.apache.org/),
as well as CSV, and Vitessce JSON which conforms to our [schemas](https://github.com/hubmapconsortium/vitessce/tree/master/src/schemas).
The Arrow format is a columnar format optimized for analytical workloads such as querying and aggregation, and it is faster than AnnData's row-based storage for certain operations.
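
As a rough illustration of why a columnar layout suits aggregations, the sketch below pivots row-oriented records into per-column lists using only the Python standard library. It does not use Apache Arrow, and the field names (`cell_id`, `umap_x`, `cluster`) are hypothetical; it only shows the idea that an aggregate over one field reads a single contiguous column instead of visiting every record.

```python
# Row-oriented records: one dict per cell (hypothetical fields and values).
rows = [
    {"cell_id": "c1", "umap_x": 0.5, "cluster": "0"},
    {"cell_id": "c2", "umap_x": 1.5, "cluster": "1"},
    {"cell_id": "c3", "umap_x": 2.5, "cluster": "0"},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {key: [] for key in records[0]}
    for record in records:
        for key, value in record.items():
            columns[key].append(value)
    return columns


columns = to_columns(rows)

# An aggregation now touches only the one column it needs.
mean_x = sum(columns["umap_x"]) / len(columns["umap_x"])
```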

## Input
The input to the container is an [AnnData file in h5ad format](https://anndata.readthedocs.io/en/latest/anndata.read_h5ad.html).


## Output
The output includes the converted `arrow` file, a CSV file that mirrors the arrow file for readability, and JSON files representing cells and cell sets for Vitessce.


## Normalization
None

## Example
An example of a HuBMAP dataset that uses this container for data conversion for Vitessce (visualization) is `HBM768.NCSB.762`.
15 changes: 14 additions & 1 deletion containers/mudata-to-ui/README.md
@@ -2,4 +2,17 @@

This container saves [a MuData store](https://mudata.readthedocs.io/en/latest/api/generated/mudata.read_h5mu.html#mudata.read_h5mu) in `zarr` format for viewing in the browser.

It also selects an approriate subset of genes to be used for visualization and generates genomic profiles for all the present clusterings.
It also selects an appropriate subset of genes to be used for visualization and generates genomic profiles for all the present clusters.

## Input
The input to the container is a [MuData file in h5mu format](https://mudata.readthedocs.io/en/latest/api/generated/mudata.read_h5mu.html#mudata.read_h5mu).

## Output
The output is the converted `zarr` store.

## Normalization
All data from the input is scaled to [zero-mean unit-variance](https://github.com/hubmapconsortium/salmon-rnaseq/blob/master/bin/analysis/scanpy_entry_point.py#L47) `TODO: update line number in the link`.
The `X` is replaced with the log-normalized raw counts to be visualized by Vitessce.

## Example
An example of a HuBMAP dataset that uses this container for data conversion for Vitessce (visualization) is `HBM768.NCSB.762`.
17 changes: 16 additions & 1 deletion containers/ome-tiff-offsets/README.md
@@ -1,3 +1,18 @@
# ome-tiff-offsets

This docker container creates a JSON list of byte offsets for each TIFF from an input directory. This makes visualization much more efficient as we can request specific IFDs and their tiles more efficiently
This Docker container creates a JSON list of byte offsets for each TIFF from an input directory. These offsets are needed for visualizing image datasets: they let the viewer request specific IFDs and their tiles directly, which makes visualization much more efficient.

## Input
The input to the container is one or more OME-TIFF image files.


## Output
The output is a JSON file that includes an array/list of byte offsets for every input OME-TIFF image.
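
To make "byte offsets" concrete, here is a minimal sketch that walks the IFD chain of a classic TIFF using only the standard library. It is an illustration under assumptions, not the container's implementation: it handles only classic TIFF (magic number 42), not BigTIFF, and `build_demo_tiff` is a hypothetical helper that fabricates a tiny two-IFD file purely to exercise the parser.

```python
import json
import struct


def ifd_offsets(data: bytes) -> list:
    """Return the byte offset of every IFD in a classic (non-Big) TIFF."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("not a TIFF file")
    magic, offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("this sketch only handles classic TIFF (magic 42)")
    offsets = []
    while offset != 0 and offset not in offsets:  # stop on a cyclic chain
        offsets.append(offset)
        (n_entries,) = struct.unpack_from(order + "H", data, offset)
        # The next-IFD pointer follows the 2-byte count and 12-byte entries.
        (offset,) = struct.unpack_from(order + "I", data, offset + 2 + 12 * n_entries)
    return offsets


def build_demo_tiff() -> bytes:
    """Hypothetical helper: a little-endian TIFF with two one-entry IFDs."""
    header = struct.pack("<2sHI", b"II", 42, 8)    # first IFD at byte 8
    entry = struct.pack("<HHII", 256, 3, 1, 1)     # ImageWidth = 1 (SHORT)
    ifd0 = struct.pack("<H", 1) + entry + struct.pack("<I", 26)  # next IFD at 26
    ifd1 = struct.pack("<H", 1) + entry + struct.pack("<I", 0)   # end of chain
    return header + ifd0 + ifd1


offsets_json = json.dumps(ifd_offsets(build_demo_tiff()))
```

With such a list available, a client can seek straight to the IFD it needs rather than following the chain from the start of the file.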


## Normalization
None

## Example
An example of a HuBMAP dataset that uses this container for offsets generation for Vitessce (visualization) is `HBM974.DMWR.753`.

21 changes: 20 additions & 1 deletion containers/ome-tiff-segments/README.md
@@ -1,3 +1,22 @@
# ome-tiff-segments
This container creates the component datasets needed to visualize GeoMx assay OME-TIFF files, which include regions of interest (ROIs) and areas of interest (AOIs). These datasets are created by converting the OME-TIFF to an OME-XML file.

This container creates different component datasets needed for the visualization of GeoMx Assays Ome-tiff files.
## Input
The input to the container is the OME-TIFF file from a GeoMx assay.

## Output
The following output files are generated to support the visualization of GeoMx assays.

- ROIs as an `obsSegmentations.json` file, created by extracting the vertices from the Polygon tags within each ROI.

- A segmentation OME-TIFF file, extracted from the bitmask within each ROI’s mask and grouped by segment (the text field within the mask).

- AOIs as a Zarr store, with obs representing the segment, roi-id, and aoi-id. The aoi-id has composite values (e.g., Shape:2), so the index is the numeric part extracted from this composite value.

- ROIs as a Zarr store, with obs holding the channel thresholds for the ROIs, extracted from the annotations in the OME-XML.
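
Two of the steps in the list above, reading Polygon vertices per ROI and taking the numeric part of a composite ID such as `Shape:2`, can be sketched with the standard library. This is an illustration only: the sample XML, the 2016-06 OME namespace, and the shape of the returned dictionary are assumptions, and the real `obsSegmentations.json` layout may differ.

```python
import xml.etree.ElementTree as ET

# Assumed OME schema namespace; adjust to match the actual file.
OME_NS = {"ome": "http://www.openmicroscopy.org/Schemas/OME/2016-06"}

# Hypothetical, heavily trimmed OME-XML with one ROI and one polygon.
SAMPLE_OME_XML = """<OME xmlns="http://www.openmicroscopy.org/Schemas/OME/2016-06">
  <ROI ID="ROI:0">
    <Union>
      <Polygon ID="Shape:2" Points="0,0 10,0 10,5"/>
    </Union>
  </ROI>
</OME>"""


def polygon_vertices(ome_xml):
    """Map each ROI ID to a list of polygons, each a list of [x, y] vertices."""
    root = ET.fromstring(ome_xml)
    result = {}
    for roi in root.findall("ome:ROI", OME_NS):
        for polygon in roi.findall("ome:Union/ome:Polygon", OME_NS):
            # Points is a space-separated list of "x,y" pairs.
            vertices = [
                [float(x), float(y)]
                for x, y in (pair.split(",") for pair in polygon.get("Points", "").split())
            ]
            result.setdefault(roi.get("ID"), []).append(vertices)
    return result


def numeric_index(composite_id):
    """Extract the numeric part of a composite ID, e.g. 'Shape:2' -> 2."""
    return int(composite_id.split(":")[1])


segmentations = polygon_vertices(SAMPLE_OME_XML)
```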

## Normalization
None

## Example
No HuBMAP dataset that uses this container for data conversion has been ingested yet.
Collaborator Author
This test folder had slipped into the last PR, so removed it here.

This file was deleted.