Tkakar/cat 1015 document containers (#142)
* Expanded readmes for each container

* Refined readmes

* Consistent breaks

* Fixed minor line breaks

* Updated readme with examples

* Completed read me with missing parts

* Fixed links

* Cleaned comments

* Updated readme for mudata-to-ui
tkakar authored Jan 22, 2025
1 parent 4a6c772 commit 71e6bf1
Showing 61 changed files with 134 additions and 424 deletions.
17 changes: 15 additions & 2 deletions containers/anndata-to-ui/README.md
@@ -1,4 +1,17 @@
# anndata-to-ui

This container saves [an AnnData store](https://anndata.readthedocs.io/en/latest/anndata.read_h5ad.html) in `zarr` format for viewing in the browser. It also
selects an approriate subset of genes to be used for visualization.
This container saves [an AnnData store](https://anndata.readthedocs.io/en/latest/anndata.read_h5ad.html) in `zarr` format for viewing in the browser; `zarr` is used for its scalability, performance, and flexibility. It also selects an appropriate subset of genes to be used for visualization.

## Input
The input to the container is an [AnnData file in h5ad format](https://anndata.readthedocs.io/en/latest/anndata.read_h5ad.html).

## Output
The output is the converted `zarr` store.

## Normalization
All data from the input is scaled to [zero-mean, unit-variance](https://github.com/hubmapconsortium/salmon-rnaseq/blob/main/bin/analysis/scanpy_entry_point.py#L31-L33).
`X` is replaced with the log-normalized raw counts so that the data can be visualized by Vitessce.

## Example
An example of a HuBMAP dataset that uses this container for data conversion:
`https://portal.hubmapconsortium.org/browse/HBM856.HVWM.567`
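
Below is a minimal sketch of how such an h5ad-to-zarr conversion might look with the `scanpy`/`anndata` APIs. The file names and the marker-gene selection step are illustrative assumptions, not the container's exact code.

```python
from pathlib import Path

import scanpy as sc

input_path = Path("input/secondary_analysis.h5ad")    # assumed file name
output_path = Path("output/secondary_analysis.zarr")  # assumed file name
output_path.parent.mkdir(exist_ok=True)

adata = sc.read_h5ad(input_path)

# Pick a small, informative gene subset for visualization (assumed approach).
sc.pp.highly_variable_genes(adata, n_top_genes=200)
adata.var["marker_genes_for_heatmap"] = adata.var["highly_variable"]  # assumed column name

# AnnData can write the whole store as zarr, which the browser can then read in chunks.
adata.write_zarr(output_path)
```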
4 changes: 2 additions & 2 deletions containers/anndata-to-ui/context/main.py
@@ -52,8 +52,8 @@ def main(input_dir, output_dir):
adata.layers[layer] = adata.layers[layer].tocsc()

# All data from secondary_analysis is scaled at the moment to zero-mean unit-variance
# https://github.com/hubmapconsortium/salmon-rnaseq/blob/master/bin/analysis/scanpy_entry_point.py#L47
# We currently cannot visaulize this in Vitessce so we replace `X` with the log-normalized raw counts:
# https://github.com/hubmapconsortium/salmon-rnaseq/blob/master/bin/analysis/scanpy_entry_point.py#L31-L33
# We currently cannot visualize this in Vitessce so we replace `X` with the log-normalized raw counts:
# https://github.com/hubmapconsortium/salmon-rnaseq/commit/9cf1dd4dbe4538b565a0355f56399d3587827eff
# Ideally, we should be able to manage the `layers` and `X` simultaneously in `zarr` but currently we cannot:
# https://github.com/theislab/anndata/issues/524
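
For context, a hedged sketch of what replacing the scaled `X` with log-normalized raw counts could look like with `scanpy`; this is an illustration under the assumption that the raw counts are available in `adata.raw`, not the pipeline's actual code.

```python
import anndata as ad
import scanpy as sc

def replace_x_with_lognorm_counts(adata: ad.AnnData) -> ad.AnnData:
    """Sketch: swap the zero-mean unit-variance `X` for log-normalized raw counts."""
    raw = adata.raw.to_adata()                  # raw counts kept by the upstream pipeline (assumed)
    sc.pp.normalize_total(raw, target_sum=1e4)  # library-size normalization
    sc.pp.log1p(raw)                            # natural-log transform
    adata.X = raw.X                             # replace the scaled matrix in place
    return adata
```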
14 changes: 14 additions & 0 deletions containers/h5ad-to-arrow/README.md
@@ -2,3 +2,17 @@

This container translates [anndata's h5ad](https://anndata.readthedocs.io/en/latest/anndata.read_h5ad.html) to [Apache Arrow](https://arrow.apache.org/),
as well as CSV, and Vitessce JSON which conforms to our [schemas](https://github.com/hubmapconsortium/vitessce/tree/master/src/schemas).
The Arrow format is a columnar format optimized for analytical workloads such as querying and aggregation, and is faster than AnnData's row-based storage for certain operations.

## Input
The input to the container is an [AnnData file in h5ad format](https://anndata.readthedocs.io/en/latest/anndata.read_h5ad.html).

## Output
The output includes the converted `arrow` file, a CSV file that mirrors the Arrow file for readability, and JSON files representing cells and cell sets for Vitessce.

## Normalization
None

## Example
An example of a HuBMAP dataset that uses this container for data conversion for Vitessce visualization:
`https://portal.hubmapconsortium.org/browse/HBM768.NCSB.762`
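
A minimal sketch of an h5ad-to-Arrow conversion with `anndata` and `pyarrow` is shown below; the input file name and the chosen columns (UMAP coordinates plus a `leiden` cluster label) are assumptions for illustration.

```python
from pathlib import Path

import anndata as ad
import pandas as pd
import pyarrow as pa
import pyarrow.feather as feather

adata = ad.read_h5ad("input/umap_coords_clustered.h5ad")  # assumed file name

# Per-cell table: UMAP coordinates plus a cluster label (assumed obsm/obs keys).
df = pd.DataFrame(adata.obsm["X_umap"], columns=["umap_x", "umap_y"], index=adata.obs.index)
df["leiden"] = adata.obs["leiden"].values

Path("output").mkdir(exist_ok=True)
feather.write_feather(pa.Table.from_pandas(df), "output/cells.arrow")  # Arrow IPC file
df.to_csv("output/cells.csv")                                          # CSV mirror for readability
```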
15 changes: 14 additions & 1 deletion containers/mudata-to-ui/README.md
@@ -2,4 +2,17 @@

This container saves [a MuData store](https://mudata.readthedocs.io/en/latest/api/generated/mudata.read_h5mu.html#mudata.read_h5mu) in `zarr` format for viewing in the browser.

It also selects an approriate subset of genes to be used for visualization and generates genomic profiles for all the present clusterings.
It also selects an appropriate subset of genes to be used for visualization and generates genomic profiles for all the present clusters.

## Input
The input to the container is a [MuData file in h5mu format](https://mudata.readthedocs.io/en/latest/api/generated/mudata.read_h5mu.html#mudata.read_h5mu).

## Output
The output is the converted `zarr` store.

## Normalization
All data from the input is scaled to [zero-mean, unit-variance](https://github.com/hubmapconsortium/multiome-rna-atac-pipeline/blob/a52b6bb37f56dcd78d45ceef1868095d59ef1aac/bin/downstream.py#L30-L37).

## Example
An example of a HuBMAP dataset that uses this container for data conversion for Vitessce visualization:
`https://portal.hubmapconsortium.org/browse/dataset/845e7b1c35e8f4926e53b4ef862c0ce7`
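
A minimal sketch of reading an `.h5mu` file and writing it back out as a zarr store with the `mudata` package; the file names are assumptions.

```python
from pathlib import Path

import mudata as md

mdata = md.read_h5mu("input/secondary_analysis.h5mu")  # assumed file name

# Each modality (e.g. "rna", "atac") is an AnnData object inside the MuData container.
for name, mod in mdata.mod.items():
    print(name, mod.shape)

Path("output").mkdir(exist_ok=True)
mdata.write_zarr("output/secondary_analysis.zarr")
```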
16 changes: 15 additions & 1 deletion containers/ome-tiff-offsets/README.md
@@ -1,3 +1,17 @@
# ome-tiff-offsets

This docker container creates a JSON list of byte offsets for each TIFF from an input directory. This makes visualization much more efficient as we can request specific IFDs and their tiles more efficiently
This Docker container creates a JSON list of byte offsets for each TIFF in an input directory. This is needed for visualizing image datasets: with the offsets, specific IFDs and their tiles can be requested directly, which makes visualization much more efficient.

## Input
The input to the container is one or more OME-TIFF image files.

## Output
The output is a JSON file containing a list of byte offsets for every input OME-TIFF image.

## Normalization
None

## Example
An example of a HuBMAP dataset that uses this container for offset generation for Vitessce visualization:
`https://portal.hubmapconsortium.org/browse/dataset/1c7e9d1b6de5263aad5e4b95096a6ec5`.
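
A minimal sketch of how the byte offsets could be computed with `tifffile`; the directory layout and the output file naming are assumptions.

```python
import json
from pathlib import Path

import tifffile

input_dir = Path("input")    # assumed layout
output_dir = Path("output")  # assumed layout
output_dir.mkdir(exist_ok=True)

for tiff_path in input_dir.glob("*.ome.tif*"):
    with tifffile.TiffFile(tiff_path) as tif:
        offsets = [page.offset for page in tif.pages]  # byte offset of each IFD
    out_name = tiff_path.name.split(".ome.tif")[0] + ".offsets.json"  # assumed naming
    (output_dir / out_name).write_text(json.dumps(offsets))
```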

21 changes: 20 additions & 1 deletion containers/ome-tiff-segments/README.md
@@ -1,3 +1,22 @@
# ome-tiff-segments
This container creates the different component datasets needed for the visualization of GeoMx assay OME-TIFF files, which include regions of interest (ROIs) and areas of interest (AOIs). These datasets are created by converting the OME-TIFF to an OME-XML file.

This container creates different component datasets needed for the visualization of GeoMx Assays Ome-tiff files.
## Input
The input to the container is the OME-TIFF file from a GeoMx assay.

## Output
The following output files are generated to support the visualization of GeoMx assays.

- ROIs as an `obsSegmentations.json` file, created by extracting the vertices from the Polygon tags within each ROI.

- A segmentation OME-TIFF file extracted from the bitmask within each ROI's mask and grouped by segment (the text field within the mask).

- AOIs as a Zarr store, with `obs` holding the segment, roi-id, and aoi-id. The aoi-id has composite values (e.g., `Shape:2`), so the index is the numeric part extracted from this composite value.

- ROIs as a Zarr store, with `obs` holding the channel thresholds for the ROIs, extracted from the annotations in the OME-XML.

## Normalization
None

## Example
An example of a HuBMAP dataset that uses this container for data conversion: `TODO`
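
A minimal sketch of extracting ROI polygon vertices from the OME-XML embedded in an OME-TIFF with `tifffile`; the file names are assumptions, and the JSON written here does not necessarily match Vitessce's `obsSegmentations.json` schema.

```python
import json
import xml.etree.ElementTree as ET
from pathlib import Path

import tifffile

OME_NS = {"ome": "http://www.openmicroscopy.org/Schemas/OME/2016-06"}

with tifffile.TiffFile("input/geomx.ome.tiff") as tif:  # assumed file name
    ome_xml = tif.ome_metadata  # OME-XML string embedded in the OME-TIFF

root = ET.fromstring(ome_xml)
segmentations = {}
for roi in root.findall("ome:ROI", OME_NS):
    roi_id = roi.get("ID")
    for polygon in roi.findall(".//ome:Polygon", OME_NS):
        # Per the OME schema, Points is a space-separated list of "x,y" pairs.
        points = [
            [float(x), float(y)]
            for x, y in (pair.split(",") for pair in polygon.get("Points").split())
        ]
        segmentations[roi_id] = points

out_dir = Path("output")  # assumed layout
out_dir.mkdir(exist_ok=True)
(out_dir / "obsSegmentations.json").write_text(json.dumps(segmentations))
```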
