
Commit 989456c
Merge pull request #104 from csoneson/minor-edits
Minor edits to paper
ajitjohnson authored May 28, 2024
2 parents 2c599db + 3b781b9 commit 989456c
Showing 2 changed files with 5 additions and 5 deletions.
paper/paper.bib (1 addition, 1 deletion)
@@ -65,7 +65,7 @@ @misc{yapp_multiplexed_2024
}

@inproceedings{wang_spatial_2007,
-title = {Spatial Latent Dirichlet Allocation},
+title = {{Spatial Latent Dirichlet Allocation}},
volume = {20},
url = {https://papers.nips.cc/paper_files/paper/2007/hash/ec8956637a99787bd197eacd77acce5e-Abstract.html},
abstract = {In recent years, the language model Latent Dirichlet Allocation ({LDA}), which clusters co-occurring words into topics, has been widely applied in the computer vision field. However, many of these applications have difficulty with modeling the spatial and temporal structure among visual words, since {LDA} assumes that a document is a ``bag-of-words''. It is also critical to properly design ``words'' and ``documents'' when using a language model to solve vision problems. In this paper, we propose a topic model Spatial Latent Dirichlet Allocation ({SLDA}), which better encodes spatial structure among visual words that are essential for solving many vision problems. The spatial information is not encoded in the value of visual words but in the design of documents. Instead of knowing the partition of words into documents \textit{a priori}, the word-document assignment becomes a random hidden variable in {SLDA}. There is a generative procedure, where knowledge of spatial structure can be flexibly added as a prior, grouping visual words which are close in space into the same document. We use {SLDA} to discover objects from a collection of images, and show it achieves better performance than {LDA}.},
paper/paper.md (4 additions, 4 deletions)
@@ -29,23 +29,23 @@ Multiplexed imaging data are revolutionizing our understanding of the compositio

# Statement of Need

-A variety of methods have been introduced for high multiplexed imaging of tissues, including MxIF, CyCIF, CODEX, 4i, mIHC, MIBI, IBEX, and IMC [@angelo_multiplexed_2014; @gerdes_highly_2013; @giesen_highly_2014; @goltsev_deep_2018; @gut_multiplexed_2018; @tsujikawa_quantitative_2017; @lin_highly_2018]; although these methods differ in their implementations, all enable the collection of single-cell data on 20-100 proteins within preserved 2D and 3D tissue microenvironments. Analysis of high-plex images typically involves joining adjacent image tiles together and aligning channels across imaging cycles (stitching and registration) to generate a composite high-plex image and then identifying the positions and boundaries of individual cells via segmentation. The intensities of individual protein antigens, stains, and other detectable molecules are then quantified on a per-cell basis. This generates a “spatial feature table” (analogous to a count table in sequencing) that can be used to identify individual cell types and states; tools from spatial statistics are then used to identify how these cells are patterned in space at scales ranging from a few cell diameters (~10 µm) to several millimeters.
+A variety of methods have been introduced for highly multiplexed imaging of tissues, including MxIF, CyCIF, CODEX, 4i, mIHC, MIBI, IBEX, and IMC [@angelo_multiplexed_2014; @gerdes_highly_2013; @giesen_highly_2014; @goltsev_deep_2018; @gut_multiplexed_2018; @tsujikawa_quantitative_2017; @lin_highly_2018]; although these methods differ in their implementations, all enable the collection of single-cell data on 20-100 proteins within preserved 2D and 3D tissue microenvironments. Analysis of high-plex images typically involves joining adjacent image tiles together and aligning channels across imaging cycles (stitching and registration) to generate a composite high-plex image and then identifying the positions and boundaries of individual cells via segmentation. The intensities of individual protein antigens, stains, and other detectable molecules are then quantified on a per-cell basis. This generates a “spatial feature table” (analogous to a count table in sequencing) that can be used to identify individual cell types and states; tools from spatial statistics are then used to identify how these cells are patterned in space at scales ranging from a few cell diameters (~10 µm) to several millimeters.

Spatial feature tables provide the quantitative data for analysis of high-plex images, but human inspection of the original image data remains essential. At the current state of the art, many of the critical morphological details in high-resolution images cannot be fully and accurately quantified. Segmentation is also subject to errors identifiable by humans but not fully resolvable computationally [@baker_quality_2024]. As a consequence, computation of spatial features and relationships must be performed in combination with visualization of the underlying image data. Humans excel at identifying tissue features that correspond to classical histo-morphologies; they are also effective at discriminating foreground signals from variable background [@nirmal_cell_2023] using a process of “visual gating” (perception of high- and low-intensity levels while visualizing an image). More generally, effective integration of visualization and computation enables nuanced interpretation of cellular organization in relation to established tissue architectures.

While packages such as squidpy [@palla_squidpy_2022], Giotto [@dries_giotto_2021], and Seurat [@hao_dictionary_2024] have the potential to manage multiplexed imaging data, their functionalities are primarily optimized for spatial transcriptomics data. In contrast, `SCIMAP` is specifically designed to address the unique requirements of multiplexed imaging data analysis, offering features such as image-based visual gating and the integration of prior knowledge for cellular phenotyping, among others. `SCIMAP` builds on the Python-based Napari image viewer [@chiu_napari_2022; @ahlers_napari_2023] to provide a seamless interface for inspecting and annotating high-plex imaging data alongside computational analysis. For example, we have implemented an image-based gating approach that allows users to visually determine the threshold that discriminates background from true signal at both the whole-specimen and single-cell levels. Users can also select specific regions of interest (ROIs) for selective or deeper analysis. This involves drawing ROIs over images (freehand or geometric) and then selecting the underlying single-cell data for further analysis. This capability is essential for incorporating histopathological information on common tissue structures (e.g., epidermis, dermis, follicles), immune structures (e.g., secondary and tertiary lymphoid structures), tumor domains (e.g., tumor center, boundary, tumor buds), and tumor grade or stage (e.g., early lesions, invasive regions, established nodules). It also allows for excluding regions affected by significant tissue loss, folding, or artifactual staining [@baker_quality_2024]. `SCIMAP` then performs statistical and spatial analyses on individual ROIs or sets of ROIs. Spatial analysis, including the measurement of distances between cells, analysis of interaction patterns, categorization into neighborhoods, and scoring of these patterns, is crucial for elucidating the cellular communications that underpin the functional aspects of the biology being studied. `SCIMAP` offers various functions to facilitate these analyses.

-Lastly, a single high-plex whole slide image can exceed 100GB per image and 10$^6$ cells, necessitating optimized functions for handling large matrices and images. `SCIMAP` employs the well-established AnnData object structure, complemented by Dask and Zarr for efficient image loading in Napari. This approach facilitates seamless viewing of images with overlaid data layers, thus enabling effective analysis of large datasets. To date, `SCIMAP` has been used in the analysis of over 5 datasets from 8 tissue and cancer types [@yapp_multiplexed_2024; @nirmal_spatial_2022; @gaglia_lymphocyte_2023; @maliga_immune_2024].
+Lastly, a single high-plex whole slide image can exceed 100GB and 10$^6$ cells, necessitating optimized functions for handling large matrices and images. `SCIMAP` employs the well-established AnnData object structure, complemented by Dask and Zarr for efficient image loading in Napari. This approach facilitates seamless viewing of images with overlaid data layers, thus enabling effective analysis of large datasets. To date, `SCIMAP` has been used in the analysis of over 5 datasets from 8 tissue and cancer types [@yapp_multiplexed_2024; @nirmal_spatial_2022; @gaglia_lymphocyte_2023; @maliga_immune_2024].
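
As an illustrative sketch of the lazy-loading pattern described above (generic Dask/Zarr usage, not `SCIMAP`'s internal code; the file path and pyramid component are placeholders):

```python
import dask.array as da

# Lazily open one resolution level of a Zarr-backed image; no pixel data
# is read until a region is actually requested (for example, when a
# viewer such as Napari pans or zooms).
img = da.from_zarr("example.ome.zarr", component="0")  # placeholder path and component

# Only this single tile is materialized in memory.
tile = img[..., :1024, :1024].compute()
```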

# Availability and Features

-`SCIMAP` is available as a standalone Python package for interactive use, in Jupyter Notebook for example, or can be accessed via a command-line interface (CLI; only a subset of functions that do not require visualization) for cloud-based processing. The package can be accessed at [github](https://github.com/labsyspharm/scimap) and installed locally through pip. Installation, usage instructions, general documentation, and tutorials are available at [https://scimap.xyz/](https://scimap.xyz/). See See \autoref{fig:workflow} for a schematic of the workflow and system components.
+`SCIMAP` is available as a standalone Python package for interactive use, in Jupyter Notebooks for example, or can be accessed via a command-line interface (CLI; only a subset of functions that do not require visualization) for cloud-based processing. The package can be accessed at [GitHub](https://github.com/labsyspharm/scimap) and installed locally through pip. Installation, usage instructions, general documentation, and tutorials are available at [https://scimap.xyz/](https://scimap.xyz/). See \autoref{fig:workflow} for a schematic of the workflow and system components.
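
A minimal setup sketch, assuming the PyPI distribution shares the repository name and exposes a version string:

```python
# Install from PyPI (run in a shell):
#   pip install scimap
import scimap as sm  # the "sm" alias is an assumption, mirroring common single-cell conventions

print(sm.__version__)  # quick check that the installation worked
```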

![SCIMAP Workflow Overview. The schematic highlights data import, cell classification, spatial analysis, and visualization techniques within the SCIMAP toolbox.\label{fig:workflow}](figure-workflow.png)

`SCIMAP` operates on segmented single-cell data derived from imaging data using tools such as cellpose [@stringer_cellpose_2021] or MCMICRO [@schapiro_mcmicro_2022]. The essential inputs for `SCIMAP` are: (a) a single-cell expression matrix and (b) the X and Y coordinates for each cell. Additionally, multi-stack OME-TIFF or TIFF images can be optionally provided to enable visualization of the data analysis on the original raw images.
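
A toy sketch of these two inputs assembled into an AnnData object (random values; the `X_centroid`/`Y_centroid` column names follow MCMICRO-style conventions and are an assumption here):

```python
import anndata as ad
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
markers = ["CD3D", "CD20", "PanCK"]
n_cells = 100

# (a) single-cell expression matrix: one row per cell, one column per marker
counts = rng.lognormal(mean=1.0, sigma=0.5, size=(n_cells, len(markers))).astype(np.float32)

# (b) X and Y centroid coordinates for each cell
obs = pd.DataFrame(
    {
        "X_centroid": rng.uniform(0, 2000, n_cells),
        "Y_centroid": rng.uniform(0, 2000, n_cells),
    },
    index=[f"cell_{i}" for i in range(n_cells)],
)

adata = ad.AnnData(X=counts, obs=obs, var=pd.DataFrame(index=markers))
print(adata)  # AnnData object with n_obs × n_vars = 100 × 3
```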

-`SCIMAP` comprises of four main modules: preprocessing, analysis tools, visualization, and external methods. The preprocessing tools include functions for normalization, batch correction, and streamlined import from cloud processing pipelines such as MCMICRO [@schapiro_mcmicro_2022]. The analysis tools offer standard single-cell analysis techniques such as dimensionality reduction, clustering, prior knowledge-based cell phenotyping (a method through which cells are classified into specific cell types based on patterns of marker expression defined by the user), and various spatial analysis tools for measuring cellular distances, identifying regions of specific cell type aggregation, and assessing statistical differences in proximity scores or interaction frequencies. `SCIMAP` also includes neighborhood detection algorithms that utilize spatial-LDA [@wang_spatial_2007] for categorical data (cell types or clusters) and spatial lag for continuous data (marker expression values). All tools within the `SCIMAP` package for spatial analysis are compatible with both 2D and 3D data. Most analysis tools come with corresponding visualization functions to plot the results effectively. Additionally, the external methods module facilitates the integration of new tools developed by the community into `SCIMAP`, further extending its utility and applicability to both 2D and 3D data.
+`SCIMAP` comprises four main modules: preprocessing, analysis tools, visualization, and external methods. The preprocessing tools include functions for normalization, batch correction, and streamlined import from cloud processing pipelines such as MCMICRO [@schapiro_mcmicro_2022]. The analysis tools offer standard single-cell analysis techniques such as dimensionality reduction, clustering, prior knowledge-based cell phenotyping (a method through which cells are classified into specific cell types based on patterns of marker expression defined by the user), and various spatial analysis tools for measuring cellular distances, identifying regions of specific cell type aggregation, and assessing statistical differences in proximity scores or interaction frequencies. `SCIMAP` also includes neighborhood detection algorithms that utilize spatial-LDA [@wang_spatial_2007] for categorical data (cell types or clusters) and spatial lag for continuous data (marker expression values). All tools within the `SCIMAP` package for spatial analysis are compatible with both 2D and 3D data. Most analysis tools come with corresponding visualization functions to plot the results effectively. Additionally, the external methods module facilitates the integration of new tools developed by the community into `SCIMAP`, further extending its utility and applicability to both 2D and 3D data.
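
A hypothetical pass through these modules is sketched below; every function name and argument is an assumption made for illustration, and the actual API should be taken from [https://scimap.xyz/](https://scimap.xyz/):

```python
import scimap as sm  # hypothetical calls below; names and signatures unverified

adata = sm.pp.rescale(adata)           # preprocessing: normalize marker intensities
adata = sm.tl.phenotype_cells(adata)   # prior knowledge-based cell phenotyping
adata = sm.tl.spatial_lda(adata)       # neighborhoods from categorical cell types
sm.pl.spatial_interaction(adata)       # visualize cell-cell interaction patterns
```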

# Acknowledgements

