Scripts and notebooks to reproduce all analysis from Massoni-Badosa et al. (2022)

Single-Cell-Genomics-Group-CNAG-CRG/TonsilAtlas

An Atlas of Cells in the Human Tonsil

Palatine tonsils are secondary lymphoid organs (SLOs) representing the first line of immunological defense against inhaled or ingested pathogens. We generated an atlas of the human tonsil composed of >556,000 cells profiled across five different data modalities, including single-cell transcriptome, epigenome, proteome, and immune repertoire sequencing, as well as spatial transcriptomics. This census identified 121 cell types and states, defined developmental trajectories, and enabled an understanding of the functional units of the tonsil. For example, we stratified myeloid slan-like subtypes, established a BCL6 enhancer as locally active in follicle-associated T and B cells, and identified SIX5 as a putative transcriptional regulator of plasma cell maturation. Analyses of a validation cohort confirmed the presence, annotation, and markers of tonsillar cell types and provided evidence of age-related compositional shifts. We demonstrate the value of this resource by annotating cells from B cell-derived mantle cell lymphomas, linking transcriptional heterogeneity to normal B cell differentiation states of the human tonsil.

This repository contains the scripts, notebooks, and reports needed to reproduce all the analyses in our manuscript entitled An Atlas of Cells in the Human Tonsil, published in Immunity in 2024. Here, we describe how to access the data, document the most important packages and versions used, and explain how to navigate the directories and files in this repository.

Data

The data has been deposited in five levels of organization, from raw to processed data:

  • Level 1: raw data. The fastq files for all data modalities have been deposited at ArrayExpress under accession ID E-MTAB-13687.
  • Level 2: matrices. All data modalities correspond to different technologies from 10x Genomics. As such, they were mapped with different flavors of Cell Ranger (CR). The most important files in the "outs" folder of every CR run (including all matrices) have been deposited in Zenodo.
  • Level 3: Seurat Objects. All data was analyzed within the Seurat ecosystem. We have archived in Zenodo all Seurat Objects that contain the raw and processed counts, dimensionality reductions (PCA, Harmony, UMAP), and metadata needed to reproduce all figures from this manuscript.
  • Level 4: to allow for programmatic and modular access to the whole tonsil atlas dataset, we developed HCATonsilData, available on Bioconductor. HCATonsilData provides a vignette that documents how to navigate and understand the data. It also provides access to the glossary to trace back all annotations in the atlas. In addition, we will periodically update the annotations as we refine them with suggestions from the community.
  • Level 5: interactive mode. Our tonsil atlas has been included as a reference in Azimuth, which allows interactive exploration of cell type markers on the web.

The READMEs in the Zenodo repositories explain how to access the matrices and Seurat objects. A separate repository (TonsilAtlasCAP) contains scripts and documentation to download and remap all the fastq files from ArrayExpress.

Package versions

You can check the versions of other packages in the "Session Information" section of each html report. To view one of the html reports online, copy and paste the URL of the report into the HTML GitHub viewer.
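
As a minimal sketch of the viewing step above, the shell function below prepends a viewer prefix to a report URL. It assumes the viewer is the public htmlpreview.github.io service, which this README does not name explicitly. The helper name preview_url is hypothetical and not part of this repository.

```shell
# Assumption: the "HTML GitHub viewer" is https://htmlpreview.github.io/,
# which renders an HTML file when its GitHub URL is appended after "?".
VIEWER_PREFIX="https://htmlpreview.github.io/?"

# preview_url REPORT_URL
# Hypothetical helper: prints the viewer URL for an html report on GitHub.
preview_url() {
  if [ -z "${1:-}" ]; then
    echo "usage: preview_url REPORT_URL" >&2
    return 1
  fi
  printf '%s%s\n' "$VIEWER_PREFIX" "$1"
}
```

Calling preview_url with the GitHub URL of any html report in this repository prints a link that can be opened in a browser. If the viewer prefix differs from the assumed one, only VIEWER_PREFIX needs to change.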

File system and name scheme

Although each technology requires specific analysis, they also share a similar pre-processing pipeline. We have strived to harmonize these pipelines into similar naming schemes so that it is easy for users to navigate this repo. Likewise, we have tried to code in a shared style. These are the most important steps:

In addition, the "figures_and_scripts" folder contains the scripts used to generate most of the figures in the manuscript. Finally, the "bin" folder contains functions and utilities used throughout many scripts.

Getting the code

You can download a copy of all the files in this repository by cloning the git repository:

git clone https://github.com/Single-Cell-Genomics-Group-CNAG-CRG/TonsilAtlas.git
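
If the full history is not needed, a shallow clone may be faster. The sketch below wraps the command above with git's --depth 1 option and adds a small helper to list the html reports in a local copy. Both function names and the default target directory are hypothetical and not part of this repository.

```shell
REPO_URL="https://github.com/Single-Cell-Genomics-Group-CNAG-CRG/TonsilAtlas.git"

# clone_tonsil_atlas [TARGET_DIR]
# Hypothetical helper: shallow-clones the repository (latest commit only).
clone_tonsil_atlas() {
  target_dir="${1:-TonsilAtlas}"
  git clone --depth 1 "$REPO_URL" "$target_dir"
}

# list_reports [REPO_DIR]
# Hypothetical helper: prints the paths of all html reports in a local clone.
list_reports() {
  find "${1:-TonsilAtlas}" -type f -name '*.html' | sort
}
```

For example, running clone_tonsil_atlas followed by list_reports prints the report paths, which can then be opened locally in a browser to inspect their "Session Information" sections.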

Relevant literature
