This project analyzes single-cell RNA sequencing (scRNA-seq) data from liver metastases of colorectal cancer (CRC), using the GEO dataset GSE178318. The primary goal is to explore the tumor microenvironment (TME) and examine lymphotoxin expression in CRC cells that have metastasized to the liver. Key steps include quality control (QC), normalization, data integration, clustering, differential expression analysis (DEA), and functional enrichment analysis.

Data Preparation:
- Load the raw count matrix, barcodes, and gene names.
- Create the Seurat object for downstream analysis and add metadata on tissue origin: primary colorectal cancer (CRC), liver metastasis (LM), and peripheral blood mononuclear cells (PBMC).
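The data preparation steps above can be sketched roughly as follows. This is a minimal illustration on a toy matrix, not the project's actual script: the gene list, the barcode format (a trailing `_<tissue>` suffix), the project name, and the `tissue` metadata column are all assumptions made for the example.

```r
library(Seurat)
library(Matrix)

# Toy stand-in for the raw inputs (hypothetical values, not GSE178318 data).
# With the real files, Seurat's ReadMtx(mtx = , cells = , features = ) builds
# the same kind of genes x cells sparse matrix from the three input files.
set.seed(1)
genes <- c("MT-CO1", "RPS3", "LTA", "LTB", "EPCAM", "PTPRC")

# Assumption: each barcode ends in "_<tissue>"; adjust to the real format.
barcodes <- paste0("cell", 1:12, "_", rep(c("CRC", "LM", "PBMC"), each = 4))

counts <- Matrix(
  matrix(rpois(length(genes) * length(barcodes), lambda = 5),
         nrow = length(genes), dimnames = list(genes, barcodes)),
  sparse = TRUE
)

# Build the Seurat object; min.cells/min.features = 0 keeps everything for now.
obj <- CreateSeuratObject(counts = counts, project = "CRC_LM",
                          min.cells = 0, min.features = 0)

# Derive tissue origin from the barcode suffix and store it as cell metadata.
obj$tissue <- sub(".*_", "", colnames(obj))
table(obj$tissue)
```

If the tissue label is not encoded in the barcodes, the same `obj$tissue` assignment can instead take a vector looked up from a sample sheet, as long as it is ordered like `colnames(obj)`.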

Quality Control (QC):
- Calculate the percentage of each cell's counts that comes from mitochondrial and ribosomal genes.
- Assess the number of UMI counts and genes per cell using histograms and boxplots.
- Filter low-quality cells based on predefined thresholds (UMI counts, gene counts, mitochondrial content).
- Remove non-protein-coding genes, keeping only protein-coding genes for analysis.
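As a rough sketch of the QC steps above, the example below computes mitochondrial and ribosomal percentages on a small hand-made matrix and then filters cells and genes. The thresholds are illustrative only (the actual predefined thresholds are not stated here), the gene-name patterns assume human gene symbols, and the `protein_coding` vector is a hypothetical stand-in for a real biotype annotation.

```r
library(Seurat)
library(Matrix)

# Toy counts: c1-c4 are "good" cells, c5 is mitochondria-heavy, c6 is near-empty.
genes <- c("MT-CO1", "MT-ND1", "RPS3", "RPL5", "LTA", "LTB", "EPCAM", "MALAT1")
counts <- matrix(0, nrow = length(genes), ncol = 6,
                 dimnames = list(genes, paste0("c", 1:6)))
counts[, 1:4] <- 10
counts[c("MT-CO1", "MT-ND1"), 1:4] <- 1
counts[, 5] <- 5
counts[c("MT-CO1", "MT-ND1"), 5] <- 30
counts[c("LTA", "LTB"), 6] <- 1

obj <- CreateSeuratObject(Matrix(counts, sparse = TRUE),
                          min.cells = 0, min.features = 0)

# Percentage of counts from mitochondrial ("MT-") and ribosomal ("RPS"/"RPL") genes.
obj[["percent.mt"]] <- PercentageFeatureSet(obj, pattern = "^MT-")
obj[["percent.ribo"]] <- PercentageFeatureSet(obj, pattern = "^RP[SL]")

# Cell filter. These cut-offs only suit the toy data; real ones will differ.
obj <- subset(obj, subset = nCount_RNA >= 20 & nFeature_RNA >= 4 & percent.mt < 20)

# Gene filter. Hypothetical annotation: here only MALAT1 is non-protein-coding.
protein_coding <- setdiff(rownames(obj), "MALAT1")
obj <- subset(obj, features = protein_coding)

dim(obj)  # genes x cells remaining after both filters
```

On real data, `VlnPlot(obj, features = c("nCount_RNA", "nFeature_RNA", "percent.mt"))` is a common way to inspect the distributions before choosing cut-offs.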

Normalization and Cell Cycle Analysis:
- Normalize the data and score each cell for cell cycle phase (S and G2/M scores).
- Perform PCA to examine variability and conduct clustering.
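One way these steps might look with Seurat is sketched below as a helper function. The function name `normalize_and_score` and the `npcs` default are hypothetical, and the choice of log-normalization and of the `cc.genes.updated.2019` gene lists bundled with Seurat is an assumption, since the exact settings are not given here.

```r
library(Seurat)

# Sketch: normalize, score cell cycle phase, then run PCA on a Seurat object.
normalize_and_score <- function(obj, npcs = 30) {
  # Log-normalization is Seurat's default method.
  obj <- NormalizeData(obj, verbose = FALSE)

  # cc.genes.updated.2019 ships with Seurat (human S and G2/M marker genes).
  # Adds S.Score, G2M.Score and Phase columns to the cell metadata.
  obj <- CellCycleScoring(obj,
                          s.features = cc.genes.updated.2019$s.genes,
                          g2m.features = cc.genes.updated.2019$g2m.genes,
                          set.ident = FALSE)

  # PCA runs on the scaled, highly variable genes.
  obj <- FindVariableFeatures(obj, verbose = FALSE)
  obj <- ScaleData(obj, verbose = FALSE)
  obj <- RunPCA(obj, npcs = npcs, verbose = FALSE)
  obj
}

# Usage on a real object (not run here):
#   obj <- normalize_and_score(obj)
#   DimPlot(obj, reduction = "pca", group.by = "Phase")
```

Coloring the PCA by `Phase` shows whether cell cycle state drives the leading components; if it does, the scores can be regressed out in `ScaleData` via `vars.to.regress`.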

Data Integration:
- Integrate data across tissue groups (CRC, LM, PBMC) to reduce batch effects and ensure comparable cell types are identified.
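The integration method is not specified here, so the following is only a sketch of one common option, Seurat's anchor-based workflow. The helper name `integrate_by_tissue` and the `tissue` metadata column are assumptions; newer Seurat versions also offer a layer-based alternative (`IntegrateLayers`).

```r
library(Seurat)

# Sketch: anchor-based integration across the groups in a metadata column.
integrate_by_tissue <- function(obj, split_col = "tissue", nfeatures = 2000) {
  # One object per tissue group (e.g. CRC, LM, PBMC).
  obj_list <- SplitObject(obj, split.by = split_col)

  # Normalize and pick variable genes within each group separately.
  obj_list <- lapply(obj_list, function(x) {
    x <- NormalizeData(x, verbose = FALSE)
    FindVariableFeatures(x, nfeatures = nfeatures, verbose = FALSE)
  })

  # Genes that are repeatedly variable across groups anchor the integration.
  features <- SelectIntegrationFeatures(object.list = obj_list,
                                        nfeatures = nfeatures)
  anchors <- FindIntegrationAnchors(object.list = obj_list,
                                    anchor.features = features)

  # IntegrateData stores the corrected values in a new "integrated" assay.
  integrated <- IntegrateData(anchorset = anchors)
  DefaultAssay(integrated) <- "integrated"
  integrated
}

# Usage on a real object (not run here):
#   integrated <- integrate_by_tissue(obj)
```

The integrated assay is meant for PCA, clustering, and UMAP; differential expression is usually run on the original `RNA` assay instead.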

Clustering:
- Perform dimensionality reduction (PCA, UMAP), identify clusters, and compare them by tissue origin.
- Clustering without integration revealed batch effects, motivating the integration approach.
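A minimal sketch of the clustering step is shown below. The helper name `cluster_cells`, the number of dimensions, and the resolution are hypothetical defaults rather than the values used in this analysis. Running the same function on the non-integrated and the integrated object, then coloring the UMAP by tissue, is one way to see the batch effect described above.

```r
library(Seurat)

# Sketch: PCA, graph-based clustering and UMAP on a Seurat object.
# Assumes the default assay already has variable features set
# (FindVariableFeatures, or the output of IntegrateData).
cluster_cells <- function(obj, dims = 1:20, resolution = 0.5) {
  obj <- ScaleData(obj, verbose = FALSE)
  obj <- RunPCA(obj, verbose = FALSE)

  # Shared-nearest-neighbor graph on the chosen PCs, then community detection.
  obj <- FindNeighbors(obj, dims = dims, verbose = FALSE)
  obj <- FindClusters(obj, resolution = resolution, verbose = FALSE)

  # 2D embedding for visualization only; clusters come from the graph above.
  obj <- RunUMAP(obj, dims = dims, verbose = FALSE)
  obj
}

# Usage on a real object (not run here):
#   obj <- cluster_cells(obj)
#   DimPlot(obj, reduction = "umap", group.by = "tissue")
#   DimPlot(obj, reduction = "umap", group.by = "seurat_clusters")
```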

Differential Expression Analysis (DEA):
- Perform DEA among liver metastasis (LM), PBMC, and colorectal cancer (CRC) tissues.
- Identify significant tissue-specific markers and visualize results with heatmaps.
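The DEA step could be sketched along these lines. The helper name `tissue_markers`, the `tissue` metadata column, and the cut-offs (`min.pct`, `logfc.threshold`, adjusted p-value below 0.05, top 10 genes per group) are assumptions for illustration; the statistical test is left at Seurat's default.

```r
library(Seurat)
library(dplyr)

# Sketch: one-vs-rest marker detection per tissue group, keeping the top genes.
tissue_markers <- function(obj, group_col = "tissue", n_top = 10) {
  # Use the tissue label as the active identity so groups are compared by tissue.
  Idents(obj) <- group_col

  # Each group is tested against all other cells; only up-regulated genes kept.
  markers <- FindAllMarkers(obj, only.pos = TRUE,
                            min.pct = 0.25, logfc.threshold = 0.25)

  # Keep significant genes and rank by log2 fold change within each group.
  markers %>%
    dplyr::filter(p_val_adj < 0.05) %>%
    group_by(cluster) %>%
    slice_max(avg_log2FC, n = n_top) %>%
    ungroup()
}

# Usage on a real object (not run here):
#   top <- tissue_markers(obj)
#   DoHeatmap(obj, features = top$gene, group.by = "tissue")
```

`DoHeatmap` reads scaled data, so the selected genes need to be included in `ScaleData` beforehand.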

Functional Enrichment Analysis:
- Use enrichR to identify enriched biological processes, molecular functions, and cellular components from the identified markers.

Requirements:
- R packages: Seurat, ggplot2, dplyr, openxlsx, SeuratDisk, enrichR, etc.
- Data: GSE178318, a single-cell RNA-seq dataset.

Visualizations:
- UMI counts per cell (histogram and boxplots)
- Mitochondrial count ratio per cell (density plot)
- Complexity of gene expression
- Heatmaps for tissue-specific markers
- Enriched functions per tissue group

Detailed session info can be found in the corresponding section of the report.