charlesgwellem/GSE178381_scRNAseq_EDA

QC, Clustering, and Differential Expression Analysis for GSE178318 Data

Overview

This project analyzes single-cell RNA sequencing (scRNA-seq) data from liver metastases of colorectal cancer (CRC), using the dataset GSE178318. The primary goal is to explore the tumor microenvironment (TME) and examine the expression of lymphotoxins in cancer cells of the liver metastases. Key steps include quality control (QC), normalization, clustering, and differential expression analysis (DEA).

Key Steps

  1. Data Preparation:

    • Load the raw count matrix, barcodes, and gene names.
    • Create the Seurat object for downstream analysis and add metadata about tissue origin: colorectal carcinoma (CRC), liver metastasis (LM), and peripheral blood mononuclear cells (PBMC).
  2. Quality Control (QC):

    • Calculate the percentage of counts from mitochondrial and ribosomal genes in each cell.
    • Assess the number of UMI counts and genes per cell using histograms and boxplots.
    • Filter low-quality cells based on predefined thresholds (UMI counts, gene counts, mitochondrial content).
    • Remove non-protein-coding genes, keeping only protein-coding genes for analysis.
  3. Normalization and Cell Cycle Analysis:

    • Normalize the data and score each cell for cell cycle phase (S and G2/M).
    • Perform PCA to examine variability and conduct clustering.
  4. Data Integration:

    • Integrate data across tissue groups (CRC, LM, PBMC) to reduce batch effects and ensure comparable cell types are identified.
  5. Clustering:

    • Perform dimensionality reduction (PCA, UMAP), identify clusters, and compare them by tissue origin.
    • Clustering without integration revealed batch effects, motivating the integration approach.
  6. Differential Expression Analysis (DEA):

    • Perform DEA among liver metastasis (LM), PBMC, and colorectal carcinoma (CRC) tissues.
    • Identify significant tissue-specific markers and visualize results with heatmaps.
  7. Functional Enrichment Analysis:

    • Use enrichR to identify enriched biological processes, molecular functions, and cellular components from the identified markers.
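
The QC and normalization steps (2 and 3) can be illustrated with a small, self-contained sketch. This is not the repository's R/Seurat code: it is a plain-Python illustration on a toy count matrix. The gene-prefix conventions (`MT-` for mitochondrial, `RPS`/`RPL` for ribosomal), the thresholds, and the function names are all assumptions chosen for the example, and the normalization shown is the log-normalization formula (counts divided by the cell total, scaled, then `log1p`), which the README does not confirm as the method used.

```python
import math

# Toy data (hypothetical values): one list of per-gene counts for each cell.
GENES = ["MT-CO1", "MT-ND1", "RPS6", "RPL13", "LTA", "LTB", "ALB", "EPCAM"]
COUNTS = {
    "cell_1": [5, 3, 20, 15, 2, 4, 30, 21],
    "cell_2": [40, 35, 5, 3, 0, 0, 1, 1],   # dominated by mitochondrial counts
    "cell_3": [0, 0, 1, 0, 0, 0, 0, 0],     # nearly empty droplet
}


def qc_metrics(counts, genes):
    """Return UMI total, genes detected, and % mitochondrial/ribosomal counts."""
    total = sum(counts)
    # Assumed naming conventions for human gene symbols.
    mito = sum(c for g, c in zip(genes, counts) if g.startswith("MT-"))
    ribo = sum(c for g, c in zip(genes, counts) if g.startswith(("RPS", "RPL")))
    return {
        "n_umi": total,
        "n_genes": sum(1 for c in counts if c > 0),
        "pct_mito": 100.0 * mito / total if total else 0.0,
        "pct_ribo": 100.0 * ribo / total if total else 0.0,
    }


def filter_cells(cells, genes, min_umi=50, min_genes=5, max_pct_mito=20.0):
    """Keep cells passing all thresholds (toy-scale, hypothetical cutoffs)."""
    kept = []
    for name, counts in cells.items():
        m = qc_metrics(counts, genes)
        if (m["n_umi"] >= min_umi and m["n_genes"] >= min_genes
                and m["pct_mito"] <= max_pct_mito):
            kept.append(name)
    return kept


def log_normalize(counts, scale_factor=10_000):
    """Log-normalize one cell: log1p(count / cell total * scale_factor)."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [math.log1p(c / total * scale_factor) for c in counts]


if __name__ == "__main__":
    for cell in filter_cells(COUNTS, GENES):
        print(cell, qc_metrics(COUNTS[cell], GENES))
```

In a real analysis the cutoffs would be chosen from the histograms and boxplots of UMI and gene counts rather than fixed in advance.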

Tools Used

  • R Packages: Seurat, ggplot2, dplyr, openxlsx, SeuratDisk, enrichR, etc.
  • Data: GSE178318, a single-cell RNA-seq dataset.

Figures and Outputs

  • UMI counts per cell (Histogram & Boxplots)
  • Mitochondrial counts ratio per cell (Density plot)
  • Complexity of gene expression
  • Heatmaps for tissue-specific markers
  • Enriched functions per tissue group
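
The README does not define "complexity of gene expression". One commonly used definition, assumed here, is the ratio of log10(genes detected) to log10(UMI count) per cell, where values close to 1 indicate that counts are spread across many genes. The function name below is hypothetical, and the sketch is in plain Python rather than the repository's R code.

```python
import math


def complexity_score(n_genes, n_umi):
    """Assumed definition: log10(genes detected) / log10(UMI count)."""
    # Guard against log10(0) and division by log10(1) == 0.
    if n_genes < 1 or n_umi <= 1:
        return 0.0
    return math.log10(n_genes) / math.log10(n_umi)


if __name__ == "__main__":
    # A cell with 1,000 genes detected from 10,000 UMIs.
    print(complexity_score(1_000, 10_000))
```

Cells with an unusually low score express few genes relative to their sequencing depth, which is one reason such a metric is plotted alongside UMI and mitochondrial distributions.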

Session Information

Detailed session info can be found in the corresponding section of the report.

