This project explores different clustering and classification techniques applied to Peripheral blood mononuclear cells (PBMCs) transcriptomics data. It is divided into two parts: dimensionality reduction and classification.
For a quick result overview, please refer to the pdf.
Hugo Hakem, MEng BioE '23-24
This work serves as the final project for the BioE245 class taught in Spring 2024 by Professor Liana Lareau and GSIs Michael Herschl and Amy Lu.
Peripheral blood mononuclear cells (PBMCs) are a subset of white blood cells that play crucial roles in both the innate and adaptive immune systems. PBMCs include various immune cells such as T cells, B cells, and natural killer cells. These cells can be easily isolated from blood samples, making them highly informative for research purposes. Transcriptomics data from PBMCs provides valuable insights into immune responses and disease states, as changes in gene expression reflect the immune system's activity.
For this project, two datasets were provided:
- labels.csv: Contains the association between cell barcodes and their respective cell type labels.
- processed_counts.csv: Stores normalized log read counts for each cell, with rows representing cells and columns representing genes.
The goal of this part is to compare different dimensionality reduction techniques (AutoEncoder, PCA, and t-SNE) to represent PBMCs on a 2D plot.
The goal of this part is to classify PBMCs using supervised machine learning techniques such as Decision Trees, Naive Bayes models, and Feed Forward Neural Networks.
To run both notebooks, install the following packages:
conda install ipykernel conda-forge -y
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
conda install pandas matplotlib seaborn tqdm -c conda-forge -y
conda install -c conda-forge py-xgboost