iamandreatonina/Bioinformatic_resources

Bioinformatic Resources Project

Project in R for the Bioinformatic Resources course, taught by Alessandro Romanel (A.Y. 2022-2023).

Topic: Analyze a selected dataset of RNA-seq count data extracted from one of the cancer datasets in The Cancer Genome Atlas (TCGA). From the original TCGA data, 50 cases (tumor samples) and 50 controls (normal samples) were randomly selected.

Dataset selected: Thyroid carcinoma

Report (both in Rmd and HTML)

Authors

Project developed by:

Tasks

  • Load the RData file.
  • Extract only protein-coding genes.
  • Perform differential expression analysis using the edgeR package and select up- and down-regulated genes using a p-value cutoff of 0.01, a log fold change > 1.5 for up-regulated genes and < -1.5 for down-regulated genes, and a log CPM > 1.
  • Perform gene set enrichment analysis using clusterProfiler.
  • Use pathview to visualize one pathway found to be enriched in the up-regulated gene list.
  • Identify which transcription factors (TFs) have enriched scores in the promoters of all up-regulated genes.
  • Select one of the top enriched TFs, compute the empirical distributions of scores for all PWMs found in MotifDb for the selected TF, and determine for each of them the distribution (log2) threshold cutoff at 99.75%.
  • Identify which up-regulated genes have a region in their promoter with binding scores above the computed thresholds for any of the previously selected PWMs.
  • Find protein-protein interactions (PPIs) among differentially expressed genes using the STRING database and export the network in TSV format.
  • Import the network using the igraph package, then identify and plot the largest connected component (we also decided to use ggnet2 from the GGally package).
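
The gene-selection rule in the differential expression task can be sketched in base R as follows. This is only an illustration, not the code from the report: the table is made-up toy data that borrows the column names (`logFC`, `logCPM`, `PValue`) found in the table returned by edgeR's `topTags()`, and `classify_degs` is a hypothetical helper name. It also assumes strict inequalities and the raw p-value; the `FDR` column could be used instead if an adjusted cutoff is intended.

```r
# Toy results table using the column names of edgeR's topTags(...)$table.
# All values are invented for illustration only.
deg <- data.frame(
  logFC  = c(2.3, -1.9, 0.4, 1.8, -2.5),
  logCPM = c(3.1,  2.2, 5.0, 0.5,  4.0),
  PValue = c(0.001, 0.004, 0.0005, 0.002, 0.2),
  row.names = c("geneA", "geneB", "geneC", "geneD", "geneE")
)

# Hypothetical helper: label each gene as "up", "down" or "not_significant"
# using the cutoffs listed in the task (p < 0.01, |logFC| > 1.5, logCPM > 1).
classify_degs <- function(tab, p_cut = 0.01, lfc_cut = 1.5, cpm_cut = 1) {
  status <- rep("not_significant", nrow(tab))
  # A gene must pass both the p-value and the expression-level filter
  passes <- tab$PValue < p_cut & tab$logCPM > cpm_cut
  status[passes & tab$logFC >  lfc_cut] <- "up"
  status[passes & tab$logFC < -lfc_cut] <- "down"
  tab$status <- status
  tab
}

deg <- classify_degs(deg)
up_genes   <- rownames(deg)[deg$status == "up"]
down_genes <- rownames(deg)[deg$status == "down"]
print(table(deg$status))
```

In a real run, `deg` would be replaced by the full edgeR results table, and `up_genes` would then feed the enrichment, pathview, and promoter-scanning tasks.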
