Welcome to my GitHub repository for the HackBio Internship! This repository highlights my journey through an intensive 8-week practical internship in oncology, focusing on machine learning and data science applications.
The internship was divided into five progressive stages (Stages 0 to 4), followed by a final capstone project, building both theoretical and practical foundations in cancer research:
- Stage 0: Built theoretical foundations and wrote an essay on supervised learning's importance in cancer research.
- Stage 1: Collaborated with a team to conduct a literature review, "Machine Learning for Lung Cancer Diagnosis, Treatment, and Prognosis," and summarized our findings in a video.
- Stage 2: Preprocessed a glioblastoma dataset from TCGA, then performed differential expression analysis and pathway enrichment with ShinyGO, working alongside biomarker and ML interns.
- Stage 3: Implemented a pipeline to identify potential sarcoma biomarkers based on age classification, combining differential expression, functional enrichment, and ML models.
- Stage 4: Reproduced published research on lower-grade glioma (LGG) gene expression data, classifying samples by IDH status with a k-nearest neighbors (KNN) model.
- Stages 5-7: Final capstone project.
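To give a feel for the Stage 4 approach, here is a minimal sketch of KNN classification in R. It is not the code from this repository: the expression matrix is synthetic, the gene names and the `Mutant`/`WT` labels are hypothetical stand-ins for IDH status, and `k = 5` is an arbitrary choice. It assumes the `class` package, which ships with standard R distributions.

```r
library(class)  # provides knn(); a recommended package bundled with R

set.seed(42)  # make the synthetic data reproducible

# Synthetic stand-in for an expression matrix: 40 samples x 5 hypothetical genes.
n_per_group <- 20
n_genes <- 5
mutant   <- matrix(rnorm(n_per_group * n_genes, mean = 8, sd = 0.5), ncol = n_genes)
wildtype <- matrix(rnorm(n_per_group * n_genes, mean = 4, sd = 0.5), ncol = n_genes)
expr <- rbind(mutant, wildtype)
colnames(expr) <- paste0("gene_", seq_len(n_genes))

# Hypothetical class labels, one per sample (rows of expr).
idh_status <- factor(rep(c("Mutant", "WT"), each = n_per_group))

# Hold out 12 of the 40 samples for testing.
test_idx <- sample(nrow(expr), size = 12)
train_x <- expr[-test_idx, ]
test_x  <- expr[test_idx, ]

# Scale both sets with the training set's mean and standard deviation,
# so no information from the test samples leaks into training.
train_scaled <- scale(train_x)
test_scaled  <- scale(test_x,
                      center = attr(train_scaled, "scaled:center"),
                      scale  = attr(train_scaled, "scaled:scale"))

# Classify each test sample by majority vote among its 5 nearest training samples.
predicted <- knn(train = train_scaled, test = test_scaled,
                 cl = idh_status[-test_idx], k = 5)

accuracy  <- mean(predicted == idh_status[test_idx])
confusion <- table(Predicted = predicted, Actual = idh_status[test_idx])
print(confusion)
```

On real expression data the groups overlap far more than in this toy example, so `k` would normally be tuned, for instance by cross-validation.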
Throughout this internship, I achieved:
- A deep understanding of supervised and unsupervised machine learning approaches.
- Hands-on experience in cancer biomarker discovery.
- Practical skills in R programming, data visualization, and dataset analysis.
- Experience collaborating with interdisciplinary teams, managing projects, and communicating effectively.
Key skills I developed:
- R Programming: Mastered the basics of R, RStudio, and programming syntax.
- Data Visualization: Created insightful plots for biological datasets.
- Machine Learning: Applied KNN and Random Forest models for cancer diagnosis and classification.
- Bioinformatics: Conducted differential gene expression and pathway enrichment analyses.
- Teamwork: Collaborated with cross-functional teams of data scientists and biomarker interns.
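As a rough illustration of what differential gene expression analysis involves, the sketch below runs a per-gene Welch t-test with Benjamini-Hochberg correction in base R. This is a simplified teaching example, not the method used in this repository's scripts: the data are simulated log2 expression values, the `control`/`tumor` groups and gene names are hypothetical, and the cutoffs (adjusted p < 0.05, absolute log2 fold change > 1) are common conventions assumed here.

```r
set.seed(1)  # make the simulated data reproducible

# Simulated log2 expression: 100 hypothetical genes x 20 samples.
n_genes <- 100
n_per_group <- 10
expr <- matrix(rnorm(n_genes * 2 * n_per_group, mean = 6, sd = 1), nrow = n_genes)
rownames(expr) <- paste0("gene_", seq_len(n_genes))
group <- factor(rep(c("control", "tumor"), each = n_per_group))

# Spike in a true signal: the first 5 genes are shifted up in tumor samples.
expr[1:5, group == "tumor"] <- expr[1:5, group == "tumor"] + 4

# Values are already on the log2 scale, so the difference of group means
# is the log2 fold change.
log2fc <- rowMeans(expr[, group == "tumor"]) - rowMeans(expr[, group == "control"])

# Welch t-test for each gene (row), comparing tumor against control.
p_value <- apply(expr, 1, function(x) {
  t.test(x[group == "tumor"], x[group == "control"])$p.value
})

# Benjamini-Hochberg adjustment to control the false discovery rate.
padj <- p.adjust(p_value, method = "BH")

results <- data.frame(gene = rownames(expr), log2fc = log2fc,
                      p_value = p_value, padj = padj)

# Keep genes passing both the significance and the effect-size cutoff.
significant <- results[results$padj < 0.05 & abs(results$log2fc) > 1, ]
print(significant[order(significant$padj), ])
```

Real RNA-seq count data are usually analyzed with dedicated packages that model count distributions, so this sketch only conveys the overall shape of the analysis.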
You can explore various stages of the internship within this repository, including:
- R scripts for preprocessing, ML models, and analysis.
- Research reports, generated plots, the R packages and tools used, and the video presentation.