2020 CSHL Codeathon

Exploring Feature Selection for Genomics Expression Profile (draft version)

Motivation:

An NCI-DOE collaboration (https://github.com/ravichas/ML-TC1) shows that genomic expression profiles collected from different cancer sites/types can be classified using a deep-learning method (a convolutional neural network). The method works well for a balanced dataset. However, the neural network does not reveal which features (i.e., genes) are important for the classification. A project that explores feature selection for genomics data could therefore be useful to the cancer research community.

Complexity of the problem and open questions:

Genomics data is high-dimensional in terms of the number of genes/probes/features. Models constructed from high-dimensional omics data will be complex and difficult to explain. Identifying important features/genes is as important as building high-accuracy models. Keeping in mind that genes do not work alone, pathway-based analysis could be used to

Overview

  • Data collection
  • Datasets created in the previous step will be used to construct/compare several supervised and unsupervised models (tSNE, PCA,
  • Important features from these models will be compared with experimental findings
  • Summarize the conclusions
  • Provide a list of open questions and propose future directions
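
As a rough illustration of the "important features" step above, the sketch below ranks genes by a one-way ANOVA F-statistic computed per gene across cancer classes. This is only one simple filter-style approach and is not necessarily the method used in this project. The function names (`anova_f`, `rank_genes`), the gene names, and the expression matrix are all hypothetical, and the data is synthetic.

```python
import random
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F-statistic for one gene (assumes at least 2 classes)."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    grand = mean(values)
    # Variation of the class means around the overall mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    # Variation of the samples around their own class mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_genes(matrix, labels, gene_names, top=5):
    """Return the `top` gene names with the largest F-statistic."""
    scores = [
        (anova_f([row[j] for row in matrix], labels), name)
        for j, name in enumerate(gene_names)
    ]
    scores.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scores[:top]]


# Synthetic stand-in for an expression matrix: 60 samples x 50 genes,
# 3 classes. Only the first 3 genes carry a class-dependent shift.
random.seed(0)
n_genes, n_informative = 50, 3
gene_names = [f"gene_{j}" for j in range(n_genes)]
labels = [i % 3 for i in range(60)]
matrix = [
    [
        random.gauss(3.0 * y if j < n_informative else 0.0, 1.0)
        for j in range(n_genes)
    ]
    for y in labels
]

top_genes = rank_genes(matrix, labels, gene_names, top=3)
print(top_genes)
```

On real data, the selected genes would then be compared with experimental findings, as the overview describes. A univariate filter like this scores each gene independently, so it cannot capture genes that are informative only in combination.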

Links

Presentation slide Google Link:

Summary link:

GitHub:

Team