2020 CSHL Codeathon

Exploring Feature Selection for Genomics Expression Profile (draft version)

Motivation:

The NCI-DOE collaboration (https://github.com/ravichas/ML-TC1) showed that genomic expression profiles collected from different cancer sites/types can be classified using a deep-learning (convolutional neural network) method. The method works well for balanced datasets. However, a neural network does not tell us which features (i.e., genes) are important for the classification. A project exploring feature selection for genomics data could therefore be useful to the cancer research community.
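One common way to ask which genes a trained classifier relies on, without changing the model, is permutation importance: shuffle one gene's values across samples and measure how much accuracy drops. The minimal sketch below is not the NCI-DOE CNN. It assumes only a `predict` callable (for example, a hypothetical wrapper around a trained network) and uses a synthetic toy model in which only gene 0 determines the class.

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when each gene (column of X) is shuffled.

    `predict` is any callable mapping an (n_samples, n_genes) array to class
    labels; wrapping a trained CNN this way is an assumption, not the
    original project's code.
    """
    rng = np.random.default_rng(seed)
    baseline = np.mean(predict(X) == y)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffle one gene across samples, breaking its link to the label
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(baseline - np.mean(predict(X_perm) == y))
        importances[j] = np.mean(drops)
    return importances

# Hypothetical toy data: 200 samples, 5 genes, class decided by gene 0 only
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(int)

def toy_predict(A):
    # Stand-in "model" that looks only at gene 0
    return (A[:, 0] > 0).astype(int)

imp = permutation_importance(toy_predict, X, y)
print(int(np.argmax(imp)))  # → 0
```

Genes the model ignores get an importance of exactly zero here; with a real network, importances are noisy and correlated genes can share (and mask) each other's importance.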

Complexity of the problem and open questions:

Genomics data are high-dimensional in the number of genes/probes/features. Models built from high-dimensional omics data tend to be complex and difficult to interpret. Identifying the important features (genes) is as important as building high-accuracy models. Because genes do not act alone, pathway-based analysis could be used to …

Overview

- Data collection
  - The data source will be the Genomic Data Commons (GDC; https://portal.gdc.cancer.gov/). Several datasets will be constructed, covering cancer sites that share the same lineage as well as sites from different lineages.
  - Data collection will be based on our previous effort (details available from https://github.com/ravichas/ML-TC1/blob/master/TC1-dataprep.ipynb).
- The datasets created in the previous step will be used to construct and compare several supervised and unsupervised models (tSNE, PCA, …).
- Important features from these models will be compared with experimental findings.
- Summarize the conclusions.
- Provide a list of open questions and propose future directions.
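To compare features from an unsupervised model with experimental findings, one option is to rank genes by their absolute loading on a principal component and measure overlap with a reference gene list using the Jaccard index. The minimal sketch below computes PCA via NumPy's SVD; the reference list, gene names, and data are hypothetical placeholders, not results from this project.

```python
import numpy as np

def pca_top_genes(X, gene_names, n_top=10, component=0):
    """Rank genes by absolute loading on one principal component (PCA via SVD)."""
    Xc = X - X.mean(axis=0)  # center each gene across samples
    # Rows of Vt are the principal axes; Vt[component, j] is gene j's loading
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = np.abs(Vt[component])
    order = np.argsort(loadings)[::-1][:n_top]
    return [gene_names[i] for i in order]

def jaccard(a, b):
    """Overlap between two gene lists: |A and B| / |A or B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

# Hypothetical data: 50 samples, 20 genes; one latent factor drives G0-G2
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 1))
X = 0.1 * rng.normal(size=(50, 20))
X[:, :3] += 5.0 * latent
genes = [f"G{i}" for i in range(20)]

model_genes = pca_top_genes(X, genes, n_top=3)
reference = ["G0", "G1", "G2", "G15"]  # hypothetical "experimentally reported" genes
print(sorted(model_genes), jaccard(model_genes, reference))  # → ['G0', 'G1', 'G2'] 0.75
```

The same `jaccard` comparison applies unchanged to gene lists from supervised methods (for example the top-ranked genes from a filter or from permutation importance), which makes it a convenient common yardstick across models.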

Links

Presentation slides (Google Slides):

https://docs.google.com/presentation/d/1erTGpWVEhLZJ9xCHSKYEd81tABpZgOH7QiWieMJ9s_A/edit?usp=sharing

Summary link:

https://docs.google.com/document/d/1osFSpdmZKxnbVrgs4vkbsvhuxuZVHJZhgqkz93UupVA/edit?usp=sharing

GitHub:

https://github.com/STRIDES-Codes/Exploring-feature-selection-in-deep-learning-models-for-GDC-Cancer-site-expression-data

Team

Andrew Weisman
Anwar Khan
Aishwarya Mandava
Medina Colic
Regina Umarova
Sarangan Ravichandran (GitHub: https://github.com/ravichas)
S. E. Krumme
