Evaluate performance of covariates at predicting various mutations (#47)
* Evaluate performance of covariates on TP53

Creates an explore directory and README for this type of exploratory notebook.

See how well covariates (non-expression features) predict TP53 mutation.

Related to #8:
Overall mutation load provides some ability to predict the mutation status of
TP53.

Partially addresses #21:
Covariates are extracted from samples.tsv.

* Evaluate more covariate/mutation combinations

Evaluate covariate-only classifiers for the interesting mutations compiled in
cognoma/cancer-data#22 (comment).

Switches to an expand grid system for evaluating all possible covariate
combinations.

Plot performance of all covariates on each mutation.

Switches to `covariates.tsv` created in
cognoma/cancer-data#24 for encoded covariates.

* Export clean notebook to script

* Address review comments
dhimmel authored Sep 22, 2016
1 parent 6f5eb62 commit cbc0604
Showing 5 changed files with 1,149 additions and 0 deletions.
9 changes: 9 additions & 0 deletions explore/README.md
@@ -0,0 +1,9 @@
# A directory for exploratory machine learning analyses

This directory is home to exploratory analyses that help answer questions about how we should do machine learning. For algorithm implementations, see the [`algorithms`](../algorithms) directory. Place other types of analyses here.

Notebooks should be exported to scripts for review. For example, from the directory containing your notebooks, run:

```sh
jupyter nbconvert --to=script *.ipynb
```
3 changes: 3 additions & 0 deletions explore/confounding/README.md
@@ -0,0 +1,3 @@
This analysis looks into covariates and their potential confounding effects.

Specifically, we find that disease type, gender, and mutation burden predict _TP53_ mutation with AUROC = 84%.
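
Each row of `auroc.tsv` in this directory corresponds to one on/off combination of the five covariate groups (the `*_covariate` columns). The following is a minimal sketch of how such an expand grid could be enumerated; the `expand_grid` helper and the `COVARIATE_GROUPS` list are hypothetical names chosen for illustration and are not taken from the notebook itself.

```python
import itertools

# Covariate groups, named after the *_covariate columns in auroc.tsv.
COVARIATE_GROUPS = ["disease", "organ", "gender", "mutation", "survival"]


def expand_grid(groups):
    """Return every non-empty on/off combination of the covariate groups."""
    combos = []
    for flags in itertools.product([0, 1], repeat=len(groups)):
        if not any(flags):
            continue  # skip the all-off case: there would be no features to fit
        combos.append(dict(zip(groups, flags)))
    return combos


grid = expand_grid(COVARIATE_GROUPS)
print(len(grid))  # 2**5 - 1 = 31 combinations per mutation
```

This count is consistent with the 31 rows per gene recorded in `auroc.tsv`.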
249 changes: 249 additions & 0 deletions explore/confounding/auroc.tsv
@@ -0,0 +1,249 @@
mutation disease_covariate organ_covariate gender_covariate mutation_covariate survival_covariate symbol positive_prevalence mean_cv_auroc training_auroc testing_auroc
238 1 0 0 1 1 ALK 0.018889 0.85642 0.82764 0.84673
238 1 0 1 1 0 ALK 0.018889 0.85762 0.82936 0.84633
238 1 0 1 1 1 ALK 0.018889 0.85618 0.82939 0.84613
238 1 1 1 1 0 ALK 0.018889 0.85385 0.82907 0.84604
238 1 1 1 1 1 ALK 0.018889 0.85313 0.82821 0.84564
238 0 1 1 1 1 ALK 0.018889 0.85275 0.82823 0.84135
238 0 1 1 1 0 ALK 0.018889 0.85362 0.82817 0.84085
238 0 0 1 1 0 ALK 0.018889 0.8477 0.82684 0.83622
238 1 1 0 1 1 ALK 0.018889 0.85368 0.83335 0.83528
238 1 1 0 1 0 ALK 0.018889 0.85243 0.83322 0.83508
238 1 0 0 1 0 ALK 0.018889 0.85629 0.82957 0.83488
238 0 0 1 1 1 ALK 0.018889 0.8476 0.82754 0.83274
238 0 0 0 1 1 ALK 0.018889 0.84552 0.82696 0.82865
238 0 1 0 1 1 ALK 0.018889 0.85331 0.83747 0.81236
238 0 1 0 1 0 ALK 0.018889 0.85211 0.83829 0.80524
238 0 1 0 0 1 ALK 0.018889 0.78255 0.76903 0.70129
238 0 1 0 0 0 ALK 0.018889 0.77866 0.76335 0.69586
238 0 1 1 0 0 ALK 0.018889 0.77725 0.76281 0.69506
238 0 1 1 0 1 ALK 0.018889 0.78244 0.7626 0.69406
238 1 0 1 0 0 ALK 0.018889 0.77592 0.77116 0.69327
238 1 0 1 0 1 ALK 0.018889 0.78126 0.77554 0.68574
238 1 0 0 0 1 ALK 0.018889 0.78142 0.77687 0.68405
238 1 1 1 0 0 ALK 0.018889 0.78731 0.76632 0.68315
238 1 1 1 0 1 ALK 0.018889 0.79123 0.76632 0.68315
238 1 0 0 0 0 ALK 0.018889 0.77786 0.7709 0.68295
238 1 1 0 0 0 ALK 0.018889 0.78553 0.7541 0.6829
238 1 1 0 0 1 ALK 0.018889 0.79121 0.76059 0.67533
238 0 0 0 0 1 ALK 0.018889 0.51956 0.50443 0.60007
238 0 0 1 0 0 ALK 0.018889 0.57719 0.54399 0.565
238 0 0 1 0 1 ALK 0.018889 0.57671 0.54399 0.565
238 0 0 0 1 0 ALK 0.018889 0.84524 0.17316 0.16378
672 1 0 0 1 0 BRCA1 0.018615 0.84055 0.8314 0.86103
672 1 0 1 1 0 BRCA1 0.018615 0.83859 0.8314 0.86103
672 0 1 1 1 0 BRCA1 0.018615 0.83636 0.83073 0.86006
672 0 1 0 1 0 BRCA1 0.018615 0.83632 0.83074 0.86006
672 0 1 0 1 1 BRCA1 0.018615 0.83624 0.83074 0.86006
672 0 1 1 1 1 BRCA1 0.018615 0.83628 0.8309 0.86006
672 1 1 0 1 0 BRCA1 0.018615 0.83635 0.83116 0.86006
672 1 1 1 1 0 BRCA1 0.018615 0.83634 0.83113 0.86006
672 1 0 0 1 1 BRCA1 0.018615 0.83862 0.82585 0.85291
672 1 0 1 1 1 BRCA1 0.018615 0.83864 0.82585 0.85291
672 1 1 0 1 1 BRCA1 0.018615 0.83638 0.82333 0.8475
672 1 1 1 1 1 BRCA1 0.018615 0.8364 0.82333 0.8475
672 0 0 1 1 0 BRCA1 0.018615 0.82859 0.81965 0.83918
672 0 0 0 1 0 BRCA1 0.018615 0.82738 0.81799 0.83544
672 0 0 0 1 1 BRCA1 0.018615 0.82738 0.81799 0.83544
672 0 0 1 1 1 BRCA1 0.018615 0.82738 0.81799 0.83544
672 1 0 0 0 1 BRCA1 0.018615 0.7033 0.75681 0.71179
672 0 1 0 0 1 BRCA1 0.018615 0.71455 0.75257 0.7113
672 1 0 1 0 0 BRCA1 0.018615 0.72292 0.75269 0.70943
672 0 1 1 0 1 BRCA1 0.018615 0.71844 0.75269 0.70763
672 1 0 1 0 1 BRCA1 0.018615 0.71953 0.75806 0.707
672 1 0 0 0 0 BRCA1 0.018615 0.72229 0.75222 0.7052
672 1 1 1 0 0 BRCA1 0.018615 0.72895 0.75386 0.7025
672 1 1 0 0 0 BRCA1 0.018615 0.72504 0.75177 0.7016
672 0 1 1 0 0 BRCA1 0.018615 0.72877 0.75015 0.69945
672 0 1 0 0 0 BRCA1 0.018615 0.72111 0.7484 0.69861
672 1 1 0 0 1 BRCA1 0.018615 0.72 0.71965 0.66907
672 1 1 1 0 1 BRCA1 0.018615 0.72568 0.71965 0.66907
672 0 0 1 0 1 BRCA1 0.018615 0.54317 0.53337 0.53564
672 0 0 0 0 1 BRCA1 0.018615 0.53768 0.5 0.5
672 0 0 1 0 0 BRCA1 0.018615 0.52635 0.50191 0.45798
675 0 0 0 1 0 BRCA2 0.032439 0.82343 0.79603 0.88723
675 0 0 0 1 1 BRCA2 0.032439 0.8235 0.79603 0.88723
675 0 0 1 1 0 BRCA2 0.032439 0.82343 0.79603 0.88723
675 0 0 1 1 1 BRCA2 0.032439 0.82343 0.79603 0.88723
675 0 1 0 1 0 BRCA2 0.032439 0.82343 0.79603 0.88723
675 0 1 0 1 1 BRCA2 0.032439 0.82343 0.79603 0.88723
675 0 1 1 1 0 BRCA2 0.032439 0.82343 0.79603 0.88723
675 0 1 1 1 1 BRCA2 0.032439 0.82343 0.79603 0.88723
675 1 0 0 1 0 BRCA2 0.032439 0.82343 0.79603 0.88723
675 1 0 0 1 1 BRCA2 0.032439 0.82343 0.79603 0.88723
675 1 0 1 1 0 BRCA2 0.032439 0.82343 0.79603 0.88723
675 1 0 1 1 1 BRCA2 0.032439 0.82343 0.79603 0.88723
675 1 1 0 1 0 BRCA2 0.032439 0.82343 0.79603 0.88723
675 1 1 0 1 1 BRCA2 0.032439 0.82343 0.79603 0.88723
675 1 1 1 1 0 BRCA2 0.032439 0.82343 0.79603 0.88723
675 1 1 1 1 1 BRCA2 0.032439 0.82343 0.79603 0.88723
675 0 1 1 0 1 BRCA2 0.032439 0.69087 0.71246 0.80843
675 1 1 1 0 1 BRCA2 0.032439 0.69176 0.71708 0.80458
675 0 1 1 0 0 BRCA2 0.032439 0.69254 0.71312 0.80346
675 0 1 0 0 1 BRCA2 0.032439 0.69183 0.71193 0.80284
675 1 1 1 0 0 BRCA2 0.032439 0.69065 0.71562 0.80281
675 1 1 0 0 1 BRCA2 0.032439 0.69176 0.71874 0.7994
675 0 1 0 0 0 BRCA2 0.032439 0.69146 0.71288 0.79498
675 1 0 1 0 1 BRCA2 0.032439 0.68641 0.7158 0.79288
675 1 0 0 0 1 BRCA2 0.032439 0.68913 0.71485 0.79149
675 1 0 1 0 0 BRCA2 0.032439 0.68932 0.71682 0.79034
675 1 0 0 0 0 BRCA2 0.032439 0.69165 0.71541 0.78996
675 1 1 0 0 0 BRCA2 0.032439 0.69176 0.71744 0.78448
675 0 0 1 0 1 BRCA2 0.032439 0.58062 0.5581 0.61077
675 0 0 0 0 1 BRCA2 0.032439 0.55403 0.52461 0.57545
675 0 0 1 0 0 BRCA2 0.032439 0.55441 0.54702 0.57182
29126 0 0 1 1 0 CD274 0.0026006 0.92574 0.82964 0.9219
29126 0 0 1 1 1 CD274 0.0026006 0.93361 0.78557 0.89697
29126 0 0 0 1 1 CD274 0.0026006 0.93557 0.86543 0.87713
29126 0 0 0 1 0 CD274 0.0026006 0.92561 0.86562 0.8741
29126 0 1 0 1 0 CD274 0.0026006 0.92561 0.87603 0.86708
29126 0 1 0 1 1 CD274 0.0026006 0.92561 0.87603 0.86708
29126 0 1 1 1 0 CD274 0.0026006 0.9269 0.87603 0.86708
29126 0 1 1 1 1 CD274 0.0026006 0.92561 0.87603 0.86708
29126 1 0 0 1 0 CD274 0.0026006 0.92561 0.87603 0.86708
29126 1 0 0 1 1 CD274 0.0026006 0.92561 0.87603 0.86708
29126 1 0 1 1 0 CD274 0.0026006 0.92655 0.87603 0.86708
29126 1 0 1 1 1 CD274 0.0026006 0.92561 0.87603 0.86708
29126 1 1 0 1 0 CD274 0.0026006 0.92561 0.87559 0.86598
29126 1 1 0 1 1 CD274 0.0026006 0.92561 0.87559 0.86598
29126 1 1 1 1 0 CD274 0.0026006 0.92561 0.87559 0.86598
29126 1 1 1 1 1 CD274 0.0026006 0.92561 0.87559 0.86598
29126 0 0 1 0 1 CD274 0.0026006 0.62654 0.59426 0.78388
29126 0 0 1 0 0 CD274 0.0026006 0.64024 0.56949 0.76033
29126 1 0 1 0 0 CD274 0.0026006 0.81727 0.88121 0.72824
29126 1 1 1 0 0 CD274 0.0026006 0.76515 0.82536 0.70785
29126 0 1 0 0 0 CD274 0.0026006 0.71433 0.6496 0.57287
29126 0 1 0 0 1 CD274 0.0026006 0.742 0.6496 0.57287
29126 1 0 0 0 0 CD274 0.0026006 0.71125 0.6496 0.57287
29126 1 0 0 0 1 CD274 0.0026006 0.75034 0.6496 0.57287
29126 1 1 0 0 0 CD274 0.0026006 0.71125 0.6496 0.57287
29126 1 1 0 0 1 CD274 0.0026006 0.71125 0.6496 0.57287
29126 1 1 1 0 1 CD274 0.0026006 0.79372 0.6496 0.57287
29126 0 1 1 0 0 CD274 0.0026006 0.82562 0.6745 0.5646
29126 0 1 1 0 1 CD274 0.0026006 0.79327 0.70564 0.55799
29126 0 0 0 0 1 CD274 0.0026006 0.53146 0.55083 0.54504
29126 1 0 1 0 1 CD274 0.0026006 0.79784 0.7308 0.54256
4221 0 1 0 1 0 MEN1 0.0078018 0.83516 0.83394 0.90351
4221 1 1 0 1 0 MEN1 0.0078018 0.85521 0.83546 0.84057
4221 1 0 0 1 0 MEN1 0.0078018 0.85524 0.83897 0.83564
4221 1 1 1 1 0 MEN1 0.0078018 0.85037 0.83299 0.82597
4221 0 1 1 1 0 MEN1 0.0078018 0.83827 0.81897 0.82133
4221 1 1 1 1 1 MEN1 0.0078018 0.85458 0.83504 0.81452
4221 1 1 0 1 1 MEN1 0.0078018 0.8574 0.8345 0.81433
4221 0 1 0 1 1 MEN1 0.0078018 0.84544 0.82366 0.81255
4221 0 1 1 1 1 MEN1 0.0078018 0.854 0.82502 0.81146
4221 1 0 0 1 1 MEN1 0.0078018 0.85072 0.83205 0.80979
4221 1 0 1 1 1 MEN1 0.0078018 0.85552 0.8339 0.8087
4221 1 0 1 1 0 MEN1 0.0078018 0.85487 0.833 0.80781
4221 0 0 0 1 0 MEN1 0.0078018 0.75727 0.72028 0.78562
4221 0 0 0 1 1 MEN1 0.0078018 0.76606 0.73202 0.77289
4221 0 0 1 1 0 MEN1 0.0078018 0.75744 0.72204 0.76115
4221 0 0 1 1 1 MEN1 0.0078018 0.76603 0.71694 0.7428
4221 1 0 1 0 0 MEN1 0.0078018 0.72826 0.76812 0.69307
4221 0 1 0 0 1 MEN1 0.0078018 0.78894 0.75859 0.6837
4221 0 1 1 0 0 MEN1 0.0078018 0.70709 0.71926 0.63398
4221 1 1 0 0 1 MEN1 0.0078018 0.75274 0.76115 0.62007
4221 1 1 1 0 1 MEN1 0.0078018 0.74922 0.76524 0.60902
4221 1 0 0 0 0 MEN1 0.0078018 0.72857 0.73738 0.60616
4221 1 1 1 0 0 MEN1 0.0078018 0.72861 0.73261 0.59136
4221 1 0 0 0 1 MEN1 0.0078018 0.75487 0.74982 0.57616
4221 1 1 0 0 0 MEN1 0.0078018 0.72307 0.69823 0.56719
4221 0 1 0 0 0 MEN1 0.0078018 0.71001 0.70745 0.55041
4221 0 1 1 0 1 MEN1 0.0078018 0.78162 0.737 0.5219
4221 1 0 1 0 1 MEN1 0.0078018 0.75782 0.75556 0.51944
4221 0 0 1 0 0 MEN1 0.0078018 0.59436 0.53765 0.47178
4221 0 0 0 0 1 MEN1 0.0078018 0.63918 0.60479 0.41604
4221 0 0 1 0 1 MEN1 0.0078018 0.64165 0.61631 0.41298
5979 0 1 1 1 0 RET 0.016014 0.84825 0.82611 0.88778
5979 0 1 0 1 0 RET 0.016014 0.84804 0.82584 0.88767
5979 0 1 0 1 1 RET 0.016014 0.84551 0.82589 0.88767
5979 0 1 1 1 1 RET 0.016014 0.84586 0.82639 0.88767
5979 0 0 0 1 1 RET 0.016014 0.80154 0.78132 0.88467
5979 0 0 1 1 0 RET 0.016014 0.79772 0.78132 0.88467
5979 0 0 1 1 1 RET 0.016014 0.80119 0.78036 0.88097
5979 1 1 1 1 0 RET 0.016014 0.84965 0.83005 0.87626
5979 1 1 1 1 1 RET 0.016014 0.84671 0.83035 0.87594
5979 1 0 1 1 0 RET 0.016014 0.84605 0.82659 0.8709
5979 1 0 0 1 0 RET 0.016014 0.84601 0.82627 0.87079
5979 1 0 0 1 1 RET 0.016014 0.84289 0.82633 0.87079
5979 1 0 1 1 1 RET 0.016014 0.8433 0.82669 0.87079
5979 1 1 0 1 0 RET 0.016014 0.8476 0.82988 0.87037
5979 1 1 0 1 1 RET 0.016014 0.84665 0.82979 0.87037
5979 0 1 0 0 1 RET 0.016014 0.72391 0.74649 0.78996
5979 0 1 0 0 0 RET 0.016014 0.73719 0.75251 0.78278
5979 0 1 1 0 1 RET 0.016014 0.72498 0.75247 0.78027
5979 0 1 1 0 0 RET 0.016014 0.73027 0.75576 0.77834
5979 1 0 0 0 1 RET 0.016014 0.72909 0.75378 0.76387
5979 1 1 0 0 0 RET 0.016014 0.74045 0.76007 0.76253
5979 1 1 0 0 1 RET 0.016014 0.72952 0.75645 0.75777
5979 1 0 1 0 1 RET 0.016014 0.72807 0.76008 0.75198
5979 1 1 1 0 1 RET 0.016014 0.72902 0.75858 0.74963
5979 1 0 0 0 0 RET 0.016014 0.73982 0.76069 0.74936
5979 1 1 1 0 0 RET 0.016014 0.74002 0.75906 0.74325
5979 1 0 1 0 0 RET 0.016014 0.73368 0.76267 0.74229
5979 0 0 1 0 1 RET 0.016014 0.57853 0.54692 0.58501
5979 0 0 0 0 1 RET 0.016014 0.55554 0.52569 0.54398
5979 0 0 1 0 0 RET 0.016014 0.56051 0.46443 0.45082
5979 0 0 0 1 0 RET 0.016014 0.79629 0.21868 0.11533
7157 1 1 0 1 0 TP53 0.35409 0.84786 0.84534 0.85651
7157 1 1 0 1 1 TP53 0.35409 0.84806 0.84484 0.85651
7157 1 1 1 1 1 TP53 0.35409 0.84774 0.84489 0.85644
7157 1 1 1 1 0 TP53 0.35409 0.8475 0.84537 0.85627
7157 1 0 1 1 0 TP53 0.35409 0.84783 0.84529 0.85374
7157 1 0 0 1 1 TP53 0.35409 0.84739 0.84391 0.85014
7157 1 0 1 1 1 TP53 0.35409 0.84744 0.84322 0.85006
7157 1 1 1 0 1 TP53 0.35409 0.82911 0.82597 0.84982
7157 0 1 0 1 0 TP53 0.35409 0.84504 0.84027 0.84954
7157 1 0 0 1 0 TP53 0.35409 0.84783 0.84421 0.84908
7157 1 1 0 0 1 TP53 0.35409 0.82826 0.82576 0.84874
7157 0 1 0 1 1 TP53 0.35409 0.84511 0.83979 0.84788
7157 0 1 1 1 0 TP53 0.35409 0.84507 0.83896 0.84765
7157 1 0 1 0 1 TP53 0.35409 0.83044 0.82512 0.84753
7157 0 1 1 1 1 TP53 0.35409 0.84513 0.83901 0.84744
7157 1 1 1 0 0 TP53 0.35409 0.82923 0.82674 0.84737
7157 1 0 0 0 1 TP53 0.35409 0.83008 0.82507 0.84703
7157 1 0 1 0 0 TP53 0.35409 0.83034 0.82702 0.84611
7157 0 1 1 0 1 TP53 0.35409 0.8271 0.82209 0.84578
7157 0 1 0 0 1 TP53 0.35409 0.82692 0.82182 0.84554
7157 0 1 1 0 0 TP53 0.35409 0.82799 0.82325 0.84532
7157 1 0 0 0 0 TP53 0.35409 0.82831 0.82592 0.8439
7157 1 1 0 0 0 TP53 0.35409 0.82758 0.82536 0.84369
7157 0 1 0 0 0 TP53 0.35409 0.82558 0.82191 0.84312
7157 0 0 1 1 0 TP53 0.35409 0.72269 0.72046 0.73622
7157 0 0 1 1 1 TP53 0.35409 0.72909 0.72524 0.73554
7157 0 0 0 1 1 TP53 0.35409 0.72904 0.72525 0.73546
7157 0 0 0 1 0 TP53 0.35409 0.72255 0.71984 0.73498
7157 0 0 0 0 1 TP53 0.35409 0.57634 0.57419 0.58941
7157 0 0 1 0 1 TP53 0.35409 0.59265 0.58341 0.58714
7157 0 0 1 0 0 TP53 0.35409 0.53082 0.48264 0.4987
7428 1 0 0 1 0 VHL 0.018478 0.98297 0.98363 0.99392
7428 1 1 0 1 0 VHL 0.018478 0.98464 0.98335 0.99392
7428 1 1 0 1 1 VHL 0.018478 0.98284 0.98095 0.99392
7428 1 1 1 1 0 VHL 0.018478 0.98556 0.98297 0.99392
7428 1 1 1 1 1 VHL 0.018478 0.98436 0.98306 0.99388
7428 1 0 0 1 1 VHL 0.018478 0.98075 0.98375 0.99375
7428 1 1 0 0 1 VHL 0.018478 0.98036 0.98134 0.99318
7428 1 0 0 0 1 VHL 0.018478 0.974 0.98134 0.99318
7428 1 0 1 1 0 VHL 0.018478 0.98078 0.98287 0.99305
7428 1 0 1 1 1 VHL 0.018478 0.98097 0.98297 0.99301
7428 1 0 1 0 1 VHL 0.018478 0.97814 0.98014 0.99161
7428 1 1 0 0 0 VHL 0.018478 0.98076 0.97964 0.99161
7428 1 0 1 0 0 VHL 0.018478 0.97937 0.96852 0.99126
7428 1 1 1 0 0 VHL 0.018478 0.98076 0.97868 0.99126
7428 1 1 1 0 1 VHL 0.018478 0.9796 0.97852 0.99047
7428 0 1 0 1 1 VHL 0.018478 0.97458 0.97149 0.98488
7428 0 1 1 1 1 VHL 0.018478 0.97516 0.97063 0.98217
7428 0 1 0 1 0 VHL 0.018478 0.9757 0.97078 0.98212
7428 0 1 0 0 1 VHL 0.018478 0.97056 0.96755 0.98204
7428 0 1 1 0 1 VHL 0.018478 0.97085 0.96782 0.98081
7428 0 1 1 1 0 VHL 0.018478 0.97554 0.96966 0.9795
7428 0 1 0 0 0 VHL 0.018478 0.96924 0.9652 0.97552
7428 0 1 1 0 0 VHL 0.018478 0.97062 0.96434 0.97264
7428 1 0 0 0 0 VHL 0.018478 0.97722 0.91939 0.92727
7428 0 0 1 1 0 VHL 0.018478 0.6471 0.60612 0.59161
7428 0 0 0 1 0 VHL 0.018478 0.60565 0.57224 0.58099
7428 0 0 1 0 0 VHL 0.018478 0.61571 0.57833 0.55656
7428 0 0 1 1 1 VHL 0.018478 0.65996 0.61086 0.54301
7428 0 0 1 0 1 VHL 0.018478 0.63026 0.58997 0.52146
7428 0 0 0 1 1 VHL 0.018478 0.62362 0.58426 0.50175
7428 0 0 0 0 1 VHL 0.018478 0.54628 0.52655 0.43684
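
To look up a single covariate combination in a table like the one above, something along these lines may help. This is a sketch only: it embeds three TP53 rows copied from the table so that it runs on its own, and the `find_row` helper is a hypothetical name rather than part of the repository. To use the real file, replace the `io.StringIO` object with an open handle to `explore/confounding/auroc.tsv`.

```python
import csv
import io

GROUPS = ["disease", "organ", "gender", "mutation", "survival"]

# Three TP53 rows copied from auroc.tsv; the real file is tab-separated.
SAMPLE_TSV = (
    "mutation\tdisease_covariate\torgan_covariate\tgender_covariate\t"
    "mutation_covariate\tsurvival_covariate\tsymbol\tpositive_prevalence\t"
    "mean_cv_auroc\ttraining_auroc\ttesting_auroc\n"
    "7157\t1\t0\t1\t1\t0\tTP53\t0.35409\t0.84783\t0.84529\t0.85374\n"
    "7157\t1\t0\t0\t0\t0\tTP53\t0.35409\t0.82831\t0.82592\t0.8439\n"
    "7157\t0\t0\t0\t1\t0\tTP53\t0.35409\t0.72255\t0.71984\t0.73498\n"
)


def find_row(rows, **flags):
    """Return the first row whose flags match exactly (unspecified groups must be 0)."""
    for row in rows:
        if all(int(row[f"{g}_covariate"]) == flags.get(g, 0) for g in GROUPS):
            return row
    return None


rows = list(csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t"))
row = find_row(rows, disease=1, gender=1, mutation=1)
print(row["symbol"], row["testing_auroc"])  # TP53 0.85374
```

Matching on all five flags, rather than only the ones named, avoids accidentally picking a row that also includes organ or survival covariates.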