Co-first authors: Shidan Wang, QBRC & Alyssa Chen
Contact: [email protected]
Scripts for the paper "Comprehensive analysis of lung cancer pathology images to discover tumor shape and boundary features that predict survival outcome" (https://www.nature.com/articles/s41598-018-27707-4).
Wang, Shidan, Alyssa Chen, Lin Yang, Ling Cai, Yang Xie, Junya Fujimoto, Adi Gazdar, and Guanghua Xiao. "Comprehensive analysis of lung cancer pathology images to discover tumor shape and boundary features that predict survival outcome." Scientific Reports 8, no. 1 (2018): 10393.
- A scripts folder
- A README file
- Pathology images that support the findings of this study are available online from the National Lung Screening Trial (NLST) and The Cancer Genome Atlas Lung Adenocarcinoma (TCGA-LUAD) datasets.
- The model weights file (.h5) can be downloaded from https://drive.google.com/file/d/1zdrLQXypAd-XI1KQ0KOdjMqkfrxvQU7F/view?usp=sharing
Dependencies:
- Python 2
- keras==2.0.5
- tensorflow==1.2.1
- Other commonly used Python libraries
- R
- survival==2.41-3 (R package)
Aperio ImageScope is used to annotate the pathology slides (.svs files) and generate the corresponding .xml annotation files. Tumor and normal regions are outlined, and the training image patches are extracted from these regions.
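ImageScope saves each outlined region as a polygon in the .xml file. A minimal sketch of reading those polygons with the Python standard library is shown below. It assumes the usual ImageScope layout (Annotation > Regions > Region > Vertices > Vertex, with X/Y attributes on each Vertex and the region label in the Region's Text attribute). The helper name `read_regions` and the sample XML are hypothetical, and this is not necessarily how ./scripts/1_generatePatches.py parses the files.

```python
import xml.etree.ElementTree as ET


def read_regions(xml_text):
    """Return a list of (label, [(x, y), ...]) tuples, one per annotated region.

    Assumes an ImageScope-style layout (Region/Vertices/Vertex) with the
    region label stored in the Region element's "Text" attribute.
    """
    root = ET.fromstring(xml_text)
    regions = []
    for region in root.iter("Region"):
        label = region.get("Text", "").strip().lower()
        vertices = [(float(v.get("X")), float(v.get("Y")))
                    for v in region.iter("Vertex")]
        if len(vertices) >= 3:  # skip degenerate regions that cannot form a polygon
            regions.append((label, vertices))
    return regions


# Hypothetical annotation file with one tumor and one normal region.
example = """<Annotations>
  <Annotation Id="1">
    <Regions>
      <Region Id="1" Text="Tumor">
        <Vertices>
          <Vertex X="10" Y="10"/><Vertex X="200" Y="10"/><Vertex X="200" Y="150"/>
        </Vertices>
      </Region>
      <Region Id="2" Text="normal">
        <Vertices>
          <Vertex X="300" Y="300"/><Vertex X="400" Y="300"/>
          <Vertex X="400" Y="400"/><Vertex X="300" Y="400"/>
        </Vertices>
      </Region>
    </Regions>
  </Annotation>
</Annotations>"""

for label, polygon in read_regions(example):
    print(label, len(polygon))  # "tumor 3", then "normal 4"
```

Labels are lower-cased so that "Tumor" and "tumor" annotations end up in the same class.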
In total, 2475 ROI, 2139 normal, and 730 white patches were generated. More training/testing samples can easily be generated by running ./scripts/1_generatePatches.py.
[Figure: a sample ROI, normal, and white patch, respectively]
These image patches are used to train an InceptionV3 model by running ./scripts/2_modelInception.py.
[Figure: training curve]
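Keras' InceptionV3 expects inputs scaled to [-1, 1]: `keras.applications.inception_v3.preprocess_input` divides by 255, subtracts 0.5, and multiplies by 2. The sketch below reproduces that arithmetic in plain Python and adds a one-hot encoding for the three patch classes. The three-class ROI/normal/white setup and the class order are hypothetical assumptions; check ./scripts/2_modelInception.py for the actual training configuration.

```python
CLASSES = ("ROI", "normal", "white")  # hypothetical class order


def inception_scale(pixel):
    """Map an 8-bit value in [0, 255] to [-1, 1], the same arithmetic as
    Keras' inception_v3.preprocess_input (x / 255, minus 0.5, times 2)."""
    return (pixel / 255.0 - 0.5) * 2.0


def preprocess_patch(patch):
    """Scale every channel value of a nested [row][col][channel] RGB patch."""
    return [[[inception_scale(c) for c in px] for px in row] for row in patch]


def one_hot(label):
    """One-hot target vector for a patch label, following CLASSES order."""
    if label not in CLASSES:
        raise ValueError("unknown label: %r" % (label,))
    return [1 if label == name else 0 for name in CLASSES]


patch = [[[0, 128, 255], [255, 255, 255]]]  # a 1 x 2 RGB patch
print(preprocess_patch(patch))
print(one_hot("normal"))  # [0, 1, 0]
```

In a real pipeline the same scaling would be applied to the patches at both training and heatmap-prediction time, so the model always sees inputs in the range it was trained on.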
A tumor region heatmap for a pathology image can be generated using ./scripts/3_getHeatmap.py.
[Figure: tumor region heatmap]
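A heatmap of this kind can be assembled from patch-level predictions. The sketch below assumes the slide is tiled into a regular grid of patches, each scored with a tumor probability by the trained model, and then thresholds the grid into a binary tumor mask. The dict-based input, the 0.5 cut-off, and the helper names are hypothetical, not taken from ./scripts/3_getHeatmap.py.

```python
def build_heatmap(predictions, n_rows, n_cols, default=0.0):
    """Arrange per-patch tumor probabilities into an n_rows x n_cols grid.

    predictions maps (row, col) patch indices to P(tumor); patches that were
    not scored (e.g. skipped as white background) keep the default value.
    """
    heatmap = [[default] * n_cols for _ in range(n_rows)]
    for (row, col), prob in predictions.items():
        heatmap[row][col] = prob
    return heatmap


def threshold_heatmap(heatmap, threshold=0.5):
    """Binary tumor mask: 1 where P(tumor) >= threshold, else 0."""
    return [[1 if prob >= threshold else 0 for prob in row] for row in heatmap]


predictions = {(0, 0): 0.9, (0, 1): 0.7, (1, 1): 0.2, (1, 2): 0.6}
heatmap = build_heatmap(predictions, n_rows=2, n_cols=3)
print(heatmap)                     # [[0.9, 0.7, 0.0], [0.0, 0.2, 0.6]]
print(threshold_heatmap(heatmap))  # [[1, 1, 0], [0, 0, 1]]
```

Each heatmap cell corresponds to one patch, so the mask resolution equals the slide size divided by the patch size.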
Slide-level tumor properties (the tumor shape and boundary features) are computed by ./scripts/4_generateSlideProps.py.
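As an illustration only (not the feature set or definitions used in ./scripts/4_generateSlideProps.py), a few simple shape and boundary descriptors can be computed from a binary tumor mask such as a thresholded heatmap: tumor area, boundary length (perimeter), and the number of separate tumor regions. The pixel-count definitions and 4-connectivity below are assumptions.

```python
from collections import deque

NEIGHBORS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # 4-connectivity


def shape_features(mask):
    """Simple shape/boundary descriptors of a binary tumor mask.

    area:      number of tumor pixels
    perimeter: number of tumor-pixel edges touching background or the image border
    n_regions: number of 4-connected tumor components
    """
    n_rows, n_cols = len(mask), len(mask[0])

    def inside(r, c):
        return 0 <= r < n_rows and 0 <= c < n_cols

    area = perimeter = n_regions = 0
    seen = set()
    for r in range(n_rows):
        for c in range(n_cols):
            if not mask[r][c]:
                continue
            area += 1
            for dr, dc in NEIGHBORS:
                rr, cc = r + dr, c + dc
                if not inside(rr, cc) or not mask[rr][cc]:
                    perimeter += 1
            if (r, c) not in seen:
                # New component: breadth-first flood fill marks all of its pixels.
                n_regions += 1
                seen.add((r, c))
                queue = deque([(r, c)])
                while queue:
                    qr, qc = queue.popleft()
                    for dr, dc in NEIGHBORS:
                        rr, cc = qr + dr, qc + dc
                        if inside(rr, cc) and mask[rr][cc] and (rr, cc) not in seen:
                            seen.add((rr, cc))
                            queue.append((rr, cc))
    return {"area": area, "perimeter": perimeter, "n_regions": n_regions}


mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 1],
]
print(shape_features(mask))  # {'area': 6, 'perimeter': 14, 'n_regions': 2}
```

Ratios of such quantities (for example perimeter relative to area) describe how irregular the tumor boundary is independently of tumor size.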
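Univariate analysis of the slide features and Cox proportional hazards modeling are done by ./scripts/5_univariateAnalysisSlides.R and ./scripts/6_coxph_model.R, respectively.
[Figure: prediction performance in the TCGA validation dataset]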
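The R scripts rely on the survival package. To make the univariate step concrete, here is a Python sketch that fits a one-covariate Cox model by Newton-Raphson on the partial likelihood and reports a Wald test. It uses the Breslow handling of tied times, whereas coxph defaults to the Efron method, so estimates can differ when times are tied. The function names and data are hypothetical, and this is not a replacement for coxph.

```python
import math


def cox_score_info(times, events, x, beta):
    """Score U(beta) and information I(beta) of the Cox partial log-likelihood
    for a single covariate (Breslow handling of ties)."""
    score = info = 0.0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored subjects contribute only through risk sets
        s0 = s1 = s2 = 0.0
        for j in range(len(times)):
            if times[j] >= times[i]:  # subject j is still at risk at time t_i
                w = math.exp(beta * x[j])
                s0 += w
                s1 += w * x[j]
                s2 += w * x[j] * x[j]
        mean = s1 / s0
        score += x[i] - mean
        info += s2 / s0 - mean * mean
    return score, info


def fit_univariate_cox(times, events, x, tol=1e-10, max_iter=50):
    """Newton-Raphson fit; returns (beta, standard error, Wald z, two-sided p)."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = cox_score_info(times, events, x, beta)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    _, info = cox_score_info(times, events, x, beta)
    se = 1.0 / math.sqrt(info)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return beta, se, z, p


times = [1, 2, 3, 4, 5, 6]    # follow-up times (hypothetical data)
events = [1, 1, 1, 1, 1, 1]   # 1 = death observed, 0 = censored
feature = [3, 1, 2, 0, 2, 1]  # one slide-level feature per patient
beta, se, z, p = fit_univariate_cox(times, events, feature)
print("beta = %.3f, se = %.3f, z = %.2f, p = %.3f" % (beta, se, z, p))
```

A positive beta means higher feature values are associated with a higher hazard (shorter survival). Running this test once per feature is the essence of a univariate screen; coxph additionally handles multiple covariates, other tie methods, and model diagnostics.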