This project performs abnormality classification in mammography using Convolutional Neural Networks (CNNs). The dataset of interest is CBIS-DDSM. The mammogram images feature two kinds of breast abnormalities, masses and calcifications, each of which can be either benign or malignant.
The classification task consists of distinguishing between four cases:
- Benign mass
- Malignant mass
- Benign calcification
- Malignant calcification
A simpler subtask is to distinguish only masses from calcifications.
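The relationship between the 4-class task and the 2-class subtask can be sketched as a label mapping. This is a minimal illustration only: the label strings, their order, and the helper names `to_abnormality_type` and `to_pathology` are assumptions made for this example, and the notebooks may encode the classes differently.

```python
# Hypothetical label names and ordering, chosen only for illustration.
FOUR_CLASS_LABELS = (
    "benign_mass",
    "malignant_mass",
    "benign_calcification",
    "malignant_calcification",
)


def _check(label: str) -> None:
    """Reject anything that is not one of the four known labels."""
    if label not in FOUR_CLASS_LABELS:
        raise ValueError(f"unknown label: {label!r}")


def to_abnormality_type(label: str) -> str:
    """Collapse a 4-class label into the mass/calcification subtask label."""
    _check(label)
    return "mass" if label.endswith("_mass") else "calcification"


def to_pathology(label: str) -> str:
    """Collapse a 4-class label into a benign/malignant label."""
    _check(label)
    return label.split("_", 1)[0]


if __name__ == "__main__":
    for name in FOUR_CLASS_LABELS:
        print(name, "->", to_abnormality_type(name), "/", to_pathology(name))
```

Each 4-class label is the combination of one abnormality type and one pathology, which is why the 2-class tasks can be derived from it without extra annotation.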
The full detailed report is available here.
Left: example of a mass. Right: example of a calcification.
All the Jupyter notebooks used for the experiments are collected in the scripts folder.
Specifically:
- Scratch_CNN_2_class: CNN built from scratch for the 2-categories classification task.
- Scratch_CNN_4_class: CNN built from scratch for the 4-categories classification task.
- Scratch_CNN_ben_mal: CNN built from scratch for benign-malignant classification.
- VGG16_2_class: VGG16 with feature-extraction and fine-tuning for the 2-categories classification task.
- VGG16_4_class: VGG16 with feature-extraction and fine-tuning for the 4-categories classification task.
- Baseline_Dual_CNN: Dual CNN model exploiting images of nearby healthy tissue too.
- Composite_4_class: Two parallel CNN models to decompose the 4-categories classification task.
- Baseline_Siamese: Siamese CNN exploiting images of nearby healthy tissue too.
- Ensemble_2_class: Ensemble of different CNN models for the 2-categories classification task.
- Ensemble_4_class: Ensemble of different CNN models for the 4-categories classification task.
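One common way to build an ensemble like those in the Ensemble notebooks is soft voting, where the class probabilities predicted by each model are averaged before picking the most likely class. The sketch below is a generic illustration in plain Python. It is an assumption that this resembles what the notebooks do: the combination rule used there is described in the report, and the function name `soft_vote` is hypothetical.

```python
from typing import List, Sequence


def soft_vote(predictions: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Average per-model class probabilities and return the winning class per sample.

    predictions[m][i][c] is the probability that model m assigns to class c
    for sample i. All models must score the same samples and classes.
    """
    if not predictions:
        raise ValueError("at least one model is required")
    n_models = len(predictions)
    n_samples = len(predictions[0])
    labels = []
    for i in range(n_samples):
        n_classes = len(predictions[0][i])
        # Mean probability of each class across all models for sample i.
        mean = [
            sum(model[i][c] for model in predictions) / n_models
            for c in range(n_classes)
        ]
        # Index of the class with the highest averaged probability.
        labels.append(max(range(n_classes), key=mean.__getitem__))
    return labels


if __name__ == "__main__":
    # Made-up probabilities for two samples and two classes.
    model_a = [[0.9, 0.1], [0.4, 0.6]]
    model_b = [[0.6, 0.4], [0.7, 0.3]]
    print(soft_vote([model_a, model_b]))
```

With Keras models, the inner lists would typically come from each model's `predict` output on the same batch of images.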
Extra:
- LearningRate: Experiments tuning the learning rate for different optimizers.
You can download the dataset from Google Drive. All the scripts assume that the dataset zip file is located in the root of your Google Drive folder; if you store it elsewhere, change the path in the notebooks accordingly.
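As a rough sketch of how the archive could be unpacked from a configurable location, the helper below extracts a zip file into a target directory using only the standard library. The function name `extract_dataset` and the paths in the comments are assumptions for illustration. The actual archive name and the paths used in the notebooks may differ.

```python
import zipfile
from pathlib import Path


def extract_dataset(zip_path, target_dir):
    """Extract the dataset archive at zip_path into target_dir and return that directory."""
    zip_path = Path(zip_path)
    if not zip_path.is_file():
        raise FileNotFoundError(f"dataset archive not found: {zip_path}")
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target


# In Google Colab, Drive is usually mounted before reading the archive:
#   from google.colab import drive
#   drive.mount("/content/drive")
#   extract_dataset("/content/drive/MyDrive/<dataset>.zip", "/content/dataset")
# "<dataset>.zip" is a placeholder, not the real file name.
```

Keeping the archive path as a parameter means moving the file out of the Drive root only requires changing one argument.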
I developed and tested many models for the 2-class and 4-class tasks.
The best model for the 2-class task obtained a 91.37% accuracy on the test set. The best model for the 4-class task obtained a 61.01% accuracy on the test set.
Compared with the results reported in the literature [1][2][3], the models achieve accuracy in line with the state of the art.
[1] Neeraj Dhungel, Gustavo Carneiro, and Andrew P. Bradley. “Automated mass detection in mammograms using cascaded deep learning and random forests”. In: 2015 international conference on digital image computing
[2] Dina A. Ragab et al. “Breast cancer detection using deep convolutional neural networks and support vector machines”. In: PeerJ 7 (2019)
[3] Li Shen et al. “Deep learning to improve breast cancer detection on screening mammography”. In: Scientific Reports 9.1 (2019)
See the report for full details.
The project was developed using the following technologies:
- Python: scripting language
- Keras: open-source library for experimentation with deep neural networks
- Google Colab: free cloud-based Jupyter notebook environment by Google
The dataset of interest is CBIS-DDSM (Curated Breast Imaging Subset of the Digital Database for Screening Mammography), a collection of mammography images by Lee et al. It is an updated version of the original DDSM dataset in which all the images have been segmented and labeled.
Rebecca Sawyer Lee, Francisco Gimenez, Assaf Hoogi, Daniel Rubin (2016). Curated Breast Imaging Subset of DDSM [Dataset]. The Cancer Imaging Archive. DOI: 10.7937/K9/TCIA.2016.7O02S9CY
The author (Leonardo Lai) designed and performed all the experiments listed in the project.
If you want to cite this work, please use the following:
@software{leonardo_lai_2021_4700130,
author = {Leonardo Lai},
title = {leoll2/MedicalCNN: v1.0},
month = apr,
year = 2021,
publisher = {Zenodo},
version = {1.0},
doi = {10.5281/zenodo.4700130},
url = {https://doi.org/10.5281/zenodo.4700130}
}