Build and deploy an end-to-end deep learning image classification application to AWS EC2 using Docker and Jenkins CI/CD
- This project aims to classify chest diseases accurately from CT scan images, which could support earlier diagnosis and treatment.
- We used a transfer learning approach: we downloaded a pre-trained VGG16 model (a CNN architecture) from Keras and fine-tuned it to fit our custom dataset.
- Fine-tuning was done by dropping the original dense layers and adding a custom dense layer, since our dataset has only two classes, whereas the ImageNet data used to pre-train VGG16 has 1000 classes.
- The fine-tuned VGG16 model was trained on a chest CT scan image dataset with two labels: normal and adenocarcinoma.
- The project structure follows a data science project template, which keeps the code modular, reusable, and maintainable. It includes modules for logging, exception handling, and utilities.
- We used DagsHub with MLflow for experiment tracking and model management, which let us track experiments, compare results, and manage models.
- We also integrated DVC (Data Version Control) to manage the data pipeline, which supports reproducibility and collaboration among team members.
- Data Ingestion: We ingested the CT scan images from Google Drive using the gdown package. Images were preprocessed to remove noise and normalize the pixel values.
- Prepare Base Model: We prepared a base CNN model from the pre-trained VGG16 model, then customized it for our dataset by dropping the original dense layers and adding a custom dense layer, since our dataset has only two classes.
- Model Trainer: We trained the custom CNN model on the prepared dataset, using a training-validation split to check how well the model generalizes.
- Model Evaluation: We evaluated the model's performance on a test dataset. We calculated metrics like accuracy, precision, recall, and F1-score.
- MLflow Integration: We integrated MLflow with the model trainer and evaluator components. This allowed us to track the experiments and manage the models effectively.
- DVC Pipeline: We integrated DVC with the data ingestion, model trainer, and evaluator components. This ensured reproducibility and collaboration among the team members.
- Deployment: We deployed the pipeline to AWS EC2 using Docker containers, AWS ECR, and Jenkins for CI/CD.
- User Application: We built the user-facing application with Flask.
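
The "Prepare Base Model" stage above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `build_finetune_model`, the frozen convolutional layers, the `Flatten` layer, the SGD optimizer, the learning rate, and the 224x224 input size are all assumptions made for the example. Only the idea of dropping VGG16's dense layers and adding a two-class dense layer comes from the description above.

```python
import tensorflow as tf


def build_finetune_model(num_classes=2, input_shape=(224, 224, 3),
                         weights="imagenet", learning_rate=0.01):
    """Return a VGG16-based classifier with a new dense head (illustrative)."""
    # include_top=False drops VGG16's original dense layers (the 1000-class head).
    base = tf.keras.applications.VGG16(
        include_top=False, weights=weights, input_shape=input_shape
    )
    # Assumption: freeze the convolutional layers so only the new head trains.
    for layer in base.layers:
        layer.trainable = False

    # New head: flatten the feature maps and add one dense softmax layer.
    x = tf.keras.layers.Flatten()(base.output)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs=base.input, outputs=outputs)
    model.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=learning_rate),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Calling `build_finetune_model()` downloads the ImageNet weights on first use; passing `weights=None` builds the same architecture with random weights, which is handy for a quick shape check.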
By the end of this project, the model classified chest diseases from CT scan images with a high level of accuracy. A classifier of this kind could help with the early diagnosis and treatment of patients with chest diseases.
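
For reference, the metrics named in the Model Evaluation stage can be computed from true and predicted labels as in the sketch below. The function name `binary_metrics` and the label encoding (1 for adenocarcinoma, 0 for normal) are assumptions for illustration; the project may well compute these with a library instead.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

Precision and recall are worth reporting alongside accuracy here, because with a medical dataset a missed positive case (low recall) and a false alarm (low precision) have very different costs.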
- Update config.yaml # to define constants
- Update params.yaml
- Update the entity
- Update the configuration manager in src config
- Update the components
- Update the pipeline
- Update the main.py
- Update the dvc.yaml
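
As a rough illustration of how the first few steps fit together, the sketch below shows an entity and a configuration manager for one stage. Every name here is hypothetical (`DataIngestionConfig`, `ConfigurationManager`, and the keys `root_dir`, `source_url`, `local_data_file`, `unzip_dir`), and a plain dictionary stands in for the parsed contents of config.yaml so the example has no external dependencies.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class DataIngestionConfig:
    """Entity: a typed, read-only view of one section of the config."""
    root_dir: Path
    source_url: str
    local_data_file: Path
    unzip_dir: Path


class ConfigurationManager:
    """Turns raw config values into entities that the components consume."""

    def __init__(self, config: dict):
        # In a real project this dict would come from parsing config.yaml.
        self.config = config

    def get_data_ingestion_config(self) -> DataIngestionConfig:
        section = self.config["data_ingestion"]
        return DataIngestionConfig(
            root_dir=Path(section["root_dir"]),
            source_url=section["source_url"],
            local_data_file=Path(section["local_data_file"]),
            unzip_dir=Path(section["unzip_dir"]),
        )
```

With this layout, a component only receives its own entity, so changing a path or URL means editing the config file rather than the component code.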
git add .
git commit -m "Updated"
git push origin main
- conda create -n chest python=3.8 -y
- conda activate chest
- pip install -r requirements.txt
- MLFLOW_TRACKING_URI=<your MLflow tracking URI>
- MLFLOW_TRACKING_USERNAME=<your MLflow tracking username>
- MLFLOW_TRACKING_PASSWORD=<your MLflow tracking password>
export MLFLOW_TRACKING_URI="<your MLflow tracking URI>"
export MLFLOW_TRACKING_USERNAME="<your MLflow tracking username>"
export MLFLOW_TRACKING_PASSWORD="<your MLflow tracking password>"
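
Before starting a training run, it can help to confirm that these three variables are actually set in the current shell. The helper below is a small sketch of such a check; the function name `missing_mlflow_vars` is hypothetical and not part of MLflow.

```python
import os

# The three variables listed above.
REQUIRED_VARS = (
    "MLFLOW_TRACKING_URI",
    "MLFLOW_TRACKING_USERNAME",
    "MLFLOW_TRACKING_PASSWORD",
)


def missing_mlflow_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

A script could call `missing_mlflow_vars()` at startup and exit with a clear message if the returned list is not empty, rather than failing later when the first experiment is logged.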
- dvc init # initializes DVC (generates the .dvc directory and the .dvcignore file)
- dvc repro # runs the stages in dvc.yaml, creates the artifacts, and writes dvc.lock
- dvc dag # displays the pipeline stages as a graph