Estimating Time from Referral to Procurement using the Organ Retrieval and Collection of Health Information for Donation dataset from physionet.org. This project was built as part of the DataTalks.Club MLOps Zoomcamp 2024 course.
The goal of this project is to predict the time interval between hospital referral and organ procurement using a machine learning model. This helps healthcare professionals estimate the procurement timeline, potentially improving the efficiency and planning of organ transplants.
We selected the following features to train our model; an in-depth description is available in the project's HTML file and on physionet.org:
- Age: The age of the patient. (Numerical)
- Gender: The gender of the patient. (Categorical)
- Race: The race of the patient. (Categorical)
- HeightIn: The height of the patient in inches. (Numerical)
- WeightKg: The weight of the patient in kilograms. (Numerical)
- blood_type: A combination of the ABO Blood Type and the Rh factor (positive or negative). (Categorical)
- brain_death: A boolean indicating if brain death has occurred. (Categorical)
The target variable, time_to_procurement, is calculated as the difference between time_procured and time_referred, converted into hours. We used a Random Forest regression model for this task, chosen for its ability to handle complex, non-linear relationships and interactions among features. The model's performance was evaluated using Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE), and feature importance was visualized to understand the contribution of each feature to the predictions.
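The modeling step can be pictured with a minimal sketch along these lines (assuming referrals.csv has been downloaded as described below and contains the time_referred and time_procured timestamps; the split and Random Forest hyperparameters shown here are illustrative, not necessarily the project's exact settings):

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

# Load the referrals table downloaded from physionet.org.
df = pd.read_csv("referrals.csv", parse_dates=["time_referred", "time_procured"])

# Target: hours between hospital referral and organ procurement.
df["time_to_procurement"] = (
    df["time_procured"] - df["time_referred"]
).dt.total_seconds() / 3600

# One-hot encode the categorical features; numerical features pass through as-is.
features = pd.get_dummies(
    df[["Age", "HeightIn", "WeightKg", "brain_death", "Gender", "Race", "blood_type"]],
    columns=["Gender", "Race", "blood_type"],
    drop_first=True,
)
target = df["time_to_procurement"]

X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, random_state=42
)

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

preds = model.predict(X_test)
print("MAE: ", mean_absolute_error(y_test, preds))
print("RMSE:", np.sqrt(mean_squared_error(y_test, preds)))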
These are the steps to follow to set up credential access or cookies so the project runs properly. The project has two ways of loading data: the first uses a Cookie copied from a physionet.org download and set in the docker compose file, the second uses your PhysioNet credentials in the docker compose file. In both cases you need to sign up and generate credentials. Below I explain how to do it:
- go to physionet.org and click on "Account" in the upper-right corner of the screen.
- fill in the registration form with a valid email address.
- check your email inbox (or the junk/spam folder) for an email from [email protected]. Click on the link to activate your account.
- you will be asked to set a password. After you set it, click on the "Activate" button.
- sign in with your credentials and go to this page: [physionet.org](https://doi.org/10.13026/b1c0-3506)
- go to the bottom of the page and click on the link that says "sign the data use agreement for the project".
- click on "Agree".
- click on the link to go to the dataset, or go to physionet.org directly.
- go to the bottom of the page to the files section, where the dataset files (including referrals.csv) are listed.
The next steps depend on which method you use to download the data. There are two main methods: the first is collecting a Cookie from a download on the physionet website (for only the data used); the second is simply updating the credentials in the docker compose file in the root directory. The following steps are needed to run the pipelines and download the data successfully:
- in order to download the data and get the Cookie, press F12 (or right-click and choose "Inspect"), go to the "Network" panel, and download "referrals.csv".
- once the csv file is downloaded, right-click on the GET request that appears in the Network panel from the previous step, then click on "Copy as cURL".
- use any cURL client of your preference, such as Postman. I suggest converting the cURL command into a Python script (see the sketch after this list).
- copy the Cookie value and replace the old cookie in the docker compose environment variables.
- alternatively, open docker-compose.yml and update PHYSIONET_USERNAME and PHYSIONET_PASSWORD with your credentials. Do not include any additional characters.
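For the Cookie method, the copied cURL can be turned into a small Python script along these lines (a minimal sketch, not the project's actual loader: the referrals.csv URL and the PHYSIONET_COOKIE environment variable name are placeholders you fill in from the cURL copied in the previous steps):

import os

import requests

# Placeholders: take the URL and the Cookie header value from the cURL
# you copied in the browser's Network panel.
REFERRALS_URL = "https://physionet.org/..."  # URL of referrals.csv from the copied cURL
COOKIE = os.environ["PHYSIONET_COOKIE"]      # hypothetical env var, e.g. set in the docker compose env

response = requests.get(REFERRALS_URL, headers={"Cookie": COOKIE}, timeout=60)
response.raise_for_status()

with open("referrals.csv", "wb") as f:
    f.write(response.content)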
This project can also be deployed to the cloud using docker compose up with the docker-compose.yml file in the root directory. To run it locally you need Docker running on your machine; then run the following command:
docker-compose up --build
After a few minutes, port 4200 will be in use by the Prefect server, port 5000 by the MLflow server, ports 5232 and 8080 by PostgreSQL and Adminer respectively, and port 3000 by Grafana. Finally, port 9696 will be listening for prediction requests. An example curl call is below:
curl --location 'http://localhost:9696/predict' \
--header 'Content-Type: application/json' \
--data '{
"Age": 60,
"HeightIn": 68,
"WeightKg": 70,
"brain_death": 0,
"Gender_M": 1,
"Race_Hispanic": 0,
"Race_Other / Unknown": 0,
"Race_White / Caucasian": 1,
"blood_type_A-Negative ": 0,
"blood_type_A-Positive": 0,
"blood_type_A-Positive ": 0,
"blood_type_A1-Negative": 0,
"blood_type_A1-Negative ": 0,
"blood_type_A1-Positive": 0,
"blood_type_A1-Positive ": 0,
"blood_type_A1B-Negative": 0,
"blood_type_A1B-Negative ": 0,
"blood_type_A1B-Positive": 0,
"blood_type_A1B-Positive ": 0,
"blood_type_A2-Negative": 0,
"blood_type_A2-Negative ": 0,
"blood_type_A2-Positive": 0,
"blood_type_A2-Positive ": 0,
"blood_type_A2B-Negative": 0,
"blood_type_A2B-Positive": 1,
"blood_type_A2B-Positive ": 0,
"blood_type_AB-Negative": 0,
"blood_type_AB-Negative ": 0,
"blood_type_AB-Positive": 0,
"blood_type_AB-Positive ": 0,
"blood_type_B-Negative": 0,
"blood_type_B-Negative ": 0,
"blood_type_B-Positive": 0,
"blood_type_B-Positive ": 0,
"blood_type_O-Negative": 0,
"blood_type_O-Negative ": 0,
"blood_type_O-Positive": 0,
"blood_type_O-Positive ": 0
}'
With this, you will get the predicted time to procurement (time_to_procurement, in hours as defined above) for the given donor information.
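The same request can also be sent from Python; a small usage sketch (the payload is the one shown in the curl example above, abbreviated here, so fill in the remaining blood_type_* keys with 0 exactly as in that example):

import requests

# Same donor example as the curl call above, abbreviated.
payload = {
    "Age": 60,
    "HeightIn": 68,
    "WeightKg": 70,
    "brain_death": 0,
    "Gender_M": 1,
    "Race_Hispanic": 0,
    "Race_Other / Unknown": 0,
    "Race_White / Caucasian": 1,
    "blood_type_A2B-Positive": 1,
    # ... remaining blood_type_* columns set to 0, exactly as in the curl example
}

response = requests.post("http://localhost:9696/predict", json=payload, timeout=10)
response.raise_for_status()
print(response.json())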
In case you want to check formatting or linting, you can use the following commands:
pylint src/
black src/
isort src/
The code passes pylint with a score of 10/10, and black and isort do not suggest any changes.