XGBoost and GNN training and models for prediction of Hansen solubility parameters used in https://doi.org/10.1016/j.chemolab.2024.105168.
Dependencies:
- Python packages: deepchem, mordred, pandas, hyperopt, rdkit, sklearn, xgboost
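To check that the packages listed above are available before opening the notebooks, a small standard-library sketch like the following may help. The helper name missing_packages is hypothetical and not part of this repository; the names are the import names from the list above (sklearn is the import name of the scikit-learn distribution).

```python
from importlib.util import find_spec

# Import names taken from the dependency list above.
REQUIRED = ["deepchem", "mordred", "pandas", "hyperopt", "rdkit", "sklearn", "xgboost"]


def missing_packages(names=REQUIRED):
    """Return the top-level import names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```

This only checks that each package can be located; it does not import it or verify its version.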
Folders:
- data: folder with all data used in the paper
- trained_models: folder with the trained models. For now only the XGBOOST models are available on GitHub (the GNN models are not included because of their size). If you want to use the trained GNN models, download them from this link.
Files:
XGBOOST related
- XGBOOST_feature_generation.ipynb: Jupyter notebook with the code for generating descriptors (features) for the molecules, along with their initial filtering
- XGBOOST_training.ipynb: Jupyter notebook for training and testing the XGBOOST models
- XGBOOST_new_data_predictions.ipynb: Jupyter notebook for loading the XGBOOST models and applying/testing them on new data
- SHAP_XGBOOST.ipynb: Jupyter notebook for SHAP plots
GNN related
- GNN_training_D/P/H.ipynb: Jupyter notebooks for training the GNN models for the D (dispersive component), P (polar component) or H (hydrogen bond component) parameter. They are kept separate for readability, but the code is essentially the same in all three cases.
- GNN_new_data_predictions.ipynb: Jupyter notebook for loading the GNN models and applying/testing them on new data. Make sure you first download the models from the link above and put them in a convenient folder. The notebook uses trained_models/gnn, but this is not necessary; just change the model_dir argument to the appropriate path.
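Before pointing model_dir at the downloaded GNN models, it can be useful to confirm that the folder exists and is not empty. The following is a minimal sketch; check_model_dir is a hypothetical helper that is not part of the notebooks, and it makes no assumption about the files inside the folder.

```python
from pathlib import Path


def check_model_dir(model_dir):
    """Return model_dir as a Path, or raise if the folder is missing or empty."""
    path = Path(model_dir).expanduser()
    if not path.is_dir():
        raise FileNotFoundError(f"Model folder not found: {path}")
    if not any(path.iterdir()):
        raise FileNotFoundError(f"Model folder is empty: {path}")
    return path
```

For example, model_dir = check_model_dir("trained_models/gnn") would raise a clear error if the models have not been downloaded yet, instead of failing later inside the notebook.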
Other
- dataset_exploratory_analysis.ipynb: Jupyter notebook with the code and visualizations for exploratory analysis of the training and test datasets
- plots.ipynb: Jupyter notebook with plots of the results (predictions)
- sphere.py: Python file for drawing the Hansen sphere (a .py file rather than a notebook, so that you can rotate and adjust the sphere for best visibility)
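As background on what the Hansen sphere represents, the sketch below computes the standard Hansen distance Ra between two (D, P, H) points and the relative energy difference RED = Ra / R0. It is not taken from sphere.py; the function names and the numbers are illustrative assumptions only.

```python
import math


def hansen_distance(a, b):
    """Hansen distance Ra between two (D, P, H) triples, in the units of the inputs."""
    d_d, d_p, d_h = (x - y for x, y in zip(a, b))
    # The dispersion term carries the conventional factor of 4.
    return math.sqrt(4.0 * d_d**2 + d_p**2 + d_h**2)


def red(solute, solvent, r0):
    """Relative energy difference Ra / R0; values below 1 fall inside the sphere."""
    return hansen_distance(solute, solvent) / r0


# Illustrative numbers only, not taken from the paper or its datasets.
solute = (18.0, 5.0, 7.0)   # (D, P, H) at the sphere centre
solvent = (16.0, 8.0, 7.0)  # (D, P, H) of a candidate solvent
r0 = 10.0                   # interaction radius of the solute

print(hansen_distance(solute, solvent))  # 5.0
print(red(solute, solvent, r0))          # 0.5
```

Because of the factor of 4 on the dispersion term, the region with RED below 1 is a sphere only when the D axis is plotted as twice the D value.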