My solutions to the assignments and research project of the graduate course Machine Learning for Bioinformatics.
The project is about predicting binding between proteins and drugs. In phase 1, this is done with machine learning algorithms (XGBoost).
In phase 2, the goal was to address the limitations of the well-known DeepDTA model. Drawing on the state-of-the-art GraphDTA paper, which takes advantage of graph neural networks, I modified DeepDTA by adding an LSTM to learn the protein sequence (DeepDTA does not take the sequential nature of target amino-acid structures into account) and a graph convolutional network to learn the drug structure. I also applied several interpretability methods to analyze the trained network. They showed that the learned model depends too heavily on the drugs and pays too little attention to the proteins.
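As a rough illustration of the graph-convolution idea mentioned above, the sketch below applies one propagation step, ReLU(D^-1/2 (A + I) D^-1/2 H W), to a toy three-atom chain. It is pure Python with made-up features and weights, not the code used in this project; the function name `gcn_layer` and every number in it are illustrative assumptions.

```python
import math

def gcn_layer(adj, feats, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-loops so every node keeps part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalization.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    d_in, d_out = len(feats[0]), len(weight[0])
    # Aggregate neighbor features (norm @ feats).
    agg = [[sum(norm[i][k] * feats[k][f] for k in range(n)) for f in range(d_in)]
           for i in range(n)]
    # Linear transform followed by ReLU.
    return [[max(0.0, sum(agg[i][f] * weight[f][o] for f in range(d_in)))
             for o in range(d_out)]
            for i in range(n)]

# Toy "molecule": a chain of three atoms 0-1-2 (hypothetical features/weights).
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
weight = [[1.0, -1.0], [0.5, 0.5]]
out = gcn_layer(adj, feats, weight)
```

In a real model, several such layers would typically be stacked and followed by a pooling step that turns the per-atom rows into one vector per drug.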
My literature review on the topic consisted of:
- DeepDTA: Deep Drug-Target Binding Affinity Prediction (arxiv)
- GraphDTA: Predicting drug–target binding affinity with graph neural networks (biorxiv)
- DeepGS: Deep Representation Learning of Graphs and Sequences for Drug-Target Binding Affinity Prediction (arxiv)
- Saliency Maps (DNN interpretation): Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps (arxiv)
- Guided Backpropagation (DNN interpretation): Striving for Simplicity: The All Convolutional Net (arxiv)
- LRP (DNN interpretation): On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation (arxiv)
- Microsoft Research Tutorials on Graph Neural Networks. (link), (link)
- PyTorch Geometric Extension Library. (link)
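The saliency-map idea from the list above can be sketched without any deep learning library: take the absolute gradient of the prediction with respect to each input and compare how much of it falls on the drug versus the protein. The sketch below uses central finite differences instead of backpropagation, and `toy_model` is a hypothetical stand-in for a trained network that is deliberately drug-dominated; none of it is taken from this project's code.

```python
def toy_model(drug, protein):
    """Hypothetical stand-in for a trained affinity predictor."""
    # Deliberately weights drug features far more than protein features.
    return sum(2.0 * d for d in drug) + sum(0.1 * p * p for p in protein)

def saliency(f, inputs, index, eps=1e-5):
    """Absolute central-difference gradient of f w.r.t. inputs[index]."""
    grads = []
    for i in range(len(inputs[index])):
        up = [list(v) for v in inputs]
        down = [list(v) for v in inputs]
        up[index][i] += eps
        down[index][i] -= eps
        grads.append(abs(f(*up) - f(*down)) / (2 * eps))
    return grads

drug = [0.3, 0.7]            # made-up drug features
protein = [0.5, 0.2, 0.1]    # made-up protein features
drug_sal = saliency(toy_model, [drug, protein], 0)
prot_sal = saliency(toy_model, [drug, protein], 1)
# Fraction of the total saliency mass that lands on the drug input.
drug_share = sum(drug_sal) / (sum(drug_sal) + sum(prot_sal))
```

A `drug_share` close to 1 is the kind of signal that would suggest a model is relying mostly on the drug input.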
Covered topics:
- Autoencoders
- VAE (Theory & Implementation) (arxiv)
- GAN (arxiv), Wasserstein GAN, Mode Collapse and Mini Batch Discrimination (arxiv), (link)
- RNN, LSTM (Theory & Implementation)
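For the VAE item in the list above, two pieces usually carry most of the theory: the reparameterization trick and the closed-form KL term between a diagonal Gaussian and the standard normal. The following is a minimal pure-Python sketch of both; the function names and the example values of `mu` and `log_var` are assumptions made for illustration.

```python
import math
import random

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), per dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) )."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))

rng = random.Random(0)                      # fixed seed for repeatability
mu = [0.5, -1.0]                            # made-up encoder means
log_var = [0.0, math.log(0.25)]             # made-up encoder log-variances
z = reparameterize(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)
```

The KL term is zero exactly when the encoder outputs `mu = 0` and `log_var = 0`, which is a convenient sanity check for an implementation.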
Covered topics:
- Hidden Markov Models
- Deep Learning Basics (also a more rigorous view of Batch Normalization through the paper How Does Batch Normalization Help Optimization? (arxiv), and of SGD optimization in over-parameterized networks through A Convergence Theory for Deep Learning via Over-Parameterization (arxiv))
- Universal Approximation of Neural Networks
- MLP Implementation from Scratch
- Reading and Implementation of the ResNet Paper with PyTorch (arxiv). Also a more rigorous view of ResNet through Visualizing the Loss Landscape of Neural Nets (arxiv) and Deep Residual Networks, Deep Learning Gets Way Deeper by Kaiming He (link).
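For the Hidden Markov Models item above, the forward algorithm is the standard way to compute the likelihood of an observation sequence. Below is a minimal sketch for a discrete HMM; the two-state model and its probabilities are invented for illustration and do not come from the course material.

```python
def forward(obs, init, trans, emit):
    """Return P(obs) under a discrete HMM via the forward algorithm."""
    n_states = len(init)
    # alpha[s] = P(observations so far, current state = s)
    alpha = [init[s] * emit[s][obs[0]] for s in range(n_states)]
    for o in obs[1:]:
        alpha = [
            emit[s][o] * sum(alpha[r] * trans[r][s] for r in range(n_states))
            for s in range(n_states)
        ]
    return sum(alpha)

# Hypothetical two-state model with two observation symbols (0 and 1).
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]   # trans[r][s] = P(next = s | current = r)
emit = [[0.9, 0.1], [0.2, 0.8]]    # emit[s][o] = P(symbol o | state s)
likelihood = forward([0, 1, 0], init, trans, emit)
```

For long sequences the products underflow, so practical implementations usually rescale `alpha` at each step or work in log space.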
Covered topics:
- PCA (Theory & Implementation), ICA
- K-Means
- GMM (Theory & Implementation), Expectation Maximization and Variational Lower Bound
- Reading the t-SNE paper (link)
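As a small illustration of the K-Means item above, here is a sketch of Lloyd's algorithm on 2-D points in pure Python. The data, the initial centers, and the fixed iteration count are assumptions chosen to keep the example short.

```python
def kmeans(points, centers, n_iter=10):
    """Lloyd's algorithm on 2-D points, starting from the given centers."""
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            dists = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centers]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            if cl else c
            for cl, c in zip(clusters, centers)
        ]
    # clusters holds the assignment used for the final center update.
    return centers, clusters

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
centers, clusters = kmeans(points, centers=[(0.0, 0.0), (10.0, 10.0)])
```

The result depends on the initial centers, which is why implementations often run several restarts or use a seeding scheme such as k-means++.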
Covered topics:
- Ensemble Learning: Bagging and Boosting, such as Random Forest and AdaBoost (Theory & Implementation)
- Feature Selection (Bayesian Networks, Markov Blanket, and d-separation; LASSO Regularizer)
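To make the AdaBoost item above concrete, the sketch below boosts decision stumps on a tiny 1-D dataset that no single threshold can separate. It is a minimal illustration in pure Python; the data and the helper names `best_stump`, `adaboost`, and `predict` are assumptions, not the assignment's implementation.

```python
import math

def best_stump(xs, ys, w):
    """Pick the threshold and polarity with the lowest weighted error."""
    best = None
    for thr in sorted(set(xs)):
        for pol in (1, -1):
            # The stump predicts pol for x <= thr and -pol otherwise.
            err = sum(wi for wi, x, y in zip(w, xs, ys)
                      if (pol if x <= thr else -pol) != y)
            if best is None or err < best[0]:
                best = (err, thr, pol)
    return best

def adaboost(xs, ys, rounds):
    n = len(xs)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, thr, pol = best_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)   # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, thr, pol))
        # Up-weight misclassified points, down-weight correct ones.
        w = [wi * math.exp(-alpha * y * (pol if x <= thr else -pol))
             for wi, x, y in zip(w, xs, ys)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, x):
    score = sum(a * (pol if x <= thr else -pol) for a, thr, pol in model)
    return 1 if score >= 0 else -1

xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]        # not separable by a single threshold
model = adaboost(xs, ys, rounds=3)
preds = [predict(model, x) for x in xs]
```

Each stump alone misclassifies at least two points here, but the weighted vote of three stumps fits the training labels.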
Covered topics:
- Perceptron (Theory & Implementation)
- Support Vector Machine (Theory & Implementation)
- Kernel Methods
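The perceptron item above can be sketched in a few lines: whenever a point is misclassified, its label-scaled features are added to the weights. The example below is a minimal pure-Python version trained on an AND-like toy dataset; the data, the learning rate, and the epoch limit are illustrative assumptions.

```python
def train_perceptron(data, epochs=20, lr=1.0):
    """Rosenblatt perceptron; labels must be +1 or -1."""
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in data:
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:          # misclassified or on the boundary
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                mistakes += 1
        if mistakes == 0:                    # a full clean pass: converged
            break
    return w, b

# Linearly separable toy data: only (1, 1) is positive.
data = [((0.0, 0.0), -1), ((0.0, 1.0), -1), ((1.0, 0.0), -1), ((1.0, 1.0), 1)]
w, b = train_perceptron(data)
```

On data that is not linearly separable the loop never reaches a clean pass and simply stops at the epoch limit, which is one motivation for the margin-based and kernel methods listed above.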
Covered topics:
- Basics of Information Theory
- Decision Tree (Theory & Implementation)
- KNN (Theory & Implementation)
- Hypothesis Testing of Model Performance
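The information theory and decision tree items above meet in the split criterion: a tree picks the split with the highest information gain, that is, the largest drop in label entropy. Here is a minimal sketch in pure Python; the labels and the two candidate splits are made up for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, groups):
    """Entropy of labels minus the size-weighted entropy of the split groups."""
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(labels) - remainder

labels = ["bind", "bind", "no", "no"]
# A split that separates the classes perfectly versus one that does nothing.
perfect = information_gain(labels, [["bind", "bind"], ["no", "no"]])
useless = information_gain(labels, [["bind", "no"], ["bind", "no"]])
```

A decision tree implementation would evaluate `information_gain` for every candidate feature and threshold at a node and recurse on the best one.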
Covered topics:
- Review of Multivariable Calculus
- Review of Linear Algebra
- Review of Probability & Statistics