This project was developed as part of the course 'Project management and tools for health informatics' at Stockholm University under the Global Master's in Health Informatics Programme, a collaboration between Karolinska Institutet and Stockholm University in Sweden.
The aims of this course were to develop expertise to:
- Explain basic project management methods
- Be able to account for success factors in health informatics projects
- Understand basic methods in the field of data mining
- Explain process models for data mining projects
- Explain how language technology methods can be used for analysis of clinical text
- Explain the difference between rule-based methods and machine learning methods
To develop skills and expertise to:
- Apply basic project management methods
- Work in an international multidisciplinary project group
- Independently lead and implement a limited project in health informatics
- Document the steps in the design of a prototype for a health information project
- Apply the steps in a process model for data mining projects
- Apply methods in the field of text mining to various types of health information problems
To conduct scientific study using appropriate evaluation techniques and:
- Explain and argue for their positions regarding the implementation of a health information project
- Explain how to work with sensitive health information in a safe and ethical manner
(More information about this course can be found here)
During this project, I worked in a multidisciplinary group consisting of physicians, nurses, physiotherapists, biomedical scientists and software engineers to develop a heart disease prediction system using data from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/heart+disease). This team-based project consisted of the following steps:
- Description of the data and understanding the features in the dataset
- Data pre-processing (handling missing values, discretizing continuous values, dichotomizing the target variable, converting variables to factors and removing insignificant attributes)
- Testing classifiers (Gradient Boosting, Random Forest, Logistic Regression, Support Vector Machine and Naive Bayes)
- Selecting the optimal combination of datasets and training using splitting criteria
- Evaluation using the metrics Accuracy, Precision, Recall, F-Measure and AUC
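
The dichotomization and evaluation steps above can be sketched as follows. The project itself was implemented in R; this Python snippet is only an illustration. The function names, the sample labels, and the assumption that the raw target is coded 0 for absence and 1 to 4 for presence of heart disease are not taken from the project report. AUC is left out because it needs predicted scores rather than hard class labels.

```python
# Illustrative sketch only: the project was written in R, and all names and
# sample values below are hypothetical.


def dichotomize(target):
    """Collapse a raw target code into 0 (no disease) or 1 (disease).

    Assumes 0 means absence and any positive code means presence.
    """
    return 1 if target > 0 else 0


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, with 1 as the positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def evaluate(y_true, y_pred):
    """Compute accuracy, precision, recall and F-measure from hard labels."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Made-up raw target codes and made-up classifier predictions.
raw_targets = [0, 2, 1, 0, 3, 0, 4, 0]
y_true = [dichotomize(t) for t in raw_targets]
y_pred = [0, 1, 0, 0, 1, 1, 1, 0]

metrics = evaluate(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Computing the metrics from the four confusion counts keeps the definitions visible: precision is the share of predicted positives that are correct, recall is the share of actual positives that are found, and the F-measure is their harmonic mean.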
The outcome of this project was a fully functional heart disease prediction system developed in R. The user interface for entering the values to be tested was built with R Shiny. We were awarded a perfect A grade for this project, with the feedback that the system we presented was one of the best in our class. The report on this project can be found here for reference.