
This Challenge aims to infer important COVID-19 public health risk factors from outdated data in South Africa


oyewunmio/1ST-PLACE-South-African-COVID-19-Vulnerability-Map-Hackathon

 
 


South African COVID-19 Vulnerability Map Hackathon

This repository describes our solution to the hackathon, which achieved 1st place on the private leaderboard with a score of 3.5123. The challenge was to identify the wards in South Africa most vulnerable to the COVID-19 pandemic using old data.

Our approach:

We quickly noticed that the dataset was small and, since it was old census data, we expected it to be messy. We started with data cleaning to remove data points that could seriously damage the model's performance, and we also looked at the target's behavior. We applied clustering early on, then tried a considerable number of feature interactions, since all but a couple of the features were percentages.

Because the validation score was not reliable at all, we tested the interactions one by one by probing the leaderboard and checking their effect. In the end we generated over a hundred features from the original ones, but we settled on a few handpicked ones. Finally, we applied PCA for dimensionality reduction.
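The feature steps described above (a cluster label, pairwise interactions between percentage columns, then PCA) can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the code we submitted: the function name `build_features`, the choice of k-means, the use of products and differences as interactions, and all parameter values are assumptions made for the example. It also skips the manual step of handpicking features via the leaderboard, which cannot be reproduced offline.

```python
from itertools import combinations

import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def build_features(X, n_clusters=5, n_components=10, seed=0):
    """Add a cluster label and pairwise interactions, then reduce with PCA.

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    X = np.asarray(X, dtype=float)

    # Cluster label from k-means on the standardized raw features.
    X_std = StandardScaler().fit_transform(X)
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    cluster_id = kmeans.fit_predict(X_std)

    # Pairwise products and differences of the (percentage) columns.
    pairs = list(combinations(range(X.shape[1]), 2))
    products = np.column_stack([X[:, i] * X[:, j] for i, j in pairs])
    diffs = np.column_stack([X[:, i] - X[:, j] for i, j in pairs])

    # Wide matrix: raw features, cluster label, and all interactions.
    wide = np.column_stack([X, cluster_id, products, diffs])

    # Standardize, then project onto the leading principal components.
    wide_std = StandardScaler().fit_transform(wide)
    n_components = min(n_components, wide_std.shape[0], wide_std.shape[1])
    pca = PCA(n_components=n_components, random_state=seed)
    return pca.fit_transform(wide_std)


# Synthetic stand-in for ward-level percentage features (not the real data).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 100, size=(200, 6))
features = build_features(X_demo)
print(features.shape)
```

In a real run the scaler, k-means and PCA objects would be fitted on the training wards only and then applied to the test wards, so that no test information leaks into the features.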

Our model was a single XGBoost model, manually tuned and tested over and over again. We tried LightGBM and CatBoost early on, but XGBoost outperformed both of them in this particular challenge. We went for a single strong model rather than an ensemble of weaker models, which paid off.

Mistakes and insights:

Never give up, even at the very end of a challenge. We held on to 1st place in the last hour before the competition ended; had we stopped earlier, we would have placed 2nd.

Do not hesitate to try ideas that seem crazy or useless, especially in a challenge like this one that allows a generous number of submissions. Some of our stupidest ideas worked and improved our score.

A tip for hackathons: always set up a quick baseline with raw features and an untuned model, record its score, and try to beat that score in every run you do.
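The baseline habit can be sketched as a small comparison loop: score the raw features once, then keep a change only if it beats that number. The example below is a minimal illustration under several assumptions: the helper name `cv_rmse`, the Ridge model, the synthetic data and the use of cross-validated RMSE are all chosen for the example. In this competition we relied on leaderboard feedback instead, because local validation was unreliable.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score


def cv_rmse(model, X, y, folds=5):
    """Mean cross-validated RMSE (lower is better). Hypothetical helper."""
    scores = cross_val_score(
        model, X, y, cv=folds, scoring="neg_root_mean_squared_error"
    )
    return float(-scores.mean())


# Synthetic data whose target depends on an interaction of two columns.
rng = np.random.default_rng(0)
X_raw = rng.uniform(0, 100, size=(300, 4))
y = (
    0.03 * X_raw[:, 0]
    + 0.001 * X_raw[:, 0] * X_raw[:, 1]
    + rng.normal(0, 0.5, size=300)
)

# Step 1: baseline with raw features and an untuned model.
baseline = cv_rmse(Ridge(alpha=1.0), X_raw, y)

# Step 2: a candidate change, here one extra interaction feature.
X_candidate = np.column_stack([X_raw, X_raw[:, 0] * X_raw[:, 1]])
candidate = cv_rmse(Ridge(alpha=1.0), X_candidate, y)

# Step 3: keep the change only if it beats the recorded baseline.
keep = candidate < baseline
print(f"baseline={baseline:.4f} candidate={candidate:.4f} keep={keep}")
```

Recording the baseline once and comparing every later run against it makes it obvious when a new idea helps and when it only adds complexity.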
