Initial hypothesis: The causes of bee hive die-off differ enough between states that disease prevention tactics should be determined by region or sub-region.
Conclusion: Linear ridge regression and random forest regression models, fit with multiple combinations of independent variables as predictors, show that region and sub-region data do not have significant predictive power in determining causes of colony loss. More robust data collection on colony stressors is required for future predictive models.
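The comparison behind this conclusion can be sketched roughly as follows. This is a minimal illustration and not the code from the notebooks: it assumes scikit-learn and NumPy, uses synthetic stand-in data rather than the USDA data, and the names `compare_feature_sets`, `X_stressors`, and `X_region` are hypothetical. The idea is to score each model with and without region features and see whether the region columns improve cross-validated R^2.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score


def compare_feature_sets(X_stressors, X_region, y, cv=5):
    """Return mean cross-validated R^2 for each (model, feature set) pair."""
    feature_sets = {
        "stressors_only": X_stressors,
        "stressors_plus_region": np.hstack([X_stressors, X_region]),
    }
    # Factories so that every fit starts from a fresh, unfitted estimator.
    models = {
        "ridge": lambda: Ridge(alpha=1.0),
        "random_forest": lambda: RandomForestRegressor(
            n_estimators=100, random_state=0
        ),
    }
    results = {}
    for model_name, make_model in models.items():
        for set_name, X in feature_sets.items():
            r2 = cross_val_score(make_model(), X, y, cv=cv, scoring="r2")
            results[(model_name, set_name)] = float(r2.mean())
    return results


# Synthetic stand-in data (an assumption for illustration, NOT the USDA data):
# colony loss depends only on three stressor columns, and region is pure noise.
rng = np.random.default_rng(0)
n_rows = 300
X_stressors = rng.normal(size=(n_rows, 3))
region_ids = rng.integers(0, 4, size=n_rows)
X_region = np.eye(4)[region_ids]  # one-hot encoding of four made-up regions
y = (
    2.0 * X_stressors[:, 0]
    - 1.0 * X_stressors[:, 1]
    + 0.5 * X_stressors[:, 2]
    + rng.normal(scale=0.5, size=n_rows)
)

scores = compare_feature_sets(X_stressors, X_region, y)
for (model_name, set_name), r2 in scores.items():
    print(f"{model_name:>13} | {set_name:<21} | mean R^2 = {r2:.3f}")
```

If the scores for the two feature sets are nearly identical for a given model, the region columns add little predictive power. To try this on the project data, the synthetic arrays would be replaced with columns loaded from the processed csv files.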
USDA data collected by Cornell University
Partially cleaned USDA data on Kaggle
Initial data: /raw_data
Project proposal: Unit 7 - Capstone Data project proposal (single project).pdf
Data wrangling and cleaning: Bee Colony Capstone - data cleaning.ipynb
Exploratory data analysis: Bee Colony Capstone - Exploratory Data Analysis.ipynb
Pre-processing: Bee Colony Capstone - Preprocessing.ipynb
Initial attempts at time series analysis: Bee Colony Capstone - first pass at modeling.ipynb
Regression models: Bee Colony Capstone - regression models.ipynb
Slide deck presentation: Capstone_presentation_bee_colony_data.pdf
Final report: Final report - bee colony capstone.pdf
The remaining files are csv files of processed data saved between stages of analysis.