This repository contains a set of R scripts, and the data, showing how a random forest can model hourly electricity generation from renewable sources using the latest climate reanalysis from the Copernicus Climate Change Service (C3S). This work was originally meant to be part of a scientific paper, but I opted for a public repository instead.
The random forest models the hourly generation of run-of-river hydropower, onshore wind and solar power using six essential climate variables: runoff (`ro`), snow depth (`sd`), surface solar radiation downwards (`ssrd`), 2-metre temperature (`t2m`), wind speed at 10 metres (`ws10`) and wind speed at 100 metres (`ws100`).
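A minimal sketch of this kind of model, assuming the `randomForest` package and a data frame with one row per hour, one column per predictor and a generation column named `gen`. The data frame, the `gen` column and the synthetic values are hypothetical and do not reflect the repository's actual files or schema:

```r
# Sketch: random forest regression of hourly generation on the six climate variables.
# Hypothetical synthetic data; column names follow the variable abbreviations above,
# "gen" is an assumed name for the generation column.
library(randomForest)

set.seed(42)
n_hours <- 500
era5 <- data.frame(
  ro    = runif(n_hours, 0, 1e-3),  # runoff
  sd    = runif(n_hours, 0, 0.5),   # snow depth
  ssrd  = runif(n_hours, 0, 3e6),   # surface solar radiation downwards
  t2m   = runif(n_hours, 260, 305), # 2-metre temperature
  ws10  = runif(n_hours, 0, 12),    # wind speed at 10 m
  ws100 = runif(n_hours, 0, 18)     # wind speed at 100 m
)
# Synthetic "wind power" target: capped cubic response to ws100 plus noise
era5$gen <- pmin(era5$ws100^3, 3000) + rnorm(n_hours, sd = 50)

rf <- randomForest(gen ~ ro + sd + ssrd + t2m + ws10 + ws100,
                   data = era5, ntree = 200)
print(rf)  # prints the OOB mean of squared residuals and % variance explained
```

`ntree = 200` is chosen only to keep the example fast; it is not the repository's setting.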
This repository contains all the code and data needed to run the analysis from scratch. The main folder contains the Markdown reports, while `src` contains all the R files and the data. The Markdown files show the commented code together with its outputs, and were generated with knitr's `spin()`.
There are six scripts in this repository:
- `01_hyperparameters_tuning`: tuning of the random forest hyperparameters
- `02_compute_errors`: computation of the modelling errors
- `03_plot_errors`: plots of the errors
- `04_analysis_importance`: ranking of the most important predictors
- `05_analysis_response`: impact of each single predictor, computed with DALEX
- `055_plot_response`: plots of the responses computed in the previous step
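The tuning script is not reproduced here. As a hedged illustration of what tuning a regression forest can look like, the sketch below assumes a plain grid search over `mtry` and `nodesize`, scored by the out-of-bag (OOB) mean squared error that `randomForest` stores in `fit$mse`. The grid values and the synthetic data are arbitrary assumptions, not the settings used in `01_hyperparameters_tuning`:

```r
# Sketch: grid search over mtry and nodesize scored by OOB MSE (assumed approach).
library(randomForest)

set.seed(1)
n_hours <- 400
# Hypothetical synthetic predictors and a "solar-like" target
x <- data.frame(
  ro = runif(n_hours), sd = runif(n_hours), ssrd = runif(n_hours),
  t2m = runif(n_hours), ws10 = runif(n_hours), ws100 = runif(n_hours)
)
y <- 10 * x$ssrd + 2 * x$t2m + rnorm(n_hours, sd = 0.5)

# mtry = number of predictors tried at each split; nodesize = minimum terminal node size
grid <- expand.grid(mtry = 1:6, nodesize = c(5, 20))
grid$oob_mse <- NA_real_
for (i in seq_len(nrow(grid))) {
  fit <- randomForest(x, y, ntree = 200,
                      mtry = grid$mtry[i], nodesize = grid$nodesize[i])
  grid$oob_mse[i] <- tail(fit$mse, 1)  # OOB MSE after the last tree
}
best <- grid[which.min(grid$oob_mse), ]
print(best)
```

The OOB error avoids a separate validation split; with hourly series, where consecutive hours are strongly correlated, a time-blocked split is arguably a stricter check.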
Despite its simplicity, this random forest-based approach in many cases yields a normalised mean absolute error between 10% and 20% for hourly generation. For example, consider the error obtained with predictors aggregated at country level (NUTS 0); the results improve with smaller aggregations.
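The README does not state what the mean absolute error is normalised by. A minimal sketch, assuming normalisation by the mean observed generation, i.e. NMAE = mean(|y - y_hat|) / mean(y); the `norm` argument (a hypothetical helper parameter) lets you use another reference, such as installed capacity:

```r
# Normalised mean absolute error.
# Assumption: default normalisation by the mean observed generation.
nmae <- function(observed, predicted, norm = mean(observed, na.rm = TRUE)) {
  stopifnot(length(observed) == length(predicted))
  mean(abs(observed - predicted), na.rm = TRUE) / norm
}

obs  <- c(100, 200, 300)
pred <- c(110, 190, 330)
nmae(obs, pred)               # [1] 0.08333333
nmae(obs, pred, norm = 1000)  # [1] 0.01666667 (e.g. a hypothetical capacity of 1000)
```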
Moreover, this data-driven approach can be used to analyse the relationship between meteorological predictors and generation, for example the impact of wind speed on European-wide wind power.
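The repository computes these response curves with DALEX. The sketch below does not use DALEX: it is a hand-rolled partial dependence profile, one kind of response curve DALEX can produce. It fixes one predictor at each value of a grid for every row, predicts, and averages. The function name `partial_dependence` is hypothetical, and the usage fits an `lm` on synthetic data only to stay dependency-free; because the function relies only on `predict(model, newdata = ...)`, a fitted `randomForest` can be passed the same way:

```r
# Hand-rolled partial dependence (hypothetical helper, not the DALEX API).
partial_dependence <- function(model, data, variable, grid_size = 20) {
  grid <- seq(min(data[[variable]]), max(data[[variable]]), length.out = grid_size)
  avg_pred <- vapply(grid, function(v) {
    data_mod <- data
    data_mod[[variable]] <- v  # set the predictor to v in every row
    mean(predict(model, newdata = data_mod))
  }, numeric(1))
  data.frame(value = grid, yhat = avg_pred)
}

# Usage on synthetic data (hypothetical columns); lm stands in for the forest
set.seed(7)
df <- data.frame(ws100 = runif(300, 0, 18), t2m = runif(300, 260, 305))
df$gen <- 50 * df$ws100 + rnorm(300, sd = 10)
fit <- lm(gen ~ ws100 + t2m, data = df)
pd <- partial_dependence(fit, df, "ws100")
head(pd)
```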
It can also be used to rank the importance of the predictors for each type of generation.
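One model-agnostic way to rank predictors is permutation importance: shuffle one predictor, predict again, and measure how much the error grows. This is an assumed method, not necessarily what `04_analysis_importance` does; the helper name and the synthetic data are hypothetical. For a forest fitted with `randomForest(..., importance = TRUE)`, `randomForest::importance()` reports a comparable permutation-based measure (`%IncMSE`) computed on out-of-bag samples:

```r
# Permutation importance (hypothetical helper): increase in MAE after shuffling a predictor.
permutation_importance <- function(model, data, target, predictors, n_rep = 5) {
  base_mae <- mean(abs(data[[target]] - predict(model, newdata = data)))
  sapply(predictors, function(p) {
    losses <- replicate(n_rep, {
      shuffled <- data
      shuffled[[p]] <- sample(shuffled[[p]])  # break the link between p and the target
      mean(abs(data[[target]] - predict(model, newdata = shuffled)))
    })
    mean(losses) - base_mae
  })
}

# Usage on synthetic "solar-like" data; lm stands in for the forest
set.seed(3)
df <- data.frame(ssrd = runif(300), t2m = runif(300), ws10 = runif(300))
df$gen <- 100 * df$ssrd + 5 * df$t2m + rnorm(300)
fit <- lm(gen ~ ssrd + t2m + ws10, data = df)
imp <- permutation_importance(fit, df, "gen", c("ssrd", "t2m", "ws10"))
sort(imp, decreasing = TRUE)
```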
This work is based on two datasets:
- ERA-NUTS: time series based on the ERA5 reanalysis for all European regions, using the NUTS 2016 classification. The original data has been split into single feather files in the folder `src/ERA5-NUTS-2015-2018`.
- Electricity generation data: run-of-river hydropower data ("Hydro Run-of-river and poundage") comes from the ENTSO-E Transparency Platform, while wind and solar data come from the Open Power System Data time series. The data has been split by country and generation type into feather files in the folder `src/ts_prod`.
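Before fitting, the hourly climate predictors and the generation series must be aligned in time. A minimal sketch, assuming both tables carry an hourly UTC timestamp column named `time`; the column names and values are hypothetical, and in practice the data frames would be loaded with `feather::read_feather()` from the folders above:

```r
# Sketch: align hypothetical ERA-NUTS predictors and generation data by timestamp.
hours <- seq(as.POSIXct("2015-01-01 00:00", tz = "UTC"), by = "hour", length.out = 6)
era  <- data.frame(time = hours, ws100 = c(5, 7, 9, 11, 8, 6))
prod <- data.frame(time = hours[-3],  # one hour missing in the generation data
                   wind_onshore = c(120, 300, 900, 500, 200))

# Inner join on the timestamp keeps only hours present in both tables
train <- merge(era, prod, by = "time")
nrow(train)  # [1] 5
```

An inner join silently drops hours with missing generation; counting rows before and after the join is a cheap check on data gaps.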
To run the code you need:
- R
- some widely used packages: the `tidyverse` meta-package, `lubridate`, `randomForest` and `feather`
- the `DALEX` package, preferably a recent version (I used 0.4.4)
This code is inspired by the work done during the C3S ECEM project (see the paper by Troccoli et al.). This repository can be cited using the DOI 10.5281/zenodo.3458548.
If you need further information, additional code or data, you can contact me or open an issue in the repository.