This project tries to predict the unemployment rate in the euro zone based on macroeconomic data published by the ECB.
The main motivation for this project is to familiarize ourselves with machine learning on time series data.
Jump to Latest results
All of our data sets come from the ECB's statistics portal.
We use a variety of macroeconomic metrics such as unemployment data, GDP (at market prices), and population.
We do not expect much from "predicting" the unemployment rate using a simple Gaussian process approach based on historical unemployment rate data. After all, other than a general feeling for the variance of the unemployment rate, there is not much for the model to learn. We do it anyway to familiarize ourselves with Gaussian processes, and we might need it in the future to interpolate data where observations are missing.
As expected, while a Gaussian process with a Matérn kernel (
Interpolation works okay-ish if the gaps are not too wide:
See unemployment_gp_trainer.py for the code.
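For readers who want to see the idea without opening the repository, the following is a minimal from-scratch sketch of gap interpolation with a Gaussian process posterior mean. It is not the code in `unemployment_gp_trainer.py`. The Matérn smoothness (ν = 3/2), the length scale, the noise level, the helper names, and the sample values are all assumptions chosen for illustration.

```python
import math


def matern32(x1, x2, length_scale=1.0, variance=1.0):
    """Matern kernel with nu = 3/2 (assumed; the project's nu may differ)."""
    r = abs(x1 - x2) / length_scale
    return variance * (1.0 + math.sqrt(3.0) * r) * math.exp(-math.sqrt(3.0) * r)


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def gp_posterior_mean(x_train, y_train, x_query, length_scale=1.0, noise=1e-6):
    """Posterior mean of a GP with a constant prior mean (the sample mean)."""
    mean = sum(y_train) / len(y_train)
    centered = [y - mean for y in y_train]
    # Kernel matrix of the training points with a small noise term on the diagonal.
    k = [
        [matern32(a, b, length_scale) + (noise if i == j else 0.0)
         for j, b in enumerate(x_train)]
        for i, a in enumerate(x_train)
    ]
    alpha = solve(k, centered)
    return [
        mean + sum(matern32(q, a, length_scale) * w for a, w in zip(x_train, alpha))
        for q in x_query
    ]


# Made-up monthly values with a gap at months 3 and 4 (illustration only).
months = [0.0, 1.0, 2.0, 5.0, 6.0]
rates = [7.5, 7.4, 7.3, 7.0, 6.9]
filled = gp_posterior_mean(months, rates, [3.0, 4.0], length_scale=3.0)
```

The posterior mean reverts towards the prior mean as the query point moves away from the observations, which is consistent with interpolation degrading when the gaps get wide.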
(Source)
Given a time series of the unemployment rate, the LSTM is trained to predict the next unemployment rate. As a first step we train it only on recent unemployment rates; feeding more macroeconomic data into the model is left for later stages. We do not expect good predictions, but the LSTM should be able to learn some form of correlation length, similar to the Gaussian process above.
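Next-step prediction of this kind is usually set up by cutting the series into sliding windows, each paired with the value that follows it. The sketch below shows that construction in plain Python. The function name `make_windows` and the window length are hypothetical, and the repository may build its training samples differently.

```python
def make_windows(series, window):
    """Pair each run of `window` consecutive values with the value after it."""
    pairs = []
    for start in range(len(series) - window):
        inputs = series[start:start + window]  # what the model sees
        target = series[start + window]        # what it should predict
        pairs.append((inputs, target))
    return pairs


# Toy series, not real unemployment data.
pairs = make_windows([1, 2, 3, 4, 5], window=3)
```

Each pair then serves as one training sample: the window is the input sequence and the following value is the regression target.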
The following figure shows the output of the LSTM model (hidden state dimension: 32) trained on the time series up to 2022-01-01.
The model fits its training data very well. Predictions about the "future" after 2022-01-01 are not bad, but they may benefit from a favourable cut-off point between training and test data. It is safe to say that some sort of correlation length was learned and is used in the regime the model was not trained on.
(Source)
As a next step we use the same LSTM model on a higher-dimensional dataset.
Apart from the unemployment rate we include the timestamps, ECB's key interest rate, and the Euro Stoxx 50 index.
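As an illustration of how these four features could be combined into one input vector per time step, here is a small sketch. The helper names are hypothetical, and the min-max scaling is an assumption: the text does not say whether or how the repository normalises its features.

```python
def build_feature_rows(timestamps, unemployment, key_rate, stoxx50):
    """Stack the four aligned series into one 4-dimensional row per time step."""
    return [list(row) for row in zip(timestamps, unemployment, key_rate, stoxx50)]


def min_max_scale(rows):
    """Scale every column to [0, 1] so no feature dominates by magnitude (assumed step)."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Made-up values for illustration only.
rows = build_feature_rows(
    timestamps=[0, 1, 2],
    unemployment=[7.0, 7.2, 7.4],
    key_rate=[0.0, 0.5, 1.0],
    stoxx50=[4000, 4100, 4200],
)
scaled = min_max_scale(rows)
```

Without some rescaling, an index in the thousands and a rate of a few percent would contribute very unevenly to a shared loss, which is why a normalisation step of this sort is common.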
On a technical level, the LSTM model tries to predict a 4-dimensional vector at time
While the LSTM model does fit the training data quite well, its predictions about the future are questionable.
The following figures present the LSTM's predictions for the different features in both the training and test regions.
Looking more closely at the regime where the model had no training data, we see good predictions for the ECB's key interest rate and the Euro Stoxx 50 index but rather bad or exaggerated predictions for the unemployment rate.
These results are rather flaky. The LSTM predicts very different things if we start the test range a little earlier or stop the training at a different point, even though the overall loss differs only marginally. This may indicate that there is little covariance between the unemployment rate, the Euro Stoxx 50, and the key interest rate.
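One way to make this sensitivity visible is to repeat the train/test split at several cut-off dates and compare the resulting predictions. The sketch below covers only the splitting step. The function name is hypothetical, the sample values are made up, and treating the cut-off date itself as test data is an assumption.

```python
from datetime import date


def split_at_cutoff(dates, values, cutoff):
    """Samples strictly before `cutoff` are training data; the rest is test data."""
    train = [(d, v) for d, v in zip(dates, values) if d < cutoff]
    test = [(d, v) for d, v in zip(dates, values) if d >= cutoff]
    return train, test


# Made-up monthly values for illustration only.
dates = [date(2021, 11, 1), date(2021, 12, 1), date(2022, 1, 1), date(2022, 2, 1)]
values = [7.1, 7.0, 6.9, 6.8]

# Varying the cut-off by a month or two shows how much each split shifts.
for cutoff in (date(2021, 12, 1), date(2022, 1, 1)):
    train, test = split_at_cutoff(dates, values, cutoff)
    print(cutoff, len(train), len(test))
```

Retraining on each split and overlaying the test-range predictions would show whether a given result depends on one favourable cut-off.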
The repo uses black and mypy among other things. Make sure
Install dev dependencies:
pip install ".[dev]"