This repository contains implementations of various offline value-based deep reinforcement learning algorithms. The algorithms are implemented in PyTorch and are based on the following papers:
Create an environment with Python 3.10, then install Poetry and the package by running the following commands from the root directory of the repository:
pip install poetry
poetry install
All methods apply different techniques to combat the overestimation bias of Q-learning. The algorithms are tested on CartPole-v1, a simple environment from Gymnasium. The results are shown below:
Average reward: 22.0
Average reward: 151.8
Average reward: 246.1
Average reward: 195.7
Average reward: 266.3
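To make the overestimation bias mentioned above concrete, here is a minimal sketch that does not use any code from this repository. It assumes a toy setting in which every action has a true value of 0 and the estimates are corrupted by zero-mean Gaussian noise. The function name `max_estimator_bias` and all of its parameters are hypothetical. The sketch compares the plain max estimator used in the Q-learning target with a double estimator, which is one well-known remedy and not necessarily the one implemented here.

```python
import random
import statistics


def max_estimator_bias(n_actions=5, noise_std=1.0, n_trials=10_000, seed=0):
    """Compare the single (max) estimator with a double estimator.

    Assumption for illustration: the true Q-value of every action is 0,
    so any non-zero average returned here is pure estimation bias.
    """
    rng = random.Random(seed)
    single, double = [], []
    for _ in range(n_trials):
        # Two independent noisy estimates of the same (all-zero) Q-values.
        q_a = [rng.gauss(0.0, noise_std) for _ in range(n_actions)]
        q_b = [rng.gauss(0.0, noise_std) for _ in range(n_actions)]

        # Single estimator: select and evaluate with the same estimates.
        single.append(max(q_a))

        # Double estimator: select the action with q_a, evaluate it with q_b.
        best_action = max(range(n_actions), key=q_a.__getitem__)
        double.append(q_b[best_action])

    return statistics.fmean(single), statistics.fmean(double)


single_bias, double_bias = max_estimator_bias()
print(f"max estimator bias:    {single_bias:.3f}")
print(f"double estimator bias: {double_bias:.3f}")
```

In this toy setting the first value comes out clearly positive even though no action is actually better than any other, while the second stays close to zero. Taking the maximum over noisy estimates selects the noise along with the value, which is why the bias grows with the number of actions and the noise level.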