Offline deep reinforcement learning

This repository contains implementations of various offline value-based deep reinforcement learning algorithms. The algorithms are implemented in PyTorch and are based on the following papers:

Installation

Create an environment with Python 3.10, then install Poetry and use it to install the package by running the following commands from the root directory of the repository:

pip install poetry
poetry install

Results

Each method applies a different technique to combat the overestimation bias of Q-learning. The algorithms are tested on CartPole-v1, a simple environment from Gymnasium. The results are shown below:

Random agent

Average reward: 22.0

Double Q-learning

Average reward: 151.8

Clipped Double Q-learning

Average reward: 246.1

Multi Q-learning

Average reward: 195.7

Quantile Regression Q-learning

Average reward: 266.3
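
To illustrate how two of the methods above counter overestimation, here is a minimal sketch of their bootstrap targets for a single transition. It is not taken from this repository's code: the function names are hypothetical, plain Python lists stand in for PyTorch tensors, and the discrete-action form of Clipped Double Q-learning used here (element-wise minimum of two estimates, then a maximum over actions) is an assumption about one common variant.

```python
from typing import Sequence


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def standard_target(reward: float, done: bool, gamma: float,
                    q_target_next: Sequence[float]) -> float:
    """Plain Q-learning: one network both selects and evaluates the action."""
    bootstrap = max(q_target_next)
    return reward + (0.0 if done else gamma * bootstrap)


def double_q_target(reward: float, done: bool, gamma: float,
                    q_online_next: Sequence[float],
                    q_target_next: Sequence[float]) -> float:
    """Double Q-learning: the online network selects the next action,
    the target network evaluates it."""
    best_action = argmax(q_online_next)
    bootstrap = q_target_next[best_action]
    return reward + (0.0 if done else gamma * bootstrap)


def clipped_double_q_target(reward: float, done: bool, gamma: float,
                            q1_next: Sequence[float],
                            q2_next: Sequence[float]) -> float:
    """Clipped Double Q-learning (assumed discrete-action variant):
    take the element-wise minimum of two estimates, then the best action."""
    pessimistic = [min(a, b) for a, b in zip(q1_next, q2_next)]
    bootstrap = max(pessimistic)
    return reward + (0.0 if done else gamma * bootstrap)


if __name__ == "__main__":
    # Hypothetical next-state Q-values for a two-action task such as CartPole.
    q_a = [1.0, 2.0]  # e.g. online network / first critic
    q_b = [1.5, 0.5]  # e.g. target network / second critic
    print("standard:", standard_target(1.0, False, 0.99, q_b))
    print("double:", double_q_target(1.0, False, 0.99, q_a, q_b))
    print("clipped:", clipped_double_q_target(1.0, False, 0.99, q_a, q_b))
```

With these example values, both corrected targets come out lower than the plain Q-learning target, which is the intended effect: the bootstrap term no longer trusts a single network's most optimistic estimate.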
