Deep offline reinforcement algorithms implemented in PyTorch

maxzw/offline_rl

Offline deep reinforcement learning

This repository contains implementations of various offline value-based deep reinforcement learning algorithms. The algorithms are implemented in PyTorch and are based on the following papers:

Installation

Create an environment with Python 3.10, then install Poetry and the package by running the following from the root directory of the repository:

pip install poetry
poetry install

Results

Each method uses a different technique to counter the overestimation bias of Q-learning. The algorithms are tested on CartPole-v1, a simple environment from Gymnasium. The results are shown below:

Random agent

Average reward: 22.0

Double Q-learning

Average reward: 151.8

[Figure: double_dqn]
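
As a rough illustration of how Double Q-learning counters overestimation, the sketch below computes a single bootstrap target: one set of estimates selects the greedy next action and a second set evaluates it. It is written in plain Python rather than PyTorch so the arithmetic is easy to follow. The function name, its arguments, and the example numbers are hypothetical and are not taken from this repository's code.

```python
def double_q_target(reward, done, gamma, q_online_next, q_target_next):
    """Double Q-learning target for one transition (illustrative sketch).

    q_online_next and q_target_next are lists of action values for the
    next state, one from each of two estimators (assumed names).
    """
    # Select the greedy next action with the online estimates...
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # ...but evaluate that action with the second (target) estimates.
    bootstrap = 0.0 if done else q_target_next[best_action]
    return reward + gamma * bootstrap


# Made-up action values for a three-action next state.
q_online_next = [1.0, 3.0, 2.0]
q_target_next = [0.5, 1.5, 4.0]

double_target = double_q_target(1.0, False, 0.9, q_online_next, q_target_next)
# Standard Q-learning would bootstrap from the max of a single estimator.
naive_target = 1.0 + 0.9 * max(q_target_next)
print(double_target, naive_target)
```

In this example the double target is lower than the naive one because the action chosen by the first estimator is not the one the second estimator happens to overrate.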

Clipped Double Q-learning

Average reward: 246.1

[Figure: clipped_dqn]
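
Clipped Double Q-learning bootstraps from the minimum of two value estimates, which makes the target deliberately pessimistic. The sketch below shows one possible discrete-action variant: it takes the elementwise minimum of two estimators and then acts greedily on the result. That ordering is an assumption made for illustration; the repository may combine the two estimators differently. All names and numbers are hypothetical.

```python
def clipped_double_q_target(reward, done, gamma, q1_next, q2_next):
    """Clipped Double Q-learning target for one transition (illustrative sketch).

    q1_next and q2_next are lists of next-state action values from two
    independently trained estimators (assumed names).
    """
    # Take the more pessimistic of the two estimates for every action.
    pessimistic = [min(a, b) for a, b in zip(q1_next, q2_next)]
    # Bootstrap from the best action under the pessimistic estimates.
    bootstrap = 0.0 if done else max(pessimistic)
    return reward + gamma * bootstrap


# Made-up action values for a three-action next state.
q1_next = [1.0, 3.0, 2.0]
q2_next = [0.5, 1.5, 4.0]

clipped_target = clipped_double_q_target(1.0, False, 0.9, q1_next, q2_next)
print(clipped_target)
```

Because the minimum is taken before the maximum, the target can never exceed what either estimator alone would produce.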

Multi Q-learning

Average reward: 195.7

[Figure: multi_dqn]

Quantile Regression Q-learning

Average reward: 266.3

[Figure: quantile_dqn]
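
Quantile Regression Q-learning learns a set of quantiles of the return distribution instead of a single expected value. The sketch below shows the quantile Huber loss commonly used for this, in plain Python for a single state-action pair. It assumes midpoint quantile fractions and a Huber threshold `kappa` of 1.0. The function names are hypothetical and the repository's PyTorch implementation may differ in its details.

```python
def huber(u, kappa=1.0):
    """Huber loss: quadratic near zero, linear beyond kappa."""
    a = abs(u)
    return 0.5 * u * u if a <= kappa else kappa * (a - 0.5 * kappa)


def quantile_huber_loss(pred_quantiles, target_samples, kappa=1.0):
    """Quantile Huber loss for one state-action pair (illustrative sketch).

    pred_quantiles: predicted quantile values, one per quantile fraction.
    target_samples: samples of the bootstrapped target distribution.
    """
    n = len(pred_quantiles)
    # Midpoint quantile fractions: (2i + 1) / (2n) for i = 0..n-1.
    taus = [(2 * i + 1) / (2 * n) for i in range(n)]
    total = 0.0
    for tau, theta in zip(taus, pred_quantiles):
        for z in target_samples:
            u = z - theta
            # Asymmetric weight: a high quantile is penalised more for
            # predicting too low, a low quantile for predicting too high.
            weight = abs(tau - (1.0 if u < 0 else 0.0))
            total += weight * huber(u, kappa)
    # Sum over predicted quantiles, mean over target samples.
    return total / len(target_samples)


# Two predicted quantiles, both below a single target sample of 1.0.
print(quantile_huber_loss([0.0, 0.0], [1.0]))
```

The asymmetric weight is what pulls each output toward its own quantile of the target distribution rather than toward the mean.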
