Code for the paper "Pessimistic policy iteration with bounded uncertainty".

BUP is an uncertainty-based offline RL algorithm built on TD3.

This code is built on the official TD3_BC repository and draws on the EDAC code for the ensemble critic.
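
For readers unfamiliar with ensemble critics, the general idea can be sketched as follows. This is a minimal illustration, not the objective used in the paper or in TD3_ensembles.py: it assumes the uncertainty of a state-action pair is measured by the standard deviation of the ensemble's Q estimates and subtracted from their mean. The function name `pessimistic_q` and the weight `beta` are hypothetical.

```python
import statistics


def pessimistic_q(q_values, beta=1.0):
    """Return a pessimistic value from an ensemble of Q estimates.

    q_values: Q estimates for one state-action pair, one per critic.
    beta: hypothetical weight on the uncertainty penalty (an assumption).
    """
    mean_q = statistics.fmean(q_values)
    # Disagreement between critics serves as the uncertainty estimate.
    std_q = statistics.pstdev(q_values)
    return mean_q - beta * std_q


# Four critics that disagree: the penalized value falls below the mean.
ensemble_estimates = [10.0, 12.0, 11.0, 9.0]
print(pessimistic_q(ensemble_estimates, beta=1.0))
```

When all critics agree, the penalty vanishes and the function returns the shared estimate; larger disagreement lowers the value more, which discourages the policy from choosing actions the dataset covers poorly.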

Run main_uncertain.py to reproduce the results in the paper.

