
Authors' implementation of the "Is Value Functions Estimation with Classification Plug-and-play for Offline Reinforcement Learning?"

Offline Reinforcement Learning with Classification

The repository organisation is inspired by the CORL and ReBRAC repositories.

Dependencies & Docker setup

To set up a Python environment (with the dev tools of your taste; in our workflow, we use conda and Python 3.8), install all the requirements:

pip install -r requirements.txt

However, in this setup, you must install the mujoco210 binaries by hand. This is not always straightforward, but the following recipe can help:

mkdir -p /root/.mujoco \
    && wget https://mujoco.org/download/mujoco210-linux-x86_64.tar.gz -O mujoco.tar.gz \
    && tar -xf mujoco.tar.gz -C /root/.mujoco \
    && rm mujoco.tar.gz
export LD_LIBRARY_PATH=/root/.mujoco/mujoco210/bin:${LD_LIBRARY_PATH}

You may also need to install additional dependencies for mujoco_py. We recommend following the official guide from mujoco_py.
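On Debian/Ubuntu, the system packages below are the ones the mujoco_py guide typically calls for; exact package names are an assumption and may differ on other distributions, so treat this as a sketch rather than a complete list:

```shell
# System libraries commonly needed to build mujoco_py (Debian/Ubuntu).
# Package names may vary by distribution; consult the mujoco_py guide.
sudo apt-get update && sudo apt-get install -y \
    libosmesa6-dev \
    libgl1-mesa-glx \
    libglfw3 \
    patchelf
```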

Docker

We also provide a more straightforward option: a Dockerfile that is already set up to work. All you have to do is build and run it :)

docker build -t clorl .

To run it, mount the repository directory:

docker run -it \
    --gpus=all \
    --rm \
    --volume "<PATH_TO_THE_REPO>:/workspace/" \
    --name clorl \
    clorl bash

How to reproduce experiments

Training

Configs for reproducing the results of the original algorithms are stored in configs/<algorithm_name>/<task_type>. All available hyperparameters are listed in src/algorithms/<algorithm_name>.py. Implemented algorithms are: rebrac, iql, lb-sac.

Configs for reproducing the results of the algorithms with classification are stored in configs/<algorithm_name>-ce/<task_type>, configs/<algorithm_name>-ce-ct/<task_type>, and configs/<algorithm_name>-ce-at/<task_type>. The notation (the same as in the paper): ce denotes the replacement of MSE with cross-entropy, ce-at denotes cross-entropy with tuned algorithm parameters, and ce-ct denotes cross-entropy with tuned classification parameters. All available hyperparameters are listed in src/algorithms/<algorithm_name>_cl.py. Implemented algorithms are: rebrac, iql, lb-sac.
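The directory scheme above can be expressed as a small path helper. The function below is purely illustrative (it is not part of the repository); it just shows how the algorithm name, variant suffix, task type, and task combine into a config path:

```python
from pathlib import Path
from typing import Optional

def config_path(algorithm: str, task_type: str, task: str,
                variant: Optional[str] = None) -> Path:
    """Build a path following the configs/<algorithm>[-<variant>]/<task_type>/ layout.

    `variant` is one of None (MSE baseline), "ce", "ce-at", or "ce-ct".
    Hypothetical helper for illustration only; not part of the repo.
    """
    name = algorithm if variant is None else f"{algorithm}-{variant}"
    return Path("configs") / name / task_type / f"{task}.yaml"

print(config_path("rebrac", "halfcheetah", "medium_expert_v2", "ce").as_posix())
# configs/rebrac-ce/halfcheetah/medium_expert_v2.yaml
```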

For example, to start the ReBRAC+classification training process with the D4RL halfcheetah-medium-expert-v2 dataset, run the following:

PYTHONPATH=. python3 src/algorithms/rebrac_cl.py --config_path="configs/rebrac-ce/halfcheetah/medium_expert_v2.yaml"

Targeted Reproduction

We provide Weights & Biases logs for all of our experiments here.

If you want to replicate the results from our work, you can use the configs for Weights & Biases Sweeps provided in configs/sweeps.

| Paper element | Sweeps path (we omit the common prefix configs/sweeps/) |
| --- | --- |
| Tables 1, 2, 3, 16, 17, 18 | eval/<algorithm_name>.yaml, eval/<algorithm_name>-ce.yaml, eval/<algorithm_name>-ce-at.yaml, eval/<algorithm_name>-ce-ct.yaml, eval/<algorithm_name>-ce-mt.yaml |
| Figure 2 | All sweeps from expand |
| Figure 3 | All sweeps from network_sizes |
| Hyperparameters tuning | All sweeps from tuning |

Reliable Reports

We also provide a script and binary data for reconstructing the graphs and tables from our paper: plotting/plotting.py. We repacked the results into .pickle files, so you can re-use them for further research and head-to-head comparisons.
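Such repacked results can be read with the standard pickle module. The file name and dictionary layout below are assumptions for illustration only (the repository's actual pickle schema may differ; inspect the files shipped with plotting/plotting.py first):

```python
import pickle

# Hypothetical layout: a dict mapping algorithm name -> list of scores.
# The real .pickle files in this repo may use a different structure.
results = {"rebrac-ce": [92.1, 90.4, 93.7]}

# Round-trip through a stand-in file name to show the pattern.
with open("results.pickle", "wb") as f:
    pickle.dump(results, f)

with open("results.pickle", "rb") as f:
    loaded = pickle.load(f)

mean = sum(loaded["rebrac-ce"]) / len(loaded["rebrac-ce"])
print(f"rebrac-ce mean score: {mean:.1f}")
```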

Citing

If you use this code for your research, please cite it using the following BibTeX:

@article{tarasov2024value,
  title={Is Value Functions Estimation with Classification Plug-and-play for Offline Reinforcement Learning?},
  author={Tarasov, Denis and Brilliantov, Kirill and Kharlapenko, Dmitrii},
  journal={arXiv preprint arXiv:2406.06309},
  year={2024}
}
