# Offline Reinforcement Learning with Classification

[arXiv](https://arxiv.org/abs/2406.06309)

The repository organization is inspired by the CORL and ReBRAC repositories.

## Dependencies & Docker setup

To set up a Python environment (with dev tools of your taste; in our workflow, we use conda and Python 3.8), just install all the requirements:

```bash
pip install -r requirements.txt
```
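For a reproducible environment, a minimal conda recipe could look like the following (a sketch; the environment name `clorl` is illustrative):

```bash
# Create and activate an isolated Python 3.8 environment
conda create -n clorl python=3.8 -y
conda activate clorl
# Install the pinned project dependencies
pip install -r requirements.txt
```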

However, in this setup, you must install the mujoco210 binaries by hand. This is not always straightforward, but the following recipe can help:

```bash
mkdir -p /root/.mujoco \
    && wget https://mujoco.org/download/mujoco210-linux-x86_64.tar.gz -O mujoco.tar.gz \
    && tar -xf mujoco.tar.gz -C /root/.mujoco \
    && rm mujoco.tar.gz
export LD_LIBRARY_PATH=/root/.mujoco/mujoco210/bin:${LD_LIBRARY_PATH}
```

You may also need to install additional dependencies for mujoco_py. We recommend following the official guide from mujoco_py.
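As a sketch, on Debian/Ubuntu the system packages below are commonly required (the names follow the mujoco_py troubleshooting notes; adjust for your distribution):

```bash
# System libraries mujoco_py typically needs for compilation and rendering
apt-get update && apt-get install -y \
    libosmesa6-dev libgl1-mesa-glx libglfw3 patchelf

# Sanity check: the first import compiles the mujoco_py bindings
python3 -c "import mujoco_py"
```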

### Docker

We also provide a more straightforward way with a Dockerfile that is already set up to work. All you have to do is build and run it :)

```bash
docker build -t clorl .
```

To run, mount the current directory:

```bash
docker run -it \
    --gpus=all \
    --rm \
    --volume "<PATH_TO_THE_REPO>:/workspace/" \
    --name clorl \
    clorl bash
```
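For instance, when launching from the repository root, `$(pwd)` can stand in for the placeholder (a usage sketch, not a required form):

```bash
# Mount the repository you are currently in at /workspace/ inside the container
docker run -it --gpus=all --rm \
    --volume "$(pwd):/workspace/" \
    --name clorl \
    clorl bash
```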

## How to reproduce experiments

### Training

Configs for reproducing the results of the original algorithms are stored in `configs/<algorithm_name>/<task_type>`. All available hyperparameters are listed in `src/algorithms/<algorithm_name>.py`. Implemented algorithms are: `rebrac`, `iql`, `lb-sac`.

Configs for reproducing the results of the algorithms with classification are stored in `configs/<algorithm_name>-ce/<task_type>`, `configs/<algorithm_name>-ce-ct/<task_type>`, and `configs/<algorithm_name>-ce-at/<task_type>`. The notation (the same as in the paper): `ce` denotes the replacement of MSE with cross-entropy, `ce-at` denotes cross-entropy with tuned algorithm parameters, and `ce-ct` denotes cross-entropy with a tuned classification parameter. All available hyperparameters are listed in `src/algorithms/<algorithm_name>_cl.py`. Implemented algorithms are: `rebrac`, `iql`, `lb-sac`.

For example, to start the ReBRAC+classification training process with the D4RL halfcheetah-medium-expert-v2 dataset, run the following:

```bash
PYTHONPATH=. python3 src/algorithms/rebrac_cl.py --config_path="configs/rebrac-ce/halfcheetah/medium_expert_v2.yaml"
```
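The original MSE-based baselines are launched the same way; a sketch, with the config path assumed from the `configs/<algorithm_name>/<task_type>` layout described above:

```bash
# Plain ReBRAC baseline on the same dataset (config path assumed; check configs/rebrac/)
PYTHONPATH=. python3 src/algorithms/rebrac.py --config_path="configs/rebrac/halfcheetah/medium_expert_v2.yaml"
```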

### Targeted Reproduction

We provide Weights & Biases logs for all of our experiments here.

If you want to replicate the results from our work, you can use the configs for Weights & Biases Sweeps provided in `configs/sweeps` (see the sketch after the table below for launching a sweep).

| Paper element | Sweeps path (we omit the common prefix `configs/sweeps/`) |
|---|---|
| Tables 1, 2, 3, 16, 17, 18 | `eval/<algorithm_name>.yaml`, `eval/<algorithm_name>-ce.yaml`, `eval/<algorithm_name>-ce-at.yaml`, `eval/<algorithm_name>-ce-ct.yaml`, `eval/<algorithm_name>-ce-mt.yaml` |
| Figure 2 | All sweeps from `expand` |
| Figure 3 | All sweeps from `network_sizes` |
| Hyperparameters tuning | All sweeps from `tuning` |
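Sweeps are launched with the standard Weights & Biases CLI. A minimal sketch (assuming you are logged in to W&B; the `eval/rebrac-ce.yaml` path instantiates the pattern from the table, and the agent target is printed by the first command):

```bash
# Register the sweep with W&B; this prints a sweep ID
wandb sweep configs/sweeps/eval/rebrac-ce.yaml
# Start an agent that pulls and runs configurations from that sweep
wandb agent <ENTITY>/<PROJECT>/<SWEEP_ID>
```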

### Reliable Reports

We also provide a script and binary data for reconstructing the graphs and tables from our paper: `plotting/plotting.py`. We repacked the results into `.pickle` files, so you can reuse them for further research and head-to-head comparisons.
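A sketch of regenerating the paper's figures (assuming the script picks up the repacked `.pickle` files shipped with the repository; check `plotting/plotting.py` for its actual inputs and flags):

```bash
# Rebuild the paper's plots and tables from the repacked results
PYTHONPATH=. python3 plotting/plotting.py
```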

## Citing

If you use this code for your research, please consider citing it with the following BibTeX entry:

```bibtex
@article{tarasov2024value,
  title={Is Value Functions Estimation with Classification Plug-and-play for Offline Reinforcement Learning?},
  author={Tarasov, Denis and Brilliantov, Kirill and Kharlapenko, Dmitrii},
  journal={arXiv preprint arXiv:2406.06309},
  year={2024}
}
```