This is the official code for the paper "When Do Transformers Shine in RL? Decoupling Memory from Credit Assignment" (NeurIPS 2023, oral) by Tianwei Ni, Michel Ma, Benjamin Eysenbach, and Pierre-Luc Bacon. This branch covers Sections 5.1 and 5.2 (discrete control).
Please switch to the corresponding branch for the code for Section 5.3 (PyBullet continuous control).
The code has a modular design built around three configuration files; we hope this design facilitates future research on different environments, RL algorithms, and sequence models. (A hypothetical sketch of one such config file follows the list below.)
- `config_env`: specifies the environment, with `config_env.env_name` specifying the exact (memory / credit assignment) length of the task.
  - Passive T-Maze (this work)
  - Active T-Maze (this work)
  - Passive Visual Match (based on [Hung et al., 2018])
  - Key-to-Door (based on [Raposo et al., 2021])
- `config_rl`: specifies the RL algorithm and its hyperparameters.
  - DQN (with epsilon-greedy exploration)
  - SAC-Discrete (we find that `--freeze_critic` can prevent gradient explosion; see the discussion in Appendix C.1 of the latest arXiv version of the paper, and the sketch after this list)
- `config_seq`: specifies the sequence model and its hyperparameters, including the training sequence length `config_seq.sampled_seq_len` and the number of layers `--config_seq.model.seq_model_config.n_layer`.
  - LSTM [Hochreiter and Schmidhuber, 1997]
  - Transformer (GPT-2) [Radford et al., 2019]
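As a concrete illustration of the config format: the `--config_xxx path.py` plus dotted-override flag style suggests `ml_collections`-style config files that each expose a `get_config()`. The sketch below is a guess at the shape of a file like `configs/seq_models/gpt_default.py`; the exact field names are assumptions, so check the actual files under `configs/`.

```python
# Hypothetical sketch of a sequence-model config such as configs/seq_models/gpt_default.py.
# The ml_collections format is inferred from the flag style; field names are assumptions.
from ml_collections import ConfigDict

def get_config():
    config = ConfigDict()
    config.sampled_seq_len = -1                # -1: train on full episode sequences
    config.model = ConfigDict()
    config.model.seq_model_config = ConfigDict()
    config.model.seq_model_config.n_layer = 1  # overridable on the command line, e.g.
    config.model.seq_model_config.n_head = 1   #   --config_seq.model.seq_model_config.n_layer 2
    return config
```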
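For intuition on `--freeze_critic`, here is a minimal, hypothetical PyTorch sketch of the idea, not the repo's actual implementation (all names are ours): when computing the SAC-Discrete actor loss with a shared sequence encoder, the Q-values are detached so that actor gradients cannot flow back through the critic into the shared encoder.

```python
import torch

def actor_loss_with_frozen_critic(actor_head, critic_head, seq_embedding, alpha):
    """Hypothetical sketch of the --freeze_critic idea; names are illustrative."""
    probs = actor_head(seq_embedding)          # pi(.|s): (batch, num_actions)
    log_probs = torch.log(probs + 1e-8)
    with torch.no_grad():                      # freeze the critic during the actor update,
        q_values = critic_head(seq_embedding)  # blocking gradients into the shared encoder
    # SAC-Discrete actor objective: E_{a ~ pi}[alpha * log pi(a|s) - Q(s, a)]
    return (probs * (alpha * log_probs - q_values)).sum(dim=-1).mean()
```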
We use Python 3.7+ and list the basic requirements in `requirements.txt` (install them with `pip install -r requirements.txt`).
Below are example commands to reproduce the main results shown in Figures 3 and 6. For the ablation results, please adjust the corresponding hyperparameters.
To run Passive T-Maze with a memory length of 50 with an LSTM-based agent:

```bash
python main.py \
--config_env configs/envs/tmaze_passive.py \
--config_env.env_name 50 \
--config_rl configs/rl/dqn_default.py \
--train_episodes 20000 \
--config_seq configs/seq_models/lstm_default.py \
--config_seq.sampled_seq_len -1
```
To run Passive T-Maze with a memory length of 1500 with a Transformer-based agent:

```bash
python main.py \
--config_env configs/envs/tmaze_passive.py \
--config_env.env_name 1500 \
--config_rl configs/rl/dqn_default.py \
--train_episodes 6700 \
--config_seq configs/seq_models/gpt_default.py \
--config_seq.sampled_seq_len -1
```
To run Active T-Maze with a memory length of 20 with a Transformer-based agent:

```bash
python main.py \
--config_env configs/envs/tmaze_active.py \
--config_env.env_name 20 \
--config_rl configs/rl/dqn_default.py \
--train_episodes 40000 \
--config_seq configs/seq_models/gpt_default.py \
--config_seq.sampled_seq_len -1 \
--config_seq.model.seq_model_config.n_layer 2 \
--config_seq.model.seq_model_config.n_head 2
```
To run Passive Visual Match with a memory length of 60 with a Transformer-based agent:

```bash
python main.py \
--config_env configs/envs/visual_match.py \
--config_env.env_name 60 \
--config_rl configs/rl/sacd_default.py \
--shared_encoder --freeze_critic \
--train_episodes 40000 \
--config_seq configs/seq_models/gpt_cnn.py \
--config_seq.sampled_seq_len -1
```
To run Key-to-Door with a credit assignment length of 120 with an LSTM-based agent:

```bash
python main.py \
--config_env configs/envs/keytodoor.py \
--config_env.env_name 120 \
--config_rl configs/rl/sacd_default.py \
--shared_encoder --freeze_critic \
--train_episodes 40000 \
--config_seq configs/seq_models/lstm_cnn.py \
--config_seq.sampled_seq_len -1 \
--config_seq.model.seq_model_config.n_layer 2
```
To run Key-to-Door with a credit assignment length of 250 with a Transformer-based agent:

```bash
python main.py \
--config_env configs/envs/keytodoor.py \
--config_env.env_name 250 \
--config_rl configs/rl/sacd_default.py \
--shared_encoder --freeze_critic \
--train_episodes 30000 \
--config_seq configs/seq_models/gpt_cnn.py \
--config_seq.sampled_seq_len -1 \
--config_seq.model.seq_model_config.n_layer 2 \
--config_seq.model.seq_model_config.n_head 2
```
The `train_episodes` for each task is specified in `budget.py`.
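We have not inspected `budget.py`; purely as a hypothetical illustration of its role, it could be as simple as a mapping from task to training budget (the numbers below are taken from the example commands above):

```python
# Purely hypothetical sketch of budget.py's role -- not the actual file contents.
# It would map each (task, length) pair to its train_episodes budget.
TRAIN_EPISODES = {
    ("tmaze_passive", 50): 20_000,   # from the example commands above
    ("tmaze_passive", 1500): 6_700,
    ("tmaze_active", 20): 40_000,
}
```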
By default, the logging data is stored in the `logs/` folder in CSV format. If you use the `--debug` flag, it is stored in the `debug/` folder instead.
After the logging data is stored, you can plot the learning curves and aggregation plots (e.g., Figures 3 and 6) using the `vis.ipynb` Jupyter notebook; a minimal alternative sketch follows below.
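If you just want a quick look without the notebook, a minimal sketch for plotting one learning curve from the CSV logs might look like this (the file path and column names are assumptions; inspect your `logs/` folder for the actual headers):

```python
# Minimal sketch for plotting a learning curve from the CSV logs.
# The path and column names are assumptions -- check your logs/ folder.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("logs/<run_name>/progress.csv")  # hypothetical path
plt.plot(df["env_steps"], df["return"])           # hypothetical column names
plt.xlabel("environment steps")
plt.ylabel("episodic return")
plt.show()
```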
We also provide the logging data used in the paper, shared via Google Drive (< 400 MB).
The code is largely based on prior works:
If you have any questions, please raise an issue (preferred) or send an email to Tianwei ([email protected]).