
CAI23sbP/RecurrentRLHF


A custom version of the imitation library for building a RecurrentReward model. A GRU is used as the recurrent unit.

Additions

To support Dict-type observations, dict_preference.py and dict_reward_nets.py were added. Both network variants accept them (a minimal sketch follows below):

GRU reward net: Dict-type observations supported

Non-GRU reward net: Dict-type observations supported
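The exact classes in dict_preference.py and dict_reward_nets.py are not reproduced here. As a hypothetical illustration of the usual approach, each entry of the Dict observation can be flattened and concatenated into one feature vector before the reward head; all class names and space keys below are assumptions, not this repo's API:

```python
import numpy as np
import torch
import torch.nn as nn
from gymnasium import spaces

# Illustrative Dict observation space (keys are made up for this sketch).
obs_space = spaces.Dict(
    {
        "position": spaces.Box(-1.0, 1.0, shape=(2,)),
        "velocity": spaces.Box(-1.0, 1.0, shape=(2,)),
    }
)

class DictObsEncoder(nn.Module):
    """Flatten and concatenate every entry of a Dict observation."""

    def __init__(self, obs_space, out_dim=32):
        super().__init__()
        # Fix a key order so the concatenated layout is deterministic.
        self.keys = sorted(obs_space.spaces.keys())
        in_dim = sum(int(np.prod(obs_space.spaces[k].shape)) for k in self.keys)
        self.net = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU())

    def forward(self, obs):
        # obs: dict of tensors, each shaped (batch, *entry_shape).
        flat = torch.cat([obs[k].flatten(start_dim=1) for k in self.keys], dim=1)
        return self.net(flat)

encoder = DictObsEncoder(obs_space)
batch = {k: torch.zeros((4,) + obs_space[k].shape) for k in obs_space.spaces}
features = encoder(batch)  # (4, 32): ready for a GRU or MLP reward head
```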

How to Train

Training without ensembling: python3 train.py

Training with ensembling: python3 train_ensemble.py
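train.py itself is not reproduced here. For orientation, the upstream imitation 1.0.0 preference-comparisons API (which this repo customizes) is typically driven as in the sketch below; this repo presumably swaps the feed-forward BasicRewardNet for its GRU-based reward net, so treat this as the upstream recipe, not this repo's exact script:

```python
import numpy as np
from stable_baselines3 import PPO

from imitation.algorithms import preference_comparisons
from imitation.rewards.reward_nets import BasicRewardNet
from imitation.util.networks import RunningNorm
from imitation.util.util import make_vec_env

rng = np.random.default_rng(0)
venv = make_vec_env("Pendulum-v1", rng=rng)

# Upstream feed-forward reward net; this repo's GRU net would go here.
reward_net = BasicRewardNet(
    venv.observation_space, venv.action_space, normalize_input_layer=RunningNorm
)

fragmenter = preference_comparisons.RandomFragmenter(warning_threshold=0, rng=rng)
gatherer = preference_comparisons.SyntheticGatherer(rng=rng)  # simulated human
preference_model = preference_comparisons.PreferenceModel(reward_net)
reward_trainer = preference_comparisons.BasicRewardTrainer(
    preference_model=preference_model,
    loss=preference_comparisons.CrossEntropyRewardLoss(),
    epochs=3,
    rng=rng,
)

agent = PPO("MlpPolicy", venv, seed=0, n_steps=2048 // venv.num_envs, batch_size=64)
trajectory_generator = preference_comparisons.AgentTrainer(
    algorithm=agent,
    reward_fn=reward_net,  # the agent is trained on the learned reward
    venv=venv,
    exploration_frac=0.05,
    rng=rng,
)

pref_comparisons = preference_comparisons.PreferenceComparisons(
    trajectory_generator,
    reward_net,
    num_iterations=5,
    fragmenter=fragmenter,
    preference_gatherer=gatherer,
    reward_trainer=reward_trainer,
    fragment_length=100,
    initial_comparison_frac=0.1,
    allow_variable_horizon=False,
)

pref_comparisons.train(total_timesteps=5_000, total_comparisons=200)
```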

Setting your reward network

Edit the code in test_net to plug in your own network (see the sketch below).
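As a hypothetical illustration of the kind of network you might drop in there, a GRU that scores (observation, action) sequences with one scalar reward per timestep could look like this; every name in this sketch is an assumption, not the class actually defined in test_net:

```python
import torch
import torch.nn as nn

class GRURewardNet(nn.Module):
    """Hypothetical recurrent reward model: one scalar reward per timestep."""

    def __init__(self, obs_dim, act_dim, hidden_size=64):
        super().__init__()
        self.gru = nn.GRU(obs_dim + act_dim, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, obs, acts, hx=None):
        # obs: (batch, seq_len, obs_dim); acts: (batch, seq_len, act_dim)
        x = torch.cat([obs, acts], dim=-1)
        out, hx = self.gru(x, hx)               # out: (batch, seq_len, hidden_size)
        return self.head(out).squeeze(-1), hx   # rewards: (batch, seq_len)
```

Keeping the hidden state hx across fragment chunks is what lets the reward depend on history rather than on the current transition alone.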

Test env list

CartPole

BipedalWalker

Pendulum-v1

MountainCar
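Assuming the standard Gymnasium IDs for these tasks (only Pendulum-v1 is given explicitly above, so the other version suffixes are guesses), a quick smoke test could be:

```python
import gymnasium as gym

# Version suffixes other than Pendulum-v1 are assumptions; adjust as needed.
for env_id in ["CartPole-v1", "BipedalWalker-v3", "Pendulum-v1", "MountainCar-v0"]:
    env = gym.make(env_id)
    print(env_id, env.observation_space, env.action_space)
    env.close()
```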

Library compatibility

torch: 1.13.1+cu116

imitation: 1.0.0

stable_baselines3: 2.3.0
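Assuming a standard pip setup, a matching environment can likely be created with, e.g., pip install torch==1.13.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116 followed by pip install imitation==1.0.0 stable_baselines3==2.3.0 (the wheel index URL is the usual PyTorch one, not something this repo specifies).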

Results in MountainCar (w/o variable horizon and w/o ensemble)

[Note]

To enforce a fixed horizon, the AbsorbAfterDoneWrapper from seals is used (sketch below).

All hyperparameters are the same; the only difference is whether the reward network contains a recurrent (GRU) unit or not.
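A minimal sketch of that wrapping, assuming seals' AbsorbAfterDoneWrapper keeps its documented signature; the absorb_reward value, the env choice, and the outer TimeLimit are illustrative, not taken from this repo:

```python
import gymnasium as gym
from gymnasium.wrappers import TimeLimit
from seals.util import AbsorbAfterDoneWrapper

# After the first termination, the wrapper keeps returning an absorbing
# observation with a constant reward, so early `done`s no longer shorten
# the episode; the outer TimeLimit then fixes the horizon exactly.
env = TimeLimit(
    AbsorbAfterDoneWrapper(gym.make("MountainCar-v0").unwrapped, absorb_reward=0.0),
    max_episode_steps=200,
)
```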


1. GRU: GRU_reward_50-episode-1.mp4

2. No GRU: Non_GRU_reward_50-episode-1.mp4

Results in MountainCar (w/o variable horizon and w/ ensemble)

1. GRU w/ ensemble: GRU_reward-episode-0.mp4

2. No GRU w/ ensemble: Non_GRU_reward-episode-0.mp4

My sister project (related to GRU)

GRU-PPO for stable-baselines3 (or contrib) library

https://github.com/CAI23sbP/GRU_AC/tree/master
