
CAI23sbP/RecurrentRLHF


A custom version of the imitation library for building a RecurrentReward model. A GRU is used as the recurrent unit.

Additions

To support Dict-type observations, dict_preference.py and dict_reward_nets.py were added. Both network variants accept them (a minimal sketch follows below):

GRU reward net: Dict-type observations supported

Non-GRU reward net: Dict-type observations supported
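The exact classes in dict_preference.py and dict_reward_nets.py are not reproduced here. As a hypothetical illustration of the usual approach, each entry of the Dict observation can be flattened and concatenated into one feature vector before the reward head; all class names and space keys below are assumptions, not this repo's API:

```python
import numpy as np
import torch
import torch.nn as nn
from gymnasium import spaces

# Illustrative Dict observation space (keys are made up for this sketch).
obs_space = spaces.Dict(
    {
        "position": spaces.Box(-1.0, 1.0, shape=(2,)),
        "velocity": spaces.Box(-1.0, 1.0, shape=(2,)),
    }
)

class DictObsEncoder(nn.Module):
    """Flatten and concatenate every entry of a Dict observation."""

    def __init__(self, obs_space, out_dim=32):
        super().__init__()
        # Fix a key order so the concatenated layout is deterministic.
        self.keys = sorted(obs_space.spaces.keys())
        in_dim = sum(int(np.prod(obs_space.spaces[k].shape)) for k in self.keys)
        self.net = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU())

    def forward(self, obs):
        # obs: dict of tensors, each shaped (batch, *entry_shape).
        flat = torch.cat([obs[k].flatten(start_dim=1) for k in self.keys], dim=1)
        return self.net(flat)

encoder = DictObsEncoder(obs_space)
batch = {k: torch.zeros((4,) + obs_space[k].shape) for k in obs_space.spaces}
features = encoder(batch)  # (4, 32): ready for a GRU or MLP reward head
```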

How to Train

Training without ensembling: python3 train.py

Training with ensembling: python3 train_ensemble.py
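train.py itself is not reproduced here. For orientation, the upstream imitation 1.0.0 preference-comparisons API (which this repo customizes) is typically driven as in the sketch below; this repo presumably swaps the feed-forward BasicRewardNet for its GRU-based reward net, so treat this as the upstream recipe, not this repo's exact script:

```python
import numpy as np
from stable_baselines3 import PPO

from imitation.algorithms import preference_comparisons
from imitation.rewards.reward_nets import BasicRewardNet
from imitation.util.networks import RunningNorm
from imitation.util.util import make_vec_env

rng = np.random.default_rng(0)
venv = make_vec_env("Pendulum-v1", rng=rng)

# Upstream feed-forward reward net; this repo's GRU net would go here.
reward_net = BasicRewardNet(
    venv.observation_space, venv.action_space, normalize_input_layer=RunningNorm
)

fragmenter = preference_comparisons.RandomFragmenter(warning_threshold=0, rng=rng)
gatherer = preference_comparisons.SyntheticGatherer(rng=rng)  # simulated human
preference_model = preference_comparisons.PreferenceModel(reward_net)
reward_trainer = preference_comparisons.BasicRewardTrainer(
    preference_model=preference_model,
    loss=preference_comparisons.CrossEntropyRewardLoss(),
    epochs=3,
    rng=rng,
)

agent = PPO("MlpPolicy", venv, seed=0, n_steps=2048 // venv.num_envs, batch_size=64)
trajectory_generator = preference_comparisons.AgentTrainer(
    algorithm=agent,
    reward_fn=reward_net,  # the agent is trained on the learned reward
    venv=venv,
    exploration_frac=0.05,
    rng=rng,
)

pref_comparisons = preference_comparisons.PreferenceComparisons(
    trajectory_generator,
    reward_net,
    num_iterations=5,
    fragmenter=fragmenter,
    preference_gatherer=gatherer,
    reward_trainer=reward_trainer,
    fragment_length=100,
    initial_comparison_frac=0.1,
    allow_variable_horizon=False,
)

pref_comparisons.train(total_timesteps=5_000, total_comparisons=200)
```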

Setting your reward network

Edit the code in test_net to plug in your own network (see the sketch below).
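As a hypothetical illustration of the kind of network you might drop in there, a GRU that scores (observation, action) sequences with one scalar reward per timestep could look like this; every name in this sketch is an assumption, not the class actually defined in test_net:

```python
import torch
import torch.nn as nn

class GRURewardNet(nn.Module):
    """Hypothetical recurrent reward model: one scalar reward per timestep."""

    def __init__(self, obs_dim, act_dim, hidden_size=64):
        super().__init__()
        self.gru = nn.GRU(obs_dim + act_dim, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, obs, acts, hx=None):
        # obs: (batch, seq_len, obs_dim); acts: (batch, seq_len, act_dim)
        x = torch.cat([obs, acts], dim=-1)
        out, hx = self.gru(x, hx)               # out: (batch, seq_len, hidden_size)
        return self.head(out).squeeze(-1), hx   # rewards: (batch, seq_len)
```

Keeping the hidden state hx across fragment chunks is what lets the reward depend on history rather than on the current transition alone.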

Test env list

CartPole

BipedalWalker

Pendulum-v1

MountainCar
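Assuming the standard Gymnasium IDs for these tasks (only Pendulum-v1 is given explicitly above, so the other version suffixes are guesses), a quick smoke test could be:

```python
import gymnasium as gym

# Version suffixes other than Pendulum-v1 are assumptions; adjust as needed.
for env_id in ["CartPole-v1", "BipedalWalker-v3", "Pendulum-v1", "MountainCar-v0"]:
    env = gym.make(env_id)
    print(env_id, env.observation_space, env.action_space)
    env.close()
```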

Library compatibility

torch: 1.13.1+cu116

imitation: 1.0.0

stable_baselines3: 2.3.0
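Assuming a standard pip setup, a matching environment can likely be created with, e.g., pip install torch==1.13.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116 followed by pip install imitation==1.0.0 stable_baselines3==2.3.0 (the wheel index URL is the usual PyTorch one, not something this repo specifies).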

Results in MountainCar (w/o variable horizon and w/o ensemble)

[Note]

To enforce a fixed horizon, the AbsorbAfterDoneWrapper from seals is used (sketch below).

All hyperparameters are the same; the only difference is whether the reward network contains a recurrent (GRU) unit or not.
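A minimal sketch of that wrapping, assuming seals' AbsorbAfterDoneWrapper keeps its documented signature; the absorb_reward value, the env choice, and the outer TimeLimit are illustrative, not taken from this repo:

```python
import gymnasium as gym
from gymnasium.wrappers import TimeLimit
from seals.util import AbsorbAfterDoneWrapper

# After the first termination, the wrapper keeps returning an absorbing
# observation with a constant reward, so early `done`s no longer shorten
# the episode; the outer TimeLimit then fixes the horizon exactly.
env = TimeLimit(
    AbsorbAfterDoneWrapper(gym.make("MountainCar-v0").unwrapped, absorb_reward=0.0),
    max_episode_steps=200,
)
```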


1. GRU: GRU_reward_50-episode-1.mp4

2. No GRU: Non_GRU_reward_50-episode-1.mp4

Results in MountainCar (w/o variable horizon and w/ ensemble)

1. GRU w/ ensemble: GRU_reward-episode-0.mp4

2. No GRU w/ ensemble: Non_GRU_reward-episode-0.mp4

My sister project (related to GRU)

GRU-PPO for stable-baselines3 (or contrib) library

https://github.com/CAI23sbP/GRU_AC/tree/master
