This package implements Recurrent Behavior Cloning, that is, behavior cloning with a recurrent policy.
It is compatible with the imitation library.
An expert dataset collected from a human can be used, whether or not it includes a recurrent state (such as lstm_state or gru_state).
python3 train_gru_bc.py
BC loss (ent_weight = 1e-3, l2_weight = 0.0)
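The two weights above can be read against the usual form of a behavior cloning loss: a negative log-likelihood term on the expert actions, an entropy bonus scaled by ent_weight, and an L2 penalty scaled by l2_weight. The following is a minimal sketch under that assumption, not the code of train_gru_bc.py; the function name `bc_loss` and the example numbers are hypothetical, and a small categorical policy is used only to keep the example dependency-free.

```python
import math


def bc_loss(log_prob, entropy, params, ent_weight=1e-3, l2_weight=0.0):
    """Combine the three terms of a behavior cloning loss (illustrative only).

    log_prob: mean log-probability of the expert actions under the policy
    entropy:  mean entropy of the policy's action distribution
    params:   flat iterable of policy parameters (hypothetical stand-in)
    """
    neglogp = -log_prob                              # imitation term
    ent_loss = -ent_weight * entropy                 # entropy bonus
    l2_norm = sum(p * p for p in params) / 2.0       # half the squared norm
    l2_loss = l2_weight * l2_norm                    # weight penalty
    return neglogp + ent_loss + l2_loss


# Hypothetical example: a 3-action categorical policy, expert chose action 0.
probs = [0.7, 0.2, 0.1]
log_prob = math.log(probs[0])
entropy = -sum(p * math.log(p) for p in probs)
loss = bc_loss(log_prob, entropy, params=[0.5, -0.25])
print(loss)
```

With l2_weight = 0.0 the parameter penalty vanishes, so the loss is the negative log-likelihood minus a small entropy bonus.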
PyTorch == 1.12.1
Stable-Baselines3 == 2.0.0
sb3-contrib == 2.0.0
imitation == 1.0.0
RecurrentRLHF (preference-based RL with a recurrent reward model)
GRU_AC (actor-critic or Proximal Policy Optimization with a GRU)
Hyperparameters of the BipedalWalker policy [git repo]