PPO is a reinforcement learning method, whereas app, maxent and gail are all inverse reinforcement learning methods. Since policy-based inverse reinforcement learning algorithms are now available, you can use PPO as the policy optimizer with any of these inverse reinforcement learning algorithms to complete the training.
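On saving expert demonstrations: the usual approach is to roll out an already trained policy and store the visited (state, action) pairs so an inverse reinforcement learning algorithm can load them later. Below is a minimal sketch of that idea, not this repository's code. `ToyEnv`, `collect_demos`, `save_demos` and `load_demos` are hypothetical names made up for illustration, and the environment is assumed to return `(next_state, reward, done)` from `step`. In practice the trained PPO actor would take the place of `policy`, and the real environment the place of `ToyEnv`.

```python
import pickle


class ToyEnv:
    """Hypothetical stand-in environment: a 1-D walk that ends after 5 steps."""

    def reset(self):
        self.t = 0
        self.pos = 0.0
        return [self.pos]

    def step(self, action):
        self.t += 1
        self.pos += action
        reward = -abs(self.pos)
        done = self.t >= 5
        return [self.pos], reward, done


def collect_demos(env, policy, num_episodes):
    """Roll out `policy` and record the (state, action) pairs of each episode."""
    demos = []
    for _ in range(num_episodes):
        state, done, episode = env.reset(), False, []
        while not done:
            action = policy(state)
            next_state, _reward, done = env.step(action)
            episode.append((state, action))
            state = next_state
        demos.append(episode)
    return demos


def save_demos(demos, path):
    """Serialize the demonstrations to disk with pickle."""
    with open(path, "wb") as f:
        pickle.dump(demos, f)


def load_demos(path):
    """Read demonstrations back, e.g. at the start of IRL training."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

For example, `save_demos(collect_demos(ToyEnv(), lambda s: 1.0, 10), "expert_demo.p")` would write ten episodes to a file (the file name is arbitrary here), and `load_demos("expert_demo.p")` would read them back.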
References:
Ng A Y, Russell S J. Algorithms for inverse reinforcement learning[C]//ICML. 2000, 1: 2.
Ho J, Gupta J, Ermon S. Model-free imitation learning with policy optimization[C]//International Conference on Machine Learning. PMLR, 2016: 2760-2769.
Hi, how am I supposed to save the expert demos in the PPO main?