Preference based Reinforcement Learning applies a "recurrent reward network" for solving a POMDP problem #848

Open
CAI23sbP opened this issue Apr 24, 2024 · 0 comments
Labels: enhancement (New feature or request)

CAI23sbP commented Apr 24, 2024

Problem

Preference-based Reinforcement Learning in a POMDP setting.
In the paper, the authors note that the reward model can use a recurrent neural network to handle the POMDP problem.

Solution

I added a GRU to the reward network to handle the POMDP problem. Please see my repo.
My main ideas:

  1. BufferingWrapper and RewardVecEnvWrapper must be merged so that the hidden_state is saved together with the observation, action, etc.
  2. To support ensembling of recurrent reward networks, I generate one hidden_state per ensemble member, i.e. as many hidden_states as ensemble_size (a sketch follows this list).
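Roughly, the idea looks like this (a simplified sketch, not the exact code in my repo; the class and parameter names `RecurrentRewardNet`, `RecurrentRewardEnsemble`, `hidden_dim` are only illustrative):

```python
import torch
import torch.nn as nn


class RecurrentRewardNet(nn.Module):
    """GRU-based reward network: maps (obs, action) sequences to per-step rewards."""

    def __init__(self, obs_dim: int, act_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.gru = nn.GRU(obs_dim + act_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, obs, act, hidden_state=None):
        # obs: (batch, T, obs_dim), act: (batch, T, act_dim)
        inputs = torch.cat([obs, act], dim=-1)
        features, next_hidden = self.gru(inputs, hidden_state)
        rewards = self.head(features).squeeze(-1)  # (batch, T)
        return rewards, next_hidden


class RecurrentRewardEnsemble(nn.Module):
    """Ensemble of recurrent reward nets; each member keeps its own hidden state,
    so the buffering wrapper has to store ensemble_size hidden states per step."""

    def __init__(self, obs_dim: int, act_dim: int, ensemble_size: int = 3):
        super().__init__()
        self.members = nn.ModuleList(
            [RecurrentRewardNet(obs_dim, act_dim) for _ in range(ensemble_size)]
        )

    def forward(self, obs, act, hidden_states=None):
        if hidden_states is None:
            hidden_states = [None] * len(self.members)
        rewards, next_hiddens = [], []
        for member, h in zip(self.members, hidden_states):
            r, h_next = member(obs, act, h)
            rewards.append(r)
            next_hiddens.append(h_next)
        # (ensemble_size, batch, T) reward predictions + per-member hidden states.
        return torch.stack(rewards), next_hiddens
```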

Result

I tested this on the BipedalWalker-v3 env with AbsorbAfterDoneWrapper from your sister project seals.
[attached image: training result]
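For reference, the environment setup was roughly like this (a simplified sketch assuming a gymnasium-based seals install, not my exact training script):

```python
import gymnasium as gym
from seals.util import AbsorbAfterDoneWrapper

# Keep episodes running in an absorbing state after termination so that the
# trajectory fragments compared by the preference model all have the same length.
env = AbsorbAfterDoneWrapper(gym.make("BipedalWalker-v3"))
```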

Addition

I added dict_preference.py to support Dict-type observation spaces (a sketch of the idea is below).
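The core idea is just to flatten the Dict observation into a single vector before it reaches the reward network, roughly like this (an illustrative sketch; `flatten_dict_obs` is not the actual function name in dict_preference.py):

```python
import numpy as np
from gymnasium import spaces


def flatten_dict_obs(obs_space: spaces.Dict, obs: dict) -> np.ndarray:
    """Concatenate the entries of a Dict observation into one flat float vector,
    iterating keys in the order defined by the observation space."""
    parts = [np.asarray(obs[key], dtype=np.float32).ravel() for key in obs_space.spaces]
    return np.concatenate(parts)
```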
