[Feature] Add scheduler for alpha/beta parameters of PrioritizedSampler #2452
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2452
Note: links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV: there is 1 currently active SEV. If your PR is affected, please view it below.
❌ 8 New Failures, 2 Unrelated Failures as of commit 4b2897a with merge base 33e86c5.
NEW FAILURES: the following jobs have failed.
BROKEN TRUNK: the following jobs failed but were present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Amazing! Great and long-awaited feature!
Thanks a mil
if self._step_cnt % self.n_steps == 0:
    return self.operator(current_val, self.gamma)
else:
    return current_val
ditto
test/test_rb.py
Outdated
INIT_ALPHA = 0.7
INIT_BETA = 0.6
GAMMA = 0.1
EVERY_N_STEPS = 10
LINEAR_STEPS = 100
TOTAL_STEPS = 200
let's maybe make these args to the func?
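For instance, the constants could be turned into parametrized arguments along these lines (a sketch; the test name and the parametrize values are illustrative, not taken from the PR):

```python
import pytest

@pytest.mark.parametrize(
    "init_alpha, init_beta, gamma, every_n_steps, linear_steps, total_steps",
    [(0.7, 0.6, 0.1, 10, 100, 200)],
)
def test_prioritized_param_scheduler(
    init_alpha, init_beta, gamma, every_n_steps, linear_steps, total_steps
):
    # body unchanged, using the arguments instead of module-level constants
    ...
```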
test/test_rb.py
Outdated
expected_alpha_vals = np.linspace(INIT_ALPHA, 0.0, num=LINEAR_STEPS + 1)
expected_alpha_vals = np.pad(
    expected_alpha_vals, (0, TOTAL_STEPS - LINEAR_STEPS), constant_values=0.0
)
let's use torch here
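A possible torch-based equivalent of the numpy construction above (a sketch that simply mirrors the linspace-plus-padding logic; the constants come from the earlier snippet):

```python
import torch

# Linearly annealed values for LINEAR_STEPS steps, then padded with the
# final value (0.0) up to TOTAL_STEPS.
expected_alpha_vals = torch.linspace(INIT_ALPHA, 0.0, steps=LINEAR_STEPS + 1)
expected_alpha_vals = torch.nn.functional.pad(
    expected_alpha_vals, (0, TOTAL_STEPS - LINEAR_STEPS), value=0.0
)
```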
test/test_rb.py
Outdated
assert np.isclose(
    rb.sampler.alpha, expected_alpha_vals[i]
), f"expected {expected_alpha_vals[i]}, got {rb.sampler.alpha}"
assert np.isclose(
    rb.sampler.beta, expected_beta_vals[i]
), f"expected {expected_beta_vals[i]}, got {rb.sampler.beta}"
let's use torch.testing.assert_close
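Concretely, the checks could look roughly like this (a sketch; it assumes the expected-value arrays are torch tensors per the earlier suggestion, and converts the sampler's scalar attributes so the dtypes line up):

```python
torch.testing.assert_close(
    torch.as_tensor(rb.sampler.alpha, dtype=expected_alpha_vals.dtype),
    expected_alpha_vals[i],
)
torch.testing.assert_close(
    torch.as_tensor(rb.sampler.beta, dtype=expected_beta_vals.dtype),
    expected_beta_vals[i],
)
```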
Co-authored-by: Vincent Moens <[email protected]>
LGTM thanks
self.initial_val = getattr(self.sampler, self.param_name)
self._step_cnt = 0

def state_dict(self):
Oh wow! Ok then...
def _step(self):
    if self._step_cnt < self.num_steps:
        return self.initial_val + (self._delta * self._step_cnt)
    else:
        return self.final_val
yeah that's fine, maybe let's add a comment to let someone know in the future that this should be fixed
LGTM thanks
Description
Add scheduler for alpha/beta parameters of PrioritizedSampler.
Motivation and Context
close #1575
Following the suggestions made by @vmoens in issue #1575, this PR adds Scheduler classes through which the user can adjust the alpha and beta parameters of the `PrioritizedSampler` during training when using the `PrioritizedReplayBuffer`. This is explicitly suggested in the paper "Schaul, T.; Quan, J.; Antonoglou, I.; and Silver, D. 2015. Prioritized Experience Replay".

The main reason to use separate scheduler classes for the annealing instead of a simple built-in linear annealing (as also suggested by @vmoens in issue #1575) is the greater flexibility for users: the annealing can take place, for example, after taking a sample from the replay buffer or after a full training epoch, depending on where the user places the `scheduler.step()` call. Also, through the `LinearScheduler`, `StepScheduler` and `LambdaScheduler`, different annealing schemes can be used (or new ones can easily be created).
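For illustration, here is a minimal usage sketch of the intended workflow. The scheduler import path, constructor signature, and the numeric values below are assumptions made for the example and may differ from the API actually introduced by this PR.

```python
import torch
from torchrl.data import ListStorage, PrioritizedReplayBuffer
# Import path for the scheduler is an assumption; it may live elsewhere.
from torchrl.data.replay_buffers.scheduler import LinearScheduler

rb = PrioritizedReplayBuffer(
    alpha=0.7, beta=0.6, storage=ListStorage(max_size=1000)
)
rb.extend(list(range(100)))

# Assumed signature: anneal `beta` on the sampler from its current value
# to `final_value` over `num_steps` calls to step().
beta_scheduler = LinearScheduler(
    rb.sampler, param_name="beta", final_value=1.0, num_steps=100
)

for _ in range(100):
    batch = rb.sample(16)
    # ... compute loss, update priorities, optimize ...
    beta_scheduler.step()  # one annealing step per sample (or per epoch)
```

Placing `beta_scheduler.step()` inside the sampling loop anneals per sample; moving it to the end of an epoch anneals per epoch, which is the flexibility argued for above.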
Types of changes
What types of changes does your code introduce? Remove all that do not apply:
Checklist
Go over all the following points, and put an `x` in all the boxes that apply. If you are unsure about any of these, don't hesitate to ask. We are here to help!