[Feature] Add scheduler for alpha/beta parameters of PrioritizedSampler #2452
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2452
Note: links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV: there is 1 currently active SEV. If your PR is affected, please view it below.
❌ 8 New Failures, 2 Unrelated Failures as of commit 4b2897a with merge base 33e86c5.
NEW FAILURES: the following jobs have failed.
BROKEN TRUNK: the following jobs failed but were present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Amazing! Great and long-awaited feature!
Thanks a mil
if self._step_cnt % self.n_steps == 0:
    return self.operator(current_val, self.gamma)
else:
    return current_val
ditto
test/test_rb.py
Outdated
INIT_ALPHA = 0.7
INIT_BETA = 0.6
GAMMA = 0.1
EVERY_N_STEPS = 10
LINEAR_STEPS = 100
TOTAL_STEPS = 200
let's maybe make these args to the func?
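For instance, the constants could be turned into parametrized arguments along these lines (a sketch; the test name and the parametrize values are illustrative, not taken from the PR):

```python
import pytest

@pytest.mark.parametrize(
    "init_alpha, init_beta, gamma, every_n_steps, linear_steps, total_steps",
    [(0.7, 0.6, 0.1, 10, 100, 200)],
)
def test_prioritized_param_scheduler(
    init_alpha, init_beta, gamma, every_n_steps, linear_steps, total_steps
):
    # body unchanged, using the arguments instead of module-level constants
    ...
```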
test/test_rb.py
Outdated
expected_alpha_vals = np.linspace(INIT_ALPHA, 0.0, num=LINEAR_STEPS + 1)
expected_alpha_vals = np.pad(
    expected_alpha_vals, (0, TOTAL_STEPS - LINEAR_STEPS), constant_values=0.0
)
let's use torch here
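A possible torch-based equivalent of the numpy construction above (a sketch that simply mirrors the linspace-plus-padding logic; the constants come from the earlier snippet):

```python
import torch

# Linearly annealed values for LINEAR_STEPS steps, then padded with the
# final value (0.0) up to TOTAL_STEPS.
expected_alpha_vals = torch.linspace(INIT_ALPHA, 0.0, steps=LINEAR_STEPS + 1)
expected_alpha_vals = torch.nn.functional.pad(
    expected_alpha_vals, (0, TOTAL_STEPS - LINEAR_STEPS), value=0.0
)
```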
test/test_rb.py
Outdated
assert np.isclose(
    rb.sampler.alpha, expected_alpha_vals[i]
), f"expected {expected_alpha_vals[i]}, got {rb.sampler.alpha}"
assert np.isclose(
    rb.sampler.beta, expected_beta_vals[i]
), f"expected {expected_beta_vals[i]}, got {rb.sampler.beta}"
let's use torch.testing.assert_close
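Concretely, the checks could look roughly like this (a sketch; it assumes the expected-value arrays are torch tensors per the earlier suggestion, and converts the sampler's scalar attributes so the dtypes line up):

```python
torch.testing.assert_close(
    torch.as_tensor(rb.sampler.alpha, dtype=expected_alpha_vals.dtype),
    expected_alpha_vals[i],
)
torch.testing.assert_close(
    torch.as_tensor(rb.sampler.beta, dtype=expected_beta_vals.dtype),
    expected_beta_vals[i],
)
```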
Co-authored-by: Vincent Moens <[email protected]>
LGTM thanks
self.initial_val = getattr(self.sampler, self.param_name)
self._step_cnt = 0

def state_dict(self):
Oh wow! Ok then...
def _step(self):
    if self._step_cnt < self.num_steps:
        return self.initial_val + (self._delta * self._step_cnt)
    else:
        return self.final_val
yeah that's fine, maybe let's add a comment to let someone know in the future that this should be fixed
LGTM thanks
Description
Add scheduler for alpha/beta parameters of PrioritizedSampler.
Motivation and Context
close #1575
Following the suggestions made by @vmoens in issue #1575, this PR adds Scheduler classes through which the user can adjust the alpha and beta parameters of the `PrioritizedSampler` during training when using the `PrioritizedReplayBuffer`. This is explicitly suggested in the paper "Schaul, T.; Quan, J.; Antonoglou, I.; and Silver, D. 2015. Prioritized Experience Replay".

The main reason to use separate scheduler classes for the annealing instead of a simple built-in linear annealing (as also suggested by @vmoens in issue #1575) is the greater flexibility for users: the annealing can take place, for example, after taking a sample from the replay buffer or after a full training epoch, depending on where the user places the `scheduler.step()` call. Also, through the `LinearScheduler`, `StepScheduler` and `LambdaScheduler`, different annealing schemes can be used (or new ones can easily be created).
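For illustration, here is a minimal usage sketch of the intended workflow. The scheduler import path, constructor signature, and the numeric values below are assumptions made for the example and may differ from the API actually introduced by this PR.

```python
import torch
from torchrl.data import ListStorage, PrioritizedReplayBuffer
# Import path for the scheduler is an assumption; it may live elsewhere.
from torchrl.data.replay_buffers.scheduler import LinearScheduler

rb = PrioritizedReplayBuffer(
    alpha=0.7, beta=0.6, storage=ListStorage(max_size=1000)
)
rb.extend(list(range(100)))

# Assumed signature: anneal `beta` on the sampler from its current value
# to `final_value` over `num_steps` calls to step().
beta_scheduler = LinearScheduler(
    rb.sampler, param_name="beta", final_value=1.0, num_steps=100
)

for _ in range(100):
    batch = rb.sample(16)
    # ... compute loss, update priorities, optimize ...
    beta_scheduler.step()  # one annealing step per sample (or per epoch)
```

Placing `beta_scheduler.step()` inside the sampling loop anneals per sample; moving it to the end of an epoch anneals per epoch, which is the flexibility argued for above.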
Types of changes
What types of changes does your code introduce? Remove all that do not apply:
Checklist
Go over all the following points, and put an `x` in all the boxes that apply. If you are unsure about any of these, don't hesitate to ask. We are here to help!