[Performance] Accelerate slice sampler on GPU #2672

Merged 3 commits on Dec 20, 2024

[ghstack-poisoned]

pytorch-bot bot commented Dec 19, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2672

Note: Links to docs will display an error until the docs builds have been completed.

❌ 8 New Failures, 7 Unrelated Failures

As of commit 563e4a9 with merge base 133d709:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

vmoens added a commit that referenced this pull request Dec 19, 2024
ghstack-source-id: c34ded33943cded7eea1d74b43498c83549ec94c
Pull Request resolved: #2672
@facebook-github-bot added the `CLA Signed` label Dec 19, 2024
[ghstack-poisoned]
vmoens added a commit that referenced this pull request Dec 19, 2024
ghstack-source-id: 8d8845925a039e8c710d97c3f801a2ccfaacf3f1
Pull Request resolved: #2672

github-actions bot commented Dec 19, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 149. Improved: $\large\color{#35bf28}11$. Worsened: $\large\color{#d91a1a}3$.

Name | Max | Mean | Ops | Ops on Repo HEAD | Change
test_simple 0.4424s 0.4279s 2.3368 Ops/s 2.2485 Ops/s $\color{#35bf28}+3.92\%$
test_transformed 0.6059s 0.6008s 1.6645 Ops/s 1.6393 Ops/s $\color{#35bf28}+1.54\%$
test_serial 1.3517s 1.3501s 0.7407 Ops/s 0.7156 Ops/s $\color{#35bf28}+3.50\%$
test_parallel 1.2934s 1.2127s 0.8246 Ops/s 0.8165 Ops/s $\color{#35bf28}+1.00\%$
test_step_mdp_speed[True-True-True-True-True] 0.1610ms 29.6840μs 33.6882 KOps/s 32.0350 KOps/s $\textbf{\color{#35bf28}+5.16\%}$
test_step_mdp_speed[True-True-True-True-False] 57.9790μs 18.0357μs 55.4457 KOps/s 54.7899 KOps/s $\color{#35bf28}+1.20\%$
test_step_mdp_speed[True-True-True-False-True] 48.7610μs 17.2650μs 57.9207 KOps/s 56.6509 KOps/s $\color{#35bf28}+2.24\%$
test_step_mdp_speed[True-True-True-False-False] 48.0940μs 10.2103μs 97.9406 KOps/s 96.0317 KOps/s $\color{#35bf28}+1.99\%$
test_step_mdp_speed[True-True-False-True-True] 74.8470μs 32.8957μs 30.3991 KOps/s 30.0881 KOps/s $\color{#35bf28}+1.03\%$
test_step_mdp_speed[True-True-False-True-False] 56.5760μs 20.1785μs 49.5577 KOps/s 49.1524 KOps/s $\color{#35bf28}+0.82\%$
test_step_mdp_speed[True-True-False-False-True] 51.9570μs 19.4796μs 51.3357 KOps/s 50.8685 KOps/s $\color{#35bf28}+0.92\%$
test_step_mdp_speed[True-True-False-False-False] 68.7990μs 12.0097μs 83.2662 KOps/s 79.9156 KOps/s $\color{#35bf28}+4.19\%$
test_step_mdp_speed[True-False-True-True-True] 0.1002ms 34.3205μs 29.1371 KOps/s 28.5481 KOps/s $\color{#35bf28}+2.06\%$
test_step_mdp_speed[True-False-True-True-False] 75.9190μs 22.1121μs 45.2241 KOps/s 45.0059 KOps/s $\color{#35bf28}+0.48\%$
test_step_mdp_speed[True-False-True-False-True] 72.0450μs 19.3556μs 51.6646 KOps/s 51.1307 KOps/s $\color{#35bf28}+1.04\%$
test_step_mdp_speed[True-False-True-False-False] 35.2060μs 11.9587μs 83.6208 KOps/s 80.8341 KOps/s $\color{#35bf28}+3.45\%$
test_step_mdp_speed[True-False-False-True-True] 71.7350μs 36.6650μs 27.2739 KOps/s 27.2123 KOps/s $\color{#35bf28}+0.23\%$
test_step_mdp_speed[True-False-False-True-False] 54.2810μs 24.1430μs 41.4199 KOps/s 41.2239 KOps/s $\color{#35bf28}+0.48\%$
test_step_mdp_speed[True-False-False-False-True] 65.4220μs 21.1700μs 47.2366 KOps/s 46.8267 KOps/s $\color{#35bf28}+0.88\%$
test_step_mdp_speed[True-False-False-False-False] 47.4280μs 13.9134μs 71.8730 KOps/s 69.7916 KOps/s $\color{#35bf28}+2.98\%$
test_step_mdp_speed[False-True-True-True-True] 75.4420μs 34.3744μs 29.0914 KOps/s 28.1801 KOps/s $\color{#35bf28}+3.23\%$
test_step_mdp_speed[False-True-True-True-False] 95.4300μs 22.1007μs 45.2475 KOps/s 44.9559 KOps/s $\color{#35bf28}+0.65\%$
test_step_mdp_speed[False-True-True-False-True] 51.1650μs 21.6077μs 46.2798 KOps/s 44.8592 KOps/s $\color{#35bf28}+3.17\%$
test_step_mdp_speed[False-True-True-False-False] 49.6430μs 13.3947μs 74.6561 KOps/s 72.6294 KOps/s $\color{#35bf28}+2.79\%$
test_step_mdp_speed[False-True-False-True-True] 82.7350μs 36.4618μs 27.4259 KOps/s 27.0572 KOps/s $\color{#35bf28}+1.36\%$
test_step_mdp_speed[False-True-False-True-False] 62.1160μs 23.7519μs 42.1020 KOps/s 41.3180 KOps/s $\color{#35bf28}+1.90\%$
test_step_mdp_speed[False-True-False-False-True] 2.6914ms 24.0432μs 41.5918 KOps/s 41.5397 KOps/s $\color{#35bf28}+0.13\%$
test_step_mdp_speed[False-True-False-False-False] 46.0570μs 15.2758μs 65.4628 KOps/s 63.8861 KOps/s $\color{#35bf28}+2.47\%$
test_step_mdp_speed[False-False-True-True-True] 89.5780μs 38.6625μs 25.8649 KOps/s 25.6343 KOps/s $\color{#35bf28}+0.90\%$
test_step_mdp_speed[False-False-True-True-False] 54.9230μs 25.7823μs 38.7864 KOps/s 37.7088 KOps/s $\color{#35bf28}+2.86\%$
test_step_mdp_speed[False-False-True-False-True] 68.0980μs 23.8109μs 41.9976 KOps/s 41.2702 KOps/s $\color{#35bf28}+1.76\%$
test_step_mdp_speed[False-False-True-False-False] 59.3110μs 15.2867μs 65.4165 KOps/s 63.8481 KOps/s $\color{#35bf28}+2.46\%$
test_step_mdp_speed[False-False-False-True-True] 95.2280μs 40.8896μs 24.4561 KOps/s 24.3147 KOps/s $\color{#35bf28}+0.58\%$
test_step_mdp_speed[False-False-False-True-False] 74.0790μs 27.6110μs 36.2175 KOps/s 35.8881 KOps/s $\color{#35bf28}+0.92\%$
test_step_mdp_speed[False-False-False-False-True] 68.1070μs 25.3507μs 39.4466 KOps/s 38.9596 KOps/s $\color{#35bf28}+1.25\%$
test_step_mdp_speed[False-False-False-False-False] 43.0810μs 17.1334μs 58.3656 KOps/s 58.3097 KOps/s $\color{#35bf28}+0.10\%$
test_values[generalized_advantage_estimate-True-True] 11.3208ms 9.6417ms 103.7160 Ops/s 103.5995 Ops/s $\color{#35bf28}+0.11\%$
test_values[vec_generalized_advantage_estimate-True-True] 35.2515ms 33.1566ms 30.1599 Ops/s 28.1094 Ops/s $\textbf{\color{#35bf28}+7.29\%}$
test_values[td0_return_estimate-False-False] 0.2351ms 0.1719ms 5.8180 KOps/s 5.7155 KOps/s $\color{#35bf28}+1.79\%$
test_values[td1_return_estimate-False-False] 26.3568ms 23.5750ms 42.4179 Ops/s 42.5512 Ops/s $\color{#d91a1a}-0.31\%$
test_values[vec_td1_return_estimate-False-False] 34.6804ms 33.3220ms 30.0102 Ops/s 29.8209 Ops/s $\color{#35bf28}+0.63\%$
test_values[td_lambda_return_estimate-True-False] 38.7894ms 34.3782ms 29.0882 Ops/s 29.6150 Ops/s $\color{#d91a1a}-1.78\%$
test_values[vec_td_lambda_return_estimate-True-False] 35.3918ms 33.2853ms 30.0433 Ops/s 29.6305 Ops/s $\color{#35bf28}+1.39\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 12.5708ms 8.3444ms 119.8404 Ops/s 117.8907 Ops/s $\color{#35bf28}+1.65\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 2.0416ms 1.8160ms 550.6690 Ops/s 545.4220 Ops/s $\color{#35bf28}+0.96\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.4410ms 0.3473ms 2.8792 KOps/s 2.7860 KOps/s $\color{#35bf28}+3.34\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 40.5011ms 36.7803ms 27.1884 Ops/s 23.3654 Ops/s $\textbf{\color{#35bf28}+16.36\%}$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 3.8033ms 3.0079ms 332.4557 Ops/s 329.3622 Ops/s $\color{#35bf28}+0.94\%$
test_dqn_speed[False-None] 5.6351ms 1.3810ms 724.1036 Ops/s 712.6048 Ops/s $\color{#35bf28}+1.61\%$
test_dqn_speed[False-backward] 1.9126ms 1.8407ms 543.2856 Ops/s 532.7315 Ops/s $\color{#35bf28}+1.98\%$
test_dqn_speed[True-None] 0.5740ms 0.4694ms 2.1305 KOps/s 2.0706 KOps/s $\color{#35bf28}+2.89\%$
test_dqn_speed[True-backward] 0.9695ms 0.8858ms 1.1290 KOps/s 1.1037 KOps/s $\color{#35bf28}+2.29\%$
test_dqn_speed[reduce-overhead-None] 0.7498ms 0.4830ms 2.0706 KOps/s 2.0827 KOps/s $\color{#d91a1a}-0.58\%$
test_dqn_speed[reduce-overhead-backward] 1.2625ms 0.9419ms 1.0617 KOps/s 1.1065 KOps/s $\color{#d91a1a}-4.05\%$
test_ddpg_speed[False-None] 3.5357ms 2.8659ms 348.9334 Ops/s 345.4266 Ops/s $\color{#35bf28}+1.02\%$
test_ddpg_speed[False-backward] 4.0712ms 3.9554ms 252.8184 Ops/s 249.9604 Ops/s $\color{#35bf28}+1.14\%$
test_ddpg_speed[True-None] 1.6454ms 1.0015ms 998.4565 Ops/s 988.1693 Ops/s $\color{#35bf28}+1.04\%$
test_ddpg_speed[True-backward] 2.2307ms 1.9034ms 525.3889 Ops/s 454.5667 Ops/s $\textbf{\color{#35bf28}+15.58\%}$
test_ddpg_speed[reduce-overhead-None] 1.3303ms 1.0152ms 984.9810 Ops/s 988.6513 Ops/s $\color{#d91a1a}-0.37\%$
test_ddpg_speed[reduce-overhead-backward] 1.9364ms 1.8824ms 531.2424 Ops/s 524.2671 Ops/s $\color{#35bf28}+1.33\%$
test_sac_speed[False-None] 8.5237ms 7.8960ms 126.6471 Ops/s 124.6302 Ops/s $\color{#35bf28}+1.62\%$
test_sac_speed[False-backward] 12.3927ms 10.6239ms 94.1278 Ops/s 93.2130 Ops/s $\color{#35bf28}+0.98\%$
test_sac_speed[True-None] 2.0866ms 1.8345ms 545.1202 Ops/s 545.6913 Ops/s $\color{#d91a1a}-0.10\%$
test_sac_speed[True-backward] 3.5673ms 3.4905ms 286.4906 Ops/s 283.1715 Ops/s $\color{#35bf28}+1.17\%$
test_sac_speed[reduce-overhead-None] 2.3184ms 1.8507ms 540.3305 Ops/s 533.0957 Ops/s $\color{#35bf28}+1.36\%$
test_sac_speed[reduce-overhead-backward] 3.9803ms 3.5123ms 284.7139 Ops/s 283.7762 Ops/s $\color{#35bf28}+0.33\%$
test_redq_speed[False-None] 14.5712ms 12.5863ms 79.4516 Ops/s 79.7481 Ops/s $\color{#d91a1a}-0.37\%$
test_redq_speed[False-backward] 24.6703ms 21.8844ms 45.6946 Ops/s 45.7290 Ops/s $\color{#d91a1a}-0.08\%$
test_redq_speed[True-None] 5.3722ms 4.4549ms 224.4736 Ops/s 219.0299 Ops/s $\color{#35bf28}+2.49\%$
test_redq_speed[True-backward] 13.3200ms 11.7997ms 84.7479 Ops/s 84.0222 Ops/s $\color{#35bf28}+0.86\%$
test_redq_speed[reduce-overhead-None] 5.1779ms 4.5001ms 222.2156 Ops/s 222.6868 Ops/s $\color{#d91a1a}-0.21\%$
test_redq_speed[reduce-overhead-backward] 13.2804ms 11.8445ms 84.4272 Ops/s 84.3495 Ops/s $\color{#35bf28}+0.09\%$
test_redq_deprec_speed[False-None] 15.3166ms 12.6697ms 78.9285 Ops/s 77.6965 Ops/s $\color{#35bf28}+1.59\%$
test_redq_deprec_speed[False-backward] 21.0644ms 18.3153ms 54.5991 Ops/s 53.8934 Ops/s $\color{#35bf28}+1.31\%$
test_redq_deprec_speed[True-None] 4.1749ms 3.5626ms 280.6928 Ops/s 279.5130 Ops/s $\color{#35bf28}+0.42\%$
test_redq_deprec_speed[True-backward] 7.9838ms 7.8581ms 127.2579 Ops/s 126.2389 Ops/s $\color{#35bf28}+0.81\%$
test_redq_deprec_speed[reduce-overhead-None] 4.2987ms 3.6340ms 275.1792 Ops/s 279.9158 Ops/s $\color{#d91a1a}-1.69\%$
test_redq_deprec_speed[reduce-overhead-backward] 8.2675ms 7.8867ms 126.7963 Ops/s 126.3178 Ops/s $\color{#35bf28}+0.38\%$
test_td3_speed[False-None] 8.3050ms 7.9172ms 126.3068 Ops/s 123.5654 Ops/s $\color{#35bf28}+2.22\%$
test_td3_speed[False-backward] 12.4504ms 10.2367ms 97.6875 Ops/s 95.7070 Ops/s $\color{#35bf28}+2.07\%$
test_td3_speed[True-None] 1.8140ms 1.7243ms 579.9554 Ops/s 574.9698 Ops/s $\color{#35bf28}+0.87\%$
test_td3_speed[True-backward] 3.3861ms 3.3018ms 302.8610 Ops/s 301.0997 Ops/s $\color{#35bf28}+0.58\%$
test_td3_speed[reduce-overhead-None] 2.0143ms 1.7295ms 578.1900 Ops/s 574.2217 Ops/s $\color{#35bf28}+0.69\%$
test_td3_speed[reduce-overhead-backward] 3.3722ms 3.3149ms 301.6690 Ops/s 300.1281 Ops/s $\color{#35bf28}+0.51\%$
test_cql_speed[False-None] 37.8625ms 35.7622ms 27.9625 Ops/s 27.3148 Ops/s $\color{#35bf28}+2.37\%$
test_cql_speed[False-backward] 47.5013ms 45.6296ms 21.9156 Ops/s 21.6043 Ops/s $\color{#35bf28}+1.44\%$
test_cql_speed[True-None] 16.1148ms 15.2760ms 65.4621 Ops/s 63.5476 Ops/s $\color{#35bf28}+3.01\%$
test_cql_speed[True-backward] 25.4284ms 21.9987ms 45.4572 Ops/s 45.3389 Ops/s $\color{#35bf28}+0.26\%$
test_cql_speed[reduce-overhead-None] 16.5489ms 15.4114ms 64.8872 Ops/s 65.0414 Ops/s $\color{#d91a1a}-0.24\%$
test_cql_speed[reduce-overhead-backward] 23.0210ms 21.7630ms 45.9495 Ops/s 43.9741 Ops/s $\color{#35bf28}+4.49\%$
test_a2c_speed[False-None] 8.0856ms 7.0934ms 140.9771 Ops/s 137.9505 Ops/s $\color{#35bf28}+2.19\%$
test_a2c_speed[False-backward] 16.4562ms 14.0373ms 71.2388 Ops/s 70.3923 Ops/s $\color{#35bf28}+1.20\%$
test_a2c_speed[True-None] 4.6671ms 4.1785ms 239.3193 Ops/s 237.7080 Ops/s $\color{#35bf28}+0.68\%$
test_a2c_speed[True-backward] 10.9710ms 10.6348ms 94.0311 Ops/s 92.9249 Ops/s $\color{#35bf28}+1.19\%$
test_a2c_speed[reduce-overhead-None] 5.0521ms 4.2034ms 237.9042 Ops/s 234.7436 Ops/s $\color{#35bf28}+1.35\%$
test_a2c_speed[reduce-overhead-backward] 11.4384ms 10.6500ms 93.8970 Ops/s 90.5059 Ops/s $\color{#35bf28}+3.75\%$
test_ppo_speed[False-None] 9.2186ms 7.3666ms 135.7478 Ops/s 131.4602 Ops/s $\color{#35bf28}+3.26\%$
test_ppo_speed[False-backward] 14.8074ms 14.4477ms 69.2153 Ops/s 68.9322 Ops/s $\color{#35bf28}+0.41\%$
test_ppo_speed[True-None] 4.0077ms 3.6705ms 272.4415 Ops/s 267.0584 Ops/s $\color{#35bf28}+2.02\%$
test_ppo_speed[True-backward] 10.3520ms 9.5496ms 104.7161 Ops/s 103.8431 Ops/s $\color{#35bf28}+0.84\%$
test_ppo_speed[reduce-overhead-None] 6.8089ms 3.6739ms 272.1880 Ops/s 269.9234 Ops/s $\color{#35bf28}+0.84\%$
test_ppo_speed[reduce-overhead-backward] 9.9425ms 9.5636ms 104.5637 Ops/s 104.0651 Ops/s $\color{#35bf28}+0.48\%$
test_reinforce_speed[False-None] 7.7562ms 6.4901ms 154.0797 Ops/s 152.0440 Ops/s $\color{#35bf28}+1.34\%$
test_reinforce_speed[False-backward] 9.8759ms 9.7116ms 102.9702 Ops/s 100.7243 Ops/s $\color{#35bf28}+2.23\%$
test_reinforce_speed[True-None] 3.2117ms 2.6244ms 381.0401 Ops/s 374.0226 Ops/s $\color{#35bf28}+1.88\%$
test_reinforce_speed[True-backward] 10.3711ms 8.6901ms 115.0733 Ops/s 116.1142 Ops/s $\color{#d91a1a}-0.90\%$
test_reinforce_speed[reduce-overhead-None] 3.0898ms 2.6179ms 381.9829 Ops/s 371.8371 Ops/s $\color{#35bf28}+2.73\%$
test_reinforce_speed[reduce-overhead-backward] 8.8810ms 8.5317ms 117.2106 Ops/s 115.6600 Ops/s $\color{#35bf28}+1.34\%$
test_iql_speed[False-None] 33.5708ms 31.7945ms 31.4520 Ops/s 31.4811 Ops/s $\color{#d91a1a}-0.09\%$
test_iql_speed[False-backward] 47.7292ms 45.4843ms 21.9856 Ops/s 22.1999 Ops/s $\color{#d91a1a}-0.97\%$
test_iql_speed[True-None] 11.6211ms 10.3750ms 96.3852 Ops/s 93.9391 Ops/s $\color{#35bf28}+2.60\%$
test_iql_speed[True-backward] 22.3371ms 21.0818ms 47.4343 Ops/s 46.5491 Ops/s $\color{#35bf28}+1.90\%$
test_iql_speed[reduce-overhead-None] 12.4142ms 10.7770ms 92.7903 Ops/s 94.2582 Ops/s $\color{#d91a1a}-1.56\%$
test_iql_speed[reduce-overhead-backward] 22.4143ms 21.1010ms 47.3911 Ops/s 46.2145 Ops/s $\color{#35bf28}+2.55\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.6936ms 4.7509ms 210.4849 Ops/s 200.1961 Ops/s $\textbf{\color{#35bf28}+5.14\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.8664ms 0.5148ms 1.9427 KOps/s 1.9002 KOps/s $\color{#35bf28}+2.24\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.7738ms 0.4893ms 2.0439 KOps/s 2.0077 KOps/s $\color{#35bf28}+1.81\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 7.0612ms 4.5383ms 220.3451 Ops/s 218.5360 Ops/s $\color{#35bf28}+0.83\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.3187s 0.7383ms 1.3545 KOps/s 1.9898 KOps/s $\textbf{\color{#d91a1a}-31.93\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.8084ms 0.4773ms 2.0950 KOps/s 2.0637 KOps/s $\color{#35bf28}+1.52\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.9117ms 1.6190ms 617.6484 Ops/s 598.8441 Ops/s $\color{#35bf28}+3.14\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 2.2724ms 1.5407ms 649.0678 Ops/s 637.9273 Ops/s $\color{#35bf28}+1.75\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 4.9453ms 4.6719ms 214.0455 Ops/s 208.4663 Ops/s $\color{#35bf28}+2.68\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.5121ms 0.6406ms 1.5609 KOps/s 1.5345 KOps/s $\color{#35bf28}+1.72\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.9990ms 0.6161ms 1.6230 KOps/s 1.5949 KOps/s $\color{#35bf28}+1.77\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 7.0401ms 4.5814ms 218.2719 Ops/s 213.6091 Ops/s $\color{#35bf28}+2.18\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2.4580ms 0.5139ms 1.9459 KOps/s 1.9342 KOps/s $\color{#35bf28}+0.60\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.8341ms 0.4930ms 2.0283 KOps/s 1.9792 KOps/s $\color{#35bf28}+2.48\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 4.8921ms 4.5592ms 219.3374 Ops/s 218.5889 Ops/s $\color{#35bf28}+0.34\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2.1615ms 0.5062ms 1.9754 KOps/s 1.9270 KOps/s $\color{#35bf28}+2.51\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6798ms 0.4761ms 2.1005 KOps/s 2.0477 KOps/s $\color{#35bf28}+2.58\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 7.5526ms 4.7080ms 212.4032 Ops/s 202.8300 Ops/s $\color{#35bf28}+4.72\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.0166ms 0.6470ms 1.5457 KOps/s 1.4992 KOps/s $\color{#35bf28}+3.10\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.8695ms 0.6153ms 1.6251 KOps/s 1.6076 KOps/s $\color{#35bf28}+1.09\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.3847s 11.8368ms 84.4821 Ops/s 39.6900 Ops/s $\textbf{\color{#35bf28}+112.85\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 8.1887ms 2.3465ms 426.1680 Ops/s 420.7105 Ops/s $\color{#35bf28}+1.30\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 5.1088ms 1.3625ms 733.9694 Ops/s 739.9754 Ops/s $\color{#d91a1a}-0.81\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 5.4098ms 4.1961ms 238.3176 Ops/s 225.9292 Ops/s $\textbf{\color{#35bf28}+5.48\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 8.4435ms 2.4066ms 415.5280 Ops/s 418.8999 Ops/s $\color{#d91a1a}-0.80\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 3.6259ms 1.5468ms 646.4940 Ops/s 742.9617 Ops/s $\textbf{\color{#d91a1a}-12.98\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.3506s 11.3368ms 88.2085 Ops/s 222.1205 Ops/s $\textbf{\color{#d91a1a}-60.29\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 3.5585ms 2.2315ms 448.1266 Ops/s 415.2894 Ops/s $\textbf{\color{#35bf28}+7.91\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 2.0542ms 1.3899ms 719.4835 Ops/s 649.0486 Ops/s $\textbf{\color{#35bf28}+10.85\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 15.6472ms 12.9824ms 77.0274 Ops/s 70.5811 Ops/s $\textbf{\color{#35bf28}+9.13\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 15.8654ms 14.5129ms 68.9044 Ops/s 67.2141 Ops/s $\color{#35bf28}+2.51\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 24.0669ms 21.9232ms 45.6138 Ops/s 43.1938 Ops/s $\textbf{\color{#35bf28}+5.60\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 18.0987ms 15.0164ms 66.5937 Ops/s 66.4257 Ops/s $\color{#35bf28}+0.25\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 23.2512ms 21.5067ms 46.4972 Ops/s 45.0600 Ops/s $\color{#35bf28}+3.19\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 17.5160ms 16.1060ms 62.0885 Ops/s 60.4746 Ops/s $\color{#35bf28}+2.67\%$
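
The Change column in these benchmark tables appears to be the relative difference between the Ops value for this PR and the Ops on Repo HEAD value. That reading is inferred from the reported numbers and is not stated by the benchmark bot. A minimal Python sketch under that assumption (the function name `relative_change` is hypothetical, and both inputs must be in the same unit, so KOps/s values need converting to Ops/s first):

```python
def relative_change(ops_new: float, ops_head: float) -> float:
    """Percent change in throughput relative to the baseline (repo HEAD).

    Assumption: this mirrors how the Change column is derived; the bot's
    actual implementation is not shown in this thread.
    """
    if ops_head <= 0:
        raise ValueError("baseline throughput must be positive")
    return (ops_new / ops_head - 1.0) * 100.0


# CPU row test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400]:
# 84.4821 Ops/s on this PR vs 39.6900 Ops/s on repo HEAD.
print(f"{relative_change(84.4821, 39.6900):+.2f}%")  # +112.85%
```

Because the table shows rounded throughput values, recomputing a row this way can differ from the reported Change in the last digit.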

github-actions bot commented Dec 19, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 149. Improved: $\large\color{#35bf28}22$. Worsened: $\large\color{#d91a1a}7$.

Name | Max | Mean | Ops | Ops on Repo HEAD | Change
test_simple 0.7216s 0.7213s 1.3864 Ops/s 1.3452 Ops/s $\color{#35bf28}+3.07\%$
test_transformed 0.9729s 0.9716s 1.0293 Ops/s 1.0287 Ops/s $\color{#35bf28}+0.05\%$
test_serial 2.2366s 2.1568s 0.4637 Ops/s 0.4672 Ops/s $\color{#d91a1a}-0.76\%$
test_parallel 1.9862s 1.8617s 0.5371 Ops/s 0.5311 Ops/s $\color{#35bf28}+1.14\%$
test_step_mdp_speed[True-True-True-True-True] 0.1827ms 40.4960μs 24.6938 KOps/s 24.7154 KOps/s $\color{#d91a1a}-0.09\%$
test_step_mdp_speed[True-True-True-True-False] 55.9910μs 23.6291μs 42.3207 KOps/s 41.9159 KOps/s $\color{#35bf28}+0.97\%$
test_step_mdp_speed[True-True-True-False-True] 0.1286ms 22.6196μs 44.2094 KOps/s 43.9713 KOps/s $\color{#35bf28}+0.54\%$
test_step_mdp_speed[True-True-True-False-False] 41.9400μs 13.0905μs 76.3911 KOps/s 75.3657 KOps/s $\color{#35bf28}+1.36\%$
test_step_mdp_speed[True-True-False-True-True] 73.1320μs 43.3488μs 23.0687 KOps/s 22.6299 KOps/s $\color{#35bf28}+1.94\%$
test_step_mdp_speed[True-True-False-True-False] 57.1510μs 25.8359μs 38.7058 KOps/s 37.9310 KOps/s $\color{#35bf28}+2.04\%$
test_step_mdp_speed[True-True-False-False-True] 59.8810μs 25.2060μs 39.6731 KOps/s 38.6319 KOps/s $\color{#35bf28}+2.70\%$
test_step_mdp_speed[True-True-False-False-False] 41.5010μs 15.4986μs 64.5220 KOps/s 62.3767 KOps/s $\color{#35bf28}+3.44\%$
test_step_mdp_speed[True-False-True-True-True] 81.8420μs 45.8657μs 21.8028 KOps/s 21.7382 KOps/s $\color{#35bf28}+0.30\%$
test_step_mdp_speed[True-False-True-True-False] 59.2810μs 28.6026μs 34.9619 KOps/s 34.1722 KOps/s $\color{#35bf28}+2.31\%$
test_step_mdp_speed[True-False-True-False-True] 0.1292ms 25.0022μs 39.9964 KOps/s 38.8645 KOps/s $\color{#35bf28}+2.91\%$
test_step_mdp_speed[True-False-True-False-False] 44.5900μs 15.3298μs 65.2323 KOps/s 63.1364 KOps/s $\color{#35bf28}+3.32\%$
test_step_mdp_speed[True-False-False-True-True] 79.8120μs 47.7578μs 20.9390 KOps/s 20.2466 KOps/s $\color{#35bf28}+3.42\%$
test_step_mdp_speed[True-False-False-True-False] 58.1610μs 30.9301μs 32.3310 KOps/s 31.8328 KOps/s $\color{#35bf28}+1.57\%$
test_step_mdp_speed[True-False-False-False-True] 59.4710μs 27.3847μs 36.5168 KOps/s 36.4245 KOps/s $\color{#35bf28}+0.25\%$
test_step_mdp_speed[True-False-False-False-False] 47.1110μs 17.8521μs 56.0157 KOps/s 54.9969 KOps/s $\color{#35bf28}+1.85\%$
test_step_mdp_speed[False-True-True-True-True] 95.3310μs 45.7040μs 21.8799 KOps/s 21.5595 KOps/s $\color{#35bf28}+1.49\%$
test_step_mdp_speed[False-True-True-True-False] 62.5410μs 28.5643μs 35.0087 KOps/s 34.2181 KOps/s $\color{#35bf28}+2.31\%$
test_step_mdp_speed[False-True-True-False-True] 53.7700μs 28.8320μs 34.6837 KOps/s 33.8740 KOps/s $\color{#35bf28}+2.39\%$
test_step_mdp_speed[False-True-True-False-False] 46.1310μs 17.2907μs 57.8344 KOps/s 56.1719 KOps/s $\color{#35bf28}+2.96\%$
test_step_mdp_speed[False-True-False-True-True] 75.2810μs 48.1593μs 20.7644 KOps/s 20.5932 KOps/s $\color{#35bf28}+0.83\%$
test_step_mdp_speed[False-True-False-True-False] 60.2710μs 31.0664μs 32.1891 KOps/s 32.0379 KOps/s $\color{#35bf28}+0.47\%$
test_step_mdp_speed[False-True-False-False-True] 3.1517ms 31.7324μs 31.5135 KOps/s 31.1607 KOps/s $\color{#35bf28}+1.13\%$
test_step_mdp_speed[False-True-False-False-False] 48.3110μs 19.7187μs 50.7132 KOps/s 49.4338 KOps/s $\color{#35bf28}+2.59\%$
test_step_mdp_speed[False-False-True-True-True] 83.0910μs 50.5120μs 19.7973 KOps/s 19.3414 KOps/s $\color{#35bf28}+2.36\%$
test_step_mdp_speed[False-False-True-True-False] 67.4310μs 33.3984μs 29.9416 KOps/s 28.9799 KOps/s $\color{#35bf28}+3.32\%$
test_step_mdp_speed[False-False-True-False-True] 61.6510μs 31.5031μs 31.7429 KOps/s 31.6210 KOps/s $\color{#35bf28}+0.39\%$
test_step_mdp_speed[False-False-True-False-False] 49.2810μs 19.5677μs 51.1047 KOps/s 49.6213 KOps/s $\color{#35bf28}+2.99\%$
test_step_mdp_speed[False-False-False-True-True] 82.3010μs 52.7871μs 18.9440 KOps/s 18.7743 KOps/s $\color{#35bf28}+0.90\%$
test_step_mdp_speed[False-False-False-True-False] 75.5410μs 35.4271μs 28.2270 KOps/s 27.2162 KOps/s $\color{#35bf28}+3.71\%$
test_step_mdp_speed[False-False-False-False-True] 61.9310μs 33.2616μs 30.0647 KOps/s 29.7486 KOps/s $\color{#35bf28}+1.06\%$
test_step_mdp_speed[False-False-False-False-False] 54.4410μs 21.9478μs 45.5626 KOps/s 44.7944 KOps/s $\color{#35bf28}+1.71\%$
test_values[generalized_advantage_estimate-True-True] 25.4855ms 25.1305ms 39.7922 Ops/s 39.6488 Ops/s $\color{#35bf28}+0.36\%$
test_values[vec_generalized_advantage_estimate-True-True] 0.1137s 3.1751ms 314.9539 Ops/s 365.2461 Ops/s $\textbf{\color{#d91a1a}-13.77\%}$
test_values[td0_return_estimate-False-False] 0.1050ms 83.0005μs 12.0481 KOps/s 12.2321 KOps/s $\color{#d91a1a}-1.50\%$
test_values[td1_return_estimate-False-False] 56.7292ms 56.3223ms 17.7550 Ops/s 17.4463 Ops/s $\color{#35bf28}+1.77\%$
test_values[vec_td1_return_estimate-False-False] 1.3103ms 1.0923ms 915.5147 Ops/s 912.2233 Ops/s $\color{#35bf28}+0.36\%$
test_values[td_lambda_return_estimate-True-False] 89.8615ms 89.4137ms 11.1840 Ops/s 11.0859 Ops/s $\color{#35bf28}+0.88\%$
test_values[vec_td_lambda_return_estimate-True-False] 1.3791ms 1.0953ms 912.9710 Ops/s 919.2559 Ops/s $\color{#d91a1a}-0.68\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 25.3613ms 25.1124ms 39.8209 Ops/s 39.3749 Ops/s $\color{#35bf28}+1.13\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.0460ms 0.7674ms 1.3032 KOps/s 1.3184 KOps/s $\color{#d91a1a}-1.15\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.7771ms 0.6809ms 1.4686 KOps/s 1.4648 KOps/s $\color{#35bf28}+0.26\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.5442ms 1.4892ms 671.4979 Ops/s 670.9295 Ops/s $\color{#35bf28}+0.08\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.7537ms 0.6957ms 1.4375 KOps/s 1.4388 KOps/s $\color{#d91a1a}-0.09\%$
test_dqn_speed[False-None] 6.8365ms 1.5416ms 648.6933 Ops/s 647.9133 Ops/s $\color{#35bf28}+0.12\%$
test_dqn_speed[False-backward] 2.2071ms 2.1474ms 465.6840 Ops/s 461.0971 Ops/s $\color{#35bf28}+0.99\%$
test_dqn_speed[True-None] 0.7361ms 0.5601ms 1.7855 KOps/s 1.7552 KOps/s $\color{#35bf28}+1.72\%$
test_dqn_speed[True-backward] 1.1971ms 1.1242ms 889.4876 Ops/s 866.1522 Ops/s $\color{#35bf28}+2.69\%$
test_dqn_speed[reduce-overhead-None] 0.9998ms 0.6095ms 1.6406 KOps/s 1.7270 KOps/s $\textbf{\color{#d91a1a}-5.00\%}$
test_dqn_speed[reduce-overhead-backward] 1.1692ms 1.0356ms 965.6041 Ops/s 885.4283 Ops/s $\textbf{\color{#35bf28}+9.06\%}$
test_ddpg_speed[False-None] 3.2682ms 2.9219ms 342.2447 Ops/s 342.0857 Ops/s $\color{#35bf28}+0.05\%$
test_ddpg_speed[False-backward] 4.5827ms 4.2035ms 237.8994 Ops/s 231.0145 Ops/s $\color{#35bf28}+2.98\%$
test_ddpg_speed[True-None] 1.5564ms 1.1529ms 867.4116 Ops/s 887.1161 Ops/s $\color{#d91a1a}-2.22\%$
test_ddpg_speed[True-backward] 2.2530ms 2.2015ms 454.2399 Ops/s 417.1050 Ops/s $\textbf{\color{#35bf28}+8.90\%}$
test_ddpg_speed[reduce-overhead-None] 1.5979ms 1.1811ms 846.6591 Ops/s 878.3088 Ops/s $\color{#d91a1a}-3.60\%$
test_ddpg_speed[reduce-overhead-backward] 1.8402ms 1.6692ms 599.0779 Ops/s 539.5295 Ops/s $\textbf{\color{#35bf28}+11.04\%}$
test_sac_speed[False-None] 8.9300ms 8.1819ms 122.2203 Ops/s 121.8488 Ops/s $\color{#35bf28}+0.30\%$
test_sac_speed[False-backward] 11.7107ms 11.1986ms 89.2969 Ops/s 87.0954 Ops/s $\color{#35bf28}+2.53\%$
test_sac_speed[True-None] 1.9438ms 1.5779ms 633.7677 Ops/s 615.4312 Ops/s $\color{#35bf28}+2.98\%$
test_sac_speed[True-backward] 3.4097ms 3.2892ms 304.0208 Ops/s 298.7594 Ops/s $\color{#35bf28}+1.76\%$
test_sac_speed[reduce-overhead-None] 23.1818ms 12.9370ms 77.2974 Ops/s 78.5243 Ops/s $\color{#d91a1a}-1.56\%$
test_sac_speed[reduce-overhead-backward] 1.4955ms 1.3751ms 727.1963 Ops/s 678.4797 Ops/s $\textbf{\color{#35bf28}+7.18\%}$
test_redq_speed[False-None] 8.3514ms 7.5913ms 131.7296 Ops/s 129.7229 Ops/s $\color{#35bf28}+1.55\%$
test_redq_speed[False-backward] 12.5050ms 11.5261ms 86.7595 Ops/s 85.1039 Ops/s $\color{#35bf28}+1.95\%$
test_redq_speed[True-None] 2.1984ms 2.0172ms 495.7264 Ops/s 485.9364 Ops/s $\color{#35bf28}+2.01\%$
test_redq_speed[True-backward] 4.2948ms 3.9127ms 255.5784 Ops/s 263.4447 Ops/s $\color{#d91a1a}-2.99\%$
test_redq_speed[reduce-overhead-None] 2.2734ms 2.0954ms 477.2423 Ops/s 482.5666 Ops/s $\color{#d91a1a}-1.10\%$
test_redq_speed[reduce-overhead-backward] 3.8235ms 3.7202ms 268.8024 Ops/s 264.9530 Ops/s $\color{#35bf28}+1.45\%$
test_redq_deprec_speed[False-None] 9.7394ms 9.2271ms 108.3770 Ops/s 107.1418 Ops/s $\color{#35bf28}+1.15\%$
test_redq_deprec_speed[False-backward] 12.9333ms 12.2487ms 81.6411 Ops/s 80.6423 Ops/s $\color{#35bf28}+1.24\%$
test_redq_deprec_speed[True-None] 2.5633ms 2.4051ms 415.7755 Ops/s 412.5839 Ops/s $\color{#35bf28}+0.77\%$
test_redq_deprec_speed[True-backward] 4.2651ms 4.0954ms 244.1754 Ops/s 241.9526 Ops/s $\color{#35bf28}+0.92\%$
test_redq_deprec_speed[reduce-overhead-None] 2.6885ms 2.4867ms 402.1364 Ops/s 417.9973 Ops/s $\color{#d91a1a}-3.79\%$
test_redq_deprec_speed[reduce-overhead-backward] 4.7130ms 4.1875ms 238.8045 Ops/s 240.9871 Ops/s $\color{#d91a1a}-0.91\%$
test_td3_speed[False-None] 8.4188ms 8.1008ms 123.4450 Ops/s 123.8486 Ops/s $\color{#d91a1a}-0.33\%$
test_td3_speed[False-backward] 11.2633ms 10.4651ms 95.5554 Ops/s 96.3362 Ops/s $\color{#d91a1a}-0.81\%$
test_td3_speed[True-None] 1.7549ms 1.7057ms 586.2732 Ops/s 610.3481 Ops/s $\color{#d91a1a}-3.94\%$
test_td3_speed[True-backward] 3.5049ms 3.1859ms 313.8848 Ops/s 293.9796 Ops/s $\textbf{\color{#35bf28}+6.77\%}$
test_td3_speed[reduce-overhead-None] 82.3560ms 27.0619ms 36.9523 Ops/s 35.6903 Ops/s $\color{#35bf28}+3.54\%$
test_td3_speed[reduce-overhead-backward] 1.3766ms 1.3357ms 748.6907 Ops/s 682.5973 Ops/s $\textbf{\color{#35bf28}+9.68\%}$
test_cql_speed[False-None] 18.2479ms 17.0337ms 58.7072 Ops/s 58.1744 Ops/s $\color{#35bf28}+0.92\%$
test_cql_speed[False-backward] 22.8195ms 22.2609ms 44.9218 Ops/s 44.5011 Ops/s $\color{#35bf28}+0.95\%$
test_cql_speed[True-None] 3.3633ms 3.0052ms 332.7532 Ops/s 328.8279 Ops/s $\color{#35bf28}+1.19\%$
test_cql_speed[True-backward] 5.7079ms 5.2136ms 191.8063 Ops/s 190.1087 Ops/s $\color{#35bf28}+0.89\%$
test_cql_speed[reduce-overhead-None] 22.5528ms 13.5613ms 73.7392 Ops/s 75.0910 Ops/s $\color{#d91a1a}-1.80\%$
test_cql_speed[reduce-overhead-backward] 1.6743ms 1.5460ms 646.8353 Ops/s 639.2000 Ops/s $\color{#35bf28}+1.19\%$
test_a2c_speed[False-None] 3.3699ms 3.2401ms 308.6352 Ops/s 302.9777 Ops/s $\color{#35bf28}+1.87\%$
test_a2c_speed[False-backward] 6.6194ms 6.1620ms 162.2843 Ops/s 158.8741 Ops/s $\color{#35bf28}+2.15\%$
test_a2c_speed[True-None] 1.4315ms 1.0371ms 964.2332 Ops/s 954.0620 Ops/s $\color{#35bf28}+1.07\%$
test_a2c_speed[True-backward] 2.7305ms 2.6511ms 377.1957 Ops/s 371.3167 Ops/s $\color{#35bf28}+1.58\%$
test_a2c_speed[reduce-overhead-None] 22.0925ms 11.6821ms 85.6014 Ops/s 87.9894 Ops/s $\color{#d91a1a}-2.71\%$
test_a2c_speed[reduce-overhead-backward] 1.0813ms 0.9930ms 1.0071 KOps/s 852.1353 Ops/s $\textbf{\color{#35bf28}+18.18\%}$
test_ppo_speed[False-None] 4.1198ms 3.7641ms 265.6697 Ops/s 266.0200 Ops/s $\color{#d91a1a}-0.13\%$
test_ppo_speed[False-backward] 7.5074ms 6.9390ms 144.1135 Ops/s 139.1795 Ops/s $\color{#35bf28}+3.55\%$
test_ppo_speed[True-None] 1.0298ms 0.9783ms 1.0222 KOps/s 975.0350 Ops/s $\color{#35bf28}+4.83\%$
test_ppo_speed[True-backward] 2.9365ms 2.7733ms 360.5821 Ops/s 348.4295 Ops/s $\color{#35bf28}+3.49\%$
test_ppo_speed[reduce-overhead-None] 0.6483ms 0.5279ms 1.8942 KOps/s 1.8518 KOps/s $\color{#35bf28}+2.29\%$
test_ppo_speed[reduce-overhead-backward] 1.1912ms 1.1333ms 882.3447 Ops/s 844.7375 Ops/s $\color{#35bf28}+4.45\%$
test_reinforce_speed[False-None] 2.4735ms 2.2965ms 435.4454 Ops/s 417.6027 Ops/s $\color{#35bf28}+4.27\%$
test_reinforce_speed[False-backward] 3.9117ms 3.4329ms 291.2990 Ops/s 282.6620 Ops/s $\color{#35bf28}+3.06\%$
test_reinforce_speed[True-None] 1.0043ms 0.8506ms 1.1757 KOps/s 1.1358 KOps/s $\color{#35bf28}+3.51\%$
test_reinforce_speed[True-backward] 2.6886ms 2.6089ms 383.3016 Ops/s 373.8720 Ops/s $\color{#35bf28}+2.52\%$
test_reinforce_speed[reduce-overhead-None] 22.3022ms 11.9600ms 83.6120 Ops/s 85.6157 Ops/s $\color{#d91a1a}-2.34\%$
test_reinforce_speed[reduce-overhead-backward] 1.2192ms 1.0644ms 939.4550 Ops/s 808.5203 Ops/s $\textbf{\color{#35bf28}+16.19\%}$
test_iql_speed[False-None] 9.9005ms 9.4176ms 106.1843 Ops/s 106.0798 Ops/s $\color{#35bf28}+0.10\%$
test_iql_speed[False-backward] 14.1673ms 13.1952ms 75.7853 Ops/s 74.4061 Ops/s $\color{#35bf28}+1.85\%$
test_iql_speed[True-None] 2.3128ms 1.8466ms 541.5259 Ops/s 552.4142 Ops/s $\color{#d91a1a}-1.97\%$
test_iql_speed[True-backward] 4.4358ms 4.3245ms 231.2433 Ops/s 227.3369 Ops/s $\color{#35bf28}+1.72\%$
test_iql_speed[reduce-overhead-None] 20.2689ms 11.6552ms 85.7987 Ops/s 85.9666 Ops/s $\color{#d91a1a}-0.20\%$
test_iql_speed[reduce-overhead-backward] 1.5465ms 1.4643ms 682.9097 Ops/s 643.6462 Ops/s $\textbf{\color{#35bf28}+6.10\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 8.0733ms 6.4644ms 154.6931 Ops/s 154.2175 Ops/s $\color{#35bf28}+0.31\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.5789ms 0.3544ms 2.8220 KOps/s 3.4808 KOps/s $\textbf{\color{#d91a1a}-18.93\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.7249ms 0.3193ms 3.1321 KOps/s 3.6797 KOps/s $\textbf{\color{#d91a1a}-14.88\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.6721ms 6.1716ms 162.0314 Ops/s 160.7850 Ops/s $\color{#35bf28}+0.78\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.0334ms 0.3076ms 3.2509 KOps/s 2.9614 KOps/s $\textbf{\color{#35bf28}+9.78\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.7095ms 0.2957ms 3.3821 KOps/s 3.1566 KOps/s $\textbf{\color{#35bf28}+7.14\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.7011ms 1.3349ms 749.1044 Ops/s 693.6946 Ops/s $\textbf{\color{#35bf28}+7.99\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.6576ms 1.2464ms 802.3066 Ops/s 734.5433 Ops/s $\textbf{\color{#35bf28}+9.23\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 7.3661ms 6.3929ms 156.4223 Ops/s 157.7570 Ops/s $\color{#d91a1a}-0.85\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.8290ms 0.4931ms 2.0281 KOps/s 2.1804 KOps/s $\textbf{\color{#d91a1a}-6.99\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.8476ms 0.4039ms 2.4761 KOps/s 2.1719 KOps/s $\textbf{\color{#35bf28}+14.00\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.6014ms 6.2228ms 160.6993 Ops/s 160.7039 Ops/s $-0.00\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.7382ms 0.3366ms 2.9712 KOps/s 2.7508 KOps/s $\textbf{\color{#35bf28}+8.01\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.7340ms 0.3079ms 3.2481 KOps/s 3.0426 KOps/s $\textbf{\color{#35bf28}+6.75\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.7420ms 6.1816ms 161.7692 Ops/s 161.5689 Ops/s $\color{#35bf28}+0.12\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.9807ms 0.3337ms 2.9970 KOps/s 2.9581 KOps/s $\color{#35bf28}+1.32\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5444ms 0.3011ms 3.3208 KOps/s 3.3519 KOps/s $\color{#d91a1a}-0.93\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.7339ms 6.3918ms 156.4495 Ops/s 157.5602 Ops/s $\color{#d91a1a}-0.70\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.2634ms 0.4727ms 2.1155 KOps/s 2.1973 KOps/s $\color{#d91a1a}-3.72\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.5821ms 0.4249ms 2.3533 KOps/s 2.0391 KOps/s $\textbf{\color{#35bf28}+15.41\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 7.2856ms 5.5176ms 181.2373 Ops/s 180.2241 Ops/s $\color{#35bf28}+0.56\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 10.4294ms 2.2019ms 454.1496 Ops/s 441.6914 Ops/s $\color{#35bf28}+2.82\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 2.2431ms 1.1659ms 857.6846 Ops/s 772.5934 Ops/s $\textbf{\color{#35bf28}+11.01\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 8.7606ms 5.6341ms 177.4921 Ops/s 184.4806 Ops/s $\color{#d91a1a}-3.79\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 7.3956ms 2.1036ms 475.3779 Ops/s 436.1052 Ops/s $\textbf{\color{#35bf28}+9.01\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 8.6748ms 1.3620ms 734.2116 Ops/s 859.9220 Ops/s $\textbf{\color{#d91a1a}-14.62\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.5030s 15.7492ms 63.4952 Ops/s 32.9240 Ops/s $\textbf{\color{#35bf28}+92.85\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 10.2267ms 2.3210ms 430.8566 Ops/s 451.3856 Ops/s $\color{#d91a1a}-4.55\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 7.8926ms 1.4802ms 675.5918 Ops/s 714.3962 Ops/s $\textbf{\color{#d91a1a}-5.43\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 18.4262ms 15.5387ms 64.3553 Ops/s 62.9521 Ops/s $\color{#35bf28}+2.23\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 19.3641ms 17.2433ms 57.9935 Ops/s 55.8739 Ops/s $\color{#35bf28}+3.79\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 20.1453ms 19.8835ms 50.2929 Ops/s 47.5854 Ops/s $\textbf{\color{#35bf28}+5.69\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 19.5786ms 17.5826ms 56.8743 Ops/s 55.1688 Ops/s $\color{#35bf28}+3.09\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 20.0532ms 19.7892ms 50.5325 Ops/s 48.2259 Ops/s $\color{#35bf28}+4.78\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 20.6001ms 18.9292ms 52.8284 Ops/s 50.2847 Ops/s $\textbf{\color{#35bf28}+5.06\%}$

@vmoens vmoens added the performance Performance issue or suggestion for improvement label Dec 20, 2024
[ghstack-poisoned]
vmoens added a commit that referenced this pull request Dec 20, 2024
ghstack-source-id: a4dc1515d8b51f5ec150b2fae4e1a84254f2af09
Pull Request resolved: #2672
@vmoens vmoens merged commit 563e4a9 into gh/vmoens/62/base Dec 20, 2024
64 of 79 checks passed
@vmoens vmoens deleted the gh/vmoens/62/head branch December 20, 2024 10:27
Labels: CLA Signed, performance
2 participants