[Feature] Fine grained DeviceCastTransform #2041

Merged
merged 9 commits into from
Mar 28, 2024

Conversation

@vmoens (Contributor) commented Mar 26, 2024

Blocking merge until #2034 is incorporated

pytorch-bot bot commented Mar 26, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2041

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures, 21 Unrelated Failures

As of commit 05ee1bb with merge base c98754f:

NEW FAILURES - The following jobs have failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label Mar 26, 2024
@vmoens marked this pull request as draft March 26, 2024 18:10
@vmoens added the enhancement label Mar 26, 2024

github-actions bot commented Mar 26, 2024

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 91. Improved: 8. Worsened: 5.

Name Max Mean Ops Ops on Repo HEAD Change
test_single 51.6262ms 51.0994ms 19.5697 Ops/s 18.5271 Ops/s $\textbf{\color{#35bf28}+5.63\%}$
test_sync 34.1883ms 28.0462ms 35.6555 Ops/s 33.0685 Ops/s $\textbf{\color{#35bf28}+7.82\%}$
test_async 49.6882ms 26.4977ms 37.7392 Ops/s 36.9189 Ops/s $\color{#35bf28}+2.22\%$
test_simple 0.3925s 0.3385s 2.9544 Ops/s 3.0761 Ops/s $\color{#d91a1a}-3.96\%$
test_transformed 0.4601s 0.4579s 2.1839 Ops/s 2.0893 Ops/s $\color{#35bf28}+4.53\%$
test_serial 1.2229s 1.1708s 0.8541 Ops/s 0.8560 Ops/s $\color{#d91a1a}-0.22\%$
test_parallel 1.0439s 1.0005s 0.9995 Ops/s 1.0009 Ops/s $\color{#d91a1a}-0.13\%$
test_step_mdp_speed[True-True-True-True-True] 0.1957ms 21.2856μs 46.9801 KOps/s 46.8686 KOps/s $\color{#35bf28}+0.24\%$
test_step_mdp_speed[True-True-True-True-False] 49.6530μs 12.9554μs 77.1882 KOps/s 76.5889 KOps/s $\color{#35bf28}+0.78\%$
test_step_mdp_speed[True-True-True-False-True] 39.6640μs 12.4781μs 80.1404 KOps/s 80.0318 KOps/s $\color{#35bf28}+0.14\%$
test_step_mdp_speed[True-True-True-False-False] 32.1000μs 7.5683μs 132.1293 KOps/s 132.5528 KOps/s $\color{#d91a1a}-0.32\%$
test_step_mdp_speed[True-True-False-True-True] 53.7400μs 22.6895μs 44.0733 KOps/s 43.6883 KOps/s $\color{#35bf28}+0.88\%$
test_step_mdp_speed[True-True-False-True-False] 37.5200μs 14.2261μs 70.2932 KOps/s 70.4234 KOps/s $\color{#d91a1a}-0.18\%$
test_step_mdp_speed[True-True-False-False-True] 37.0090μs 13.5649μs 73.7199 KOps/s 74.0001 KOps/s $\color{#d91a1a}-0.38\%$
test_step_mdp_speed[True-True-False-False-False] 34.5950μs 8.8729μs 112.7022 KOps/s 114.2005 KOps/s $\color{#d91a1a}-1.31\%$
test_step_mdp_speed[True-False-True-True-True] 58.5400μs 23.9534μs 41.7478 KOps/s 41.8569 KOps/s $\color{#d91a1a}-0.26\%$
test_step_mdp_speed[True-False-True-True-False] 44.2730μs 15.7190μs 63.6173 KOps/s 64.6227 KOps/s $\color{#d91a1a}-1.56\%$
test_step_mdp_speed[True-False-True-False-True] 39.9350μs 13.7045μs 72.9687 KOps/s 73.8827 KOps/s $\color{#d91a1a}-1.24\%$
test_step_mdp_speed[True-False-True-False-False] 35.4060μs 8.8229μs 113.3420 KOps/s 114.2803 KOps/s $\color{#d91a1a}-0.82\%$
test_step_mdp_speed[True-False-False-True-True] 59.6610μs 25.4224μs 39.3353 KOps/s 39.9588 KOps/s $\color{#d91a1a}-1.56\%$
test_step_mdp_speed[True-False-False-True-False] 41.3770μs 16.9080μs 59.1438 KOps/s 59.6538 KOps/s $\color{#d91a1a}-0.86\%$
test_step_mdp_speed[True-False-False-False-True] 37.2190μs 14.6107μs 68.4431 KOps/s 68.1102 KOps/s $\color{#35bf28}+0.49\%$
test_step_mdp_speed[True-False-False-False-False] 26.7090μs 10.1257μs 98.7583 KOps/s 100.3163 KOps/s $\color{#d91a1a}-1.55\%$
test_step_mdp_speed[False-True-True-True-True] 57.2870μs 23.7906μs 42.0334 KOps/s 41.5919 KOps/s $\color{#35bf28}+1.06\%$
test_step_mdp_speed[False-True-True-True-False] 41.4680μs 15.5662μs 64.2416 KOps/s 64.5600 KOps/s $\color{#d91a1a}-0.49\%$
test_step_mdp_speed[False-True-True-False-True] 56.1550μs 16.0024μs 62.4905 KOps/s 62.5367 KOps/s $\color{#d91a1a}-0.07\%$
test_step_mdp_speed[False-True-True-False-False] 33.8430μs 9.9639μs 100.3619 KOps/s 99.1894 KOps/s $\color{#35bf28}+1.18\%$
test_step_mdp_speed[False-True-False-True-True] 38.5520μs 25.3450μs 39.4555 KOps/s 38.4517 KOps/s $\color{#35bf28}+2.61\%$
test_step_mdp_speed[False-True-False-True-False] 47.9790μs 16.8309μs 59.4147 KOps/s 60.1541 KOps/s $\color{#d91a1a}-1.23\%$
test_step_mdp_speed[False-True-False-False-True] 45.8260μs 16.9903μs 58.8571 KOps/s 57.8285 KOps/s $\color{#35bf28}+1.78\%$
test_step_mdp_speed[False-True-False-False-False] 40.5560μs 11.1963μs 89.3148 KOps/s 89.0549 KOps/s $\color{#35bf28}+0.29\%$
test_step_mdp_speed[False-False-True-True-True] 58.0990μs 26.4076μs 37.8679 KOps/s 37.5508 KOps/s $\color{#35bf28}+0.84\%$
test_step_mdp_speed[False-False-True-True-False] 46.5770μs 18.1023μs 55.2416 KOps/s 55.0844 KOps/s $\color{#35bf28}+0.29\%$
test_step_mdp_speed[False-False-True-False-True] 39.7650μs 17.1285μs 58.3824 KOps/s 58.0826 KOps/s $\color{#35bf28}+0.52\%$
test_step_mdp_speed[False-False-True-False-False] 33.6320μs 11.2716μs 88.7182 KOps/s 88.7576 KOps/s $\color{#d91a1a}-0.04\%$
test_step_mdp_speed[False-False-False-True-True] 56.3850μs 27.5271μs 36.3279 KOps/s 36.3426 KOps/s $\color{#d91a1a}-0.04\%$
test_step_mdp_speed[False-False-False-True-False] 47.7490μs 19.1622μs 52.1860 KOps/s 51.7586 KOps/s $\color{#35bf28}+0.83\%$
test_step_mdp_speed[False-False-False-False-True] 52.0570μs 18.1157μs 55.2009 KOps/s 54.8586 KOps/s $\color{#35bf28}+0.62\%$
test_step_mdp_speed[False-False-False-False-False] 38.3210μs 12.2786μs 81.4426 KOps/s 81.7991 KOps/s $\color{#d91a1a}-0.44\%$
test_values[generalized_advantage_estimate-True-True] 9.3082ms 9.1031ms 109.8531 Ops/s 109.1597 Ops/s $\color{#35bf28}+0.64\%$
test_values[vec_generalized_advantage_estimate-True-True] 37.7088ms 33.2494ms 30.0758 Ops/s 28.4550 Ops/s $\textbf{\color{#35bf28}+5.70\%}$
test_values[td0_return_estimate-False-False] 0.2324ms 0.1726ms 5.7930 KOps/s 6.1029 KOps/s $\textbf{\color{#d91a1a}-5.08\%}$
test_values[td1_return_estimate-False-False] 25.3061ms 22.7660ms 43.9251 Ops/s 43.7832 Ops/s $\color{#35bf28}+0.32\%$
test_values[vec_td1_return_estimate-False-False] 34.2761ms 33.1358ms 30.1789 Ops/s 26.7208 Ops/s $\textbf{\color{#35bf28}+12.94\%}$
test_values[td_lambda_return_estimate-True-False] 35.5103ms 32.5431ms 30.7285 Ops/s 30.3275 Ops/s $\color{#35bf28}+1.32\%$
test_values[vec_td_lambda_return_estimate-True-False] 34.0708ms 32.9284ms 30.3689 Ops/s 28.1435 Ops/s $\textbf{\color{#35bf28}+7.91\%}$
test_gae_speed[generalized_advantage_estimate-False-1-512] 8.1805ms 8.0865ms 123.6625 Ops/s 123.0808 Ops/s $\color{#35bf28}+0.47\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 2.2775ms 1.9945ms 501.3906 Ops/s 540.9245 Ops/s $\textbf{\color{#d91a1a}-7.31\%}$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.3997ms 0.3391ms 2.9486 KOps/s 2.8891 KOps/s $\color{#35bf28}+2.06\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 41.1463ms 39.8231ms 25.1111 Ops/s 20.9196 Ops/s $\textbf{\color{#35bf28}+20.04\%}$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 3.6196ms 3.0002ms 333.3117 Ops/s 333.0719 Ops/s $\color{#35bf28}+0.07\%$
test_dqn_speed 6.8850ms 1.3307ms 751.4915 Ops/s 748.7925 Ops/s $\color{#35bf28}+0.36\%$
test_ddpg_speed 3.2116ms 2.6632ms 375.4933 Ops/s 381.1700 Ops/s $\color{#d91a1a}-1.49\%$
test_sac_speed 8.5614ms 8.0680ms 123.9462 Ops/s 124.2654 Ops/s $\color{#d91a1a}-0.26\%$
test_redq_speed 13.7058ms 12.9774ms 77.0571 Ops/s 78.7469 Ops/s $\color{#d91a1a}-2.15\%$
test_redq_deprec_speed 14.3489ms 12.6309ms 79.1709 Ops/s 78.3465 Ops/s $\color{#35bf28}+1.05\%$
test_td3_speed 11.6771ms 8.0665ms 123.9695 Ops/s 123.5014 Ops/s $\color{#35bf28}+0.38\%$
test_cql_speed 36.9957ms 35.4927ms 28.1748 Ops/s 27.8304 Ops/s $\color{#35bf28}+1.24\%$
test_a2c_speed 8.6423ms 7.2416ms 138.0918 Ops/s 138.9019 Ops/s $\color{#d91a1a}-0.58\%$
test_ppo_speed 8.7713ms 7.5360ms 132.6958 Ops/s 133.5072 Ops/s $\color{#d91a1a}-0.61\%$
test_reinforce_speed 7.1318ms 6.4609ms 154.7780 Ops/s 155.5235 Ops/s $\color{#d91a1a}-0.48\%$
test_iql_speed 0.1044s 35.8981ms 27.8567 Ops/s 31.4609 Ops/s $\textbf{\color{#d91a1a}-11.46\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 3.3191ms 2.1961ms 455.3514 Ops/s 473.7201 Ops/s $\color{#d91a1a}-3.88\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.9963ms 0.4908ms 2.0376 KOps/s 2.0397 KOps/s $\color{#d91a1a}-0.10\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.7740ms 0.4646ms 2.1522 KOps/s 2.1589 KOps/s $\color{#d91a1a}-0.31\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 3.4950ms 2.1569ms 463.6378 Ops/s 482.7855 Ops/s $\color{#d91a1a}-3.97\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.0621ms 0.4797ms 2.0848 KOps/s 2.0966 KOps/s $\color{#d91a1a}-0.56\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6824ms 0.4558ms 2.1939 KOps/s 2.1927 KOps/s $\color{#35bf28}+0.06\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.7190ms 1.2068ms 828.6612 Ops/s 835.7293 Ops/s $\color{#d91a1a}-0.85\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 4.2383ms 1.1347ms 881.3249 Ops/s 884.4268 Ops/s $\color{#d91a1a}-0.35\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 3.0827ms 2.4075ms 415.3685 Ops/s 454.0136 Ops/s $\textbf{\color{#d91a1a}-8.51\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.1476ms 0.6111ms 1.6364 KOps/s 1.6589 KOps/s $\color{#d91a1a}-1.36\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 1.0031ms 0.5751ms 1.7390 KOps/s 1.7492 KOps/s $\color{#d91a1a}-0.58\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 2.2331ms 2.0840ms 479.8495 Ops/s 481.3325 Ops/s $\color{#d91a1a}-0.31\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.0183ms 0.4921ms 2.0321 KOps/s 2.0296 KOps/s $\color{#35bf28}+0.13\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6006ms 0.4633ms 2.1586 KOps/s 2.1587 KOps/s $-0.01\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 3.2376ms 2.0851ms 479.5996 Ops/s 477.0680 Ops/s $\color{#35bf28}+0.53\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.9165ms 0.4839ms 2.0666 KOps/s 2.0827 KOps/s $\color{#d91a1a}-0.77\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6017ms 0.4567ms 2.1898 KOps/s 2.1718 KOps/s $\color{#35bf28}+0.83\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 2.4876ms 2.2655ms 441.4030 Ops/s 459.0962 Ops/s $\color{#d91a1a}-3.85\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.6861ms 0.6000ms 1.6666 KOps/s 1.6703 KOps/s $\color{#d91a1a}-0.22\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 3.7308ms 0.5828ms 1.7159 KOps/s 1.7174 KOps/s $\color{#d91a1a}-0.09\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 86.3243ms 6.8658ms 145.6490 Ops/s 143.7423 Ops/s $\color{#35bf28}+1.33\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 14.3062ms 12.0747ms 82.8175 Ops/s 83.9017 Ops/s $\color{#d91a1a}-1.29\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 1.5021ms 1.0680ms 936.3155 Ops/s 949.7060 Ops/s $\color{#d91a1a}-1.41\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 82.5388ms 5.2066ms 192.0648 Ops/s 148.7915 Ops/s $\textbf{\color{#35bf28}+29.08\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 91.1283ms 13.6166ms 73.4396 Ops/s 83.7965 Ops/s $\textbf{\color{#d91a1a}-12.36\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 1.4359ms 1.0415ms 960.1263 Ops/s 949.8287 Ops/s $\color{#35bf28}+1.08\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 84.1898ms 5.5483ms 180.2366 Ops/s 180.9042 Ops/s $\color{#d91a1a}-0.37\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 14.4382ms 12.3301ms 81.1024 Ops/s 71.9220 Ops/s $\textbf{\color{#35bf28}+12.76\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 1.9039ms 1.3973ms 715.6773 Ops/s 728.8968 Ops/s $\color{#d91a1a}-1.81\%$
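
The Ops and Change columns in the table above appear to be derived from the other columns: Ops looks like the reciprocal of Mean, and Change looks like the relative difference between Ops and Ops on Repo HEAD. The following is a minimal Python sketch under that assumption. The helper names `ops_per_second` and `percent_change` are illustrative only and are not part of the benchmark tooling.

```python
def ops_per_second(mean_seconds: float) -> float:
    """Throughput implied by a mean runtime (assumption: Ops = 1 / Mean)."""
    return 1.0 / mean_seconds


def percent_change(ops: float, ops_on_head: float) -> float:
    """Relative change of the PR's Ops versus Repo HEAD, in percent (assumed formula)."""
    return (ops - ops_on_head) / ops_on_head * 100.0


# Values copied from the CPU table rows above.
single_ops = ops_per_second(51.0994e-3)           # Mean of test_single, converted to seconds
single_change = percent_change(19.5697, 18.5271)  # test_single: Ops vs Ops on Repo HEAD
iql_change = percent_change(27.8567, 31.4609)     # test_iql_speed: Ops vs Ops on Repo HEAD

print(f"test_single: {single_ops:.4f} Ops/s, {single_change:+.2f}%")
print(f"test_iql_speed: {iql_change:+.2f}%")
```

For these two rows the computed values agree with the reported Ops and Change entries at the displayed precision, which supports this reading of the columns but does not confirm how the CI job actually computes them.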


github-actions bot commented Mar 26, 2024

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 94. Improved: 6. Worsened: 1.

Name Max Mean Ops Ops on Repo HEAD Change
test_single 0.1018s 0.1002s 9.9816 Ops/s 9.5485 Ops/s $\color{#35bf28}+4.54\%$
test_sync 95.6845ms 87.4083ms 11.4406 Ops/s 11.4281 Ops/s $\color{#35bf28}+0.11\%$
test_async 0.1633s 70.9785ms 14.0888 Ops/s 11.7276 Ops/s $\textbf{\color{#35bf28}+20.13\%}$
test_single_pixels 0.1115s 0.1113s 8.9838 Ops/s 9.0726 Ops/s $\color{#d91a1a}-0.98\%$
test_sync_pixels 68.6520ms 66.5519ms 15.0259 Ops/s 15.0775 Ops/s $\color{#d91a1a}-0.34\%$
test_async_pixels 0.1243s 55.7232ms 17.9459 Ops/s 17.9196 Ops/s $\color{#35bf28}+0.15\%$
test_simple 0.6682s 0.6677s 1.4977 Ops/s 1.4820 Ops/s $\color{#35bf28}+1.05\%$
test_transformed 0.9000s 0.8941s 1.1185 Ops/s 1.1164 Ops/s $\color{#35bf28}+0.19\%$
test_serial 2.1713s 2.1270s 0.4701 Ops/s 0.4797 Ops/s $\color{#d91a1a}-1.99\%$
test_parallel 1.8085s 1.7443s 0.5733 Ops/s 0.5686 Ops/s $\color{#35bf28}+0.82\%$
test_step_mdp_speed[True-True-True-True-True] 85.3720μs 33.5771μs 29.7822 KOps/s 29.7143 KOps/s $\color{#35bf28}+0.23\%$
test_step_mdp_speed[True-True-True-True-False] 34.8400μs 19.9471μs 50.1327 KOps/s 50.3572 KOps/s $\color{#d91a1a}-0.45\%$
test_step_mdp_speed[True-True-True-False-True] 0.1186ms 19.2025μs 52.0767 KOps/s 53.0207 KOps/s $\color{#d91a1a}-1.78\%$
test_step_mdp_speed[True-True-True-False-False] 26.7510μs 11.4062μs 87.6719 KOps/s 88.3912 KOps/s $\color{#d91a1a}-0.81\%$
test_step_mdp_speed[True-True-False-True-True] 71.2420μs 35.2393μs 28.3774 KOps/s 28.2278 KOps/s $\color{#35bf28}+0.53\%$
test_step_mdp_speed[True-True-False-True-False] 42.9410μs 21.8713μs 45.7219 KOps/s 45.8072 KOps/s $\color{#d91a1a}-0.19\%$
test_step_mdp_speed[True-True-False-False-True] 45.5910μs 21.1840μs 47.2054 KOps/s 48.4493 KOps/s $\color{#d91a1a}-2.57\%$
test_step_mdp_speed[True-True-False-False-False] 71.1210μs 13.3807μs 74.7345 KOps/s 75.4030 KOps/s $\color{#d91a1a}-0.89\%$
test_step_mdp_speed[True-False-True-True-True] 65.6710μs 37.2876μs 26.8186 KOps/s 26.8747 KOps/s $\color{#d91a1a}-0.21\%$
test_step_mdp_speed[True-False-True-True-False] 58.7000μs 23.9201μs 41.8058 KOps/s 41.9359 KOps/s $\color{#d91a1a}-0.31\%$
test_step_mdp_speed[True-False-True-False-True] 58.4520μs 20.8806μs 47.8914 KOps/s 48.6566 KOps/s $\color{#d91a1a}-1.57\%$
test_step_mdp_speed[True-False-True-False-False] 38.5700μs 13.2741μs 75.3348 KOps/s 75.2661 KOps/s $\color{#35bf28}+0.09\%$
test_step_mdp_speed[True-False-False-True-True] 60.2110μs 39.0899μs 25.5820 KOps/s 25.6982 KOps/s $\color{#d91a1a}-0.45\%$
test_step_mdp_speed[True-False-False-True-False] 51.0310μs 25.4469μs 39.2975 KOps/s 39.0406 KOps/s $\color{#35bf28}+0.66\%$
test_step_mdp_speed[True-False-False-False-True] 40.2900μs 22.3576μs 44.7276 KOps/s 44.7739 KOps/s $\color{#d91a1a}-0.10\%$
test_step_mdp_speed[True-False-False-False-False] 49.6010μs 14.8623μs 67.2844 KOps/s 66.1983 KOps/s $\color{#35bf28}+1.64\%$
test_step_mdp_speed[False-True-True-True-True] 62.0020μs 37.3069μs 26.8047 KOps/s 27.0464 KOps/s $\color{#d91a1a}-0.89\%$
test_step_mdp_speed[False-True-True-True-False] 46.6510μs 23.7708μs 42.0684 KOps/s 42.6558 KOps/s $\color{#d91a1a}-1.38\%$
test_step_mdp_speed[False-True-True-False-True] 88.8910μs 25.0020μs 39.9968 KOps/s 40.6076 KOps/s $\color{#d91a1a}-1.50\%$
test_step_mdp_speed[False-True-True-False-False] 81.1510μs 15.0837μs 66.2968 KOps/s 65.9031 KOps/s $\color{#35bf28}+0.60\%$
test_step_mdp_speed[False-True-False-True-True] 62.3410μs 39.1079μs 25.5703 KOps/s 25.3158 KOps/s $\color{#35bf28}+1.01\%$
test_step_mdp_speed[False-True-False-True-False] 57.1210μs 25.4356μs 39.3150 KOps/s 38.9710 KOps/s $\color{#35bf28}+0.88\%$
test_step_mdp_speed[False-True-False-False-True] 49.9500μs 26.7413μs 37.3954 KOps/s 37.6867 KOps/s $\color{#d91a1a}-0.77\%$
test_step_mdp_speed[False-True-False-False-False] 0.1399ms 16.7668μs 59.6418 KOps/s 58.9917 KOps/s $\color{#35bf28}+1.10\%$
test_step_mdp_speed[False-False-True-True-True] 58.4110μs 40.6382μs 24.6074 KOps/s 24.2803 KOps/s $\color{#35bf28}+1.35\%$
test_step_mdp_speed[False-False-True-True-False] 57.5310μs 27.4775μs 36.3934 KOps/s 36.1544 KOps/s $\color{#35bf28}+0.66\%$
test_step_mdp_speed[False-False-True-False-True] 64.6810μs 26.8262μs 37.2769 KOps/s 37.5162 KOps/s $\color{#d91a1a}-0.64\%$
test_step_mdp_speed[False-False-True-False-False] 71.2410μs 16.9366μs 59.0438 KOps/s 58.2247 KOps/s $\color{#35bf28}+1.41\%$
test_step_mdp_speed[False-False-False-True-True] 77.1020μs 42.5389μs 23.5079 KOps/s 23.1500 KOps/s $\color{#35bf28}+1.55\%$
test_step_mdp_speed[False-False-False-True-False] 54.8510μs 29.1804μs 34.2696 KOps/s 33.7991 KOps/s $\color{#35bf28}+1.39\%$
test_step_mdp_speed[False-False-False-False-True] 55.4710μs 28.2887μs 35.3498 KOps/s 35.5415 KOps/s $\color{#d91a1a}-0.54\%$
test_step_mdp_speed[False-False-False-False-False] 35.4200μs 18.5007μs 54.0521 KOps/s 53.0393 KOps/s $\color{#35bf28}+1.91\%$
test_values[generalized_advantage_estimate-True-True] 23.5012ms 23.0228ms 43.4353 Ops/s 42.0491 Ops/s $\color{#35bf28}+3.30\%$
test_values[vec_generalized_advantage_estimate-True-True] 89.2001ms 3.3342ms 299.9211 Ops/s 313.3695 Ops/s $\color{#d91a1a}-4.29\%$
test_values[td0_return_estimate-False-False] 91.2410μs 61.7037μs 16.2065 KOps/s 16.1460 KOps/s $\color{#35bf28}+0.37\%$
test_values[td1_return_estimate-False-False] 52.7867ms 50.6921ms 19.7270 Ops/s 20.4094 Ops/s $\color{#d91a1a}-3.34\%$
test_values[vec_td1_return_estimate-False-False] 2.0765ms 1.7413ms 574.2765 Ops/s 579.6843 Ops/s $\color{#d91a1a}-0.93\%$
test_values[td_lambda_return_estimate-True-False] 81.5325ms 80.0706ms 12.4890 Ops/s 12.0581 Ops/s $\color{#35bf28}+3.57\%$
test_values[vec_td_lambda_return_estimate-True-False] 2.1582ms 1.7430ms 573.7330 Ops/s 579.2659 Ops/s $\color{#d91a1a}-0.96\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 22.4609ms 22.1547ms 45.1372 Ops/s 42.2871 Ops/s $\textbf{\color{#35bf28}+6.74\%}$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 0.8648ms 0.6723ms 1.4875 KOps/s 1.4897 KOps/s $\color{#d91a1a}-0.15\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.7280ms 0.6280ms 1.5922 KOps/s 1.6109 KOps/s $\color{#d91a1a}-1.16\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.4728ms 1.4286ms 700.0102 Ops/s 701.8852 Ops/s $\color{#d91a1a}-0.27\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.9207ms 0.6485ms 1.5421 KOps/s 1.5598 KOps/s $\color{#d91a1a}-1.13\%$
test_dqn_speed 8.0898ms 1.4356ms 696.5723 Ops/s 696.4597 Ops/s $\color{#35bf28}+0.02\%$
test_ddpg_speed 2.9211ms 2.7448ms 364.3312 Ops/s 371.7533 Ops/s $\color{#d91a1a}-2.00\%$
test_sac_speed 8.5782ms 8.1047ms 123.3857 Ops/s 126.1754 Ops/s $\color{#d91a1a}-2.21\%$
test_redq_speed 12.7311ms 10.3478ms 96.6393 Ops/s 98.1177 Ops/s $\color{#d91a1a}-1.51\%$
test_redq_deprec_speed 11.8628ms 11.2523ms 88.8705 Ops/s 91.9612 Ops/s $\color{#d91a1a}-3.36\%$
test_td3_speed 8.2091ms 8.0411ms 124.3612 Ops/s 125.8392 Ops/s $\color{#d91a1a}-1.17\%$
test_cql_speed 26.3584ms 25.3504ms 39.4471 Ops/s 40.6507 Ops/s $\color{#d91a1a}-2.96\%$
test_a2c_speed 6.3699ms 5.7085ms 175.1772 Ops/s 182.7128 Ops/s $\color{#d91a1a}-4.12\%$
test_ppo_speed 6.2647ms 6.0335ms 165.7414 Ops/s 171.3857 Ops/s $\color{#d91a1a}-3.29\%$
test_reinforce_speed 4.8758ms 4.5868ms 218.0147 Ops/s 224.4359 Ops/s $\color{#d91a1a}-2.86\%$
test_iql_speed 20.2923ms 19.4124ms 51.5135 Ops/s 52.2903 Ops/s $\color{#d91a1a}-1.49\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 3.1632ms 2.9108ms 343.5521 Ops/s 349.8454 Ops/s $\color{#d91a1a}-1.80\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.3226ms 0.5483ms 1.8237 KOps/s 1.8621 KOps/s $\color{#d91a1a}-2.06\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.7609ms 0.5215ms 1.9174 KOps/s 1.9473 KOps/s $\color{#d91a1a}-1.54\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 3.1758ms 2.9185ms 342.6457 Ops/s 345.0473 Ops/s $\color{#d91a1a}-0.70\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.7412ms 0.5394ms 1.8540 KOps/s 1.8912 KOps/s $\color{#d91a1a}-1.97\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.7956ms 0.5176ms 1.9321 KOps/s 1.9775 KOps/s $\color{#d91a1a}-2.29\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.6280ms 1.4176ms 705.4065 Ops/s 719.4860 Ops/s $\color{#d91a1a}-1.96\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.5694ms 1.3554ms 737.7966 Ops/s 749.6419 Ops/s $\color{#d91a1a}-1.58\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 3.3109ms 3.0600ms 326.7952 Ops/s 333.8319 Ops/s $\color{#d91a1a}-2.11\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.8666ms 0.6705ms 1.4915 KOps/s 1.5115 KOps/s $\color{#d91a1a}-1.32\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.8810ms 0.6724ms 1.4872 KOps/s 1.3689 KOps/s $\textbf{\color{#35bf28}+8.64\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 3.1464ms 2.9200ms 342.4665 Ops/s 346.2638 Ops/s $\color{#d91a1a}-1.10\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.7231ms 0.5478ms 1.8256 KOps/s 1.8571 KOps/s $\color{#d91a1a}-1.70\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 4.7145ms 0.5308ms 1.8838 KOps/s 1.9262 KOps/s $\color{#d91a1a}-2.20\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 3.1524ms 2.9564ms 338.2457 Ops/s 345.0272 Ops/s $\color{#d91a1a}-1.97\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.7051ms 0.5395ms 1.8535 KOps/s 1.4877 KOps/s $\textbf{\color{#35bf28}+24.58\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.7339ms 0.5184ms 1.9290 KOps/s 1.9636 KOps/s $\color{#d91a1a}-1.76\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 3.2278ms 3.0575ms 327.0658 Ops/s 329.9026 Ops/s $\color{#d91a1a}-0.86\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.8866ms 0.6708ms 1.4908 KOps/s 1.5027 KOps/s $\color{#d91a1a}-0.79\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 5.1057ms 0.6551ms 1.5266 KOps/s 1.5484 KOps/s $\color{#d91a1a}-1.41\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.1162s 7.0731ms 141.3804 Ops/s 114.5769 Ops/s $\textbf{\color{#35bf28}+23.39\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 17.4568ms 15.2152ms 65.7236 Ops/s 67.6228 Ops/s $\color{#d91a1a}-2.81\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 2.3319ms 1.1180ms 894.4152 Ops/s 935.3138 Ops/s $\color{#d91a1a}-4.37\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.1029s 8.7653ms 114.0856 Ops/s 147.7455 Ops/s $\textbf{\color{#d91a1a}-22.78\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 17.6254ms 15.2137ms 65.7304 Ops/s 66.5585 Ops/s $\color{#d91a1a}-1.24\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 2.3192ms 1.1372ms 879.3433 Ops/s 916.2655 Ops/s $\color{#d91a1a}-4.03\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.1012s 7.1066ms 140.7136 Ops/s 110.0828 Ops/s $\textbf{\color{#35bf28}+27.83\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 18.1159ms 15.4447ms 64.7471 Ops/s 66.3722 Ops/s $\color{#d91a1a}-2.45\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 2.7771ms 1.4694ms 680.5586 Ops/s 701.5309 Ops/s $\color{#d91a1a}-2.99\%$

@vmoens marked this pull request as ready for review March 28, 2024 17:40
@vmoens merged commit 2c485dd into main Mar 28, 2024
9 of 10 checks passed
@vmoens deleted the cast-transform branch March 28, 2024 17:41
Labels: CLA Signed, enhancement