[BugFix] auto-batch-size in dispatch #1109

Merged: vmoens merged 2 commits on Nov 25, 2024.


vmoens (Contributor) commented Nov 25, 2024

[ghstack-poisoned]
facebook-github-bot added the CLA Signed label Nov 25, 2024
vmoens added a commit that referenced this pull request Nov 25, 2024
ghstack-source-id: fe5f7f5b04d08d0eb150ee9bf0fd4698171d43d2
Pull Request resolved: #1109
vmoens changed the title from "[BugFix] auto-batch-size in dipatch" to "[BugFix] auto-batch-size in dispatch" Nov 25, 2024
vmoens added the bug label Nov 25, 2024
vmoens added a commit that referenced this pull request Nov 25, 2024
ghstack-source-id: fe5f7f5b04d08d0eb150ee9bf0fd4698171d43d2
Pull Request resolved: #1109
[ghstack-poisoned]
vmoens added a commit that referenced this pull request Nov 25, 2024
ghstack-source-id: ca5b36195c28da65a20d42699346fbc06083181c
Pull Request resolved: #1109
vmoens merged commit 7b15637 into gh/vmoens/40/base Nov 25, 2024
25 of 37 checks passed
vmoens added a commit that referenced this pull request Nov 25, 2024
ghstack-source-id: ca5b36195c28da65a20d42699346fbc06083181c
Pull Request resolved: #1109
vmoens deleted the gh/vmoens/40/head branch November 25, 2024 08:53

⚠ Warning: Result of GPU Benchmark Tests

Total benchmarks: 229. Improved: 31. Worsened: 8. Judging from the table, only changes of 5% or more in either direction (the bold entries) count toward the Improved and Worsened totals.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_plain_set_nested | 41.6520μs | 10.3883μs | 96.2626 KOps/s | 92.1210 KOps/s | +4.50% |
| test_plain_set_stack_nested | 34.8920μs | 10.4739μs | 95.4757 KOps/s | 92.3599 KOps/s | +3.37% |
| test_plain_set_nested_inplace | 41.9120μs | 11.3885μs | 87.8083 KOps/s | 85.3981 KOps/s | +2.82% |
| test_plain_set_stack_nested_inplace | 37.3220μs | 11.3691μs | 87.9577 KOps/s | 85.5795 KOps/s | +2.78% |
| test_items | 37.2520μs | 2.9160μs | 342.9370 KOps/s | 342.0708 KOps/s | +0.25% |
| test_items_nested | 0.4079ms | 0.3216ms | 3.1091 KOps/s | 3.0974 KOps/s | +0.38% |
| test_items_nested_locked | 0.3676ms | 0.3248ms | 3.0791 KOps/s | 3.0754 KOps/s | +0.12% |
| test_items_nested_leaf | 0.1007ms | 58.2977μs | 17.1533 KOps/s | 17.0492 KOps/s | +0.61% |
| test_items_stack_nested | 0.3784ms | 0.3244ms | 3.0824 KOps/s | 3.0986 KOps/s | -0.52% |
| test_items_stack_nested_leaf | 88.4250μs | 60.0635μs | 16.6490 KOps/s | 16.7141 KOps/s | -0.39% |
| test_items_stack_nested_locked | 0.3838ms | 0.3231ms | 3.0948 KOps/s | 3.1009 KOps/s | -0.20% |
| test_keys | 35.3820μs | 3.5048μs | 285.3225 KOps/s | 288.5344 KOps/s | -1.11% |
| test_keys_nested | 0.1080ms | 70.7479μs | 14.1347 KOps/s | 14.1423 KOps/s | -0.05% |
| test_keys_nested_locked | 0.7852ms | 75.7746μs | 13.1970 KOps/s | 13.0538 KOps/s | +1.10% |
| test_keys_nested_leaf | 0.1028ms | 61.7607μs | 16.1915 KOps/s | 16.2281 KOps/s | -0.23% |
| test_keys_stack_nested | 0.1151ms | 71.0634μs | 14.0719 KOps/s | 14.1995 KOps/s | -0.90% |
| test_keys_stack_nested_leaf | 94.9050μs | 62.7813μs | 15.9283 KOps/s | 16.1101 KOps/s | -1.13% |
| test_keys_stack_nested_locked | 0.1150ms | 76.9137μs | 13.0016 KOps/s | 13.1017 KOps/s | -0.76% |
| test_values | 4.9185μs | 0.8468μs | 1.1809 MOps/s | 1.1793 MOps/s | +0.14% |
| test_values_nested | 64.1130μs | 31.3850μs | 31.8623 KOps/s | 32.0966 KOps/s | -0.73% |
| test_values_nested_locked | 72.4340μs | 32.8884μs | 30.4059 KOps/s | 30.4367 KOps/s | -0.10% |
| test_values_nested_leaf | 91.3350μs | 33.5721μs | 29.7866 KOps/s | 29.8642 KOps/s | -0.26% |
| test_values_stack_nested | 61.7340μs | 31.8036μs | 31.4429 KOps/s | 31.5442 KOps/s | -0.32% |
| test_values_stack_nested_leaf | 57.1730μs | 34.5041μs | 28.9821 KOps/s | 29.2624 KOps/s | -0.96% |
| test_values_stack_nested_locked | 65.8830μs | 33.5546μs | 29.8022 KOps/s | 30.0766 KOps/s | -0.91% |
| test_membership | 1.9121μs | 0.5078μs | 1.9692 MOps/s | 1.9871 MOps/s | -0.90% |
| test_membership_nested | 16.1310μs | 1.9032μs | 525.4302 KOps/s | 516.5602 KOps/s | +1.72% |
| test_membership_nested_leaf | 13.9960μs | 1.9335μs | 517.1840 KOps/s | 506.2981 KOps/s | +2.15% |
| test_membership_stacked_nested | 32.6420μs | 2.0123μs | 496.9414 KOps/s | 492.0776 KOps/s | +0.99% |
| test_membership_stacked_nested_leaf | 45.9220μs | 1.9384μs | 515.8920 KOps/s | 497.8925 KOps/s | +3.62% |
| test_membership_nested_last | 38.1520μs | 2.8460μs | 351.3699 KOps/s | 347.6808 KOps/s | +1.06% |
| test_membership_nested_leaf_last | 29.3810μs | 2.8678μs | 348.6979 KOps/s | 346.1113 KOps/s | +0.75% |
| test_membership_stacked_nested_last | 31.8110μs | 3.3054μs | 302.5343 KOps/s | 207.8609 KOps/s | **+45.55%** |
| test_membership_stacked_nested_leaf_last | 31.9810μs | 3.3061μs | 302.4668 KOps/s | 206.5862 KOps/s | **+46.41%** |
| test_nested_getleaf | 49.0120μs | 5.9844μs | 167.1009 KOps/s | 168.3840 KOps/s | -0.76% |
| test_nested_get | 33.6120μs | 5.7214μs | 174.7818 KOps/s | 175.5190 KOps/s | -0.42% |
| test_stacked_getleaf | 44.1820μs | 5.9654μs | 167.6327 KOps/s | 165.1576 KOps/s | +1.50% |
| test_stacked_get | 29.6110μs | 5.6885μs | 175.7930 KOps/s | 175.4637 KOps/s | +0.19% |
| test_nested_getitemleaf | 28.7520μs | 6.0727μs | 164.6714 KOps/s | 162.0122 KOps/s | +1.64% |
| test_nested_getitem | 27.5210μs | 5.7815μs | 172.9659 KOps/s | 171.3101 KOps/s | +0.97% |
| test_stacked_getitemleaf | 35.3620μs | 6.0783μs | 164.5192 KOps/s | 163.8056 KOps/s | +0.44% |
| test_stacked_getitem | 28.7420μs | 5.7421μs | 174.1523 KOps/s | 172.3546 KOps/s | +1.04% |
| test_lock_nested | 9.1912ms | 0.3724ms | 2.6851 KOps/s | 2.6829 KOps/s | +0.08% |
| test_lock_stack_nested | 0.3900ms | 0.3302ms | 3.0284 KOps/s | 2.9759 KOps/s | +1.76% |
| test_unlock_nested | 0.6501ms | 0.3069ms | 3.2579 KOps/s | 3.2323 KOps/s | +0.79% |
| test_unlock_stack_nested | 0.3097ms | 0.2727ms | 3.6673 KOps/s | 3.6307 KOps/s | +1.01% |
| test_flatten_speed | 97.0450μs | 72.5279μs | 13.7878 KOps/s | 13.4329 KOps/s | +2.64% |
| test_unflatten_speed | 0.3334ms | 0.2988ms | 3.3462 KOps/s | 3.3661 KOps/s | -0.59% |
| test_common_ops | 1.6569ms | 0.5680ms | 1.7605 KOps/s | 1.6690 KOps/s | **+5.48%** |
| test_creation | 96.9350μs | 1.4656μs | 682.2943 KOps/s | 699.3221 KOps/s | -2.43% |
| test_creation_empty | 28.4520μs | 6.9333μs | 144.2324 KOps/s | 130.7191 KOps/s | **+10.34%** |
| test_creation_nested_1 | 45.5720μs | 8.5054μs | 117.5729 KOps/s | 107.5781 KOps/s | **+9.29%** |
| test_creation_nested_2 | 29.9820μs | 11.1179μs | 89.9451 KOps/s | 85.0303 KOps/s | **+5.78%** |
| test_clone | 93.7540μs | 10.3722μs | 96.4118 KOps/s | 92.4463 KOps/s | +4.29% |
| test_getitem[int] | 1.7352ms | 10.4963μs | 95.2713 KOps/s | 92.7088 KOps/s | +2.76% |
| test_getitem[slice_int] | 0.1159ms | 20.0078μs | 49.9806 KOps/s | 48.5400 KOps/s | +2.97% |
| test_getitem[range] | 0.1599ms | 36.5183μs | 27.3835 KOps/s | 26.9104 KOps/s | +1.76% |
| test_getitem[tuple] | 0.1098ms | 17.7100μs | 56.4652 KOps/s | 54.8308 KOps/s | +2.98% |
| test_getitem[list] | 0.6004ms | 34.3817μs | 29.0852 KOps/s | 30.5075 KOps/s | -4.66% |
| test_setitem_dim[int] | 43.2120μs | 20.1826μs | 49.5477 KOps/s | 53.2414 KOps/s | **-6.94%** |
| test_setitem_dim[slice_int] | 90.2440μs | 40.6043μs | 24.6279 KOps/s | 26.7571 KOps/s | **-7.96%** |
| test_setitem_dim[range] | 0.1222ms | 56.7839μs | 17.6106 KOps/s | 18.7662 KOps/s | **-6.16%** |
| test_setitem_dim[tuple] | 82.0840μs | 33.8517μs | 29.5406 KOps/s | 31.7355 KOps/s | **-6.92%** |
| test_setitem | 83.5640μs | 15.4208μs | 64.8474 KOps/s | 63.9621 KOps/s | +1.38% |
| test_set | 87.3240μs | 14.8161μs | 67.4941 KOps/s | 65.5914 KOps/s | +2.90% |
| test_set_shared | 1.6246ms | 0.1452ms | 6.8852 KOps/s | 6.8037 KOps/s | +1.20% |
| test_update | 0.3599ms | 16.3002μs | 61.3491 KOps/s | 56.4068 KOps/s | **+8.76%** |
| test_update_nested | 86.2350μs | 21.4389μs | 46.6442 KOps/s | 44.3219 KOps/s | **+5.24%** |
| test_update__nested | 0.8083ms | 24.0806μs | 41.5272 KOps/s | 40.0244 KOps/s | +3.75% |
| test_set_nested | 76.9540μs | 15.0398μs | 66.4905 KOps/s | 60.9344 KOps/s | **+9.12%** |
| test_set_nested_new | 94.4950μs | 16.9898μs | 58.8590 KOps/s | 54.3965 KOps/s | **+8.20%** |
| test_select | 90.0940μs | 28.9199μs | 34.5782 KOps/s | 32.9701 KOps/s | +4.88% |
| test_select_nested | 84.0250μs | 41.6223μs | 24.0256 KOps/s | 23.5657 KOps/s | +1.95% |
| test_exclude_nested | 90.1250μs | 59.1752μs | 16.8990 KOps/s | 16.8304 KOps/s | +0.41% |
| test_empty[True] | 0.3024ms | 0.2573ms | 3.8870 KOps/s | 3.9096 KOps/s | -0.58% |
| test_empty[False] | 3.6942μs | 0.7396μs | 1.3520 MOps/s | 1.3448 MOps/s | +0.54% |
| test_to | 88.1450μs | 54.7129μs | 18.2772 KOps/s | 18.5069 KOps/s | -1.24% |
| test_to_nonblocking | 90.9140μs | 45.1148μs | 22.1657 KOps/s | 21.8980 KOps/s | +1.22% |
| test_unbind_speed | 1.7570ms | 0.2299ms | 4.3499 KOps/s | 4.2727 KOps/s | +1.81% |
| test_unbind_speed_stack0 | 0.2917ms | 0.2287ms | 4.3734 KOps/s | 4.2744 KOps/s | +2.31% |
| test_unbind_speed_stack1 | 91.4211ms | 0.6396ms | 1.5634 KOps/s | 1.5422 KOps/s | +1.37% |
| test_split | 93.9186ms | 1.5659ms | 638.5958 Ops/s | 622.3903 Ops/s | +2.60% |
| test_chunk | 93.7139ms | 1.5463ms | 646.7209 Ops/s | 626.2122 Ops/s | +3.28% |
| test_consolidate[False-None] | 95.6074ms | 2.8084ms | 356.0797 Ops/s | 349.4076 Ops/s | +1.91% |
| test_consolidate[default-None] | 2.0265ms | 1.6326ms | 612.5155 Ops/s | 589.7904 Ops/s | +3.85% |
| test_consolidate[reduce-overhead-None] | 2.0633ms | 1.6654ms | 600.4581 Ops/s | 578.4181 Ops/s | +3.81% |
| test_consolidate_njt[False-None] | 6.8771ms | 6.5066ms | 153.6906 Ops/s | 111.1635 Ops/s | **+38.26%** |
| test_to[False-False-None] | 2.0730ms | 1.6645ms | 600.7768 Ops/s | 598.4191 Ops/s | +0.39% |
| test_to[True-False-None] | 1.6442ms | 1.2540ms | 797.4741 Ops/s | 792.9677 Ops/s | +0.57% |
| test_to[within-False-None] | 4.3141ms | 3.9528ms | 252.9852 Ops/s | 249.2839 Ops/s | +1.48% |
| test_to[True-default-None] | 5.5071ms | 5.1059ms | 195.8534 Ops/s | 185.5636 Ops/s | **+5.55%** |
| test_to_njt[False-False-None] | 7.3496ms | 6.9370ms | 144.1540 Ops/s | 140.0808 Ops/s | +2.91% |
| test_to_njt[True-False-None] | 6.3057ms | 5.4943ms | 182.0079 Ops/s | 180.7839 Ops/s | +0.68% |
| test_to_njt[within-False-None] | 12.6000ms | 12.0881ms | 82.7260 Ops/s | 81.4417 Ops/s | +1.58% |
| test_creation[device0] | 0.4940ms | 78.7996μs | 12.6904 KOps/s | 12.6300 KOps/s | +0.48% |
| test_creation_from_tensor | 0.5164ms | 82.4832μs | 12.1237 KOps/s | 12.2688 KOps/s | -1.18% |
| test_add_one[memmap_tensor0] | 0.2666ms | 6.5760μs | 152.0675 KOps/s | 142.3964 KOps/s | **+6.79%** |
| test_contiguous[memmap_tensor0] | 19.3470μs | 0.3920μs | 2.5511 MOps/s | 2.3956 MOps/s | **+6.49%** |
| test_stack[memmap_tensor0] | 44.3220μs | 4.3320μs | 230.8420 KOps/s | 213.0591 KOps/s | **+8.35%** |
| test_memmaptd_index | 1.7632ms | 0.2475ms | 4.0411 KOps/s | 3.9677 KOps/s | +1.85% |
| test_memmaptd_index_astensor | 0.6712ms | 0.3037ms | 3.2930 KOps/s | 3.2234 KOps/s | +2.16% |
| test_memmaptd_index_op | 0.9739ms | 0.5620ms | 1.7794 KOps/s | 1.6937 KOps/s | **+5.06%** |
| test_serialize_model | 0.1308s | 0.1298s | 7.7022 Ops/s | 7.6943 Ops/s | +0.10% |
| test_serialize_model_pickle | 1.3468s | 1.2149s | 0.8231 Ops/s | 0.8113 Ops/s | +1.46% |
| test_serialize_weights | 0.1306s | 0.1291s | 7.7456 Ops/s | 7.7356 Ops/s | +0.13% |
| test_serialize_weights_returnearly | 0.2961s | 55.4824ms | 18.0237 Ops/s | 15.1530 Ops/s | **+18.94%** |
| test_serialize_weights_pickle | 1.4047s | 1.2038s | 0.8307 Ops/s | 0.8127 Ops/s | +2.21% |
| test_reshape_pytree | 55.6920μs | 22.5719μs | 44.3029 KOps/s | 45.3490 KOps/s | -2.31% |
| test_reshape_td | 0.4052ms | 27.8941μs | 35.8498 KOps/s | 37.5842 KOps/s | -4.61% |
| test_view_pytree | 68.2330μs | 21.9408μs | 45.5772 KOps/s | 45.5755 KOps/s | +0.00% |
| test_view_td | 0.1503ms | 29.5266μs | 33.8677 KOps/s | 32.5920 KOps/s | +3.91% |
| test_unbind_pytree | 84.5740μs | 27.6489μs | 36.1678 KOps/s | 35.5400 KOps/s | +1.77% |
| test_unbind_td | 0.5854ms | 34.8455μs | 28.6981 KOps/s | 27.5321 KOps/s | +4.24% |
| test_split_pytree | 72.2830μs | 29.5134μs | 33.8829 KOps/s | 33.6460 KOps/s | +0.70% |
| test_split_td | 0.7689ms | 37.7650μs | 26.4796 KOps/s | 25.8262 KOps/s | +2.53% |
| test_add_pytree | 69.5640μs | 34.2981μs | 29.1562 KOps/s | 28.6855 KOps/s | +1.64% |
| test_add_td | 78.5340μs | 44.4317μs | 22.5065 KOps/s | 21.2116 KOps/s | **+6.10%** |
| test_compile_add_one_nested[tensordict-compile] | 0.2364ms | 0.1185ms | 8.4418 KOps/s | 7.9761 KOps/s | **+5.84%** |
| test_compile_add_one_nested[tensordict-eager] | 0.2176ms | 0.1236ms | 8.0881 KOps/s | 8.0509 KOps/s | +0.46% |
| test_compile_add_one_nested[pytree-compile] | 0.1728ms | 95.9050μs | 10.4270 KOps/s | 9.9900 KOps/s | +4.37% |
| test_compile_add_one_nested[pytree-eager] | 1.3799ms | 0.1461ms | 6.8462 KOps/s | 6.5996 KOps/s | +3.74% |
| test_compile_copy_nested[tensordict-compile] | 62.1330μs | 23.1799μs | 43.1409 KOps/s | 44.7382 KOps/s | -3.57% |
| test_compile_copy_nested[tensordict-eager] | 0.3895ms | 26.2702μs | 38.0660 KOps/s | 37.3399 KOps/s | +1.94% |
| test_compile_copy_nested[pytree-compile] | 0.4444ms | 64.0826μs | 15.6048 KOps/s | 15.1943 KOps/s | +2.70% |
| test_compile_copy_nested[pytree-eager] | 87.0140μs | 49.6570μs | 20.1382 KOps/s | 19.8969 KOps/s | +1.21% |
| test_compile_add_one_flat[tensordict-compile] | 0.2285ms | 0.1424ms | 7.0227 KOps/s | 7.0886 KOps/s | -0.93% |
| test_compile_add_one_flat[tensordict-eager] | 0.2981ms | 0.2063ms | 4.8479 KOps/s | 4.8922 KOps/s | -0.91% |
| test_compile_add_one_flat[tensorclass-compile] | 0.2468ms | 97.0813μs | 10.3006 KOps/s | 10.0749 KOps/s | +2.24% |
| test_compile_add_one_flat[tensorclass-eager] | 0.1779ms | 51.3157μs | 19.4872 KOps/s | 18.9075 KOps/s | +3.07% |
| test_compile_add_one_flat[pytree-compile] | 0.2814ms | 0.1359ms | 7.3596 KOps/s | 7.3744 KOps/s | -0.20% |
| test_compile_add_one_flat[pytree-eager] | 0.6159ms | 0.4681ms | 2.1364 KOps/s | 1.8865 KOps/s | **+13.25%** |
| test_compile_add_self_flat[tensordict-eager] | 0.3621ms | 0.2460ms | 4.0657 KOps/s | 4.0701 KOps/s | -0.11% |
| test_compile_add_self_flat[tensordict-compile] | 0.2529ms | 0.1434ms | 6.9723 KOps/s | 7.0714 KOps/s | -1.40% |
| test_compile_add_self_flat[tensorclass-eager] | 0.1444ms | 60.6676μs | 16.4833 KOps/s | 16.3224 KOps/s | +0.99% |
| test_compile_add_self_flat[tensorclass-compile] | 0.1592ms | 98.1524μs | 10.1882 KOps/s | 10.3731 KOps/s | -1.78% |
| test_compile_add_self_flat[pytree-eager] | 0.5728ms | 0.4046ms | 2.4713 KOps/s | 2.5209 KOps/s | -1.97% |
| test_compile_add_self_flat[pytree-compile] | 0.1862ms | 0.1359ms | 7.3570 KOps/s | 7.4326 KOps/s | -1.02% |
| test_compile_copy_flat[tensordict-compile] | 71.5840μs | 19.5771μs | 51.0801 KOps/s | 53.7579 KOps/s | -4.98% |
| test_compile_copy_flat[tensordict-eager] | 87.8340μs | 27.4783μs | 36.3923 KOps/s | 37.7465 KOps/s | -3.59% |
| test_compile_copy_flat[pytree-compile] | 0.1118ms | 69.6306μs | 14.3615 KOps/s | 14.3353 KOps/s | +0.18% |
| test_compile_copy_flat[pytree-eager] | 0.1594ms | 51.2070μs | 19.5286 KOps/s | 19.3939 KOps/s | +0.69% |
| test_compile_assign_and_add[tensordict-compile] | 1.6082ms | 0.3891ms | 2.5701 KOps/s | 2.1386 KOps/s | **+20.18%** |
| test_compile_assign_and_add[tensordict-eager] | 2.7019ms | 2.5636ms | 390.0778 Ops/s | 371.7801 Ops/s | +4.92% |
| test_compile_assign_and_add[pytree-compile] | 1.5960ms | 0.4345ms | 2.3013 KOps/s | 2.2666 KOps/s | +1.53% |
| test_compile_assign_and_add[pytree-eager] | 2.7757ms | 2.5872ms | 386.5157 Ops/s | 378.0587 Ops/s | +2.24% |
| test_compile_indexing[tensor-tensordict-compile] | 0.1536ms | 0.1117ms | 8.9490 KOps/s | 8.8309 KOps/s | +1.34% |
| test_compile_indexing[tensor-tensordict-eager] | 0.5618ms | 78.7741μs | 12.6945 KOps/s | 12.1089 KOps/s | +4.84% |
| test_compile_indexing[tensor-tensorclass-compile] | 0.1930ms | 0.1114ms | 8.9732 KOps/s | 9.4243 KOps/s | -4.79% |
| test_compile_indexing[tensor-tensorclass-eager] | 0.4893ms | 68.5638μs | 14.5849 KOps/s | 14.7420 KOps/s | -1.07% |
| test_compile_indexing[tensor-pytree-compile] | 0.5031ms | 0.1108ms | 9.0219 KOps/s | 9.5687 KOps/s | **-5.71%** |
| test_compile_indexing[tensor-pytree-eager] | 0.4556ms | 69.5702μs | 14.3740 KOps/s | 14.9355 KOps/s | -3.76% |
| test_compile_indexing[slice-tensordict-compile] | 0.1362ms | 0.1002ms | 9.9831 KOps/s | 9.6562 KOps/s | +3.39% |
| test_compile_indexing[slice-tensordict-eager] | 0.4234ms | 18.6077μs | 53.7411 KOps/s | 55.0598 KOps/s | -2.40% |
| test_compile_indexing[slice-tensorclass-compile] | 0.4921ms | 98.4844μs | 10.1539 KOps/s | 10.1673 KOps/s | -0.13% |
| test_compile_indexing[slice-tensorclass-eager] | 0.4017ms | 16.5786μs | 60.3189 KOps/s | 63.1165 KOps/s | -4.43% |
| test_compile_indexing[slice-pytree-compile] | 0.5028ms | 98.6277μs | 10.1391 KOps/s | 9.9578 KOps/s | +1.82% |
| test_compile_indexing[slice-pytree-eager] | 56.7830μs | 16.3186μs | 61.2798 KOps/s | 62.9969 KOps/s | -2.73% |
| test_compile_indexing[int-tensordict-compile] | 0.5165ms | 0.1050ms | 9.5204 KOps/s | 9.7360 KOps/s | -2.21% |
| test_compile_indexing[int-tensordict-eager] | 0.5792ms | 18.0808μs | 55.3072 KOps/s | 56.8424 KOps/s | -2.70% |
| test_compile_indexing[int-tensorclass-compile] | 0.4953ms | 0.1013ms | 9.8760 KOps/s | 10.0660 KOps/s | -1.89% |
| test_compile_indexing[int-tensorclass-eager] | 0.4223ms | 16.2312μs | 61.6099 KOps/s | 63.7013 KOps/s | -3.28% |
| test_compile_indexing[int-pytree-compile] | 0.4889ms | 99.5374μs | 10.0465 KOps/s | 9.9563 KOps/s | +0.91% |
| test_compile_indexing[int-pytree-eager] | 0.3993ms | 15.9177μs | 62.8230 KOps/s | 64.0035 KOps/s | -1.84% |
| test_mod_add[eager] | 84.6940μs | 33.0938μs | 30.2172 KOps/s | 28.5809 KOps/s | **+5.72%** |
| test_mod_add[compile] | 0.4833ms | 79.1903μs | 12.6278 KOps/s | 12.3792 KOps/s | +2.01% |
| test_mod_add[compile-overhead] | 0.3225ms | 0.1653ms | 6.0512 KOps/s | 5.7853 KOps/s | +4.60% |
| test_mod_wrap[eager] | 0.6415ms | 0.2399ms | 4.1691 KOps/s | 4.0808 KOps/s | +2.16% |
| test_mod_wrap[compile] | 0.3556ms | 0.2952ms | 3.3880 KOps/s | 3.5094 KOps/s | -3.46% |
| test_mod_wrap[compile-overhead] | 7.2048ms | 3.7289ms | 268.1740 Ops/s | 262.2447 Ops/s | +2.26% |
| test_mod_wrap_and_backward[eager] | 1.6142ms | 1.4743ms | 678.2945 Ops/s | 693.1764 Ops/s | -2.15% |
| test_mod_wrap_and_backward[compile] | 1.4305ms | 1.3503ms | 740.5621 Ops/s | 730.9096 Ops/s | +1.32% |
| test_mod_wrap_and_backward[compile-overhead] | 1.5262ms | 1.0305ms | 970.4423 Ops/s | 967.0572 Ops/s | +0.35% |
| test_seq_add[eager] | 0.2252ms | 98.1831μs | 10.1851 KOps/s | 9.4774 KOps/s | **+7.47%** |
| test_seq_add[compile] | 0.1499ms | 91.6416μs | 10.9121 KOps/s | 11.3923 KOps/s | -4.22% |
| test_seq_add[compile-overhead] | 0.2740ms | 0.1299ms | 7.6958 KOps/s | 7.7799 KOps/s | -1.08% |
| test_seq_wrap[eager] | 0.4727ms | 0.3987ms | 2.5080 KOps/s | 2.5615 KOps/s | -2.09% |
| test_seq_wrap[compile] | 0.5208ms | 0.3124ms | 3.2010 KOps/s | 3.3240 KOps/s | -3.70% |
| test_seq_wrap[compile-overhead] | 0.2918ms | 0.2271ms | 4.4032 KOps/s | 4.3284 KOps/s | +1.73% |
| test_func_call_runtime[False-eager] | 0.9157ms | 0.7628ms | 1.3110 KOps/s | 1.2466 KOps/s | **+5.16%** |
| test_func_call_runtime[False-compile] | 0.7749ms | 0.7337ms | 1.3629 KOps/s | 1.3278 KOps/s | +2.64% |
| test_func_call_runtime[False-compile-overhead] | 0.4424ms | 0.3569ms | 2.8018 KOps/s | 2.7472 KOps/s | +1.99% |
| test_func_call_runtime[True-eager] | 1.0446ms | 0.8819ms | 1.1339 KOps/s | 1.1072 KOps/s | +2.41% |
| test_func_call_runtime[True-compile] | 0.8699ms | 0.7541ms | 1.3261 KOps/s | 1.2963 KOps/s | +2.30% |
| test_func_call_runtime[True-compile-overhead] | 0.4558ms | 0.3783ms | 2.6433 KOps/s | 2.6013 KOps/s | +1.61% |
| test_func_call_cm_runtime[False-eager] | 0.8139ms | 0.7199ms | 1.3890 KOps/s | 1.3543 KOps/s | +2.56% |
| test_func_call_cm_runtime[False-compile] | 1.1293ms | 0.7377ms | 1.3555 KOps/s | 1.3272 KOps/s | +2.13% |
| test_func_call_cm_runtime[False-compile-overhead] | 0.4464ms | 0.3593ms | 2.7833 KOps/s | 2.7424 KOps/s | +1.49% |
| test_func_call_cm_runtime[True-eager] | 1.3994ms | 0.9860ms | 1.0142 KOps/s | 995.2619 Ops/s | +1.90% |
| test_func_call_cm_runtime[True-compile] | 1.1737ms | 0.7849ms | 1.2740 KOps/s | 1.2499 KOps/s | +1.93% |
| test_func_call_cm_runtime[True-compile-overhead] | 0.8084ms | 0.4063ms | 2.4610 KOps/s | 2.4397 KOps/s | +0.87% |
| test_vmap_func_call_cm_runtime[eager] | 2.5742ms | 2.0606ms | 485.2909 Ops/s | 481.1133 Ops/s | +0.87% |
| test_vmap_func_call_cm_runtime[compile] | 0.9474ms | 0.7919ms | 1.2628 KOps/s | 1.1964 KOps/s | **+5.55%** |
| test_vmap_func_call_cm_runtime[compile-overhead] | 0.4635ms | 0.4072ms | 2.4559 KOps/s | 2.4209 KOps/s | +1.45% |
| test_distributed | 1.7970ms | 0.1995ms | 5.0131 KOps/s | 8.8136 KOps/s | **-43.12%** |
| test_tdmodule | 0.3268ms | 15.3518μs | 65.1391 KOps/s | 62.7260 KOps/s | +3.85% |
| test_tdmodule_dispatch | 47.6520μs | 28.9899μs | 34.4947 KOps/s | 28.5413 KOps/s | **+20.86%** |
| test_tdseq | 38.1720μs | 15.0190μs | 66.5825 KOps/s | 63.6091 KOps/s | +4.67% |
| test_tdseq_dispatch | 57.9530μs | 31.0929μs | 32.1617 KOps/s | 30.2727 KOps/s | **+6.24%** |
| test_instantiation_functorch | 1.6672ms | 1.5566ms | 642.4258 Ops/s | 636.0971 Ops/s | +0.99% |
| test_exec_functorch | 0.2133ms | 0.1408ms | 7.1010 KOps/s | 6.7190 KOps/s | **+5.69%** |
| test_exec_functional_call | 0.1851ms | 0.1347ms | 7.4218 KOps/s | 6.9495 KOps/s | **+6.80%** |
| test_exec_td_decorator | 0.3758ms | 0.1812ms | 5.5173 KOps/s | 5.2392 KOps/s | **+5.31%** |
| test_vmap_mlp_speed_decorator[True-True] | 0.7510ms | 0.6704ms | 1.4917 KOps/s | 1.4819 KOps/s | +0.66% |
| test_vmap_mlp_speed_decorator[True-False] | 0.8746ms | 0.6739ms | 1.4838 KOps/s | 1.4750 KOps/s | +0.60% |
| test_vmap_mlp_speed_decorator[False-True] | 0.7574ms | 0.5900ms | 1.6950 KOps/s | 1.6869 KOps/s | +0.48% |
| test_vmap_mlp_speed_decorator[False-False] | 0.7307ms | 0.5928ms | 1.6869 KOps/s | 1.6935 KOps/s | -0.39% |
| test_vmap_transformer_speed_decorator[True-True] | 19.0358ms | 18.9365ms | 52.8081 Ops/s | 52.4536 Ops/s | +0.68% |
| test_vmap_transformer_speed_decorator[True-False] | 19.6687ms | 19.0745ms | 52.4259 Ops/s | 52.3682 Ops/s | +0.11% |
| test_vmap_transformer_speed_decorator[False-True] | 18.9837ms | 18.8532ms | 53.0414 Ops/s | 52.8819 Ops/s | +0.30% |
| test_vmap_transformer_speed_decorator[False-False] | 19.4567ms | 18.8810ms | 52.9633 Ops/s | 52.7297 Ops/s | +0.44% |
| test_to_module_speed[True] | 1.0358ms | 0.9376ms | 1.0665 KOps/s | 1.0659 KOps/s | +0.06% |
| test_to_module_speed[False] | 1.3907ms | 0.9184ms | 1.0888 KOps/s | 1.0852 KOps/s | +0.33% |
| test_tc_init | 72.5940μs | 35.3663μs | 28.2755 KOps/s | 27.7717 KOps/s | +1.81% |
| test_tc_init_nested | 0.1116ms | 74.3781μs | 13.4448 KOps/s | 13.5339 KOps/s | -0.66% |
| test_tc_first_layer_tensor | 6.5161μs | 0.7027μs | 1.4230 MOps/s | 1.4228 MOps/s | +0.01% |
| test_tc_first_layer_nontensor | 26.3810μs | 2.3226μs | 430.5588 KOps/s | 430.3570 KOps/s | +0.05% |
| test_tc_second_layer_tensor | 8.7580μs | 1.4170μs | 705.7208 KOps/s | 688.2588 KOps/s | +2.54% |
| test_tc_second_layer_nontensor | 27.2210μs | 3.0458μs | 328.3177 KOps/s | 327.0112 KOps/s | +0.40% |
| test_unbind | 0.2225s | 9.8275ms | 101.7550 Ops/s | 151.3914 Ops/s | **-32.79%** |
| test_full_like | 10.1885ms | 9.1851ms | 108.8717 Ops/s | 108.9215 Ops/s | -0.05% |
| test_zeros_like | 9.1577ms | 7.1231ms | 140.3875 Ops/s | 115.4698 Ops/s | **+21.58%** |
| test_ones_like | 5.2000ms | 4.3285ms | 231.0251 Ops/s | 231.9148 Ops/s | -0.38% |
| test_clone | 6.6500ms | 6.3953ms | 156.3656 Ops/s | 156.6784 Ops/s | -0.20% |
| test_squeeze | 62.0930μs | 9.6295μs | 103.8475 KOps/s | 107.5779 KOps/s | -3.47% |
| test_unsqueeze | 0.1184ms | 70.1767μs | 14.2498 KOps/s | 14.0573 KOps/s | +1.37% |
| test_split | 0.4207ms | 0.1566ms | 6.3852 KOps/s | 6.3348 KOps/s | +0.80% |
| test_permute | 0.2086ms | 0.1748ms | 5.7219 KOps/s | 5.5119 KOps/s | +3.81% |
| test_stack | 50.7552ms | 50.4560ms | 19.8193 Ops/s | 20.0543 Ops/s | -1.17% |
| test_cat | 50.9543ms | 50.3177ms | 19.8737 Ops/s | 23.8375 Ops/s | **-16.63%** |
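The report does not say how the Change column or the Improved/Worsened totals are computed. Checking the rows suggests that Change is the relative difference between the two throughput columns, `(Ops - Ops on Repo HEAD) / Ops on Repo HEAD`, and that only changes of 5% or more in either direction are counted in the totals: exactly 31 such improvements and 8 such regressions appear in the table. The minimal Python sketch below encodes that inferred rule; the function names, the unit table and the 5% threshold are assumptions for illustration, not part of the benchmark tooling.

```python
# Sketch: recompute the report's "Change" column from its two throughput columns.
# Assumption (inferred from the table, not documented in the report):
#   change = (ops_pr - ops_head) / ops_head * 100, and a benchmark only counts
#   as improved/worsened when |change| >= 5%.

# Multipliers for the throughput units that appear in the table.
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_ops(text: str) -> float:
    """Turn a cell such as '96.2626 KOps/s' into operations per second."""
    value, unit = text.split()
    return float(value) * UNIT_SCALE[unit]


def percent_change(ops_pr: str, ops_head: str) -> float:
    """Relative throughput change of the PR build versus repo HEAD, in percent."""
    pr, head = parse_ops(ops_pr), parse_ops(ops_head)
    return (pr - head) / head * 100.0


def classify(change: float, threshold: float = 5.0) -> str:
    """Bucket a change the way the summary line appears to (threshold assumed)."""
    if change >= threshold:
        return "improved"
    if change <= -threshold:
        return "worsened"
    return "unchanged"


if __name__ == "__main__":
    # A few rows copied from the table: (name, Ops, Ops on Repo HEAD).
    rows = [
        ("test_plain_set_nested", "96.2626 KOps/s", "92.1210 KOps/s"),
        ("test_func_call_cm_runtime[True-eager]", "1.0142 KOps/s", "995.2619 Ops/s"),
        ("test_distributed", "5.0131 KOps/s", "8.8136 KOps/s"),
    ]
    for name, ops_pr, ops_head in rows:
        change = percent_change(ops_pr, ops_head)
        print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Normalizing both columns to operations per second matters for rows such as `test_func_call_cm_runtime[True-eager]`, where the PR figure is reported in KOps/s and the HEAD figure in Ops/s.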

Labels: bug, CLA Signed
2 participants