[BugFix] auto-batch-size in dispatch #1109
Merged
This was referenced Nov 25, 2024
facebook-github-bot added the CLA Signed label on Nov 25, 2024
vmoens added a commit that referenced this pull request on Nov 25, 2024
ghstack-source-id: fe5f7f5b04d08d0eb150ee9bf0fd4698171d43d2
Pull Request resolved: #1109
vmoens changed the title from "[BugFix] auto-batch-size in dipatch" to "[BugFix] auto-batch-size in dispatch" on Nov 25, 2024
vmoens added a commit that referenced this pull request on Nov 25, 2024
ghstack-source-id: fe5f7f5b04d08d0eb150ee9bf0fd4698171d43d2
Pull Request resolved: #1109
vmoens added a commit that referenced this pull request on Nov 25, 2024
ghstack-source-id: ca5b36195c28da65a20d42699346fbc06083181c
Pull Request resolved: #1109
vmoens added a commit that referenced this pull request on Nov 25, 2024
ghstack-source-id: ca5b36195c28da65a20d42699346fbc06083181c
Pull Request resolved: #1109
Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_plain_set_nested | 41.6520μs | 10.3883μs | 96.2626 KOps/s | 92.1210 KOps/s | |
test_plain_set_stack_nested | 34.8920μs | 10.4739μs | 95.4757 KOps/s | 92.3599 KOps/s | |
test_plain_set_nested_inplace | 41.9120μs | 11.3885μs | 87.8083 KOps/s | 85.3981 KOps/s | |
test_plain_set_stack_nested_inplace | 37.3220μs | 11.3691μs | 87.9577 KOps/s | 85.5795 KOps/s | |
test_items | 37.2520μs | 2.9160μs | 342.9370 KOps/s | 342.0708 KOps/s | |
test_items_nested | 0.4079ms | 0.3216ms | 3.1091 KOps/s | 3.0974 KOps/s | |
test_items_nested_locked | 0.3676ms | 0.3248ms | 3.0791 KOps/s | 3.0754 KOps/s | |
test_items_nested_leaf | 0.1007ms | 58.2977μs | 17.1533 KOps/s | 17.0492 KOps/s | |
test_items_stack_nested | 0.3784ms | 0.3244ms | 3.0824 KOps/s | 3.0986 KOps/s | |
test_items_stack_nested_leaf | 88.4250μs | 60.0635μs | 16.6490 KOps/s | 16.7141 KOps/s | |
test_items_stack_nested_locked | 0.3838ms | 0.3231ms | 3.0948 KOps/s | 3.1009 KOps/s | |
test_keys | 35.3820μs | 3.5048μs | 285.3225 KOps/s | 288.5344 KOps/s | |
test_keys_nested | 0.1080ms | 70.7479μs | 14.1347 KOps/s | 14.1423 KOps/s | |
test_keys_nested_locked | 0.7852ms | 75.7746μs | 13.1970 KOps/s | 13.0538 KOps/s | |
test_keys_nested_leaf | 0.1028ms | 61.7607μs | 16.1915 KOps/s | 16.2281 KOps/s | |
test_keys_stack_nested | 0.1151ms | 71.0634μs | 14.0719 KOps/s | 14.1995 KOps/s | |
test_keys_stack_nested_leaf | 94.9050μs | 62.7813μs | 15.9283 KOps/s | 16.1101 KOps/s | |
test_keys_stack_nested_locked | 0.1150ms | 76.9137μs | 13.0016 KOps/s | 13.1017 KOps/s | |
test_values | 4.9185μs | 0.8468μs | 1.1809 MOps/s | 1.1793 MOps/s | |
test_values_nested | 64.1130μs | 31.3850μs | 31.8623 KOps/s | 32.0966 KOps/s | |
test_values_nested_locked | 72.4340μs | 32.8884μs | 30.4059 KOps/s | 30.4367 KOps/s | |
test_values_nested_leaf | 91.3350μs | 33.5721μs | 29.7866 KOps/s | 29.8642 KOps/s | |
test_values_stack_nested | 61.7340μs | 31.8036μs | 31.4429 KOps/s | 31.5442 KOps/s | |
test_values_stack_nested_leaf | 57.1730μs | 34.5041μs | 28.9821 KOps/s | 29.2624 KOps/s | |
test_values_stack_nested_locked | 65.8830μs | 33.5546μs | 29.8022 KOps/s | 30.0766 KOps/s | |
test_membership | 1.9121μs | 0.5078μs | 1.9692 MOps/s | 1.9871 MOps/s | |
test_membership_nested | 16.1310μs | 1.9032μs | 525.4302 KOps/s | 516.5602 KOps/s | |
test_membership_nested_leaf | 13.9960μs | 1.9335μs | 517.1840 KOps/s | 506.2981 KOps/s | |
test_membership_stacked_nested | 32.6420μs | 2.0123μs | 496.9414 KOps/s | 492.0776 KOps/s | |
test_membership_stacked_nested_leaf | 45.9220μs | 1.9384μs | 515.8920 KOps/s | 497.8925 KOps/s | |
test_membership_nested_last | 38.1520μs | 2.8460μs | 351.3699 KOps/s | 347.6808 KOps/s | |
test_membership_nested_leaf_last | 29.3810μs | 2.8678μs | 348.6979 KOps/s | 346.1113 KOps/s | |
test_membership_stacked_nested_last | 31.8110μs | 3.3054μs | 302.5343 KOps/s | 207.8609 KOps/s | |
test_membership_stacked_nested_leaf_last | 31.9810μs | 3.3061μs | 302.4668 KOps/s | 206.5862 KOps/s | |
test_nested_getleaf | 49.0120μs | 5.9844μs | 167.1009 KOps/s | 168.3840 KOps/s | |
test_nested_get | 33.6120μs | 5.7214μs | 174.7818 KOps/s | 175.5190 KOps/s | |
test_stacked_getleaf | 44.1820μs | 5.9654μs | 167.6327 KOps/s | 165.1576 KOps/s | |
test_stacked_get | 29.6110μs | 5.6885μs | 175.7930 KOps/s | 175.4637 KOps/s | |
test_nested_getitemleaf | 28.7520μs | 6.0727μs | 164.6714 KOps/s | 162.0122 KOps/s | |
test_nested_getitem | 27.5210μs | 5.7815μs | 172.9659 KOps/s | 171.3101 KOps/s | |
test_stacked_getitemleaf | 35.3620μs | 6.0783μs | 164.5192 KOps/s | 163.8056 KOps/s | |
test_stacked_getitem | 28.7420μs | 5.7421μs | 174.1523 KOps/s | 172.3546 KOps/s | |
test_lock_nested | 9.1912ms | 0.3724ms | 2.6851 KOps/s | 2.6829 KOps/s | |
test_lock_stack_nested | 0.3900ms | 0.3302ms | 3.0284 KOps/s | 2.9759 KOps/s | |
test_unlock_nested | 0.6501ms | 0.3069ms | 3.2579 KOps/s | 3.2323 KOps/s | |
test_unlock_stack_nested | 0.3097ms | 0.2727ms | 3.6673 KOps/s | 3.6307 KOps/s | |
test_flatten_speed | 97.0450μs | 72.5279μs | 13.7878 KOps/s | 13.4329 KOps/s | |
test_unflatten_speed | 0.3334ms | 0.2988ms | 3.3462 KOps/s | 3.3661 KOps/s | |
test_common_ops | 1.6569ms | 0.5680ms | 1.7605 KOps/s | 1.6690 KOps/s | |
test_creation | 96.9350μs | 1.4656μs | 682.2943 KOps/s | 699.3221 KOps/s | |
test_creation_empty | 28.4520μs | 6.9333μs | 144.2324 KOps/s | 130.7191 KOps/s | |
test_creation_nested_1 | 45.5720μs | 8.5054μs | 117.5729 KOps/s | 107.5781 KOps/s | |
test_creation_nested_2 | 29.9820μs | 11.1179μs | 89.9451 KOps/s | 85.0303 KOps/s | |
test_clone | 93.7540μs | 10.3722μs | 96.4118 KOps/s | 92.4463 KOps/s | |
test_getitem[int] | 1.7352ms | 10.4963μs | 95.2713 KOps/s | 92.7088 KOps/s | |
test_getitem[slice_int] | 0.1159ms | 20.0078μs | 49.9806 KOps/s | 48.5400 KOps/s | |
test_getitem[range] | 0.1599ms | 36.5183μs | 27.3835 KOps/s | 26.9104 KOps/s | |
test_getitem[tuple] | 0.1098ms | 17.7100μs | 56.4652 KOps/s | 54.8308 KOps/s | |
test_getitem[list] | 0.6004ms | 34.3817μs | 29.0852 KOps/s | 30.5075 KOps/s | |
test_setitem_dim[int] | 43.2120μs | 20.1826μs | 49.5477 KOps/s | 53.2414 KOps/s | |
test_setitem_dim[slice_int] | 90.2440μs | 40.6043μs | 24.6279 KOps/s | 26.7571 KOps/s | |
test_setitem_dim[range] | 0.1222ms | 56.7839μs | 17.6106 KOps/s | 18.7662 KOps/s | |
test_setitem_dim[tuple] | 82.0840μs | 33.8517μs | 29.5406 KOps/s | 31.7355 KOps/s | |
test_setitem | 83.5640μs | 15.4208μs | 64.8474 KOps/s | 63.9621 KOps/s | |
test_set | 87.3240μs | 14.8161μs | 67.4941 KOps/s | 65.5914 KOps/s | |
test_set_shared | 1.6246ms | 0.1452ms | 6.8852 KOps/s | 6.8037 KOps/s | |
test_update | 0.3599ms | 16.3002μs | 61.3491 KOps/s | 56.4068 KOps/s | |
test_update_nested | 86.2350μs | 21.4389μs | 46.6442 KOps/s | 44.3219 KOps/s | |
test_update__nested | 0.8083ms | 24.0806μs | 41.5272 KOps/s | 40.0244 KOps/s | |
test_set_nested | 76.9540μs | 15.0398μs | 66.4905 KOps/s | 60.9344 KOps/s | |
test_set_nested_new | 94.4950μs | 16.9898μs | 58.8590 KOps/s | 54.3965 KOps/s | |
test_select | 90.0940μs | 28.9199μs | 34.5782 KOps/s | 32.9701 KOps/s | |
test_select_nested | 84.0250μs | 41.6223μs | 24.0256 KOps/s | 23.5657 KOps/s | |
test_exclude_nested | 90.1250μs | 59.1752μs | 16.8990 KOps/s | 16.8304 KOps/s | |
test_empty[True] | 0.3024ms | 0.2573ms | 3.8870 KOps/s | 3.9096 KOps/s | |
test_empty[False] | 3.6942μs | 0.7396μs | 1.3520 MOps/s | 1.3448 MOps/s | |
test_to | 88.1450μs | 54.7129μs | 18.2772 KOps/s | 18.5069 KOps/s | |
test_to_nonblocking | 90.9140μs | 45.1148μs | 22.1657 KOps/s | 21.8980 KOps/s | |
test_unbind_speed | 1.7570ms | 0.2299ms | 4.3499 KOps/s | 4.2727 KOps/s | |
test_unbind_speed_stack0 | 0.2917ms | 0.2287ms | 4.3734 KOps/s | 4.2744 KOps/s | |
test_unbind_speed_stack1 | 91.4211ms | 0.6396ms | 1.5634 KOps/s | 1.5422 KOps/s | |
test_split | 93.9186ms | 1.5659ms | 638.5958 Ops/s | 622.3903 Ops/s | |
test_chunk | 93.7139ms | 1.5463ms | 646.7209 Ops/s | 626.2122 Ops/s | |
test_consolidate[False-None] | 95.6074ms | 2.8084ms | 356.0797 Ops/s | 349.4076 Ops/s | |
test_consolidate[default-None] | 2.0265ms | 1.6326ms | 612.5155 Ops/s | 589.7904 Ops/s | |
test_consolidate[reduce-overhead-None] | 2.0633ms | 1.6654ms | 600.4581 Ops/s | 578.4181 Ops/s | |
test_consolidate_njt[False-None] | 6.8771ms | 6.5066ms | 153.6906 Ops/s | 111.1635 Ops/s | |
test_to[False-False-None] | 2.0730ms | 1.6645ms | 600.7768 Ops/s | 598.4191 Ops/s | |
test_to[True-False-None] | 1.6442ms | 1.2540ms | 797.4741 Ops/s | 792.9677 Ops/s | |
test_to[within-False-None] | 4.3141ms | 3.9528ms | 252.9852 Ops/s | 249.2839 Ops/s | |
test_to[True-default-None] | 5.5071ms | 5.1059ms | 195.8534 Ops/s | 185.5636 Ops/s | |
test_to_njt[False-False-None] | 7.3496ms | 6.9370ms | 144.1540 Ops/s | 140.0808 Ops/s | |
test_to_njt[True-False-None] | 6.3057ms | 5.4943ms | 182.0079 Ops/s | 180.7839 Ops/s | |
test_to_njt[within-False-None] | 12.6000ms | 12.0881ms | 82.7260 Ops/s | 81.4417 Ops/s | |
test_creation[device0] | 0.4940ms | 78.7996μs | 12.6904 KOps/s | 12.6300 KOps/s | |
test_creation_from_tensor | 0.5164ms | 82.4832μs | 12.1237 KOps/s | 12.2688 KOps/s | |
test_add_one[memmap_tensor0] | 0.2666ms | 6.5760μs | 152.0675 KOps/s | 142.3964 KOps/s | |
test_contiguous[memmap_tensor0] | 19.3470μs | 0.3920μs | 2.5511 MOps/s | 2.3956 MOps/s | |
test_stack[memmap_tensor0] | 44.3220μs | 4.3320μs | 230.8420 KOps/s | 213.0591 KOps/s | |
test_memmaptd_index | 1.7632ms | 0.2475ms | 4.0411 KOps/s | 3.9677 KOps/s | |
test_memmaptd_index_astensor | 0.6712ms | 0.3037ms | 3.2930 KOps/s | 3.2234 KOps/s | |
test_memmaptd_index_op | 0.9739ms | 0.5620ms | 1.7794 KOps/s | 1.6937 KOps/s | |
test_serialize_model | 0.1308s | 0.1298s | 7.7022 Ops/s | 7.6943 Ops/s | |
test_serialize_model_pickle | 1.3468s | 1.2149s | 0.8231 Ops/s | 0.8113 Ops/s | |
test_serialize_weights | 0.1306s | 0.1291s | 7.7456 Ops/s | 7.7356 Ops/s | |
test_serialize_weights_returnearly | 0.2961s | 55.4824ms | 18.0237 Ops/s | 15.1530 Ops/s | |
test_serialize_weights_pickle | 1.4047s | 1.2038s | 0.8307 Ops/s | 0.8127 Ops/s | |
test_reshape_pytree | 55.6920μs | 22.5719μs | 44.3029 KOps/s | 45.3490 KOps/s | |
test_reshape_td | 0.4052ms | 27.8941μs | 35.8498 KOps/s | 37.5842 KOps/s | |
test_view_pytree | 68.2330μs | 21.9408μs | 45.5772 KOps/s | 45.5755 KOps/s | |
test_view_td | 0.1503ms | 29.5266μs | 33.8677 KOps/s | 32.5920 KOps/s | |
test_unbind_pytree | 84.5740μs | 27.6489μs | 36.1678 KOps/s | 35.5400 KOps/s | |
test_unbind_td | 0.5854ms | 34.8455μs | 28.6981 KOps/s | 27.5321 KOps/s | |
test_split_pytree | 72.2830μs | 29.5134μs | 33.8829 KOps/s | 33.6460 KOps/s | |
test_split_td | 0.7689ms | 37.7650μs | 26.4796 KOps/s | 25.8262 KOps/s | |
test_add_pytree | 69.5640μs | 34.2981μs | 29.1562 KOps/s | 28.6855 KOps/s | |
test_add_td | 78.5340μs | 44.4317μs | 22.5065 KOps/s | 21.2116 KOps/s | |
test_compile_add_one_nested[tensordict-compile] | 0.2364ms | 0.1185ms | 8.4418 KOps/s | 7.9761 KOps/s | |
test_compile_add_one_nested[tensordict-eager] | 0.2176ms | 0.1236ms | 8.0881 KOps/s | 8.0509 KOps/s | |
test_compile_add_one_nested[pytree-compile] | 0.1728ms | 95.9050μs | 10.4270 KOps/s | 9.9900 KOps/s | |
test_compile_add_one_nested[pytree-eager] | 1.3799ms | 0.1461ms | 6.8462 KOps/s | 6.5996 KOps/s | |
test_compile_copy_nested[tensordict-compile] | 62.1330μs | 23.1799μs | 43.1409 KOps/s | 44.7382 KOps/s | |
test_compile_copy_nested[tensordict-eager] | 0.3895ms | 26.2702μs | 38.0660 KOps/s | 37.3399 KOps/s | |
test_compile_copy_nested[pytree-compile] | 0.4444ms | 64.0826μs | 15.6048 KOps/s | 15.1943 KOps/s | |
test_compile_copy_nested[pytree-eager] | 87.0140μs | 49.6570μs | 20.1382 KOps/s | 19.8969 KOps/s | |
test_compile_add_one_flat[tensordict-compile] | 0.2285ms | 0.1424ms | 7.0227 KOps/s | 7.0886 KOps/s | |
test_compile_add_one_flat[tensordict-eager] | 0.2981ms | 0.2063ms | 4.8479 KOps/s | 4.8922 KOps/s | |
test_compile_add_one_flat[tensorclass-compile] | 0.2468ms | 97.0813μs | 10.3006 KOps/s | 10.0749 KOps/s | |
test_compile_add_one_flat[tensorclass-eager] | 0.1779ms | 51.3157μs | 19.4872 KOps/s | 18.9075 KOps/s | |
test_compile_add_one_flat[pytree-compile] | 0.2814ms | 0.1359ms | 7.3596 KOps/s | 7.3744 KOps/s | |
test_compile_add_one_flat[pytree-eager] | 0.6159ms | 0.4681ms | 2.1364 KOps/s | 1.8865 KOps/s | |
test_compile_add_self_flat[tensordict-eager] | 0.3621ms | 0.2460ms | 4.0657 KOps/s | 4.0701 KOps/s | |
test_compile_add_self_flat[tensordict-compile] | 0.2529ms | 0.1434ms | 6.9723 KOps/s | 7.0714 KOps/s | |
test_compile_add_self_flat[tensorclass-eager] | 0.1444ms | 60.6676μs | 16.4833 KOps/s | 16.3224 KOps/s | |
test_compile_add_self_flat[tensorclass-compile] | 0.1592ms | 98.1524μs | 10.1882 KOps/s | 10.3731 KOps/s | |
test_compile_add_self_flat[pytree-eager] | 0.5728ms | 0.4046ms | 2.4713 KOps/s | 2.5209 KOps/s | |
test_compile_add_self_flat[pytree-compile] | 0.1862ms | 0.1359ms | 7.3570 KOps/s | 7.4326 KOps/s | |
test_compile_copy_flat[tensordict-compile] | 71.5840μs | 19.5771μs | 51.0801 KOps/s | 53.7579 KOps/s | |
test_compile_copy_flat[tensordict-eager] | 87.8340μs | 27.4783μs | 36.3923 KOps/s | 37.7465 KOps/s | |
test_compile_copy_flat[pytree-compile] | 0.1118ms | 69.6306μs | 14.3615 KOps/s | 14.3353 KOps/s | |
test_compile_copy_flat[pytree-eager] | 0.1594ms | 51.2070μs | 19.5286 KOps/s | 19.3939 KOps/s | |
test_compile_assign_and_add[tensordict-compile] | 1.6082ms | 0.3891ms | 2.5701 KOps/s | 2.1386 KOps/s | |
test_compile_assign_and_add[tensordict-eager] | 2.7019ms | 2.5636ms | 390.0778 Ops/s | 371.7801 Ops/s | |
test_compile_assign_and_add[pytree-compile] | 1.5960ms | 0.4345ms | 2.3013 KOps/s | 2.2666 KOps/s | |
test_compile_assign_and_add[pytree-eager] | 2.7757ms | 2.5872ms | 386.5157 Ops/s | 378.0587 Ops/s | |
test_compile_indexing[tensor-tensordict-compile] | 0.1536ms | 0.1117ms | 8.9490 KOps/s | 8.8309 KOps/s | |
test_compile_indexing[tensor-tensordict-eager] | 0.5618ms | 78.7741μs | 12.6945 KOps/s | 12.1089 KOps/s | |
test_compile_indexing[tensor-tensorclass-compile] | 0.1930ms | 0.1114ms | 8.9732 KOps/s | 9.4243 KOps/s | |
test_compile_indexing[tensor-tensorclass-eager] | 0.4893ms | 68.5638μs | 14.5849 KOps/s | 14.7420 KOps/s | |
test_compile_indexing[tensor-pytree-compile] | 0.5031ms | 0.1108ms | 9.0219 KOps/s | 9.5687 KOps/s | |
test_compile_indexing[tensor-pytree-eager] | 0.4556ms | 69.5702μs | 14.3740 KOps/s | 14.9355 KOps/s | |
test_compile_indexing[slice-tensordict-compile] | 0.1362ms | 0.1002ms | 9.9831 KOps/s | 9.6562 KOps/s | |
test_compile_indexing[slice-tensordict-eager] | 0.4234ms | 18.6077μs | 53.7411 KOps/s | 55.0598 KOps/s | |
test_compile_indexing[slice-tensorclass-compile] | 0.4921ms | 98.4844μs | 10.1539 KOps/s | 10.1673 KOps/s | |
test_compile_indexing[slice-tensorclass-eager] | 0.4017ms | 16.5786μs | 60.3189 KOps/s | 63.1165 KOps/s | |
test_compile_indexing[slice-pytree-compile] | 0.5028ms | 98.6277μs | 10.1391 KOps/s | 9.9578 KOps/s | |
test_compile_indexing[slice-pytree-eager] | 56.7830μs | 16.3186μs | 61.2798 KOps/s | 62.9969 KOps/s | |
test_compile_indexing[int-tensordict-compile] | 0.5165ms | 0.1050ms | 9.5204 KOps/s | 9.7360 KOps/s | |
test_compile_indexing[int-tensordict-eager] | 0.5792ms | 18.0808μs | 55.3072 KOps/s | 56.8424 KOps/s | |
test_compile_indexing[int-tensorclass-compile] | 0.4953ms | 0.1013ms | 9.8760 KOps/s | 10.0660 KOps/s | |
test_compile_indexing[int-tensorclass-eager] | 0.4223ms | 16.2312μs | 61.6099 KOps/s | 63.7013 KOps/s | |
test_compile_indexing[int-pytree-compile] | 0.4889ms | 99.5374μs | 10.0465 KOps/s | 9.9563 KOps/s | |
test_compile_indexing[int-pytree-eager] | 0.3993ms | 15.9177μs | 62.8230 KOps/s | 64.0035 KOps/s | |
test_mod_add[eager] | 84.6940μs | 33.0938μs | 30.2172 KOps/s | 28.5809 KOps/s | |
test_mod_add[compile] | 0.4833ms | 79.1903μs | 12.6278 KOps/s | 12.3792 KOps/s | |
test_mod_add[compile-overhead] | 0.3225ms | 0.1653ms | 6.0512 KOps/s | 5.7853 KOps/s | |
test_mod_wrap[eager] | 0.6415ms | 0.2399ms | 4.1691 KOps/s | 4.0808 KOps/s | |
test_mod_wrap[compile] | 0.3556ms | 0.2952ms | 3.3880 KOps/s | 3.5094 KOps/s | |
test_mod_wrap[compile-overhead] | 7.2048ms | 3.7289ms | 268.1740 Ops/s | 262.2447 Ops/s | |
test_mod_wrap_and_backward[eager] | 1.6142ms | 1.4743ms | 678.2945 Ops/s | 693.1764 Ops/s | |
test_mod_wrap_and_backward[compile] | 1.4305ms | 1.3503ms | 740.5621 Ops/s | 730.9096 Ops/s | |
test_mod_wrap_and_backward[compile-overhead] | 1.5262ms | 1.0305ms | 970.4423 Ops/s | 967.0572 Ops/s | |
test_seq_add[eager] | 0.2252ms | 98.1831μs | 10.1851 KOps/s | 9.4774 KOps/s | |
test_seq_add[compile] | 0.1499ms | 91.6416μs | 10.9121 KOps/s | 11.3923 KOps/s | |
test_seq_add[compile-overhead] | 0.2740ms | 0.1299ms | 7.6958 KOps/s | 7.7799 KOps/s | |
test_seq_wrap[eager] | 0.4727ms | 0.3987ms | 2.5080 KOps/s | 2.5615 KOps/s | |
test_seq_wrap[compile] | 0.5208ms | 0.3124ms | 3.2010 KOps/s | 3.3240 KOps/s | |
test_seq_wrap[compile-overhead] | 0.2918ms | 0.2271ms | 4.4032 KOps/s | 4.3284 KOps/s | |
test_func_call_runtime[False-eager] | 0.9157ms | 0.7628ms | 1.3110 KOps/s | 1.2466 KOps/s | |
test_func_call_runtime[False-compile] | 0.7749ms | 0.7337ms | 1.3629 KOps/s | 1.3278 KOps/s | |
test_func_call_runtime[False-compile-overhead] | 0.4424ms | 0.3569ms | 2.8018 KOps/s | 2.7472 KOps/s | |
test_func_call_runtime[True-eager] | 1.0446ms | 0.8819ms | 1.1339 KOps/s | 1.1072 KOps/s | |
test_func_call_runtime[True-compile] | 0.8699ms | 0.7541ms | 1.3261 KOps/s | 1.2963 KOps/s | |
test_func_call_runtime[True-compile-overhead] | 0.4558ms | 0.3783ms | 2.6433 KOps/s | 2.6013 KOps/s | |
test_func_call_cm_runtime[False-eager] | 0.8139ms | 0.7199ms | 1.3890 KOps/s | 1.3543 KOps/s | |
test_func_call_cm_runtime[False-compile] | 1.1293ms | 0.7377ms | 1.3555 KOps/s | 1.3272 KOps/s | |
test_func_call_cm_runtime[False-compile-overhead] | 0.4464ms | 0.3593ms | 2.7833 KOps/s | 2.7424 KOps/s | |
test_func_call_cm_runtime[True-eager] | 1.3994ms | 0.9860ms | 1.0142 KOps/s | 995.2619 Ops/s | |
test_func_call_cm_runtime[True-compile] | 1.1737ms | 0.7849ms | 1.2740 KOps/s | 1.2499 KOps/s | |
test_func_call_cm_runtime[True-compile-overhead] | 0.8084ms | 0.4063ms | 2.4610 KOps/s | 2.4397 KOps/s | |
test_vmap_func_call_cm_runtime[eager] | 2.5742ms | 2.0606ms | 485.2909 Ops/s | 481.1133 Ops/s | |
test_vmap_func_call_cm_runtime[compile] | 0.9474ms | 0.7919ms | 1.2628 KOps/s | 1.1964 KOps/s | |
test_vmap_func_call_cm_runtime[compile-overhead] | 0.4635ms | 0.4072ms | 2.4559 KOps/s | 2.4209 KOps/s | |
test_distributed | 1.7970ms | 0.1995ms | 5.0131 KOps/s | 8.8136 KOps/s | |
test_tdmodule | 0.3268ms | 15.3518μs | 65.1391 KOps/s | 62.7260 KOps/s | |
test_tdmodule_dispatch | 47.6520μs | 28.9899μs | 34.4947 KOps/s | 28.5413 KOps/s | |
test_tdseq | 38.1720μs | 15.0190μs | 66.5825 KOps/s | 63.6091 KOps/s | |
test_tdseq_dispatch | 57.9530μs | 31.0929μs | 32.1617 KOps/s | 30.2727 KOps/s | |
test_instantiation_functorch | 1.6672ms | 1.5566ms | 642.4258 Ops/s | 636.0971 Ops/s | |
test_exec_functorch | 0.2133ms | 0.1408ms | 7.1010 KOps/s | 6.7190 KOps/s | |
test_exec_functional_call | 0.1851ms | 0.1347ms | 7.4218 KOps/s | 6.9495 KOps/s | |
test_exec_td_decorator | 0.3758ms | 0.1812ms | 5.5173 KOps/s | 5.2392 KOps/s | |
test_vmap_mlp_speed_decorator[True-True] | 0.7510ms | 0.6704ms | 1.4917 KOps/s | 1.4819 KOps/s | |
test_vmap_mlp_speed_decorator[True-False] | 0.8746ms | 0.6739ms | 1.4838 KOps/s | 1.4750 KOps/s | |
test_vmap_mlp_speed_decorator[False-True] | 0.7574ms | 0.5900ms | 1.6950 KOps/s | 1.6869 KOps/s | |
test_vmap_mlp_speed_decorator[False-False] | 0.7307ms | 0.5928ms | 1.6869 KOps/s | 1.6935 KOps/s | |
test_vmap_transformer_speed_decorator[True-True] | 19.0358ms | 18.9365ms | 52.8081 Ops/s | 52.4536 Ops/s | |
test_vmap_transformer_speed_decorator[True-False] | 19.6687ms | 19.0745ms | 52.4259 Ops/s | 52.3682 Ops/s | |
test_vmap_transformer_speed_decorator[False-True] | 18.9837ms | 18.8532ms | 53.0414 Ops/s | 52.8819 Ops/s | |
test_vmap_transformer_speed_decorator[False-False] | 19.4567ms | 18.8810ms | 52.9633 Ops/s | 52.7297 Ops/s | |
test_to_module_speed[True] | 1.0358ms | 0.9376ms | 1.0665 KOps/s | 1.0659 KOps/s | |
test_to_module_speed[False] | 1.3907ms | 0.9184ms | 1.0888 KOps/s | 1.0852 KOps/s | |
test_tc_init | 72.5940μs | 35.3663μs | 28.2755 KOps/s | 27.7717 KOps/s | |
test_tc_init_nested | 0.1116ms | 74.3781μs | 13.4448 KOps/s | 13.5339 KOps/s | |
test_tc_first_layer_tensor | 6.5161μs | 0.7027μs | 1.4230 MOps/s | 1.4228 MOps/s | |
test_tc_first_layer_nontensor | 26.3810μs | 2.3226μs | 430.5588 KOps/s | 430.3570 KOps/s | |
test_tc_second_layer_tensor | 8.7580μs | 1.4170μs | 705.7208 KOps/s | 688.2588 KOps/s | |
test_tc_second_layer_nontensor | 27.2210μs | 3.0458μs | 328.3177 KOps/s | 327.0112 KOps/s | |
test_unbind | 0.2225s | 9.8275ms | 101.7550 Ops/s | 151.3914 Ops/s | |
test_full_like | 10.1885ms | 9.1851ms | 108.8717 Ops/s | 108.9215 Ops/s | |
test_zeros_like | 9.1577ms | 7.1231ms | 140.3875 Ops/s | 115.4698 Ops/s | |
test_ones_like | 5.2000ms | 4.3285ms | 231.0251 Ops/s | 231.9148 Ops/s | |
test_clone | 6.6500ms | 6.3953ms | 156.3656 Ops/s | 156.6784 Ops/s | |
test_squeeze | 62.0930μs | 9.6295μs | 103.8475 KOps/s | 107.5779 KOps/s | |
test_unsqueeze | 0.1184ms | 70.1767μs | 14.2498 KOps/s | 14.0573 KOps/s | |
test_split | 0.4207ms | 0.1566ms | 6.3852 KOps/s | 6.3348 KOps/s | |
test_permute | 0.2086ms | 0.1748ms | 5.7219 KOps/s | 5.5119 KOps/s | |
test_stack | 50.7552ms | 50.4560ms | 19.8193 Ops/s | 20.0543 Ops/s | |
test_cat | 50.9543ms | 50.3177ms | 19.8737 Ops/s | 23.8375 Ops/s |
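
The Change column is empty in this capture. As a rough aid for reading the table, the sketch below derives a relative difference between the "Ops" and "Ops on Repo HEAD" columns for a few rows copied from above. The helper names `parse_ops` and `relative_change` are hypothetical, and the formula (percent change relative to HEAD) is an assumption, not necessarily what the benchmark bot reports.

```python
import re

# Multipliers for the throughput units that appear in the table.
_UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_ops(text: str) -> float:
    """Convert a string such as '96.2626 KOps/s' into operations per second."""
    match = re.fullmatch(r"\s*([0-9.]+)\s*([KM]?Ops/s)\s*", text)
    if match is None:
        raise ValueError(f"unrecognized throughput: {text!r}")
    value, unit = match.groups()
    return float(value) * _UNIT_SCALE[unit]


def relative_change(row: str) -> tuple[str, float]:
    """Return (test name, percent change of 'Ops' relative to 'Ops on Repo HEAD').

    Assumed definition: 100 * (ops - ops_head) / ops_head.
    """
    cells = [cell.strip() for cell in row.split("|")]
    name, ops, ops_head = cells[0], cells[3], cells[4]
    pr, head = parse_ops(ops), parse_ops(ops_head)
    return name, 100.0 * (pr - head) / head


# Rows copied verbatim from the table above.
rows = [
    "test_membership_stacked_nested_last | 31.8110μs | 3.3054μs | 302.5343 KOps/s | 207.8609 KOps/s | |",
    "test_distributed | 1.7970ms | 0.1995ms | 5.0131 KOps/s | 8.8136 KOps/s | |",
    "test_func_call_cm_runtime[True-eager] | 1.3994ms | 0.9860ms | 1.0142 KOps/s | 995.2619 Ops/s | |",
]

for row in rows:
    name, pct = relative_change(row)
    print(f"{name}: {pct:+.1f}%")
```

Normalizing the units first matters for rows such as `test_func_call_cm_runtime[True-eager]`, where the two columns are reported in KOps/s and Ops/s respectively. Small differences between the two columns may simply reflect run-to-run noise.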
Labels: bug, CLA Signed
Stack from ghstack (oldest at bottom):