First batch of shortcuts to HLO ops #267
nice!
Co-authored-by: Paul Berg <[email protected]>
…lehlo.broadcast_in_dim`, `stablehlo.dot_general`, `stablehlo.einsum`, `stablehlo.unary_einsum`
so CI apparently fails
should be fixed now. I've left some ops without testing due to bugs (mainly #196, but also some weird problem generating the IRCode for […]). @wsmoses macOS-x64 is failing because it can't find symbol […]
test_broken is fine if it doesn't crash; a commented or disabled test is also fine
then I will leave it commented because some of them crash
@wsmoses any idea of how to solve macOS-x86?
Reactant.jl Benchmarks
Benchmark suite | Current: c964ca3 | Previous: 4981557 | Ratio |
---|---|---|---|
ViT base (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1228899804 ns | 1257741783 ns | 0.98 |
ViT base (256 x 256 x 3 x 32)/forward/CUDA/Reactant | 1228857056 ns | 1443950901 ns | 0.85 |
ViT base (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1187142008 ns | 1385605290 ns | 0.86 |
ViT base (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :only_enzyme) | 2744157498 ns | 2794781252 ns | 0.98 |
ViT base (256 x 256 x 3 x 32)/forward/CUDA/Lux | 201183306 ns | 203499640 ns | 0.99 |
ViT base (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :after_enzyme) | 5148362884 ns | 5710482855 ns | 0.90 |
ViT base (256 x 256 x 3 x 32)/forward/CPU/Reactant | 5169143859 ns | 5341487680 ns | 0.97 |
ViT base (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :before_enzyme) | 5836423713 ns | 6062348395 ns | 0.96 |
ViT base (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :only_enzyme) | 7725273905 ns | 7286146462 ns | 1.06 |
ViT base (256 x 256 x 3 x 32)/forward/CPU/Lux | 31675920029 ns | 34742615164 ns | 0.91 |
ViT small (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1183799363 ns | 1198386252 ns | 0.99 |
ViT small (256 x 256 x 3 x 4)/forward/CUDA/Reactant | 1171359615 ns | 1183740364 ns | 0.99 |
ViT small (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1190391617 ns | 1226242113 ns | 0.97 |
ViT small (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :only_enzyme) | 3022041013 ns | 2824009338 ns | 1.07 |
ViT small (256 x 256 x 3 x 4)/forward/CUDA/Lux | 8589423 ns | 8581758.5 ns | 1.00 |
ViT small (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :after_enzyme) | 1607601251 ns | 1659437123 ns | 0.97 |
ViT small (256 x 256 x 3 x 4)/forward/CPU/Reactant | 1609918717 ns | 1663747727 ns | 0.97 |
ViT small (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :before_enzyme) | 1612382221 ns | 1575216589 ns | 1.02 |
ViT small (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :only_enzyme) | 3613423758 ns | 3330508539 ns | 1.08 |
ViT small (256 x 256 x 3 x 4)/forward/CPU/Lux | 3155788085 ns | 2697236030 ns | 1.17 |
ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1222408843 ns | 1177894801 ns | 1.04 |
ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Reactant | 1213439326 ns | 1231369417 ns | 0.99 |
ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1393287668 ns | 1281022614 ns | 1.09 |
ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :only_enzyme) | 2928373979 ns | 2992466689 ns | 0.98 |
ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Lux | 22794435.5 ns | 22743533 ns | 1.00 |
ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :after_enzyme) | 2246173769 ns | 2143636724 ns | 1.05 |
ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Reactant | 2279844491 ns | 2146473226 ns | 1.06 |
ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :before_enzyme) | 2244477597 ns | 2157303648 ns | 1.04 |
ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :only_enzyme) | 4159307861 ns | 3959878673 ns | 1.05 |
ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Lux | 5673307318 ns | 5865019185 ns | 0.97 |
ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1219499596 ns | 1387842900 ns | 0.88 |
ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Reactant | 1339647904 ns | 1256093379 ns | 1.07 |
ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1394076872 ns | 1257804932 ns | 1.11 |
ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :only_enzyme) | 3007274926 ns | 3119512525 ns | 0.96 |
ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Lux | 7170956.5 ns | 6970194 ns | 1.03 |
ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :after_enzyme) | 1473810038 ns | 1427711032 ns | 1.03 |
ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Reactant | 1468103895 ns | 1416276590 ns | 1.04 |
ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :before_enzyme) | 1467963829 ns | 1497474638 ns | 0.98 |
ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :only_enzyme) | 3352201866 ns | 3162263527 ns | 1.06 |
ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Lux | 1267514332 ns | 1224458504 ns | 1.04 |
ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1218209141 ns | 1258743352 ns | 0.97 |
ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Reactant | 1219848341 ns | 1272174301 ns | 0.96 |
ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1250532234 ns | 1286613721 ns | 0.97 |
ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :only_enzyme) | 3309073423 ns | 2941431002 ns | 1.12 |
ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Lux | 12341004 ns | 12308639.5 ns | 1.00 |
ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :after_enzyme) | 1776840440 ns | 1728521529 ns | 1.03 |
ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Reactant | 1769682086 ns | 1726871692 ns | 1.02 |
ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :before_enzyme) | 1746813993 ns | 1724333072 ns | 1.01 |
ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :only_enzyme) | 3650381123 ns | 3485347197 ns | 1.05 |
ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Lux | 2876260488 ns | 3314985446 ns | 0.87 |
ViT small (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1268709194 ns | 1296002633 ns | 0.98 |
ViT small (256 x 256 x 3 x 16)/forward/CUDA/Reactant | 1239986983 ns | 1304207360 ns | 0.95 |
ViT small (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1435508421 ns | 1255941581 ns | 1.14 |
ViT small (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :only_enzyme) | 3146289052 ns | 3046234878 ns | 1.03 |
ViT small (256 x 256 x 3 x 16)/forward/CUDA/Lux | 27442593.5 ns | 27396872 ns | 1.00 |
ViT small (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :after_enzyme) | 2242135877 ns | 2236205101 ns | 1.00 |
ViT small (256 x 256 x 3 x 16)/forward/CPU/Reactant | 2199696092 ns | 2198709936 ns | 1.00 |
ViT small (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :before_enzyme) | 2204608993 ns | 2296730805 ns | 0.96 |
ViT small (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :only_enzyme) | 4054284127 ns | 3958176139 ns | 1.02 |
ViT small (256 x 256 x 3 x 16)/forward/CPU/Lux | 5354219989 ns | 5608465822 ns | 0.95 |
ViT small (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1254489328 ns | 1251122132 ns | 1.00 |
ViT small (256 x 256 x 3 x 32)/forward/CUDA/Reactant | 1288445019 ns | 1342649088 ns | 0.96 |
ViT small (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1311838413 ns | 1266853946 ns | 1.04 |
ViT small (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :only_enzyme) | 3037480498 ns | 3134977417 ns | 0.97 |
ViT small (256 x 256 x 3 x 32)/forward/CUDA/Lux | 53140835 ns | 53144411 ns | 1.00 |
ViT small (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :after_enzyme) | 3069243254 ns | 3046201375 ns | 1.01 |
ViT small (256 x 256 x 3 x 32)/forward/CPU/Reactant | 3100569155 ns | 3063894685 ns | 1.01 |
ViT small (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :before_enzyme) | 3050601072 ns | 3087462133 ns | 0.99 |
ViT small (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :only_enzyme) | 4901907923 ns | 4895902746 ns | 1.00 |
ViT small (256 x 256 x 3 x 32)/forward/CPU/Lux | 13323425227 ns | 9797674342 ns | 1.36 |
ViT base (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1411930117 ns | 1295665825 ns | 1.09 |
ViT base (256 x 256 x 3 x 16)/forward/CUDA/Reactant | 1409847962 ns | 1436872653 ns | 0.98 |
ViT base (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1355468017 ns | 1277008406 ns | 1.06 |
ViT base (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :only_enzyme) | 3046720551 ns | 3239595052 ns | 0.94 |
ViT base (256 x 256 x 3 x 16)/forward/CUDA/Lux | 71191954 ns | 71339137 ns | 1.00 |
ViT base (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :after_enzyme) | 3211022497 ns | 3189935441 ns | 1.01 |
ViT base (256 x 256 x 3 x 16)/forward/CPU/Reactant | 3283577977 ns | 3207736040 ns | 1.02 |
ViT base (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :before_enzyme) | 3205245566 ns | 3222417327 ns | 0.99 |
ViT base (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :only_enzyme) | 5118932238 ns | 5319487172 ns | 0.96 |
ViT base (256 x 256 x 3 x 16)/forward/CPU/Lux | 13742003124 ns | 12850280060 ns | 1.07 |
ViT base (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1271142454 ns | 1274372415 ns | 1.00 |
ViT base (256 x 256 x 3 x 4)/forward/CUDA/Reactant | 1279685346 ns | 1287517573 ns | 0.99 |
ViT base (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1299608393 ns | 1311075350 ns | 0.99 |
ViT base (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :only_enzyme) | 3102484563 ns | 3155538600 ns | 0.98 |
ViT base (256 x 256 x 3 x 4)/forward/CUDA/Lux | 20876844 ns | 20891523 ns | 1.00 |
ViT base (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :after_enzyme) | 1874409049 ns | 1909585662 ns | 0.98 |
ViT base (256 x 256 x 3 x 4)/forward/CPU/Reactant | 1872785183 ns | 1907821565 ns | 0.98 |
ViT base (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :before_enzyme) | 1856532831 ns | 1918208063 ns | 0.97 |
ViT base (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :only_enzyme) | 3589196661 ns | 3741288084 ns | 0.96 |
ViT base (256 x 256 x 3 x 4)/forward/CPU/Lux | 3120367142 ns | 3617112173 ns | 0.86 |
This comment was automatically generated by workflow using github-action-benchmark.
…ne` tests for macOS x86
function constant(
    x::DenseArray{T,N};
    location=MLIR.IR.Location(
        "stablehlo.constant", MLIR.IR.Location(@__FILE__, @__LINE__, 0)
we should still refactor this into a location macro
macro location()
    return MLIR.IR.Location(__FILE__, __LINE__)
end
which is then used everywhere (and can be overridden to give the fuller backtrace, as mentioned)
yes, I have sth in mind but will open another PR
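For reference, one way the macro suggested above could look, as a minimal sketch rather than what the follow-up PR will actually do. It assumes the call site is what should be recorded, so it reads `__source__` (the `LineNumberNode` Julia passes to every macro) instead of `@__FILE__`/`@__LINE__`, which would expand to the macro's own definition site if used inside the macro body. `Location` here is a hypothetical stand-in struct so the snippet runs without Reactant loaded; the real code would build an `MLIR.IR.Location`.

```julia
# Hypothetical stand-in for MLIR.IR.Location, only so this sketch is self-contained.
struct Location
    file::String
    line::Int
    column::Int
end

# `__source__` is the LineNumberNode of the macro *call site*.
# Its `file` field may be `nothing` (e.g. for generated code), hence the fallback.
macro location()
    file = String(something(__source__.file, :none))
    line = __source__.line
    # Column information is not available here, so it is fixed to 0,
    # matching the `0` passed in the diff above.
    return :(Location($file, $line, 0))
end

loc = @location()
println(loc.file, ":", loc.line)
```

With something like this, an op shortcut could default its keyword to `location=@location()`, and a richer backtrace could later be swapped in by changing only the macro.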
rewrite `TracedR*` methods