Support IFRT #164
Reactant.jl Benchmarks
| Benchmark suite | Current: 7a96406 | Previous: deefd18 | Ratio |
|---|---|---|---|
| ViT base (256 x 256 x 3 x 32)/forward/CUDA/Reactant | 1301347763 ns | 1354313354 ns | 0.96 |
| ViT base (256 x 256 x 3 x 32)/forward/CUDA/Lux | 217103056 ns | 208305734 ns | 1.04 |
| ViT base (256 x 256 x 3 x 32)/forward/CPU/Reactant | 5365149260 ns | 5150619393 ns | 1.04 |
| ViT base (256 x 256 x 3 x 32)/forward/CPU/Lux | 22423672030 ns | 19034043181 ns | 1.18 |
| ViT small (256 x 256 x 3 x 4)/forward/CUDA/Reactant | 1257814655 ns | 1337880973 ns | 0.94 |
| ViT small (256 x 256 x 3 x 4)/forward/CUDA/Lux | 8327637 ns | 9140912.5 ns | 0.91 |
| ViT small (256 x 256 x 3 x 4)/forward/CPU/Reactant | 1634348478 ns | 1693455123 ns | 0.97 |
| ViT small (256 x 256 x 3 x 4)/forward/CPU/Lux | 2178014879 ns | 2263974104 ns | 0.96 |
| ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Reactant | 1257786527 ns | 1332284786.5 ns | 0.94 |
| ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Lux | 90765896 ns | 86734383.5 ns | 1.05 |
| ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Reactant | 2171489336 ns | 2287781106 ns | 0.95 |
| ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Lux | 4967371369 ns | 6118937863 ns | 0.81 |
| ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Reactant | 1329525357.5 ns | 1274319613 ns | 1.04 |
| ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Lux | 7794196 ns | 7577830 ns | 1.03 |
| ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Reactant | 1474849210 ns | 1512493678 ns | 0.98 |
| ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Lux | 1724297638 ns | 1456174515 ns | 1.18 |
| ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Reactant | 1306781721.5 ns | 1263335461 ns | 1.03 |
| ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Lux | 11562433 ns | 11439682 ns | 1.01 |
| ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Reactant | 1768896684 ns | 1805020552 ns | 0.98 |
| ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Lux | 2646587434.5 ns | 2506386037 ns | 1.06 |
| ViT small (256 x 256 x 3 x 16)/forward/CUDA/Reactant | 1309249187 ns | 1307096998 ns | 1.00 |
| ViT small (256 x 256 x 3 x 16)/forward/CUDA/Lux | 88858488 ns | 87573137 ns | 1.01 |
| ViT small (256 x 256 x 3 x 16)/forward/CPU/Reactant | 2229256097 ns | 2278669956 ns | 0.98 |
| ViT small (256 x 256 x 3 x 16)/forward/CPU/Lux | 3817263196 ns | 3700585497 ns | 1.03 |
| ViT small (256 x 256 x 3 x 32)/forward/CUDA/Reactant | 1307760748 ns | 1310298018 ns | 1.00 |
| ViT small (256 x 256 x 3 x 32)/forward/CUDA/Lux | 118030839 ns | 109805738 ns | 1.07 |
| ViT small (256 x 256 x 3 x 32)/forward/CPU/Reactant | 3029932924 ns | 3158445388 ns | 0.96 |
| ViT small (256 x 256 x 3 x 32)/forward/CPU/Lux | 9978512015 ns | 14341152106 ns | 0.70 |
| ViT base (256 x 256 x 3 x 16)/forward/CUDA/Reactant | 1356889531 ns | 1384173668 ns | 0.98 |
| ViT base (256 x 256 x 3 x 16)/forward/CUDA/Lux | 122479512.5 ns | 125016343 ns | 0.98 |
| ViT base (256 x 256 x 3 x 16)/forward/CPU/Reactant | 3198347884 ns | 3254191833 ns | 0.98 |
| ViT base (256 x 256 x 3 x 16)/forward/CPU/Lux | 7412499032 ns | 6513914428 ns | 1.14 |
| ViT base (256 x 256 x 3 x 4)/forward/CUDA/Reactant | 1335763412 ns | 1339992292 ns | 1.00 |
| ViT base (256 x 256 x 3 x 4)/forward/CUDA/Lux | 88481951 ns | 81713817 ns | 1.08 |
| ViT base (256 x 256 x 3 x 4)/forward/CPU/Reactant | 1893395827 ns | 1963454757 ns | 0.96 |
| ViT base (256 x 256 x 3 x 4)/forward/CPU/Lux | 2734552651 ns | 2503177627 ns | 1.09 |
This comment was automatically generated by workflow using github-action-benchmark.
I added some utilities for Julia-like conversion in C++ and better communication between Julia and C.
For example, imagine a function that returns an `absl::Span`. Something like `extern "C" span<Device*> the_function(...) { return convert(Type<span<Device*>>(), some_func_that_returns_a_absl_Span(...)); }` conceptually works. In practice I need to add some particular implementations so it works (I'm sure it can be done more generically, but I have neither the time nor the energy to learn the terribly complicated C++).
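To make the idea above concrete, here is a minimal, self-contained sketch of a Julia-like `convert` built on tag dispatch. It is not the code from this PR: the definitions of `Type`, `span`, and `Device`, and the helpers `list_devices` and `free_device_span`, are all hypothetical, and `std::vector` stands in for `absl::Span` so the example compiles without Abseil.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical tag type that mimics Julia's `Type{T}` for overload dispatch.
template <typename T>
struct Type {};

// Hypothetical C-compatible span: a length plus a raw pointer, so the layout
// is predictable on the Julia side.
template <typename T>
struct span {
    std::size_t size;
    T* ptr;
};

// Hypothetical stand-in for a device object.
struct Device {
    int id;
};

// One "particular implementation" of convert: copy the elements of a C++
// container into a heap buffer that the caller must release later.
span<Device*> convert(Type<span<Device*>>, const std::vector<Device*>& src) {
    Device** buf = new Device*[src.size()];
    for (std::size_t i = 0; i < src.size(); ++i) {
        buf[i] = src[i];
    }
    return span<Device*>{src.size(), buf};
}

// Hypothetical stand-in for a function that would return an absl::Span.
std::vector<Device*> list_devices() {
    static Device devices[] = {{0}, {1}, {2}};
    return {&devices[0], &devices[1], &devices[2]};
}

// C entry point: the target type is named once, through the tag argument.
extern "C" span<Device*> the_function() {
    return convert(Type<span<Device*>>(), list_devices());
}

// The buffer is allocated on the C++ side, so it is released there too.
extern "C" void free_device_span(span<Device*> s) {
    delete[] s.ptr;
}
```

Each new source/target pair needs its own `convert` overload in this sketch, which matches the "particular implementations" mentioned above. A single template over any container type might cover many of them, but that is left out here.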
speeds up compile time
This time for real.
- `Sharding::devices`
- `devices` argument to `ifrt_sharding_with_device_assignment`
- `OpaqueSharding`
- `ConcreteSharding`
- `ConcreteEvenSharding`
- `ShardingParamsSharding`
- `DeserializeShardingOptions`
- `xla::Topology::GetDefaultLayout` and `xla::Topology::Attributes` (and then Julia API)
- descriptions (`ifrt_topology_device_descriptions`)
- `on_done_with_host_buffer` argument to `ifrt_client_make_array_from_host_buffer`
- `RemapArrays`, `Attributes`, `GetAllDevices`, `GetDefaultDeviceAssignment` methods
- `ifrt_loadedexecutable_parameter_shardings`
- `ifrt_loadedexecutable_output_shardings`
- `ifrt_loadedexecutable_parameter_layouts`
- `ifrt_loadedexecutable_output_layouts`
- `GetOutputMemoryKinds`
- `GetCostAnalysis`
- `Execute`
- `DeserializeExecutableOptions` and `CompileOptions` are really deprecated

Issues

- `xla::PjRtFuture<>` to a pointer so we can pass it through the C API? Or can we wrap it in an opaque block?
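
One common way to approach the question above is the opaque-handle pattern: allocate the C++ object on the heap and hand the C side a pointer to a type whose layout it never sees. The sketch below only illustrates that pattern and is not a proposal for the actual API. `Future` is a hypothetical stand-in for a type such as `xla::PjRtFuture<>`, and every `future_*` function name is made up for this example.

```cpp
// Hypothetical stand-in for a C++ future type such as xla::PjRtFuture<>.
class Future {
public:
    explicit Future(bool ready) : ready_(ready) {}
    bool IsReady() const { return ready_; }
    void MarkReady() { ready_ = true; }

private:
    bool ready_;
};

// Opaque handle: declared but never defined, so C and Julia callers can only
// hold a pointer to it and pass that pointer back.
struct OpaqueFuture;

// Move the C++ object to the heap and return it as an opaque pointer.
extern "C" OpaqueFuture* future_new(int ready) {
    return reinterpret_cast<OpaqueFuture*>(new Future(ready != 0));
}

// Each accessor casts the handle back to the real type before using it.
extern "C" int future_is_ready(OpaqueFuture* handle) {
    return reinterpret_cast<Future*>(handle)->IsReady() ? 1 : 0;
}

extern "C" void future_mark_ready(OpaqueFuture* handle) {
    reinterpret_cast<Future*>(handle)->MarkReady();
}

// Ownership stays on the C++ side: the caller must invoke this exactly once.
extern "C" void future_free(OpaqueFuture* handle) {
    delete reinterpret_cast<Future*>(handle);
}
```

On the Julia side such a handle would typically be held as a `Ptr{Cvoid}` with a finalizer that calls the free function. Whether that fits the lifetime rules of the real future type is an open question in this sketch.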