User-facing API like vmap #117
What would be a good API for fusing matmuls and dots?
I never thought to try, but kernels which take slices of larger arrays seem to work fine on the CPU. Somehow the linked DiffEqGPU.jl source is able to do this for GPU too; what's the trick? Edit: mcabbott/Tullio.jl#20
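To make that concrete, here is a minimal sketch of the CPU case; the `scale!` kernel is hypothetical and the launch style assumes a recent KernelAbstractions release:

```julia
using KernelAbstractions

# An elementwise kernel; the arrays it receives can be views (slices) of larger arrays.
@kernel function scale!(du, @Const(u), a)
    i = @index(Global)
    @inbounds du[i] = a * u[i]
end

backend = CPU()
U  = rand(Float32, 16, 8)    # one larger array holding 8 "samples" as columns
dU = similar(U)

k = scale!(backend)
for j in axes(U, 2)
    # launch the kernel on a view of the larger array, one column at a time
    k(view(dU, :, j), view(U, :, j), 2f0; ndrange = size(U, 1))
end
KernelAbstractions.synchronize(backend)
```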
I posted my long-form comment on Discourse, but here are the parts related to this discussion:
KA kernels are all about changing what's inside the kernel body.
Something like this, where "depth" refers to the depth in the call stack. Does KA do this? That's awesome!
I believe depth here refers to linear-algebra op/kernel fusion and (where applicable) reordering/lifting of operations.
I am off on vacation, so I won't partake in this discussion for the next two weeks.
KA is built on top of Cassette, so it can indeed change the entire call graph, but I would argue that this is the wrong level of abstraction. For fusing, one might build a DSL that in the end lowers to KernelAbstractions as an execution engine.
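To illustrate what "lowers to KernelAbstractions as an execution engine" could mean, here is the kind of kernel such a DSL might emit for the contraction C[i, j] = Σₖ A[i, k] * B[k, j]; this is hand-written purely for illustration, not output from any existing DSL:

```julia
using KernelAbstractions

# Hand-written stand-in for what an einsum-style DSL might generate
# for C[i, j] = Σₖ A[i, k] * B[k, j].
@kernel function contract!(C, @Const(A), @Const(B))
    i, j = @index(Global, NTuple)
    acc = zero(eltype(C))
    for k in axes(A, 2)
        @inbounds acc += A[i, k] * B[k, j]
    end
    @inbounds C[i, j] = acc
end

A = rand(Float32, 32, 16)
B = rand(Float32, 16, 8)
C = similar(A, 32, 8)

backend = CPU()
contract!(backend)(C, A, B; ndrange = size(C))
KernelAbstractions.synchronize(backend)
```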
Apologies for reviving this thread with yet more questions, but what would be the most appropriate place to define such a DSL (if indeed one exists at all)? In Python land one would likely pick up XLA or TVM, but such monolithic frameworks seem like a poor fit given that CUDA, GPUCompiler, KernelAbstractions, and Dagger(GPU) all exist.
I think the design space is still quite open. Tullio.jl is something like that for fans of Einstein notation. I have my own playground where I explore infrastructure ideas. I might also be convinced that it would be a value-add for KA, but in general we have orthogonal packages in Julia.
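For a concrete taste of the Einstein-notation route, a small sketch (assuming nothing beyond Tullio itself; my understanding is that Tullio can route such expressions through KernelAbstractions-generated kernels when KA and a GPU package are loaded, but treat that detail as an assumption):

```julia
using Tullio

A = rand(Float32, 64, 64)
x = rand(Float32, 64)
y = rand(Float32, 64)

# One fused contraction: s = xᵀ A y, summing over both i and j
# without materializing the intermediate A * y.
@tullio s := x[i] * A[i, j] * y[j]
```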
I wanted to write a short message to say that there is definitely user demand for some flavor of `vmap`; there are at least two reasons for this.
I have not myself seen point 2 discussed much and would like to add that I believe there is great value here from the users' perspective, particularly for those who are either newer to the language or aren't interested in getting into too many details. JAX's main difficulty, from my perspective, is the significant quantity of boilerplate the surrounding ecosystem generates (think of the repetition Haiku requires).
@ChrisRackauckas suggests that this package provides much of the utility needed to make broadcasting over specified axes efficient; this can be seen in DiffEqGPU.jl.
Can we discuss a user-facing API so we can directly compare against JAX's vmap?
For instance, if I have a function, how can I efficiently broadcast it over collections of inputs stored along the axes of multidimensional arrays ("tensors")?
Further, is it possible to provide these as defaults for something like `eachslice` so that broadcasting Just Works?
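For concreteness, a rough sketch of the kind of user-facing helper this question has in mind; `naive_vmap` is purely hypothetical (not an existing API in KernelAbstractions or anywhere else) and ignores performance, it only pins down the `eachslice`-based semantics:

```julia
# Hypothetical, naive `vmap`-style helper: map `f` over slices of `xs`
# along dimension `dims` and re-stack the results along that dimension.
# (`stack` is in Base from Julia 1.9; older versions can use Compat.jl.)
function naive_vmap(f, xs::AbstractArray; dims = ndims(xs))
    slices = eachslice(xs; dims = dims)
    return stack(map(f, slices); dims = dims)
end

X = rand(Float32, 3, 100)                 # 100 length-3 "samples" stored as columns
Y = naive_vmap(x -> 2 .* x, X; dims = 2)  # applies the function column by column
s = map(x -> sum(abs2, x), eachcol(X))    # scalar outputs: a plain map already works
```

A real implementation would presumably fuse this into a single kernel launch instead of materializing each slice, which is exactly where KernelAbstractions could come in.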