Optional use of GPU to offload computation #806
Comments
I think it's a fantastic idea. Do you think there's any change we should/could make to Gnocchi to leverage that even more?
This is an awesome idea.
Thanks for the support. I think writing a CUDA kernel can be pretty challenging, and in the case of Gnocchi the kernel… Since we were relying on numpy's intelligence in the case of absence of data, to deal… I did this test on a very basic combination of input from the benchmark; the next task is…
@sjamgade That sounds great. Feel free to ping us if you need hints or help. Also, don't hesitate to send small and early patches to get feedback, so we can have an understanding of what's going on. :)
I've read your blog post and it's great, good job @sjamgade. From what I've understood, you'd need a larger span of points to be more efficient? I'm not sure how this can be achieved TBH. If you have suggestions, feel free to write about them. Other than that, this looks like an interesting feature we'd be happy to merge when it's ready!
I'm curious: in your initial implementation, are there any significant changes to the workflow or data model required to make Gnocchi work with a GPU? It'd be interesting to see if maybe there's another design we can take with Gnocchi.
I have a vague idea of the kind of changes Gnocchi could take to ease the offloading, but I like to imagine that as a last option. I will keep working on other parts of the library in the repo and possibly post results with full-scale testing. It would be promising to have it tested in a more production-like deployment, but currently I am not aware of any probable opportunities. In other news, I have published part 2, the final part of my experiment, which is also available on the wiki.
What:
I wanted to have some discussion around a possible way to have gnocchi
offload its processing to GPGPU.
Why:
This would increase performance drastically and also increase
the volume of data that could be processed in one go.
How:
The type of computation currently done by gnocchi is fairly simple
and straightforward, for example the sum function in the carbonara library.
Let's take an example: the computation done in the benchmark function of the AggregateTimeSerie class.
(The example uses the pycuda library and was tested on an NVIDIA Quadro K620.)
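The original kernel is not reproduced here; the following is a minimal pycuda sketch of what such a resampling-sum kernel could look like. The kernel body, the `gpu_resample_sum` helper, the block size, and the data sizes are my own illustrative assumptions, not Gnocchi or benchmark code.

```python
# Minimal pycuda sketch (illustration, not the original benchmark code):
# sum-resample a timeseries on the GPU, combining `resample` values into one.
import numpy as np
import pycuda.autoinit           # noqa: F401  (creates a CUDA context)
import pycuda.driver as drv
from pycuda.compiler import SourceModule

_mod = SourceModule("""
__global__ void resample_sum(const float *values, float *out,
                             int resample, int n_out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_out)
        return;
    float acc = 0.0f;
    for (int j = 0; j < resample; j++)
        acc += values[i * resample + j];
    out[i] = acc;
}
""")
_resample_sum = _mod.get_function("resample_sum")


def gpu_resample_sum(values, resample, block=(256, 1, 1)):
    """Combine `resample` consecutive points into one by summing, on the GPU."""
    n_out = values.size // resample
    out = np.empty(n_out, dtype=np.float32)
    grid = ((n_out + block[0] - 1) // block[0], 1)   # one thread per output point
    _resample_sum(drv.In(values.astype(np.float32)), drv.Out(out),
                  np.int32(resample), np.int32(n_out),
                  block=block, grid=grid)
    return out


# e.g. resampling 5s granularity to 35s: 7 points -> 1 point
values = np.random.random(7 * 10000).astype(np.float32)
assert np.allclose(gpu_resample_sum(values, 7),
                   values.reshape(-1, 7).sum(axis=1))
```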
With a kernel like the one shown above, one could easily beat the benchmark by a factor of at least 10.
Here I had to limit the computation, as my GPU was not really designed for GPGPU work.
The grid and block parameters are hardcoded in the example; however, they could easily be calculated from SplitKey.POINTS_PER_SPLIT. These values control the amount of parallel work to be done, and are a deterministic function of (GPU compute capability, SplitKey.POINTS_PER_SPLIT, resampleFactor). The value passed to the kernel call is just the ResampleFactor: in the example, changing the granularity of the input data from 5s to 35s means 35/5 = 7, i.e. 7 values are combined to produce 1 value.
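For illustration, here is one way the launch configuration could be derived from those values instead of hardcoding it. The POINTS_PER_SPLIT value and the 1024-thread-per-block limit below are assumptions; the real limit depends on the device's compute capability.

```python
# Sketch: derive block/grid for a 1-D resampling kernel from
# SplitKey.POINTS_PER_SPLIT and the resample factor instead of hardcoding them.
POINTS_PER_SPLIT = 3600          # assumed value of SplitKey.POINTS_PER_SPLIT


def launch_config(points_per_split, resample, max_threads_per_block=1024):
    """Return (block, grid) for one thread per aggregated output point."""
    n_out = points_per_split // resample
    threads = min(n_out, max_threads_per_block)
    blocks = (n_out + threads - 1) // threads        # ceil division
    return (threads, 1, 1), (blocks, 1)


block, grid = launch_config(POINTS_PER_SPLIT, resample=7)
print(block, grid)               # (514, 1, 1) (1, 1) for 3600 points, factor 7
```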
Since all these values can easily be calculated before launching the computation
on the GPU, this "algorithm" could easily be incorporated into the library as a subclass, as sketched below.
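As a rough illustration of the subclass idea: the class and method names below are hypothetical, not the real carbonara API, and `gpu_kernels` stands for a module wrapping the pycuda sketch above. The GPU path stays optional and falls back to the existing numpy behaviour.

```python
# Hypothetical sketch of the "subclass" idea; names are illustrative only.
import numpy as np

try:                                    # GPU support stays strictly optional
    from gpu_kernels import gpu_resample_sum   # e.g. the pycuda sketch above
    HAVE_GPU = True
except ImportError:
    HAVE_GPU = False


class AggregatedTimeSerieCPU:
    """Stand-in for the existing numpy-based aggregation."""
    @staticmethod
    def _sum_resample(values, resample):
        return values.reshape(-1, resample).sum(axis=1)


class AggregatedTimeSerieGPU(AggregatedTimeSerieCPU):
    """Offload the resampling sum to the GPU when one is available."""
    @staticmethod
    def _sum_resample(values, resample):
        if not HAVE_GPU:
            return AggregatedTimeSerieCPU._sum_resample(values, resample)
        return gpu_resample_sum(values, resample)
```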
As for the problem of depending on pycuda (and CUDA) in general: there are libraries which help abstract away the hardware and the proprietary blob, since this is an already-solved problem in areas such as machine learning.
Any thoughts?
Which version of Gnocchi are you using
master git hash: 816fd83