[Feature request] WebGPU support #73
Comments
Thanks for the resources :) For the most part, we are waiting for onnxruntime-web to add WebGPU as a supported backend. Here is the associated PR to track its progress: However, we do plan to support other model formats/backends (in a similar way to how the Python library supports PyTorch, TensorFlow and ONNX). I don't want to spoil anything... but things are in the works 😉
AFAIU ORT has merged WebGPU support: microsoft/onnxruntime#11695. What's needed to take advantage of this on the transformers.js side?
For reference, the WebGPU operators implemented: https://github.com/microsoft/onnxruntime/blob/main/js/web/docs/webgpu-operators.md
Unfortunately, the WebGPU implementation is currently slower than the WASM version, though: microsoft/onnxruntime#18754 (comment). It would be great to know what's needed to support WebGPU in transformers.js, assuming that perf issue gets resolved at some point, but it's not super urgent/important at the moment.
I have some models running on JSEP WebGPU that are 10 times faster than WASM, e.g. CLIP. To me, the main problem is the current backend design: it's global (as far as I know). We should be able to set the preferred backend per model.
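To make the per-model idea above concrete, here is a minimal sketch of what a per-model backend preference could look like. Everything in it is hypothetical: `perModelBackends`, `sessionOptionsFor` and the model id are invented for illustration and are not part of transformers.js. The only real-world detail assumed is that onnxruntime-web accepts an ordered `executionProviders` list in its session options.

```typescript
// Hypothetical per-model backend preferences; none of these names come from
// transformers.js. The returned object mirrors the `executionProviders`
// session option that onnxruntime-web accepts when creating a session.
type Backend = "webgpu" | "wasm";

interface SessionOptionsSketch {
  executionProviders: Backend[];
}

// Used for any model without an explicit preference.
const DEFAULT_BACKENDS: Backend[] = ["wasm"];

// The model id is a placeholder chosen for illustration.
const perModelBackends = new Map<string, Backend[]>([
  ["clip-example", ["webgpu", "wasm"]],
]);

// Build session options for one model instead of relying on a global setting.
function sessionOptionsFor(modelId: string): SessionOptionsSketch {
  const preferred = perModelBackends.get(modelId) ?? DEFAULT_BACKENDS;
  return { executionProviders: [...preferred] };
}
```

With this shape, `sessionOptionsFor("clip-example")` would request WebGPU first and WASM second, while any unlisted model stays on WASM.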
@DavidGOrtega that's great news! To be clear, are you running your models directly on ORT, or using JSEP through transformers.js somehow? Would love to hear more details about exactly what your setup looks like, and which other models you've found this perf improvement on!
I'm running them with vanilla ONNX. I can do a PR to support WebGPU here (I did Node); it's trivial. However, I think we should rethink the backend a bit to make it more flexible, so that we can choose the backend and options per model. Also, the ONNX fallback is not perfect: I have models where the session loads but inference does not work, and that is a step after the ONNX fallback... @xenova can also do WebGPU and is testing it among other backends like candle. Probably not done yet just because not all the models support wgpu?
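The failure mode described above (a session that loads but cannot run inference) suggests probing each backend with a real inference before committing to it. The following is only a sketch under that assumption: `firstWorkingBackend`, `load` and `probe` are hypothetical names, and the callbacks stand in for whatever session creation and run calls the runtime provides.

```typescript
// Sketch: try backends in order and keep the first one that both loads AND
// completes a probe inference. `load` and `probe` are hypothetical callbacks
// supplied by the caller (e.g. wrapping a runtime's session creation and run).
async function firstWorkingBackend<S>(
  backends: string[],
  load: (backend: string) => Promise<S>,
  probe: (session: S) => Promise<void>,
): Promise<{ backend: string; session: S }> {
  const errors: string[] = [];
  for (const backend of backends) {
    try {
      const session = await load(backend);
      await probe(session); // a session that loads may still fail here
      return { backend, session };
    } catch (err) {
      // Remember why this backend was rejected, then try the next one.
      errors.push(`${backend}: ${String(err)}`);
    }
  }
  throw new Error(`no usable backend (${errors.join("; ")})`);
}
```

The probe costs one extra inference at start-up, which seems a reasonable price for not discovering an unsupported operator on the first real request.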
@DavidGOrtega What model can you run? I tried some BERT and got "cannot resolve operator 'Erf' with opsets: ai.onnx v11" when calling https://cdn.jsdelivr.net/npm/onnxruntime-web/dist/esm/ort.min.js directly with the model weights cached by transformers.js 2.14.2.
@luweigen the v3 branch is still a work-in-progress, and will be marked as non-draft when ready for testing 👍
@xenova Could you share with us what is currently blocking transformers.js from taking advantage of WebGPU?
I wrote a blog post remixing transformers.js and onnxruntime WebGPU: https://medium.com/@GenerationAI/transformers-js-onnx-runtime-webgpu-46c3e58d547c
@luweigen thanks for this post. The CPU to WebGPU comparison is fair, but not all the results are obvious. In the tests you declare:
Execution time: 6169.100000ms
Batch Execution time: 23191.899999ms
WebGPU Execution time: 20445.0999994ms
WebGPU Batch Execution time: 2231 ms
hence for processing a batch of size ~ 100 you get a But when the inference is just one sequence you
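For readers who want the ratios behind those four timings, here is a small worked calculation. It assumes the two rows without a "WebGPU" label are the CPU (WASM) runs and that "Execution time" is the non-batched run; the comment itself does not spell this out, and the variable names are made up for the example.

```typescript
// Timings in ms as quoted in the comment above (rounded to one decimal).
// Assumption: rows without a "WebGPU" label are the CPU/WASM runs.
const cpuNonBatch = 6169.1;
const cpuBatch = 23191.9;
const gpuNonBatch = 20445.1;
const gpuBatch = 2231;

// A speedup above 1 means WebGPU was faster than CPU for that row.
function speedup(cpuMs: number, gpuMs: number): number {
  return cpuMs / gpuMs;
}

const nonBatchSpeedup = speedup(cpuNonBatch, gpuNonBatch); // below 1: WebGPU slower
const batchSpeedup = speedup(cpuBatch, gpuBatch); // above 10: WebGPU faster
console.log(nonBatchSpeedup.toFixed(2), batchSpeedup.toFixed(2));
```

Under those assumptions the batched run is roughly 10.4x faster on WebGPU, while the non-batched run is roughly 0.3x, i.e. slower than CPU.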
all-MiniLM-L6-v2 is very small. The CPU can handle it well enough if the batch size is also small.
FWIW, even with batch size = 1, I get a 5x speedup for the WebGPU backend on 5x is certainly worth it for me! Really looking forward to the WebGPU backend stabilizing (and hoping the Chrome team gets Linux WebGPU sorted soon 🤞 - also, looks like Safari isn't too far away from a decent/stable WebGPU release, surprisingly).
WebGPU
Chrome shipped WebGPU today in Chrome 113 Beta.
Reason for request
WebGPU is currently a WIP in Firefox and Safari, in addition to the Chrome beta. TensorFlow.js also already supports WebGPU in several operators.
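Since browser support is uneven, any WebGPU backend would need a feature check before it is selected. A minimal sketch follows, assuming the standard `navigator.gpu.requestAdapter()` entry point; `NavigatorLike`, `exposesWebGPU` and `hasUsableWebGPU` are names invented here, and the structural type exists only so the check can be exercised outside a browser.

```typescript
// Minimal structural type so the check can run outside a browser;
// in a browser you would pass the global `navigator` (cast as needed).
interface NavigatorLike {
  gpu?: { requestAdapter(): Promise<object | null> };
}

// Cheap synchronous check: is the WebGPU entry point exposed at all?
function exposesWebGPU(nav: NavigatorLike): boolean {
  return nav.gpu !== undefined;
}

// Stronger check: the API can be present yet yield no adapter.
async function hasUsableWebGPU(nav: NavigatorLike): Promise<boolean> {
  if (!nav.gpu) return false;
  try {
    return (await nav.gpu.requestAdapter()) !== null;
  } catch {
    return false; // treat any adapter request failure as "not usable"
  }
}
```

A caller could fall back to the WASM backend whenever `hasUsableWebGPU` resolves to `false`.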
Additional context
It's worth noting that Google's Dawn project, a native C++ WebGPU implementation, will support Node.js soon. WIP of the Node bindings here.