
[Feature request] WebGPU support #73

Closed
loretoparisi opened this issue Apr 7, 2023 · 15 comments · Fixed by #545
Labels
enhancement New feature or request

Comments

@loretoparisi

loretoparisi commented Apr 7, 2023

WebGPU

Chrome shipped WebGPU today in Chrome 113 Beta.

Reason for request

WebGPU is currently a work in progress in Firefox and Safari, in addition to the Chrome beta. TensorFlow.js also already supports WebGPU for several operators.

Additional context
It's worth noting that Google's Dawn project, a native C++ WebGPU implementation, will support Node.js soon. The Node bindings are a work in progress here.

@loretoparisi loretoparisi added the enhancement New feature or request label Apr 7, 2023
@xenova
Collaborator

xenova commented Apr 7, 2023

Thanks for the resources :) For the most part, we are waiting for onnxruntime-web to add webgpu as a supported backend.

Here is the associated PR to track its progress:

However, we do plan to support other model formats/backends (in a similar way to how the Python library supports PyTorch, TensorFlow and ONNX). I don't want to spoil anything... but things are in the works 😉
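For anyone who wants to experiment while that lands, here is a minimal sketch of selecting the WebGPU execution provider with vanilla onnxruntime-web (not transformers.js). It assumes the `webgpu` provider name and the `onnxruntime-web/webgpu` bundle from recent onnxruntime-web releases; the model path and input names are placeholders for a typical BERT-style encoder.

```js
// Minimal sketch: create an onnxruntime-web session on the WebGPU execution
// provider, falling back to WASM if WebGPU is unavailable.
// "model.onnx" and the input names are placeholders.
import * as ort from 'onnxruntime-web/webgpu';

const session = await ort.InferenceSession.create('model.onnx', {
  executionProviders: ['webgpu', 'wasm'],
});

const inputIds = new ort.Tensor('int64', BigInt64Array.from([101n, 2023n, 102n]), [1, 3]);
const attentionMask = new ort.Tensor('int64', BigInt64Array.from([1n, 1n, 1n]), [1, 3]);
const outputs = await session.run({ input_ids: inputIds, attention_mask: attentionMask });
console.log(outputs);
```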

@gabrielgrant

AFAIU ORT has merged WebGPU support: microsoft/onnxruntime#11695

What's needed to take advantage of this on the transformers.js side?

@sroussey

For reference, here are the WebGPU operators implemented so far:

https://github.com/microsoft/onnxruntime/blob/main/js/web/docs/webgpu-operators.md

@gabrielgrant

Unfortunately the WebGPU implementation is currently slower than the WASM version, though: microsoft/onnxruntime#18754 (comment)

It would be great to know what's needed to support WebGPU in transformers.js, assuming that perf issue gets resolved at some point, but it's not super urgent/important at the moment.

@DavidGOrtega
Contributor

Unfortunately the WebGPU implementation is currently slower than the WASM version,

I have some models running on the JSEP WebGPU backend and they are 10 times faster than WASM, e.g. CLIP.

To me, the main problem is the current backend design: it's global (as far as I know). We should be able to set the preferred backend per model.
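To illustrate the per-model idea: with vanilla onnxruntime-web, each session already chooses its own execution provider, which is what a per-model backend option in transformers.js would need to map onto. A rough sketch (model paths are placeholders, not an existing transformers.js API):

```js
// Sketch only: two onnxruntime-web sessions with different execution providers,
// instead of one global backend shared by every model. Paths are placeholders.
import * as ort from 'onnxruntime-web/webgpu';

// e.g. a CLIP vision tower on WebGPU...
const visionSession = await ort.InferenceSession.create('clip_vision.onnx', {
  executionProviders: ['webgpu'],
});

// ...while a smaller text model stays on the WASM backend.
const textSession = await ort.InferenceSession.create('text_model.onnx', {
  executionProviders: ['wasm'],
});
```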

@gabrielgrant

@DavidGOrtega that's great news! To be clear, are you running your models directly on ORT, or using JSEP through transformers.js somehow? I would love to hear more details about exactly what your setup looks like, and which other models you've seen this perf improvement on!

@DavidGOrtega
Contributor

DavidGOrtega commented Jan 16, 2024

I'm running them with vanilla ONNX Runtime.

I can do a PR to support WebGPU here (I did the Node one); it's trivial. However, I think we should rethink the backend a bit so it's more flexible and we can choose the backend and options per model. Also, the ONNX fallback is not perfect: I have models where the session can be loaded but inference does not work, and that's a step after the ONNX fallback (see the sketch below)...

@xenova can also do WebGPU and is testing it among other backends like Candle. It's probably not done yet just because not all of the models support WebGPU?
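On the fallback point: since a session can be created successfully and still fail at inference time, any fallback logic would have to wrap `run()` as well, not just session creation. A rough sketch of that idea with vanilla onnxruntime-web (model path and feeds are placeholders):

```js
// Sketch: retry on WASM only after a failed inference, since session creation
// alone does not prove the WebGPU path works for a given model.
import * as ort from 'onnxruntime-web/webgpu';

async function runWithFallback(modelPath, feeds) {
  try {
    const gpuSession = await ort.InferenceSession.create(modelPath, {
      executionProviders: ['webgpu'],
    });
    return await gpuSession.run(feeds); // may still throw for unsupported ops
  } catch (err) {
    console.warn('WebGPU inference failed, retrying on WASM:', err);
    const wasmSession = await ort.InferenceSession.create(modelPath, {
      executionProviders: ['wasm'],
    });
    return await wasmSession.run(feeds);
  }
}
```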

@xenova xenova linked a pull request Jan 27, 2024 that will close this issue
@luweigen

luweigen commented Feb 5, 2024

Unfortunately the WebGPU implementation is currently slower than the WASM version,

I have some models running on the JSEP WebGPU backend and they are 10 times faster than WASM, e.g. CLIP.

To me, the main problem is the current backend design: it's global (as far as I know). We should be able to set the preferred backend per model.

@DavidGOrtega What model can you run?

I tried some BERT models and got "cannot resolve operator 'Erf' with opsets: ai.onnx v11" when calling https://cdn.jsdelivr.net/npm/onnxruntime-web/dist/esm/ort.min.js directly with the model weights cached by transformers.js 2.14.2.

@luweigen

luweigen commented Feb 5, 2024

I also tried the v3 branch of transformers.js and got a syntax error. It seems that commit 66da130 was overwritten by 8c465a9. A simple fix, as follows, leads to other errors. It seems there is still a long way to go?
28f666d

@xenova
Collaborator

xenova commented Feb 5, 2024

@luweigen the v3 branch is still a work in progress, and will be marked as non-draft when ready for testing 👍

@beaufortfrancois

@xenova Could you share with us what is currently blocking transformers.js from taking advantage of WebGPU?
I think we're all pretty excited to be able to try it and compare performance (CPU vs GPU). Thank you! ❤️

@luweigen

luweigen commented Feb 19, 2024

@xenova Could you share with us what is currently blocking transformers.js from taking advantage of WebGPU?
I think we're all pretty excited to be able to try it and compare performance (CPU vs GPU). Thank you! ❤️

I wrote a blog post remixing transformers.js and the ONNX Runtime WebGPU backend: https://medium.com/@GenerationAI/transformers-js-onnx-runtime-webgpu-46c3e58d547c
and a short comparison of CPU vs GPU: https://medium.com/@GenerationAI/performance-of-onnxruntime-webgpu-44a25d9897a9
Some functions are adapted from transformers.js to make it work, as mentioned in the code comments.

@loretoparisi
Author

@luweigen thanks for this post. The CPU vs WebGPU comparison is fair, but not all of the results are obvious. In your tests you report:

Execution time: 6169.100000ms
Batch Execution time: 23191.899999ms

WebGPU Execution time: 20445.0999994ms
WebGPU Batch Execution time: 2231 ms

Hence for processing a batch of size ~100 you get a CPU/WebGPU ratio of ~10x, i.e. a clear WebGPU speedup.

But when the inference is just one sequence, the CPU/WebGPU ratio is ~0.3, i.e. the CPU is ~3.3x faster than WebGPU, so it seems that offloading to the GPU is not that efficient with a batch size of 1. So according to your tests with MiniLM, when does WebGPU become useful; in other words, for which batch size is the CPU/WebGPU ratio > 1?
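For concreteness, the two ratios come straight from the timings quoted above (CPU time divided by WebGPU time):

```js
// Ratios computed from the timings quoted above.
const cpuSingle = 6169.1,  gpuSingle = 20445.1; // single sequence, ms
const cpuBatch  = 23191.9, gpuBatch  = 2231;    // batch of ~100, ms

console.log((cpuBatch / gpuBatch).toFixed(1));   // ~10.4 -> WebGPU ~10x faster
console.log((cpuSingle / gpuSingle).toFixed(2)); // ~0.30 -> CPU ~3.3x faster
```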

@luweigen

luweigen commented Feb 21, 2024

@luweigen thanks for this post. The CPU vs WebGPU comparison is fair, but not all of the results are obvious. In your tests you report:

Execution time: 6169.100000ms
Batch Execution time: 23191.899999ms

WebGPU Execution time: 20445.0999994ms
WebGPU Batch Execution time: 2231 ms

Hence for processing a batch of size ~100 you get a CPU/WebGPU ratio of ~10x, i.e. a clear WebGPU speedup.

But when the inference is just one sequence, the CPU/WebGPU ratio is ~0.3, i.e. the CPU is ~3.3x faster than WebGPU, so it seems that offloading to the GPU is not that efficient with a batch size of 1. So according to your tests with MiniLM, when does WebGPU become useful; in other words, for which batch size is the CPU/WebGPU ratio > 1?

all-MiniLM-L6-v2 is very small, so the CPU can handle it well enough if the batch size is also small.
I guess that with a larger model we would see the advantage of the GPU at small batch sizes too.
This was a very preliminary version of the code and therefore not shared on GitHub yet, but it will be, with more test results on other models and hyperparameters.
I/O binding to the GPU is not implemented yet, but I guess the overall improvement wouldn't be very big.

@josephrocca
Contributor

josephrocca commented May 11, 2024

But when the inference is just one sequence you have the cpu/webgpu ~=0.3 i.e. this results to a ~3.3x of cpu over WebGPU so it seems that offloading to the GPU it's not that efficient with a batch size = 1.

FWIW, even with batch size = 1, I get a 5x speedup for the WebGPU backend on bge-base-en-v1.5 according to Xenova's excellent webgpu-embedding-benchmark. Note that this model is 109M params - i.e. about 5x larger than all-MiniLM-L6-v2, but it can still embed a couple of passages per second on my Android phone even with the Wasm backend, and is "only" ~100mb 8-bit quantized (fine for my use case).

5x is certainly worth it for me! Really looking forward to the WebGPU backend stabilizing (and hoping the Chrome team gets Linux WebGPU sorted soon 🤞 - also, it looks like Safari isn't too far away from a decent/stable WebGPU release, surprisingly).
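Since this issue is closed by the v3 WebGPU pull request linked above, here is a rough sketch of what running such an embedding model on the WebGPU backend looks like against the v3 branch; the `device` and `dtype` options are taken from that work-in-progress branch and may change before release, and the model name is just the one mentioned above.

```js
// Rough sketch against the transformers.js v3 branch (API may change):
// a feature-extraction pipeline on the WebGPU backend.
import { pipeline } from '@xenova/transformers';

const extractor = await pipeline('feature-extraction', 'Xenova/bge-base-en-v1.5', {
  device: 'webgpu',
  // dtype: 'q8', // 8-bit quantized weights, if supported on this backend
});

const output = await extractor('Hello world', { pooling: 'mean', normalize: true });
console.log(output.dims, Array.from(output.data.slice(0, 4)));
```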
