
[js/web] WebGPU backend via JSEP #14579

Merged 90 commits into main on Apr 24, 2023

Conversation

fs-eire
Contributor

@fs-eire fs-eire commented Feb 4, 2023

Description

This change introduced the following new components into ONNX Runtime Web:

  • JavaScript Execution Provider (JSEP)
    • Asynchronous inferencing execution powered by Emscripten's Asyncify
  • WebGPU backend implemented in TypeScript
    • initial implementation of kernels:
      • elementwise operators (22)
      • binary operators (5)
      • tensor: Shape, Reshape, Transpose, Gemm
      • nn: Conv, {Global}Maxpool, {Global}AveragePool

The code still needs polishing; work is ongoing.

Q&A

What is JSEP?

JSEP, i.e. the JavaScript Execution Provider, is a new ONNX Runtime execution provider that works specifically in web environments (browsers). JSEP allows JavaScript code to kick in at various points when ONNX Runtime inferences a model.

Why JSEP?

JSEP is a hybrid-mode EP that contains both a C/C++ and a TypeScript/JavaScript implementation. There are two strong reasons for introducing JSEP:

  1. The C/C++ part lets JSEP leverage ONNX Runtime's capabilities as much as possible, including graph transformers, optimizers, and the ability to fall back to the CPU EP. The TypeScript/JavaScript part makes the kernel implementations much easier to develop and debug in the browser.
  2. The asynchronous execution required by JavaScript APIs (e.g. buffer.mapAsync()) makes it impossible to run OrtRun() in a synchronous context in the usual way (see the "async problem" section below). JSEP resolves this with Emscripten's Asyncify.

What is WebGPU?

WebGPU is the new GPU API available in browsers. It is one of only two APIs currently available for accessing the GPU from a browser (the other being WebGL).
WebGPU is designed with more advanced and more powerful features than WebGL, and is potentially the solution that offers the best GPU performance currently available for model inferencing.

What is the async problem, and why do we have it?

The "async problem" is that you cannot call an async function from a synchronous context. Consider the following C++ code:

// C-style declarations (API)
typedef void (*ON_COMPLETE)(PVOID state, DATA *data);
void read_data_from_file(FILEHANDLE file, ON_COMPLETE on_complete);

// implementation
DATA * my_impl_read_data_from_file_sync(FILEHANDLE file) {
  // how to implement?
}

The answer is that it's impossible to implement this function. Usually we either look for a synchronous version of the API, or launch a thread to call the async function and sync-wait on the main thread. Unfortunately, in a browser environment, neither is possible.
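The same dead end can be illustrated in JavaScript itself. A minimal sketch (the function names are illustrative, not from the ORT codebase):

```javascript
// readDataAsync stands in for an inherently-async API such as
// GPUBuffer.mapAsync(): it can only deliver its result via a Promise.
async function readDataAsync() {
  return 42;
}

// Attempting a synchronous wrapper: the Promise cannot be awaited or
// busy-waited here, because resolving it needs the very event loop this
// function would be blocking. The caller gets a pending Promise, not 42.
function readDataSyncAttempt() {
  return readDataAsync();
}
```

Calling `readDataSyncAttempt()` yields a still-pending Promise; there is no language-level way to turn it into the value synchronously.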

WebGPU does not offer any synchronous API for data download (GPU to CPU); this is the one operation that MUST be async. Since OrtRun() eventually calls into DataTransfer to copy data from GPU to CPU, and OrtRun() is a synchronous function, this cannot be done in the normal way.

What is Emscripten? How does the Asyncify feature resolve the problem?

Emscripten is the C/C++ compiler for WebAssembly. It's what we use to compile ORT and generate the WebAssembly artifacts that run in browsers.

Asyncify is a compiler feature that allows calling async functions from a synchronous context. In short, it generates code that unwinds and rewinds the call stack to emulate async execution. With this feature, we are able to call async functions inside the OrtRun() call.
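As a rough mental model only (a simplification, not Emscripten's actual code generation), the unwind/rewind idea can be sketched with a JavaScript generator: execution is suspended at the async point and resumed once the value arrives, so the body reads like synchronous code.

```javascript
// Toy "Asyncify" driver: each yielded Promise is an async point where the
// call stack "unwinds"; when the Promise settles, the generator "rewinds"
// and resumes from exactly where it stopped.
function runAsyncified(genFn) {
  return new Promise((resolve, reject) => {
    const gen = genFn();
    function step(input) {
      let r;
      try {
        r = gen.next(input);
      } catch (e) {
        return reject(e);
      }
      if (r.done) return resolve(r.value);
      Promise.resolve(r.value).then(step, reject);
    }
    step(undefined);
  });
}

// A synchronous-looking body with an async point in the middle, loosely
// analogous to OrtRun() hitting a GPU-to-CPU download.
function* ortRunLike() {
  const downloaded = yield Promise.resolve(40); // async point
  return downloaded + 2;                        // resumes afterwards
}
```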

Design Overview

Inter-op

JSEP does pretty much the same thing as any other EP. It exposes an interface for interop with JavaScript, defined in onnxruntime/wasm/js_internal_api.js:

// init JSEP
Module["jsepInit"] = function (backend, alloc, free, copy, copyAsync, createKernel, releaseKernel, run) {
    Module.jsepBackend = backend;
    Module.jsepAlloc = alloc;
    Module.jsepFree = free;
    Module.jsepCopy = copy;
    Module.jsepCopyAsync = copyAsync;
    Module.jsepCreateKernel = createKernel;
    Module.jsepReleaseKernel = releaseKernel;
    Module.jsepRun = run;
};

This simple JavaScript snippet defines all the language-boundary functions that JSEP requires to implement kernels and data transfers in JavaScript inside ONNX Runtime:

  • jsepBackend: assigns the backend singleton object to the WebAssembly module
  • jsepAlloc and jsepFree: implement the data transfer's Alloc() and Free()
  • jsepCopy: synchronous copy (GPU to GPU, CPU to GPU)
  • jsepCopyAsync: asynchronous copy (GPU to CPU)
  • jsepCreateKernel and jsepReleaseKernel: maintain a corresponding object in JS whose lifecycle matches that of the Kernel in ORT
  • jsepRun: OpKernel::Compute() calls into this

This abstraction keeps the connections and dependencies between C/C++ and TypeScript/JavaScript as small as possible.
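As a sanity check, the interface above can be exercised with stub callbacks; the stubs below are placeholders for illustration, not the real TypeScript WebGPU backend:

```javascript
// Stand-in for the Emscripten-generated Module object.
const Module = {};

// The jsepInit definition from onnxruntime/wasm/js_internal_api.js.
Module["jsepInit"] = function (backend, alloc, free, copy, copyAsync, createKernel, releaseKernel, run) {
  Module.jsepBackend = backend;
  Module.jsepAlloc = alloc;
  Module.jsepFree = free;
  Module.jsepCopy = copy;
  Module.jsepCopyAsync = copyAsync;
  Module.jsepCreateKernel = createKernel;
  Module.jsepReleaseKernel = releaseKernel;
  Module.jsepRun = run;
};

// Stub implementations standing in for the WebGPU backend callbacks.
const backend = { name: "webgpu-stub" };
Module.jsepInit(
  backend,
  (size) => 1,                              // alloc: fake GPU data id
  (dataId) => {},                           // free
  (src, dst, size) => {},                   // sync copy (to GPU)
  async (src, dst, size) => {},             // async copy (GPU to CPU)
  (kernelId, opType, attributes) => {},     // create kernel
  (kernelId) => {},                         // release kernel
  (kernelId, contextDataOffset) => 0,       // run: 0 = success
);
```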

Resource Management

The lifecycles of tensor data and kernels are managed by ORT (C/C++), but the implementation is left to JavaScript. The JavaScript code is responsible for implementing the callbacks correctly.

For WebGPU, GPU data is managed by JavaScript in a singleton map (tensor_data_id => GPUBuffer). The GPU pipeline is managed as a singleton. Shaders are managed in a singleton map (shader_key => gpu_program), where shader_key is generated from the cache_key (op-specific, including attributes) and the input shapes.
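A minimal sketch of these singleton maps (the names and key format are assumptions for illustration, not the actual ORT Web source):

```javascript
// tensor_data_id -> GPUBuffer (stubbed here with plain objects).
const gpuBufferById = new Map();
// shader_key -> compiled gpu_program.
const programByShaderKey = new Map();

function registerBuffer(dataId, buffer) { gpuBufferById.set(dataId, buffer); }
function releaseBuffer(dataId) { gpuBufferById.delete(dataId); }

// The op-specific cache key (including attributes) plus the input shapes
// together identify a shape-specialized shader.
function shaderKey(cacheKey, inputShapes) {
  return `${cacheKey}|${inputShapes.map((s) => s.join("x")).join(";")}`;
}

function getOrCreateProgram(cacheKey, inputShapes, build) {
  const key = shaderKey(cacheKey, inputShapes);
  let program = programByShaderKey.get(key);
  if (program === undefined) {
    program = build();               // compile the shader once per key
    programByShaderKey.set(key, program);
  }
  return program;
}
```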

about data transfer
js::DataTransfer::CopyTensor is implemented to call either the synchronous or the asynchronous copy callback, depending on whether the destination is the GPU. Emscripten's macro EM_ASYNC_JS is used to wrap the async function so it can be called from a synchronous context.
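The dispatch described above can be sketched as follows (a simplified illustration in JavaScript; the real dispatch lives in the C++ DataTransfer implementation):

```javascript
// Copies *to* the GPU (CPU->GPU, GPU->GPU) can be issued synchronously;
// a GPU->CPU download is the one direction that must go through the async
// callback (wrapped on the C++ side with EM_ASYNC_JS).
function copyTensor(src, dst, callbacks) {
  if (dst.location === "gpu") {
    callbacks.copy(src, dst);      // synchronous path
    return Promise.resolve();
  }
  return callbacks.copyAsync(src, dst); // asynchronous GPU->CPU download
}
```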

run kernel in JS

The kernel class constructor calls jsepCreateKernel() once, with an optional kernel-specific serialization to pass attributes into JavaScript.

Compute() is implemented so that metadata serialization is performed in a base class, and the JavaScript code can access the data using the Emscripten-specific builtin macro EM_ASM_*.
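To make the idea concrete, here is a hypothetical sketch of JavaScript reading serialized metadata out of the wasm heap; the [dataType, rank, dims...] layout here is an assumption for illustration, not ORT's actual serialization format:

```javascript
// heapU32 plays the role of Emscripten's HEAPU32 view over wasm memory.
// The base class is assumed (for this sketch) to have written, at the given
// byte offset: [dataType, rank, dim0 .. dim(rank-1)] as 32-bit integers.
function readKernelMetadata(heapU32, offsetBytes) {
  const base = offsetBytes / 4; // HEAPU32 is indexed in 4-byte words
  const dataType = heapU32[base];
  const rank = heapU32[base + 1];
  const dims = [];
  for (let i = 0; i < rank; i++) dims.push(heapU32[base + 2 + i]);
  return { dataType, dims };
}
```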

disabled features
Memory pattern is force-disabled, because WebGPU data is not represented by a general memory model (in which a buffer can be described by an offset + size).
Concurrent run support is disabled. WebGPU is stateful and involves async function calls; supporting concurrent runs would significantly increase complexity without providing any real benefit.

prefer channels last
JSEP prefers channels-last and returns DataLayout::NHWC from GetPreferredLayout(). This lets the graph transformers preprocess the graph into channels-last form so that more optimized WebGPU shaders can be used.
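For intuition, here is the plain index arithmetic behind an NCHW-to-NHWC relayout (a generic illustration, not taken from the ORT graph transformers or shaders): with NHWC, the channel values of each pixel become contiguous in memory.

```javascript
// Relayout a flat NCHW tensor into flat NHWC order.
function nchwToNhwc(data, [N, C, H, W]) {
  const out = new Array(data.length);
  for (let n = 0; n < N; n++)
    for (let c = 0; c < C; c++)
      for (let h = 0; h < H; h++)
        for (let w = 0; w < W; w++) {
          const src = ((n * C + c) * H + h) * W + w; // NCHW offset
          const dst = ((n * H + h) * W + w) * C + c; // NHWC offset
          out[dst] = data[src];
        }
  return out;
}
```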

Testing code
It's impossible to test JSEP directly, because JSEP itself does not contain any kernel implementation. However, it contains the kernel registrations, which need to work together with the corresponding JavaScript code. There are unit tests that run ONNX models through the JavaScript API.

commit 340c88b
Author: Yulong Wang <[email protected]>
Date:   Thu Sep 8 13:40:31 2022 -0700

    batch mode

commit b160840
Author: Yulong Wang <[email protected]>
Date:   Tue Jul 26 17:00:39 2022 -0700

    sum

commit 306a19b
Author: Yulong Wang <[email protected]>
Date:   Mon Jul 25 19:04:48 2022 -0700

    squeeze + transpose

commit 86d8d3a
Author: Yulong Wang <[email protected]>
Date:   Mon Jul 18 16:31:59 2022 -0700

    fix webgpu test launch

commit e104d17
Author: Yulong Wang <[email protected]>
Date:   Tue Jul 12 16:52:54 2022 -0700

    shape

commit a2197f0
Author: Yulong Wang <[email protected]>
Date:   Tue Jul 12 13:49:15 2022 -0700

    pool

commit 59b10fb
Author: Yulong Wang <[email protected]>
Date:   Thu Jul 7 17:32:56 2022 -0700

    upgrade to latest webgpu spec

commit 4ed1bfb
Author: Yulong Wang <[email protected]>
Date:   Tue Jun 28 14:23:08 2022 -0700

    naive conv

commit 7c5e446
Author: Yulong Wang <[email protected]>
Date:   Wed Jun 8 15:37:12 2022 -0700

    check webgpu backend in execution loop

commit b0d7dfa
Author: Yulong Wang <[email protected]>
Date:   Wed Jun 8 15:31:19 2022 -0700

    dump shader source only in debug mode

commit 7fca0ea
Author: Yulong Wang <[email protected]>
Date:   Wed Jun 8 15:17:27 2022 -0700

    add verbose log for buffer upload/download

commit 179712b
Author: Yulong Wang <[email protected]>
Date:   Wed Jun 8 15:06:03 2022 -0700

    fix program key

commit 67ea4cb
Author: Yulong Wang <[email protected]>
Date:   Wed Jun 8 15:05:20 2022 -0700

    concat: fix 1 input

commit 21b5dfe
Author: Yulong Wang <[email protected]>
Date:   Tue Jun 7 16:13:12 2022 -0700

    matmul (no-broadcast)

commit a8def8e
Author: Yulong Wang <[email protected]>
Date:   Thu Jun 2 17:56:15 2022 -0700

    ...

commit e871138
Author: Yulong Wang <[email protected]>
Date:   Fri May 27 16:12:56 2022 -0700

    slice (scalar)

commit 75c7941
Author: Yulong Wang <[email protected]>
Date:   Thu May 26 16:54:53 2022 -0700

    slice (...)

commit 40b15e4
Author: Yulong Wang <[email protected]>
Date:   Thu May 26 12:45:16 2022 -0700

    slice

commit 9d92513
Author: Yulong Wang <[email protected]>
Date:   Wed May 25 22:37:48 2022 -0700

    gemm (scalar)

commit c1185b4
Author: Yulong Wang <[email protected]>
Date:   Tue May 24 16:54:43 2022 -0700

    gemm...

commit 99653f5
Author: Yulong Wang <[email protected]>
Date:   Tue May 24 16:54:20 2022 -0700

    format code

commit 86c75bb
Author: Yulong Wang <[email protected]>
Date:   Tue May 24 11:39:35 2022 -0700

    gemm

commit 79dd539
Author: Yulong Wang <[email protected]>
Date:   Fri Apr 8 04:46:03 2022 -0700

    concat

commit 25c9d2a
Author: Yulong Wang <[email protected]>
Date:   Thu Apr 7 19:32:48 2022 -0700

    gather

commit 6627349
Author: Yulong Wang <[email protected]>
Date:   Thu Apr 7 18:46:53 2022 -0700

    binary ops

commit fb81d7f
Author: Yulong Wang <[email protected]>
Date:   Wed Apr 6 17:55:07 2022 -0700

    binary - add

commit 073695f
Author: Yulong Wang <[email protected]>
Date:   Wed Apr 6 17:54:24 2022 -0700

    optimize types

commit e9775fe
Author: Yulong Wang <[email protected]>
Date:   Tue Apr 5 16:45:27 2022 -0700

    working

commit cba119c
Author: Yulong Wang <[email protected]>
Date:   Tue Apr 5 15:10:26 2022 -0700

    upgrade @webgpu/[email protected]

commit ed17c57
Author: Yulong Wang <[email protected]>
Date:   Tue Apr 5 03:37:29 2022 -0700

    neg

commit e8e4d88
Author: Yulong Wang <[email protected]>
Date:   Mon Apr 4 16:28:52 2022 -0700

    other f32 unary operators

commit a1fbcfd
Author: Yulong Wang <[email protected]>
Date:   Fri Apr 1 17:24:10 2022 -0700

    leaky relu

commit dbe57fe
Author: Yulong Wang <[email protected]>
Date:   Fri Apr 1 17:09:27 2022 -0700

    exp, floor

commit 3b883b9
Author: Yulong Wang <[email protected]>
Date:   Fri Apr 1 16:43:15 2022 -0700

    elu

commit aac2fc6
Author: Yulong Wang <[email protected]>
Date:   Thu Mar 24 20:30:54 2022 -0700

    always create storage buffer with 16 bytes alignment

commit ad6bd01
Author: Yulong Wang <[email protected]>
Date:   Thu Mar 24 20:30:07 2022 -0700

    fix unary funcs async signature

commit a782667
Author: Yulong Wang <[email protected]>
Date:   Wed Mar 23 19:57:38 2022 -0700

    fix upload

commit b6e7fba
Author: Yulong Wang <[email protected]>
Date:   Wed Mar 23 15:36:58 2022 -0700

    reshape

commit dfbf6f3
Author: Yulong Wang <[email protected]>
Date:   Thu Mar 24 16:11:31 2022 -0700

    clip and ceil

commit 55af08e
Author: Yulong Wang <[email protected]>
Date:   Thu Mar 24 15:57:58 2022 -0700

    fix clip

commit 41274ba
Author: Yulong Wang <[email protected]>
Date:   Thu Mar 24 14:58:23 2022 -0700

    try more unary ops

commit fe850d1
Author: Yulong Wang <[email protected]>
Date:   Mon Mar 14 16:15:58 2022 -0700

    first operator (correctness validated)

commit ba09337
Author: Yulong Wang <[email protected]>
Date:   Fri Jan 28 17:50:56 2022 -0800

    enable initialization of webgpu

commit 3fb2712
Author: Yulong Wang <[email protected]>
Date:   Fri Jan 28 17:50:24 2022 -0800

    install webgpu typescript type declaration

commit ed35262
Author: Yulong Wang <[email protected]>
Date:   Fri Jan 28 14:53:50 2022 -0800

    [POC] __blank ( npm test -- -b=webgpu )
@fs-eire fs-eire merged commit 14cc02c into main Apr 24, 2023
@fs-eire fs-eire deleted the fs-eire/js-ep-pr branch April 24, 2023 22:21
fs-eire added a commit that referenced this pull request Apr 26, 2023
### Description
This PR resolves a part of non-critical comments from code review
comments in #14579.

- use `USE_JSEP` instead of `USE_JS` in build definition to make it less
ambiguous
- remove unused util functions from util.ts
- fix transpose.h
- other misc fixes
ShukantPal pushed a commit to ShukantPal/onnxruntime that referenced this pull request May 7, 2023
ShukantPal pushed a commit to ShukantPal/onnxruntime that referenced this pull request May 7, 2023
@redthing1

Can this be used to execute models with WebGPU on desktop?

@fs-eire
Contributor Author

fs-eire commented Jul 26, 2023

Can this be used to execute models with WebGPU on desktop?

Not now, but it could probably be done via the Dawn Node.js binding in the future.

@guschmue
Contributor

to set expectations: I don't think there will be webgpu support for native desktop apps anytime soon.
electron apps might work.

@loretoparisi

Can this be used to execute models with WebGPU on desktop?

not now, but probably can do via dawn nodejs binding in future

Hopefully something is moving on Google's Dawn side:
https://dawn.googlesource.com/dawn/+/refs/heads/main/src/dawn/node/

throw new Error('WebGpuBackend: WebGPU is not available.');
}

const adapter = await navigator.gpu.requestAdapter();


May I recommend passing powerPreference when requesting a GPU adapter so that developers can request which type of GPU they're looking for.

In xenova/transformers.js#545 for instance, it would be preferable to test the "high-performance" GPU.

Suggested change
const adapter = await navigator.gpu.requestAdapter();
const adapter = await navigator.gpu.requestAdapter({
powerPreference: 'high-performance'
});


That would be great! 🔥 It will also be useful if we can get the selected adapter, without having to re-request an adapter. cc @guschmue

Contributor Author

Thanks for the suggestion. Will think about this.

Contributor Author

#19857 is created to address this. Please take a look

Contributor

Per my understanding, on some OSes, like Windows, if there are multiple GPUs and the integrated GPU is the one Chrome chose during startup, simply setting powerPreference to high-performance will not force WebGPU to use the discrete GPU.


FYI for Windows, here's the bug: https://issues.chromium.org/issues/329211593

fs-eire added a commit that referenced this pull request Mar 13, 2024
### Description
This change exposes a few properties in `ort.env.webgpu` to resolve
feature requirement mentioned in properties in
#14579 (comment).

- Add `powerPreference` and `forceFallbackAdapter` in `ort.env.webgpu`,
to allow users to set the value of the properties before the first
inference session is created.
- Add readonly property `adapter` in `ort.env.webgpu` to allow users to
get the adapter instance. Now users can access `ort.env.webgpu.device`
and `ort.env.webgpu.adapter`.

@xenova @beaufortfrancois
fs-eire added a commit that referenced this pull request Mar 15, 2024
fs-eire added a commit that referenced this pull request Mar 15, 2024
siweic0 pushed a commit to siweic0/onnxruntime-web that referenced this pull request May 9, 2024
siweic0 pushed a commit to siweic0/onnxruntime-web that referenced this pull request May 9, 2024