[WIP] 🚀🚀🚀 Transformers.js V3 🚀🚀🚀 #545

Open · wants to merge 447 commits into main
Conversation

xenova (Owner) commented Jan 27, 2024

In preparation for Transformers.js v3, I'm compiling a list of issues/features which will be fixed/included in the release.

How to use WebGPU

First, install the development branch

npm install xenova/transformers.js#v3

Then specify the device parameter when loading the model. Here's example code to get started. Please note that this is still a WORK IN PROGRESS, so the following usage may change before release.

import { pipeline } from '@xenova/transformers';

// Create feature extraction pipeline
const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2', {
    device: 'webgpu',
    dtype: 'fp32', // or 'fp16'
});

// Generate embeddings
const sentences = ['That is a happy person', 'That is a very happy person'];
const output = await extractor(sentences, { pooling: 'mean', normalize: true });
console.log(output.tolist());

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

xenova marked this pull request as draft January 27, 2024 18:02
@Huguet57

Hey! This is great. Is this already in alpha?

@kishorekaruppusamy

Team, is there a tentative timeline for releasing this v3 alpha?

@jhpassion0621

I can't wait anymore :) Please let me know when it's released!

NawarA (Contributor) commented Mar 6, 2024

@xenova it looks like #596 is part of this release?! I think that means onnx_data files will be supported?

If true, I'm stoked!

Beyond upgrading ort to 1.17, are there other changes needed to support models with onnx_data files? Happy to lend a hand if possible.

xenova (Owner) commented Mar 9, 2024

Hi everyone! Today we released our first WebGPU x Transformers.js demo: The WebGPU Embedding Benchmark (online demo). If you'd like to help with testing, please run the benchmark and share your results! Thanks!

[Image: webgpu-benchmark]

@khmyznikov

@xenova can this benchmark pick GPU 1 instead of GPU 0? For laptops with a dGPU.

xenova (Owner) commented Mar 11, 2024

> @xenova can this benchmark pick GPU 1 instead of GPU 0? For laptops with a dGPU.

Not currently, but this is being worked on here: microsoft/onnxruntime#19857. We will add support here once ready.

xenova linked an issue Mar 12, 2024 that may be closed by this pull request
xenova (Owner) commented Mar 13, 2024

@beaufortfrancois - I've added the source code for the video background removal demo. On my device, I get ~20fps w/ WebGPU support (w/ fp32 since fp16 is broken). Here's a screen recording (which drops my fps to ~14):

[Video: webgpu-modnet.mp4]

beaufortfrancois commented Mar 14, 2024

> @beaufortfrancois - I've added the source code for the video background removal demo. On my device, I get ~20fps w/ WebGPU support (w/ fp32 since fp16 is broken). Here's a screen recording (which drops my fps to ~14):

You rock. Thanks! It's a cool demo! 👍

I've been wondering how we could improve it:

  • I've noticed you read the current video frame on the main thread. Would it help to move the entire demo to a web worker? (See the sketch after this list.)
  • output[0].mul(255).to('uint8') takes non-negligible time to run. Is there a faster path?
  • How much do you expect fp16 to improve performance? In https://developer.chrome.com/blog/new-in-webgpu-120#support_for_16-bit_floating-point_values_in_wgsl, we noticed on an Apple M1 Pro device that the f16 implementation of the Llama2 7B model used in the WebLLM chat demo is significantly faster than the f32 implementation, with a 28% improvement in prefill speed and a 41% improvement in decoding speed.
  • A way to feed a GPUExternalTexture to the model as an input could also come in handy.
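
For the first bullet, here's a minimal sketch of what off-main-thread inference could look like, using the feature-extraction pipeline from the snippet above for simplicity (the worker.js/main.js file names and message shape are hypothetical; the same pattern would apply to the video demo by transferring frames as ImageBitmaps):

// worker.js (hypothetical): load the model once, run inference off the main thread
import { pipeline } from '@xenova/transformers';

const extractorPromise = pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2', {
    device: 'webgpu',
    dtype: 'fp32',
});

self.onmessage = async (event) => {
    const extractor = await extractorPromise; // resolves once, reused for later messages
    const output = await extractor(event.data, { pooling: 'mean', normalize: true });
    self.postMessage(output.tolist()); // nested plain arrays are structured-cloneable
};

// main.js (hypothetical): the UI thread only posts inputs and receives results
const worker = new Worker(new URL('./worker.js', import.meta.url), { type: 'module' });
worker.onmessage = (e) => console.log(e.data);
worker.postMessage(['That is a happy person']);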

@@ -0,0 +1,3 @@
/**
 * @typedef {'cpu'|'gpu'|'wasm'|'webgpu'|null} DeviceType
 */


Out of curiosity, what is 'gpu'?

xenova (Owner) replied:

It's meant to be a "catch-all" for the different ways the library can be used with GPU support (not just in the browser with WebGPU). The idea is that it will simplify documentation, since Transformers.js will select the best execution provider for the environment; for example, DML/CUDA support in onnxruntime-node (see microsoft/onnxruntime#16050 (comment)).

Of course, this is still a work in progress, so it can definitely change!
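
If it ships as described, usage might look something like this (a speculative sketch based on the DeviceType typedef above; the 'gpu' value is explicitly still subject to change):

import { pipeline } from '@xenova/transformers';

// 'gpu' as a catch-all: let the library choose the best GPU-capable backend
// for the current environment (e.g. WebGPU in the browser, DML/CUDA via
// onnxruntime-node). NOTE: speculative — this device value is still in flux.
const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2', {
    device: 'gpu',
});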
