[WIP] 🚀🚀🚀 Transformers.js V3 🚀🚀🚀 #545

Open · wants to merge 447 commits into main
Conversation

xenova (Owner) commented Jan 27, 2024

In preparation for Transformers.js v3, I'm compiling a list of issues/features which will be fixed/included in the release.

How to use WebGPU

First, install the development branch

npm install xenova/transformers.js#v3

Then specify the device parameter when loading the model. Here's example code to get started. Please note that this is still a WORK IN PROGRESS, so the following usage may change before release.

import { pipeline } from '@xenova/transformers';

// Create feature extraction pipeline
const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2', {
    device: 'webgpu',
    dtype: 'fp32', // or 'fp16'
});

// Generate embeddings
const sentences = ['That is a happy person', 'That is a very happy person'];
const output = await extractor(sentences, { pooling: 'mean', normalize: true });
console.log(output.tolist());

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

xenova marked this pull request as draft January 27, 2024 18:02
@Huguet57

Hey! This is great. Is this already in alpha?

@kishorekaruppusamy

Team, is there a tentative timeline for releasing this v3 alpha?

@jhpassion0621

I can't wait anymore :) Please let me know when it's released!

NawarA (Contributor) commented Mar 6, 2024

@xenova it looks like #596 is part of this release?! I think that means onnx_data files will be supported?

If true, I'm stoked!

Beyond upgrading ort to 1.17, are there other changes needed to support models with onnx_data files? Happy to lend a hand if possible.

xenova (Owner) commented Mar 9, 2024

Hi everyone! Today we released our first WebGPU x Transformers.js demo: The WebGPU Embedding Benchmark (online demo). If you'd like to help with testing, please run the benchmark and share your results! Thanks!

[Image: webgpu-benchmark]

@khmyznikov

@xenova can this benchmark pick GPU 1 instead of GPU 0? For laptops with a dGPU.

xenova (Owner) commented Mar 11, 2024

> @xenova can this benchmark pick GPU 1 instead of GPU 0? For laptops with a dGPU.

Not currently, but this is being worked on here: microsoft/onnxruntime#19857. We will add support here once ready.

xenova linked an issue Mar 12, 2024 that may be closed by this pull request
xenova (Owner) commented Mar 13, 2024

@beaufortfrancois - I've added the source code for the video background removal demo. On my device, I get ~20fps w/ WebGPU support (w/ fp32 since fp16 is broken). Here's a screen recording (which drops my fps to ~14):

[Video: webgpu-modnet.mp4]

beaufortfrancois commented Mar 14, 2024

> @beaufortfrancois - I've added the source code for the video background removal demo. On my device, I get ~20fps w/ WebGPU support (w/ fp32 since fp16 is broken). Here's a screen recording (which drops my fps to ~14):

You rock. Thanks! It's a cool demo! 👍

I've been wondering how we could improve it:

  • I've noticed you read the current video frame on the main thread. Would it help to move the entire demo to a web worker? (See the sketch after this list.)
  • output[0].mul(255).to('uint8') takes non-negligible time to run. Is there a faster path?
  • How much do you expect fp16 to improve performance? In https://developer.chrome.com/blog/new-in-webgpu-120#support_for_16-bit_floating-point_values_in_wgsl, we noticed on an Apple M1 Pro device that the f16 implementation of the Llama2 7B model used in the WebLLM chat demo is significantly faster than the f32 implementation, with a 28% improvement in prefill speed and a 41% improvement in decoding speed.
  • A way to feed a GPUExternalTexture to the model as an input could also come in handy.
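
For the first bullet, here's a minimal sketch of what off-main-thread inference could look like, using the feature-extraction pipeline from the snippet above for simplicity (the worker.js/main.js file names and message shape are hypothetical; the same pattern would apply to the video demo by transferring frames as ImageBitmaps):

// worker.js (hypothetical): load the model once, run inference off the main thread
import { pipeline } from '@xenova/transformers';

const extractorPromise = pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2', {
    device: 'webgpu',
    dtype: 'fp32',
});

self.onmessage = async (event) => {
    const extractor = await extractorPromise; // resolves once, reused for later messages
    const output = await extractor(event.data, { pooling: 'mean', normalize: true });
    self.postMessage(output.tolist()); // nested plain arrays are structured-cloneable
};

// main.js (hypothetical): the UI thread only posts inputs and receives results
const worker = new Worker(new URL('./worker.js', import.meta.url), { type: 'module' });
worker.onmessage = (e) => console.log(e.data);
worker.postMessage(['That is a happy person']);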

@@ -0,0 +1,3 @@
/**
 * @typedef {'cpu'|'gpu'|'wasm'|'webgpu'|null} DeviceType
 */


Out of curiosity, what is 'gpu'?

xenova (Owner) replied:

It's meant to be a "catch-all" for the different ways the library can be used with GPU support (not just in the browser with WebGPU). The idea is that it will simplify documentation, since Transformers.js will select the best execution provider for the environment; for example, DML/CUDA support in onnxruntime-node (see microsoft/onnxruntime#16050 (comment)).

Of course, this is still a work in progress, so it can definitely change!
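
If it ships as described, usage might look something like this (a speculative sketch based on the DeviceType typedef above; the 'gpu' value is explicitly still subject to change):

import { pipeline } from '@xenova/transformers';

// 'gpu' as a catch-all: let the library choose the best GPU-capable backend
// for the current environment (e.g. WebGPU in the browser, DML/CUDA via
// onnxruntime-node). NOTE: speculative — this device value is still in flux.
const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2', {
    device: 'gpu',
});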
