
YOLOS model extremely slow #533

Closed
tarekziade opened this issue Jan 23, 2024 · 4 comments · Fixed by #545
Labels
bug Something isn't working

Comments


tarekziade commented Jan 23, 2024

System Info

latest wasm version

Environment/Platform

  • Website/web-app
  • Browser extension
  • Server-side (e.g., Node.js, Deno, Bun)
  • Desktop app (e.g., Electron)
  • Other (e.g., VSCode extension)

Description

I am trying to run https://huggingface.co/hustvl/yolos-tiny using a quantized version (similar to Xenova/yolos-tiny). It works with the object-detection pipeline, but it is extremely slow.

Running inference on an image with this model takes around 15 seconds on my M1, whereas the Python transformers version takes 190 ms.

I tried the browser dev tools, and the culprit is in the ONNX runtime at wasm-function[10863] @ ort-wasm-simd.wasm:0x801bfa, but I don't have the debug symbols, so it's not very useful...

Is there a way to force transformers.js to run with a debug build of the ONNX runtime?

Reproduction

Run the object detection demo at https://xenova.github.io/transformers.js/ and swap the detr-resnet model for yolos-tiny.
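
An equivalent minimal script, as a sketch (it assumes the Xenova/yolos-tiny conversion and the standard pipeline API; the image URL is just a placeholder):

    import { pipeline } from '@xenova/transformers';

    // Load the object-detection pipeline with the YOLOS checkpoint.
    const detector = await pipeline('object-detection', 'Xenova/yolos-tiny');

    // Time a single end-to-end detection on any test image.
    const start = Date.now();
    const result = await detector('https://example.com/test.jpg'); // placeholder image URL
    console.log(`Detection took ${Date.now() - start} ms`, result);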

tarekziade added the bug label Jan 23, 2024

xenova commented Jan 23, 2024

Can you try using the unquantized version? You can do this by specifying:

const pipe = await pipeline('task', 'model', { quantized: false });
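
For this issue, that would presumably look like:

const pipe = await pipeline('object-detection', 'Xenova/yolos-tiny', { quantized: false });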

tarekziade (Author) commented

The unquantized version is slightly faster:

  • non-quantized: 15 s
  • quantized: 20 s

Maybe it's the image encoding step; I will try to measure each step.

tarekziade (Author) commented

I tried this:

    import { AutoModelForObjectDetection, AutoProcessor, RawImage } from "@xenova/transformers";

    const model_name = "Xenova/yolos-tiny";
    let model = await AutoModelForObjectDetection.from_pretrained(model_name);

    // Time preprocessing (imageElement is an <img> element already on the page).
    var start = Date.now();
    const processor = await AutoProcessor.from_pretrained(model_name);
    const image = await RawImage.read(imageElement.src);
    const image_inputs = await processor(image);
    var end = Date.now();

    console.log(`Image processing Execution time: ${end - start} ms.`);

    // Time the model forward pass.
    var start = Date.now();
    const { image_embeds } = await model(image_inputs);
    var end = Date.now();
    console.log(`Inference Execution time: ${end - start} ms.`);

and that gives:

Image processing Execution time: 161 ms.
Inference Execution time: 14652 ms.

Here's the full session recorded with Firefox's profiler:

https://share.firefox.dev/497JdCi

The slow function is _OrtRun in the ONNX runtime. I don't think I can get more information unless I run it with symbols.
Can the runtime be specified somewhere in the config? I could point it at a build with debug symbols.
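
For reference, a sketch of how the ONNX Runtime WASM binaries might be swapped for a self-hosted (e.g. debug) build via the env settings that transformers.js exposes from onnxruntime-web; the directory URL is a placeholder:

    import { env } from '@xenova/transformers';

    // Serve the ort-wasm*.wasm files (e.g. a debug build) yourself and point
    // onnxruntime-web at that directory instead of the default CDN location.
    env.backends.onnx.wasm.wasmPaths = '/debug-ort-dist/'; // placeholder path
    // Optional: a single thread can make profiles easier to read.
    env.backends.onnx.wasm.numThreads = 1;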


xenova commented Jan 27, 2024

This might just be a limitation of onnxruntime-web's WASM execution provider, and can be fixed with the new WebGPU execution provider (coming soon).

@fs-eire @guschmue might be able to do more in-depth profiling.

xenova linked a pull request Jan 27, 2024 that will close this issue