
YOLOS model extremely slow #533

Closed
tarekziade opened this issue Jan 23, 2024 · 4 comments · Fixed by #545
Labels
bug Something isn't working

Comments


tarekziade commented Jan 23, 2024

System Info

latest wasm version

Environment/Platform

  • Website/web-app
  • Browser extension
  • Server-side (e.g., Node.js, Deno, Bun)
  • Desktop app (e.g., Electron)
  • Other (e.g., VSCode extension)

Description

I am trying to run https://huggingface.co/hustvl/yolos-tiny using a quantized version (similar to Xenova/yolos-tiny). It works with the object-detection pipeline, but it is extremely slow.

Running inference on an image with this model takes around 15 seconds on my M1, whereas the Python transformers version takes 190 ms.

I tried the browser dev tools, and the culprit is in the ONNX runtime at wasm-function[10863] @ ort-wasm-simd.wasm:0x801bfa, but I don't have the debug symbols, so it's not very useful...

Is there a way to force transformers.js to run with a debug build of the ONNX runtime?

Reproduction

Run the object detection demo at https://xenova.github.io/transformers.js/ and swap the detr-resnet model for yolos-tiny.
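
An equivalent minimal script, as a sketch (it assumes the Xenova/yolos-tiny conversion and the standard pipeline API; the image URL is just a placeholder):

    import { pipeline } from '@xenova/transformers';

    // Load the object-detection pipeline with the YOLOS checkpoint.
    const detector = await pipeline('object-detection', 'Xenova/yolos-tiny');

    // Time a single end-to-end detection on any test image.
    const start = Date.now();
    const result = await detector('https://example.com/test.jpg'); // placeholder image URL
    console.log(`Detection took ${Date.now() - start} ms`, result);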

tarekziade added the bug label Jan 23, 2024

xenova commented Jan 23, 2024

Can you try using the unquantized version? You can do this by specifying:

const pipe = await pipeline('task', 'model', { quantized: false });
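
For this issue, that would presumably look like:

const pipe = await pipeline('object-detection', 'Xenova/yolos-tiny', { quantized: false });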

tarekziade (Author) commented

The unquantized version is slightly faster:

  • non-quantized: 15 s
  • quantized: 20 s

Maybe it's the image encoding step; I will try to measure each step.

tarekziade (Author) commented

I tried this:

    import { AutoModelForObjectDetection, AutoProcessor, RawImage } from "@xenova/transformers";

    const model_name = "Xenova/yolos-tiny";
    let model = await AutoModelForObjectDetection.from_pretrained(model_name);

    // Time preprocessing (imageElement is an <img> element already on the page).
    var start = Date.now();
    const processor = await AutoProcessor.from_pretrained(model_name);
    const image = await RawImage.read(imageElement.src);
    const image_inputs = await processor(image);
    var end = Date.now();

    console.log(`Image processing Execution time: ${end - start} ms.`);

    // Time the model forward pass.
    var start = Date.now();
    const { image_embeds } = await model(image_inputs);
    var end = Date.now();
    console.log(`Inference Execution time: ${end - start} ms.`);

and that gives:

Image processing Execution time: 161 ms.
Inference Execution time: 14652 ms.

Here's the full session recorded with Firefox's profiler:

https://share.firefox.dev/497JdCi

The slow function is _OrtRun in the ONNX runtime. I don't think I can get more information unless I run it with symbols.
Can the runtime be specified somewhere in the config? I could point it at a build with debug symbols.
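
For reference, a sketch of how the ONNX Runtime WASM binaries might be swapped for a self-hosted (e.g. debug) build via the env settings that transformers.js exposes from onnxruntime-web; the directory URL is a placeholder:

    import { env } from '@xenova/transformers';

    // Serve the ort-wasm*.wasm files (e.g. a debug build) yourself and point
    // onnxruntime-web at that directory instead of the default CDN location.
    env.backends.onnx.wasm.wasmPaths = '/debug-ort-dist/'; // placeholder path
    // Optional: a single thread can make profiles easier to read.
    env.backends.onnx.wasm.numThreads = 1;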


xenova commented Jan 27, 2024

This might just be a limitation of onnxruntime-web's WASM execution provider, and can be fixed with the new WebGPU execution provider (coming soon).

@fs-eire @guschmue might be able to do more in-depth profiling.

xenova linked a pull request Jan 27, 2024 that will close this issue