Wgpu backend of image-classification-web example cannot work #2118
@nathanielsimard @louisfd Just be aware of this JIT-related bug on WASM/WebGPU.

It would be great if we had WebGPU tests on CI (#810).
Is this bug caused by
This bug is caused by the limited default precision of Rust's display of f32 values. It should be an easy fix. I changed the code at https://github.com/tracel-ai/cubecl/blob/cfe0b0204380cbd0931f478194a053a6ac35d1cb/crates/cubecl-wgpu/src/compiler/wgsl/base.rs#L262-L263 from:

FloatKind::F32 => f.write_fmt(format_args!("{}f", *val as f32)),
FloatKind::F64 => f.write_fmt(format_args!("{}f", { *val })),

to:

FloatKind::F32 => f.write_fmt(format_args!("{:.9}f", *val as f32)),
FloatKind::F64 => f.write_fmt(format_args!("{:.17}f", { *val })),

This fixes the bug.
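For context, here is a minimal standalone sketch (my own, not from the thread) of the display behavior at play: Rust's default `Display` for floats prints the shortest decimal string that round-trips to the same bits, which for an extreme value like `f32::MIN` is a rounded number whose magnitude exceeds the exact stored value, while an explicit precision prints the exact decimal expansion:

```rust
fn main() {
    let v = f32::MIN;

    // Default Display: the shortest decimal string that parses back to
    // exactly the same f32 bits.
    let shortest = format!("{v}");

    // Explicit precision: the exact decimal expansion of the stored value.
    let exact = format!("{v:.0}");

    println!("shortest: {shortest}");
    println!("exact:    {exact}");

    // Both strings round-trip within Rust...
    assert_eq!(shortest.parse::<f32>().unwrap(), v);
    assert_eq!(exact.parse::<f32>().unwrap(), v);

    // ...but they are different decimal numbers: the shortest form is
    // rounded past the exact f32 range, which a stricter literal checker
    // can reject as out of range.
    assert_ne!(shortest, exact);
}
```

A parser that checks literals strictly against the f32 range (as WGSL's does) can then reject the shortest form as out of range, even though Rust itself round-trips it fine.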
I thought maybe it was due to incompatible versions between the main branch and the examples, but I have the same issue after downloading the 0.13.2 release out-of-the-box. I then tried the 0.13.1 release as well as the 0.13.0 release, and both produce the same issue as described in the first place (also, the Candle backend does not work either, as per issue #1034). I am just not able to reproduce the @antimora, did you by any chance keep around the original repo you used to make your published version work? It would be very helpful to diff it and check what's wrong. For instance, how did you make the Candle backend work without the By the way, thanks for sharing this great project :)
In fact, I've tried all the release versions of Burn since 0.10.0 on my device, but the wgpu backend in the
Reproduction steps:

# Clone repo
git clone [email protected]:tracel-ai/burn.git
# Get into repo
cd burn
# Change the cubecl dependency to the revision that includes the suggested bugfix
# (https://github.com/tracel-ai/cubecl/commit/32feabc5140170d45d4365a56106db930ed79a33).
# For reproduction purposes I use the sd utility (https://github.com/chmln/sd),
# but one can just change it manually in Cargo.toml for both cubecl AND cubecl-common.
sd '(cubecl.* rev =).*(\})' '$1 "32feabc5140170d45d4365a56106db930ed79a33" $2' Cargo.toml
# cd into the relevant example
cd examples/image-classification-web
# Compile the example
./build-for-web.sh
# Run the server
./run-server.sh

RESULTS:

✅ NdArray backend: working (slower than the 0.13.2 version by an order of magnitude, but still okay)
❗ Candle backend: the backend loads correctly but cannot do inference,
❌ Wgpu backend: cannot load,

Did I miss something here?
@Jonarod You also need to disable the autotune feature for burn-wgpu. As for Candle, I don't know how to make it work either.
Hey @wcshds 👋 I was going to update the cubecl dep to the latest to include some fixes, but some wgpu tests are failing with your merged PR. I tried to understand the problem in this issue, but I think I'm missing a bit of context. Could you explain why the precision change was required?
Worked like a charm. Thanks for your help. I submitted a corresponding PR that should solve this issue. If you read this before the PR is merged to main, the solution, as suggested by @wcshds, is to:
I think this can be closed now. |
@laggui This is because I saw in the browser console that -3.40282347E+38f32 was rounded to -340282350000000000000000000000000000000f, which caused a WGSL compilation error, so I believe this is an issue with Rust's default display precision. It's strange that the test failed, but changing the precision to 13 decimal places solved the problem:

FloatKind::F32 => f.write_fmt(format_args!("{:.13}f", *val as f32)),

I originally thought that a precision of 9 decimal places would be sufficient for f32.
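One plausible reason a fixed precision like `{:.9}` breaks other tests (my own sketch, not confirmed in the thread): the precision specifier controls fractional digits, not significant digits, so a small-magnitude constant loses all its information when formatted this way:

```rust
fn main() {
    let tiny = 1.0e-20_f32;

    // Fixed fractional precision keeps only 9 digits after the decimal
    // point, so a small value collapses to zero.
    let fixed = format!("{tiny:.9}");
    println!("{fixed}"); // prints "0.000000000"
    assert_eq!(fixed.parse::<f32>().unwrap(), 0.0);

    // Scientific notation with 9 fractional digits of the mantissa
    // preserves the value and round-trips correctly.
    let sci = format!("{tiny:.9e}");
    assert_eq!(sci.parse::<f32>().unwrap(), tiny);
}
```

So any precision chosen for fixed-point output trades away either the top or the bottom of the f32 range, which would explain why no single value of N makes `{:.N}` safe for all constants.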
Weird 😅 Not sure if the fix is the proper way to address this or if it's just a patch for a more specific issue. It doesn't seem to happen anywhere else 🤔 /edit: fyi, we have decided to revert the changes applied to the precision for now. The current workaround is at least documented for users to try while we investigate why this happens in this specific example. |
When I choose the wgpu backend, I get errors in the console. After I disable the autotune feature of burn-wgpu, the wgpu backend still cannot work. The live demo works fine, so I think it's not my device's problem.