Error in MUL_MAT with GGML_VULKAN_CHECK_RESULTS #941

Open
SRHMorris opened this issue Aug 29, 2024 · 1 comment

@SRHMorris
Contributor

SRHMorris commented Aug 29, 2024

This is using whisper.cpp from commit c4e1861d2c24b186cbbac6c07480aaa298b0e6d9 compiled with GGML_VULKAN=ON and GGML_VULKAN_CHECK_RESULTS=ON (enabled because I was trying to debug a very poor transcription on a specific GPU).

...
421751 node_307 op=ADD avg_err=0
421752 node_310 op=MUL_MAT avg_err=0.00160538
421753 node_311 op=SOFT_MAX avg_err=0.00104975
ERROR: avg_err=0.106302 in MUL_MAT (check 421754)
tensor=00000151636AEB90 tensor->name=node_312 tensor->type: f32 ne0=64 nb0=4 ne1=1 nb1=256 ne2=8 nb2=256 ne3=1 nb3=2048 offset=0
src0=00000151636AE740 op=VIEW type=f16 ne0=1500 nb0=2 ne1=64 nb1=3000 ne2=8 nb2=192000 ne3=1 nb3=1536000 offset=7680000
src1=00000151636AEA20 op=SOFT_MAX type=f32 ne0=1500 nb0=4 ne1=1 nb1=6000 ne2=8 nb2=6000 ne3=1 nb3=48000 offset=0
First error: result=-0.58789 correct=-0.362305 i3=0 i2=0 i1=0 i0=0

Result:
               0       1       2       3       4       5       6       7       8       9
      0:   -0.59
      1:   -0.19
      2:    1.50
      3:    0.31
      4:    1.36
      5:    1.56
      6:   -1.42
      7:   -0.06
      8:   -0.38
      9:   -0.13

Correct:
               0       1       2       3       4       5       6       7       8       9
      0:   -0.36
      1:    0.17
      2:    0.26
      3:   -0.04
      4:    0.20
      5:   -0.30
      6:   -0.06
      7:   -0.88
      8:   -0.49
      9:   -0.83

MUL_MAT gpu=1
 VIEW gpu=1
  NONE gpu=1
 SOFT_MAX gpu=1
  MUL_MAT gpu=1
   VIEW gpu=1
    NONE gpu=1
   PERMUTE gpu=1
    RESHAPE gpu=1
     ADD gpu=1
      MUL_MAT gpu=1
       NONE gpu=1
       ADD gpu=1
        MUL gpu=1
         NORM gpu=1
          ADD gpu=1
         NONE gpu=1
        NONE gpu=1
      NONE gpu=1
C:\Users\...\whisper.cpp\ggml\src\ggml-vulkan.cpp:7367: fatal error

I'm unsure if this is related to the poor transcription or not, as I also get a similar issue on a GPU that gives a good transcription. The above message is from an AMD RX 7900 XT.

I can see that mul_mat_vec.comp contains some barrier() calls, but it also has an early return before them. As barrier() should be executed by all work items, this leads to undefined behaviour. Could this be the cause of the bug?

@jeffbolznv
Contributor

> I can see that mul_mat_vec.comp contains some barrier() calls, but it also has an early return before them. As barrier() should be executed by all work items, this leads to undefined behaviour. Could this be the cause of the bug?

I agree it was undefined behavior and might cause such a bug. I've removed this early return in ggerganov/llama.cpp@772703c. Can you retry after that commit?
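
For readers unfamiliar with the pattern, here is a rough CPU-side analogy of the fix, not the actual shader code. It assumes a hypothetical `reduce_workgroup` helper in which Python threads stand in for the invocations of one workgroup and `threading.Barrier` stands in for `barrier()`. The point it illustrates is that invocations with nothing to do contribute a zero instead of returning early, so every invocation still reaches the barrier.

```python
import threading


def reduce_workgroup(values, group_size):
    """Sum `values` using `group_size` simulated invocations and one barrier.

    Hypothetical helper for illustration only; it does not mirror the
    layout or logic of mul_mat_vec.comp.
    """
    shared = [0.0] * group_size              # stand-in for shared memory
    barrier = threading.Barrier(group_size)  # stand-in for barrier()
    result = []

    def invocation(tid):
        # Guard the work instead of returning early: invocations that
        # have no elements to process simply keep partial == 0.0.
        partial = 0.0
        if tid < len(values):
            for i in range(tid, len(values), group_size):
                partial += values[i]
        shared[tid] = partial

        # Every invocation arrives here, including the idle ones.
        barrier.wait(timeout=5)

        # After the barrier, all partial sums are visible.
        if tid == 0:
            result.append(sum(shared))

    threads = [threading.Thread(target=invocation, args=(t,))
               for t in range(group_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]


if __name__ == "__main__":
    # 3 elements, 8 invocations: invocations 3..7 are idle but still sync.
    print(reduce_workgroup([1.0, 2.0, 3.0], 8))
```

In this analogy, an invocation that returned before `barrier.wait` would cause the remaining threads to time out with `BrokenBarrierError`. On a GPU there is no such diagnostic and the outcome is undefined, which is why the symptom can differ between devices.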
