Skip to content

b3804

Latest
Compare
Choose a tag to compare
@github-actions github-actions released this 23 Sep 03:48
c35e586
musa: enable building fat binaries, enable unified memory, and disabl…

…e Flash Attention on QY1 (MTT S80) (#9526)

* mtgpu: add mp_21 support

Signed-off-by: Xiaodong Ye <[email protected]>

* mtgpu: disable flash attention on qy1 (MTT S80); disable q3_k and mul_mat_batched_cublas

Signed-off-by: Xiaodong Ye <[email protected]>

* mtgpu: enable unified memory

Signed-off-by: Xiaodong Ye <[email protected]>

* mtgpu: map cublasOperation_t to mublasOperation_t (sync code to latest)

Signed-off-by: Xiaodong Ye <[email protected]>

---------

Signed-off-by: Xiaodong Ye <[email protected]>