Releases: kp-forks/llama.cpp
b4388
llama : the WPM vocabs use the CLS token as BOS (#10930)
* llama : the WPM vocabs use the CLS token as BOS
  ggml-ci
* llama : add comment
b4387
ggml : use wstring for backend search paths (#10960) ggml-ci
b4385
ggml : fix const usage in SSE path (#10962)
b4384
server : fix missing model id in /model endpoint (#10957)
* server : fix missing model id in /model endpoint
* fix ci
b4375
vulkan: build fixes for 32b (#10927)
* vulkan: build fixes for 32b
  Should fix #10923
* vulkan: initialize some buffer/offset variables
b4372
ggml-cpu: replace NEON asm with intrinsics in ggml_gemv_q4_0_4x8_q8_0…
b4371
SYCL: Migrate away from deprecated ggml_tensor->backend (#10840)
* Migrate to tensor->buffer for checking backend buffer type: 1
* SYCL: common.cpp try to migrate away from tensor->backend
* SYCL: fix assertions and add proper comments
* SYCL: remove extra space
* SYCL: Add back static to ggml_backend_buffer_is_sycl_split function
* SYCL: Add pragma directive to suppress warning spam
* SYCL: Integrate debug logs with GGML_LOG and other fixes
* Revert "SYCL: Integrate debug logs with GGML_LOG and other fixes"
  This reverts commit 2607b7de0f0d2f4f1f690226f86fa861aa39cb97.
  Let's keep the current SYCL specific logging mechanism for now
* SYCL: Use GGML_SYCL_DEBUG after reverting
* SYCL: reg_get_proc_address func, update to the current func signature
* SYCL: Refactor SYCL buffer checks in ggml_sycl_cpy_tensor_2d
b4369
ggml : add test for SVE and disable when it fails (#10906)
b4367
clip : disable GPU support (#10896) ggml-ci
b4361
convert : Add support for Microsoft Phi-4 model (#10817)
* convert : use GPT2 vocab for Phi-4 model
* convert : use null value of sliding_window to distinguish Phi-4 from other PHI3-based models
* llama : do not use sliding window attention mask for Phi-4 model
Co-authored-by: Stanisław Szymczyk