Releases · kp-forks/llama.cpp
b4404
ggml : fixes for AVXVNNI instruction set with MSVC and Clang (#11027)
* Fixes for clang AVX VNNI
* Enable AVX VNNI and Alder Lake build for MSVC
* Apply suggestions from code review
Co-authored-by: slaren <[email protected]>
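For readers unfamiliar with the instruction set this release touches: AVX-VNNI exposes fused 8-bit dot-product intrinsics such as _mm256_dpbusd_avx_epi32. A minimal sketch of what the instruction does follows; it is only an illustration of AVX-VNNI, not code from the ggml kernels, and assumes a GCC/Clang build with -mavxvnni.

```cpp
// Illustration of the AVX-VNNI instruction set the fix targets:
// _mm256_dpbusd_avx_epi32 fuses a u8 x s8 multiply with a 32-bit
// accumulate (4 byte pairs per 32-bit lane). Build with -mavxvnni on
// GCC/Clang. Unrelated to the actual ggml kernels; illustrative only.
#include <immintrin.h>

__m256i dot_accum(__m256i acc, __m256i a_u8, __m256i b_s8) {
    // acc[i] += sum of the 4 products a_u8[4i..4i+3] * b_s8[4i..4i+3]
    return _mm256_dpbusd_avx_epi32(acc, a_u8, b_s8);
}
```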
b4402
server : add OAI compat for /v1/completions (#10974)
* add test
* add docs
* better docs
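As a usage illustration, here is a minimal sketch of an OpenAI-style request against this endpoint. The /v1/completions path comes from the release note; the host/port assume the server's defaults, the payload fields follow the OpenAI completions schema, and the single-header cpp-httplib client is assumed to be available.

```cpp
// Minimal sketch: POST an OpenAI-style completion request to a running
// llama-server. Assumes cpp-httplib (https://github.com/yhirose/cpp-httplib)
// and a server on the default host/port; the payload is illustrative.
#include <iostream>
#include "httplib.h"

int main() {
    httplib::Client cli("localhost", 8080);

    // "prompt" and "max_tokens" follow the OpenAI completions schema.
    const char * body = R"({
        "prompt": "Building a website can be done in 10 simple steps:",
        "max_tokens": 64
    })";

    auto res = cli.Post("/v1/completions", body, "application/json");
    if (res && res->status == 200) {
        std::cout << res->body << "\n"; // OAI-compatible JSON response
    } else {
        std::cerr << "request failed\n";
    }
    return 0;
}
```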
b4400
common, examples, ggml : fix MSYS2 GCC compiler errors and warnings w…
b4399
vulkan: optimize mul_mat for small values of N (#10991)
Make the mul_mat_vec shaders support N>1 (as a spec constant, NUM_COLS) where the batch_strides are overloaded to hold the row strides. Put the loads from the B matrix in the innermost loop because it should cache better. Share some code for reducing the result values to memory in mul_mat_vec_base.
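Roughly, the restructuring computes several output columns per pass so each load from A is reused. A scalar C++ sketch of that loop order follows; the name NUM_COLS comes from the note above, everything else is illustrative and not the actual Vulkan shader.

```cpp
// Scalar sketch of the loop structure described above: one pass over the
// rows of A computes NUM_COLS dot products at once, with the loads from B
// in the innermost loop so consecutive columns hit the same cache lines.
#include <array>
#include <cstddef>
#include <vector>

constexpr size_t NUM_COLS = 4; // a spec constant in the shader; fixed here

// A is M x K (row-major), B is K x NUM_COLS (row-major), C is M x NUM_COLS.
void mul_mat_vec(const std::vector<float> & A,
                 const std::vector<float> & B,
                 std::vector<float> & C, size_t M, size_t K) {
    for (size_t row = 0; row < M; ++row) {
        std::array<float, NUM_COLS> acc{}; // one partial sum per column of B
        for (size_t k = 0; k < K; ++k) {
            const float a = A[row * K + k]; // loaded once, reused NUM_COLS times
            for (size_t col = 0; col < NUM_COLS; ++col) {
                acc[col] += a * B[k * NUM_COLS + col]; // innermost B loads
            }
        }
        for (size_t col = 0; col < NUM_COLS; ++col) {
            C[row * NUM_COLS + col] = acc[col];
        }
    }
}
```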
b4398
android : fix llama_batch free (#11014)
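For reference, the API contract this fix concerns: a batch allocated with llama_batch_init must be released with llama_batch_free. A minimal sketch, with signatures as in llama.h around these releases; the fill/decode step is elided.

```cpp
// Sketch of the init/free pairing: every llama_batch created with
// llama_batch_init must be released exactly once with llama_batch_free.
#include "llama.h"

void batch_lifetime_example() {
    // room for up to 512 tokens, no embeddings, 1 sequence id per token
    llama_batch batch = llama_batch_init(512, 0, 1);

    // ... fill batch.token / batch.pos / batch.seq_id and call llama_decode ...

    llama_batch_free(batch); // frees the arrays allocated by llama_batch_init
}
```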
b4397
vulkan: im2col and matmul optimizations for stable diffusion (#10942)
* tests: Add im2col perf tests
* vulkan: optimize im2col, more elements per thread
* vulkan: increase small tile size for NV_coopmat2
* vulkan: change im2col to 512 elements per workgroup
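For readers unfamiliar with the operation being optimized: im2col unrolls each convolution window into a matrix row so the convolution becomes a plain matmul. A reference C++ version follows (single channel, stride 1, no padding); it is purely illustrative, while the shader distributes these elements across workgroup threads.

```cpp
// Reference im2col for a single-channel 2D input, stride 1, no padding:
// each output row holds one kh x kw window, so convolution reduces to a
// matrix multiply against the flattened kernel.
#include <vector>

std::vector<float> im2col(const std::vector<float> & src, int h, int w,
                          int kh, int kw) {
    const int oh = h - kh + 1; // output height
    const int ow = w - kw + 1; // output width
    std::vector<float> dst((size_t)oh * ow * kh * kw);
    for (int oy = 0; oy < oh; ++oy) {
        for (int ox = 0; ox < ow; ++ox) {
            float * row = &dst[((size_t)oy * ow + ox) * kh * kw];
            for (int ky = 0; ky < kh; ++ky) {
                for (int kx = 0; kx < kw; ++kx) {
                    row[ky * kw + kx] = src[(size_t)(oy + ky) * w + (ox + kx)];
                }
            }
        }
    }
    return dst;
}
```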
b4395
server: added more docs for response_fields field (#10995)
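As a usage illustration: response_fields lets a /completion request restrict which fields come back, including nested fields via the documented "/" path syntax. A sketch of such a request body follows (field names per the server README; send it the same way as the /v1/completions sketch above).

```cpp
// Sketch of a /completion request body using response_fields to trim the
// reply to selected fields. "generation_settings/n_predict" shows the
// nested-field "/" syntax from the server docs; values are illustrative.
const char * body = R"({
    "prompt": "Hello",
    "n_predict": 8,
    "response_fields": ["content", "generation_settings/n_predict"]
})";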
b4393
vulkan: multi-row k quants (#10846)
* multi row k quant shaders!
* better row selection
* more row choices
* readjust row selection
* rm_kq=2 by default
b4388
llama : the WPM vocabs use the CLS token as BOS (#10930)
* llama : add comment
b4387
ggml : use wstring for backend search paths (#10960) ggml-ci
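For background on why wide strings matter for these paths: on Windows, narrow char paths go through the ANSI code page and can mangle non-ASCII directory names, while wide paths round-trip losslessly. A small sketch of the idea follows; it is illustrative only, not the actual ggml code.

```cpp
// Why wide strings: std::filesystem::path stores wchar_t natively on
// Windows, so building search paths from std::wstring avoids the lossy
// narrow-string (ANSI code page) conversion for non-ASCII directories.
#include <filesystem>
#include <iostream>
#include <string>

int main() {
    std::wstring dir = L"C:\\Users\\例\\llama.cpp\\backends"; // non-ASCII is fine
    std::filesystem::path search_path(dir);                   // no narrowing

    std::wcout << search_path.wstring() << L"\n";
    return 0;
}
```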