Releases: kp-forks/llama.cpp

b4404

31 Dec 21:48
0827b2c
ggml : fixes for AVXVNNI instruction set with MSVC and Clang (#11027)

* Fixes for clang AVX VNNI

* enable AVX VNNI and alder lake build for MSVC

* Apply suggestions from code review

---------

Co-authored-by: slaren

b4402

31 Dec 13:45
5896c65
server : add OAI compat for /v1/completions (#10974)

* server : add OAI compat for /v1/completions

* add test

* add docs

* better docs
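
As a rough illustration of what OAI compatibility on /v1/completions means for a client, the sketch below builds an OpenAI-style completion request and reads the first choice from the reply. The base URL, the helper names, and the choice of `prompt` and `max_tokens` as the only fields are assumptions for the example; the server docs list what is actually supported.

```python
import json
import urllib.request


def build_completion_request(base_url, prompt, max_tokens=32):
    """Build a POST request for an OpenAI-style /v1/completions endpoint.

    base_url is an assumption, e.g. a locally running server.
    """
    payload = {"prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def first_completion_text(response_body):
    """Return choices[0].text from an OpenAI-style completion response body."""
    return json.loads(response_body)["choices"][0]["text"]


# Usage against a running server (address is hypothetical):
#   req = build_completion_request("http://localhost:8080", "Hello, my name is")
#   with urllib.request.urlopen(req) as resp:
#       print(first_completion_text(resp.read()))
```

Because the request and response follow the OpenAI completions shape, an existing OpenAI client pointed at the server's base URL should work in the same way.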

b4400

31 Dec 05:38
6e1531a
common, examples, ggml : fix MSYS2 GCC compiler errors and warnings w…

b4399

30 Dec 22:08
716bd6d
vulkan: optimize mul_mat for small values of N (#10991)

Make the mul_mat_vec shaders support N>1 (as a spec constant, NUM_COLS) where
the batch_strides are overloaded to hold the row strides. Put the loads from the
B matrix in the innermost loop because it should cache better.

Share some code for reducing the result values to memory in mul_mat_vec_base.
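
The loop order described above can be sketched on the CPU as follows. This is a plain-Python illustration, not the shader: the function name and data layout are made up for the example, and the number of B columns plays the role of NUM_COLS. Each value of A is loaded once and reused for every column, with the B loads in the innermost loop.

```python
def mul_mat_vec_cols(a_rows, b_cols):
    """Multiply A (rows x K) by a small number of B columns (each of length K).

    Returns result[col][row]. Illustrative only; names and layout are assumptions.
    """
    num_cols = len(b_cols)
    result = [[0.0] * len(a_rows) for _ in range(num_cols)]
    for row, a_row in enumerate(a_rows):
        acc = [0.0] * num_cols  # one accumulator per column of B
        for k, a_val in enumerate(a_row):
            # a_val is loaded once; the B loads sit in the innermost loop
            for col in range(num_cols):
                acc[col] += a_val * b_cols[col][k]
        for col in range(num_cols):
            result[col][row] = acc[col]
    return result
```

With a single column this reduces to an ordinary matrix-vector product, so the same code path can serve N=1 and small N>1.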

b4398

30 Dec 13:39
c250ecb
android : fix llama_batch free (#11014)

b4397

29 Dec 13:43
a813bad
vulkan: im2col and matmul optimizations for stable diffusion (#10942)

* tests: Add im2col perf tests

* vulkan: optimize im2col, more elements per thread

* vulkan: increase small tile size for NV_coopmat2

* vulkan: change im2col to 512 elements per workgroup
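
For readers unfamiliar with im2col: it unfolds each kernel-sized patch of the input into a row, so a convolution becomes a matrix multiplication, which is why im2col and matmul are tuned together here. A minimal single-channel reference in plain Python, with no padding and hypothetical names, might look like this; it says nothing about how the Vulkan shader divides elements among threads or workgroups.

```python
def im2col(image, kernel_h, kernel_w, stride=1):
    """Unfold a 2D image (list of rows) into one flat row per kernel-sized patch.

    Single channel, no padding. A reference sketch, not the shader's layout.
    """
    height, width = len(image), len(image[0])
    out_h = (height - kernel_h) // stride + 1
    out_w = (width - kernel_w) // stride + 1
    columns = []
    for oy in range(out_h):
        for ox in range(out_w):
            patch = []
            for ky in range(kernel_h):
                for kx in range(kernel_w):
                    patch.append(image[oy * stride + ky][ox * stride + kx])
            columns.append(patch)
    return columns


def conv2d_via_im2col(image, kernel, stride=1):
    """Convolve by taking the dot product of each unfolded patch with the flat kernel."""
    flat_kernel = [v for row in kernel for v in row]
    patches = im2col(image, len(kernel), len(kernel[0]), stride)
    return [sum(p * k for p, k in zip(patch, flat_kernel)) for patch in patches]
```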

b4395

28 Dec 21:45
f865ea1
server: added more docs for response_fields field (#10995)

b4393

26 Dec 21:42
d79d8f3
vulkan: multi-row k quants (#10846)

* multi row k quant shaders!

* better row selection

* more row choices

* readjust row selection

* rm_kq=2 by default

b4388

24 Dec 16:55
30caac3
llama : the WPM vocabs use the CLS token as BOS (#10930)

* llama : the WPM vocabs use the CLS token as BOS

ggml-ci

* llama : add comment
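
The idea can be sketched as below: for a WordPiece (WPM) vocab, the token used as BOS resolves to the CLS token, so BERT-style inputs start with [CLS]. The dictionary layout, function names, and token ids are hypothetical and are not llama.cpp's data structures.

```python
def resolve_bos_id(vocab):
    """Return the id to use as BOS for a hypothetical vocab description."""
    if vocab["type"] == "wpm":
        # WordPiece vocabs: the CLS token doubles as BOS
        return vocab["cls_id"]
    return vocab["bos_id"]


def with_bos(token_ids, vocab):
    """Prepend the resolved BOS token to a list of token ids."""
    return [resolve_bos_id(vocab)] + list(token_ids)
```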

b4387

24 Dec 05:55
60cfa72
ggml : use wstring for backend search paths (#10960)

ggml-ci