Releases · kp-forks/llama.cpp
b4404
ggml : fixes for AVXVNNI instruction set with MSVC and Clang (#11027)
* Fixes for clang AVX VNNI
* Enable AVX VNNI and Alder Lake build for MSVC
* Apply suggestions from code review
Co-authored-by: slaren <[email protected]>
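For readers unfamiliar with the instruction set this release touches: AVX-VNNI exposes fused 8-bit dot-product intrinsics such as _mm256_dpbusd_avx_epi32. A minimal sketch of what the instruction does follows; it is only an illustration of AVX-VNNI, not code from the ggml kernels, and assumes a GCC/Clang build with -mavxvnni.

```cpp
// Illustration of the AVX-VNNI instruction set the fix targets:
// _mm256_dpbusd_avx_epi32 fuses a u8 x s8 multiply with a 32-bit
// accumulate (4 byte pairs per 32-bit lane). Build with -mavxvnni on
// GCC/Clang. Unrelated to the actual ggml kernels; illustrative only.
#include <immintrin.h>

__m256i dot_accum(__m256i acc, __m256i a_u8, __m256i b_s8) {
    // acc[i] += sum of the 4 products a_u8[4i..4i+3] * b_s8[4i..4i+3]
    return _mm256_dpbusd_avx_epi32(acc, a_u8, b_s8);
}
```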
b4402
server : add OAI compat for /v1/completions (#10974)
* add test
* add docs
* better docs
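As a usage illustration, here is a minimal sketch of an OpenAI-style request against this endpoint. The /v1/completions path comes from the release note; the host/port assume the server's defaults, the payload fields follow the OpenAI completions schema, and the single-header cpp-httplib client is assumed to be available.

```cpp
// Minimal sketch: POST an OpenAI-style completion request to a running
// llama-server. Assumes cpp-httplib (https://github.com/yhirose/cpp-httplib)
// and a server on the default host/port; the payload is illustrative.
#include <iostream>
#include "httplib.h"

int main() {
    httplib::Client cli("localhost", 8080);

    // "prompt" and "max_tokens" follow the OpenAI completions schema.
    const char * body = R"({
        "prompt": "Building a website can be done in 10 simple steps:",
        "max_tokens": 64
    })";

    auto res = cli.Post("/v1/completions", body, "application/json");
    if (res && res->status == 200) {
        std::cout << res->body << "\n"; // OAI-compatible JSON response
    } else {
        std::cerr << "request failed\n";
    }
    return 0;
}
```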
b4400
common, examples, ggml : fix MSYS2 GCC compiler errors and warnings w…
b4399
vulkan: optimize mul_mat for small values of N (#10991)
Make the mul_mat_vec shaders support N>1 (as a spec constant, NUM_COLS) where the batch_strides are overloaded to hold the row strides. Put the loads from the B matrix in the innermost loop because it should cache better. Share some code for reducing the result values to memory in mul_mat_vec_base.
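Roughly, the restructuring computes several output columns per pass so each load from A is reused. A scalar C++ sketch of that loop order follows; the name NUM_COLS comes from the note above, everything else is illustrative and not the actual Vulkan shader.

```cpp
// Scalar sketch of the loop structure described above: one pass over the
// rows of A computes NUM_COLS dot products at once, with the loads from B
// in the innermost loop so consecutive columns hit the same cache lines.
#include <array>
#include <cstddef>
#include <vector>

constexpr size_t NUM_COLS = 4; // a spec constant in the shader; fixed here

// A is M x K (row-major), B is K x NUM_COLS (row-major), C is M x NUM_COLS.
void mul_mat_vec(const std::vector<float> & A,
                 const std::vector<float> & B,
                 std::vector<float> & C, size_t M, size_t K) {
    for (size_t row = 0; row < M; ++row) {
        std::array<float, NUM_COLS> acc{}; // one partial sum per column of B
        for (size_t k = 0; k < K; ++k) {
            const float a = A[row * K + k]; // loaded once, reused NUM_COLS times
            for (size_t col = 0; col < NUM_COLS; ++col) {
                acc[col] += a * B[k * NUM_COLS + col]; // innermost B loads
            }
        }
        for (size_t col = 0; col < NUM_COLS; ++col) {
            C[row * NUM_COLS + col] = acc[col];
        }
    }
}
```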
b4398
android : fix llama_batch free (#11014)
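For reference, the API contract this fix concerns: a batch allocated with llama_batch_init must be released with llama_batch_free. A minimal sketch, with signatures as in llama.h around these releases; the fill/decode step is elided.

```cpp
// Sketch of the init/free pairing: every llama_batch created with
// llama_batch_init must be released exactly once with llama_batch_free.
#include "llama.h"

void batch_lifetime_example() {
    // room for up to 512 tokens, no embeddings, 1 sequence id per token
    llama_batch batch = llama_batch_init(512, 0, 1);

    // ... fill batch.token / batch.pos / batch.seq_id and call llama_decode ...

    llama_batch_free(batch); // frees the arrays allocated by llama_batch_init
}
```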
b4397
vulkan: im2col and matmul optimizations for stable diffusion (#10942)
* tests: Add im2col perf tests
* vulkan: optimize im2col, more elements per thread
* vulkan: increase small tile size for NV_coopmat2
* vulkan: change im2col to 512 elements per workgroup
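For readers unfamiliar with the operation being optimized: im2col unrolls each convolution window into a matrix row so the convolution becomes a plain matmul. A reference C++ version follows (single channel, stride 1, no padding); it is purely illustrative, while the shader distributes these elements across workgroup threads.

```cpp
// Reference im2col for a single-channel 2D input, stride 1, no padding:
// each output row holds one kh x kw window, so convolution reduces to a
// matrix multiply against the flattened kernel.
#include <vector>

std::vector<float> im2col(const std::vector<float> & src, int h, int w,
                          int kh, int kw) {
    const int oh = h - kh + 1; // output height
    const int ow = w - kw + 1; // output width
    std::vector<float> dst((size_t)oh * ow * kh * kw);
    for (int oy = 0; oy < oh; ++oy) {
        for (int ox = 0; ox < ow; ++ox) {
            float * row = &dst[((size_t)oy * ow + ox) * kh * kw];
            for (int ky = 0; ky < kh; ++ky) {
                for (int kx = 0; kx < kw; ++kx) {
                    row[ky * kw + kx] = src[(size_t)(oy + ky) * w + (ox + kx)];
                }
            }
        }
    }
    return dst;
}
```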
b4395
server: added more docs for response_fields field (#10995)
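As a usage illustration: response_fields lets a /completion request restrict which fields come back, including nested fields via the documented "/" path syntax. A sketch of such a request body follows (field names per the server README; send it the same way as the /v1/completions sketch above).

```cpp
// Sketch of a /completion request body using response_fields to trim the
// reply to selected fields. "generation_settings/n_predict" shows the
// nested-field "/" syntax from the server docs; values are illustrative.
const char * body = R"({
    "prompt": "Hello",
    "n_predict": 8,
    "response_fields": ["content", "generation_settings/n_predict"]
})";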
b4393
vulkan: multi-row k quants (#10846)
* multi row k quant shaders!
* better row selection
* more row choices
* readjust row selection
* rm_kq=2 by default
b4388
llama : the WPM vocabs use the CLS token as BOS (#10930)
* llama : add comment
b4387
ggml : use wstring for backend search paths (#10960) ggml-ci
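For background on why wide strings matter for these paths: on Windows, narrow char paths go through the ANSI code page and can mangle non-ASCII directory names, while wide paths round-trip losslessly. A small sketch of the idea follows; it is illustrative only, not the actual ggml code.

```cpp
// Why wide strings: std::filesystem::path stores wchar_t natively on
// Windows, so building search paths from std::wstring avoids the lossy
// narrow-string (ANSI code page) conversion for non-ASCII directories.
#include <filesystem>
#include <iostream>
#include <string>

int main() {
    std::wstring dir = L"C:\\Users\\例\\llama.cpp\\backends"; // non-ASCII is fine
    std::filesystem::path search_path(dir);                   // no narrowing

    std::wcout << search_path.wstring() << L"\n";
    return 0;
}
```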