sync : llama.cpp #1020

ggerganov · 2024-11-17T12:02:31Z

No description provided.

…ags (llama/10314)

Compute two result elements per workgroup (for Q{4,5}_{0,1}). This reuses the B loads across the rows and also reuses some addressing calculations. This required manually partially unrolling the loop, since the compiler is less willing to unroll outer loops. Add bounds-checking on the last iteration of the loop. I think this was at least partly broken before. Optimize the Q4_K shader to vectorize most loads and reduce the number of bit twiddling instructions.

ggml-ci

ggerganov/llama.cpp#10352

* metal : add kernel arg structs (wip) * metal : fattn args ggml-ci * metal : cont + avoid potential int overflow [no ci] * metal : mul mat struct (wip) * cont : mul mat vec * cont : pass by reference * cont : args is first argument * cont : use char ptr * cont : shmem style * cont : thread counters style * cont : mul mm id ggml-ci * cont : int safety + register optimizations ggml-ci * metal : GGML_OP_CONCAT ggml-ci * metal : GGML_OP_ADD, GGML_OP_SUB, GGML_OP_MUL, GGML_OP_DIV * metal : GGML_OP_REPEAT * metal : GGML_OP_CPY * metal : GGML_OP_RMS_NORM * metal : GGML_OP_NORM * metal : add TODOs for rest of ops * ggml : add ggml-metal-impl.h ggml-ci

ggml-ci

Copilot reviewed 7 out of 27 changed files in this pull request and generated no suggestions.

Files not reviewed (20)

CMakeLists.txt: Language not supported
scripts/sync-llama.last: Language not supported
src/ggml-aarch64.c: Language not supported
src/ggml-amx/ggml-amx.cpp: Language not supported
src/ggml-backend.cpp: Language not supported
src/ggml-cpu/CMakeLists.txt: Language not supported
src/ggml-cpu/ggml-cpu-aarch64.c: Language not supported
src/ggml-cpu/ggml-cpu.c: Language not supported
src/ggml-cpu/llamafile/sgemm.cpp: Language not supported
src/ggml-cuda/CMakeLists.txt: Language not supported
src/ggml-cuda/ggml-cuda.cu: Language not supported
src/ggml-cuda/mmv.cu: Language not supported
src/ggml-cuda/mmv.cuh: Language not supported
src/ggml-hip/CMakeLists.txt: Language not supported
src/ggml-metal/CMakeLists.txt: Language not supported
src/ggml-metal/ggml-metal-impl.h: Language not supported
src/ggml-musa/CMakeLists.txt: Language not supported
src/ggml-opt.cpp: Language not supported
src/ggml-vulkan/ggml-vulkan.cpp: Language not supported
src/ggml-vulkan/vulkan-shaders/mul_mat_vec.comp: Language not supported

ggml-ci

ggerganov · 2024-11-18T08:37:46Z

@JohannesGaessler The test-opt seg faults from time to time:

https://github.com/ggml-org/ci/tree/results/ggml/17/8ebfcc5f125085d51e0953b2d8230c21358650/ggml-4-x86-cuda-v100#ctest_release

I can reproduce this also on master with my CUDA box by letting the following command run for a while:

while ./bin/test-opt  ; do date ; done

The CPU backend would also occasionally fail in the test_gradient_accumulation test:

  test_gradient_accumulation(high_level=no, nbatch_physical=1, loss_type=mean, epoch=2, subtest=weights): OK
  test_gradient_accumulation(high_level=no, nbatch_physical=1, loss_type=mean, epoch=2, subtest=results): OK
  test_gradient_accumulation(high_level=no, nbatch_physical=1, loss_type=mean, epoch=3, subtest=grads): FAIL
  test_gradient_accumulation(high_level=no, nbatch_physical=1, loss_type=mean, epoch=3, subtest=weights): OK
  test_gradient_accumulation(high_level=no, nbatch_physical=1, loss_type=mean, epoch=3, subtest=results): OK
  test_gradient_accumulation(high_level=no, nbatch_physical=1, loss_type=mean, epoch=4, subtest=grads): FAIL
  test_gradient_accumulation(high_level=no, nbatch_physical=1, loss_type=mean, epoch=4, subtest=weights): OK
  test_gradient_accumulation(high_level=no, nbatch_physical=1, loss_type=mean, epoch=4, subtest=results): OK
  test_regression(subtest=weights): OK
  116/118 tests passed
  Backend CPU: FAIL

ggerganov · 2024-11-18T08:48:08Z

I managed to get a stacktrace for one of the seg faults:

Backend 1/3: Metal
  Device description: Apple M2 Ultra
  Device memory: 147456 MB (147450 MB free)

ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 1)
ggml_gallocr_reserve_n: reallocating Metal buffer from size 0.00 MiB to 0.00 MiB
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 0.00 MiB
  test_dataset(shuffle=no, ndata_shard=1, ndata_batch=1): OK
  test_dataset(shuffle=no, ndata_shard=1, ndata_batch=2): OK
  test_dataset(shuffle=no, ndata_shard=1, ndata_batch=3): OK
  test_dataset(shuffle=no, ndata_shard=1, ndata_batch=4): OK
  test_dataset(shuffle=no, ndata_shard=1, ndata_batch=5): OK
  test_dataset(shuffle=no, ndata_shard=1, ndata_batch=6): OK
  test_dataset(shuffle=no, ndata_shard=2, ndata_batch=2): OK
  test_dataset(shuffle=no, ndata_shard=2, ndata_batch=4): OK
  test_dataset(shuffle=no, ndata_shard=2, ndata_batch=6): OK
  test_dataset(shuffle=no, ndata_shard=3, ndata_batch=3): OK
  test_dataset(shuffle=no, ndata_shard=3, ndata_batch=6): OK
  test_dataset(shuffle=no, ndata_shard=4, ndata_batch=4): OK
  test_dataset(shuffle=no, ndata_shard=5, ndata_batch=5): OK
  test_dataset(shuffle=no, ndata_shard=6, ndata_batch=6): OK
  test_dataset(shuffle=yes, ndata_shard=1, ndata_batch=1): OK
  test_dataset(shuffle=yes, ndata_shard=1, ndata_batch=2): OK
  test_dataset(shuffle=yes, ndata_shard=1, ndata_batch=3): OK
  test_dataset(shuffle=yes, ndata_shard=1, ndata_batch=4): OK
  test_dataset(shuffle=yes, ndata_shard=1, ndata_batch=5): OK
  test_dataset(shuffle=yes, ndata_shard=1, ndata_batch=6): OK
  test_dataset(shuffle=yes, ndata_shard=2, ndata_batch=2): OK
  test_dataset(shuffle=yes, ndata_shard=2, ndata_batch=4): OK
  test_dataset(shuffle=yes, ndata_shard=2, ndata_batch=6): OK
  test_dataset(shuffle=yes, ndata_shard=3, ndata_batch=3): OK
  test_dataset(shuffle=yes, ndata_shard=3, ndata_batch=6): OK
  test_dataset(shuffle=yes, ndata_shard=4, ndata_batch=4): OK
  test_dataset(shuffle=yes, ndata_shard=5, ndata_batch=5): OK
  test_dataset(shuffle=yes, ndata_shard=6, ndata_batch=6): OK
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 1)
ggml_gallocr_reserve_n: reallocating Metal buffer from size 0.00 MiB to 0.00 MiB
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
  test_grad(): OK
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 1)
  test_forward_backward(high_level=no, shuffle=no, subtest=results_initial): OK
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
  test_forward_backward(high_level=no, shuffle=no, subtest=weights_after_forward): OK
  test_forward_backward(high_level=no, shuffle=no, subtest=results_after_forward): OK
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 1)
  test_forward_backward(high_level=no, shuffle=no, subtest=weights_after_forward_backward): OK
  test_forward_backward(high_level=no, shuffle=no, subtest=result_after_forward_backward): OK
  test_forward_backward(high_level=yes, shuffle=no, subtest=results_initial): OK
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
  test_forward_backward(high_level=yes, shuffle=no, subtest=weights_after_forward): OK
  test_forward_backward(high_level=yes, shuffle=no, subtest=results_after_forward): OK
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
  test_forward_backward(high_level=yes, shuffle=no, subtest=weights_after_forward_backward): OK
  test_forward_backward(high_level=yes, shuffle=no, subtest=result_after_forward_backward): OK
  test_forward_backward(high_level=yes, shuffle=yes, subtest=results_initial): OK
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
  test_forward_backward(high_level=yes, shuffle=yes, subtest=weights_after_forward): OK
  test_forward_backward(high_level=yes, shuffle=yes, subtest=results_after_forward): OK
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
  test_forward_backward(high_level=yes, shuffle=yes, subtest=weights_after_forward_backward): OK
  test_forward_backward(high_level=yes, shuffle=yes, subtest=result_after_forward_backward): OK
  test_epoch_vs_fit(): OK
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
  test_idata_split(high_level=no, epoch=1, subtest=weights): OK
  test_idata_split(high_level=no, epoch=1, subtest=results_backward): OK
  test_idata_split(high_level=no, epoch=1, subtest=results_forward): OK
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
  test_idata_split(high_level=no, epoch=2, subtest=weights): OK
  test_idata_split(high_level=no, epoch=2, subtest=results_backward): OK
  test_idata_split(high_level=no, epoch=2, subtest=results_forward): OK
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
  test_idata_split(high_level=no, epoch=3, subtest=weights): OK
  test_idata_split(high_level=no, epoch=3, subtest=results_backward): OK
  test_idata_split(high_level=no, epoch=3, subtest=results_forward): OK
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
  test_idata_split(high_level=no, epoch=4, subtest=weights): OK
  test_idata_split(high_level=no, epoch=4, subtest=results_backward): OK
  test_idata_split(high_level=no, epoch=4, subtest=results_forward): OK
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
  test_idata_split(high_level=yes, epoch=1, subtest=weights): OK
  test_idata_split(high_level=yes, epoch=1, subtest=results_backward): OK
  test_idata_split(high_level=yes, epoch=1, subtest=results_forward): OK
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
  test_idata_split(high_level=yes, epoch=2, subtest=weights): OK
  test_idata_split(high_level=yes, epoch=2, subtest=results_backward): OK
  test_idata_split(high_level=yes, epoch=2, subtest=results_forward): OK
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
  test_idata_split(high_level=yes, epoch=3, subtest=weights): OK
  test_idata_split(high_level=yes, epoch=3, subtest=results_backward): OK
  test_idata_split(high_level=yes, epoch=3, subtest=results_forward): OK
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
  test_idata_split(high_level=yes, epoch=4, subtest=weights): OK
  test_idata_split(high_level=yes, epoch=4, subtest=results_backward): OK
  test_idata_split(high_level=yes, epoch=4, subtest=results_forward): OK
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 1)
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 1)
  test_gradient_accumulation(high_level=no, nbatch_physical=2, loss_type=sum, epoch=1, subtest=grads): OK
  test_gradient_accumulation(high_level=no, nbatch_physical=2, loss_type=sum, epoch=1, subtest=weights): OK
  test_gradient_accumulation(high_level=no, nbatch_physical=2, loss_type=sum, epoch=1, subtest=results): OK
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 1)
  test_gradient_accumulation(high_level=no, nbatch_physical=2, loss_type=sum, epoch=2, subtest=grads): OK
  test_gradient_accumulation(high_level=no, nbatch_physical=2, loss_type=sum, epoch=2, subtest=weights): OK
  test_gradient_accumulation(high_level=no, nbatch_physical=2, loss_type=sum, epoch=2, subtest=results): OK
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 1)
  test_gradient_accumulation(high_level=no, nbatch_physical=2, loss_type=sum, epoch=3, subtest=grads): OK
  test_gradient_accumulation(high_level=no, nbatch_physical=2, loss_type=sum, epoch=3, subtest=weights): OK
  test_gradient_accumulation(high_level=no, nbatch_physical=2, loss_type=sum, epoch=3, subtest=results): OK
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 1)
  test_gradient_accumulation(high_level=no, nbatch_physical=2, loss_type=sum, epoch=4, subtest=grads): OK
  test_gradient_accumulation(high_level=no, nbatch_physical=2, loss_type=sum, epoch=4, subtest=weights): OK
  test_gradient_accumulation(high_level=no, nbatch_physical=2, loss_type=sum, epoch=4, subtest=results): OK
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 1)
ggml_gallocr_reserve_n: reallocating Metal buffer from size 0.00 MiB to 0.00 MiB
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
Process 52079 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0xe8)
    frame #0: 0x000000010038fc18 libggml-base.dylib`ggml_backend_tensor_get(tensor=0x0000000000000000, data=0x0000600000789244, offset=0, size=4) at ggml-backend.cpp:269:41
   266 	}
   267 	
   268 	void ggml_backend_tensor_get(const struct ggml_tensor * tensor, void * data, size_t offset, size_t size) {
-> 269 	   ggml_backend_buffer_t buf = tensor->view_src ? tensor->view_src->buffer : tensor->buffer;
   270 	
   271 	   if (size == 0) {
   272 	       return;
Target 0: (test-opt) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0xe8)
  * frame #0: 0x000000010038fc18 libggml-base.dylib`ggml_backend_tensor_get(tensor=0x0000000000000000, data=0x0000600000789244, offset=0, size=4) at ggml-backend.cpp:269:41
    frame #1: 0x0000000100008c5c test-opt`test_gradient_accumulation(backend_sched=0x000000011fa5b400, backend=0x00006000029a0a80, nbatch_physical=2, loss_type=GGML_OPT_LOSS_TYPE_MEAN) at test-opt.cpp:598:17
    frame #2: 0x0000000100006974 test-opt`test_backend(backend_sched=0x000000011fa5b400, backend=0x00006000029a0a80) at test-opt.cpp:815:43
    frame #3: 0x0000000100005ee0 test-opt`main at test-opt.cpp:865:38
    frame #4: 0x000000019acdc274 dyld`start + 2840
(lldb) print *tensor
error: Couldn't apply expression side effects : Couldn't dematerialize a result variable: couldn't read its memory
(lldb) print tensor
(const ggml_tensor *) nullptr
(lldb) frame select 1
frame #1: 0x0000000100008c5c test-opt`test_gradient_accumulation(backend_sched=0x000000011fa5b400, backend=0x00006000029a0a80, nbatch_physical=2, loss_type=GGML_OPT_LOSS_TYPE_MEAN) at test-opt.cpp:598:17
   595 	               ggml_opt_forward_backward(cd.opt_ctx, cd.result);
   596 	
   597 	               grad_history[idata + 0] = 0.0f;
-> 598 	               ggml_backend_tensor_get(ggml_opt_grad_acc(cd.opt_ctx, cd.weights), grad_history.data() + idata + 1, 0, 1*sizeof(float));
   599 	           }
   600 	       } else {
   601 	           GGML_ASSERT(false);
(lldb)

JohannesGaessler · 2024-11-18T22:41:42Z

When running test-opt on a loop I also eventually see a test failure but the test failure manifests in a different way than on Georgi's machine. I always get a failure in test_regression. However, when I remove all other tests test_regression is no longer failing. The only common reference between the tests is an instance of ggml_backend_sched_t. When I modified the tests to initialize and free a dedicated instance for each test I got the same failure pattern as Georgi. Also looking at the tests I'm noticing that we don't actually have any test code outside of test-opt that utilizes ggml_backend_sched so we can't conclusively tell whether ggml_backend_sched or ggml_opt is causing a failure in test-opt. Even without the optimization code, ggml_backend_sched is a fairly complex component and I think it would make sense to add tests (but if we do this we should coordinate). I would in principle be willing to write tests for ggml_backend_sched but I don't feel very confident in my understanding of the code and will likely require assistance.

cc: @slaren

slaren · 2024-11-18T23:06:00Z

It is indirectly tested in any test that run llama.cpp. I agree it would be good to have tests for it, but it's not an easy component to write unit tests for. At some point I will probably rewrite it in C++ with testing in mind. I don't see how it could cause ggml_opt_grad_acc to return NULL, however.

JohannesGaessler · 2024-11-19T08:39:03Z

I think I worded my post poorly. I agree that in this particular instance the bug is overwhelmingly likely in ggml_opt. I was just thinking that tests would be nice to have in general.

Srihari-mcw and others added 15 commits November 17, 2024 14:01

Make updates to fix issues with clang-cl builds while using AVX512 fl…

ebb1192

…ags (llama/10314)

ggml : optimize Q4_0 into Q4_0_X_Y repack (llama/10324)

fa0b72b

llamafile : fix include path (llama/0)

92bc9e2

ggml-ci

ggml : fix compile warnings (llama/0)

39de653

ggml-ci

ggml : adapt AMX to tensor->grad removal (llama/0)

0426062

ggml-ci

ggml : inttypes.h -> cinttypes (llama/0)

1bf5d5f

ggml-ci

ggml : fix possible buffer use after free in sched reserve (llama/9930)

23d71e1

CMake: default to -arch=native for CUDA build (llama/10320)

35b66a3

CUDA: remove DMMV, consolidate F16 mult mat vec (llama/10318)

1f57528

ggml : fix undefined reference to 'getcpu' (llama/10354)

492779b

ggerganov/llama.cpp#10352

llama : only use default buffer types for the KV cache (llama/10358)

ffd430e

CMake: fix typo in comment [no ci] (llama/10360)

8560efb

sync : llama.cpp

d7e1367

ggml-ci

ggerganov requested a review from Copilot November 17, 2024 12:02

Copilot AI reviewed Nov 17, 2024

View reviewed changes

JohannesGaessler and others added 2 commits November 18, 2024 10:13

CUDA: fix MMV kernel being used for FP16 src1 (llama/10357)

fa00172

sync : llama.cpp

178ebfc

ggml-ci

ggerganov merged commit 7fe59da into master Nov 18, 2024
9 of 10 checks passed

ggerganov deleted the sync branch November 18, 2024 08:56

JohannesGaessler mentioned this pull request Nov 18, 2024

ggml-opt: fix data corruption #1022

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sync : llama.cpp #1020

sync : llama.cpp #1020

ggerganov commented Nov 17, 2024 •

edited

Loading

ggerganov commented Nov 18, 2024

ggerganov commented Nov 18, 2024

JohannesGaessler commented Nov 18, 2024

slaren commented Nov 18, 2024

JohannesGaessler commented Nov 19, 2024

sync : llama.cpp #1020

sync : llama.cpp #1020

Conversation

ggerganov commented Nov 17, 2024 • edited Loading

Choose a reason for hiding this comment

ggerganov commented Nov 18, 2024

ggerganov commented Nov 18, 2024

JohannesGaessler commented Nov 18, 2024

slaren commented Nov 18, 2024

JohannesGaessler commented Nov 19, 2024

ggerganov commented Nov 17, 2024 •

edited

Loading