sync : llama.cpp #965
Commits on Sep 20, 2024
add check malloc result on device (llama/9346)
* add check malloc result on device
* update for review comments, check all malloc_device() result
Co-authored-by: arthw
Overlap cmdbuffer creation and cmdbuffer execution in Vulkan backend by submitting smaller cmdbuffers early. (llama/9118)
* Overlap cmdbuffer creation and cmdbuffer execution in Vulkan backend by submitting smaller cmdbuffers early.
* fix compile issues
* Fix issues where the last submit wasn't executed or handled properly.
* remove trailing whitespace
* Repair GGML_VULKAN_CHECK_RESULTS
* Increase submit counter only if actual work has been submitted and increase submit count to 100.
* Fix some nodes are not checked with GGML_VULKAN_CHECK_RESULTS enabled.
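The batching idea in this commit message can be illustrated with a minimal sketch in plain C. It is not the Vulkan backend code: `struct node`, `submit_batch`, and `run_graph` are hypothetical stand-ins, and the only figure taken from the message is the batch size of 100. The sketch counts only nodes that recorded real work, submits once the counter reaches the threshold, and makes sure the final partial batch is still submitted.

```c
#include <stdbool.h>
#include <stddef.h>

#define SUBMIT_EVERY 100 /* batch size named in the commit message */

/* Hypothetical stand-in for a graph node; has_work is false for no-op nodes. */
struct node { bool has_work; };

/* Hypothetical stand-in for recording and submitting one command buffer.
   Here it only counts how many submissions happened. */
static int g_submits = 0;
static void submit_batch(size_t first, size_t last) {
    (void)first;
    (void)last;
    g_submits++;
}

/* Walk the graph and submit a command buffer every SUBMIT_EVERY nodes that
   carry real work, so execution can start while later nodes are recorded.
   Returns the number of submissions made. */
static int run_graph(const struct node *nodes, size_t n) {
    size_t batch_start = 0;
    int pending = 0; /* nodes with real work since the last submit */
    g_submits = 0;
    for (size_t i = 0; i < n; i++) {
        if (nodes[i].has_work) {
            pending++; /* count only actual work */
        }
        bool is_last = (i + 1 == n);
        if (pending >= SUBMIT_EVERY || (is_last && pending > 0)) {
            submit_batch(batch_start, i);
            batch_start = i + 1;
            pending = 0;
        }
    }
    return g_submits;
}
```

Under these assumptions, a graph of 250 working nodes yields three submissions (two full batches and one trailing batch of 50), while a graph made only of no-op nodes yields none.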
ggml : vector length agnostic SVE support (llama/9290)
* Implemented vector length agnostic SVE using switch case for 512-bit, 256-bit, 128-bit vector lengths
* Removed WhiteSpaces
* ggml : style changes + fix 512-bit nb loop check
  - fix local scope in switch cases
  - consistent predicate names
  - empty lines when necessary
  - opening braces, spaces
  - const-correctness
  - add asserts
* Update ggml/src/ggml-quants.c
Co-authored-by: Georgi Gerganov
rpc : fix segfault with nkvo (llama/9389)
* rpc : fix nkvo
* rpc : buf_size must not be static
ref: #9337
Co-authored-by: slaren
sycl : update support conditions (llama/9394)
* sycl : update support condition to im2col
* Added TODO to remind supporting FP32 im2col
Signed-off-by: Alberto Cabrera
musa: remove Clang builtins mapping (llama/9421)
Signed-off-by: Xiaodong Ye
riscv : modify Makefile and add a RISCV_VECT to print log info (llama/9442)
- Added ggml_cpu_has_riscv_v() in GGML to print system info in log
- Modified Makefile to only use flag when cross compiling for RISC-V
cann: Add host buffer type for Ascend NPU (llama/9406)
* feat: Add host buffer type for Ascend NPU (CANN backend)
* fix some checking errors
* Add a few comments
cmake : use list(APPEND ...) instead of set() + dedup linker (llama/9463)
* cmake : use list(APPEND ...) instead of set() + dedup linker
ggml-ci
* cmake : try fix sycl
* cmake : try to fix sycl 2
* cmake : fix sycl build (llama/9469)
  * try fix sycl build
  * use CMAKE_CXX_FLAGS as a string variable
  Co-authored-by: Georgi Gerganov
* one more CMAKE_CXX_FLAGS fix (llama/9471)
Co-authored-by: Michael Podvitskiy
ggml : ggml_type_name return "NONE" for invalid values (llama/9458)
When running on Windows, the quantization utility attempts to print types that are not set, which leads to a crash.
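A minimal sketch of the guard this commit describes is shown below. It assumes a table-driven lookup; the enum `demo_type`, the table `demo_type_names`, and the function `demo_type_name` are hypothetical and much shorter than the real ggml type list. The point is only that an out-of-range value returns the string "NONE" instead of indexing past the table.

```c
#include <stddef.h>

/* Hypothetical, shortened type enum; the real ggml type list is much longer. */
enum demo_type {
    DEMO_TYPE_F32,
    DEMO_TYPE_F16,
    DEMO_TYPE_Q4_0,
    DEMO_TYPE_COUNT
};

/* Name table indexed by the enum value. */
static const char *const demo_type_names[DEMO_TYPE_COUNT] = {
    "f32", "f16", "q4_0"
};

/* Return the type name, or "NONE" when the value is outside the valid range. */
const char *demo_type_name(int type) {
    if (type < 0 || type >= DEMO_TYPE_COUNT) {
        return "NONE"; /* unset or corrupted value: never index the table */
    }
    return demo_type_names[type];
}
```

With this shape, a caller that prints the name of an unset type gets a harmless placeholder string rather than reading invalid memory.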
ggml : IQ4_NL sgemm + Q4_0 AVX optimization (llama/9422)
* squashed
  readd my iq4_nl sgemm PR ggerganov/llama.cpp#8049
  have ggml_vec_dot_q4_0 do two blocks per loop for avx
  try out f16c ggml_vec_dot_iq4_nl, but it's not really faster. as per ggerganov/llama.cpp#8549 we can calculate several blocks at a time with no issue
* shuffle
* remove f16c iq4_nl as i cant make it faster than before
cmake : do not hide GGML options + rename option (llama/9465)
* cmake : do not hide GGML options
ggml-ci
* build : rename flag GGML_CUDA_USE_GRAPHS -> GGML_CUDA_GRAPHS for consistency
ggml-ci
threadpool : skip polling for unused threads (llama/9461)
* threadpool: skip polling for unused threads
  Currently all threads do N polling rounds even if only 1 thread is active (n_threads_cur == 1). This commit adds a check to skip the polling for unused threads (ith >= n_threads_cur).
  n_threads_cur is now an atomic_int to explicitly tell thread sanitizer that it is written from one thread and read from other threads (not a race condition).
* threadpool: further simplify and improve ggml_barrier
  Avoid using strict memory order while polling, yet make sure that all threads go through a full memory barrier (memory fence) on ggml_barrier entrance and exit.
* threads: add simple barrier test
  This test does lots of small, parallel matmul ops where the barriers in between dominate the overhead.
* threadpool: improve thread sync for new-graphs
  Using the same tricks as ggml_barrier. All the polling is done with relaxed memory order to keep it efficient; once the new graph is detected we do a full fence using read-modify-write with strict memory order.
* threadpool: improve abort handling
  Do not use threadpool->ec (exit code) to decide whether to exit the compute loop. threadpool->ec is not atomic, which makes thread-sanitizer rightfully unhappy about it. Instead introduce an atomic threadpool->abort flag used for this. This is consistent with how we handle threadpool->stop or pause.
  While at it, add an explicit atomic_load for n_threads_cur for consistency.
* test-barrier: release threadpool before releasing the context
  fixes use-after-free detected by gcc thread-sanitizer on x86-64
  for some reason llvm sanitizer is not detecting this issue.
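The polling pattern described above can be sketched with C11 atomics. This is an illustration under stated assumptions, not the ggml implementation: `struct demo_threadpool`, its `n_graph` counter, and `demo_poll_for_work` are hypothetical, and only the names `n_threads_cur`, `abort`, and `ith` are borrowed from the commit message. Unused threads return immediately, active threads poll with relaxed loads, and a sequentially consistent read-modify-write acts as the full fence once a new graph is seen.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, reduced threadpool state; field names follow the commit text. */
struct demo_threadpool {
    atomic_int  n_threads_cur; /* threads taking part in the current graph */
    atomic_int  n_graph;       /* assumed counter, bumped per published graph */
    atomic_bool abort;         /* atomic abort flag instead of a plain exit code */
};

/* Returns true if thread `ith` observed a new graph within `rounds` polls. */
static bool demo_poll_for_work(struct demo_threadpool *tp, int ith,
                               int last_graph, int rounds) {
    /* Threads not used by the current graph skip polling entirely. */
    if (ith >= atomic_load_explicit(&tp->n_threads_cur, memory_order_relaxed)) {
        return false;
    }
    for (int i = 0; i < rounds; i++) {
        /* Cheap relaxed loads while spinning. */
        if (atomic_load_explicit(&tp->abort, memory_order_relaxed)) {
            return false;
        }
        if (atomic_load_explicit(&tp->n_graph, memory_order_relaxed) != last_graph) {
            /* New graph detected: a seq_cst read-modify-write serves as the
               full fence before the thread reads the graph data. */
            atomic_fetch_add_explicit(&tp->n_graph, 0, memory_order_seq_cst);
            return true;
        }
    }
    return false;
}
```

Keeping the spin loop on relaxed loads avoids paying for a fence on every poll; the stronger ordering is applied once, at the moment the thread is about to act on the new graph.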
ggml : fix n_threads_cur initialization with one thread (llama/9538)
* ggml : fix n_threads_cur initialization with one thread
* Update ggml/src/ggml.c
Co-authored-by: Max Krasnyansky