sync : llama.cpp #939
Commits on Aug 27, 2024
- d0f3a0e: Optimize Vulkan backend for better CPU performance and less GPU synchronization overhead (llama/8943)
  * Allocation overhead for the temporary std::vectors was easily detectable with a sampling profiler and simple to remove.
  * ggml_vk_sync_buffer introduces a full pipeline sync, which has a significant cost on the GPU side, sometimes larger than the actual kernel execution. Adding barriers only for shader reads/writes and transfers appears to be sufficient, judging from the code, which either launches compute kernels or copies tensors.
  * Fix small typo.
  Co-authored-by: 0cc4m <[email protected]>
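A hedged sketch of the barrier change (C++ with the Vulkan C API; the function name and structure are illustrative, not the actual ggml-vulkan code): the full-pipeline synchronization is narrowed to only the compute-shader and transfer stages that ggml actually records.

```cpp
#include <vulkan/vulkan.h>

// Sketch: replace an all-stages barrier with one scoped to compute and
// transfer work. Assumes `cmd` is a command buffer in the recording state.
static void sync_buffer_scoped(VkCommandBuffer cmd) {
    VkMemoryBarrier barrier = {};
    barrier.sType         = VK_STRUCTURE_TYPE_MEMORY_BARRIER;
    barrier.srcAccessMask = VK_ACCESS_SHADER_READ_BIT | VK_ACCESS_SHADER_WRITE_BIT |
                            VK_ACCESS_TRANSFER_READ_BIT | VK_ACCESS_TRANSFER_WRITE_BIT;
    barrier.dstAccessMask = barrier.srcAccessMask;

    // Before the change: VK_PIPELINE_STAGE_ALL_COMMANDS_BIT on both sides,
    // i.e. a full pipeline sync on every buffer hand-off.
    vkCmdPipelineBarrier(cmd,
        VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT | VK_PIPELINE_STAGE_TRANSFER_BIT,
        VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT | VK_PIPELINE_STAGE_TRANSFER_BIT,
        0, 1, &barrier, 0, nullptr, 0, nullptr);
}
```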
- 5de81c9: ggml : fix div-by-zero (llama/9003)
  Fixes: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=70724 (accessing the bug report requires logging in with one of the emails listed in https://github.com/google/oss-fuzz/blob/master/projects/llamacpp/project.yaml#L3-L5).
  Signed-off-by: David Korczynski <[email protected]>
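The general fix pattern (a minimal sketch; the actual guarded call site in ggml may differ) is to validate a divisor coming from untrusted input before it reaches an integer division:

```cpp
#include <cassert>
#include <cstdint>

// Sketch: reject a zero block size before it is used as a divisor,
// instead of letting fuzzed input trigger SIGFPE.
static int64_t rows_per_block(int64_t nrows, int64_t block_size) {
    assert(block_size != 0 && "invalid (fuzzed) input: zero block size");
    if (block_size == 0) {
        return 0; // defensive fallback for release builds
    }
    return nrows / block_size;
}
```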
- 3210933: ggml : move rope type enum to ggml.h (llama/8949)
  This commit moves the `llama_rope_type` enum from `llama.h` to `ggml.h` and renames it to `ggml_rope_type`, addressing the TODO in `llama.h` so the enum can be used in ggml. Note: the `mode` parameter is not changed to type `enum ggml_rope_type`; its name and usage suggest it may be a bit field carrying multiple flags, so further investigation/discussion is needed before restricting it to RoPE types.
  Follow-up (squash) commits:
  * Remove GGML_ROPE_TYPE_NONE and GGML_ROPE_TYPE_GLM from ggml.h and bring back the llama_rope_type enum. The assert for GGML_ROPE_TYPE_GLM is kept, as it is not clear yet whether it is safe to remove.
  * Replace the enum ggml_rope_type in ggml.h with a define (GGML_ROPE_TYPE_NEOX), used in the code to check whether the mode is set to GPT-NeoX; the llama_rope_type enum is updated to match.
  * Enable the GGML_ROPE_TYPE_NEOX macro/define to be passed to the shader compiler.
  * Fix the editorconfig-checker warnings.
  * Update the comment for the ggml_rope function.
  * Revert "squash! ggml : move rope type enum to ggml.h" (reverts commit 6261222bd0dc0efd51f0fb0435ad3f16a5b52fd6).
  * Add GGML_ROPE_TYPE_NEOX to rope_common.comp.
  * Remove extra line.
  Co-authored-by: slaren <[email protected]>
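The resulting pattern is a bit-flag define rather than an enum; a sketch (the value 2 matches ggml.h around this sync, and the helper function is illustrative):

```cpp
// Sketch of the post-change pattern: a bit-flag define instead of an enum.
#define GGML_ROPE_TYPE_NEOX 2

static bool rope_is_neox(int mode) {
    // `mode` stays a plain int bit field, so other flags can coexist.
    return (mode & GGML_ROPE_TYPE_NEOX) != 0;
}
```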
- 84b060b
- 893beb2: ggml : dynamic ggml_sched_max_splits based on graph_size (llama/9047)
  * Make ggml_sched_max_splits dynamic, based on graph_size.
  * Fix and re-add the debug code for causes.
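A hedged sketch of the idea (the names are illustrative stand-ins; the real scheduler code in ggml differs in detail): the split array is sized from the measured graph instead of a fixed compile-time maximum.

```cpp
#include <cstdlib>

// Illustrative stand-ins for the scheduler's internal types.
struct sched_split { int first_node, last_node; };

struct sched_ctx {
    sched_split * splits;
    int           max_splits;
};

// Before: `splits` had a fixed capacity of GGML_SCHED_MAX_SPLITS entries.
// After (sketch): size it from graph_size, since a graph of N nodes can
// produce at most on the order of N splits.
static void sched_reserve_splits(sched_ctx * ctx, int graph_size) {
    ctx->max_splits = graph_size; // upper bound: one split per node
    ctx->splits = (sched_split *) std::calloc(ctx->max_splits, sizeof(sched_split));
}
```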
- 9550007: rpc : prevent crashes on invalid input (llama/9040)
  Add more checks to prevent the RPC server from crashing when invalid input is received from a client.
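The general pattern (a sketch under assumptions; the actual checks live in the RPC backend and cover more fields) is to range-check every size coming off the wire before it is used for a copy or an allocation:

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Sketch: parse a length-prefixed payload defensively. `input` comes
// straight from an untrusted client, so the claimed size is validated
// against what was actually received before any copy happens.
static bool parse_tensor_data(const std::vector<uint8_t> & input, std::vector<uint8_t> & out) {
    if (input.size() < sizeof(uint64_t)) {
        return false; // too short to even hold the size field
    }
    uint64_t size = 0;
    std::memcpy(&size, input.data(), sizeof(size));
    if (size > input.size() - sizeof(uint64_t)) {
        return false; // claimed size exceeds what was actually sent
    }
    out.assign(input.begin() + sizeof(uint64_t), input.begin() + sizeof(uint64_t) + size);
    return true;
}
```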
- ad56a42
- e60fc00: Fix SYCL im2col and convert overflow with large dims (llama/9052)
  * sycl: fix im2col overflow and sync with cuda.
  * sycl: fix convert overflow.
  * sycl: fix convert and dequantize.
  * sycl: fix ib in dmmv.
  * sycl: refine convert.
  * sycl: move downsample global_range into common.
  * test: add im2col and convert test cases.
  * test: make new cases only in sycl.
  * test: comment new test_cases for only local testing.
  Signed-off-by: zhentaoyu <[email protected]>
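The class of bug being fixed (sketched with hypothetical names; the real kernels index differently) is 32-bit overflow in flattened index arithmetic once tensor dimensions get large:

```cpp
#include <cstdint>

// Sketch: computing a flattened offset in plain int overflows once the
// product of large dims exceeds 2^31 - 1; widening to int64_t fixes it.
static int64_t im2col_offset(int64_t batch, int64_t channel, int64_t row,
                             int64_t channels, int64_t rows, int64_t cols) {
    // Before (buggy): int offset = ((batch * channels + channel) * rows + row) * cols;
    return ((batch * channels + channel) * rows + row) * cols;
}
```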
- 0474811
  * Fallback mmvq to mul_mat.
  * mmvq in cuda path.
  * Update ggml/src/ggml-sycl.cpp.
  Co-authored-by: Alberto Cabrera Pérez <[email protected]>
- 0781ca2: llama : simplify Mamba with advanced batch splits (llama/8526)
  * llama : advanced batch splits. This includes equal-sequence-length batch splits, which are useful for simplifying recurrent model operators.
  * llama : always make recurrent state slots contiguous.
  * ggml : simplify mamba operators.
  * llama : fix integer signedness mixing.
  * llama : logits_all has priority over batch->logits. Otherwise the server embeddings tests fail; this was likely a pre-existing problem, only detected here because of an additional assertion.
  * llama : apply suggestions.
  * llama : fix t5 segfault.
  * llama : fix Mamba session save and restore.
  * llama : minor cosmetic changes.
  * llama : rename llama_reorder_outputs to llama_output_reorder and move it closer to llama_output_reserve.
  * llama : fix pooled embeddings when using batches with equal_seqs.
  * minor : add struct members for clarity (ggml-ci).
  * llama : fix T5 segfault again.
  * llama : fix Mamba pooled embeddings with multiple sequences. Until the pooled embeddings are refactored to allow splitting across ubatches for causal embeddings, recurrent models can only process a single sequence per ubatch when calculating pooled embeddings.
  * llama : add llama_model_is_recurrent to simplify figuring that out. This will make it easier to more cleanly support RWKV-v6 and Mamba-2.
  * llama : fix simple splits when the batch contains embeddings.
  Co-authored-by: Georgi Gerganov <[email protected]>
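llama_model_is_recurrent is the public entry point the last bullet refers to; the dispatch around it below is only an illustrative sketch of how a caller might use it.

```cpp
#include "llama.h"

// Sketch: choose a batch-splitting strategy depending on whether the
// loaded model is recurrent (Mamba-style) or attention-based.
static bool wants_equal_seq_splits(const llama_model * model) {
    // Recurrent models need equal-sequence-length ubatches so the
    // per-sequence state updates stay simple and contiguous.
    return llama_model_is_recurrent(model);
}
```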
- d2ddfd0: Add oneDNN primitive support (llama/9091)
  * Add onednn.
  * Add sycl_f16.
  * Add dnnl stream.
  * Add engine map.
  * Use dnnl for Intel only.
  * Use fp16fp16fp16.
  * Update doc.
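For orientation, a minimal oneDNN (dnnl) f16 matmul looks roughly like this; the dims and buffer handling are illustrative, not the actual ggml-sycl integration.

```cpp
#include <cstdint>
#include <oneapi/dnnl/dnnl.hpp>

// Sketch: create an engine + stream, then run an f16 matmul primitive,
// mirroring the "fp16fp16fp16" (src/weights/dst) choice listed above.
void dnnl_matmul_f16(void * a, void * b, void * c, int64_t M, int64_t N, int64_t K) {
    dnnl::engine eng(dnnl::engine::kind::gpu, 0);
    dnnl::stream strm(eng);

    using dt  = dnnl::memory::data_type;
    using tag = dnnl::memory::format_tag;
    dnnl::memory::desc a_md({M, K}, dt::f16, tag::ab);
    dnnl::memory::desc b_md({K, N}, dt::f16, tag::ab);
    dnnl::memory::desc c_md({M, N}, dt::f16, tag::ab);

    dnnl::matmul::primitive_desc pd(eng, a_md, b_md, c_md);
    dnnl::matmul(pd).execute(strm, {
        {DNNL_ARG_SRC,     dnnl::memory(a_md, eng, a)},
        {DNNL_ARG_WEIGHTS, dnnl::memory(b_md, eng, b)},
        {DNNL_ARG_DST,     dnnl::memory(c_md, eng, c)},
    });
    strm.wait();
}
```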
- 9300171
- e084b3d: CPU/CUDA: Gemma 2 FlashAttention support (llama/8542)
  * CPU/CUDA: Gemma 2 FlashAttention support.
  * Apply logit_softcap to scale in kernel.
  * Disable logit softcapping tests on Metal.
  * Remove Metal check.
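Gemma 2's logit softcapping squashes attention scores through tanh; folding the cap into the existing scale (as the second bullet describes) can be sketched like this, with variable names chosen for illustration:

```cpp
#include <cmath>

// Sketch: softcapped attention score. The pre-softmax score becomes
// cap * tanh((q.k * scale) / cap). Folding the cap into the scale means
// the kernel multiplies by (scale / cap) first, then applies
// cap * tanh(...) in one place.
static float softcapped_score(float qk_dot, float scale, float logit_softcap) {
    if (logit_softcap == 0.0f) {
        return qk_dot * scale; // softcapping disabled
    }
    return logit_softcap * std::tanh(qk_dot * (scale / logit_softcap));
}
```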
- aca2c78
- cc1ad6c: ggml : add SSM Metal kernels (llama/8546)
  * ggml : add ggml_ssm_conv metal impl.
  * ggml : add ssm_scan metal impl (ggml-ci).
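The recurrence these kernels implement, shown as a simplified scalar C++ reference (single channel; the Metal versions are parallelized and ggml's exact parameterization differs):

```cpp
#include <cmath>
#include <vector>

// Sketch: per-token selective state-space scan over one channel:
//   h[s] = exp(dt * A[s]) * h[s] + dt * B[t][s] * x[t]
//   y[t] = sum_s C[t][s] * h[s]
void ssm_scan_ref(const float * x, const float * dt, const float * A,
                  const float * B, const float * C,
                  float * y, int n_tokens, int d_state) {
    std::vector<float> h(d_state, 0.0f); // recurrent state, zero-initialized
    for (int t = 0; t < n_tokens; ++t) {
        float yt = 0.0f;
        for (int s = 0; s < d_state; ++s) {
            h[s] = std::exp(dt[t] * A[s]) * h[s] + dt[t] * B[t * d_state + s] * x[t];
            yt  += C[t * d_state + s] * h[s];
        }
        y[t] = yt;
    }
}
```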
- 4beb504: metal : separate scale and mask from QKT in FA kernel (llama/9189)
  * metal : separate scale and mask from QKT in FA kernel.
  * metal : ne01 check no longer necessary.
  * metal : keep data in local memory.
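The restructuring can be read as: compute the Q.K^T tile first, then apply scale and mask in their own pass rather than folding them into the matmul. A sketch with illustrative names:

```cpp
// Sketch: FA inner step after the change. `s` holds one tile of Q.K^T
// produced by the matmul; scale and mask are applied as a separate pass.
static void apply_scale_and_mask(float * s, const float * mask,
                                 float scale, int n) {
    for (int i = 0; i < n; ++i) {
        s[i] = s[i] * scale + (mask ? mask[i] : 0.0f);
    }
}
```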
- 44bc33d
- a23fc97
- 234d153
- b849c25