
sync : llama.cpp #965

Merged (31 commits) on Sep 20, 2024

Commits on Sep 20, 2024

  1. Commit 6f8c166
  2. add check malloc result on device (llama/9346)

    * add check malloc result on device
    
    * update for review comments, check all malloc_device() result
    
    ---------
    
    Co-authored-by: arthw <[email protected]>
    2 people authored and ggerganov committed Sep 20, 2024
    Commit f3068aa
  3. Commit 58ccd65
  4. Overlap cmdbuffer creation and cmdbuffer execution in Vulkan backend by submitting smaller cmdbuffers early. (llama/9118)
    
    * Overlap cmdbuffer creation and cmdbuffer execution in Vulkan backend by submitting smaller cmdbuffers early.
    
    * fix compile issues
    
    * Fix issues where the last submit wasn't executed or handled properly.
    
    * remove trailing whitespace
    
    * Repair GGML_VULKAN_CHECK_RESULTS
    
    * Increase submit counter only if actual work has been submitted and increase submit count to 100.
    
    * Fix: some nodes were not checked when GGML_VULKAN_CHECK_RESULTS was enabled.
    mtavenrath authored and ggerganov committed Sep 20, 2024
    Commit cb3abaa
  5. Commit ab7c211
  6. ggml : vector length agnostic SVE support (llama/9290)

    * Implemented vector length agnostic SVE using switch case for 512-bit, 256-bit, 128-bit vector lengths
    
    * Implemented vector length agnostic SVE using switch case for 512-bit, 256-bit, 128-bit vector lengths
    
    * Removed WhiteSpaces
    
    * ggml : style changes + fix 512-bit nb loop check
    
    - fix local scope in switch cases
    - consistent predicate names
    - empty lines when necessary
    - opening braces, spaces
    - const-correctness
    - add asserts
    
    * Update ggml/src/ggml-quants.c
    
    Co-authored-by: Georgi Gerganov <[email protected]>
    
    ---------
    
    Co-authored-by: Georgi Gerganov <[email protected]>
    Vithulep and ggerganov committed Sep 20, 2024
    Commit 82088b4
  7. rpc : fix segfault with nkvo (llama/9389)

    * rpc : fix nkvo
    
    * rpc : buf_size must not be static
    
    ref: #9337
    
    ---------
    
    Co-authored-by: slaren <[email protected]>
    2 people authored and ggerganov committed Sep 20, 2024
    Commit 9f9246d
  8. Commit bf778d0
  9. sycl : update support conditions (llama/9394)

    * sycl : update support condition to im2col
    
    Signed-off-by: Alberto Cabrera <[email protected]>
    
    * Added TODO to remind supporting FP32 im2col
    
    ---------
    
    Signed-off-by: Alberto Cabrera <[email protected]>
    Alcpz authored and ggerganov committed Sep 20, 2024
    Commit 98d9ff3
  10. musa: remove Clang builtins mapping (llama/9421)

    Signed-off-by: Xiaodong Ye <[email protected]>
    yeahdongcn authored and ggerganov committed Sep 20, 2024
    Commit 6c6800c
  11. Commit ed50f6e
  12. Commit f24368f
  13. riscv : modify Makefile and add a RISCV_VECT to print log info (llama/9442)
    
    - Added ggml_cpu_has_riscv_v() in GGML to print system info in log
    - Modified Makefile to only use flag when cross compiling for RISC-V
    Tameem-10xE authored and ggerganov committed Sep 20, 2024
    Commit 4fcc15a
  14. cann: Add host buffer type for Ascend NPU (llama/9406)

    * feat: Add host buffer type for Ascend NPU (CANN backend)
    
    * fix some checking errors
    
    * Add a few comments
    Dou-Git authored and ggerganov committed Sep 20, 2024
    Commit bab17a2
  15. cmake : use list(APPEND ...) instead of set() + dedup linker (llama/9463)
    
    * cmake : use list(APPEND ...) instead of set() + dedup linker
    
    ggml-ci
    
    * cmake : try fix sycl
    
    * cmake : try to fix sycl 2
    
    * cmake : fix sycl build (llama/9469)
    
    * try fix sycl build
    
    * use CMAKE_CXX_FLAGS as a string variable
    
    ---------
    
    Co-authored-by: Georgi Gerganov <[email protected]>
    
    * one more CMAKE_CXX_FLAGS fix (llama/9471)
    
    ---------
    
    Co-authored-by: Michael Podvitskiy <[email protected]>
    ggerganov and Xarbirus committed Sep 20, 2024
    Commit 9a00cfc
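The difference between the two CMake idioms mentioned in this entry can be shown in a short config fragment. This is a sketch under assumptions (`C_FLAGS` is an invented variable name; the fragment is not the actual ggml CMakeLists):

```cmake
# set() would clobber anything a parent project already put in the list:
#   set(C_FLAGS -O3)
# list(APPEND ...) extends the existing list instead:
list(APPEND C_FLAGS -O3)

# CMAKE_CXX_FLAGS is a single space-separated string, not a list, which is
# why the follow-up fixes treat it as a string variable:
string(APPEND CMAKE_CXX_FLAGS " -march=native")
```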
  16. ggml : ggml_type_name return "NONE" for invalid values (llama/9458)

    When running on Windows, the quantization utility attempts to print the types that are not set, which leads to a crash.
    ykhrustalev authored and ggerganov committed Sep 20, 2024
    Commit 4de945c
  17. Commit 6d239fb
  18. Commit 6f523e2
  19. Commit 6f8cf41
  20. Commit 18ecce4
  21. ggml : IQ4_NL sgemm + Q4_0 AVX optimization (llama/9422)

    * squashed
    
    readd my iq4_nl sgemm PR ggerganov/llama.cpp#8049
    
    have ggml_vec_dot_q4_0 do two blocks per loop for avx
    
    try out f16c ggml_vec_dot_iq4_nl, but it's not really faster; as per ggerganov/llama.cpp#8549 we can calculate several blocks at a time with no issue
    
    * shuffle
    
    * remove f16c iq4_nl as I can't make it faster than before
    netrunnereve authored and ggerganov committed Sep 20, 2024
    Commit 4ae0501
  22. cmake : do not hide GGML options + rename option (llama/9465)

    * cmake : do not hide GGML options
    
    ggml-ci
    
    * build : rename flag GGML_CUDA_USE_GRAPHS -> GGML_CUDA_GRAPHS
    
    for consistency
    
    ggml-ci
    ggerganov committed Sep 20, 2024
    Commit 0192921
  23. Commit 1df27fc
  24. threadpool : skip polling for unused threads (llama/9461)

    * threadpool: skip polling for unused threads
    
    Currently all threads do N polling rounds even if only 1 thread is active (n_threads_cur == 1).
    This commit adds a check to skip the polling for unused threads (ith >= n_threads_cur).
    
    n_threads_cur is now an atomic_int to explicitly tell the thread sanitizer that it is written
    from one thread and read from other threads (not a race condition).
    
    * threadpool: further simplify and improve ggml_barrier
    
    Avoid using strict memory order while polling, yet make sure that all threads go through a
    full memory barrier (memory fence) on ggml_barrier entrance and exit.
    
    * threads: add simple barrier test
    
    This test does lots of small, parallel matmul ops where the barriers in between dominate the overhead.
    
    * threadpool: improve thread sync for new-graphs
    
    Using the same tricks as ggml_barrier. All the polling is done with relaxed memory order
    to keep it efficient, once the new graph is detected we do full fence using read-modify-write
    with strict memory order.
    
    * threadpool: improve abort handling
    
    Do not use threadpool->ec (exit code) to decide whether to exit the compute loop.
    threadpool->ec is not atomic which makes thread-sanitizer rightfully unhappy about it.
    
    Instead introduce atomic threadpool->abort flag used for this. This is consistent with
    how we handle threadpool->stop or pause.
    
    While at it add an explicit atomic_load for n_threads_cur for consistency.
    
    * test-barrier: release threadpool before releasing the context
    
    fixes a use-after-free detected by the gcc thread sanitizer on x86-64;
    for some reason the llvm sanitizer does not detect this issue.
    max-krasnyansky authored and ggerganov committed Sep 20, 2024
    Commit 2a5a49a
  25. ggml : fix n_threads_cur initialization with one thread (llama/9538)

    * ggml : fix n_threads_cur initialization with one thread
    
    * Update ggml/src/ggml.c
    
    ---------
    
    Co-authored-by: Max Krasnyansky <[email protected]>
    2 people authored and ggerganov committed Sep 20, 2024
    Commit 4344c2d
  26. Commit 68ad0d0
  27. ggml : fix trailing whitespace (llama/0)

    ggml-ci
    ggerganov committed Sep 20, 2024
    Commit 23188a3
  28. ggml : fix builds (llama/0)

    ggml-ci
    ggerganov committed Sep 20, 2024
    Commit eea09cf
  29. ggml : refactoring (llama/#0)

    - d6a04f87
    - 23e0d70b
    ggerganov committed Sep 20, 2024
    Commit cd7d18e
  30. sync : llama.cpp

    ggerganov committed Sep 20, 2024
    Commit 242ae95
  31. examples : adapt to ggml.h changes (#0)

    ggml-ci
    ggerganov committed Sep 20, 2024
    Commit a146842