
[pull] master from mudler:master #74

Merged
merged 20 commits into from
May 16, 2024
Conversation

pull[bot]

@pull pull bot commented May 13, 2024

See Commits and Changes for more details.



* auto select cpu variant

Signed-off-by: Sertac Ozercan <[email protected]>

* remove cuda target for now

Signed-off-by: Sertac Ozercan <[email protected]>

* fix metal

Signed-off-by: Sertac Ozercan <[email protected]>

* fix path

Signed-off-by: Sertac Ozercan <[email protected]>

---------

Signed-off-by: Sertac Ozercan <[email protected]>
@pull pull bot added the ⤵️ pull label May 13, 2024
mudler and others added 10 commits May 13, 2024 18:44
feat(llama.cpp): add flash_attn and no_kv_offload

Signed-off-by: Ettore Di Giacinto <[email protected]>
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <[email protected]>
* auto select cpu variant

Signed-off-by: Sertac Ozercan <[email protected]>

* remove cuda target for now

Signed-off-by: Sertac Ozercan <[email protected]>

* fix metal

Signed-off-by: Sertac Ozercan <[email protected]>

* fix path

Signed-off-by: Sertac Ozercan <[email protected]>

* cuda

Signed-off-by: Sertac Ozercan <[email protected]>

* auto select cuda

Signed-off-by: Sertac Ozercan <[email protected]>

* update test

Signed-off-by: Sertac Ozercan <[email protected]>

* select CUDA backend only if present

Signed-off-by: mudler <[email protected]>

* ci: keep cuda bin in path

Signed-off-by: mudler <[email protected]>

* Makefile: make dist now builds also cuda

Signed-off-by: mudler <[email protected]>

* Keep pushing fallback in case auto-flagset/nvidia fails

There could be other reasons why the default binary may fail to start: for example, we might detect an Nvidia GPU while the user does not have the drivers or CUDA libraries installed on the system.

We keep the plain llama.cpp build at the end of the list of llama.cpp backends as a fallback, so loading can still be attempted in case things go wrong.

Signed-off-by: mudler <[email protected]>
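The fallback ordering described above can be sketched roughly as follows. This is a hypothetical illustration, not the project's actual code: the function names (`order_backends`, `load_model`) and backend identifiers (`llama-cpp-cuda`, `llama-cpp-fallback`, etc.) are invented for the example.

```python
# Hypothetical sketch of the backend selection described above: prefer the
# auto-detected GPU variant, but always keep the plain llama.cpp build last
# so loading can still succeed if the GPU path fails (e.g. missing drivers).
def order_backends(detected_gpu, backends):
    """Return backends with the detected GPU variant first and the CPU fallback last."""
    preferred = [b for b in backends if detected_gpu and detected_gpu in b]
    rest = [b for b in backends if b not in preferred and b != "llama-cpp-fallback"]
    # Always append the default binary as a last resort.
    return preferred + rest + ["llama-cpp-fallback"]

def load_model(backends, try_load):
    """Attempt each backend in order until one loads successfully."""
    for backend in backends:
        try:
            return try_load(backend)
        except RuntimeError:
            # e.g. a GPU was detected but the CUDA libraries are not installed
            continue
    raise RuntimeError("no backend could load the model")
```

For example, with a detected CUDA GPU the CUDA variant is tried first, and if it raises, the loop falls through to the CPU builds and finally the default binary.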

* Do not build cuda on MacOS

Signed-off-by: mudler <[email protected]>

* cleanup

Signed-off-by: Sertac Ozercan <[email protected]>

* Apply suggestions from code review

Signed-off-by: Ettore Di Giacinto <[email protected]>

---------

Signed-off-by: Sertac Ozercan <[email protected]>
Signed-off-by: Ettore Di Giacinto <[email protected]>
Signed-off-by: mudler <[email protected]>
Co-authored-by: Ettore Di Giacinto <[email protected]>
Co-authored-by: mudler <[email protected]>
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <[email protected]>
Signed-off-by: Ettore Di Giacinto <[email protected]>
mudler and others added 8 commits May 15, 2024 01:17
* feat(llama.cpp): support distributed llama.cpp

Signed-off-by: Ettore Di Giacinto <[email protected]>

* feat: let users tweak how chat messages are merged together

Signed-off-by: Ettore Di Giacinto <[email protected]>

* refactor

Signed-off-by: Ettore Di Giacinto <[email protected]>

* Makefile: register to ALL_GRPC_BACKENDS

Signed-off-by: Ettore Di Giacinto <[email protected]>

* refactoring, allow disable auto-detection of backends

Signed-off-by: Ettore Di Giacinto <[email protected]>

* minor fixups

Signed-off-by: mudler <[email protected]>

* feat: add cmd to start rpc-server from llama.cpp

Signed-off-by: mudler <[email protected]>

* ci: add ccache

Signed-off-by: mudler <[email protected]>

---------

Signed-off-by: Ettore Di Giacinto <[email protected]>
Signed-off-by: mudler <[email protected]>
feat(functions): support mixed JSON BNF grammar

This PR adds new options to control how function calls are extracted from the LLM output, and gives more control over how JSON grammars are used (also in combination).

New YAML settings introduced:

- `grammar_message`: when enabled, the generated grammar can also emit plain strings, not only JSON objects. This lets the LLM choose between responding freely and responding with JSON.
- `grammar_prefix`: prepends a string to the JSON grammar definition.
- `replace_results`: a map of string replacements applied to the LLM result.

As an example, consider the following settings for Hermes-2-Pro-Mistral, which allow extracting both the JSON results produced directly by the model and those produced via the grammar:

```yaml
function:
  # disable injecting the "answer" tool
  disable_no_action: true
  # This allows the grammar to also return messages
  grammar_message: true
  # Prefix to prepend to the grammar
  grammar_prefix: '<tool_call>\n'
  return_name_in_function_response: true
  # To run without a grammar, uncomment the lines below.
  # Warning: this relies solely on the LLM's ability to
  # generate a correct function call.
  # no_grammar: true
  # json_regex_match: "(?s)<tool_call>(.*?)</tool_call>"
  replace_results:
    "<tool_call>": ""
    "\'": "\""
```

Note: To disable grammar usage entirely in the example above, uncomment the
`no_grammar` and `json_regex_match` lines.
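The post-processing that `json_regex_match` and `replace_results` describe can be sketched as below. This is a hypothetical illustration of the pipeline, not the project's implementation; the function name `extract_function_call` is invented for the example.

```python
import json
import re

def extract_function_call(llm_output, json_regex_match=None, replace_results=None):
    """Hypothetical sketch: optionally extract the tool call with a regex
    (json_regex_match), then apply literal string replacements
    (replace_results) before parsing the result as JSON."""
    text = llm_output
    if json_regex_match:
        match = re.search(json_regex_match, text)
        if match:
            text = match.group(1)
    # Apply the replace_results map, e.g. turning single quotes into
    # double quotes so the payload becomes valid JSON.
    for old, new in (replace_results or {}).items():
        text = text.replace(old, new)
    return json.loads(text)
```

With the settings from the YAML example, a model output such as `<tool_call>{'name': 'get_weather'}</tool_call>` would first be reduced to the inner payload by the regex, then quote-fixed by the replacements, and finally parsed as JSON.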

Signed-off-by: Ettore Di Giacinto <[email protected]>
Signed-off-by: Ettore Di Giacinto <[email protected]>
Signed-off-by: Ettore Di Giacinto <[email protected]>
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <[email protected]>
Correct llama3-8b-instruct model file

This must be a mistake: the config referenced a model file different from the one actually being downloaded. Assuming the downloaded file is the intended one, the specified model file has been corrected to match it.

Signed-off-by: Aleksandr Oleinikov <[email protected]>
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <[email protected]>
@pull pull bot merged commit 4e92569 into kp-forks:master May 16, 2024