
[pull] master from mudler:master #74

Merged
merged 20 commits into from
May 16, 2024
Conversation

pull[bot]

@pull pull bot commented May 13, 2024

See Commits and Changes for more details.



* auto select cpu variant

Signed-off-by: Sertac Ozercan <[email protected]>

* remove cuda target for now

Signed-off-by: Sertac Ozercan <[email protected]>

* fix metal

Signed-off-by: Sertac Ozercan <[email protected]>

* fix path

Signed-off-by: Sertac Ozercan <[email protected]>

---------

Signed-off-by: Sertac Ozercan <[email protected]>
@pull pull bot added the ⤵️ pull label May 13, 2024
mudler and others added 10 commits May 13, 2024 18:44
feat(llama.cpp): add flash_attn and no_kv_offload

Signed-off-by: Ettore Di Giacinto <[email protected]>
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <[email protected]>
* auto select cpu variant

Signed-off-by: Sertac Ozercan <[email protected]>

* remove cuda target for now

Signed-off-by: Sertac Ozercan <[email protected]>

* fix metal

Signed-off-by: Sertac Ozercan <[email protected]>

* fix path

Signed-off-by: Sertac Ozercan <[email protected]>

* cuda

Signed-off-by: Sertac Ozercan <[email protected]>

* auto select cuda

Signed-off-by: Sertac Ozercan <[email protected]>

* update test

Signed-off-by: Sertac Ozercan <[email protected]>

* select CUDA backend only if present

Signed-off-by: mudler <[email protected]>

* ci: keep cuda bin in path

Signed-off-by: mudler <[email protected]>

* Makefile: make dist now builds also cuda

Signed-off-by: mudler <[email protected]>

* Keep pushing fallback in case auto-flagset/nvidia fails

There could be other reasons why the default binary may fail to start: for example, we might detect an Nvidia GPU while the user does not have the drivers or CUDA libraries installed on the system.

We keep the plain llama.cpp build at the end of the list of llama.cpp backends as a fallback, so loading can still be attempted in case things go wrong.

Signed-off-by: mudler <[email protected]>
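The fallback ordering described above can be sketched roughly as follows. This is a hypothetical illustration, not the project's actual code: the function names (`order_backends`, `load_model`) and backend identifiers (`llama-cpp-cuda`, `llama-cpp-fallback`, etc.) are invented for the example.

```python
# Hypothetical sketch of the backend selection described above: prefer the
# auto-detected GPU variant, but always keep the plain llama.cpp build last
# so loading can still succeed if the GPU path fails (e.g. missing drivers).
def order_backends(detected_gpu, backends):
    """Return backends with the detected GPU variant first and the CPU fallback last."""
    preferred = [b for b in backends if detected_gpu and detected_gpu in b]
    rest = [b for b in backends if b not in preferred and b != "llama-cpp-fallback"]
    # Always append the default binary as a last resort.
    return preferred + rest + ["llama-cpp-fallback"]

def load_model(backends, try_load):
    """Attempt each backend in order until one loads successfully."""
    for backend in backends:
        try:
            return try_load(backend)
        except RuntimeError:
            # e.g. a GPU was detected but the CUDA libraries are not installed
            continue
    raise RuntimeError("no backend could load the model")
```

For example, with a detected CUDA GPU the CUDA variant is tried first, and if it raises, the loop falls through to the CPU builds and finally the default binary.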

* Do not build cuda on MacOS

Signed-off-by: mudler <[email protected]>

* cleanup

Signed-off-by: Sertac Ozercan <[email protected]>

* Apply suggestions from code review

Signed-off-by: Ettore Di Giacinto <[email protected]>

---------

Signed-off-by: Sertac Ozercan <[email protected]>
Signed-off-by: Ettore Di Giacinto <[email protected]>
Signed-off-by: mudler <[email protected]>
Co-authored-by: Ettore Di Giacinto <[email protected]>
Co-authored-by: mudler <[email protected]>
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <[email protected]>
Signed-off-by: Ettore Di Giacinto <[email protected]>
mudler and others added 8 commits May 15, 2024 01:17
* feat(llama.cpp): support distributed llama.cpp

Signed-off-by: Ettore Di Giacinto <[email protected]>

* feat: let users tweak how chat messages are merged together

Signed-off-by: Ettore Di Giacinto <[email protected]>

* refactor

Signed-off-by: Ettore Di Giacinto <[email protected]>

* Makefile: register to ALL_GRPC_BACKENDS

Signed-off-by: Ettore Di Giacinto <[email protected]>

* refactoring, allow disable auto-detection of backends

Signed-off-by: Ettore Di Giacinto <[email protected]>

* minor fixups

Signed-off-by: mudler <[email protected]>

* feat: add cmd to start rpc-server from llama.cpp

Signed-off-by: mudler <[email protected]>

* ci: add ccache

Signed-off-by: mudler <[email protected]>

---------

Signed-off-by: Ettore Di Giacinto <[email protected]>
Signed-off-by: mudler <[email protected]>
feat(functions): support mixed JSON BNF grammar

This PR adds new options to control how function calls are extracted from the LLM output, and gives more control over how JSON grammars are used (also in combination).

New YAML settings introduced:

- `grammar_message`: when enabled, the generated grammar can also emit plain strings, not only JSON objects. This lets the LLM choose between responding freely and responding with JSON.
- `grammar_prefix`: prepends a string to the JSON grammar definition.
- `replace_results`: a map of string replacements applied to the LLM result.

As an example, consider the following settings for Hermes-2-Pro-Mistral, which allow extracting both the JSON results produced directly by the model and those produced via the grammar:

```yaml
function:
  # disable injecting the "answer" tool
  disable_no_action: true
  # This allows the grammar to also return messages
  grammar_message: true
  # Prefix to prepend to the grammar
  grammar_prefix: '<tool_call>\n'
  return_name_in_function_response: true
  # To run without a grammar, uncomment the lines below.
  # Warning: this relies solely on the LLM's ability to
  # generate a correct function call.
  # no_grammar: true
  # json_regex_match: "(?s)<tool_call>(.*?)</tool_call>"
  replace_results:
    "<tool_call>": ""
    "\'": "\""
```

Note: To disable grammar usage entirely in the example above, uncomment the
`no_grammar` and `json_regex_match` lines.
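The post-processing that `json_regex_match` and `replace_results` describe can be sketched as below. This is a hypothetical illustration of the pipeline, not the project's implementation; the function name `extract_function_call` is invented for the example.

```python
import json
import re

def extract_function_call(llm_output, json_regex_match=None, replace_results=None):
    """Hypothetical sketch: optionally extract the tool call with a regex
    (json_regex_match), then apply literal string replacements
    (replace_results) before parsing the result as JSON."""
    text = llm_output
    if json_regex_match:
        match = re.search(json_regex_match, text)
        if match:
            text = match.group(1)
    # Apply the replace_results map, e.g. turning single quotes into
    # double quotes so the payload becomes valid JSON.
    for old, new in (replace_results or {}).items():
        text = text.replace(old, new)
    return json.loads(text)
```

With the settings from the YAML example, a model output such as `<tool_call>{'name': 'get_weather'}</tool_call>` would first be reduced to the inner payload by the regex, then quote-fixed by the replacements, and finally parsed as JSON.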

Signed-off-by: Ettore Di Giacinto <[email protected]>
Signed-off-by: Ettore Di Giacinto <[email protected]>
Signed-off-by: Ettore Di Giacinto <[email protected]>
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <[email protected]>
Correct llama3-8b-instruct model file

This must be a mistake: the config referenced a model file different from the one actually being downloaded. Assuming the downloaded file is the intended one, the specified model file has been corrected to match it.

Signed-off-by: Aleksandr Oleinikov <[email protected]>
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <[email protected]>
@pull pull bot merged commit 4e92569 into kp-forks:master May 16, 2024