[pull] master from mudler:master #74
Merged
* auto select cpu variant
* remove cuda target for now
* fix metal
* fix path

Signed-off-by: Sertac Ozercan <[email protected]>
feat(llama.cpp): add flash_attn and no_kv_offload

Signed-off-by: Ettore Di Giacinto <[email protected]>
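These two options are surfaced as model-level settings. A minimal config sketch is below; the exact key names (`flash_attention`, `no_kv_offloading`) and the model name/file are assumptions for illustration, inferred from the commit title rather than taken from this PR:

```yaml
# Hypothetical LocalAI model config (key names assumed, not verified):
name: my-llama3            # placeholder model name
parameters:
  model: my-model.gguf     # placeholder GGUF file
flash_attention: true      # assumed key: enable llama.cpp flash attention
no_kv_offloading: true     # assumed key: keep the KV cache in system memory
```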
* auto select cpu variant
* remove cuda target for now
* fix metal
* fix path
* cuda
* auto select cuda
* update test
* select CUDA backend only if present
* ci: keep cuda bin in path
* Makefile: make dist now builds also cuda
* Keep pushing fallback in case auto-flagset/nvidia fails

  There could be other reasons for which the default binary may fail. For example, we might have detected an Nvidia GPU while the user does not have the drivers/CUDA libraries installed on the system, so the backend would fail to start. We keep the llama.cpp fallback at the end of the llama.cpp backends to try fallback loading in case things go wrong.

* Do not build cuda on MacOS
* cleanup
* Apply suggestions from code review

Signed-off-by: Sertac Ozercan <[email protected]>
Signed-off-by: Ettore Di Giacinto <[email protected]>
Signed-off-by: mudler <[email protected]>
Co-authored-by: Ettore Di Giacinto <[email protected]>
Co-authored-by: mudler <[email protected]>
* feat(llama.cpp): support distributed llama.cpp
* feat: let users tweak how chat messages are merged together
* refactor
* Makefile: register to ALL_GRPC_BACKENDS
* refactoring, allow disabling auto-detection of backends
* minor fixups
* feat: add cmd to start rpc-server from llama.cpp
* ci: add ccache

Signed-off-by: Ettore Di Giacinto <[email protected]>
Signed-off-by: mudler <[email protected]>
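Roughly, distributed inference splits work between one or more llama.cpp RPC workers and the main LocalAI instance. A hedged sketch of the wiring, assuming llama.cpp's `rpc-server` binary and a `LLAMACPP_GRPC_SERVERS` environment variable listing the workers (flags, variable name, addresses, and ports here are illustrative assumptions, not verified against this PR):

```
# On each worker machine, start a llama.cpp RPC server (host/port placeholders):
rpc-server --host 0.0.0.0 --port 50052

# On the main node, point the llama.cpp backend at the workers
# (comma-separated host:port list; variable name assumed):
LLAMACPP_GRPC_SERVERS="192.168.1.10:50052,192.168.1.11:50052" local-ai
```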
feat(functions): support mixed JSON BNF grammar

This PR provides new options to control how functions are extracted from the LLM, and also provides more control over how JSON grammars can be used (also in conjunction).

New YAML settings introduced:

- `grammar_message`: when enabled, the generated grammar can also decide to push strings and not only JSON objects. This allows the LLM to choose to either respond freely or with JSON.
- `grammar_prefix`: allows prefixing a string to the JSON grammar definition.
- `replace_results`: a map that allows replacing strings in the LLM result.

As an example, consider the following settings for Hermes-2-Pro-Mistral, which allow extracting both JSON results coming from the model and the ones coming from the grammar:

```yaml
function:
  # disable injecting the "answer" tool
  disable_no_action: true
  # This allows the grammar to also return messages
  grammar_message: true
  # Prefix to add to the grammar
  grammar_prefix: '<tool_call>\n'
  return_name_in_function_response: true
  # Without grammar, uncomment the lines below
  # Warning: this relies only on the capability of the
  # LLM model to generate the correct function call.
  # no_grammar: true
  # json_regex_match: "(?s)<tool_call>(.*?)</tool_call>"
  replace_results:
    "<tool_call>": ""
    "\'": "\""
```

Note: to disable grammar usage entirely in the example above, uncomment `no_grammar` and `json_regex_match`.

Signed-off-by: Ettore Di Giacinto <[email protected]>
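For a feel of what the `replace_results` map above does (hypothetical model output, not taken from the PR): a raw completion such as

```
<tool_call>{'name': 'get_weather', 'arguments': {'city': 'Rome'}}
```

would, after stripping `<tool_call>` and converting single quotes to double quotes per the two replacements, become parseable JSON:

```
{"name": "get_weather", "arguments": {"city": "Rome"}}
```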
Correct llama3-8b-instruct model file

This must be a mistake, because the config tries to use a model file that is different from the one actually being downloaded. I assumed the downloaded file is what should be used, so I corrected the specified model file to match.

Signed-off-by: Aleksandr Oleinikov <[email protected]>
See Commits and Changes for more details.
Created by pull[bot]
Can you help keep this open source service alive? 💖 Please sponsor : )