
[Example] Use new ggml backend with llama options support #52

Merged: 16 commits from dm4/support-llama-metadata into master on Nov 3, 2023

Conversation


juntao commented Oct 19, 2023

Hello, I am a code review bot on flows.network. Here are my reviews of code commits in this PR.


Overall, the pull request titled "[Example] Use new ggml backend with llama options support" contains a collection of code changes across multiple files. The important findings and potential issues are as follows:

  1. Potential problems:

    • The options string is hardcoded, which may limit flexibility. It is recommended to pass it as a command-line argument or read it from a configuration file.
    • There is no validation or error handling for parsing the options JSON string, which could cause runtime errors. Consider adding error handling or using a JSON parsing library.
    • The purpose of the saved_prompt variable is unclear and it is not being used. It could potentially be removed to improve clarity.
  2. Potential problems:

    • It's unclear why the branch of the wasi-nn dependency is being changed. More context is needed to determine the necessity and potential issues.
    • The commit message lacks information about the reasoning behind the change, making it difficult to understand the motivation for the patch.
  3. Potential problems:

    • It is not clear why the serde_json crate is added as a dependency. The purpose of this change should be explained.
    • The update to use the json! macro for the options variable is not well-justified. The necessity and potential risks of this change are unclear.
  4. No potential problems identified.

  5. Potential problems:

    • There may be a typo in the command in the README.md file.
    • The usage of environmental variables is mentioned, but it's not clear how they are handled and what implications or limitations there may be.
    • It is not explained why caution is needed with the - and _ characters in the variable name.
    • The expected values for the environmental variables and their impact on the command execution are not clear.
    • The purpose and behavior of several environmental variables are not explained.
    • The requirement of the wasmedge-ggml-llama-interactive.wasm file may be ambiguous.
  6. No potential problems identified.

  7. Potential problems:

    • The patch lacks context, making it difficult to understand the motivation for the change.
    • Further details or testing steps would be helpful to verify the fix.
  8. Potential problems:

    • The patch does not generate an AOT wasm binary anymore, which may cause compatibility issues.
    • The purpose and function of the new command introduced are not clear.
  9. Potential problems:

    • There is a mismatch in types for some environment variables, which may lead to parsing errors.
    • Suggestions to update default values to match the README and implementation are provided.
  10. No potential problems identified.

  11. Potential problems:

    • Lack of explanation for the macOS Metal support recommendation in the README file.
    • Uncertainty about missing changes or if the patch is comprehensive.

Considering all the individual summaries, the pull request includes some potential issues and errors, such as hardcoded options, lack of error handling, unclear changes, missing information, and compatibility concerns. Further clarification, documentation, and addressing the mentioned problems are recommended before merging the pull request.

Details

Commit e60315b1dde62f7c7a5c9d43d9ba6dfc2a960ed8

Key changes:

  • Added the ability to set options for input with index 1 using the context.set_input function.
  • Updated the context.set_input function calls to use the new method signature.
  • Added a prompt string for input with index 1, containing options in JSON format.
  • Updated the max_output_size calculation to use proper spacing.

Potential problems:

  • The options string is hardcoded in the code. It might be better to pass it as a command-line argument or read it from a configuration file for flexibility.
  • There is no validation or error handling for parsing the options JSON string. If the JSON is malformed, it may cause runtime errors. Consider adding error handling or using a JSON parsing library to handle invalid input gracefully (see the sketch below).
  • It's unclear what the purpose of the saved_prompt variable is and why it's not being used in the code. It could potentially be removed to improve clarity.
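
To illustrate the first two concerns, below is a minimal sketch of how the options could be supplied through an environment variable and validated before being handed to the plugin. It assumes the wasi_nn crate API shape used in these examples (GraphBuilder, build_from_cache, set_input with a TensorType::U8 tensor); the OPTIONS variable name, the fallback default, and the error messages are illustrative rather than part of this commit.

```rust
use serde_json::Value;
use wasi_nn::{ExecutionTarget, GraphBuilder, GraphEncoding, TensorType};

fn main() {
    // Assumption: read the options JSON from an environment variable instead of
    // hardcoding it, falling back to a small default when the variable is unset.
    let options = std::env::var("OPTIONS")
        .unwrap_or_else(|_| r#"{"enable-log": false, "ctx-size": 512}"#.to_string());

    // Validate the JSON up front so a malformed string fails with a clear
    // message instead of surfacing as an opaque error inside compute().
    let parsed: Value = serde_json::from_str(&options).expect("OPTIONS is not valid JSON");

    let graph = GraphBuilder::new(GraphEncoding::Ggml, ExecutionTarget::AUTO)
        .build_from_cache("default")
        .expect("failed to load the preloaded model");
    let mut context = graph.init_execution_context().expect("failed to init context");

    // Index 1 carries the llama options; index 0 carries the prompt.
    context
        .set_input(1, TensorType::U8, &[1], parsed.to_string().as_bytes())
        .expect("failed to set options");
    context
        .set_input(0, TensorType::U8, &[1], "Once upon a time".as_bytes())
        .expect("failed to set prompt");
    context.compute().expect("compute failed");
}
```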

Commit 450a92c1a10222d944bf4ab874d9ff1ba4be91b7

Key changes:

  • The branch of the wasi-nn dependency in both wasmedge-ggml-llama-interactive and wasmedge-ggml-llama crates has been changed from "dm4/ggml" to "ggml".

Potential problems:

  • It's unclear why the branch of the wasi-nn dependency is being changed in this patch. Without more context, it's difficult to determine if this change is necessary or if it will introduce any issues.
  • It's also worth noting that the commit message does not provide much information about the reasoning behind this change, making it harder to understand the motivation for the patch. Adding more context to the commit message would be helpful for reviewers and future reference.

Overall, more information and context about the changes would be beneficial to fully assess the impact and potential problems of this patch.

Commit ad4af727f4a1cc808a672ddfad1ea020c5571c48

Key Changes:

  1. Added the serde_json dependency in the Cargo.toml file.
  2. Imported the serde_json crate in the main.rs file.
  3. Updated the options variable to use serde_json's json! macro to construct a JSON object.

Potential problems:

  1. It is not clear why the serde_json crate is being added as a dependency. The purpose of this change should be explained in the pull request or in separate documentation.
  2. The code updates the options variable to use the json! macro, but it is not clear if this change is necessary or if it introduces any risks (see the sketch below).
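
As a hedged illustration of why serde_json might be preferred: constructing the options with the json! macro guarantees the string handed to the plugin is well-formed JSON, so structural mistakes surface at compile time rather than at run time. The option keys below are assumptions based on the parameter names discussed elsewhere in this PR.

```rust
use serde_json::json;

fn main() {
    // The json! macro builds a serde_json::Value; malformed structure becomes a
    // compile error instead of a runtime parse failure inside the plugin.
    let options = json!({
        "stream-stdout": true,
        "enable-log": false,
        "ctx-size": 512,
        "n-predict": 512,
        "n-gpu-layers": 0
    });

    // Serialize to the string that would be passed to set_input at index 1.
    let options_str = options.to_string();
    println!("{options_str}");
}
```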

Commit 274d494bb669a871e97de64741c763b6f281400f

Key changes:

  • Updated the command in the README.md file to use the new ggml backend with the llama options support
  • Added support for setting llama options using set_input with index 1
  • Modified the list of supported parameters and their descriptions

Potential problems:

  • None identified

Commit b839ddedcba0a149c0864f06e1e98fc0cb9ffdbb

Key changes:

  • Added documentation to the README.md file for setting environmental variables.
  • Provided an example of setting environmental variables for the wasmedge command.

Potential problems:

  • There may be a typo in the line `wasmedge --dir .:. \`; it should be verified whether the intended path mapping is `.:.`.
  • The usage of environmental variables is mentioned as being handled by Rust, but it is not clear how exactly they are handled and what implications or limitations there may be.
  • The README says to beware of the - and _ characters in variable names, but it does not explain why or what issues may arise from using those characters.
  • It is not clear what values should be assigned to the environmental variables and what impact they have on the execution of the command.
  • The purpose and expected behavior of the stream_stdout, enable_log, ctx_size, n_predict, and n_gpu_layers environmental variables are not explained (a rough sketch of typical handling follows below).
  • It is not clear if the wasmedge-ggml-llama-interactive.wasm file needs to be present in the specified directory (--dir .:.) before running the command.
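
For reference, a rough sketch of how the Rust side of the example might read these variables is shown below. The per-variable descriptions in the comments are this review's assumptions rather than text from the README, and the caution about - and _ most likely exists because POSIX shells reject - in variable names while the corresponding JSON option keys use hyphens.

```rust
use std::env;

fn main() {
    // Assumed meaning of each variable (names use underscores because shells
    // reject '-' in environment variable names, unlike the JSON option keys):
    //   stream_stdout : print tokens to stdout as they are generated
    //   enable_log    : enable plugin logging
    //   ctx_size      : context window size in tokens
    //   n_predict     : maximum number of tokens to generate
    //   n_gpu_layers  : number of layers to offload to the GPU
    let names = ["stream_stdout", "enable_log", "ctx_size", "n_predict", "n_gpu_layers"];
    for name in names {
        match env::var(name) {
            Ok(value) => println!("{name} = {value}"),
            Err(_) => println!("{name} is unset; the example falls back to its default"),
        }
    }
}
```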

Commit 3c14961cc148a07d643d5f85160115e939ef41fa

Key changes in this patch:

  • An additional print statement is added before executing the inference.
  • The stdout output is now conditionally printed based on the value of the stream_stdout variable (illustrated in the sketch below).

Potential problems:

  • There don't seem to be any obvious problems with this patch. However, without more context or the entire pull request, it's difficult to provide a comprehensive review.
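
A minimal sketch of the streaming behavior described above; the token array is a stand-in for model output, and only the stream_stdout check mirrors the patch.

```rust
use std::io::{self, Write};

fn main() {
    let stream_stdout = std::env::var("stream_stdout")
        .map(|v| v == "true")
        .unwrap_or(false);

    // Stand-in for tokens produced by the model.
    let tokens = ["Hello", ",", " world", "!"];

    for token in tokens {
        // Only echo tokens as they arrive when streaming is enabled; flushing
        // makes partial output visible immediately.
        if stream_stdout {
            print!("{token}");
            io::stdout().flush().ok();
        }
    }
    if stream_stdout {
        println!();
    }
}
```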

Commit e6b9477dc186e560a482b9ddb75d504541986673

The key change in this patch is the fix to the llama interactive workflow in the .github/workflows/llama.yml file. Specifically, the name of the compiled wasm file is updated from wasmedge-ggml-llama.wasm to wasmedge-ggml-llama-interactive.wasm.

Overall, this patch should resolve an issue with the llama interactive workflow by correctly compiling and using the updated wasm file.

One potential problem with this patch is that it does not provide any additional context or explanation for why this change is necessary. It would be helpful to include a brief description or reference to the specific issue or bug that this patch is addressing.

Additionally, it would be beneficial to include more detailed testing steps or information to verify that this fix is working as expected.

Overall, the changes in this patch appear to be straightforward and should effectively fix the llama interactive workflow.

Commit 559f195de562ecac884ac9b6dce37430d3f8a167

Key changes:

  • The patch updates the workflow file .github/workflows/llama.yml.
  • It replaces the step that compiled the wasm binary into an AOT (Ahead-of-Time) binary with running the original wasm binary directly.

Potential problems:

  • The patch removes the line wasmedge compile target/wasm32-wasi/release/wasmedge-ggml-llama-interactive.wasm wasmedge-ggml-llama-interactive.wasm, which means the AOT wasm binary is not being generated anymore. This could cause compatibility issues if the project relied on the AOT binary for some reason.
  • The patch introduces a new command wasmedge target/wasm32-wasi/release/wasmedge-ggml-llama-interactive.wasm default. However, it is unclear what the purpose of this command is and how it relates to the previous commands. Further documentation or context would be helpful in understanding its function.

Commit fac6415f63803a11e0f6430fb3b4a7471ba8f449

Key Changes:

  • Added default parameter values to the README file.
  • Synchronized the README file with the implementation of the WasmEdge WASI-NN plugin.
  • Updated the default values for the parameters in the main.rs file.

Potential Problems:

  • The stream_stdout and enable_log variables are parsed as booleans, but the default values are supplied as strings. This may cause parsing errors if the environment variable is not set or if it is set to an invalid value.
  • The ctx_size and n_predict variables are parsed as i32 integers, but the default values are supplied as strings. This may cause parsing errors if the environment variable is not set or if it is set to an invalid value.

Suggestions:

  • Update the default values for stream_stdout and enable_log to false to match the README.
  • Update the default values for ctx_size and n_predict to 512 to match the README and the implementation in llama.cpp.

Overall, the changes seem to be focused on adding default values to the README and synchronizing it with the implementation. There are some potential problems in parsing the environment variables, which should be addressed for proper functionality.
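
One defensive way to address the parsing concern is sketched below: fall back to a typed default both when a variable is unset and when it fails to parse, so an invalid value cannot abort the example at startup. The helper name and defaults are illustrative assumptions.

```rust
use std::env;
use std::str::FromStr;

/// Parse an environment variable into a typed value, falling back to a typed
/// default when the variable is unset or cannot be parsed.
fn env_or<T: FromStr>(name: &str, default: T) -> T {
    env::var(name)
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(default)
}

fn main() {
    let stream_stdout: bool = env_or("stream_stdout", false);
    let enable_log: bool = env_or("enable_log", false);
    let ctx_size: i32 = env_or("ctx_size", 512);
    let n_predict: i32 = env_or("n_predict", 512);
    println!("stream_stdout={stream_stdout} enable_log={enable_log} ctx_size={ctx_size} n_predict={n_predict}");
}
```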

Commit bf8317150ad77639c90b99349543ed35fdb2c4ba

Key changes:

  • The README file for the wasmedge-ggml-llama-interactive project has been updated.
  • The URL for the llama.cpp backend has been corrected.
  • A reference to the version used has been added.

Potential problems:

  • No potential problems have been identified in this patch.

Commit 27073e8271a49ba1906cc91d1e6c9401d3cdf1b0

Key changes:

  • Updated the README file for the n-gpu-layers parameter in the wasmedge-ggml-llama-interactive directory.

Potential problems:

  • The updated README file includes a specific note about using Metal support in macOS and suggests setting n-gpu-layers to 0 or not setting it for the default value. However, it does not provide any explanation for this recommendation. It would be helpful to include more information on why this is necessary or what potential issues may arise if this parameter is not set correctly in macOS with Metal support.
  • It is unclear if this patch is comprehensive or if there are other changes made in addition to updating the README file. The patch only includes changes related to the wasmedge-ggml-llama-interactive directory, but it is not clear if there were other changes made elsewhere that are missing from this patch. It would be beneficial to review the complete set of changes to ensure nothing is missing.

Overall, the changes seem relatively minor and focused on updating documentation. However, the potential problem with the lack of explanation for the macOS Metal support recommendation and the uncertainty about missing changes should be addressed before merging this pull request.


dm4 commented Oct 20, 2023

The upcoming commits are expected to include:

  • Updating the explanation of Parameters in the README.md file.
  • Changing the branch of the second-state/wasmedge-wasi-nn crate from dm4/ggml to ggml.
  • Utilizing more supported parameters in wasmedge-ggml-llama-interactive for demonstration.


juntao commented Oct 20, 2023

Index 0 is always the initial prompt and index 1 is always the options, right?

The initial prompt is used to load the model. When we are in a chat session, subsequent calls to compute() will not include the initial prompt? Thanks.


dm4 commented Oct 20, 2023

> Index 0 is always the initial prompt and index 1 is always the options, right?
>
> The initial prompt is used to load the model. When we are in a chat session, subsequent calls to compute() will not include the initial prompt? Thanks.

Yes, index 1 always represents the options, while index 0 is used for prompts. During a chat session, we always update the prompt by setting it to index 0 in order to replace the old prompt. Then, we call compute again.
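
A rough sketch of that chat flow, assuming the wasi_nn API shape used in these examples; the model name, options, and output-buffer size are illustrative.

```rust
use std::io::{self, Write};
use wasi_nn::{ExecutionTarget, GraphBuilder, GraphEncoding, TensorType};

fn main() {
    let graph = GraphBuilder::new(GraphEncoding::Ggml, ExecutionTarget::AUTO)
        .build_from_cache("default")
        .expect("failed to load the preloaded model");
    let mut context = graph.init_execution_context().expect("failed to init context");

    // The options are set once at index 1 for the whole session.
    let options = r#"{"ctx-size": 512, "n-predict": 512}"#;
    context
        .set_input(1, TensorType::U8, &[1], options.as_bytes())
        .expect("failed to set options");

    loop {
        print!("USER: ");
        io::stdout().flush().unwrap();
        let mut prompt = String::new();
        if io::stdin().read_line(&mut prompt).unwrap() == 0 {
            break; // EOF ends the chat session.
        }

        // Each turn overwrites index 0 with the new prompt and runs compute again.
        context
            .set_input(0, TensorType::U8, &[1], prompt.as_bytes())
            .expect("failed to set prompt");
        context.compute().expect("compute failed");

        let mut output = vec![0u8; 4096];
        let size = context.get_output(0, &mut output).expect("failed to get output");
        println!("ASSISTANT: {}", String::from_utf8_lossy(&output[..size]));
    }
}
```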


juntao commented Oct 20, 2023

Perhaps we should make the "initial prompt" a field in the options JSON object?

Also, during a conversation, I believe the initial prompt should only be called once at the beginning to load the model -- the same behavior of the current llama interactive example.

dm4 marked this pull request as ready for review on October 22, 2023 16:54

hydai commented Oct 23, 2023

> Perhaps we should make the "initial prompt" a field in the options JSON object?
>
> Also, during a conversation, I believe the initial prompt should only be called once at the beginning to load the model -- the same behavior of the current llama interactive example.

This seems wrong. The JSON is for setting the context for llama.cpp. However, this init prompt is for the llama2 chat model only, although there are many different types of llama2. Let users decide when they want to call the init prompt or not. The reason is that this init prompt exists only for the UX of this example, rather than for a real use case.

dm4 requested a review from hydai on October 23, 2023 05:40

hydai commented Oct 23, 2023

We should merge this after WasmEdge/WasmEdge#2952 gets merged.


juntao commented Oct 23, 2023

Even so -- I think the options should be at index 0 and the initial prompt at index 1, as the latter is probably much less used.


dm4 commented Oct 23, 2023

I just want to confirm that the current initial_prompt is only used to load the model before the first compute() to reduce the time spent on the first compute()?

If that's the case, I think we can modify it to handle the first model loading in graph.init_execution_context() to reduce the time spent on the first compute(). We don't need to call compute() in advance with initial_prompt on the Rust side.


juntao commented Oct 23, 2023

Good idea

> I just want to confirm that the current initial_prompt is only used to load the model before the first compute() to reduce the time spent on the first compute()?
>
> If that's the case, I think we can modify it to handle the first model loading in graph.init_execution_context() to reduce the time spent on the first compute(). We don't need to call compute() in advance with initial_prompt on the Rust side.

dm4 force-pushed the dm4/support-llama-metadata branch from e2baeb6 to 274d494 on October 31, 2023 08:22

dm4 commented Oct 31, 2023

I just tested the response time of the ggml plugin. Previously, we didn't support loading the model file directly, so we had to write a ggml-model.bin file and use the initial prompt to compute() once. Now, the Rust side should be able to skip the initial prompt and directly ask the questions.

logs:

// with init prompt compute
Time elapsed in build_from_cache() is: 182.881855ms
Time elapsed in init_execution_context() is: 2.064µs
// init prompt time
Time elapsed in prepare() is: 11.049936722s
// first question
Time elapsed in set_input() is: 4.500595ms
Time elapsed in compute() is: 14.011402635s
// second question
Time elapsed in set_input() is: 1.594805ms
Time elapsed in compute() is: 15.538733204s
---
// without init prompt compute
Time elapsed in build_from_cache() is: 183.347165ms
Time elapsed in init_execution_context() is: 2.339µs
// first question
Time elapsed in set_input() is: 3.862416ms
Time elapsed in compute() is: 13.512181653s
// second question
Time elapsed in set_input() is: 4.851177ms
Time elapsed in compute() is: 15.273717378s

dm4 added 4 commits November 2, 2023 12:12
Set env variable `is_interactive` to `true` to switch into non-interactive mode.

Signed-off-by: dm4 <[email protected]>
Review comment on wasmedge-ggml-llama-interactive/src/main.rs (outdated, resolved)
dm4 force-pushed the dm4/support-llama-metadata branch from 74f6c75 to be2c191 on November 2, 2023 08:02
dm4 requested a review from hydai on November 2, 2023 08:06
hydai left a review comment:

remove aot.wasm

Review comments on .github/workflows/llama.yml and wasmedge-ggml-llama-interactive/aot.wasm (outdated, resolved)
dm4 added 3 commits November 2, 2023 23:21
- If only the preload model name is given, then it defaults to interactive mode.
- If another argument is provided after the preload model name, then it will be considered as a prompt, and it will enter non-interactive mode (just like the previous wasmedge-ggml-llama example).

Signed-off-by: dm4 <[email protected]>
dm4 force-pushed the dm4/support-llama-metadata branch from deabd73 to 559f195 on November 2, 2023 15:21
…ize it with the implementation of the WasmEdge WASI-NN plugin

Signed-off-by: dm4 <[email protected]>
Review comment on wasmedge-ggml-llama-interactive/README.md (outdated, resolved)

hydai commented Nov 3, 2023

Please merge this after the 0.13.5 release is out.

hydai merged commit 3f93dd0 into master on Nov 3, 2023
1 check passed
hydai deleted the dm4/support-llama-metadata branch on November 3, 2023 13:20