
[Example] Use new ggml backend with llama options support #52

Merged: 16 commits from dm4/support-llama-metadata into master on Nov 3, 2023

Conversation


juntao commented Oct 19, 2023

Hello, I am a code review bot on flows.network. Here are my reviews of code commits in this PR.


Overall, the pull request titled "[Example] Use new ggml backend with llama options support" contains a collection of code changes across multiple files. The important findings and potential issues are as follows:

  1. Potential problems:

    • The options string is hardcoded, which may limit flexibility. It is recommended to pass it as a command-line argument or read it from a configuration file.
    • There is no validation or error handling for parsing the options JSON string, which could cause runtime errors. Consider adding error handling or using a JSON parsing library.
    • The purpose of the saved_prompt variable is unclear and it is not being used. It could potentially be removed to improve clarity.
  2. Potential problems:

    • It's unclear why the branch of the wasi-nn dependency is being changed. More context is needed to determine the necessity and potential issues.
    • The commit message lacks information about the reasoning behind the change, making it difficult to understand the motivation for the patch.
  3. Potential problems:

    • It is not clear why the serde_json crate is added as a dependency. The purpose of this change should be explained.
    • The update to use the json! macro for the options variable is not well-justified. The necessity and potential risks of this change are unclear.
  4. No potential problems identified.

  5. Potential problems:

    • There may be a typo in the command in the README.md file.
    • The usage of environmental variables is mentioned, but it's not clear how they are handled and what implications or limitations there may be.
    • It is not explained why caution is needed with the - and _ characters in the variable name.
    • The expected values for the environmental variables and their impact on the command execution are not clear.
    • The purpose and behavior of several environmental variables are not explained.
    • The requirement of the wasmedge-ggml-llama-interactive.wasm file may be ambiguous.
  6. No potential problems identified.

  7. Potential problems:

    • The patch lacks context, making it difficult to understand the motivation for the change.
    • Further details or testing steps would be helpful to verify the fix.
  8. Potential problems:

    • The patch does not generate an AOT wasm binary anymore, which may cause compatibility issues.
    • The purpose and function of the new command introduced are not clear.
  9. Potential problems:

    • There is a mismatch in types for some environment variables, which may lead to parsing errors.
    • Suggestions to update default values to match the README and implementation are provided.
  10. No potential problems identified.

  11. Potential problems:

    • Lack of explanation for the macOS Metal support recommendation in the README file.
    • Uncertainty about missing changes or if the patch is comprehensive.

Considering all the individual summaries, the pull request includes some potential issues and errors, such as hardcoded options, lack of error handling, unclear changes, missing information, and compatibility concerns. Further clarification, documentation, and addressing the mentioned problems are recommended before merging the pull request.

Details

Commit e60315b1dde62f7c7a5c9d43d9ba6dfc2a960ed8

Key changes:

  • Added the ability to set options for input with index 1 using the context.set_input function.
  • Updated the context.set_input function calls to use the new method signature.
  • Added a prompt string for input with index 1, containing options in JSON format.
  • Updated the max_output_size calculation to use proper spacing.

Potential problems:

  • The options string is hardcoded in the code. It might be better to pass it as a command-line argument or read it from a configuration file for flexibility.
  • There is no validation or error handling for parsing the options JSON string. If the JSON is malformed, it may cause runtime errors. Consider adding error handling or using a JSON parsing library to handle invalid input gracefully (see the sketch below).
  • It's unclear what the purpose of the saved_prompt variable is and why it's not being used in the code. It could potentially be removed to improve clarity.
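
To illustrate the first two concerns, below is a minimal sketch of how the options could be supplied through an environment variable and validated before being handed to the plugin. It assumes the wasi_nn crate API shape used in these examples (GraphBuilder, build_from_cache, set_input with a TensorType::U8 tensor); the OPTIONS variable name, the fallback default, and the error messages are illustrative rather than part of this commit.

```rust
use serde_json::Value;
use wasi_nn::{ExecutionTarget, GraphBuilder, GraphEncoding, TensorType};

fn main() {
    // Assumption: read the options JSON from an environment variable instead of
    // hardcoding it, falling back to a small default when the variable is unset.
    let options = std::env::var("OPTIONS")
        .unwrap_or_else(|_| r#"{"enable-log": false, "ctx-size": 512}"#.to_string());

    // Validate the JSON up front so a malformed string fails with a clear
    // message instead of surfacing as an opaque error inside compute().
    let parsed: Value = serde_json::from_str(&options).expect("OPTIONS is not valid JSON");

    let graph = GraphBuilder::new(GraphEncoding::Ggml, ExecutionTarget::AUTO)
        .build_from_cache("default")
        .expect("failed to load the preloaded model");
    let mut context = graph.init_execution_context().expect("failed to init context");

    // Index 1 carries the llama options; index 0 carries the prompt.
    context
        .set_input(1, TensorType::U8, &[1], parsed.to_string().as_bytes())
        .expect("failed to set options");
    context
        .set_input(0, TensorType::U8, &[1], "Once upon a time".as_bytes())
        .expect("failed to set prompt");
    context.compute().expect("compute failed");
}
```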

Commit 450a92c1a10222d944bf4ab874d9ff1ba4be91b7

Key changes:

  • The branch of the wasi-nn dependency in both wasmedge-ggml-llama-interactive and wasmedge-ggml-llama crates has been changed from "dm4/ggml" to "ggml".

Potential problems:

  • It's unclear why the branch of the wasi-nn dependency is being changed in this patch. Without more context, it's difficult to determine if this change is necessary or if it will introduce any issues.
  • It's also worth noting that the commit message does not provide much information about the reasoning behind this change, making it harder to understand the motivation for the patch. Adding more context to the commit message would be helpful for reviewers and future reference.

Overall, more information and context about the changes would be beneficial to fully assess the impact and potential problems of this patch.

Commit ad4af727f4a1cc808a672ddfad1ea020c5571c48

Key Changes:

  1. Added the serde_json dependency in the Cargo.toml file.
  2. Imported the serde_json crate in the main.rs file.
  3. Updated the options variable to use serde_json's json! macro to construct a JSON object.

Potential problems:

  1. It is not clear why the serde_json crate is being added as a dependency. The purpose of this change should be explained in the pull request or in separate documentation.
  2. The code updates the options variable to use the json! macro, but it is not clear if this change is necessary or if it introduces any risks (see the sketch below).
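
As a hedged illustration of why serde_json might be preferred: constructing the options with the json! macro guarantees the string handed to the plugin is well-formed JSON, so structural mistakes surface at compile time rather than at run time. The option keys below are assumptions based on the parameter names discussed elsewhere in this PR.

```rust
use serde_json::json;

fn main() {
    // The json! macro builds a serde_json::Value; malformed structure becomes a
    // compile error instead of a runtime parse failure inside the plugin.
    let options = json!({
        "stream-stdout": true,
        "enable-log": false,
        "ctx-size": 512,
        "n-predict": 512,
        "n-gpu-layers": 0
    });

    // Serialize to the string that would be passed to set_input at index 1.
    let options_str = options.to_string();
    println!("{options_str}");
}
```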

Commit 274d494bb669a871e97de64741c763b6f281400f

Key changes:

  • Updated the command in the README.md file to use the new ggml backend with the llama options support
  • Added support for setting llama options using set_input with index 1
  • Modified the list of supported parameters and their descriptions

Potential problems:

  • None identified

Commit b839ddedcba0a149c0864f06e1e98fc0cb9ffdbb

Key changes:

  • Added documentation to the README.md file for setting environmental variables.
  • Provided an example of setting environmental variables for the wasmedge command.

Potential problems:

  • There may be a typo in the line `wasmedge --dir .:. \`; it should be verified whether the intended path mapping is `.:.`.
  • The usage of environmental variables is mentioned as being handled by Rust, but it is not clear how exactly they are handled and what implications or limitations there may be.
  • The README says to beware of the - and _ characters in variable names, but it does not explain why or what issues may arise from using those characters.
  • It is not clear what values should be assigned to the environmental variables and what impact they have on the execution of the command.
  • The purpose and expected behavior of the stream_stdout, enable_log, ctx_size, n_predict, and n_gpu_layers environmental variables are not explained (a rough sketch of typical handling follows below).
  • It is not clear if the wasmedge-ggml-llama-interactive.wasm file needs to be present in the specified directory (--dir .:.) before running the command.
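
For reference, a rough sketch of how the Rust side of the example might read these variables is shown below. The per-variable descriptions in the comments are this review's assumptions rather than text from the README, and the caution about - and _ most likely exists because POSIX shells reject - in variable names while the corresponding JSON option keys use hyphens.

```rust
use std::env;

fn main() {
    // Assumed meaning of each variable (names use underscores because shells
    // reject '-' in environment variable names, unlike the JSON option keys):
    //   stream_stdout : print tokens to stdout as they are generated
    //   enable_log    : enable plugin logging
    //   ctx_size      : context window size in tokens
    //   n_predict     : maximum number of tokens to generate
    //   n_gpu_layers  : number of layers to offload to the GPU
    let names = ["stream_stdout", "enable_log", "ctx_size", "n_predict", "n_gpu_layers"];
    for name in names {
        match env::var(name) {
            Ok(value) => println!("{name} = {value}"),
            Err(_) => println!("{name} is unset; the example falls back to its default"),
        }
    }
}
```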

Commit 3c14961cc148a07d643d5f85160115e939ef41fa

Key changes in this patch:

  • An additional print statement is added before executing the inference.
  • The stdout output is now conditionally printed based on the value of the stream_stdout variable (illustrated in the sketch below).

Potential problems:

  • There don't seem to be any obvious problems with this patch. However, without more context or the entire pull request, it's difficult to provide a comprehensive review.
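
A minimal sketch of the streaming behavior described above; the token array is a stand-in for model output, and only the stream_stdout check mirrors the patch.

```rust
use std::io::{self, Write};

fn main() {
    let stream_stdout = std::env::var("stream_stdout")
        .map(|v| v == "true")
        .unwrap_or(false);

    // Stand-in for tokens produced by the model.
    let tokens = ["Hello", ",", " world", "!"];

    for token in tokens {
        // Only echo tokens as they arrive when streaming is enabled; flushing
        // makes partial output visible immediately.
        if stream_stdout {
            print!("{token}");
            io::stdout().flush().ok();
        }
    }
    if stream_stdout {
        println!();
    }
}
```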

Commit e6b9477dc186e560a482b9ddb75d504541986673

The key change in this patch is the fix to the llama interactive workflow in the .github/workflows/llama.yml file. Specifically, the name of the compiled wasm file is updated from wasmedge-ggml-llama.wasm to wasmedge-ggml-llama-interactive.wasm.

Overall, this patch should resolve an issue with the llama interactive workflow by correctly compiling and using the updated wasm file.

One potential problem with this patch is that it does not provide any additional context or explanation for why this change is necessary. It would be helpful to include a brief description or reference to the specific issue or bug that this patch is addressing.

Additionally, it would be beneficial to include more detailed testing steps or information to verify that this fix is working as expected.

Overall, the changes in this patch appear to be straightforward and should effectively fix the llama interactive workflow.

Commit 559f195de562ecac884ac9b6dce37430d3f8a167

Key changes:

  • The patch updates the workflow file .github/workflows/llama.yml.
  • It replaces the step that compiled the wasm binary into an AOT (Ahead-of-Time) binary with running the original wasm binary directly.

Potential problems:

  • The patch removes the line wasmedge compile target/wasm32-wasi/release/wasmedge-ggml-llama-interactive.wasm wasmedge-ggml-llama-interactive.wasm, which means the AOT wasm binary is not being generated anymore. This could cause compatibility issues if the project relied on the AOT binary for some reason.
  • The patch introduces a new command wasmedge target/wasm32-wasi/release/wasmedge-ggml-llama-interactive.wasm default. However, it is unclear what the purpose of this command is and how it relates to the previous commands. Further documentation or context would be helpful in understanding its function.

Commit fac6415f63803a11e0f6430fb3b4a7471ba8f449

Key Changes:

  • Added default parameter values to the README file.
  • Synchronized the README file with the implementation of the WasmEdge WASI-NN plugin.
  • Updated the default values for the parameters in the main.rs file.

Potential Problems:

  • The stream_stdout and enable_log variables are parsed as booleans, but the default values are supplied as strings. This may cause parsing errors if the environment variable is not set or if it is set to an invalid value.
  • The ctx_size and n_predict variables are parsed as i32 integers, but the default values are supplied as strings. This may cause parsing errors if the environment variable is not set or if it is set to an invalid value.

Suggestions:

  • Update the default values for stream_stdout and enable_log to false to match the README.
  • Update the default values for ctx_size and n_predict to 512 to match the README and the implementation in llama.cpp.

Overall, the changes seem to be focused on adding default values to the README and synchronizing it with the implementation. There are some potential problems in parsing the environment variables, which should be addressed for proper functionality.
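
One defensive way to address the parsing concern is sketched below: fall back to a typed default both when a variable is unset and when it fails to parse, so an invalid value cannot abort the example at startup. The helper name and defaults are illustrative assumptions.

```rust
use std::env;
use std::str::FromStr;

/// Parse an environment variable into a typed value, falling back to a typed
/// default when the variable is unset or cannot be parsed.
fn env_or<T: FromStr>(name: &str, default: T) -> T {
    env::var(name)
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(default)
}

fn main() {
    let stream_stdout: bool = env_or("stream_stdout", false);
    let enable_log: bool = env_or("enable_log", false);
    let ctx_size: i32 = env_or("ctx_size", 512);
    let n_predict: i32 = env_or("n_predict", 512);
    println!("stream_stdout={stream_stdout} enable_log={enable_log} ctx_size={ctx_size} n_predict={n_predict}");
}
```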

Commit bf8317150ad77639c90b99349543ed35fdb2c4ba

Key changes:

  • The README file for the wasmedge-ggml-llama-interactive project has been updated.
  • The URL for the llama.cpp backend has been corrected.
  • A reference to the version used has been added.

Potential problems:

  • No potential problems have been identified in this patch.

Commit 27073e8271a49ba1906cc91d1e6c9401d3cdf1b0

Key changes:

  • Updated the README file for the n-gpu-layers parameter in the wasmedge-ggml-llama-interactive directory.

Potential problems:

  • The updated README file includes a specific note about using Metal support in macOS and suggests setting n-gpu-layers to 0 or not setting it for the default value. However, it does not provide any explanation for this recommendation. It would be helpful to include more information on why this is necessary or what potential issues may arise if this parameter is not set correctly in macOS with Metal support.
  • It is unclear if this patch is comprehensive or if there are other changes made in addition to updating the README file. The patch only includes changes related to the wasmedge-ggml-llama-interactive directory, but it is not clear if there were other changes made elsewhere that are missing from this patch. It would be beneficial to review the complete set of changes to ensure nothing is missing.

Overall, the changes seem relatively minor and focused on updating documentation. However, the potential problem with the lack of explanation for the macOS Metal support recommendation and the uncertainty about missing changes should be addressed before merging this pull request.


dm4 commented Oct 20, 2023

The upcoming commits are expected to include:

  • Updating the explanation of Parameters in the README.md file.
  • Changing the branch of the second-state/wasmedge-wasi-nn crate from dm4/ggml to ggml.
  • Utilizing more supported parameters in wasmedge-ggml-llama-interactive for demonstration.


juntao commented Oct 20, 2023

Index 0 is always the initial prompt and index 1 is always the options, right?

The initial prompt is used to load the model. When we are in a chat session, subsequent calls to compute() will not include the initial prompt? Thanks.


dm4 commented Oct 20, 2023

> Index 0 is always the initial prompt and index 1 is always the options, right?
>
> The initial prompt is used to load the model. When we are in a chat session, subsequent calls to compute() will not include the initial prompt? Thanks.

Yes, index 1 always represents the options, while index 0 is used for prompts. During a chat session, we always update the prompt by setting it to index 0 in order to replace the old prompt. Then, we call compute again.
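
A rough sketch of that chat flow, assuming the wasi_nn API shape used in these examples; the model name, options, and output-buffer size are illustrative.

```rust
use std::io::{self, Write};
use wasi_nn::{ExecutionTarget, GraphBuilder, GraphEncoding, TensorType};

fn main() {
    let graph = GraphBuilder::new(GraphEncoding::Ggml, ExecutionTarget::AUTO)
        .build_from_cache("default")
        .expect("failed to load the preloaded model");
    let mut context = graph.init_execution_context().expect("failed to init context");

    // The options are set once at index 1 for the whole session.
    let options = r#"{"ctx-size": 512, "n-predict": 512}"#;
    context
        .set_input(1, TensorType::U8, &[1], options.as_bytes())
        .expect("failed to set options");

    loop {
        print!("USER: ");
        io::stdout().flush().unwrap();
        let mut prompt = String::new();
        if io::stdin().read_line(&mut prompt).unwrap() == 0 {
            break; // EOF ends the chat session.
        }

        // Each turn overwrites index 0 with the new prompt and runs compute again.
        context
            .set_input(0, TensorType::U8, &[1], prompt.as_bytes())
            .expect("failed to set prompt");
        context.compute().expect("compute failed");

        let mut output = vec![0u8; 4096];
        let size = context.get_output(0, &mut output).expect("failed to get output");
        println!("ASSISTANT: {}", String::from_utf8_lossy(&output[..size]));
    }
}
```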


juntao commented Oct 20, 2023

Perhaps we should make the "initial prompt" a field in the options JSON object?

Also, during a conversation, I believe the initial prompt should only be called once at the beginning to load the model -- the same behavior of the current llama interactive example.

dm4 marked this pull request as ready for review on October 22, 2023 16:54

hydai commented Oct 23, 2023

> Perhaps we should make the "initial prompt" a field in the options JSON object?
>
> Also, during a conversation, I believe the initial prompt should only be called once at the beginning to load the model -- the same behavior of the current llama interactive example.

This seems wrong. The JSON is for setting the context for llama.cpp. However, this init prompt is for the llama2 chat model only, although there are many different types of llama2. Let users decide when they want to call the init prompt or not. The reason is that this init prompt exists only for the UX of this example, rather than for a real use case.

dm4 requested a review from hydai on October 23, 2023 05:40

hydai commented Oct 23, 2023

We should merge this after WasmEdge/WasmEdge#2952 gets merged.


juntao commented Oct 23, 2023

Even so -- I think the options should be at index 0 and the initial prompt at index 1, as the latter is probably much less used.


dm4 commented Oct 23, 2023

I just want to confirm that the current initial_prompt is only used to load the model before the first compute() to reduce the time spent on the first compute()?

If that's the case, I think we can modify it to handle the first model loading in graph.init_execution_context() to reduce the time spent on the first compute(). We don't need to call compute() in advance with initial_prompt on the Rust side.


juntao commented Oct 23, 2023

Good idea

> I just want to confirm that the current initial_prompt is only used to load the model before the first compute() to reduce the time spent on the first compute()?
>
> If that's the case, I think we can modify it to handle the first model loading in graph.init_execution_context() to reduce the time spent on the first compute(). We don't need to call compute() in advance with initial_prompt on the Rust side.

dm4 force-pushed the dm4/support-llama-metadata branch from e2baeb6 to 274d494 on October 31, 2023 08:22

dm4 commented Oct 31, 2023

I just tested the response time of the ggml plugin. Previously, we didn't support loading the model file directly, so we had to write a ggml-model.bin file and use the initial prompt to compute() once. Now, the Rust side should be able to skip the initial prompt and directly ask the questions.

logs:

// with init prompt compute
Time elapsed in build_from_cache() is: 182.881855ms
Time elapsed in init_execution_context() is: 2.064µs
// init prompt time
Time elapsed in prepare() is: 11.049936722s
// first question
Time elapsed in set_input() is: 4.500595ms
Time elapsed in compute() is: 14.011402635s
// second question
Time elapsed in set_input() is: 1.594805ms
Time elapsed in compute() is: 15.538733204s
---
// without init prompt compute
Time elapsed in build_from_cache() is: 183.347165ms
Time elapsed in init_execution_context() is: 2.339µs
// first question
Time elapsed in set_input() is: 3.862416ms
Time elapsed in compute() is: 13.512181653s
// second question
Time elapsed in set_input() is: 4.851177ms
Time elapsed in compute() is: 15.273717378s

dm4 added 4 commits November 2, 2023 12:12
Set env variable `is_interactive` to `true` to switch into non-interactive mode.

Signed-off-by: dm4 <[email protected]>
Review comment on wasmedge-ggml-llama-interactive/src/main.rs (outdated, resolved)
dm4 force-pushed the dm4/support-llama-metadata branch from 74f6c75 to be2c191 on November 2, 2023 08:02
dm4 requested a review from hydai on November 2, 2023 08:06
hydai left a review comment:

remove aot.wasm

Review comments on .github/workflows/llama.yml and wasmedge-ggml-llama-interactive/aot.wasm (outdated, resolved)
dm4 added 3 commits November 2, 2023 23:21
- If only the preload model name is given, then it defaults to interactive mode.
- If another argument is provided after the preload model name, then it will be considered as a prompt, and it will enter non-interactive mode (just like the previous wasmedge-ggml-llama example).

Signed-off-by: dm4 <[email protected]>
dm4 force-pushed the dm4/support-llama-metadata branch from deabd73 to 559f195 on November 2, 2023 15:21
…ize it with the implementation of the WasmEdge WASI-NN plugin

Signed-off-by: dm4 <[email protected]>
Review comment on wasmedge-ggml-llama-interactive/README.md (outdated, resolved)

hydai commented Nov 3, 2023

Please merge this after the 0.13.5 release is out.

hydai merged commit 3f93dd0 into master on Nov 3, 2023
1 check passed
hydai deleted the dm4/support-llama-metadata branch on November 3, 2023 13:20