This repository has been archived by the owner on Jun 24, 2024. It is now read-only.

allow chat to halt new token generation on stop_sequence #364

Merged
merged 13 commits into rustformers:main from fix/chat-halting on Jul 12, 2023

Conversation

@averypelle (Contributor) commented Jul 9, 2023

Closes #363

Stop token generation after reaching a specified stop_sequence in chat mode

I am still new to Rust, so please let me know how I can improve my code!

@averypelle marked this pull request as draft July 9, 2023 20:06
@@ -264,6 +266,33 @@ fn interactive(
.unwrap_or(false)
}

fn inference_callback(

@@ -256,6 +256,12 @@ fn interactive(
let parameters = generate.inference_parameters(model.eot_token_id());
let mut rng = generate.rng();

let stop_sequence = message_prompt_template
@averypelle (Contributor, Author) Jul 9, 2023

I am assuming the message_prompt_template is something like:

User: {{PROMPT}}

But very open to suggestions here on how to make this more robust
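
(For reference, a minimal sketch of how a stop sequence could be derived from a template like the one above, by taking whatever precedes the {{PROMPT}} placeholder. The helper name is hypothetical and this is not code from the PR.)

```rust
/// Sketch only: derive a stop sequence from a template such as "User: {{PROMPT}}"
/// by taking everything before the {{PROMPT}} placeholder.
fn stop_sequence_from_template(message_prompt_template: &str) -> Option<String> {
    message_prompt_template
        .split_once("{{PROMPT}}")
        .map(|(prefix, _)| prefix.trim().to_string())
        .filter(|prefix| !prefix.is_empty())
}

fn main() {
    assert_eq!(
        stop_sequence_from_template("User: {{PROMPT}}").as_deref(),
        Some("User:")
    );
    // Without a placeholder (or with nothing before it) there is no sensible stop sequence.
    assert_eq!(stop_sequence_from_template("{{PROMPT}}"), None);
}
```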

@averypelle marked this pull request as ready for review July 9, 2023 20:25
) -> impl FnMut(InferenceResponse) -> Result<InferenceFeedback, Infallible> + '_ {
move |resp| match resp {
InferenceResponse::InferredToken(t) => {
if chat_mode {
@averypelle (Contributor, Author)

I didn't touch REPL mode - is the desired behavior to continue generating tokens in that mode? Otherwise happy to change.

@averypelle mentioned this pull request Jul 9, 2023
@averypelle changed the title from "allow chat to halt new token generation" to "allow chat to halt new token generation on stop_sequence" Jul 9, 2023
@philpax (Collaborator) commented Jul 9, 2023

Great work! I was actually thinking about bringing that logic out so that I could use it for llmcord.

Do you think you'd be able to move the inference_callback to llm-base, name it something like conversation_inference_callback, and update both llm-cli and the vicuna-chat example to use it? (You might need to parameterise over print_token)

That would allow for this logic to be used across both, as well as elsewhere (the aforementioned llmcord).

I'd suggest passing the stop sequence in from the CLI (i.e. maybe replace message_prompt with message_prompt_prefix and use that as the stop sequence.)

I didn't touch REPL mode - is the desired behavior to continue generating tokens in that mode? Otherwise happy to change.

Yup, that's for a back and forth where no state is preserved and the model can produce as much output as it wants. I'd suggest splitting the code paths so that they use entirely different inference callbacks - they share the readline, but their inference behaviour is pretty different.

Great work once again, let me know if you need a hand with any of this! 🙂

@philpax added the issue:enhancement (New feature or request) and app:cli (App: the `llm` CLI) labels Jul 9, 2023
@averypelle (Contributor, Author) commented

Thanks @philpax! Just pushed an update where I moved the function to llm-base. For the message_prompt, is there any case where someone would want a template that includes a postfix? If so, maybe a new option is needed? Otherwise, I can rename to prefix - in this case, would it still make sense for the prefix to include the string {{PROMPT}}?

@philpax (Collaborator) commented Jul 11, 2023

Great work! Looking forward to merging this soon 🚀

For the message_prompt, is there any case where someone would want a template that includes a postfix? If so, maybe a new option is needed? Otherwise, I can rename to prefix - in this case, would it still make sense for the prefix to include the string {{PROMPT}}?

I don't think the logic would work if it was postfix anyway - we should make it clear that you need to pass in a prefix. I'd say you can leave out the {{PROMPT}} in that case, because it should always be implied that the prompt will be suffixed.

@averypelle (Contributor, Author) commented Jul 12, 2023

Okay @philpax I have updated the CLI to take a message_prompt_prefix instead. I also tried running locally with several models and it is working as expected for a multi-prompt chat.

@philpax (Collaborator) commented Jul 12, 2023

Heya! ...apologies for hijacking the PR. I went to test it and all of your changes worked as expected, but I realised that there were quite a few latent bugs with the stuff not covered by your PR and that the whole chat/REPL logic just wasn't working how I wanted it to work. I ended up revising way more than I intended 😅

The upshot is that it should now work consistently, and there shouldn't be any surprise discrepancies between REPL and chat mode. Sorry once again for the complete hijack 😭


Feel free to ask about any of the changes I made! Most of them were unrelated to the code you introduced (I mostly addressed issues that were already present before your changes), but I'm happy to explain them nonetheless. You might be interested in the simplify message_prompt_prefix commit, in which I replaced the if-lets with a match.

@philpax merged commit fc1c052 into rustformers:main Jul 12, 2023
14 checks passed
eyre::bail!(
"Must specify either --message-prompt-prefix or --message-prompt-prefix-file"
)
(None, Some(message_prompt_prefix_file)) => {
@averypelle (Contributor, Author)

wow very cool!
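
(For reference, a rough sketch of the shape that match over the two prefix options can take, as described in the "simplify message_prompt_prefix" commit above. The helper name, arm order, and file-reading step are assumptions for illustration, not the merged CLI code.)

```rust
use eyre::Result;
use std::path::PathBuf;

/// Hypothetical helper showing the (prefix, prefix-file) match; not the exact CLI code.
fn resolve_message_prompt_prefix(
    message_prompt_prefix: Option<String>,
    message_prompt_prefix_file: Option<PathBuf>,
) -> Result<String> {
    match (message_prompt_prefix, message_prompt_prefix_file) {
        (Some(_), Some(_)) => eyre::bail!(
            "Cannot specify both --message-prompt-prefix and --message-prompt-prefix-file"
        ),
        (Some(prefix), None) => Ok(prefix),
        (None, Some(path)) => {
            // Read the prefix from a file instead of taking it from the flag directly.
            Ok(std::fs::read_to_string(path)?.trim_end().to_string())
        }
        (None, None) => eyre::bail!(
            "Must specify either --message-prompt-prefix or --message-prompt-prefix-file"
        ),
    }
}

fn main() -> Result<()> {
    let prefix = resolve_message_prompt_prefix(Some("User:".to_string()), None)?;
    println!("message prompt prefix / stop sequence: {prefix}");
    Ok(())
}
```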

) -> impl FnMut(InferenceResponse) -> Result<InferenceFeedback, E> + 'a {
let mut stop_sequence_buf = String::new();
@averypelle (Contributor, Author)

interesting, scoping this buffer to the function seems a lot better!
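
(For reference, a self-contained sketch of how a function-scoped buffer can implement the stop-sequence check: buffer tokens while they could still be the start of the stop sequence, halt once it fully appears, otherwise flush the buffered text through print_token. The enum definitions are simplified stand-ins and the body is illustrative, not the exact llm-base implementation.)

```rust
// Stand-in definitions so the sketch compiles on its own; the real types live in llm-base.
enum InferenceResponse {
    InferredToken(String),
    EotToken,
}

enum InferenceFeedback {
    Continue,
    Halt,
}

/// Sketch: buffer tokens while they could still be the start of `stop_sequence`;
/// halt once the full stop sequence has been generated, otherwise flush the
/// buffered text through `print_token`.
fn conversation_inference_callback<'a, E: std::error::Error + Send + Sync + 'static>(
    stop_sequence: String,
    mut print_token: impl FnMut(String) + 'a,
) -> impl FnMut(InferenceResponse) -> Result<InferenceFeedback, E> + 'a {
    let mut stop_sequence_buf = String::new();
    move |resp| match resp {
        InferenceResponse::InferredToken(token) => {
            let mut buf = stop_sequence_buf.clone();
            buf.push_str(&token);

            if buf.starts_with(stop_sequence.as_str()) {
                // The stop sequence has been generated: halt inference.
                stop_sequence_buf.clear();
                Ok(InferenceFeedback::Halt)
            } else if stop_sequence.starts_with(buf.as_str()) {
                // Could still turn into the stop sequence: keep buffering, print nothing yet.
                stop_sequence_buf = buf;
                Ok(InferenceFeedback::Continue)
            } else {
                // Definitely not the stop sequence: flush the buffer and keep going.
                stop_sequence_buf.clear();
                print_token(buf);
                Ok(InferenceFeedback::Continue)
            }
        }
        InferenceResponse::EotToken => Ok(InferenceFeedback::Halt),
    }
}

fn main() {
    let mut callback =
        conversation_inference_callback("User:".to_string(), |t| print!("{t}"));

    // "Hello " is printed; the generated "User:" triggers a halt instead of being printed.
    // Annotating the error type pins the callback's `E` parameter for this example.
    let first: Result<InferenceFeedback, std::convert::Infallible> =
        callback(InferenceResponse::InferredToken("Hello ".to_string()));
    assert!(matches!(first, Ok(InferenceFeedback::Continue)));

    let second = callback(InferenceResponse::InferredToken("User:".to_string()));
    assert!(matches!(second, Ok(InferenceFeedback::Halt)));
    println!();
}
```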

@@ -897,8 +897,8 @@ pub fn feed_prompt_callback<'a, E: std::error::Error + 'static>(

/// An [InferenceResponse] callback that will halt inference when a `stop_sequence` is generated.
/// This callback is used in [InferenceSession::infer] in chat_mode.
pub fn conversation_inference_callback<'a, E: std::error::Error + 'static>(
stop_sequence: String,
pub fn conversation_inference_callback<'a, E: std::error::Error + Send + Sync + 'static>(
@averypelle (Contributor, Author)

what do these do?

@philpax (Collaborator)

In Rust, a type gets the Send trait if it can be sent across threads, and Sync if it can be shared between multiple threads (see the std::marker::Send and std::marker::Sync documentation for details).

I needed to add this because eyre, which we use for error reporting in the CLI, expects the error from infer to be Send + Sync. The error is passed down from callback to infer, so the trait requirements need to be updated across the library.
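
(For reference, a minimal illustration of the bound in question: converting an error into an eyre::Report, e.g. via `?`, requires `std::error::Error + Send + Sync + 'static`, which is why the requirement propagates to the callback's error type. The error type below is made up for the example.)

```rust
use std::fmt;

// Hypothetical error type standing in for whatever the inference callback can return.
#[derive(Debug)]
struct CallbackError;

impl fmt::Display for CallbackError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "callback failed")
    }
}

impl std::error::Error for CallbackError {}

fn fallible_callback_result() -> Result<(), CallbackError> {
    Err(CallbackError)
}

fn main() -> eyre::Result<()> {
    // This `?` only compiles because CallbackError is automatically Send + Sync;
    // the same bound therefore has to appear on the generic error type E in the library.
    fallible_callback_result()?;
    Ok(())
}
```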

Assistant: How may I help you?
@averypelle (Contributor, Author)

I have some prompts I made for Falcon and MPT too, since I was testing with those. Want me to add them in a follow-up PR?

@philpax (Collaborator)

Sure thing!

)
}

fn interactive(
@averypelle (Contributor, Author)

ah love that you split these out into separate functions!

@averypelle deleted the fix/chat-halting branch July 13, 2023 17:57
@hhamud mentioned this pull request Aug 7, 2023
Labels
app:cli (App: the `llm` CLI), issue:enhancement (New feature or request)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Chat does not halt
3 participants