This repository has been archived by the owner on Jun 24, 2024. It is now read-only.
allow chat to halt new token generation on stop_sequence #364

Merged
Changes from 2 commits

Commits (13)
43b4f0b  allow chat to halt new token generation (averypelle)
f87b4fa  pull function out (averypelle)
38d8632  fix: move inference callback to llm-base (averypelle)
41bf37a  clarify function comment (averypelle)
39e45db  change args to message-prompt-prefix (averypelle)
acbf117  Merge branch 'main' into fix/chat-halting (averypelle)
b4efde7  update comment (averypelle)
a0ad8b4  fix(cli): don't insert newline at start of chat (philpax)
138263a  fix(llm): clarify conversation_inference_callback (philpax)
8702593  refactor(cli): simplify message_prompt_prefix (philpax)
710f3c2  fix(llm): only feed prompt if not empty (philpax)
74d2d67  fix(llm): require errors to be Send+Sync (philpax)
34a8c68  feat(cli): rewrite interactive... again (philpax)
Diff view
```diff
@@ -256,6 +256,12 @@ fn interactive(
     let parameters = generate.inference_parameters(model.eot_token_id());
     let mut rng = generate.rng();
 
+    let stop_sequence = message_prompt_template
+        .map(|s| s.replace("{{PROMPT}}", "").trim().to_owned())
+        .unwrap_or_default();
+
+    let mut buf = String::new();
+
     fn session_ends_with_newline(session: &InferenceSession) -> bool {
         session
             .decoded_tokens()
```
```diff
@@ -293,15 +299,7 @@ fn interactive(
                 maximum_token_count: generate.num_predict,
             },
             &mut Default::default(),
-            |r| match r {
-                InferenceResponse::PromptToken(t) | InferenceResponse::InferredToken(t) => {
-                    print!("{t}");
-                    std::io::stdout().flush().unwrap();
-
-                    Ok(InferenceFeedback::Continue)
-                }
-                _ => Ok(InferenceFeedback::Continue),
-            },
+            inference_callback(stop_sequence.clone(), chat_mode, &mut buf),
         )
     };
 
```
```diff
@@ -448,3 +446,42 @@ impl Validator for LineContinuationValidator {
 fn process_prompt(raw_prompt: &str, prompt: &str) -> String {
     raw_prompt.replace("{{PROMPT}}", prompt)
 }
+
+fn inference_callback(
+    stop_sequence: String,
+    chat_mode: bool,
+    buf: &mut String,
+) -> impl FnMut(InferenceResponse) -> Result<InferenceFeedback, Infallible> + '_ {
+    move |resp| match resp {
+        InferenceResponse::InferredToken(t) => {
+            if chat_mode {
+                let mut reverse_buf = buf.clone();
+                reverse_buf.push_str(t.as_str());
+                if stop_sequence.as_str().eq(reverse_buf.as_str()) {
+                    buf.clear();
+                    return Ok(InferenceFeedback::Halt);
+                } else if stop_sequence.as_str().starts_with(reverse_buf.as_str()) {
+                    buf.push_str(t.as_str());
+                    return Ok(InferenceFeedback::Continue);
+                }
+
+                if buf.is_empty() {
+                    print_token(t)
+                } else {
+                    print_token(reverse_buf)
+                }
+            } else {
+                print_token(t)
+            }
+        }
+        InferenceResponse::EotToken => Ok(InferenceFeedback::Halt),
+        _ => Ok(InferenceFeedback::Continue),
+    }
+}
+
+fn print_token(t: String) -> Result<llm::InferenceFeedback, Infallible> {
+    print!("{t}");
+    std::io::stdout().flush().unwrap();
+
+    Ok(llm::InferenceFeedback::Continue)
+}
```

Review comment on the `if chat_mode` branch: I didn't touch REPL mode; is the desired behavior to continue generating tokens in that mode? Otherwise happy to change.
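Not from the PR: below is a minimal standalone sketch of the prefix-buffering idea in `inference_callback` above. The `Feedback` enum, `on_token` helper, `out` string, and the `User:` stop sequence are all invented for illustration; output is collected in a string instead of printed, and, unlike the diff, the buffer is also cleared after a failed partial match so the example stays self-contained.

```rust
// Standalone sketch (not from the PR): mirrors the prefix-buffering in
// `inference_callback`. Tokens are withheld while they could still be the
// start of the stop sequence, flushed if the match falls through, and
// generation halts once the full stop sequence has been produced.

#[derive(Debug, PartialEq)]
enum Feedback {
    Continue,
    Halt,
}

fn on_token(stop_sequence: &str, buf: &mut String, out: &mut String, t: &str) -> Feedback {
    let mut candidate = buf.clone();
    candidate.push_str(t);

    if candidate == stop_sequence {
        // The full stop sequence arrived: never emit it, stop generating.
        buf.clear();
        Feedback::Halt
    } else if stop_sequence.starts_with(candidate.as_str()) {
        // Still a prefix of the stop sequence: keep withholding output.
        buf.push_str(t);
        Feedback::Continue
    } else {
        // False alarm: flush anything withheld, then carry on.
        // (The PR's callback prints here; this sketch also clears the buffer.)
        if buf.is_empty() {
            out.push_str(t);
        } else {
            out.push_str(&candidate);
            buf.clear();
        }
        Feedback::Continue
    }
}

fn main() {
    let stop = "User:"; // hypothetical stop sequence
    let (mut buf, mut out) = (String::new(), String::new());

    // "Use" is withheld as a possible prefix, then flushed as part of "Useless".
    for t in ["Hello", " world. ", "Use", "less", " trivia. "] {
        assert_eq!(on_token(stop, &mut buf, &mut out, t), Feedback::Continue);
    }
    assert_eq!(out, "Hello world. Useless trivia. ");

    // "User" followed by ":" completes the stop sequence, so the callback halts.
    assert_eq!(on_token(stop, &mut buf, &mut out, "User"), Feedback::Continue);
    assert_eq!(on_token(stop, &mut buf, &mut out, ":"), Feedback::Halt);
    assert_eq!(out, "Hello world. Useless trivia. ");
}
```

The design point being illustrated: tokens that might begin the stop sequence are withheld rather than printed, so the stop sequence itself never appears in the chat output.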
Review comment: I am assuming the message_prompt_template is something like a chat prefix with a {{PROMPT}} placeholder, but I'm very open to suggestions here on how to make this more robust.
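To make that assumption concrete, here is a hedged sketch; the `### Human: {{PROMPT}}` template below is invented for this example and not taken from the PR. It just replays the derivation from the first hunk: `{{PROMPT}}` is stripped from the message prompt template and the result is trimmed, leaving the user-turn prefix as the stop sequence.

```rust
fn main() {
    // Hypothetical template; the real value is whatever the CLI passes in,
    // which is not shown in this extract.
    let message_prompt_template = Some("### Human: {{PROMPT}}".to_string());

    // Same derivation as in the first hunk of the diff.
    let stop_sequence = message_prompt_template
        .map(|s| s.replace("{{PROMPT}}", "").trim().to_owned())
        .unwrap_or_default();

    // Generation should halt as soon as the model starts a new user turn.
    assert_eq!(stop_sequence, "### Human:");
}
```

With a template of that shape, the callback above halts as soon as the model starts to echo the next user turn.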