support chat history #438
Conversation
ragna/assistants/_openai.py (Outdated)
    role=MessageRole.SYSTEM,
)

def _format_message_sources(self, messages: list[Message]) -> str:
Example of what this could look like for OpenAI-like assistants. There could also be an attribute on the Assistant that tells this function how to truncate the message list based on the size of the target LLM's context window.
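For concreteness, here is a rough sketch of what that could look like (not part of this PR; the class name and the `_context_size` attribute are made up, and it assumes `Message.sources` and `Source.content` are available):

```python
from ragna.core import Message


class OpenaiLikeAssistant:  # hypothetical class name, for illustration only
    # Hypothetical attribute: rough context window of the target LLM that a
    # truncation step could consult before sending the message list.
    _context_size: int = 4096

    def _format_message_sources(self, messages: list[Message]) -> str:
        # Render the sources attached to the most recent message as a
        # plain-text instruction for the LLM.
        sources_prompt = "Include the following sources in your answer:"
        sources = messages[-1].sources  # assumes the sources ride on the last message
        return "\n".join([sources_prompt, *(source.content for source in sources)])
```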
ragna/core/_rag.py (Outdated)
@@ -195,6 +195,9 @@ async def prepare(self) -> Message:
    await self._run(self.source_storage.store, self.documents)
    self._prepared = True

    system_prompt = self.assistant._make_system_content()
It feels easiest to include the basic system prompt here, but maybe it should be regenerated and prepended to the list of messages for every call to the LLM and not stored here. This would make the most sense if the Assistant might truncate the message list and chop off the system prompt.
I don't understand why the chat would need the system prompt of the assistant. Instead of creating it here and passing it to the Assistant as part of the messages, why not let the Assistant create it in the first place?
I think I agree; I mostly talked myself out of putting it here as well. I wanted `_messages` to be a complete historical list of messages that could be sent to the LLM, but system prompts can stay hard-coded on the Assistant (or better yet, in a config somewhere?).
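As a throwaway illustration of "the system prompt stays on the Assistant" (the constant and helper names are made up, not proposed API): the prompt is prepended on every call rather than stored in `chat._messages`, so truncation can never chop it out of the history.

```python
from ragna.core import Message, MessageRole

# Hypothetical system prompt living on the Assistant side.
SYSTEM_PROMPT = "You are a helpful RAG assistant. Only use the provided sources."


def with_system_prompt(messages: list[Message]) -> list[Message]:
    # Prepend a fresh system message; the stored history itself stays untouched.
    return [Message(content=SYSTEM_PROMPT, role=MessageRole.SYSTEM), *messages]
```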
Just be sure there's no attempt at a 'universal' system prompt anywhere - on the assistant we'll have a general one for `.answer()`, but we'll want different system prompts for things like the pre-process steps or anything outside the standard chain. Put simply: it's reasonable to have a standard system prompt on `.answer()`, but we don't want it fixed on the general LLM calls in `.generate()` (WIP).
    sources = await self._run(self.source_storage.retrieve, self.documents, prompt)

    question = Message(content=prompt, role=MessageRole.USER, sources=sources)
Until now, we only ever attached the sources to the answer and not the question. I'm open to a "philosophical debate" on where they should go, but I don't want them on both the question and the answer. Why not just keep passing the sources to `Assistant.answer`?
This was something I meant to ask about in my original list of questions last night, but I ran out of steam. I'm really curious about this from a design perspective. Let's imagine a design where `chat._messages` is a list of `Message` objects, as currently implemented. Now also say the assistant is responsible for formatting each raw message for the LLM API call. I can imagine at least a couple of scenarios for how we would format historical messages:

- The assistant reformats the message list every time it receives a new `.answer` call. So if we want to split a prompt/sources pair into two separate JSON objects (i.e. `{'content': '<user question>', 'role': 'user'}` + `{'content': 'please use only the following sources: ...', 'role': 'system'}`, as sketched after this list), we either need the sources to be attached to the question or we need to reach forward in the message list to grab the sources from the answer. Something feels a little strange about the latter, but I agree that sources shouldn't be on both the question and the answer, just one or the other.
- Alternatively, rather than reformat the LLM message JSON every time, we only format new prompt/source pairs and maintain a list of message dictionaries or rendered JSON of previously formatted messages somewhere. Then we'd just need to append the new messages and send them along. This then becomes another data structure to maintain.
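A minimal sketch of scenario 1, assuming the sources are attached to the question `Message` (the function name is made up):

```python
from ragna.core import Message


def split_prompt_and_sources(question: Message) -> list[dict[str, str]]:
    # Scenario 1: one stored user Message (prompt + attached sources) becomes
    # two provider-style dicts, so the sources travel as a system instruction
    # right next to the user turn.
    rendered = "\n".join(source.content for source in question.sources)
    return [
        {"content": question.content, "role": "user"},
        {"content": f"Please use only the following sources:\n{rendered}", "role": "system"},
    ]
```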
Recall, we eventually want ['chat history' messages] + prompt -> preprocess -> retrieval, so the messages in the history before the current prompt are (or were) all plain strings to the LLM, but we may want to act on them with the distinction between content and sources - e.g. if messages from the user are tracked at a content + sources level, it might be easier to process the sources down in later stages, since you may not want to carry content + N chunks forward from every user prompt.
Keeping the sources attached to prompts would be particularly useful in preprocess tracking, where a given user-role message may be (content + sources) and you want to streamline the context there (parse out/summarize/select sources).
Just keep in mind that it makes sense for 'previous' messages to have their sources attached, while the 'current' prompt will still be sent through processing and retrieval.
If I'm thinking about a pre-process stage class, receiving `messages = [{system, ...}, {user, ... content, sources}, {assistant, ... answer}, ...]` makes more sense than the sources being with the answer - the 'content' in all previous user messages would contain the sources in that string as well (minus the current prompt, which would not yet have sources).
> the 'content' in all previous user messages would contain the sources in that string as well (minus the current prompt, which would not yet have sources).
So you're thinking of the stage of the pipeline where we have the most recent user prompt but have not yet retrieved any sources from source storage, is that right? Would you want `.content` to contain string-formatted sources from previous questions, or would it be more flexible to still have them as `Source` objects?
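To make the two options concrete, a purely illustrative sketch (the helper names are invented, and it assumes `Message.sources` and `Source.content`):

```python
from ragna.core import Message, Source


def flatten_sources(message: Message) -> str:
    # Option A: bake the sources of a previous question into its content
    # string, so later stages only ever see plain text.
    rendered = "\n".join(source.content for source in message.sources)
    return f"{message.content}\n\nSources:\n{rendered}"


def select_sources(message: Message, keep: int = 2) -> list[Source]:
    # Option B: keep the Source objects around, so a pre-process stage can
    # still filter, summarize, or drop them individually before rendering.
    return list(message.sources)[:keep]
```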
ragna/assistants/_openai.py (Outdated)
)

def _format_message_sources(self, messages: list[Message]) -> str:
    sources_prompt = "Include the following sources in your answer:"
Depending on what order we try to merge PRs - this piece will eventually be overwritten by the notion of the preprocess stage. It's probably fine for this version imo.
@dillonroach I haven't updated any assistants other than the ones that inherit from [...]
I would intentionally stay hands-off from your PR on the assistants; I'll be overhauling that anyway (a reference implementation in one or two places doesn't hurt anything, but I'll likely just use it as a ref and tear it up, so don't be too worried about details).
That's on the assistants to handle, and there's nothing motivating it beyond the context window of the LLM - each assistant should have an idea of the max sequence length it can handle, and if the incoming length is too long, it will basically need to drop the first n messages to account for the issue. E.g. (assuming message 1 is the system prompt), pseudo: `messages[1] + messages[1+n:]`. Ideally you'd do a token count for the messages and anticipate the max length your generation will take, such that all messages plus the upcoming generation fit within the LLM context window.
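A hedged sketch of that pseudo-code, with a caller-supplied token counter (the helper and its signature are assumptions, not part of this PR):

```python
from typing import Callable


def truncate_messages(
    messages: list[dict[str, str]],
    max_tokens: int,
    count_tokens: Callable[[str], int],  # e.g. a tokenizer-based callable; an assumption
    reserve_for_generation: int = 512,
) -> list[dict[str, str]]:
    # Keep the system prompt (assumed to be messages[0]) and drop the oldest
    # other messages until the rest, plus the room reserved for the upcoming
    # generation, fits inside the LLM's context window.
    system, history = messages[:1], messages[1:]
    budget = max_tokens - reserve_for_generation - count_tokens(system[0]["content"])
    kept: list[dict[str, str]] = []
    for message in reversed(history):
        budget -= count_tokens(message["content"])
        if budget < 0:
            break
        kept.append(message)
    return system + list(reversed(kept))
```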
@dillonroach Do you want to just merge this branch into yours and make a mega-PR, or would you prefer I coordinate with Philip to get mine merged into |
Thanks Blake! I pushed a minor cleanup commit (d7fbccb) to avoid some back and forth.
@skip_on_windows
What is the issue here? We shouldn't just skip our E2E test unless we are sure that this is a testing issue on Windows.
The failure looks real and serious:
Differing items:
{'role': 'user'} != {'role': 'assistant'}
{'id': 'a5a9ab44-2571-4dd1-889c-1ea8c61e6072'} != {'id': '0de4bf10-7e1c-4e54-8b8d-2aa760868335'}
{'content': '?'} != {'content': "I'm a demo assistant and can be used to try Ragna's workflow.\nI will only mirror back my inputs. \n\nSo far I have received 1 messages.\n\nYour last prompt was:\n\n> ?\n\nThese are the sources I was given:\n\n- test.txt: ! "}
@blakerosenthal I'm going to merge this PR to unblock @dillonroach. Could you please get on debugging this issue next?
Agreed, there's something funky going on with transaction processing on Windows but it's intermittent (sometimes the GH tests pass and sometimes they don't). I'll try to get to the bottom of this.
Opened #447
Co-authored-by: Philip Meier <[email protected]>
This is a very rough first pass at #421. I decided to investigate the naive approach of replacing [...] with [...] for just the HTTP-like assistants as a POC. Since a `Message` instance can hold a list of `Source`s, the `_messages` list in `Rag.Chat` serves as a cache of the raw messages that live in the database (and what the user sees), while each assistant is responsible for formatting the message list according to the needs of the individual LLM.