support chat history #438
Conversation
ragna/assistants/_openai.py (Outdated)
    role=MessageRole.SYSTEM,
)

def _format_message_sources(self, messages: list[Message]) -> str:
Example of what this could look like for OpenAI-like assistants. There could also be an attribute on the Assistant that tells this function how to truncate the message list based on the size of the target LLM's context window.
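For concreteness, here is a rough sketch of what that could look like (not part of this PR; the class name and the `_context_size` attribute are made up, and it assumes `Message.sources` and `Source.content` are available):

```python
from ragna.core import Message


class OpenaiLikeAssistant:  # hypothetical class name, for illustration only
    # Hypothetical attribute: rough context window of the target LLM that a
    # truncation step could consult before sending the message list.
    _context_size: int = 4096

    def _format_message_sources(self, messages: list[Message]) -> str:
        # Render the sources attached to the most recent message as a
        # plain-text instruction for the LLM.
        sources_prompt = "Include the following sources in your answer:"
        sources = messages[-1].sources  # assumes the sources ride on the last message
        return "\n".join([sources_prompt, *(source.content for source in sources)])
```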
ragna/core/_rag.py (Outdated)
@@ -195,6 +195,9 @@ async def prepare(self) -> Message:
    await self._run(self.source_storage.store, self.documents)
    self._prepared = True

    system_prompt = self.assistant._make_system_content()
It feels easiest to include the basic system prompt here, but maybe it should be regenerated and prepended to the list of messages for every call to the LLM and not stored here. This would make the most sense if the Assistant might truncate the message list and chop off the system prompt.
I don't understand why the chat would need the system prompt of the assistant. Instead of creating it here and passing it to the Assistant as part of the messages, why not let the Assistant create it in the first place?
I think I agree; I mostly talked myself out of putting it here as well. I wanted `_messages` to be a complete historical list of messages that could be sent to the LLM, but system prompts can stay hard-coded on the Assistant (or better yet, in a config somewhere?).
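As a throwaway illustration of "the system prompt stays on the Assistant" (the constant and helper names are made up, not proposed API): the prompt is prepended on every call rather than stored in `chat._messages`, so truncation can never chop it out of the history.

```python
from ragna.core import Message, MessageRole

# Hypothetical system prompt living on the Assistant side.
SYSTEM_PROMPT = "You are a helpful RAG assistant. Only use the provided sources."


def with_system_prompt(messages: list[Message]) -> list[Message]:
    # Prepend a fresh system message; the stored history itself stays untouched.
    return [Message(content=SYSTEM_PROMPT, role=MessageRole.SYSTEM), *messages]
```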
Just be sure there's no attempt at a 'universal' system prompt anywhere - on the assistant we'll have a general one for `.answer()`, but we'll want different system prompts for things like the pre-process steps or anything outside the standard chain. Put simply: it's reasonable to have a standard system prompt on `.answer()`, but we don't want it fixed on the general LLM calls in `.generate()` (WIP).
    sources = await self._run(self.source_storage.retrieve, self.documents, prompt)

    question = Message(content=prompt, role=MessageRole.USER, sources=sources)
Until now, we only ever attached the sources to the answer and not the question. I'm open to a "philosophical debate" on where they should go, but I don't want them on both the question and the answer. Why not just keep passing the sources to `Assistant.answer`?
This was something I meant to ask about in my original list of questions last night, but I ran out of steam. I'm really curious about this from a design perspective. Let's imagine a design where `chat._messages` is a list of `Message` objects, as currently implemented. Now also say the assistant is responsible for formatting each raw message for the LLM API call. I can imagine at least a couple of scenarios for how we would format historical messages:

- The assistant reformats the message list every time it receives a new `.answer` call. So if we want to split a prompt/sources pair into two separate JSON objects (i.e. `{'content': '<user question>', 'role': 'user'}` + `{'content': 'please use only the following sources: ...', 'role': 'system'}`, as sketched after this list), we either need the sources to be attached to the question or we need to reach forward in the message list to grab the sources from the answer. Something feels a little strange about the latter, but I agree that sources shouldn't be on both the question and the answer, just one or the other.
- Alternatively, rather than reformat the LLM message JSON every time, we only format new prompt/source pairs and maintain a list of message dictionaries or rendered JSON of previously formatted messages somewhere. Then we'd just need to append the new messages and send them along. This then becomes another data structure to maintain.
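A minimal sketch of scenario 1, assuming the sources are attached to the question `Message` (the function name is made up):

```python
from ragna.core import Message


def split_prompt_and_sources(question: Message) -> list[dict[str, str]]:
    # Scenario 1: one stored user Message (prompt + attached sources) becomes
    # two provider-style dicts, so the sources travel as a system instruction
    # right next to the user turn.
    rendered = "\n".join(source.content for source in question.sources)
    return [
        {"content": question.content, "role": "user"},
        {"content": f"Please use only the following sources:\n{rendered}", "role": "system"},
    ]
```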
Recall, we eventually want ['chat history' messages] + prompt -> preprocess -> retrieval, so the messages in the history before the current prompt are (or were) all plain strings to the LLM, but we may want to act on them with the distinction between content and sources - e.g. if messages from the user are tracked at a content + sources level, it might be easier to process the sources down in later stages, since you may not want to carry content + N chunks forward from every user prompt.
Keeping the sources attached to prompts would be particularly useful in preprocess tracking, where a given user-role message may be (content + sources) and you want to streamline the context there (parse out/summarize/select sources).
Just keep in mind that it makes sense for 'previous' messages to have their sources attached, while the 'current' prompt will still be sent through processing and retrieval.
If I'm thinking about a pre-process stage class, receiving `messages = [{system, ...}, {user, ... content, sources}, {assistant, ... answer}, ...]` makes more sense than the sources being with the answer - the 'content' in all previous user messages would contain the sources in that string as well (minus the current prompt, which would not yet have sources).
> the 'content' in all previous user messages would contain the sources in that string as well (minus the current prompt, which would not yet have sources).
So you're thinking of the stage of the pipeline where we have the most recent user prompt but have not yet retrieved any sources from source storage, is that right? Would you want `.content` to contain string-formatted sources from previous questions, or would it be more flexible to still have them as `Source` objects?
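To make the two options concrete, a purely illustrative sketch (the helper names are invented, and it assumes `Message.sources` and `Source.content`):

```python
from ragna.core import Message, Source


def flatten_sources(message: Message) -> str:
    # Option A: bake the sources of a previous question into its content
    # string, so later stages only ever see plain text.
    rendered = "\n".join(source.content for source in message.sources)
    return f"{message.content}\n\nSources:\n{rendered}"


def select_sources(message: Message, keep: int = 2) -> list[Source]:
    # Option B: keep the Source objects around, so a pre-process stage can
    # still filter, summarize, or drop them individually before rendering.
    return list(message.sources)[:keep]
```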
ragna/assistants/_openai.py (Outdated)
)

def _format_message_sources(self, messages: list[Message]) -> str:
    sources_prompt = "Include the following sources in your answer:"
Depending on what order we try to merge PRs - this piece will eventually be overwritten by the notion of the preprocess stage. It's probably fine for this version imo.
@dillonroach I haven't updated any assistants other than the ones that inherit from [...]
I would intentionally stay hands-off from your PR on the assistants; I'll be overhauling that anyway (a reference implementation in one or two places doesn't hurt anything, but I'll likely just use it as a ref and tear it up, so don't be too worried about details).
That's on the assistants to handle, and there's nothing motivating it beyond the context window of the LLM - each assistant should have an idea of the max sequence length it can handle, and if the incoming length is too long, it will basically need to drop the first n messages to account for the issue. E.g. (assuming message 1 is the system prompt), pseudo: `messages[1] + messages[1+n:]`. Ideally you'd do a token count for the messages and anticipate the max length your generation will take, such that all messages plus the upcoming generation fit within the LLM context window.
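A hedged sketch of that pseudo-code, with a caller-supplied token counter (the helper and its signature are assumptions, not part of this PR):

```python
from typing import Callable


def truncate_messages(
    messages: list[dict[str, str]],
    max_tokens: int,
    count_tokens: Callable[[str], int],  # e.g. a tokenizer-based callable; an assumption
    reserve_for_generation: int = 512,
) -> list[dict[str, str]]:
    # Keep the system prompt (assumed to be messages[0]) and drop the oldest
    # other messages until the rest, plus the room reserved for the upcoming
    # generation, fits inside the LLM's context window.
    system, history = messages[:1], messages[1:]
    budget = max_tokens - reserve_for_generation - count_tokens(system[0]["content"])
    kept: list[dict[str, str]] = []
    for message in reversed(history):
        budget -= count_tokens(message["content"])
        if budget < 0:
            break
        kept.append(message)
    return system + list(reversed(kept))
```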
@dillonroach Do you want to just merge this branch into yours and make a mega-PR, or would you prefer I coordinate with Philip to get mine merged into |
Thanks Blake! I pushed a minor cleanup commit (d7fbccb) to avoid some back and forth.
@skip_on_windows
What is the issue here? We shouldn't just skip our E2E test unless we are sure that this is a testing issue on Windows.
The failure looks real and serious:
Differing items:
{'role': 'user'} != {'role': 'assistant'}
{'id': 'a5a9ab44-2571-4dd1-889c-1ea8c61e6072'} != {'id': '0de4bf10-7e1c-4e54-8b8d-2aa760868335'}
{'content': '?'} != {'content': "I'm a demo assistant and can be used to try Ragna's workflow.\nI will only mirror back my inputs. \n\nSo far I have received 1 messages.\n\nYour last prompt was:\n\n> ?\n\nThese are the sources I was given:\n\n- test.txt: ! "}
@blakerosenthal I'm going to merge this PR to unblock @dillonroach. Could you please get on debugging this issue next?
Agreed, there's something funky going on with transaction processing on Windows but it's intermittent (sometimes the GH tests pass and sometimes they don't). I'll try to get to the bottom of this.
Opened #447
Co-authored-by: Philip Meier <[email protected]>
This is a very rough first pass at #421. I decided to investigate the naive approach of replacing [...] with [...] for just the HTTP-like assistants as a POC. Since a `Message` instance can hold a list of `Source`s, the `_messages` list in `Rag.Chat` serves as a cache of the raw messages that live in the database (and what the user sees), while each assistant is responsible for formatting the message list according to the needs of the individual LLM.