
feat(actions): enable streaming in custom actions #735

Open · wants to merge 2 commits into develop from feat/enable-streaming-in-custom-actions
Conversation

@niels-garve (Contributor) commented Sep 7, 2024

🚨 Updates in the discussions below

Fixes #646

Problem description

First of all, thanks for NeMo-Guardrails!

Consider two consecutive actions: the first is a custom RAG action, and the second analyzes the answer and renders a disclaimer when the answer is not grounded in the knowledge base. It is like fact-checking, but with streaming enabled. The bot should answer and then finish with: "I learn something new every day, so my answers may not always be perfect."

Using streaming currently leads to two errors:

  1. The streaming handler finishes after the first action, so the disclaimer is never streamed. This happens because the streaming_finished_event is set, which in turn is triggered by an empty chunk ("") being passed to on_llm_new_token. The existing if statement checks for empty chunks, but only when they occur at the beginning; in our case, one occurs at the end. I extended the check so that "" is never processed.
  2. For downstream usage, the first action returns the final answer, which has already been streamed. When the action finishes, the accumulated result is emitted again, which is why you end up with duplicate sentences in the result:
    Question: Hi
    
    Answer:
    I'm here to help with any questions you may have about the Employment Situation for April. Thanks for asking!
    I'm here to help with any questions you may have about the Employment Situation for April. Thanks for asking!I learn something new every day, so my answers may not always be perfect.
    
    I solved this one in the _process function by adding an early return when chunk == self.completion.
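The two fixes can be sketched with a simplified stand-in for the streaming handler. This toy class is not the real nemoguardrails.streaming.StreamingHandler (the class name, the demo flow, and the internal bookkeeping here are illustrative assumptions); it only shows the two guards described above:

```python
import asyncio

# Simplified, hypothetical stand-in for the streaming handler; it only
# illustrates the two guards discussed in this PR.
class ToyStreamingHandler:
    def __init__(self):
        self.completion = ""  # text accumulated so far
        self.chunks = []      # what downstream consumers receive

    async def on_llm_new_token(self, token: str) -> None:
        # Fix 1: never process an empty chunk (""), regardless of whether
        # it arrives at the beginning or at the end of the stream, so it
        # cannot trigger streaming_finished_event prematurely.
        if token == "":
            return
        await self._process(token)

    async def _process(self, chunk: str) -> None:
        # Fix 2: if an action returns the full text that was already
        # streamed, skip it to avoid duplicate sentences in the result.
        if chunk == self.completion:
            return
        self.completion += chunk
        self.chunks.append(chunk)

async def demo():
    handler = ToyStreamingHandler()
    # A trailing "" must not finish the stream early (fix 1).
    for token in ["Hello", " world", ""]:
        await handler.on_llm_new_token(token)
    # The action's accumulated final answer is dropped, not re-emitted (fix 2).
    await handler._process("Hello world")
    return handler.chunks

# asyncio.run(demo()) returns ["Hello", " world"]
```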

How to test

I've added an example under examples/configs/rag/custom_rag_streaming which you can test like so:

$ export OPENAI_API_KEY='sk-xxx'
$ python -m nemoguardrails.__main__ chat --config /<path_to>/examples/configs/rag/custom_rag_streaming --streaming

Please also follow the README.md I've included.

I'm happy to hear your feedback!

@drazvan @mikeolubode

@niels-garve (Contributor, Author) commented:

Hello @drazvan @mikeolubode, I found a neat solution without altering the library. So, I'm just requesting my example be pulled. What do you think of the idea of using a local streaming handler that filters out and handles stream-stopping chunks ("" and None) while keeping the main streaming handler open?
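The idea of a local handler can be sketched as follows. Both classes are hypothetical stand-ins, not the library's actual handler types; only the filtering logic reflects the proposal (swallow "" and None inside the action so the main stream stays open):

```python
import asyncio

# Hypothetical main handler: receiving "" or None closes the stream.
class MainHandler:
    def __init__(self):
        self.received = []
        self.closed = False

    async def push_chunk(self, chunk):
        if chunk in ("", None):
            self.closed = True
            return
        self.received.append(chunk)

# Hypothetical local handler used inside the custom action: it filters
# out the stream-stopping chunks and forwards everything else, so the
# main handler is never closed between consecutive actions.
class LocalFilteringHandler:
    def __init__(self, main_handler):
        self.main_handler = main_handler

    async def push_chunk(self, chunk):
        if chunk in ("", None):
            return  # swallow stream-stopping chunks locally
        await self.main_handler.push_chunk(chunk)

async def demo():
    main = MainHandler()
    local = LocalFilteringHandler(main)
    for chunk in ["I learn something new", " every day.", None]:
        await local.push_chunk(chunk)
    return main.received, main.closed
```

After the loop, the main handler has received both text chunks and is still open for the next action to stream into.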

@niels-garve niels-garve force-pushed the feat/enable-streaming-in-custom-actions branch from 2f3f862 to bfce7d8 Compare September 9, 2024 16:10
@niels-garve (Contributor, Author) commented:

Update: the duplicate chunks that result from streaming within an action and then returning its result still need to be handled.

@drazvan (Collaborator) commented Sep 10, 2024

Thanks for digging into this @niels-garve!
Let me think some more about this. I think we need a cleaner way to signal that a message returned from an action has already been streamed, and somehow have support for this in bot say. Something along these lines:

(using Colang 2.0 syntax, as it might not be possible easily with Colang 1)

flow answer report question
  user said something
  $answer = await RagAction()
  bot say text=$answer streamed=True
  $disclaimer = await DisclaimerAction()
  bot say text=$disclaimer streamed=True

@niels-garve (Contributor, Author) commented:
Thanks for your prompt reply, @drazvan! I pushed another approach: what if we leverage the fact that ActionResult is defined with an optional return value, return None, and handle inter-action communication via the context? Returning None signals that the result has already been streamed.

I had to alter the library code, though, removing the fallback "I'm not sure what to say." But I also see this as a chance for a rework, since a hard-coded English default reply blocks multi-language support. What do you think?
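The shape of that approach might look like the sketch below. The ActionResult dataclass here is a stand-in that mirrors the library's return_value/context_updates fields, and the rag_action function and its "answer" context key are illustrative assumptions, not the actual PR code:

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Optional

# Stand-in mirroring the shape of nemoguardrails' ActionResult
# (optional return value plus context updates); not the library's
# exact definition.
@dataclass
class ActionResult:
    return_value: Optional[Any] = None
    context_updates: dict = field(default_factory=dict)

async def rag_action(context: dict) -> ActionResult:
    # Assume the answer was already pushed to the streaming handler,
    # chunk by chunk, before this point.
    answer = "The streamed answer."
    # Return None so the runtime does not emit the already-streamed
    # answer a second time; later actions read it from the context.
    return ActionResult(
        return_value=None,
        context_updates={"answer": answer},
    )
```

A subsequent action (e.g. the disclaimer check) would then read `context["answer"]` instead of relying on the first action's return value.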

I like your Colang 2.0 approach, too. Could the "fact-checking" approach work for Colang 1.0?

flow answer report question
  user said something
  $do_streaming = True
  $answer = execute rag
  bot $answer

(I'll gladly squash the commits in the end; just wanted to keep history while discussing)

Successfully merging this pull request may close: Streams intermediate LLM calls in custom action.