
feat(actions): enable streaming in custom actions #735

Open · wants to merge 2 commits into develop from feat/enable-streaming-in-custom-actions
Conversation

@niels-garve (Contributor) commented Sep 7, 2024

🚨 Updates in the discussions below

Fixes #646

Problem description

First of all, thanks for NeMo-Guardrails!

Consider two consecutive actions: the first is a custom RAG action, and the second analyzes the answer and renders a disclaimer when the answer is not grounded in the knowledge base. It is like fact-checking, but with streaming enabled. The bot should answer and then finish with: "I learn something new every day, so my answers may not always be perfect."

Using streaming currently leads to two errors:

  1. The streaming handler finishes after the first action, so the disclaimer is never streamed. This happens because the streaming_finished_event is set, which in turn is triggered by an empty chunk ("") being passed to on_llm_new_token. The existing if statement checks for empty chunks, but only when they occur at the beginning; in our case, one occurs at the end. I extended the check so that "" is never processed.
  2. For downstream usage, the first action returns the final answer, which has already been streamed. When the action finishes, the accumulated result is emitted again, which is why you end up with duplicate sentences in the result:
    Question: Hi
    
    Answer:
    I'm here to help with any questions you may have about the Employment Situation for April. Thanks for asking!
    I'm here to help with any questions you may have about the Employment Situation for April. Thanks for asking!I learn something new every day, so my answers may not always be perfect.
    
    I solved this one in the _process function by adding an early return when chunk == self.completion.
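The two fixes can be sketched with a simplified stand-in for the streaming handler. This toy class is not the real nemoguardrails.streaming.StreamingHandler (the class name, the demo flow, and the internal bookkeeping here are illustrative assumptions); it only shows the two guards described above:

```python
import asyncio

# Simplified, hypothetical stand-in for the streaming handler; it only
# illustrates the two guards discussed in this PR.
class ToyStreamingHandler:
    def __init__(self):
        self.completion = ""  # text accumulated so far
        self.chunks = []      # what downstream consumers receive

    async def on_llm_new_token(self, token: str) -> None:
        # Fix 1: never process an empty chunk (""), regardless of whether
        # it arrives at the beginning or at the end of the stream, so it
        # cannot trigger streaming_finished_event prematurely.
        if token == "":
            return
        await self._process(token)

    async def _process(self, chunk: str) -> None:
        # Fix 2: if an action returns the full text that was already
        # streamed, skip it to avoid duplicate sentences in the result.
        if chunk == self.completion:
            return
        self.completion += chunk
        self.chunks.append(chunk)

async def demo():
    handler = ToyStreamingHandler()
    # A trailing "" must not finish the stream early (fix 1).
    for token in ["Hello", " world", ""]:
        await handler.on_llm_new_token(token)
    # The action's accumulated final answer is dropped, not re-emitted (fix 2).
    await handler._process("Hello world")
    return handler.chunks

# asyncio.run(demo()) returns ["Hello", " world"]
```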

How to test

I've added an example under examples/configs/rag/custom_rag_streaming which you can test like so:

$ export OPENAI_API_KEY='sk-xxx'
$ python -m nemoguardrails.__main__ chat --config /<path_to>/examples/configs/rag/custom_rag_streaming --streaming

Please also follow the README.md I've included.

I'm happy to hear your feedback!

@drazvan @mikeolubode

@niels-garve (Contributor, Author) commented:

Hello @drazvan @mikeolubode, I found a neat solution without altering the library. So, I'm just requesting my example be pulled. What do you think of the idea of using a local streaming handler that filters out and handles stream-stopping chunks ("" and None) while keeping the main streaming handler open?
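The idea of a local handler can be sketched as follows. Both classes are hypothetical stand-ins, not the library's actual handler types; only the filtering logic reflects the proposal (swallow "" and None inside the action so the main stream stays open):

```python
import asyncio

# Hypothetical main handler: receiving "" or None closes the stream.
class MainHandler:
    def __init__(self):
        self.received = []
        self.closed = False

    async def push_chunk(self, chunk):
        if chunk in ("", None):
            self.closed = True
            return
        self.received.append(chunk)

# Hypothetical local handler used inside the custom action: it filters
# out the stream-stopping chunks and forwards everything else, so the
# main handler is never closed between consecutive actions.
class LocalFilteringHandler:
    def __init__(self, main_handler):
        self.main_handler = main_handler

    async def push_chunk(self, chunk):
        if chunk in ("", None):
            return  # swallow stream-stopping chunks locally
        await self.main_handler.push_chunk(chunk)

async def demo():
    main = MainHandler()
    local = LocalFilteringHandler(main)
    for chunk in ["I learn something new", " every day.", None]:
        await local.push_chunk(chunk)
    return main.received, main.closed
```

After the loop, the main handler has received both text chunks and is still open for the next action to stream into.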

@niels-garve niels-garve force-pushed the feat/enable-streaming-in-custom-actions branch from 2f3f862 to bfce7d8 Compare September 9, 2024 16:10
@niels-garve (Contributor, Author) commented:

Update: the duplicate chunks that result from streaming within an action and then returning its result still need to be handled.

@drazvan (Collaborator) commented Sep 10, 2024

Thanks for digging into this @niels-garve!
Let me think some more about this. I think we need a cleaner way to signal that a message returned from an action has already been streamed, and somehow have support for this in bot say. Something along these lines:

(using Colang 2.0 syntax, as it might not be possible easily with Colang 1)

flow answer report question
  user said something
  $answer = await RagAction()
  bot say text=$answer streamed=True
  $disclaimer = await DisclaimerAction()
  bot say text=$disclaimer streamed=True

@niels-garve (Contributor, Author) commented:
Thanks for your prompt reply, @drazvan! I pushed another approach: what if we leverage the fact that ActionResult is defined with an optional return value, return None, and handle inter-action communication via the context? Returning None signals that the result has already been streamed.

I had to alter the library code, though, removing the fallback "I'm not sure what to say." But I also see this as a chance for a rework, since a hard-coded English default reply blocks multi-language support. What do you think?
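The shape of that approach might look like the sketch below. The ActionResult dataclass here is a stand-in that mirrors the library's return_value/context_updates fields, and the rag_action function and its "answer" context key are illustrative assumptions, not the actual PR code:

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Optional

# Stand-in mirroring the shape of nemoguardrails' ActionResult
# (optional return value plus context updates); not the library's
# exact definition.
@dataclass
class ActionResult:
    return_value: Optional[Any] = None
    context_updates: dict = field(default_factory=dict)

async def rag_action(context: dict) -> ActionResult:
    # Assume the answer was already pushed to the streaming handler,
    # chunk by chunk, before this point.
    answer = "The streamed answer."
    # Return None so the runtime does not emit the already-streamed
    # answer a second time; later actions read it from the context.
    return ActionResult(
        return_value=None,
        context_updates={"answer": answer},
    )
```

A subsequent action (e.g. the disclaimer check) would then read `context["answer"]` instead of relying on the first action's return value.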

I like your Colang 2.0 approach, too. Could the "fact-checking" approach work for Colang 1.0?

flow answer report question
  user said something
  $do_streaming = True
  $answer = execute rag
  bot $answer

(I'll gladly squash the commits in the end; just wanted to keep history while discussing)

Successfully merging this pull request may close: Streams intermediate LLM calls in custom action.