
More work on llm-as-judge phrase endpointing #688

Merged: 8 commits merged into main on Nov 14, 2024

Conversation

@kwindla (Contributor) commented Nov 4, 2024

Some proposed/possible additions to the "natural conversation" phrase endpointing:

  • changes in the pipeline/elements to try to pass all frames in the expected sequence
  • changed the pipeline architecture to do as much as possible in parallel (for latency)
  • iterated on the system prompt for the llm judge
  • concatenate user messages for the llm judge -- this seems to be necessary for good results in my testing (see the sketch after this list)
  • temporarily disabled the idle timeout to make it easier to test
  • not-quite-finished interruption handling logic
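
As a rough illustration of the concatenation point above, here is a minimal sketch of merging consecutive user messages before they are sent to the LLM judge. It assumes an OpenAI-style list of role/content message dicts; the function name and shape are illustrative, not the code from this PR.

    def concatenate_user_messages(messages):
        """Merge runs of consecutive 'user' messages into one message so the
        judge sees a single combined utterance instead of fragments."""
        merged = []
        for msg in messages:
            if merged and msg["role"] == "user" and merged[-1]["role"] == "user":
                merged[-1]["content"] += " " + msg["content"]
            else:
                merged.append(dict(msg))
        return merged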

The llm-as-judge performance is better in this version, I think. Latency is also lower.

There are at least two bugs to fix:

  1. The bot doesn't talk when prompted initially if the pipeline is started with an LLMMessagesFrame. That's easy to fix.
  2. Multiple inferences can cause the TTS to speak over itself. We either need to fix this with proper interruption handling, or move the TTS inference out of the parallel pipelines so that we're not doing TTS inference greedily (see the sketch after this list). That would add a little bit of latency, but might be the right thing from a cost perspective anyway.
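
To make the tradeoff in item 2 concrete, here is a small, library-agnostic sketch (plain asyncio, no pipecat code): with TTS inside each parallel branch, every completion is synthesized greedily and the outputs can overlap; with TTS after the gate, only the selected completion is spoken, at the cost of waiting for the gate. All names below are hypothetical stand-ins.

    import asyncio

    async def fake_llm(branch: str) -> str:
        # Stand-in for an LLM completion in one parallel branch.
        await asyncio.sleep(0.1)
        return f"completion from {branch}"

    async def fake_tts(text: str) -> None:
        # Stand-in for TTS synthesis/playback.
        print(f"TTS speaking: {text}")

    async def greedy_parallel_tts():
        # TTS inside each branch: both completions get spoken, possibly over each other.
        async def branch(name: str):
            await fake_tts(await fake_llm(name))
        await asyncio.gather(branch("judge branch"), branch("response branch"))

    async def gated_serial_tts():
        # TTS after the gate: run branches in parallel, speak only the chosen one.
        results = await asyncio.gather(fake_llm("judge branch"), fake_llm("response branch"))
        chosen = results[1]  # placeholder selection; the real gate would use the judge's verdict
        await fake_tts(chosen)

    if __name__ == "__main__":
        asyncio.run(greedy_parallel_tts())
        asyncio.run(gated_serial_tts())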

Code under review:

        return isinstance(frame, SystemFrame)

    # Ignore system frames and frames that are not following the direction of this gate
    def _should_passthrough_frame(self, frame, direction):
        return isinstance(frame, SystemFrame) or direction != self._direction
A contributor replied:
Thanks! Yes, I remember doing this in the gated aggregator.
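
For readers who haven't seen the gated aggregator, here is a minimal, self-contained sketch of how a direction-aware gate might use a predicate like _should_passthrough_frame. The class, method names, and the push callback are illustrative stand-ins, not pipecat's actual processor API.

    from enum import Enum, auto

    class FrameDirection(Enum):
        DOWNSTREAM = auto()
        UPSTREAM = auto()

    class SystemFrame:
        """Stand-in for a system/control frame type."""

    class GatedFrameProcessor:
        """Illustrative gate: buffers frames until opened, but always passes
        through system frames and frames travelling against the gated direction."""

        def __init__(self, direction=FrameDirection.DOWNSTREAM):
            self._direction = direction
            self._gate_open = False
            self._buffer = []

        def _should_passthrough_frame(self, frame, direction):
            # Ignore system frames and frames that are not following the direction of this gate
            return isinstance(frame, SystemFrame) or direction != self._direction

        def process_frame(self, frame, direction, push):
            # 'push' stands in for forwarding a frame to the next processor.
            if self._gate_open or self._should_passthrough_frame(frame, direction):
                push(frame, direction)
            else:
                self._buffer.append((frame, direction))

        def open_gate(self, push):
            self._gate_open = True
            for frame, direction in self._buffer:
                push(frame, direction)
            self._buffer.clear()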

@manish-baghel commented Nov 8, 2024

@kwindla @aconchillo
Thanks for the great example.
Just curious: have you thought about utilizing InterimTranscriptionFrames from the STT and feeding them into a completeness pipeline that directly pushes frames containing the complete/identified phrase, instead of outputting YES/NO?
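
To sketch what that might look like: a processor that consumes interim transcriptions and, once it judges the phrase complete, pushes a frame carrying the whole phrase downstream rather than a YES/NO answer. The frame classes and the _is_complete heuristic below are self-contained stand-ins, not pipecat's actual types or anything from this PR.

    from dataclasses import dataclass

    @dataclass
    class InterimTranscriptionFrame:
        text: str

    @dataclass
    class CompletedPhraseFrame:
        text: str

    class PhraseCompletenessProcessor:
        """Accumulates interim transcription text and emits one frame with the
        identified phrase when a (placeholder) completeness check passes."""

        def __init__(self, push_frame):
            self._push_frame = push_frame
            self._current_text = ""

        def _is_complete(self, text: str) -> bool:
            # Placeholder heuristic; the real check could be an LLM or a classifier.
            return text.rstrip().endswith((".", "?", "!"))

        def process_frame(self, frame):
            if isinstance(frame, InterimTranscriptionFrame):
                self._current_text = frame.text
                if self._is_complete(self._current_text):
                    self._push_frame(CompletedPhraseFrame(text=self._current_text))
                    self._current_text = ""
            else:
                self._push_frame(frame)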

@markbackman (Contributor) left a comment:

This is a nice improvement upon the work @aconchillo did. I think we should merge it and continue to improve the functionality and prompting as we learn more.

@kwindla merged commit 534f710 into main on Nov 14, 2024. 3 checks passed.
@kwindla deleted the khk/natural-conversation branch on November 14, 2024 at 17:15.