
More work on llm-as-judge phrase endpointing #688

Merged: 8 commits merged into main on Nov 14, 2024

Conversation

@kwindla (Contributor) commented Nov 4, 2024

Some proposed/possible additions to the "natural conversation" phrase endpointing:

  • changes in the pipeline/elements to try to pass all frames in the expected sequence
  • changed the pipeline architecture to do as much as possible in parallel (for latency)
  • iterated on the system prompt for the llm judge
  • concatenate user messages for the llm judge -- this seems to be necessary for good results in my testing (see the sketch after this list)
  • temporarily disabled the idle timeout to make it easier to test
  • not-quite-finished interruption handling logic
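
As a rough illustration of the concatenation point above, here is a minimal sketch of merging consecutive user messages before they are sent to the LLM judge. It assumes an OpenAI-style list of role/content message dicts; the function name and shape are illustrative, not the code from this PR.

    def concatenate_user_messages(messages):
        """Merge runs of consecutive 'user' messages into one message so the
        judge sees a single combined utterance instead of fragments."""
        merged = []
        for msg in messages:
            if merged and msg["role"] == "user" and merged[-1]["role"] == "user":
                merged[-1]["content"] += " " + msg["content"]
            else:
                merged.append(dict(msg))
        return merged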

The llm-as-judge performance is better in this version, I think. Latency is also lower.

There are at least two bugs to fix:

  1. The bot doesn't talk when prompted initially if the pipeline is started with an LLMMessagesFrame. That's easy to fix.
  2. Multiple inferences can cause the TTS to speak over itself. We either need to fix this with proper interruption handling, or move the TTS inference out of the parallel pipelines so that we're not doing TTS inference greedily (see the sketch after this list). That would add a little bit of latency, but might be the right thing from a cost perspective anyway.
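
To make the tradeoff in item 2 concrete, here is a small, library-agnostic sketch (plain asyncio, no pipecat code): with TTS inside each parallel branch, every completion is synthesized greedily and the outputs can overlap; with TTS after the gate, only the selected completion is spoken, at the cost of waiting for the gate. All names below are hypothetical stand-ins.

    import asyncio

    async def fake_llm(branch: str) -> str:
        # Stand-in for an LLM completion in one parallel branch.
        await asyncio.sleep(0.1)
        return f"completion from {branch}"

    async def fake_tts(text: str) -> None:
        # Stand-in for TTS synthesis/playback.
        print(f"TTS speaking: {text}")

    async def greedy_parallel_tts():
        # TTS inside each branch: both completions get spoken, possibly over each other.
        async def branch(name: str):
            await fake_tts(await fake_llm(name))
        await asyncio.gather(branch("judge branch"), branch("response branch"))

    async def gated_serial_tts():
        # TTS after the gate: run branches in parallel, speak only the chosen one.
        results = await asyncio.gather(fake_llm("judge branch"), fake_llm("response branch"))
        chosen = results[1]  # placeholder selection; the real gate would use the judge's verdict
        await fake_tts(chosen)

    if __name__ == "__main__":
        asyncio.run(greedy_parallel_tts())
        asyncio.run(gated_serial_tts())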

Code under review:

        return isinstance(frame, SystemFrame)

    # Ignore system frames and frames that are not following the direction of this gate
    def _should_passthrough_frame(self, frame, direction):
        return isinstance(frame, SystemFrame) or direction != self._direction
A contributor replied:
Thanks! Yes, I remember doing this in the gated aggregator.
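
For readers who haven't seen the gated aggregator, here is a minimal, self-contained sketch of how a direction-aware gate might use a predicate like _should_passthrough_frame. The class, method names, and the push callback are illustrative stand-ins, not pipecat's actual processor API.

    from enum import Enum, auto

    class FrameDirection(Enum):
        DOWNSTREAM = auto()
        UPSTREAM = auto()

    class SystemFrame:
        """Stand-in for a system/control frame type."""

    class GatedFrameProcessor:
        """Illustrative gate: buffers frames until opened, but always passes
        through system frames and frames travelling against the gated direction."""

        def __init__(self, direction=FrameDirection.DOWNSTREAM):
            self._direction = direction
            self._gate_open = False
            self._buffer = []

        def _should_passthrough_frame(self, frame, direction):
            # Ignore system frames and frames that are not following the direction of this gate
            return isinstance(frame, SystemFrame) or direction != self._direction

        def process_frame(self, frame, direction, push):
            # 'push' stands in for forwarding a frame to the next processor.
            if self._gate_open or self._should_passthrough_frame(frame, direction):
                push(frame, direction)
            else:
                self._buffer.append((frame, direction))

        def open_gate(self, push):
            self._gate_open = True
            for frame, direction in self._buffer:
                push(frame, direction)
            self._buffer.clear()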

@manish-baghel commented Nov 8, 2024

@kwindla @aconchillo
Thanks for the great example.
Just curious: have you thought about utilizing InterimTranscriptionFrames from the STT and feeding them into a completeness pipeline that directly pushes frames containing the complete/identified phrase, instead of outputting YES/NO?
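
To sketch what that might look like: a processor that consumes interim transcriptions and, once it judges the phrase complete, pushes a frame carrying the whole phrase downstream rather than a YES/NO answer. The frame classes and the _is_complete heuristic below are self-contained stand-ins, not pipecat's actual types or anything from this PR.

    from dataclasses import dataclass

    @dataclass
    class InterimTranscriptionFrame:
        text: str

    @dataclass
    class CompletedPhraseFrame:
        text: str

    class PhraseCompletenessProcessor:
        """Accumulates interim transcription text and emits one frame with the
        identified phrase when a (placeholder) completeness check passes."""

        def __init__(self, push_frame):
            self._push_frame = push_frame
            self._current_text = ""

        def _is_complete(self, text: str) -> bool:
            # Placeholder heuristic; the real check could be an LLM or a classifier.
            return text.rstrip().endswith((".", "?", "!"))

        def process_frame(self, frame):
            if isinstance(frame, InterimTranscriptionFrame):
                self._current_text = frame.text
                if self._is_complete(self._current_text):
                    self._push_frame(CompletedPhraseFrame(text=self._current_text))
                    self._current_text = ""
            else:
                self._push_frame(frame)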

@markbackman (Contributor) left a comment:

This is a nice improvement upon the work @aconchillo did. I think we should merge it and continue to improve the functionality and prompting as we learn more.

@kwindla merged commit 534f710 into main on Nov 14, 2024. 3 checks passed.
@kwindla deleted the khk/natural-conversation branch on November 14, 2024 at 17:15.