[DEPRIORITIZED][AAQ-765] Retry LLM generation when AlignScore fails #399
base: main
Conversation
    asession=asession,
    exclude_archived=True,
)
response.debug_info["past_failure"] = failure_reason
Added that so that, if it works after the second try, we can understand why it failed the first time.
@@ -307,6 +320,39 @@ async def search_base(
    return response


def is_unable_to_generate_response(response: QueryResponse) -> bool:
Added this function so that we retry only if that condition is met.
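For context, the predicate could look roughly like the sketch below. This is only an illustration: it assumes the error response type is QueryResponseError and that a low alignment score is signalled via an error_type field, which may not match the PR's exact field names.

def is_unable_to_generate_response(response: QueryResponse) -> bool:
    """Tell the retry decorator whether generation should be re-run.

    Sketch only: QueryResponse / QueryResponseError are the project's
    response schemas, and the "alignment_too_low" value is an assumption.
    """
    return (
        isinstance(response, QueryResponseError)
        and response.error_type == "alignment_too_low"
    )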
@backoff.on_predicate(
    backoff.expo,
What backoff.expo does is basically wait a little longer each time the function is rerun, increasing exponentially, just to handle the load better.
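To illustrate the behaviour on a toy function (a sketch, not the PR's code): with backoff.expo the waits between attempts grow roughly as 1s, 2s, 4s, and so on, and retries stop once the predicate is no longer true or max_tries is reached.

import random
from typing import Optional

import backoff


@backoff.on_predicate(
    backoff.expo,                    # wait ~1s, 2s, 4s, ... between attempts
    lambda result: result is None,   # retry while the result is still unusable
    max_tries=3,
)
def flaky_lookup() -> Optional[int]:
    """Toy stand-in for a call that sometimes produces an unusable result."""
    value = random.randint(0, 9)
    return value if value > 5 else None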
Should we just have logic that retries once, instead of adding a config (num retries) that we don't know if we'll use 🤔?
I guess it depends on how useful the approach is, since we haven't done any analysis to see how well it works.
But personally I think that since it doesn't add a dependency (backoff is already used by litellm), and since the only code we would change if we retried just once is the decorator and the config variable, the cost is pretty low, so we can just keep it.
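For comparison, the retry-once alternative under discussion would amount to something like the sketch below (the wrapper name is hypothetical and the call signature is simplified; search_base takes more arguments than shown here).

async def search_with_single_retry(query):  # hypothetical wrapper, simplified signature
    response = await search_base(query)
    if is_unable_to_generate_response(response):
        # One manual retry: no backoff dependency, no retry-count config.
        response = await search_base(query)
    return response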
@@ -177,6 +177,11 @@ def get_prompt(cls) -> str:
You are a helpful question-answering AI. You understand user question and answer their \
question using the REFERENCE TEXT below.
"""
RETRY_PROMPT_SUFFIX = """
Added a suffix to the prompt to incorporate the failure reason.
Thanks Carlos! I think we need to think about and validate the prompt a bit. Should we discuss it in the next tech session?
@@ -37,7 +37,14 @@ async def get_llm_rag_answer(
    """

    metadata = metadata or {}
    prompt = RAG.prompt.format(context=context, original_language=original_language)
    if "failure_reason" in metadata and metadata["failure_reason"]:
How about we create a new arg, "retry=False"?
The downside is:
- We would have to create it for all the parent functions, and
- We need both is_retry and metadata["failure_reason"] to actually do the retry.

But I think it would be easier to understand the code, and we won't be hiding any unexpected actions! What do you think?
Something like:
if is_retry:
    if "failure_reason" not in metadata:
        raise ValueError("failure_reason is required for retry requests")
    prompt = RAG.retry_prompt.format(
        context=context,
        original_language=original_language,
        failure_reason=metadata["failure_reason"],
    )
My initial understanding was that we are using this to try out the functionality. What if we keep it like this while testing, and if it turns out to be something we want to keep, then we will explicitly set it up as a feature by adding the is_retry parameter. What do you think?
RETRY_PROMPT_SUFFIX = """
If the response above is not aligned with the question, please rectify this by \
considering the following reason(s) for misalignment: "{failure_reason}".
Make necessary adjustments to ensure the answer is aligned with the question.
"""
Right now, we are only passing failure_reason, which is response.debug_info["factual_consistency"]["reason"], but we should also include the LLM response in this prompt.
Also, shouldn't the prompt define what we mean by alignment?
That makes sense. To be honest, I was just having a go at updating the prompt to take the output into consideration. I am not exactly an expert in prompt engineering. Should we discuss that in a tech session?
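As a starting point for that discussion, a revision along the lines of both suggestions above could look something like the sketch below. The {previous_response} placeholder, the RETRY_PROMPT_SUFFIX_V2 name, and the wording are assumptions for illustration, not part of this PR.

# Hypothetical revision, for discussion only.
RETRY_PROMPT_SUFFIX_V2 = """
Your previous answer was:
"{previous_response}"

It was flagged as not aligned with the REFERENCE TEXT, meaning it makes claims \
that the REFERENCE TEXT does not support. The reason given was: "{failure_reason}".
Rewrite the answer using only information found in the REFERENCE TEXT.
"""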
3858453 to a5a0ac9
Reviewer: @amiraliemami
Estimate: 40mins
Ticket
Fixes: AAQ-675
Description
Goal
The goal of this PR is to allow retrying the LLM response generation up to N times (default is 0) when AlignScore fails because of a low score.
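As a sketch of what reading that retry count from configuration could look like (the variable and environment names here are hypothetical, not necessarily what this PR uses):

import os

# Hypothetical setting: number of extra LLM generation attempts allowed
# when the AlignScore check fails. A default of 0 keeps today's behaviour.
LLM_ALIGNMENT_RETRIES = int(os.environ.get("LLM_ALIGNMENT_RETRIES", "0"))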
Changes
The following changes have been made:
- Retry the LLM generation when the response is a QueryResponseError and the error is a low alignment score. Also allowed adding the previous failure reason in response.debug_info["past_failure"].
Future Tasks (optional)
How has this been tested?
Testing this is tricky because, for this change to be observed, we need the LLM response to work but AlignScore to fail, and finding such cases is not straightforward.
It was tested in two ways:
- First way: set ALIGN_SCORE_THRESHOLD to an unrealistic score (for example 1.5) so that AlignScore fails, then call /search with generate_llm_response set to true, using a content and a question relevant to that content (see the request sketch after this list). An example is content: "Here we are going to talk about pineapples because of their pine shapes and the apple-like taste" and question: "Are apples related to pineapples?".
- Second way: still set ALIGN_SCORE_THRESHOLD to a value > 1, but add logic in the code so that the value of ALIGN_SCORE_THRESHOLD is reduced to a reasonable value every time the LLM response is regenerated, making sure that after the second retry the AlignScore check passes. So, in the second run, ALIGN_SCORE_THRESHOLD should be less than 0.8. This approach is not straightforward; I am open to more efficient ways of testing this feature.
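For reference, a request for the first approach might look like the sketch below. The URL, auth header, and payload fields are assumptions about the local deployment rather than something this PR defines.

import requests

# Assumed local endpoint and payload shape; adjust to your deployment.
resp = requests.post(
    "http://localhost:8000/search",
    headers={"Authorization": "Bearer <api-key>"},
    json={
        "query_text": "Are apples related to pineapples?",
        "generate_llm_response": True,
    },
    timeout=60,
)
# With ALIGN_SCORE_THRESHOLD set to 1.5, a retried response should carry
# the previous failure reason in debug_info["past_failure"].
print(resp.json().get("debug_info", {}).get("past_failure"))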
Checklist
Fill with x for completed. (Delete any items below that are not relevant.)
scripts/
.github/workflows/