feat: Align configuration for inference and evaluation #61
base: main
Branch updated from dbc16b0 to 0d21062.
Branch updated from 0d21062 to 32f496e.
Partial review of the config modifications only. I chose to focus on this first as it will affect the remainder of the PR as well. My goal is to minimise what we add to the config dataclass and keep it as simple as we can. Every parameter should be self-explanatory.
src/raglite/_config.py (outdated)
@@ -53,6 +67,12 @@ class RAGLiteConfig:
        ),
        compare=False,  # Exclude the reranker from comparison to avoid lru_cache misses.
    )
    search_method: "SearchMethod" = field(default_factory=_default_search_method, compare=False)
    system_prompt: str | None = None
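The `compare=False` on the reranker field matters because the config is passed to `lru_cache`-decorated functions. As a minimal sketch, assuming a frozen dataclass and using a hypothetical `ToyConfig` and `expensive_setup` in place of the real `RAGLiteConfig` and cached helpers: a field declared with `compare=False` is excluded from both the generated `__eq__` and `__hash__`, so two configs that differ only in that field share one cache entry.

```python
from dataclasses import dataclass, field
from functools import lru_cache


@dataclass(frozen=True)
class ToyConfig:
    """Hypothetical stand-in for RAGLiteConfig, for illustration only."""

    db_url: str = "sqlite:///raglite.db"
    num_chunks: int = 5
    # Excluded from the generated __eq__ and __hash__, like the reranker field.
    reranker: object = field(default=None, compare=False)


CALLS = 0


@lru_cache(maxsize=None)
def expensive_setup(config: ToyConfig) -> str:
    """Pretend to do costly work keyed on the config, e.g. creating a DB engine."""
    global CALLS
    CALLS += 1
    return f"engine for {config.db_url}"


a = ToyConfig(reranker=object())
b = ToyConfig(reranker=object())  # A different reranker instance.
print(a == b, hash(a) == hash(b))  # True True: the reranker is ignored
expensive_setup(a)
expensive_setup(b)
print(CALLS)  # 1: the second call is a cache hit
```

Without `compare=False`, `a == b` would fall back to comparing the two distinct reranker objects by identity, and every freshly built config would miss the cache.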
Which system prompt is this? Can we leave it out of the config? I don't believe we use one for RAG currently.
It is the one for the specific use case. It contains information about the assistant's role, language, style, etc. I think it is important to keep it here and make it available to the evaluation, as it contains valuable information. My aim is also that, just by modifying the Config, one could switch from one use case to another without modifying anything else.
In other words, everything that could be modified to improve performance should be in the Config (num_chunks, search method and prompts).
I am using it like this:
messages = [
{"role": "system", "content": config.system_prompt},
*history,
create_rag_instruction(
user_prompt=user_prompt,
context=chunk_spans,
config=config,
),
]
I have the feeling that we may end up with many system prompts in the future, and then it will be difficult to distinguish between them. Can we solve this with the same partial trick?
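One possible reading of the partial trick, sketched under assumptions: a hypothetical generic `system_prompt_template` function takes the use-case specifics as keyword arguments, and each use case stores a `functools.partial` of it in the config, with `compare=False` as for the reranker. Because a `partial` keeps its `.func` and `.keywords`, different system prompts stay distinguishable and inspectable instead of being opaque strings. The field names and template text are illustrative, not raglite's API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from functools import partial
from typing import Callable


def system_prompt_template(*, role: str, language: str, style: str = "concise") -> str:
    """Hypothetical generic system prompt; each use case fills in the blanks."""
    return f"You are {role}. Always answer in {language}. Keep your answers {style}."


@dataclass(frozen=True)
class Config:
    """Hypothetical config that stores a prompt factory instead of a raw string."""

    num_chunks: int = 5
    system_prompt: Callable[[], str] = field(
        default_factory=lambda: partial(
            system_prompt_template, role="a helpful assistant", language="English"
        ),
        compare=False,  # Like the reranker: kept out of __eq__ and __hash__.
    )


dutch_support = Config(
    system_prompt=partial(system_prompt_template, role="a customer support agent", language="Dutch"),
)
print(dutch_support.system_prompt())
# The partial records how the prompt was built, so prompts stay distinguishable.
print(dutch_support.system_prompt.func.__name__, dutch_support.system_prompt.keywords)
```

One trade-off: with `compare=False`, `Config()` and `dutch_support` compare equal, which suits caches keyed on connection settings but would be wrong for any cache that must distinguish prompts.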
I don't know how to use the same trick here (any suggestions?). I don't think we should put the assistant instructions (role, language, tone, format, examples, etc.) in the RAG instructions, as they are immutable. One option would be to leave the system prompt outside of the configuration as an application-specific feature, but then the evaluation would need to handle the system prompt itself in both the answer-generation and evaluation phases.
Maybe we could configure create_rag_instruction and create_system_prompt as (partial) functions. (I would call them system_prompt and user_prompt,
To be clear: I think the evaluation pipeline should use the same system prompt that will be used during inference. For example, if the system_prompt says that all answers should be in Dutch, the answer generation for evaluation should take that into account too.
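A hedged sketch of that requirement: inference and evaluation answer generation share one hypothetical `build_messages` helper driven by the same config, so whatever the system prompt demands (answering in Dutch, say) carries over to the evaluation answers automatically. `Config`, `answer`, `generate_eval_answers` and the `llm` callable are placeholders for illustration, not raglite's actual evaluation API; `fake_llm` only echoes the system prompt it received.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

Message = dict[str, str]


@dataclass(frozen=True)
class Config:
    """Hypothetical config: the same object drives inference and evaluation."""

    system_prompt: str | None = None


def build_messages(config: Config, question: str, context: list[str]) -> list[Message]:
    """Shared prompt assembly used by both inference and evaluation."""
    messages: list[Message] = []
    # The field defaults to None, so only send a system message when one is set.
    if config.system_prompt is not None:
        messages.append({"role": "system", "content": config.system_prompt})
    joined = "\n\n".join(context)
    messages.append({"role": "user", "content": f"Context:\n{joined}\n\nQuestion: {question}"})
    return messages


def answer(config: Config, question: str, context: list[str], llm: Callable[[list[Message]], str]) -> str:
    """Inference path."""
    return llm(build_messages(config, question, context))


def generate_eval_answers(
    config: Config,
    eval_set: list[tuple[str, list[str]]],
    llm: Callable[[list[Message]], str],
) -> list[str]:
    """Evaluation path: reuses answer(), hence the same system prompt."""
    return [answer(config, question, context, llm) for question, context in eval_set]


def fake_llm(messages: list[Message]) -> str:
    """Placeholder LLM that reports which system prompt it received."""
    system = next((m["content"] for m in messages if m["role"] == "system"), "<none>")
    return f"[{system}] answer"


config = Config(system_prompt="Always answer in Dutch.")
print(generate_eval_answers(config, [("What is RAG?", ["RAG combines retrieval and generation."])], fake_llm))
```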
Another thing I want is to configure everything in a single configuration class. That way, switching between different versions or use cases becomes trivial.
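A minimal sketch of the single-class idea, assuming a hypothetical frozen `UseCaseConfig` that mirrors the fields discussed in this thread (`num_chunks`, a search method, `system_prompt`); the string-valued `search_method` is a simplification of the callable used in the diff. Because the dataclass is frozen, a new use case is derived with `dataclasses.replace`, leaving the original untouched.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class UseCaseConfig:
    """Hypothetical config bundling every knob that affects answer quality."""

    num_chunks: int = 5
    search_method: str = "hybrid"  # Simplified: a name instead of a callable.
    system_prompt: str | None = None


support_nl = UseCaseConfig(
    num_chunks=8,
    system_prompt="You are a helpful support assistant. Always answer in Dutch.",
)
# Derive another use case by changing only the config; no other code changes.
legal_en = replace(support_nl, num_chunks=3, system_prompt="You are a precise legal assistant.")

print(support_nl.num_chunks, legal_en.num_chunks)  # 8 3
print(legal_en.search_method)  # hybrid, inherited unchanged
```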
Finally (a long shot, maybe), having access to the system_prompt, which describes the particular use case, could be useful for other RAG phases. We could leverage the system prompt to augment the chunks with contextual information, hypothetical questions, keywords, etc.
TL;DR: I am willing to change how it is configured (partial or another method), but I think that it should be included in the configuration.
Big refactoring to prevent cyclic dependencies. I'm not fully convinced about the interface yet. In particular, I don't like
Branch updated from f1ef291 to 8e6436e.
Ensure that evaluation can be configured using the same configuration as inference. #60