feat: Align configuration for inference and evaluation #61

Open
wants to merge 11 commits into
base: main
from

Conversation

undo76
Contributor

@undo76 undo76 commented Dec 9, 2024

Ensure that evaluation can be configured using the same configuration as inference.

#60

@undo76 undo76 force-pushed the feat/ms-shared-configuration branch from dbc16b0 to 0d21062 Compare December 9, 2024 14:13
@undo76 undo76 force-pushed the feat/ms-shared-configuration branch from 0d21062 to 32f496e Compare December 9, 2024 14:16
Member

@lsorber lsorber left a comment

Partial review of the config modifications only. I chose to focus on these first because they will affect the remainder of the PR as well. My goal is to minimise what we add to the config dataclass and keep it as simple as we can. Every parameter should be self-explanatory.

src/raglite/_config.py (outdated)
@@ -53,6 +67,12 @@ class RAGLiteConfig:
),
compare=False, # Exclude the reranker from comparison to avoid lru_cache misses.
)
search_method: "SearchMethod" = field(default_factory=_default_search_method, compare=False)
system_prompt: str | None = None
Member

Which system prompt is this? Can we leave it out of the config? I don't believe we use one for RAG currently.

Contributor Author

@undo76 undo76 Dec 10, 2024

It is the one for the specific use case. It contains information about the assistant's role, language, style, etc. I think it is important to keep it here and make it available to the evaluation, as it contains valuable information. My aim is also that, just by modifying the Config, one could switch from one use case to another without modifying anything else.

In other words, everything that could be modified to improve performance should be in Config (num_chunks, search method, and prompts).

I am using it like this:

    messages = [
        {"role": "system", "content": config.system_prompt},
        *history,
        create_rag_instruction(
            user_prompt=user_prompt,
            context=chunk_spans,
            config=config,
        ),
    ]

Member

I have the feeling that we could end up with many system prompts in the future, and then it will be difficult to distinguish between them. Can we solve this with the same partial trick?

Contributor Author

I don't know how to use the same trick (any suggestions?). I don't think we should put the assistant instructions (role, language, tone, format, examples, etc.) in the RAG instructions, as they are immutable. One option would be to leave the system prompt out of the configuration as an application-specific feature, but then the evaluation would need to take care of the system prompt itself in both the answer generation and evaluation phases.

Maybe we could configure create_rag_instruction and create_system_prompt as (partial) functions. (I would call them system_prompt and user_prompt.)
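A rough sketch of what I mean, under assumptions: the signatures of create_system_prompt and create_rag_instruction below are simplified stand-ins, not the ones in the code base, and the Config here is a toy dataclass with only the two prompt fields. The fields hold callables whose use-case settings are already bound with functools.partial.

```python
from dataclasses import dataclass, field
from functools import partial
from typing import Callable

Message = dict[str, str]


def create_system_prompt(*, role: str, language: str) -> Message:
    """Hypothetical helper: build the system message for one use case."""
    return {"role": "system", "content": f"You are {role}. Always answer in {language}."}


def create_rag_instruction(user_prompt: str, context: list[str], *, template: str) -> Message:
    """Simplified stand-in for create_rag_instruction (not the real signature)."""
    content = template.format(context="\n".join(context), user_prompt=user_prompt)
    return {"role": "user", "content": content}


# Assumed template, for illustration only.
DEFAULT_TEMPLATE = "Context:\n{context}\n\nQuestion: {user_prompt}"


@dataclass(frozen=True)
class Config:
    # Both fields hold callables with their use-case settings already bound.
    # compare=False mirrors how the other callable-like fields are declared.
    system_prompt: Callable[[], Message] = field(
        default_factory=lambda: partial(
            create_system_prompt, role="a helpful assistant", language="English"
        ),
        compare=False,
    )
    user_prompt: Callable[[str, list[str]], Message] = field(
        default_factory=lambda: partial(create_rag_instruction, template=DEFAULT_TEMPLATE),
        compare=False,
    )


def build_messages(question: str, context: list[str], config: Config) -> list[Message]:
    """Assemble the message list from whatever the config has bound."""
    return [config.system_prompt(), config.user_prompt(question, context)]


# Switching use case only touches the config, not the calling code.
dutch = Config(
    system_prompt=partial(create_system_prompt, role="a support agent", language="Dutch")
)
messages = build_messages("What is RAG?", ["RAG combines retrieval with generation."], dutch)
```

With this shape the config only exposes two prompt-related fields, however many settings each prompt needs internally.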

To be clear: I think the evaluation pipeline should take into account the same system prompt that is going to be used during inference. Imagine that the system_prompt says that all answers should be in Dutch. The answer generation for evaluation should then take this into account as well.

Another thing I want is to configure everything in a single configuration class. This way switching different versions or use cases becomes trivial.

Finally (a long shot, maybe), having access to the system_prompt, which describes the particular use case, could be useful for other RAG phases. We could leverage the system prompt to augment the chunks with contextual information, hypothetical questions, keywords, etc.

TL;DR: I am willing to change how it is configured (partial or another method), but I think that it should be included in the configuration.

src/raglite/_config.py (outdated)
@undo76
Contributor Author

undo76 commented Dec 12, 2024

Big refactoring to prevent cyclic dependencies. I'm not fully convinced by the interface yet; in particular I don't like config.retrieval, but it is taking shape. Another thing I don't like is that it is not possible to execute the different phases separately.

@undo76 undo76 force-pushed the feat/ms-shared-configuration branch from f1ef291 to 8e6436e Compare December 13, 2024 09:32
2 participants