feat: Align configuration for inference and evaluation #61

undo76 · 2024-12-09T14:10:13Z

Ensure that evaluation can be configured using the same configuration as inference.

lsorber

Partial review on the config modifications only. I chose to focus on this first as it will affect the remainder of the PR as well. My goal is to minimise what we add to the config dataclass. I'd like to keep it as simple as we can make it. Every parameter should be self-explanatory.

src/raglite/_config.py

lsorber · 2024-12-10T15:10:12Z

@@ -53,6 +67,12 @@ class RAGLiteConfig:
        ),
        compare=False,  # Exclude the reranker from comparison to avoid lru_cache misses.
    )
+    search_method: "SearchMethod" = field(default_factory=_default_search_method, compare=False)
+    system_prompt: str | None = None


Which system prompt is this? Can we leave it out of the config? I don't believe we use one for RAG currently.

It is the one for the specific use case. It contains information about the assistant's role, language, style, etc. I think it is important to keep it here and make it available to the evaluation, as it contains valuable information. Also, my aim is that just by modifying the Config one could switch from one use case to another without modifying anything else.

In other words, everything that could be modified to improve performance should be in Config (num_chunks, search method, and prompts).

I am using it like this:

messages = [
    {"role": "system", "content": config.system_prompt},
    *history,
    create_rag_instruction(
        user_prompt=user_prompt,
        context=chunk_spans,
        config=config,
    ),
]
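Since the diff declares `system_prompt: str | None = None`, the snippet above would send a system message with `None` as its content whenever no prompt is configured. A minimal sketch of a guarded variant follows. `DemoConfig` and `build_messages` are hypothetical names used only for illustration, and a plain user message stands in for the `create_rag_instruction` call.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DemoConfig:
    # Hypothetical stand-in for RAGLiteConfig; only the field under discussion.
    system_prompt: str | None = None


def build_messages(
    user_prompt: str,
    history: list[dict[str, str]],
    config: DemoConfig,
) -> list[dict[str, str]]:
    """Assemble chat messages, omitting the system message when none is configured."""
    messages: list[dict[str, str]] = []
    if config.system_prompt is not None:
        messages.append({"role": "system", "content": config.system_prompt})
    messages.extend(history)
    # The PR's code would call create_rag_instruction(...) here instead.
    messages.append({"role": "user", "content": user_prompt})
    return messages
```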

I have the feeling that we'll end up with many system prompts in the future though, and then it will be difficult to distinguish between them. Can we solve this with the same partial trick?

I don't know how to use the same trick (any suggestions?). I don't think that we should use the assistant instructions (role, language, tone, format, examples, etc.) in the RAG instructions, as they are immutable. One option would be to leave it outside of the configuration as an application-specific feature, but then the evaluation would need to take care of the system prompt for the answer and evaluation phases.

Maybe we could configure create_rag_instruction and create_system_prompt as (partial) functions. (I would call them system_prompt and user_prompt.)
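One way to read this suggestion, sketched with `functools.partial`: the config holds two callables, and a use case is selected by binding arguments to them up front. Everything below is an assumption made for illustration. `DemoConfig`, `DEFAULT_TEMPLATE`, and the signatures of `create_system_prompt` and `create_rag_instruction` are hypothetical and are not RAGLite's actual API.

```python
from __future__ import annotations

from dataclasses import dataclass
from functools import partial
from typing import Callable

# Hypothetical template for the RAG user message.
DEFAULT_TEMPLATE = "Context:\n{context}\n\nQuestion: {user_prompt}"


def create_system_prompt(role: str, language: str) -> str:
    """Hypothetical builder for the assistant's system prompt."""
    return f"You are {role}. Always answer in {language}."


def create_rag_instruction(user_prompt: str, context: list[str], template: str) -> str:
    """Hypothetical builder for the RAG user message."""
    return template.format(context="\n".join(context), user_prompt=user_prompt)


@dataclass(frozen=True)
class DemoConfig:
    # Both prompts are callables; use-case specifics are bound with partial.
    system_prompt: Callable[[], str]
    user_prompt: Callable[..., str]


dutch_support = DemoConfig(
    system_prompt=partial(create_system_prompt, role="a support agent", language="Dutch"),
    user_prompt=partial(create_rag_instruction, template=DEFAULT_TEMPLATE),
)

messages = [
    {"role": "system", "content": dutch_support.system_prompt()},
    {
        "role": "user",
        "content": dutch_support.user_prompt(
            user_prompt="Hoe reset ik mijn wachtwoord?",
            context=["Chunk A", "Chunk B"],
        ),
    },
]
```

With this shape the callers never see the use-case details: they call `config.system_prompt()` and `config.user_prompt(...)`, and a different use case only changes the arguments bound in the config.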

To be clear: I think the evaluation pipeline should take into account the same system prompt that is going to be used during inference. Imagine that the system_prompt says that all answers should be in Dutch: this information should be taken into account by the answer generation for evaluation.

Another thing I want is to configure everything in a single configuration class. This way, switching between different versions or use cases becomes trivial.
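A rough sketch of what this could look like: inference and evaluation both receive the same config object, and a new use case is derived with `dataclasses.replace`. The names `DemoConfig`, `answer`, `evaluate`, and the `LLM` callable type are hypothetical; retrieval and scoring are left out so that only the shared prompt handling is shown.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Callable

# Hypothetical LLM client type: takes chat messages, returns the reply text.
LLM = Callable[[list[dict[str, str]]], str]


@dataclass(frozen=True)
class DemoConfig:
    # Hypothetical subset of the tunable settings mentioned in the thread.
    num_chunks: int = 5
    search_method: str = "hybrid"
    system_prompt: str | None = None


def answer(question: str, config: DemoConfig, llm: LLM) -> str:
    """Inference path (retrieval elided): applies the configured system prompt."""
    messages: list[dict[str, str]] = []
    if config.system_prompt is not None:
        messages.append({"role": "system", "content": config.system_prompt})
    messages.append({"role": "user", "content": question})
    return llm(messages)


def evaluate(questions: list[str], config: DemoConfig, llm: LLM) -> list[str]:
    """Evaluation path: generates answers through the same function as inference."""
    return [answer(question, config, llm) for question in questions]


base = DemoConfig()
# Switching use case touches only the config, not the pipeline code.
dutch = replace(base, system_prompt="Answer in Dutch.", num_chunks=10)
```

Because `evaluate` calls `answer` with the same config, an instruction such as "Answer in Dutch." reaches the evaluation answers without any extra plumbing.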

Finally (a long shot, maybe), having access to the system_prompt, which describes the particular use case, could be useful for other RAG phases. We could leverage the system prompt to augment the chunks with contextual information, hypothetical questions, keywords, etc.

TL;DR: I am willing to change how it is configured (partial or another method), but I think that it should be included in the configuration.

src/raglite/_config.py

undo76 · 2024-12-12T21:37:21Z

Big refactoring to prevent cyclic dependencies. I'm not fully convinced about the interface yet. In particular, I don't like config.retrieval, but it is taking shape. Another thing I don't like is that it is not possible to execute the different phases separately.

undo76 force-pushed the feat/ms-shared-configuration branch from dbc16b0 to 0d21062 Compare December 9, 2024 14:13

feat: Align configuration for inference and evaluation

32f496e

undo76 force-pushed the feat/ms-shared-configuration branch from 0d21062 to 32f496e Compare December 9, 2024 14:16

undo76 added 3 commits December 10, 2024 12:01

fix: Failing test

e60009f

fix: Failing test (oversample in search)

4f93902

fix: Failing test (oversample in search)

cf6efec

lsorber requested changes Dec 10, 2024

undo76 added 4 commits December 10, 2024 17:30

fix: Failing test (oversample in search)

605307d

fix: Conditional parameters in pgvector.

93e0495

fix: Refactoring

6002da1

fix: Rerank test

501063b

undo76 added 2 commits December 12, 2024 23:01

fix: Fix text and refactor hybrid search

2a14928

fix: Fix text and refactor hybrid search

8e6436e

undo76 force-pushed the feat/ms-shared-configuration branch from f1ef291 to 8e6436e Compare December 13, 2024 09:32

fix: Move functions and prompts out of config.

c83c586
