[rag evals][1/n] refactor base scoring fn & data schema check #664

Open · wants to merge 8 commits into main
Conversation

@yanxi0830 (Contributor) commented Dec 19, 2024

What does this PR do?

  • Refactor BaseScoringFn into a minimal interface; add a new RegistrableBaseScoring (see the sketch after this list).
  • Refactor the data schema check.
    • To evaluate the retrieval component of RAG separately, we will need scoring functions that additionally require a "context" column.
  • Refactor the braintrust eval (more scoring functions will be added and tested in a follow-up PR).
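
A rough sketch of the intended shape (class, method, and column names beyond BaseScoringFn are illustrative assumptions, not necessarily the exact code in this diff):

from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional


class BaseScoringFn(ABC):
    """Minimal interface: a scoring fn only has to turn a dataset row into a score."""

    @abstractmethod
    async def score_row(
        self, input_row: Dict[str, Any], scoring_fn_identifier: Optional[str] = None
    ) -> Dict[str, Any]: ...


class RegisteredBaseScoringFn(BaseScoringFn):
    """Adds scoring-fn-def registration on top of the minimal interface
    (hypothetical sketch of the registrable variant)."""

    def __init__(self) -> None:
        self.supported_fn_defs_registry: Dict[str, Any] = {}

    def register_scoring_fn_def(self, scoring_fn_def: Any) -> None:
        self.supported_fn_defs_registry[scoring_fn_def.identifier] = scoring_fn_def


# Data schema check: retrieval scoring for RAG additionally expects a "context"
# column on top of the usual generation columns (column names are assumptions).
GENERATION_COLUMNS: List[str] = ["input_query", "generated_answer", "expected_answer"]
RETRIEVAL_COLUMNS: List[str] = GENERATION_COLUMNS + ["context"]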

Test Plan

pytest -v -s -m llm_as_judge_scoring_together_inference scoring/test_scoring.py --judge-model meta-llama/Llama-3.2-3B-Instruct
pytest -v -s -m basic_scoring_together_inference scoring/test_scoring.py
pytest -v -s -m braintrust_scoring_together_inference scoring/test_scoring.py
pytest -v -s -m meta_reference_eval_together_inference eval/test_eval.py
pytest -v -s -m meta_reference_eval_together_inference_huggingface_datasetio eval/test_eval.py

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Ran pre-commit to handle lint / formatting issues.
  • Read the contributor guideline (Pull Request section).
  • Updated relevant documentation.
  • Wrote necessary unit or integration tests.

@facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) Dec 19, 2024
@yanxi0830 changed the title from "[rag eval][1/n] refactor base scoring fn & add more braintrust evaluators" to "[rag eval][1/n] refactor base scoring fn & data schema check" Dec 19, 2024
@yanxi0830 marked this pull request as ready for review December 20, 2024 00:10
        self.validate_row_schema(row_schema, self.get_expected_schema_for_eval())

    def get_expected_schema_for_scoring(self):
        return [
@yanxi0830 (Contributor, Author):

We will likely need to revisit expected data schema when evaluating for tools.

Reviewer (Contributor):

It's better for scoring / evals / post_training to each define their own schema for what they want the dataset to obey. The actual checking of the schema can be done in datasetio.
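
For example, the check could be a plain helper each consumer calls with its own expected columns (an illustrative sketch, not the API proposed here):

from typing import Any, Dict, List


def validate_row_schema(input_row: Dict[str, Any], expected_schemas: List[List[str]]) -> None:
    """Accept the row if it contains all columns of any one expected schema."""
    for columns in expected_schemas:
        if all(col in input_row for col in columns):
            return
    raise ValueError(
        f"Row columns {sorted(input_row)} match none of the expected schemas {expected_schemas}"
    )


# scoring and eval would each pass their own expectation, e.g.:
# validate_row_schema(row, [["input_query", "generated_answer", "expected_answer"]])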

@yanxi0830 changed the title from "[rag eval][1/n] refactor base scoring fn & data schema check" to "[rag evals][1/n] refactor base scoring fn & data schema check" Dec 20, 2024
@@ -13,12 +13,51 @@

class BaseScoringFn(ABC):
Reviewer (Contributor):

I still don't understand why we have these base classes, because we have already declared what our impls need to do, in terms of datatypes, in our APIs; the datatype for a scoring function already exists. Say I am implementing a new scoring function -- why do I need another base class to inherit from? If there are utilities I need for implementing the functions, they would just be utils / free functions or, in the worst case, some mixins.

Can you explain the need for the base classes, please? I am very allergic to inheritance, as you and @raghotham know :)

@yanxi0830 (Contributor, Author):

The class BaseScoringFn(ABC): is mostly separated out based on feedback from Tejas, for his use case without registration (syncing with you offline).

For our llama-stack implementations, I agree we don't need this separate BaseScoringFn and could just use RegisteredBaseScoringFn as a mixin.
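
(Roughly like this, an illustrative sketch of the mixin-style usage, assuming the class names from the sketch above:)

from typing import Any, Dict, Optional


class SubsetOfScoringFn(RegisteredBaseScoringFn):
    """Concrete scoring fn: only implements score_row; registration behavior
    comes from the registered base used as a mixin (hypothetical example)."""

    async def score_row(
        self, input_row: Dict[str, Any], scoring_fn_identifier: Optional[str] = None
    ) -> Dict[str, Any]:
        match = input_row["expected_answer"] in input_row["generated_answer"]
        return {"score": 1.0 if match else 0.0}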

@@ -0,0 +1,93 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
Reviewer (Contributor):

Can datasetio abstract schema validation instead of creating mixins to be used by scoring and evals?

Labels: CLA Signed (managed by the Meta Open Source bot)
Projects: None yet

4 participants