[rag evals][1/n] refactor base scoring fn & data schema check #664

Open · wants to merge 8 commits into main
Conversation

@yanxi0830 (Contributor) commented Dec 19, 2024

What does this PR do?

  • Refactor BaseScoringFn into a minimal interface; add a new RegistrableBaseScoring (see the sketch after this list).
  • Refactor the data schema check.
    • To evaluate the retrieval component of RAG separately, we will need scoring functions that additionally require a "context" column.
  • Refactor the braintrust eval (more scoring functions will be added and tested in a follow-up PR).
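
A rough sketch of the intended shape (class, method, and column names beyond BaseScoringFn are illustrative assumptions, not necessarily the exact code in this diff):

from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional


class BaseScoringFn(ABC):
    """Minimal interface: a scoring fn only has to turn a dataset row into a score."""

    @abstractmethod
    async def score_row(
        self, input_row: Dict[str, Any], scoring_fn_identifier: Optional[str] = None
    ) -> Dict[str, Any]: ...


class RegisteredBaseScoringFn(BaseScoringFn):
    """Adds scoring-fn-def registration on top of the minimal interface
    (hypothetical sketch of the registrable variant)."""

    def __init__(self) -> None:
        self.supported_fn_defs_registry: Dict[str, Any] = {}

    def register_scoring_fn_def(self, scoring_fn_def: Any) -> None:
        self.supported_fn_defs_registry[scoring_fn_def.identifier] = scoring_fn_def


# Data schema check: retrieval scoring for RAG additionally expects a "context"
# column on top of the usual generation columns (column names are assumptions).
GENERATION_COLUMNS: List[str] = ["input_query", "generated_answer", "expected_answer"]
RETRIEVAL_COLUMNS: List[str] = GENERATION_COLUMNS + ["context"]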

Test Plan

pytest -v -s -m llm_as_judge_scoring_together_inference scoring/test_scoring.py --judge-model meta-llama/Llama-3.2-3B-Instruct
pytest -v -s -m basic_scoring_together_inference scoring/test_scoring.py
pytest -v -s -m braintrust_scoring_together_inference scoring/test_scoring.py
pytest -v -s -m meta_reference_eval_together_inference eval/test_eval.py
pytest -v -s -m meta_reference_eval_together_inference_huggingface_datasetio eval/test_eval.py

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Ran pre-commit to handle lint / formatting issues.
  • Read the contributor guideline (Pull Request section).
  • Updated relevant documentation.
  • Wrote necessary unit or integration tests.

@facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) Dec 19, 2024
@yanxi0830 changed the title from "[rag eval][1/n] refactor base scoring fn & add more braintrust evaluators" to "[rag eval][1/n] refactor base scoring fn & data schema check" Dec 19, 2024
@yanxi0830 marked this pull request as ready for review December 20, 2024 00:10
        self.validate_row_schema(row_schema, self.get_expected_schema_for_eval())

    def get_expected_schema_for_scoring(self):
        return [
@yanxi0830 (Contributor, Author):

We will likely need to revisit expected data schema when evaluating for tools.

Reviewer (Contributor):

It's better for scoring / evals / post_training to each define their own schema for what they want the dataset to obey. The actual checking of the schema can be done in datasetio.
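
For example, the check could be a plain helper each consumer calls with its own expected columns (an illustrative sketch, not the API proposed here):

from typing import Any, Dict, List


def validate_row_schema(input_row: Dict[str, Any], expected_schemas: List[List[str]]) -> None:
    """Accept the row if it contains all columns of any one expected schema."""
    for columns in expected_schemas:
        if all(col in input_row for col in columns):
            return
    raise ValueError(
        f"Row columns {sorted(input_row)} match none of the expected schemas {expected_schemas}"
    )


# scoring and eval would each pass their own expectation, e.g.:
# validate_row_schema(row, [["input_query", "generated_answer", "expected_answer"]])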

@yanxi0830 changed the title from "[rag eval][1/n] refactor base scoring fn & data schema check" to "[rag evals][1/n] refactor base scoring fn & data schema check" Dec 20, 2024
@@ -13,12 +13,51 @@

class BaseScoringFn(ABC):
Reviewer (Contributor):

I still don't understand why we have these base classes, because we have already declared what our impls need to do, in terms of datatypes, in our APIs; the datatype for a scoring function already exists. Say I am implementing a new scoring function -- why do I need another base class to inherit from? If there are utilities I need for implementing the functions, they would just be utils / free functions or, in the worst case, some mixins.

Can you explain the need for the base classes, please? I am very allergic to inheritance, as you and @raghotham know :)

@yanxi0830 (Contributor, Author):

The class BaseScoringFn(ABC): is mostly separated out based on feedback from Tejas, for his use case without registration (syncing with you offline).

For our llama-stack implementations, I agree we don't need this separate BaseScoringFn and could just use RegisteredBaseScoringFn as a mixin.
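
(Roughly like this, an illustrative sketch of the mixin-style usage, assuming the class names from the sketch above:)

from typing import Any, Dict, Optional


class SubsetOfScoringFn(RegisteredBaseScoringFn):
    """Concrete scoring fn: only implements score_row; registration behavior
    comes from the registered base used as a mixin (hypothetical example)."""

    async def score_row(
        self, input_row: Dict[str, Any], scoring_fn_identifier: Optional[str] = None
    ) -> Dict[str, Any]:
        match = input_row["expected_answer"] in input_row["generated_answer"]
        return {"score": 1.0 if match else 0.0}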

@@ -0,0 +1,93 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
Reviewer (Contributor):

Can datasetio abstract schema validation instead of creating mixins to be used by scoring and evals?

Labels: CLA Signed (managed by the Meta Open Source bot)
Projects: None yet

4 participants