How to supervise non-math data? #56

Open
Luodian opened this issue Jan 26, 2025 · 4 comments
Luodian commented Jan 26, 2025

It looks like the accuracy reward can only check numerical equality. What if my question is multiple-choice (MCQ) and the expected answer is an option letter?

I did a quick check and found that it doesn't work:

from math_verify import parse, verify

# Parse the gold and answer
# If you know that gold will only contain latex or expr (no latex env), use
# parse(gold, extraction_config=[LatexExtractionConfig()]) or parse(gold, extraction_config=[ExprExtractionConfig()])

gold = parse("So the answer is B")
answer = parse("B")

print(gold)
print(answer)
# Order here is important!
print(verify(gold, answer))


Output:

[]
[]
False

qgallouedec (Member) commented Jan 26, 2025

Might be solved by #55. Can you kindly verify?

Luodian (Author) commented Jan 26, 2025

Great, development is moving so fast.

hynky1999 (Collaborator) commented Jan 26, 2025

Hi @Luodian,
You are right, the code won't work for anything but math. The good news is that the code for extracting indices is already available in lighteval; I just haven't ported it to math-verify, to keep the domain concise. See it here:
https://github.com/huggingface/lighteval/pull/495/files#diff-66345b827d57f4adb7b6736115827dd6bbe9383372e3d10f1c2bbdd9992d538aR198

I think I'm OK with also adding it to math-verify. While it's not exactly math, it follows pretty much the same logic for extraction.

One thing we do need, however, is to know the targets we are going to search for, to prevent false positives/negatives during retrieval.
cc @qgallouedec, would it be possible to pass some task specification (type: math/code/mcq + n_choices) through kwargs?
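
To illustrate why the search targets matter, here is a minimal sketch (not the lighteval implementation) of an MCQ check that restricts extraction to the first `n_choices` letters. The names `extract_choice` and `accuracy_reward`, and the `task_type` / `n_choices` keyword arguments, are hypothetical and only mirror the task specification suggested above; the matching rules are deliberately simple.

```python
import re
import string
from typing import List, Optional


def extract_choice(text: str, n_choices: int) -> Optional[str]:
    """Return the choice letter found in `text`, or None (hypothetical helper)."""
    # Only the first n_choices letters are valid targets, e.g. "ABCD" for 4.
    letters = string.ascii_uppercase[:n_choices]

    # Case 1: the whole text is just the option, e.g. "B", "(B)" or "B."
    bare = re.fullmatch(rf"\(?([{letters}])\)?\.?", text.strip())
    if bare:
        return bare.group(1)

    # Case 2: an explicit "answer is X" / "answer: X" phrase; take the last one.
    # The letter itself stays case-sensitive so the article "a" is not matched.
    explicit = re.findall(
        rf"(?i:answer)\s*(?:is|:)\s*\(?([{letters}])\)?(?![A-Za-z])", text
    )
    return explicit[-1] if explicit else None


def accuracy_reward(
    completions: List[str],
    solutions: List[str],
    task_type: str = "mcq",   # assumed kwarg, not an existing API
    n_choices: int = 4,       # assumed kwarg, not an existing API
    **kwargs,
) -> List[float]:
    """Score each completion 1.0 if its extracted choice matches the gold one."""
    if task_type != "mcq":
        raise NotImplementedError("this sketch only covers task_type='mcq'")
    rewards = []
    for completion, solution in zip(completions, solutions):
        gold = extract_choice(solution, n_choices)
        pred = extract_choice(completion, n_choices)
        rewards.append(1.0 if gold is not None and gold == pred else 0.0)
    return rewards
```

With the strings from the original report, `extract_choice("So the answer is B", 4)` and `extract_choice("B", 4)` both yield `"B"`, while `extract_choice("A dog is the answer", 4)` yields `None` because the leading article is not treated as a choice. Knowing `n_choices` also rules out letters outside the option range, such as `E` in a four-option question.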
