0.4.0 lm-evaluation-harness #15

Open
germanjke opened this issue Mar 6, 2024 · 9 comments

@germanjke

Hi!

Your benchmarks are functioning well with version 0.3.0 of lm-evaluation-harness. Are there any plans to update and support version 0.4.0?

LSinev added the good first issue label on Mar 6, 2024
@LSinev
Collaborator

LSinev commented Mar 6, 2024

Yes, there are! :) Stay tuned!

@LSinev
Collaborator

LSinev commented Mar 7, 2024

Do you have any particular expectations for improvements with the upgrade to the 0.4.0+ backend?

@germanjke
Author

@LSinev Hi,
It looks like the vLLM engine, which is supported in 0.4.0, runs faster than the HF engine.
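
For illustration, here is a minimal sketch of driving both backends through the 0.4.x Python API; the model name and task are placeholders, not anything specified in this thread:

```python
# A minimal sketch, assuming a recent lm-evaluation-harness 0.4.x release
# with the vLLM backend dependencies installed; model and task names are placeholders.
from lm_eval import simple_evaluate

# HF (transformers) backend
results_hf = simple_evaluate(
    model="hf",
    model_args="pretrained=your-org/your-model",
    tasks=["hellaswag"],
)

# vLLM backend, typically faster for generation-heavy evaluation
results_vllm = simple_evaluate(
    model="vllm",
    model_args="pretrained=your-org/your-model",
    tasks=["hellaswag"],
)
```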

@germanjke
Author

germanjke commented Apr 17, 2024

Hello! May I ask, are you still working on this? Do you have any estimated dates?
@LSinev

@LSinev
Collaborator

LSinev commented Apr 17, 2024

Will give more information next week, or maybe even a work-in-progress branch for playing/testing.

@LSinev
Collaborator

LSinev commented Apr 23, 2024

new_harness_codebase — the "work in progress" branch with a patched lm-evaluation-harness included as a submodule (waiting for the PR to be merged).
All scores will change. The leaderboard will not publish these yet, but you can use the branch for private scoring. Baseline model scoring should be done by you. Changes to the model-running code (on the lm-evaluation-harness side) should be made in their repository in order to be supported here.

@germanjke
Author

Great, thank you!

@germanjke
Author

germanjke commented May 28, 2024

Hi @LSinev,

I noticed that the tasks from the branch do not include the MERA tasks in 0.4.x format. I checked the link you provided here, and it seems they are indeed missing.

Could you please confirm if the MERA tasks will be added to this branch, or if there is another location where they might be available?

Thanks!

@LSinev
Collaborator

LSinev commented May 28, 2024

"I checked the link you provided here, and it seems they are indeed missing."

That link goes to a fork of lm-evaluation-harness. The fork contains the code needed for the RuTiE task, which has been submitted as a PR to lm-evaluation-harness but is not yet approved and merged.
There are no plans yet to submit the MERA tasks directly into lm-evaluation-harness.

new_harness_codebase uses the 0.4.x code, but the tasks are not fully in YAML format yet (they will be eventually; for now they are implemented much like, for example, the SQuADv2 task in lm-evaluation-harness). The MERA tasks are stored in https://github.com/ai-forever/MERA/tree/update/new_harness_codebase/benchmark_tasks, since the new code allows tasks to be loaded from an external directory.
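
For example, a minimal sketch of pointing a recent 0.4.x harness at that external task directory through the Python API; the checkout path and task name below are illustrative:

```python
# A minimal sketch, assuming a recent lm-evaluation-harness 0.4.x release
# where TaskManager is available; the local path and task name are illustrative.
from lm_eval import simple_evaluate
from lm_eval.tasks import TaskManager

# Register the external MERA task directory from the new_harness_codebase branch.
task_manager = TaskManager(include_path="MERA/benchmark_tasks")

results = simple_evaluate(
    model="hf",
    model_args="pretrained=your-org/your-model",
    tasks=["chegeka"],  # one MERA task name, used here for illustration
    task_manager=task_manager,
)
```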
