0.4.0 lm-evaluation-harness #15
Hi!
Your benchmarks are functioning well with version 0.3.0 of lm-evaluation-harness. Are there any plans to update and support version 0.4.0?

Comments
Yes, there are! :) Stay tuned!
Do you have any particular expectations for improvements with the upgrade to the 0.4.0+ backend?
@LSinev Hi,
Hello guys! Can I ask, are you working on this topic? Maybe you have some estimated dates?
Will give more information next week, or maybe even a work-in-progress branch for playing/testing.
new_harness_codebase — a "work in progress" branch with a submoduled, patched lm-evaluation-harness (waiting for a PR to be merged).
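A minimal sketch of fetching that work-in-progress branch together with its submoduled harness; the branch name update/new_harness_codebase is inferred from the links later in this thread and may have changed since:

```bash
# Clone the MERA repo on the work-in-progress branch and pull in the
# submoduled lm-evaluation-harness fork in one step.
git clone --branch update/new_harness_codebase --recurse-submodules \
    https://github.com/ai-forever/MERA.git

# If the repo is already cloned, check out the branch and update the submodule instead.
cd MERA
git checkout update/new_harness_codebase
git submodule update --init --recursive
```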
Great, thank you!
Hi @LSinev, I noticed that the tasks in the branch do not include the MERA tasks. Could you please confirm whether the MERA tasks will be added to this branch, or whether there is another location where they might be available? Thanks!
This link goes to a fork of lm-evaluation-harness. The fork contains the code needed for the RuTiE task, which has been PRed to lm-evaluation-harness but is not yet approved and merged. new_harness_codebase uses the 0.4.x code, but the tasks are not in fully YAML format yet (they will be, but not yet — just like, for example, the SQuADv2 task in lm-evaluation-harness). MERA tasks are stored in https://github.com/ai-forever/MERA/tree/update/new_harness_codebase/benchmark_tasks, as the new code allows using tasks from an external directory.
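For context, a hedged sketch of how an external task directory is typically pointed at with the 0.4.x harness CLI, which exposes an --include_path flag for this; the model name, task name, and directory below are placeholders for illustration, not something prescribed in this thread:

```bash
# Evaluate a Hugging Face model on tasks defined outside the installed package.
# --include_path tells the 0.4.x harness to also scan ./benchmark_tasks for
# task definitions; <mera_task_name> is a placeholder for an actual task.
lm_eval --model hf \
    --model_args pretrained=ai-forever/rugpt3large_based_on_gpt2 \
    --tasks <mera_task_name> \
    --include_path ./benchmark_tasks \
    --device cuda:0 \
    --batch_size 8
```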