
1.29.0

KennethEnevoldsen released this on 13 Jan 17:51

1.29.0 (2025-01-13)

Ci

  • ci: fix model loading test (#1775)

  • pass base branch into the make command as an arg

  • test a file that has custom wrapper

  • what about overview

  • just don't check overview

  • revert instance check

  • explicitly omit overview and init

  • remove test change

  • try on a lot of models

  • revert test model file


Co-authored-by: Isaac Chung (9b117a8)

Feature

  • feat: Update task filtering, fixing bug which included cross-lingual tasks in overly many benchmarks (#1787)

  • feat: Update task filtering, fixing bug on MTEB

  • Updated task filtering adding exclusive_language_filter and hf_subset
  • fix bug in MTEB where cross-lingual splits were included
  • added missing language filtering to MTEB(europe, beta) and MTEB(indic, beta)

The following code illustrates the problem:

```python
import mteb
from mteb.benchmarks import MTEB_ENG_CLASSIC

# Before this release: STS22 as included in MTEB_ENG_CLASSIC
task = [t for t in MTEB_ENG_CLASSIC.tasks if t.metadata.name == "STS22"][0]
# was equivalent to:
task = mteb.get_task("STS22", languages=["eng"])
task.hf_subsets
# which correctly keeps every subset that involves English:
# ['en', 'de-en', 'es-en', 'pl-en', 'zh-en']
# However, for the English benchmark it should be:
# ['en']

# With the changes it is:
task = [t for t in MTEB_ENG_CLASSIC.tasks if t.metadata.name == "STS22"][0]
task.hf_subsets
# ['en']
# equivalent to:
task = mteb.get_task("STS22", hf_subsets=["en"])
# which you can also obtain using exclusive_language_filter
# (though not if there were multiple English splits):
task = mteb.get_task("STS22", languages=["eng"], exclusive_language_filter=True)
```

  • format

  • remove "en-ext" from AmazonCounterfactualClassification

  • fixed mteb(deu)

  • fix: simplify in a few areas (4a70e5d)
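The difference between the default (inclusive) language filter and exclusive_language_filter can be sketched in plain Python. This is a minimal, hypothetical sketch, not mteb's implementation: `filter_subsets` and `STS22_LIKE_SUBSETS` are illustrative names, the subset-to-language mapping is modelled on the STS22 example above rather than copied from the task metadata, and it assumes the exclusive mode keeps a subset only when all of its languages were requested, which matches the STS22 outcome described above.

```python
from typing import Iterable, Optional

# Hypothetical subset -> language mapping modelled on the STS22 example.
# Codes are ISO 639-3 without script tags; real task metadata may differ.
STS22_LIKE_SUBSETS: dict[str, list[str]] = {
    "en": ["eng"],
    "de": ["deu"],
    "de-en": ["deu", "eng"],
    "es-en": ["spa", "eng"],
    "pl-en": ["pol", "eng"],
    "zh-en": ["cmn", "eng"],
}


def filter_subsets(
    subset_languages: dict[str, list[str]],
    languages: Optional[Iterable[str]] = None,
    hf_subsets: Optional[Iterable[str]] = None,
    exclusive: bool = False,
) -> list[str]:
    """Return the subset names that survive the filters (hypothetical helper).

    - hf_subsets: if given, only these subset names are considered.
    - languages (inclusive): keep a subset if ANY of its languages is requested.
    - languages (exclusive=True): keep a subset only if ALL of its languages
      are requested, which drops cross-lingual pairs such as "de-en".
    """
    allowed = set(hf_subsets) if hf_subsets is not None else None
    wanted = set(languages) if languages is not None else None
    kept = []
    for subset, langs in subset_languages.items():
        # Direct subset selection takes precedence over language matching.
        if allowed is not None and subset not in allowed:
            continue
        if wanted is not None:
            subset_langs = set(langs)
            if exclusive:
                # Every language of the subset must have been requested.
                if not subset_langs <= wanted:
                    continue
            elif not subset_langs & wanted:
                # Inclusive: at least one language must overlap.
                continue
        kept.append(subset)
    return kept


# Inclusive filtering keeps every subset that contains English.
print(filter_subsets(STS22_LIKE_SUBSETS, languages=["eng"]))
# ['en', 'de-en', 'es-en', 'pl-en', 'zh-en']

# Exclusive filtering, or selecting the subset directly, keeps only the monolingual split.
print(filter_subsets(STS22_LIKE_SUBSETS, languages=["eng"], exclusive=True))
# ['en']
print(filter_subsets(STS22_LIKE_SUBSETS, hf_subsets=["en"]))
# ['en']
```

The caveat from the release note also falls out of this sketch: with a hypothetical mapping in which two subsets are both English-only, such as `{"en": ["eng"], "en-ext": ["eng"]}`, exclusive filtering keeps both, so passing `hf_subsets` directly is the way to pick a single one.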