Releases: embeddings-benchmark/mteb
1.29.7
1.29.7 (2025-01-16)
Ci
- ci: only return 1 model_name per file (#1818)
  - only return 1 model_name per file
  - fix args parse
  - revert test change (d7a7791)
Fix
- fix: add bge-m3 ModelMeta (#1821)
  - add bge (4ac59bc)
Unknown
- Add model inf-retriever-v1 (#1744)
  - feat(models): add infly/inf-retriever-v1 model metadata
    - Add inf_models.py file with metadata for infly/inf-retriever-v1 model
    - Update overview.py to include inf_models in model imports
  - Reformat code
  - Update inf-retriever-v1 ModelMeta
  - Fill more information for inf-retriever-v1
  - Add license information for inf-retriever-v1
  Co-authored-by: Samuel Yang (60c4980)
1.29.6
1.29.5
1.29.4
1.29.4 (2025-01-15)
Fix
- fix: Added ModelMeta for BGE, GTE Chinese and multilingual models (#1811)
  - Added BGE Chinese and multilingual-gemma models
  - Added GTE multilingual and Chinese models
  - Fixed date format (3f5ee82)
- fix: Zero shot and aggregation on Leaderboard (#1810)
  - Made join_revision filter out no_revision_available when other revisions have been run on the task
  - Fixed zero-shot filtering
  - Fixed aggregation of task types
  - Ran linting (0acc166)
1.29.3
1.29.2
1.29.1
1.29.0
1.29.0 (2025-01-13)
Ci
- ci: fix model loading test (#1775)
  - pass base branch into the make command as an arg
  - test a file that has custom wrapper
  - what about overview
  - just dont check overview
  - revert instance check
  - explicitly omit overview and init
  - remove test change
  - try on a lot of models
  - revert test model file
  Co-authored-by: Isaac Chung (9b117a8)
Feature
- feat: Update task filtering, fixing bug which included cross-lingual tasks in overly many benchmarks (#1787)
  - feat: Update task filtering, fixing bug on MTEB
    - Updated task filtering adding exclusive_language_filter and hf_subset
    - fix bug in MTEB where cross-lingual splits were included
    - added missing language filtering to MTEB(europe, beta) and MTEB(indic, beta)
The following code outlines the problem:
import mteb
from mteb.benchmarks import MTEB_ENG_CLASSIC

task = [t for t in MTEB_ENG_CLASSIC.tasks if t.metadata.name == "STS22"][0]
# before this change, this was equivalent to:
task = mteb.get_task("STS22", languages=["eng"])
task.hf_subsets
# which kept every subset containing English:
# ['en', 'de-en', 'es-en', 'pl-en', 'zh-en']
# whereas it should be:
# ['en']

# with the changes it is:
task = [t for t in MTEB_ENG_CLASSIC.tasks if t.metadata.name == "STS22"][0]
task.hf_subsets
# ['en']
# equivalent to:
task = mteb.get_task("STS22", hf_subsets=["en"])
# which you can also obtain using exclusive_language_filter (though not if there were multiple English splits):
task = mteb.get_task("STS22", languages=["eng"], exclusive_language_filter=True)
  - format
  - remove "en-ext" from AmazonCounterfactualClassification
  - fixed mteb(deu)
  - fix: simplify in a few areas (4a70e5d)
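
The difference between the old inclusive language filter and the new exclusive one in the STS22 example can be sketched without installing mteb. This is only an illustration of the idea, not mteb's implementation: the `SUBSET_LANGUAGES` mapping, its language codes, the extra `"de"` subset, and the `filter_subsets` helper are all hypothetical names made up for this sketch.

```python
# Hypothetical stand-in for a task's subset-to-languages mapping. The subset
# names follow the STS22 example; the language codes are assumed.
SUBSET_LANGUAGES = {
    "en": ["eng"],
    "de-en": ["deu", "eng"],
    "es-en": ["spa", "eng"],
    "pl-en": ["pol", "eng"],
    "zh-en": ["zho", "eng"],
    "de": ["deu"],
}


def filter_subsets(subset_languages, languages, exclusive=False):
    """Return the subsets that match the requested languages.

    Inclusive (default): keep a subset if any of its languages is requested.
    Exclusive: keep a subset only if all of its languages are requested.
    """
    wanted = set(languages)
    kept = []
    for subset, langs in subset_languages.items():
        present = set(langs)
        if exclusive:
            match = present <= wanted  # every language must be requested
        else:
            match = bool(present & wanted)  # one shared language is enough
        if match:
            kept.append(subset)
    return kept


inclusive = filter_subsets(SUBSET_LANGUAGES, ["eng"])
exclusive = filter_subsets(SUBSET_LANGUAGES, ["eng"], exclusive=True)
print(inclusive)  # ['en', 'de-en', 'es-en', 'pl-en', 'zh-en']
print(exclusive)  # ['en']
```

Under these assumptions, the inclusive filter reproduces the cross-lingual leak described above, while the exclusive filter keeps only the monolingual English subset.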