Overwrite results when task revision changes #1782

Open
gaohongkui opened this issue Jan 13, 2025 · 1 comment
Labels: enhancement New feature or request

Comments

@gaohongkui

When using a CustomTask to evaluate a dataset, I found that when I update the revision or the main_score of the dataset, the stored results are not updated accordingly.

  • base result
from sentence_transformers import SentenceTransformer

from mteb import MTEB
from mteb.abstasks.AbsTaskSTS import AbsTaskSTS
from mteb.abstasks.TaskMetadata import TaskMetadata


class MyCustomTask(AbsTaskSTS):
    metadata = TaskMetadata(
        name="DemoSTS",
        dataset={
            "path": "/tmp/demo-sts-bq/demo-sts-bq",
            "revision": "v1"
        },
        description="A Chinese dataset for textual relatedness",
        reference="https://aclanthology.org/2021.emnlp-main.357",
        type="STS",
        category="s2s",
        modalities=["text"],
        eval_splits=["validation", "test"],
        eval_langs=["cmn-Hans"],
        main_score="cosine_spearman",
        date=None,
        domains=["Web"],
        bibtex_citation="",
    )

    @property
    def metadata_dict(self) -> dict[str, str]:
        metadata_dict = super().metadata_dict
        metadata_dict["min_score"] = 0
        metadata_dict["max_score"] = 1
        return metadata_dict

model = SentenceTransformer("BAAI/bge-small-zh-v1.5")
evaluation = MTEB(tasks=[MyCustomTask()])
evaluation.run(model)

[screenshot: evaluation output for the base run, written under revision v1]

  • case 1: update the dataset revision
class MyCustomTask(AbsTaskSTS):
    metadata = TaskMetadata(
        name="DemoSTS",
        dataset={
            "path": "/tmp/demo-sts-bq/demo-sts-bq",
            "revision": "v2"
        },
        description="A Chinese dataset for textual relatedness",
        reference="https://aclanthology.org/2021.emnlp-main.357",
        type="STS",
        category="s2s",
        modalities=["text"],
        eval_splits=["validation", "test"],
        eval_langs=["cmn-Hans"],
        main_score="cosine_spearman",
        date=None,
        domains=["Web"],
        bibtex_citation="",
    )

    @property
    def metadata_dict(self) -> dict[str, str]:
        metadata_dict = super().metadata_dict
        metadata_dict["min_score"] = 0
        metadata_dict["max_score"] = 1
        return metadata_dict

model = SentenceTransformer("BAAI/bge-small-zh-v1.5")
evaluation = MTEB(tasks=[MyCustomTask()])
evaluation.run(model)

[screenshot: the stored result still shows revision v1]

The revision recorded in the result is still v1.

  • case 2: change the task's main_score
class MyCustomTask(AbsTaskSTS):
    metadata = TaskMetadata(
        name="DemoSTS",
        dataset={
            "path": "/tmp/demo-sts-bq/demo-sts-bq",
            "revision": "v2"
        },
        description="A Chinese dataset for textual relatedness",
        reference="https://aclanthology.org/2021.emnlp-main.357",
        type="STS",
        category="s2s",
        modalities=["text"],
        eval_splits=["validation", "test"],
        eval_langs=["cmn-Hans"],
        main_score="cosine_pearson",
        date=None,
        domains=["Web"],
        bibtex_citation="",
    )

    @property
    def metadata_dict(self) -> dict[str, str]:
        metadata_dict = super().metadata_dict
        metadata_dict["min_score"] = 0
        metadata_dict["max_score"] = 1
        return metadata_dict

model = SentenceTransformer("BAAI/bge-small-zh-v1.5")
evaluation = MTEB(tasks=[MyCustomTask()])
evaluation.run(model)

[screenshot: the stored result is unchanged]

The result remains unchanged.

I understand that passing overwrite_results=True forcibly updates the results, but this feels like a hack.
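
For reference, the workaround is:

evaluation.run(model, overwrite_results=True)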

I have observed that when the revision of the model is modified, a re-evaluation is triggered, which is the expected behavior.
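
For example, pinning a different model revision (using the standard sentence-transformers revision argument) is enough to trigger a fresh run:

model = SentenceTransformer("BAAI/bge-small-zh-v1.5", revision="main")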

Additionally, when a large number of CustomTasks are defined, adding overwrite_results=True to every run is not a good solution.

@KennethEnevoldsen
Contributor

So you are suggesting that we overwrite results if the revision does not match. I would be perfectly fine with that. One could be even more conservative and overwrite if the version does not match (that would probably be too much); however, maybe we should overwrite results if the mteb version is below a threshold (e.g. v1.12.0, where the new results were defined).
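
A minimal sketch of what such a check could look like (hypothetical, not the actual mteb caching code; the should_rerun helper and the cached-result keys dataset_revision / mteb_version are assumptions about the result-file layout):

import json
from pathlib import Path


def should_rerun(result_file: Path, task, min_mteb_version: str = "1.12.0") -> bool:
    """Hypothetical check: re-run when no cached result exists, when the cached
    dataset revision differs from the task metadata, or when the cached result
    was produced by an mteb version older than the threshold."""
    if not result_file.exists():
        return True
    cached = json.loads(result_file.read_text())
    # Invalidate the cache when the task's declared dataset revision changes.
    if cached.get("dataset_revision") != task.metadata.dataset["revision"]:
        return True
    # Invalidate results produced before the threshold mteb version.
    cached_version = cached.get("mteb_version", "0.0.0")
    return tuple(int(p) for p in cached_version.split(".")) < tuple(
        int(p) for p in min_mteb_version.split(".")
    )

The same revision comparison would also cover the main_score case indirectly if the cached file stores which metric was used; otherwise the main_score change only affects which already-computed score is reported.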

@KennethEnevoldsen KennethEnevoldsen added the enhancement New feature or request label Jan 13, 2025
@KennethEnevoldsen KennethEnevoldsen changed the title update Task info, but results not update Overwrite results when task revision changes Jan 13, 2025