[MIEB] performance of VOC2007 #1792
Comments
I'll take a look. Since I haven't found reported scores for these models on VOC2007, I'm going to check whether the dataset itself is correct, and first see whether the current VOC2007 matches CLIP benchmark numbers on the OpenAI CLIP models. There are also 2 different multilabel datasets:
Do you see any major discrepancies for VLM2Vec-lora or Voyage multi-modal on other tasks?
This could be related to #1420, where the …
It's not entirely 1:1 with the CLIP benchmark since it's not zero-shot. If VLM2Vec-full gets 70%+, does that mean `samples_per_label=64` or something?
Scores are here: The main results here were all run without changing `samples_per_label`, I think! I reran E5-v and Voyage locally and it's still 70% vs. 20%.
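For context, `samples_per_label` caps how many training examples feed the linear probe, so a different effective cap could plausibly explain a 70% vs. 20% swing. A minimal sketch of one greedy capping scheme (the function name and selection logic are assumptions for illustration, not MTEB's actual sampling code):

```python
import numpy as np

def subsample_per_label(X: np.ndarray, y: np.ndarray,
                        samples_per_label: int, seed: int = 42):
    """Greedily keep training rows until every label column of the
    multilabel indicator matrix `y` has `samples_per_label` positives.
    A sketch for debugging, not MTEB's actual implementation."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(y.shape[1], dtype=int)
    keep = []
    for i in rng.permutation(len(X)):
        labels = np.flatnonzero(y[i])
        # keep the row if any of its labels still needs more examples
        if np.any(counts[labels] < samples_per_label):
            keep.append(i)
            counts[labels] += 1
    return X[keep], y[keep]
```

Running the probe with a couple of different caps and comparing scores would show quickly whether the metric is sensitive to this setting.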
Didn't spot any major discrepancies on other tasks when putting together the per-task-type results!
Ah sorry. I meant I haven't found reported scores by other papers on VOC2007 multilabel for E5-v and Voyage 😅
Right, I should've got what you meant 😅. I guess there won't be any, as the models are too new. I think it might be beneficial to debug with per-class accuracy etc. (without using the abstask and the evaluator) to see which step might've gone wrong?
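A minimal sketch of that kind of per-class check, done directly on prediction arrays outside the abstask/evaluator path (the helper name and toy arrays are hypothetical):

```python
import numpy as np
from sklearn.metrics import accuracy_score

def per_class_accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> np.ndarray:
    """Accuracy for each label column of a multilabel problem.

    y_true, y_pred: binary indicator arrays of shape (n_samples, n_labels).
    """
    return np.array([
        accuracy_score(y_true[:, j], y_pred[:, j])
        for j in range(y_true.shape[1])
    ])

# toy example: 3 samples, 2 labels (made up, not VOC2007 data)
y_true = np.array([[1, 0], [0, 1], [1, 1]])
y_pred = np.array([[1, 0], [0, 0], [1, 1]])
print(per_class_accuracy(y_true, y_pred))  # label 0: 3/3, label 1: 2/3
```

If one model collapses on a handful of classes while the other doesn't, that would point at the predictions rather than the evaluator.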
Odd - I'm getting ~22% for both VLM2Vec-full and VLM2Vec-lora when …
In short, I was able to get 72% for both lora and full (full slightly higher than lora) when …
Am just finalizing MIEB results for the overall table in the paper and find the VOC2007 results a bit strange: https://github.com/embeddings-benchmark/tmp/tree/master (e.g., E5-v gets 70%+ `lrap` while Voyage multi-modal gets ~20%, even though these two have a very similar performance trend on all other 120+ tasks, with Voyage multi-modal typically slightly better; likewise, VLM2Vec-full gets 70%+ and VLM2Vec-lora gets ~20%). Since this is the only multi-label classification task, I am trying to merge it into the regular linear-probe task type in the leaderboard table. This is pretty much the last task that needs a look if anyone has bandwidth!
cc @isaac-chung
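For reference, the `lrap` above is label ranking average precision, which scores how highly a model ranks each true label among all labels. A quick sanity check on a few samples can be done with scikit-learn (the arrays below are made up, not VOC2007 data):

```python
import numpy as np
from sklearn.metrics import label_ranking_average_precision_score

# toy multilabel ground truth and model scores (made up, not VOC2007)
y_true = np.array([[1, 0, 0],
                   [0, 0, 1],
                   [1, 1, 0]])
y_score = np.array([[0.75, 0.50, 1.00],
                    [1.00, 0.20, 0.10],
                    [0.40, 0.90, 0.20]])
print(label_ranking_average_precision_score(y_true, y_score))
```

Running this on a small slice of each model's raw scores would confirm whether the 70% vs. 20% gap is already present before any of the evaluator's aggregation steps.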