What's Changed
- [Feature] Support multi_generate by @kennymckormick in #1
- [Tool] Minor updates 1205 by @kennymckormick in #3
- TransCore-M 20231208 by @PCIResearch in #8
- [Result] update TransCore Results by @kennymckormick in #9
- COREMM Evaluation Benchmark by @youngfly11 in #6
- Adding MMVet by @llllIlllll in #7
- [Doc] README Update 12.11 by @kennymckormick in #10
- Updating mmvet_eval by @llllIlllll in #11
- [Feature] Add run.py and simplify the evaluation. by @kennymckormick in #12
- [Doc] Refine README by @kennymckormick in #13
- [Fix] Fix README by @kennymckormick in #14
- add dataset md5 check integrity by @FangXinyu-0913 in #15
- [Feature] Support two Vision APIs by @kennymckormick in #19
- Add COCO dataset by @FangXinyu-0913 in #16
- [Fix] Fix 1221 by @kennymckormick in #21
- [Feature] More Robust API Evaluation by @kennymckormick in #22
- [Refactor] Refactor Custom Prompt & Fix mPLUG-Owl2 acc by @kennymckormick in #23
- [Dataset] VQA Datasets by @kennymckormick in #25
- [Fix] Bug Fix by @kennymckormick in #26
- Add MMMU dataset by @llllIlllll in #18
- Add QwenVLPlus API by @llllIlllll in #27
- [Result] Update MMMU Acc by @kennymckormick in #30
- [Feature] Support LLaVA_XTuner models by @LZHgrla in #17
- [Result] Update XTuner Performance by @kennymckormick in #31
- [Result] Update COCO Caption Results by @kennymckormick in #35
- add dataset ChartQA by @FangXinyu-0913 in #28
- [Feature]: Add ScienceQA by @YuanLiuuuuuu in #24
- [Dataset] MathVista dataset by @llllIlllll in #29
- [Dataset] HallusionBench by @kennymckormick in #38
- Add sharedcaptioner and cogvlm by @fitzpchao in #37
- [Fix] Fix GPT error with parallel calling by @kennymckormick in #40
- [Eval] multiple_choice.py: E->Z by @kennymckormick in #41
- [Eval] Use exact matching for Y/N and multi-choice when OPENAI_API_KEY not set by @kennymckormick in #44
- add VLM: monkey by @ShuoZhang2003 in #45
- [Fix] Fix multiple choice evaluation when OPENAI_API_KEY missing by @kennymckormick in #48
- [Improvement] Support non-contiguous choices by @kennymckormick in #49
- Add Emu2 and Emu2_chat by @llllIlllll in #47
- [Dataset] Add DocVQA by @llllIlllll in #50
- [Benchmark] support AI2D by @kennymckormick in #51
- add monkey-chat by @ShuoZhang2003 in #54
- Add LLaVA-InternLM2 by @LZHgrla in #53
- [Dataset] Support LLaVABench by @kennymckormick in #55
- Support torchrun for emu2&emu2_chat and fix bug by @llllIlllll in #52
- support sharegpt4v-13b by @xiaoachen98 in #56
- Fix bug in file.py by @Ezra-Yu in #58
- [Result] Update Evaluation Results by @kennymckormick in #60
New Contributors
- @kennymckormick made their first contribution in #1
- @PCIResearch made their first contribution in #8
- @youngfly11 made their first contribution in #6
- @llllIlllll made their first contribution in #7
- @FangXinyu-0913 made their first contribution in #15
- @LZHgrla made their first contribution in #17
- @YuanLiuuuuuu made their first contribution in #24
- @fitzpchao made their first contribution in #37
- @ShuoZhang2003 made their first contribution in #45
- @xiaoachen98 made their first contribution in #56
- @Ezra-Yu made their first contribution in #58
Full Changelog: https://github.com/open-compass/VLMEvalKit/commits/v0.1