A curated list of awesome multimodal studies.

Title | Venue | Date | Code | Supplement |
---|---|---|---|---|
VideoVista: A Versatile Benchmark for Video Understanding and Reasoning (VideoVista) | arXiv | 2024-06-17 | | |
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis (Video-MME) | arXiv | 2024-05-31 | | |
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark (MVBench) | CVPR 2024 highlight | 2023-11-28 | | |
Perception Test: A Diagnostic Benchmark for Multimodal Video Models (Perception Test, by Google DeepMind) | NeurIPS 2023 | 2023-05-23 | | |

Title | Venue | Date | Code | Supplement |
---|---|---|---|---|
LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation? (LaDiC) | NAACL 2024 | 2024-04-16 | - | |