✨✨Latest Advances on Multimodal Large Language Models

Updated Nov 9, 2024
Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
ModelScope-Agent: An agent framework connecting models in ModelScope with the world
LLaMA-Omni is a low-latency, high-quality end-to-end speech interaction model built on Llama-3.1-8B-Instruct, aiming for GPT-4o-level speech capabilities.
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG); a sketch of the recaption-plan-generate chain follows this list.
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM
LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs; a sketch of such a post-hoc correction loop follows this list.
Speech, Language, Audio, and Music Processing with Large Language Models
A collection of resources on applications of multi-modal learning in medical imaging.
[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
A novel Multimodal Large Language Model (MLLM) architecture designed to structurally align visual and textual embeddings; a minimal alignment sketch follows this list.
✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
A paper list covering large multi-modality models, parameter-efficient fine-tuning, vision-language pretraining, and conventional image-text matching, intended as a preliminary overview.
Personal project: MPP-Qwen14B & MPP-Qwen-Next (Multimodal Pipeline Parallel based on Qwen-LM). Supports [video/image/multi-image] {sft/conversations}. Don't let poverty limit your imagination! Train your own 8B/14B LLaVA-like MLLM on RTX 3090/4090 24 GB GPUs; a minimal pipeline-parallel sketch follows this list.
NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, and Editing
Research Trends in LLM-guided Multimodal Learning.
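The RPG entry above names a three-stage chain: recaption the prompt, plan a layout, then generate. Below is a toy outline of how such a chain composes. All three helpers are hypothetical stand-ins (in the paper, an MLLM performs the recaptioning and planning and a diffusion model performs the generation), so treat this as a shape sketch, not the authors' pipeline.

```python
# Toy recaption -> plan -> generate chain in the spirit of RPG's title.
def recaption(prompt: str) -> list[str]:
    # Hypothetical: an MLLM would expand the prompt into detailed sub-prompts.
    return [f"{prompt}, left side", f"{prompt}, right side"]

def plan(sub_prompts: list[str]) -> list[tuple[str, tuple[float, float]]]:
    # Hypothetical: assign each sub-prompt a horizontal strip of the canvas.
    n = len(sub_prompts)
    return [(p, (i / n, (i + 1) / n)) for i, p in enumerate(sub_prompts)]

def generate(layout: list[tuple[str, tuple[float, float]]]) -> None:
    # Hypothetical: a regional diffusion model would render each strip.
    for sub_prompt, (x0, x1) in layout:
        print(f"render region {x0:.2f}-{x1:.2f}: {sub_prompt}")

generate(plan(recaption("a cat and a dog on a sofa")))
```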
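Woodpecker, listed above, corrects hallucinations after generation rather than by retraining. The skeleton below shows the general post-hoc shape such a corrector can take: split the answer into claims, check each against visual evidence, rewrite. Every function here is a hypothetical stub with a toy rule; the paper's actual pipeline uses LLMs and visual experts for these steps.

```python
# Skeleton of a training-free, post-hoc hallucination correction loop.
from dataclasses import dataclass

@dataclass
class Claim:
    text: str
    supported: bool = True

def extract_claims(answer: str) -> list[Claim]:
    # Hypothetical: a real system would use an LLM to split the answer into
    # atomic visual claims ("there are two dogs", "the car is red").
    return [Claim(s.strip()) for s in answer.split(".") if s.strip()]

def verify_against_image(claim: Claim, image) -> Claim:
    # Hypothetical: a real system would query visual experts (a detector,
    # a VQA model) for evidence; this toy rule just flags "purple".
    claim.supported = "purple" not in claim.text
    return claim

def rewrite_answer(claims: list[Claim]) -> str:
    # Keep supported claims; a real system would ask an LLM to rewrite
    # unsupported ones using the gathered visual evidence.
    return ". ".join(c.text for c in claims if c.supported) + "."

def correct(answer: str, image=None) -> str:
    claims = [verify_against_image(c, image) for c in extract_claims(answer)]
    return rewrite_answer(claims)

print(correct("A dog sits on the grass. The dog is purple."))
# -> "A dog sits on the grass."
```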
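The structural-alignment entry above rests on a pattern common across the MLLMs in this list: project features from a frozen vision encoder into the LLM's token-embedding space and let the decoder attend over the fused sequence. This is a minimal, generic sketch of that pattern; the module name and dimensions are invented for illustration and do not come from any listed repo.

```python
# Generic vision-to-text embedding alignment via a small MLP projector.
import torch
import torch.nn as nn

class VisionToTextProjector(nn.Module):
    """Two-layer MLP mapping vision features to the LLM embedding width."""
    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, vision_feats: torch.Tensor) -> torch.Tensor:
        # vision_feats: (batch, num_patches, vision_dim)
        return self.proj(vision_feats)

batch, patches, seq = 2, 256, 16
vision_feats = torch.randn(batch, patches, 1024)  # from a frozen ViT
text_embeds = torch.randn(batch, seq, 4096)       # from the LLM's embedding table

visual_tokens = VisionToTextProjector()(vision_feats)        # (2, 256, 4096)
fused = torch.cat([visual_tokens, text_embeds], dim=1)       # what the LLM attends over
print(fused.shape)  # torch.Size([2, 272, 4096])
```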
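MPP-Qwen's pitch above is training a larger MLLM across 24 GB consumer GPUs via pipeline parallelism. The sketch below shows the naive two-stage form of that idea: half the layers on each device, activations hopping across the boundary. The toy model, sizes, and stage split are illustrative assumptions, unrelated to the repo's actual implementation, and it needs two CUDA devices to run.

```python
# Naive two-stage pipeline (model) parallelism across two GPUs.
import torch
import torch.nn as nn

def make_stage(dim: int, n_layers: int) -> nn.Sequential:
    return nn.Sequential(*(nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
                           for _ in range(n_layers)))

class TwoStageModel(nn.Module):
    def __init__(self, dim: int = 1024, layers_per_stage: int = 4):
        super().__init__()
        self.stage0 = make_stage(dim, layers_per_stage).to("cuda:0")
        self.stage1 = make_stage(dim, layers_per_stage).to("cuda:1")

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.stage0(x.to("cuda:0"))
        # The activation crosses the device boundary here. Real pipeline
        # parallelism also splits the batch into micro-batches so both
        # GPUs work concurrently instead of idling in turn.
        return self.stage1(x.to("cuda:1"))

model = TwoStageModel()
out = model(torch.randn(8, 1024))  # output lives on cuda:1
out.mean().backward()              # autograd handles the cross-device hop
```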