Research Area:  Machine Learning
In the past year, MultiModal Large Language Models (MM-LLMs) have undergone substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs via cost-effective training strategies. The resulting models not only preserve the inherent reasoning and decision-making capabilities of LLMs but also enable a diverse range of MM tasks. In this paper, we provide a comprehensive survey aimed at facilitating further research on MM-LLMs. Initially, we outline general design formulations for model architecture and the training pipeline. Subsequently, we introduce a taxonomy encompassing 126 MM-LLMs, each characterized by its specific formulations. Furthermore, we review the performance of selected MM-LLMs on mainstream benchmarks and summarize key training recipes to enhance the potency of MM-LLMs. Finally, we explore promising directions for MM-LLMs while concurrently maintaining a real-time tracking website for the latest developments in the field. We hope that this survey contributes to the ongoing advancement of the MM-LLMs domain.
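To make the "general design formulation" and "cost-effective training strategy" mentioned in the abstract concrete, the sketch below illustrates the canonical MM-LLM pipeline that such surveys describe: a frozen modality encoder, a small trainable input projector that maps modality features into the LLM's embedding space, and a frozen LLM backbone. This is a minimal, hypothetical PyTorch sketch; the module names, dimensions, and the use of stand-in linear/transformer layers are assumptions for illustration, not any specific model from the survey.

```python
import torch
import torch.nn as nn

class MMLLMSketch(nn.Module):
    """Minimal sketch of a typical MM-LLM pipeline (assumed, illustrative):
    frozen modality encoder -> trainable input projector -> frozen LLM."""

    def __init__(self, enc_dim=768, llm_dim=4096):
        super().__init__()
        # Stand-ins for a pretrained image encoder and an LLM backbone;
        # a real system would load pretrained weights (e.g., a ViT and a
        # decoder-only LLM) and freeze them, as done here.
        self.modality_encoder = nn.Linear(3 * 224 * 224, enc_dim)
        self.llm_backbone = nn.TransformerEncoderLayer(
            d_model=llm_dim, nhead=8, batch_first=True)
        for p in self.modality_encoder.parameters():
            p.requires_grad = False
        for p in self.llm_backbone.parameters():
            p.requires_grad = False
        # Only the input projector is trained: this is the kind of
        # "cost-effective training strategy" the abstract refers to.
        self.input_projector = nn.Linear(enc_dim, llm_dim)

    def forward(self, image, text_embeds):
        # Encode the non-text modality, project it into the LLM's
        # embedding space, and prepend it to the text embeddings.
        feats = self.modality_encoder(image.flatten(1)).unsqueeze(1)
        mm_tokens = self.input_projector(feats)
        seq = torch.cat([mm_tokens, text_embeds], dim=1)
        return self.llm_backbone(seq)

if __name__ == "__main__":
    model = MMLLMSketch()
    img = torch.randn(2, 3, 224, 224)
    txt = torch.randn(2, 16, 4096)  # placeholder text embeddings
    out = model(img, txt)
    print(out.shape)  # torch.Size([2, 17, 4096])
    # Only the projector's parameters are trainable:
    print([n for n, p in model.named_parameters() if p.requires_grad])
```

Freezing both pretrained components and training only the projector keeps the trainable parameter count small, which is why this recipe preserves the LLM's reasoning abilities while adding multimodal support cheaply.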
Keywords:  
Author(s) Name:  Duzhen Zhang, Yahan Yu, Jiahua Dong, Chenxing Li, Dan Su, Chenhui Chu, Dong Yu
Journal name:  Computation and Language (arXiv subject category cs.CL)
Conference name:  
Publisher name:  arXiv
DOI:  10.48550/arXiv.2401.13601
Volume Information:  (2024)
Paper Link:  https://arxiv.org/abs/2401.13601