Research Area:  Machine Learning
Current methods for Video Moment Retrieval (VMR) struggle to align queries describing complex situations that involve specific environmental details, character descriptions, and action narratives. To tackle this issue, we propose a Large Language Model-guided Moment Retrieval (LMR) approach that leverages the extensive knowledge of Large Language Models (LLMs) to improve both video context representation and cross-modal alignment, facilitating accurate localization of target moments. Specifically, LMR introduces a context enhancement technique that uses LLMs to generate crucial target-related context semantics, which are integrated with visual features to produce discriminative video representations. Finally, a language-conditioned transformer is designed to decode free-form language queries on the fly, using the aligned video representations for moment retrieval. Extensive experiments demonstrate that LMR achieves state-of-the-art results, outperforming the nearest competitor by up to 3.28% and 4.06% on the challenging QVHighlights and Charades-STA benchmarks, respectively. More importantly, the performance gains are significantly higher when localizing complex queries.
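The abstract describes a two-stage architecture: LLM-generated context semantics fused with visual features, followed by a language-conditioned transformer that decodes the query against the enhanced video representation. Below is a minimal PyTorch sketch of that pipeline under stated assumptions; the module names (ContextEnhancedFusion, LanguageConditionedDecoder), the cross-attention fusion strategy, the feature dimensions, and the span-prediction head are all illustrative choices, not the authors' released implementation.

```python
# Minimal sketch of the LMR pipeline from the abstract. All names, shapes,
# and design choices here are assumptions for illustration only.
import torch
import torch.nn as nn


class ContextEnhancedFusion(nn.Module):
    """Fuses LLM-generated context semantics with clip-level visual features
    (one plausible reading of the paper's 'context enhancement' step)."""

    def __init__(self, dim: int = 256):
        super().__init__()
        # Cross-attention: visual tokens attend to LLM context tokens.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, visual: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # visual: (B, T, D) clip features; context: (B, K, D) LLM semantics.
        enhanced, _ = self.cross_attn(query=visual, key=context, value=context)
        return self.norm(visual + enhanced)  # residual fusion


class LanguageConditionedDecoder(nn.Module):
    """Decodes a free-form query against the enhanced video representation
    and predicts a normalized (start, end) moment span."""

    def __init__(self, dim: int = 256, layers: int = 2):
        super().__init__()
        layer = nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=layers)
        self.span_head = nn.Linear(dim, 2)  # (start, end), an assumed head

    def forward(self, query_tokens: torch.Tensor, video: torch.Tensor) -> torch.Tensor:
        decoded = self.decoder(tgt=query_tokens, memory=video)
        # Mean-pool over query tokens, then predict normalized boundaries.
        return self.span_head(decoded.mean(dim=1)).sigmoid()  # (B, 2)


if __name__ == "__main__":
    B, T, K, L, D = 2, 64, 8, 12, 256
    visual = torch.randn(B, T, D)   # precomputed clip features
    context = torch.randn(B, K, D)  # projected LLM context embeddings
    query = torch.randn(B, L, D)    # encoded language query
    video = ContextEnhancedFusion(D)(visual, context)
    span = LanguageConditionedDecoder(D)(query, video)
    print(span.shape)  # torch.Size([2, 2]) -> normalized (start, end)
```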
Keywords:  
Author(s) Name:  Weijia Liu, Bo Miao, Jiuxin Cao, Xuelin Zhu, Bo Liu, Mehwish Nasim, Ajmal Mian
Journal name:  arXiv preprint (subject area: Computer Vision and Pattern Recognition)
Conference name:  
Publisher name:  arXiv
DOI:  10.48550/arXiv.2405.12540
Volume Information:  Volume 11 (2024)
Paper Link:  https://arxiv.org/abs/2405.12540