Object-Centric Cross-Modal Knowledge Reasoning for Future Event Prediction in Videos - 2024

Research Area:  Machine Learning

Abstract:

Although multi-modal large language models possess impressive cross-modal reasoning and prediction capabilities, they lack a unified and rigorous evaluation standard. In this paper, we introduce a future event prediction task to assess the cross-modal temporal prediction capabilities of these models. This task requires the model to generate descriptions of events that may occur in the future based on an input video. To tackle this new task, we propose an object-centric cross-modal knowledge reasoning framework, which combines a basic information encoder, an adaptive multi-segment filter, a spatial-temporal relation encoder, a vision-text interaction module, and a pre-trained large language model decoder. The adaptive multi-segment filter selectively captures critical visual information in videos, enhancing the model’s focus on relevant features. The spatial-temporal relation encoder decomposes and associates the objects and scene information in the video. Additionally, the vision-text interaction module enhances the connection between visual sequences and their corresponding textual narratives, ensuring semantic coherence and consistency. To evaluate our framework, we constructed a dataset containing descriptions, dialogues of future events, and object-centric event reasoning chains. Experimental results indicate that the proposed framework outperforms all previous methods for future event prediction. Ablation studies further demonstrate the effectiveness of the designed modules.

Keywords:  

Author(s) Name:  Chenghang Lai; Haibo Wang; Weifeng Ge; Xiangyang Xue

Journal name:  IEEE Transactions on Circuits and Systems for Video Technology

Conference name:  

Publisher name:  IEEE

DOI:  10.1109/TCSVT.2024.3444895

Volume Information:  Volume: 34, Pages: 13324-13337 (2024)