Research Area:  Machine Learning
Weakly-supervised temporal action localization (WS-TAL) aims to localize action instances and recognize their categories with only video-level labels. Despite great progress, existing methods suffer from severe action-background ambiguity, which mainly comes from background noise introduced by aggregation operations and from large intra-action variations caused by the task gap between classification and localization. To address this issue, we propose a generalized evidential deep learning (EDL) framework for WS-TAL, called Dual-Evidential Learning for Uncertainty modeling (DELU), which extends the traditional EDL paradigm to the weakly-supervised multi-label classification setting. Specifically, to adaptively exclude undesirable background snippets, we utilize video-level uncertainty to measure the interference of background noise with the video-level prediction. The snippet-level uncertainty is then further deduced for progressive learning, which gradually focuses on entire action instances in an "easy-to-hard" manner. Extensive experiments show that DELU achieves state-of-the-art performance on the THUMOS14 and ActivityNet1.2 benchmarks. Our code is available at https://github.com/MengyuanChen21/ECCV2022-DELU.
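The abstract's core ingredient is the standard EDL recipe: classifier logits are mapped to non-negative "evidence", which parameterizes a Dirichlet distribution whose total strength yields both class probabilities and a vacuity-style uncertainty. The sketch below illustrates that generic recipe only; it is not the authors' DELU implementation, and the function name and softplus activation are illustrative assumptions.

```python
import numpy as np

def edl_uncertainty(logits):
    """Dirichlet-based uncertainty from classifier logits (generic EDL recipe,
    not the DELU model itself)."""
    evidence = np.logaddexp(0.0, logits)        # softplus keeps evidence non-negative
    alpha = evidence + 1.0                      # Dirichlet concentration parameters
    strength = alpha.sum(axis=-1, keepdims=True)
    prob = alpha / strength                     # expected class probabilities
    k = logits.shape[-1]                        # number of action categories
    uncertainty = (k / strength).squeeze(-1)    # high when total evidence is low
    return prob, uncertainty
```

Under this scheme, a snippet with little evidence for any action class (e.g. background) receives high uncertainty, which is the kind of signal the paper uses to suppress background noise at the video level and to order snippets for easy-to-hard progressive learning.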
Keywords:  Temporal Action, Dual-Evidential Learning
Author(s) Name:  Changsheng Xu, Junyu Gao, Mengyuan Chen, Wei Wang
Journal name:  IEEE Transactions on Pattern Analysis and Machine Intelligence
Conference name:  
Publisher name:  IEEE
DOI:  10.1109/TPAMI.2023.3308571
Volume Information:   Volume 45, Pages 15896-15911, (2023)
Paper Link:   https://ieeexplore.ieee.org/document/10230884