Research Area:  Machine Learning
Action recognition and human pose estimation are closely related, yet the literature generally handles them as distinct tasks. In this work, we propose a multitask framework for joint 2D and 3D pose estimation from still images and human action recognition from video sequences. We show that a single architecture can solve the two problems efficiently and still achieve state-of-the-art results. Additionally, we demonstrate that end-to-end optimization leads to significantly higher accuracy than separate learning. The proposed architecture can be trained seamlessly with data from different categories simultaneously. The reported results on four datasets (MPII, Human3.6M, Penn Action and NTU) demonstrate the effectiveness of our method on the targeted tasks.
Keywords:  
2D-3D Pose Estimation
Action Recognition
Multitask Deep Learning
Machine Learning
Author(s) Name:   Diogo C. Luvizon; David Picard; Hedi Tabia
Journal name:  
Conference name:  IEEE/CVF Conference on Computer Vision and Pattern Recognition
Publisher name:  IEEE
DOI:   10.1109/CVPR.2018.00539
Volume Information:  
Paper Link:   https://ieeexplore.ieee.org/document/8578637