Over the past decades, human action recognition has become an active research area, owing to the spread of electronic devices such as smartphones and video cameras in daily life. It has a broad range of applications, including intelligent video surveillance, home environment monitoring, video storage and retrieval, intelligent human-machine interfaces, human-computer interaction, and identity recognition.
Accurate and efficient human action recognition remains challenging in computer vision and machine learning. With the advent of deep learning and greater computing power, however, deep learning and artificial intelligence techniques are now applied to feature extraction and classification for both simple and complex human actions.
Across diverse human action recognition applications, deep learning provides highly accurate results when larger activity datasets are available. Recent work in human action recognition focuses on action classification, human–object interaction recognition, and action detection.
Human action recognition draws on several research topics in computer vision, including human detection in video, human pose estimation, human tracking, and the interpretation and understanding of time series data. The deep learning architectures applied to human action recognition are highlighted below:
• Spatio-temporal networks: Convolutional neural networks (CNNs), 3D convolutional networks, recurrent neural networks, long short-term memory (LSTM), long-term recurrent convolutional networks, CNN-LSTM, and various fusion techniques are employed to perform spatio-temporal operations in human action recognition.
• Multiple stream networks: Deep convolutional neural networks are used to build multiple-stream human action recognition systems that incorporate multi-task learning concepts.
• Deep generative networks: Deep generative architectures applied to human action recognition include the Dynencoder for synthesizing dynamic textures, the LSTM autoencoder, and adversarial models.
• Temporal coherency networks: Recurrent convolutional networks, Siamese networks, and Siamese triplet networks are used to exploit temporal coherency in human action recognition.
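
To make the multiple stream idea in the list above more concrete, the following is a minimal sketch of one common way such systems combine their streams: late fusion, where each stream produces class scores and the scores are averaged. Everything here is an assumption made for illustration and is not taken from a specific system: the stream names ("spatial", "temporal"), the action labels, the logit values, the equal weights, and the helper names `softmax` and `late_fusion` are all hypothetical. In practice the logits would come from trained deep convolutional networks rather than hard-coded lists.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw class scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(stream_logits: Dict[str, Sequence[float]],
                weights: Dict[str, float]) -> List[float]:
    """Weighted average of per-stream class probabilities.

    Hypothetical helper: weights are normalized so the fused
    scores still sum to 1.
    """
    names = list(stream_logits)
    num_classes = len(stream_logits[names[0]])
    weight_sum = sum(weights[name] for name in names)
    fused = [0.0] * num_classes
    for name in names:
        probs = softmax(stream_logits[name])
        for i, p in enumerate(probs):
            fused[i] += weights[name] / weight_sum * p
    return fused


# Hypothetical labels and made-up logits standing in for network outputs.
ACTIONS = ["walk", "run", "wave"]
logits = {
    "spatial": [2.0, 1.0, 0.1],   # e.g. a stream fed single RGB frames
    "temporal": [0.5, 2.5, 0.2],  # e.g. a stream fed motion information
}

fused = late_fusion(logits, {"spatial": 1.0, "temporal": 1.0})
predicted = ACTIONS[max(range(len(fused)), key=fused.__getitem__)]
print(predicted, [round(p, 3) for p in fused])
```

In this toy input the spatial stream alone favors "walk", while the fused scores favor "run" because the temporal stream is more confident. Score averaging is only one fusion option; the fusion techniques mentioned in the list may instead combine intermediate feature maps.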