Multimodal fusion methods with deep neural networks and meta-information for aggression detection in surveillance - 2022

Research Area:  Machine Learning

Abstract:

Multimodal fusion has become one of the most active topics in affective computing and other research areas, yet it remains less studied in surveillance systems, where work has generally focused on a single modality, mainly video or audio. Fusing data from different sources and modalities is challenging, especially for aggression detection, because aggression is ambiguous and difficult to define. This high-level concept is therefore hard to describe through simple rules or trivial algorithms and classifiers. This article presents an approach based on four multimodal fusion methods that use the audio, video and text modalities, together with extra information, and Deep Learning techniques to detect aggression in surveillance. We investigate the combination of acoustic, visual and text-based features, as well as a set of five meta-features that influence the fusion process and the aggression intensity. The proposed architectures use Deep Neural Networks (DNNs) although the dataset is relatively small. The neural network models prove effective not only in prediction tasks but also in feature extraction and dimension reduction. The first multimodal fusion method is based on an intermediate level that contains the predictions of audio, video and the five meta-features, obtained by implementing multiple DNNs. Our previous work on aggression detection, based on an intermediate level and deep neural networks on a restricted subset of the data, indicated the same findings; we extend that work here to the whole dataset. The second fusion method concatenates the different features, with the meta-features as labels, without the intermediate level. The other two fusion methods apply an element-wise operation as a first step and concatenation as a second: one uses the element-wise product and the other uses element-wise addition.
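The fusion operations named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: plain Python lists stand in for learned feature vectors, all function and variable names are hypothetical, and the sketch assumes that the per-modality features have already been projected to a common length and that the second step concatenates the element-wise result with the meta-features (the abstract does not say what is concatenated).

```python
from typing import List, Sequence

Vector = List[float]


def concat_fusion(*features: Sequence[float]) -> Vector:
    """Concatenate feature vectors end to end (lengths may differ)."""
    fused: Vector = []
    for feature in features:
        fused.extend(float(x) for x in feature)
    return fused


def elementwise_fusion(features: Sequence[Sequence[float]], op: str) -> Vector:
    """Combine equal-length vectors with an element-wise product or addition."""
    if op not in ("product", "addition"):
        raise ValueError("op must be 'product' or 'addition'")
    length = len(features[0])
    if any(len(f) != length for f in features):
        raise ValueError("element-wise fusion needs equal-length vectors")
    fused = [float(x) for x in features[0]]
    for feature in features[1:]:
        if op == "product":
            fused = [a * b for a, b in zip(fused, feature)]
        else:
            fused = [a + b for a, b in zip(fused, feature)]
    return fused


def two_step_fusion(
    features: Sequence[Sequence[float]], meta: Sequence[float], op: str
) -> Vector:
    """Element-wise step first, then concatenation with the meta-features.

    Concatenating with the meta-features is an assumption of this sketch.
    """
    return concat_fusion(elementwise_fusion(features, op), meta)


# Toy inputs: three modality vectors of equal length and five meta-features.
audio = [1.0, 2.0, 3.0]
video = [0.5, 0.5, 2.0]
text = [2.0, 1.0, 1.0]
meta = [0.0, 1.0, 0.0, 1.0, 1.0]

concatenated = concat_fusion(audio, video, text, meta)
product_fused = two_step_fusion([audio, video, text], meta, "product")
addition_fused = two_step_fusion([audio, video, text], meta, "addition")
```

Concatenation keeps every input dimension, so the fused vector grows with each modality. The element-wise variants keep the vector at the common length but require the inputs to share it. In practice the fused vector would feed a classifier such as a DNN.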
Among the proposed models, the second method performs best, with an unweighted average accuracy of 85.66% and a weighted average accuracy of 86.35%. It also surpasses the existing similar approach for aggression detection, which reported accuracies of 75% (unweighted average) and 77% (weighted average). Overall, we think that each fusion method has its advantages and shortcomings for the prediction of different aggression intensities. Furthermore, we observe that the meta-features are very informative and significantly improve the prediction performance of all the fusion methods.
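The abstract does not define its two accuracy figures. The sketch below assumes the convention common in affective computing: weighted average accuracy is the overall fraction of correct predictions, and unweighted average accuracy is the mean of the per-class recalls. The function names and toy labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def weighted_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Overall accuracy: each class counts in proportion to its sample count."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def unweighted_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Mean per-class recall: every class counts equally, whatever its size."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    recalls = [hits[label] / totals[label] for label in totals]
    return sum(recalls) / len(recalls)


# Toy example with an imbalanced label distribution (assumed data).
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 1, 0]

wa = weighted_accuracy(y_true, y_pred)    # 7 of 8 samples correct
ua = unweighted_accuracy(y_true, y_pred)  # mean of recalls 1.0 and 0.5
```

With imbalanced classes the two figures diverge, which is one reason to report both. Here the majority class is predicted perfectly while half of the minority class is missed.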

Keywords:  
Multimodal fusion
deep neural networks
meta-information
aggression detection
Machine Learning

Author(s) Name:  Noussaiba Jaafar, Zied Lachiri

Journal name:  Expert Systems with Applications

Conference name:  

Publisher name:  Elsevier

DOI:  10.1016/j.eswa.2022.118523

Volume Information:  Volume 211, January 2023, 118523