Research Area:  Machine Learning
The rapid increase in multimodal data from online sources necessitates the development of methods and techniques that use diverse modes along with text to generate summaries. Meeting recordings, CCTV footage, and sports coverage are vast information resources that incorporate modes such as speech, text, and images. Multimodal data contains rich information that could help improve the quality of summaries. Recent developments in neural networks have furthered research in this area. This paper is an attempt to contribute to the growing body of research in multimodal summarization by drawing attention to state-of-the-art techniques, datasets, and evaluation strategies. Besides covering recent work in the field, the paper also highlights the major issues and challenges.
Keywords:  
Multimodal
Summarization
Review
Neural networks
Meeting recordings
CCTV footage
Sports coverage
Author(s) Name:  Hira Javed, M. M. Sufyan Beg & Nadeem Akhtar
Journal name:  
Conference name:  Proceedings of the International Conference on Computational Intelligence and Sustainable Technologies
Publisher name:  Springer
DOI:  https://doi.org/10.1007/978-981-16-6893-7_54
Volume Information:  Volume: 22
Paper Link:  https://link.springer.com/chapter/10.1007/978-981-16-6893-7_54