Research Area:  Machine Learning
Automatic image captioning is the generation of a caption for an image by a machine. Captioning works by recognizing the objects in an image, their attributes, and the relationships between them. The task draws on computer vision for image understanding, natural language processing for syntax and semantics, and machine learning for caption generation. Typically, a convolutional neural network (CNN) extracts image features and a recurrent neural network (RNN) generates the sentence. Earlier work relied on traditional machine learning, in which features are extracted from the input data by hand. Extracting handcrafted features from a large dataset is neither easy nor feasible. Various deep learning-based approaches were proposed later. Among these, retrieval-based and template-based methods suffered from missing important objects and from fixed-length captions, respectively. End-to-end learning approaches based on deep networks then emerged and made image captioning more efficient. The objective of this paper is to study and compare various end-to-end learning-based frameworks for image captioning using standard evaluation metrics, and to understand how these frameworks can be used in various research applications. Along with the comparison, future challenges are also discussed.
Keywords:  
Author(s) Name:  Gaurav and Pratistha Mathur
Journal name:  
Conference name:  Journal of Physics: Conference Series
Publisher name:  IOP
DOI:  10.1088/1742-6596/1950/1/012045
Volume Information:  
Paper Link:   https://iopscience.iop.org/article/10.1088/1742-6596/1950/1/012045/meta