Multi-Level Policy and Reward-Based Deep Reinforcement Learning Framework for Image Captioning - 2019

Research Area:  Machine Learning

Abstract:

Image captioning is one of the most challenging tasks in AI because it requires an understanding of both complex visuals and natural language. Because image captioning is essentially a sequential prediction task, recent advances in image captioning have used reinforcement learning (RL) to better explore the dynamics of word-by-word generation. However, existing RL-based image captioning methods rely primarily on a single policy network and a single reward function, an approach that is not well matched to the multi-level (word and sentence) and multi-modal (vision and language) nature of the task. To solve this problem, we propose a novel multi-level policy and reward RL framework for image captioning that can be easily integrated with RNN-based captioning models, language metrics, or visual-semantic functions for optimization. Specifically, the proposed framework includes two modules: 1) a multi-level policy network that jointly updates the word- and sentence-level policies for word generation; and 2) a multi-level reward function that collaboratively leverages both a vision-language reward and a language-language reward to guide the policy. Furthermore, we propose a guidance term to bridge the policy and the reward for RL optimization. Extensive experiments and analyses on the MSCOCO and Flickr30k datasets show that the proposed framework achieves competitive performance on a variety of evaluation metrics. In addition, we conduct ablation studies on multiple variants of the proposed framework and explore several representative image captioning models and metrics for the word-level policy network and the language-language reward function to evaluate the generalization ability of the proposed framework.
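
The abstract does not give the formulas, so the following is only an illustrative sketch of the general idea of guiding a captioning policy with two reward signals. It assumes a simple weighted sum of a vision-language reward and a language-language reward, and a plain REINFORCE-style surrogate loss with a baseline. The function names, the `weight` parameter, and the mixing rule are hypothetical and are not taken from the paper; the multi-level policy network and the guidance term are not modeled here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of word scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combined_reward(vision_language_reward, language_language_reward, weight=0.5):
    """Hypothetical mixing rule: convex combination of the two rewards.

    vision_language_reward: e.g. an image-caption similarity score (assumed).
    language_language_reward: e.g. a caption metric such as CIDEr (assumed).
    """
    return weight * vision_language_reward + (1.0 - weight) * language_language_reward


def reinforce_loss(step_logits, sampled_words, reward, baseline=0.0):
    """REINFORCE surrogate loss for one sampled caption.

    loss = -(reward - baseline) * sum_t log p(w_t | w_<t, image)
    step_logits: one list of vocabulary scores per time step.
    sampled_words: index of the sampled word at each time step.
    """
    log_prob = 0.0
    for logits, word in zip(step_logits, sampled_words):
        log_prob += math.log(softmax(logits)[word])
    return -(reward - baseline) * log_prob


if __name__ == "__main__":
    # Toy caption of two words over a three-word vocabulary.
    logits = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]]
    words = [0, 1]
    r = combined_reward(vision_language_reward=0.8, language_language_reward=0.4)
    print(reinforce_loss(logits, words, reward=r, baseline=0.5))
```

With a reward above the baseline the loss is positive and decreases as the sampled words become more probable, so minimizing it makes well-rewarded captions more likely; a reward below the baseline pushes the other way.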

Author(s) Name:  Ning Xu; Hanwang Zhang; An-An Liu; Weizhi Nie; Yuting Su; Jie Nie; Yongdong Zhang

Journal name:  IEEE Transactions on Multimedia

Publisher name:  IEEE

DOI:  10.1109/TMM.2019.2941820

Volume Information:  Volume: 22, Issue: 5, May 2020, Page(s): 1372 - 1383