Research Area:  Machine Learning
Text-based games are increasingly used in reinforcement learning as simulations of real-world environments. They are usually imperfect-information games, and their interactions occur only in the textual modality. To tackle these games, it is effective to complement the missing information with knowledge from outside the game, such as human common sense. However, previous works have drawn such knowledge only from textual sources. In this paper, we investigate the advantage of employing commonsense reasoning obtained from visual datasets such as scene graph datasets. For humans, images generally convey more comprehensive information than text. This property makes it possible to extract commonsense relationship knowledge that is more useful for acting effectively in a game. We compare the statistics of spatial relationships available in Visual Genome (a scene graph dataset) and ConceptNet (a text-based knowledge base) to analyze the effectiveness of introducing scene graph datasets. We also conducted experiments on a text-based game task that requires commonsense reasoning. Our experimental results demonstrate that our proposed methods achieve performance that is higher than or competitive with existing state-of-the-art methods.
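The abstract mentions comparing the statistics of spatial relationships found in scene-graph data against a text-based knowledge base. The following is a minimal, hypothetical sketch of one such comparison step: tallying spatial predicates over Visual-Genome-style (subject, predicate, object) triples. The triples, the predicate set, and the function name are toy illustrations, not the paper's actual method or data.

```python
from collections import Counter

# Hypothetical set of spatial predicates to tally; the paper's actual
# predicate inventory is not reproduced here.
SPATIAL_PREDICATES = {"on", "in", "under", "behind", "next to"}

def count_spatial_relations(triples):
    """Count occurrences of spatial predicates in (subject, predicate, object) triples."""
    counts = Counter()
    for _subj, pred, _obj in triples:
        if pred in SPATIAL_PREDICATES:
            counts[pred] += 1
    return counts

# Toy scene-graph triples in the style of Visual Genome annotations.
toy_scene_graph = [
    ("cup", "on", "table"),
    ("cat", "under", "table"),
    ("book", "on", "shelf"),
    ("dog", "has", "tail"),  # non-spatial relation, ignored by the tally
]

print(count_spatial_relations(toy_scene_graph))
# → Counter({'on': 2, 'under': 1})
```

Running the same tally over triples extracted from a scene-graph dataset and over relation edges from a text-based knowledge base would yield the kind of frequency comparison the abstract describes.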
Keywords:  
Text-based games
Textual modality
Textual information
Visual datasets
Commonsense relationship
Author(s) Name:  Tsunehiko Tanaka, Daiki Kimura, Michiaki Tatsubori
Journal name:  Computer Vision and Pattern Recognition
Conference name:  
Publisher name:  arXiv
DOI:  10.48550/arXiv.2210.14162
Volume Information:  
Paper Link:   https://arxiv.org/abs/2210.14162