Research Area:  Machine Learning
With the recent development of deep learning, AI models are widely used across various domains. AI models show good performance on well-defined tasks such as image classification and text generation. With the recent development of generative models (e.g., BigGAN, GPT-3), AI models also show impressive results on diverse generation tasks (e.g., photo-realistic image synthesis, paragraph generation). As the performance of individual AI models improves, interest is also growing in comprehensive tasks such as visual language navigation (VLN), in which an agent follows language instructions using an egocentric view. However, integrating models for VLN is challenging because of model complexity, modal heterogeneity, and the shortage of paired data. This study provides a comprehensive survey of VLN with a systematic approach to reviewing recent trends. First, we define a taxonomy of the fundamental techniques needed to perform VLN. We analyze VLN from four perspectives: representation learning, reinforcement learning, components, and evaluation. We investigate the pros and cons of each component and the methodologies proposed in recent work. Unlike other conventional surveys, this survey categorizes the approaches of major research institutes according to the taxonomy defined from these four perspectives. Finally, we discuss current open challenges and conclude our study by suggesting possible future directions.
Keywords:  
Artificial intelligence
Visual language navigation
Representation learning
Reinforcement learning
Machine Learning
Deep Learning
Author(s) Name:  Sang-Min Park & Young-Gab Kim
Journal name:  Artificial Intelligence Review
Conference name:  
Publisher name:  Springer
DOI:  10.1007/s10462-022-10174-9
Volume Information:  volume 56, pages 365–427
Paper Link:   https://link.springer.com/article/10.1007/s10462-022-10174-9