Research Area:  Machine Learning
In distributional reinforcement learning (RL), the estimated distribution of the value function models both the parametric and intrinsic uncertainties. We propose a novel and efficient exploration method for deep RL that has two components. The first is a decaying schedule to suppress the intrinsic uncertainty. The second is an exploration bonus calculated from the upper quantiles of the learned distribution. In Atari 2600 games, our method achieves a 483% average gain in cumulative rewards over QR-DQN across 49 games. We also compared our algorithm with QR-DQN in a challenging 3D driving simulator (CARLA). Results show that our algorithm achieves near-optimal safety rewards twice as fast as QR-DQN.
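A minimal sketch of the idea described in the abstract, assuming a QR-DQN-style agent that maintains per-action quantile estimates of the return distribution. The particular bonus form (spread of the quantiles above the median), the schedule c_t = c * sqrt(log t / t), and the constant c are illustrative assumptions, not taken from the paper:

# Illustrative sketch (not the authors' exact implementation): action selection with a
# decaying exploration bonus computed from the upper quantiles of a learned distribution.
import numpy as np

def upper_quantile_bonus(quantiles: np.ndarray) -> np.ndarray:
    """quantiles: sorted per-action quantile estimates, shape [num_actions, num_quantiles].
    Returns a per-action bonus based on the spread of the upper half of the quantiles."""
    mid = quantiles.shape[1] // 2
    median = quantiles[:, mid:mid + 1]                 # [num_actions, 1]
    upper = quantiles[:, mid:]                         # quantiles above (and at) the median
    return np.mean((upper - median) ** 2, axis=1)      # per-action upper spread

def select_action(quantiles: np.ndarray, t: int, c: float = 50.0) -> int:
    """Pick the action maximizing mean value plus a decaying optimism bonus."""
    mean_q = quantiles.mean(axis=1)                    # expected return per action
    c_t = c * np.sqrt(np.log(max(t, 2)) / max(t, 2))   # assumed decaying schedule
    return int(np.argmax(mean_q + c_t * np.sqrt(upper_quantile_bonus(quantiles))))

# Usage: 4 actions, 51 quantile estimates per action
rng = np.random.default_rng(0)
theta = np.sort(rng.normal(size=(4, 51)), axis=1)
print(select_action(theta, t=1000))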
Keywords:  
Author(s) Name:  Borislav Mavrin, Hengshuai Yao, Linglong Kong, Kaiwen Wu, Yaoliang Yu
Journal name:  
Conference name:  Proceedings of the 36th International Conference on Machine Learning
Publisher name:  arXiv
DOI:  10.48550/arXiv.1905.06125
Volume Information:  
Paper Link:   https://arxiv.org/abs/1905.06125