Research Area:  Machine Learning
In this paper, we consider risk-sensitive sequential decision-making in model-based Reinforcement Learning (RL). Our contributions are two-fold. First, we introduce a novel and coherent quantification of risk, namely composite risk, which quantifies the joint effect of aleatory and epistemic risk during the learning process. Existing works consider either aleatory or epistemic risk individually, or an additive combination of the two. We prove that the additive formulation is a particular case of composite risk in which the epistemic risk measure is replaced with an expectation. Thus, composite risk provides an estimate more sensitive to both aleatory and epistemic sources of uncertainty than the individual and additive formulations. Second, we propose a bootstrapping method, SENTINEL-K, for performing distributional RL. SENTINEL-K uses an ensemble of K learners to estimate the return distribution. We use Follow The Regularised Leader (FTRL) to aggregate the return distributions of the K learners and to estimate the composite risk. We experimentally verify that SENTINEL-K estimates the return distribution better and, when used with the composite risk estimate, demonstrates better risk-sensitive performance than state-of-the-art risk-sensitive and distributional RL algorithms.
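The composite risk described above nests an epistemic risk measure over per-model aleatory risk estimates from the ensemble. A minimal illustrative sketch (not the paper's algorithm; the choice of CVaR for both measures and all function names here are assumptions) might look like:

```python
import numpy as np

def cvar(samples, alpha):
    # Conditional Value-at-Risk at level alpha:
    # the mean of the worst (lowest-return) alpha-fraction of samples.
    sorted_s = np.sort(np.asarray(samples))  # ascending: worst returns first
    k = max(1, int(np.ceil(alpha * len(sorted_s))))
    return sorted_s[:k].mean()

def composite_risk(return_samples_per_model, alpha_aleatory, alpha_epistemic):
    # return_samples_per_model: list of K arrays, each drawn from one
    # ensemble member's estimated return distribution.
    # Step 1: aleatory risk within each model's return distribution.
    aleatory_risks = np.array(
        [cvar(s, alpha_aleatory) for s in return_samples_per_model]
    )
    # Step 2: epistemic risk measure applied across the K models.
    return cvar(aleatory_risks, alpha_epistemic)

def additive_style_risk(return_samples_per_model, alpha_aleatory):
    # Special case noted in the abstract: replacing the epistemic risk
    # measure with an expectation (a plain mean over the ensemble).
    return np.mean([cvar(s, alpha_aleatory) for s in return_samples_per_model])
```

Because CVaR averages only the worst outcomes, the composite estimate is never more optimistic than the expectation-based special case, illustrating the claimed extra sensitivity to epistemic uncertainty.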
Keywords:  
Author(s) Name:  Hannes Eriksson, Debabrota Basu, Mina Alibeigi, Christos Dimitrakakis
Journal name:  
Conference name:  
Publisher name:  arXiv
DOI:  10.48550/arXiv.2102.11075
Volume Information:  
Paper Link:  https://arxiv.org/abs/2102.11075