Research Area:  Machine Learning
In off-policy reinforcement learning (RL), the experience imbalance problem can degrade learning performance. The problem arises when the experiences the agent collects during learning are unevenly distributed over the state space, so that the agent cannot accurately estimate the value of every potential state. It is typically caused by environments with high-dimensional state and action spaces, as well as by the exploration–exploitation mechanism inherent in RL. This article proposes a balanced prioritized experience replay (BPER) algorithm based on experience rarity. First, an evaluation metric that quantifies experience rarity is defined. Then, the sampling priority of each experience is calculated from this metric. Finally, prioritized experience replay is performed according to the sampling priority. BPER increases the sampling frequency of high-rarity experiences and decreases the sampling frequency of low-rarity experiences, enabling the agent to learn more comprehensive knowledge. BPER is evaluated on a series of MuJoCo continuous control tasks. Experimental results show that BPER can effectively improve learning performance while mitigating the impact of the experience imbalance problem.
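The three steps summarized above (rarity metric, sampling priority, prioritized replay) can be illustrated with a minimal sketch. The abstract does not give the paper's rarity metric, so the sketch substitutes a simple stand-in: the inverse visit count of the grid cell a state falls into. The function names `rarity_priorities` and `sample_batch`, the parameters `n_bins` and `alpha`, and the grid discretization are all assumptions made for illustration, not the authors' definitions.

```python
import numpy as np


def rarity_priorities(states, n_bins=10, alpha=1.0):
    """Return sampling probabilities that favor rarely visited states.

    ASSUMPTION: rarity is the inverse visit count of a state's grid cell.
    This is an illustrative stand-in, not the metric defined in the paper.
    """
    states = np.asarray(states, dtype=float)
    lo = states.min(axis=0)
    hi = states.max(axis=0)
    # Avoid division by zero for dimensions that never vary.
    span = np.where(hi > lo, hi - lo, 1.0)
    # Map every state to an integer grid cell in [0, n_bins - 1].
    cells = np.minimum(((states - lo) / span * n_bins).astype(int), n_bins - 1)
    # Count how many stored experiences share each cell.
    _, inverse, counts = np.unique(
        cells, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.reshape(-1)
    rarity = 1.0 / counts[inverse]
    # alpha (hypothetical) controls how strongly rarity skews sampling.
    priorities = rarity ** alpha
    return priorities / priorities.sum()


def sample_batch(probs, batch_size, rng):
    """Draw replay-buffer indices according to the sampling probabilities."""
    return rng.choice(len(probs), size=batch_size, p=probs)


# Nine experiences in a crowded region and one in a rarely visited region.
states = np.array([[0.0, 0.0]] * 9 + [[1.0, 1.0]])
probs = rarity_priorities(states)
rng = np.random.default_rng(0)
batch = sample_batch(probs, batch_size=1000, rng=rng)
```

In this toy buffer the single rare experience receives the same total sampling mass as the nine common ones combined, which is the balancing effect the abstract describes. A practical implementation would also need a rarity measure that scales to high-dimensional MuJoCo state spaces, where a uniform grid is not viable.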
Keywords:  
Author(s) Name:  Zhouwei Lou, Yiye Wang, Shuo Shan, Kanjian Zhang & Haikun Wei
Journal name:  Neural Computing and Applications
Conference name:  
Publisher name:  Springer
DOI:  10.1007/s00521-024-09913-6
Volume Information:  Volume 36, Pages 15721–15737, (2024)
Paper Link:  https://link.springer.com/article/10.1007/s00521-024-09913-6