Meta-reinforcement learning (meta-RL) is the branch of meta-learning that applies learning-to-learn ideas to reinforcement learning. Meta-learning gives a model the ability to adapt to new environments that were not seen during training, while reinforcement learning enables an agent to learn in an interactive environment by trial and error, using the feedback from its own actions and experiences. A meta-RL agent therefore not only performs well on the set of problems it was trained and tested on, but can also handle a variety of new tasks.
In meta-RL, the policy's observation typically includes the last reward and last action in addition to the current state. The central goal of meta-RL is to design an agent that can rapidly adapt to new, unseen tasks and keep improving with additional experience. To achieve this, meta-RL learns a prior from a family of related tasks and reuses it in a new environment after only a few trials, or none at all. The process is divided into meta-training and meta-testing. Common families of meta-RL methods include gradient-based meta-RL, recurrence-based methods, and model-free off-policy methods. Popular application areas include autonomous driving, robotics, traffic signal control, and wireless networking, where meta-RL is applied to complex, dynamic real-world tasks.
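The augmented observation described above (current state, last action, last reward) is the input used by recurrence-based meta-RL methods such as RL². As a minimal sketch, not any particular paper's implementation, the helper below shows how such an input vector might be assembled; the function name and the one-hot action encoding are illustrative assumptions:

```python
import numpy as np

def rl2_observation(state, last_action, last_reward, n_actions):
    """Build the augmented observation used by recurrence-based meta-RL:
    the raw state, the previous action (one-hot encoded), and the previous
    scalar reward are concatenated into a single vector.  A recurrent
    policy fed this input can accumulate task information in its hidden
    state across time steps and episodes."""
    action_one_hot = np.zeros(n_actions)
    action_one_hot[last_action] = 1.0
    return np.concatenate([state, action_one_hot, [last_reward]])
```

For a 2-dimensional state and 3 discrete actions, `rl2_observation(np.array([0.1, 0.2]), 1, 0.5, 3)` yields a 6-dimensional vector; the recurrent policy sees reward feedback directly, which is what lets it adapt within a task without any weight updates.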
• Meta-reinforcement learning (meta-RL) is a learning-to-learn problem: the aim is to learn an algorithm that can quickly determine a good sampling (exploration) distribution in a new environment.
• It addresses a key limitation of traditional reinforcement learning by leveraging knowledge acquired on training tasks to perform well on previously unseen tasks.
• For example, a meta-learning algorithm can learn the shared structure of a task family in simulation by training on many generated variants of an insertion task.
• Despite fast progress, meta-RL still faces significant challenges; in particular, meta-RL with sparse rewards remains a difficult open problem.
• Moreover, current meta-RL approaches are largely limited to narrow, parametric, and stationary task distributions, ignoring the qualitative differences and non-stationary changes between tasks that occur in the real world.
• In the future, efficient exploration will be needed to quickly find the most informative experiences during both meta-training and fast adaptation.
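The meta-training / fast-adaptation split mentioned above is easiest to see in the gradient-based family (e.g., MAML-style methods). The sketch below is a toy first-order illustration on a linear-regression task family, not a meta-RL implementation; `maml_step`, `loss_grad`, the learning rates, and the task tuple format are all assumptions made for the example:

```python
import numpy as np

def loss_grad(w, X, y):
    """Gradient of mean-squared error for a linear model y_hat = X @ w."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

def maml_step(w, tasks, inner_lr=0.05, outer_lr=0.01):
    """One first-order MAML-style meta-update over a batch of tasks.
    Inner loop: adapt to each task with one gradient step on its
    training split (fast adaptation).  Outer loop: update the shared
    initialization w using the adapted parameters' gradient on the
    task's test split (meta-training)."""
    meta_grad = np.zeros_like(w)
    for X_tr, y_tr, X_te, y_te in tasks:
        w_adapted = w - inner_lr * loss_grad(w, X_tr, y_tr)  # inner (adaptation) step
        meta_grad += loss_grad(w_adapted, X_te, y_te)        # first-order outer gradient
    return w - outer_lr * meta_grad / len(tasks)
```

The outer update shapes the initialization `w` so that a single inner gradient step already performs well on a new task drawn from the same family, which is exactly the "prior learned from similar tasks, reused after a few trials" idea described in the text.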