Research Area:  Machine Learning
Deep reinforcement learning (DRL) has seen preliminary application to run-to-run (RtR) control. However, existing works have mainly addressed shift and drift disturbances in the chemical mechanical polishing (CMP) process and have not fully considered non-stationary time-series disturbances. Inspired by the powerful self-learning mechanism of DRL, a new distributional reinforcement learning controller, the quantile option structure deep deterministic policy gradient (QUOTA-DDPG), is designed in this work to generate control policies without a precise numerical model. Specifically, the recipe-adjustment procedure is formulated as a Markov decision process, and the state, action, and reward are designed accordingly. In QUOTA-DDPG, an option is first selected by the option policy, and the action is then chosen by the corresponding intra-option policy at each time step. Moreover, target networks and an experience replay mechanism are employed to enhance stability and trainability. Simulations demonstrate that the proposed approach outperforms existing methods in disturbance compensation and target tracking. The QUOTA-DDPG controller enriches the development of smart semiconductor manufacturing.
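The abstract credits target networks and experience replay for training stability. A minimal sketch of these two standard mechanisms (generic to DDPG-style learners, not the authors' exact implementation; all names here are illustrative assumptions):

```python
import random
from collections import deque

class ReplayBuffer:
    """Experience replay: store past transitions and sample random
    minibatches, breaking temporal correlation between updates."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # oldest entries evicted first

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # uniform random minibatch of stored transitions
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

def soft_update(target_params, online_params, tau=0.005):
    """Target network update via Polyak averaging: the target slowly
    tracks the online network, stabilizing bootstrapped value targets."""
    return [tau * o + (1.0 - tau) * t
            for t, o in zip(target_params, online_params)]
```

In a full DDPG-style controller the buffer would hold RtR recipe transitions and `soft_update` would be applied to the critic and actor target weights after each gradient step; the parameters are shown as plain floats here only to keep the sketch self-contained.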
Keywords:  
Author(s) Name:  Zhu Ma, Tianhong Pan
Journal name:  Neural Computing and Applications
Conference name:  
Publisher name:  Springer
DOI:  10.1007/s00521-023-08760-1
Volume Information:  Volume 35, pages 19337-19350 (2023)
Paper Link:   https://link.springer.com/article/10.1007/s00521-023-08760-1