Research Area:  Machine Learning
The performance of a fine-tuned pre-trained language model depends largely on its hyperparameter configuration. In this paper, we investigate how modern hyperparameter optimization (HPO) methods perform when fine-tuning pre-trained language models. First, we study and report the performance of three HPO algorithms on fine-tuning two state-of-the-art language models on the GLUE benchmark. We find that, under the same time budget, HPO often fails to outperform grid search for two reasons: an insufficient time budget and overfitting. We propose two general strategies and an experimental procedure to systematically troubleshoot HPO's failure cases. By applying this procedure, we observe that HPO can succeed with a more appropriate search space and time budget; however, overfitting persists in certain cases.
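As a rough illustration of a budget-matched comparison like the one described in the abstract, the sketch below runs grid search and random search with an equal number of trials. Everything in it is hypothetical: `toy_val_score` is an invented stand-in for the validation score of a fine-tuning run, the budget is counted in trials rather than wall-clock time, and random search is a simple substitute, not one of the three HPO algorithms the paper evaluates.

```python
import itertools
import math
import random


def toy_val_score(lr: float, epochs: int) -> float:
    """Hypothetical stand-in for one fine-tuning run's validation score.
    It peaks at lr=3e-5, epochs=3; the shape is invented for illustration."""
    lr_penalty = 0.05 * (math.log10(lr) - math.log10(3e-5)) ** 2
    epoch_penalty = 0.01 * (epochs - 3) ** 2
    return 0.9 - lr_penalty - epoch_penalty


def grid_search(lrs, epoch_options, budget):
    """Evaluate grid points in order until `budget` trials are spent."""
    trials = list(itertools.product(lrs, epoch_options))[:budget]
    scored = [(toy_val_score(lr, ep), (lr, ep)) for lr, ep in trials]
    best_score, best_cfg = max(scored)
    return best_score, best_cfg, len(scored)


def random_search(budget, seed=0):
    """Sample `budget` configurations (log-uniform learning rate)."""
    rng = random.Random(seed)
    scored = []
    for _ in range(budget):
        lr = 10 ** rng.uniform(-6, -4)
        ep = rng.randint(1, 10)
        scored.append((toy_val_score(lr, ep), (lr, ep)))
    best_score, best_cfg = max(scored)
    return best_score, best_cfg, len(scored)


if __name__ == "__main__":
    budget = 9  # same trial budget for both methods (proxy for time)
    print("grid  :", grid_search([1e-5, 3e-5, 5e-5], [2, 3, 4], budget))
    print("random:", random_search(budget))
```

Shrinking `budget` to 2 truncates the grid before it reaches the best point, a toy version of how an insufficient budget limits any search method; the sketch does not model overfitting to the validation set.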
Keywords:  
Author(s) Name:  Xueqing Liu, Chi Wang
Journal name:  Computer Science
Conference name:  
Publisher name:  arXiv:2106.09204
DOI:  10.48550/arXiv.2106.09204
Volume Information:  
Paper Link:  https://arxiv.org/abs/2106.09204