Research Area:  Machine Learning
Automatic keyphrase extraction algorithms aim to identify the words and phrases that carry the core information in a document. As online scholarly resources have become widespread in recent years, better keyphrase extraction techniques are required to improve search efficiency. We present two features, keyphrase semantic diversity and keyphrase coverage, to overcome limitations of existing methods for unsupervised keyphrase extraction. Keyphrase semantic diversity is the degree of semantic variety in the extraction result; it is introduced to avoid extracting synonymous phrases that contain the same high-scoring candidate. Keyphrase coverage refers to how well a candidate represents the other words in a document. We propose an unsupervised keyphrase extraction method called TripleRank, which evaluates three features: word position (a sensitive feature for academic documents) and the two novel features mentioned above. The architecture of TripleRank comprises three sub-models, one scoring each feature, and a summing model. Although it involves multiple models, TripleRank has no typical iteration process; hence, its computational cost is relatively low. TripleRank achieved the best results on four academic datasets against four state-of-the-art baseline models, which confirms the influence of keyphrase semantic diversity and keyphrase coverage and demonstrates the efficiency of our method.
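The "three sub-models plus a summing model" structure described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the actual scoring formulas, so everything here is an assumption made for illustration: the function names, the reciprocal-position score, the sentence-fraction proxy for coverage, the Jaccard-overlap penalty for diversity, and the weights are all hypothetical and are not TripleRank's published definitions. The sketch only shows how three independently computed feature scores might be summed in a single pass, without a PageRank-style iteration.

```python
# Hypothetical sketch: none of these formulas come from the TripleRank paper.

def first_position_score(phrase, tokens):
    """Assumed position feature: reciprocal of the phrase's first token offset."""
    words = phrase.split()
    n = len(words)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == words:
            return 1.0 / (i + 1)
    return 0.0  # phrase never occurs in the document


def coverage_score(phrase, sentences):
    """Assumed coverage proxy: fraction of sentences sharing a word with the phrase."""
    words = set(phrase.split())
    hits = sum(1 for sentence in sentences if words & set(sentence))
    return hits / len(sentences) if sentences else 0.0


def diversity_scores(phrases, base):
    """Assumed diversity feature: 1 minus the largest Jaccard word overlap
    with any candidate that has a higher base score."""
    ranked = sorted(phrases, key=lambda p: base[p], reverse=True)
    scores = {}
    for i, phrase in enumerate(ranked):
        pw = set(phrase.split())
        overlap = 0.0
        for other in ranked[:i]:
            ow = set(other.split())
            overlap = max(overlap, len(pw & ow) / len(pw | ow))
        scores[phrase] = 1.0 - overlap
    return scores


def rank_keyphrases(phrases, sentences, weights=(1.0, 1.0, 1.0)):
    """Summing step: weighted sum of the three feature scores, computed in one pass."""
    tokens = [tok for sentence in sentences for tok in sentence]
    pos = {p: first_position_score(p, tokens) for p in phrases}
    cov = {p: coverage_score(p, sentences) for p in phrases}
    base = {p: pos[p] + cov[p] for p in phrases}
    div = diversity_scores(phrases, base)
    w_pos, w_cov, w_div = weights
    total = {p: w_pos * pos[p] + w_cov * cov[p] + w_div * div[p] for p in phrases}
    return sorted(phrases, key=lambda p: total[p], reverse=True)


# Toy document: a list of tokenized sentences (hypothetical data).
sentences = [
    ["keyphrase", "extraction", "is", "useful"],
    ["graph", "ranking", "helps", "keyphrase", "extraction"],
    ["neural", "models", "exist"],
]
candidates = ["keyphrase extraction", "extraction", "neural models"]
print(rank_keyphrases(candidates, sentences))
```

In this toy setup, raising the (hypothetical) diversity weight pushes "extraction" below "neural models", because "extraction" shares a word with the higher-scoring "keyphrase extraction". That is the kind of redundancy the abstract says keyphrase semantic diversity is meant to suppress.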
Keywords:  
Automatic keyphrase extraction
TripleRank
unsupervised
Machine Learning
Deep Learning
Author(s) Name:  Tuohang Li, Liang Hu, Hongtu Li, Chengyu Sun, Shuai Li, Ling Chi
Journal name:  Knowledge-Based Systems
Conference name:  
Publisher name:  Elsevier
DOI:  10.1016/j.knosys.2021.106846
Volume Information:  Volume 219, 11 May 2021, 106846
Paper Link:   https://www.sciencedirect.com/science/article/abs/pii/S095070512100109X