Research Area:  Machine Learning
In the age of social media, the sheer volume of knowledge and information calls for accurate and effective keyphrase extraction methods in information retrieval and natural language processing. Traditional keyphrase extraction models struggle to incorporate large amounts of external knowledge, but the rise of pre-trained language models offers a new way to address this problem. Against this background, we propose SIFRank, a new baseline for unsupervised keyphrase extraction based on a pre-trained language model. SIFRank combines the sentence embedding model SIF with the autoregressive pre-trained language model ELMo, and it achieves the best performance in keyphrase extraction for short documents. We speed up SIFRank while maintaining its accuracy through document segmentation and contextual word embedding alignment. For long documents, we upgrade SIFRank to SIFRank+ with a position-biased weight, which greatly improves its performance on long documents. Compared with other baseline models, our model achieves state-of-the-art performance on three widely used datasets.
Keywords:  
Keyphrase extraction
pre-trained language model
sentence embeddings
position-biased weight
SIFRank
Machine Learning
Deep Learning
Author(s) Name:   Yi Sun; Hangping Qiu; Yu Zheng; Zhongwei Wang; Chaoran Zhang
Journal name:  IEEE Access
Conference name:  
Publisher name:  IEEE
DOI:  10.1109/ACCESS.2020.2965087
Volume Information:  (Volume: 8) Page(s): 10896 - 10906
Paper Link:   https://ieeexplore.ieee.org/abstract/document/8954611