SIFRank: A New Baseline for Unsupervised Keyphrase Extraction

SIFRank: A New Baseline for Unsupervised Keyphrase Extraction Based on Pre-Trained Language Model - 2020

Research Area: Machine Learning

Abstract:

In the age of social media, the sheer volume of knowledge and information calls for accurate and effective keyphrase extraction methods in information retrieval and natural language processing. Traditional keyphrase extraction models have difficulty incorporating large amounts of external knowledge, but the rise of pre-trained language models offers a new way to solve this problem. Against this background, we propose SIFRank, a new baseline for unsupervised keyphrase extraction based on a pre-trained language model. SIFRank combines the sentence embedding model SIF with the autoregressive pre-trained language model ELMo, and it achieves the best performance in keyphrase extraction for short documents. We speed up SIFRank while maintaining its accuracy through document segmentation and alignment of contextual word embeddings. For long documents, we upgrade SIFRank to SIFRank+ with a position-biased weight, which greatly improves its performance on long documents. Compared with other baseline models, our model achieves state-of-the-art performance on three widely used datasets.
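
To make the core idea of the abstract concrete, the following is a minimal sketch of SIF-style ranking, not the authors' implementation. It assumes that each candidate phrase and the whole document are embedded as a weighted average of word vectors, with the SIF weight a / (a + p(w)), and that candidates are ranked by cosine similarity to the document embedding. The function names, the toy two-dimensional vectors, and the word probabilities are all hypothetical. The real model would use contextual ELMo vectors instead of the static vectors shown here, and the sketch leaves out candidate extraction, document segmentation, embedding alignment, and the position-biased weight of SIFRank+.

```python
import math


def sif_weight(word_prob, a=1e-3):
    """SIF weight a / (a + p(w)): frequent words get small weights."""
    return a / (a + word_prob)


def weighted_embedding(tokens, vectors, probs, a=1e-3):
    """SIF-weighted average of the token vectors (no component removal)."""
    dim = len(vectors[tokens[0]])
    total = [0.0] * dim
    for tok in tokens:
        # Words missing from the probability table are treated as rare.
        w = sif_weight(probs.get(tok, 0.0), a)
        for i, x in enumerate(vectors[tok]):
            total[i] += w * x
    return [x / len(tokens) for x in total]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(doc_tokens, candidates, vectors, probs, a=1e-3):
    """Score each candidate phrase against the document, best first."""
    doc_vec = weighted_embedding(doc_tokens, vectors, probs, a)
    scored = []
    for cand in candidates:
        cand_vec = weighted_embedding(cand, vectors, probs, a)
        scored.append((" ".join(cand), cosine(cand_vec, doc_vec)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data (hypothetical): static 2-d vectors and unigram probabilities.
toy_vectors = {
    "keyphrase": [1.0, 0.0],
    "extraction": [0.9, 0.1],
    "the": [0.0, 1.0],
    "of": [0.1, 1.0],
}
toy_probs = {"the": 0.05, "of": 0.03, "keyphrase": 1e-5, "extraction": 1e-4}
toy_doc = ["the", "extraction", "of", "keyphrase"]
toy_candidates = [["the"], ["keyphrase", "extraction"]]

for phrase, score in rank_candidates(toy_doc, toy_candidates, toy_vectors, toy_probs):
    print(f"{phrase}: {score:.3f}")
```

In this toy setup the function words contribute little to the document embedding because of their small SIF weights, so the phrase built from rarer content words ranks above the function word.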

Keywords:
Keyphrase extraction
pre-trained language model
sentence embeddings
position-biased weight
SIFRank
Machine Learning
Deep Learning

Author(s) Name: Yi Sun; Hangping Qiu; Yu Zheng; Zhongwei Wang; Chaoran Zhang

Journal name: IEEE Access

Conference name:

Publisher name: IEEE

DOI: 10.1109/ACCESS.2020.2965087

Volume Information: Volume: 8, Page(s): 10896-10906

Paper Link: https://ieeexplore.ieee.org/abstract/document/8954611
