Research Area:  Machine Learning
Keyphrase generation aims to summarize long documents with a collection of salient phrases. Deep neural models have demonstrated remarkable success in this task, with the capability of predicting keyphrases that are even absent from a document. However, such abstractiveness is acquired at the expense of a substantial amount of annotated data. In this paper, we present a novel method for keyphrase generation, AutoKeyGen, without the supervision of any annotated doc-keyphrase pairs. Motivated by the observation that an absent keyphrase in a document may appear in other places, in whole or in part, we construct a phrase bank by pooling all phrases extracted from a corpus. With this phrase bank, we assign phrase candidates to new documents by a simple partial matching algorithm, and then we rank these candidates by their relevance to the document from both lexical and semantic perspectives. Moreover, we bootstrap a deep generative model using these top-ranked pseudo keyphrases to produce more absent candidates. Extensive experiments demonstrate that AutoKeyGen outperforms all unsupervised baselines and can even beat a strong supervised method in certain cases.
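The candidate-assignment step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and every function name in it is hypothetical. It assumes that "partial matching" means a bank phrase becomes a candidate when each of its words occurs somewhere in the document, and it ranks candidates with a simple word-frequency score as a stand-in for the lexical relevance signal. The semantic ranking and the bootstrapped generative model are not covered.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_phrase_bank(extracted_phrases_per_doc):
    """Pool the phrases extracted from every corpus document into one set."""
    bank = set()
    for phrases in extracted_phrases_per_doc:
        bank.update(" ".join(tokenize(p)) for p in phrases)
    bank.discard("")  # drop phrases that had no alphanumeric tokens
    return bank


def match_candidates(document, phrase_bank):
    """Assign bank phrases to a document by partial matching.

    Assumed rule (not taken from the paper): a phrase is a candidate when
    every one of its words occurs somewhere in the document. It counts as
    "present" if the words also occur contiguously and in order, and as
    "absent" otherwise.
    """
    tokens = tokenize(document)
    vocab = set(tokens)
    padded = " " + " ".join(tokens) + " "
    present, absent = [], []
    for phrase in sorted(phrase_bank):
        words = phrase.split()
        if not all(w in vocab for w in words):
            continue
        if " " + phrase + " " in padded:
            present.append(phrase)
        else:
            absent.append(phrase)
    return present, absent


def rank_by_lexical_overlap(document, candidates):
    """Rank candidates by the mean in-document frequency of their words."""
    counts = Counter(tokenize(document))

    def score(phrase):
        words = phrase.split()
        return sum(counts[w] for w in words) / len(words)

    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(candidates, key=lambda p: (-score(p), p))


# Toy corpus: phrases assumed to have been extracted from other documents.
corpus_phrases = [
    ["keyphrase generation", "neural network"],
    ["unsupervised keyphrase extraction", "keyphrase extraction"],
]
bank = build_phrase_bank(corpus_phrases)
doc = ("We study unsupervised generation of each keyphrase. "
       "Keyphrase extraction is a related task.")
present, absent = match_candidates(doc, bank)
ranked_absent = rank_by_lexical_overlap(doc, absent)
```

In this toy example, "keyphrase generation" never appears verbatim in the document, yet it is recovered as an absent candidate because both of its words occur there.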
Keywords:  
Unsupervised Deep Learning
Keyphrase Generation
Deep Learning
Machine Learning
Author(s) Name:  Xianjie Shen, Yinghan Wang, Rui Meng, Jingbo Shang
Journal name:  
Conference name:  Proceedings of the AAAI Conference on Artificial Intelligence
Publisher name:  AAAI Press
DOI:  10.1609/aaai.v36i10.21381
Volume Information:  Volume 36, Issue 10
Paper Link:   https://ojs.aaai.org/index.php/AAAI/article/view/21381