Research Area:  Machine Learning
Keyphrases have been widely used in large document collections to provide a concise summary of document content. While significant effort has been devoted to automatic keyphrase extraction, existing methods struggle to train a robust supervised model when labeled data are insufficient in resource-poor domains. To this end, in this paper, we propose a novel Topic-based Adversarial Neural Network (TANN) method, which aims at exploiting the unlabeled data in the target domain and the data in the resource-rich source domain. Specifically, we first explicitly incorporate the global topic information into the document representation using a topic correlation layer. Then, domain-invariant features are learned by adversarial training on the topic-based representation, allowing efficient transfer from the source domain to the target domain. Meanwhile, to balance the adversarial training and preserve the domain-private features in the target domain, we reconstruct the target data from both forward and backward directions. Finally, based on the learned features, keyphrases are extracted using a tagging method. Experiments on two real-world cross-domain scenarios demonstrate that our method can significantly improve the performance of keyphrase extraction on unlabeled or insufficiently labeled target domains.
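To illustrate the final step, extraction by tagging, the sketch below turns per-token tags into keyphrases. The abstract does not say which tagging scheme the paper uses, so the B/I/O scheme here (B = begins a keyphrase, I = inside one, O = outside) is an assumption, and the function name `decode_bio` and the sample tokens are hypothetical. In a full system the tags would presumably be predicted from the learned features; here they are written by hand.

```python
from typing import List


def decode_bio(tokens: List[str], tags: List[str]) -> List[str]:
    """Collect keyphrases from per-token B/I/O tags (assumed scheme)."""
    phrases: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            # A new keyphrase starts; close any phrase still open.
            if current:
                phrases.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            # Continue the phrase that is currently open.
            current.append(token)
        else:
            # "O", or a stray "I" with no open phrase: close and reset.
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


# Hand-written example input, not data from the paper.
tokens = ["adversarial", "training", "improves", "keyphrase", "extraction"]
tags = ["B", "I", "O", "B", "I"]
keyphrases = decode_bio(tokens, tags)
print(keyphrases)
```

Treating a stray "I" tag as "O" is one of several possible conventions; it keeps the decoder from emitting phrases that never had a clear start.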
Keywords:  
Adversarial Neural Network
Cross-Domain
Keyphrase Extraction
Topic-based Adversarial Neural Network
Machine Learning
Deep Learning
Author(s) Name:  Yanan Wang; Qi Liu; Chuan Qin; Tong Xu; Yijun Wang; Enhong Chen; Hui Xiong
Journal name:  
Conference name:  IEEE International Conference on Data Mining (ICDM)
Publisher name:  IEEE
DOI:  10.1109/ICDM.2018.00075
Volume Information:  
Paper Link:   https://ieeexplore.ieee.org/abstract/document/8594884