Research Area:  Machine Learning
Keyphrases have been widely used in large document collections to provide a concise summary of document content. While significant effort has gone into automatic keyphrase extraction, existing methods struggle to train a robust supervised model when labeled data are insufficient in resource-poor domains. To address this, we propose a novel Topic-based Adversarial Neural Network (TANN) method, which exploits both the unlabeled data in the target domain and the data in the resource-rich source domain. Specifically, we first explicitly incorporate global topic information into the document representation using a topic correlation layer. Then, domain-invariant features are learned by applying adversarial training to the topic-based representation, enabling efficient transfer from the source domain to the target domain. Meanwhile, to balance the adversarial training and preserve the domain-private features of the target domain, we reconstruct the target data in both the forward and backward directions. Finally, keyphrases are extracted from the learned features using a tagging method. Experiments on two real-world cross-domain scenarios demonstrate that our method significantly improves keyphrase extraction performance on unlabeled or insufficiently labeled target domains.
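The final step, extracting keyphrases with a tagging method, can be illustrated with a small decoder. The abstract does not name the tag set, so this minimal sketch assumes a BIO scheme (B = first token of a keyphrase, I = continuation, O = outside) and uses a hypothetical helper `decode_bio` that turns one predicted tag sequence into phrases. It is not the authors' implementation and leaves out the topic correlation layer, the adversarial training, and the reconstruction components entirely.

```python
from typing import List


def decode_bio(tokens: List[str], tags: List[str]) -> List[str]:
    """Collect keyphrases from a BIO tag sequence (assumed tagging scheme).

    B starts a new phrase, I extends the current one, and any other tag
    (normally O) closes the current phrase. An I with no open phrase is
    treated leniently as the start of a new phrase.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    phrases: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            # Close any open phrase, then start a new one at this token.
            if current:
                phrases.append(" ".join(current))
            current = [token]
        elif tag == "I":
            # Continue the open phrase (or start one if none is open).
            current.append(token)
        else:
            # Outside any keyphrase: close the open phrase, if any.
            if current:
                phrases.append(" ".join(current))
            current = []
    # Flush a phrase that runs to the end of the sequence.
    if current:
        phrases.append(" ".join(current))
    return phrases


if __name__ == "__main__":
    words = ["we", "study", "keyphrase", "extraction", "with", "adversarial", "training"]
    labels = ["O", "O", "B", "I", "O", "B", "I"]
    print(decode_bio(words, labels))  # ['keyphrase extraction', 'adversarial training']
```

Decoding is kept separate from the tagger so that any sequence labeler (for example, one trained on domain-invariant features) can feed it predicted tags unchanged.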
Adversarial Neural Network
Topic-based Adversarial Neural Network
Author(s) Name:  Yanan Wang; Qi Liu; Chuan Qin; Tong Xu; Yijun Wang; Enhong Chen; Hui Xiong
Conference name:  IEEE International Conference on Data Mining (ICDM)
Publisher name:  IEEE
Paper Link:   https://ieeexplore.ieee.org/abstract/document/8594884