Geoscience keyphrase extraction algorithm

Research Area: Machine Learning

Abstract:

A large amount of unstructured textual data about geoscience structures and minerals is buried in geoscience documents and goes unused. Automatic keyphrase extraction offers a way to leverage this data for analysis and knowledge discovery. However, keyphrase extraction remains a difficult task, and the performance of state-of-the-art approaches is still low. Discovering high-quality, meaningful keyphrases automatically requires both useful domain knowledge and suitable techniques. To address these challenges, this paper proposes an ontology and enhanced word embedding-based (OEWE) methodology for automatic keyphrase extraction from geoscience documents. We first develop a quantitative analysis for evaluating keyphrase extraction, based on conditional probability and the naive Bayesian model, which is valuable when human-annotated keyphrases are not available. The domain ontology is then built on a multiway tree to enrich domain-specific knowledge about the concepts and relationships in the domain. At the same time, word2vec, a neural model of distributed word representations, is updated using the geological ontology so that it links domain background information and identifies infrequent but representative keyphrases. We evaluate OEWE on two self-built geoscience datasets and compare it with frequency, term frequency-inverse document frequency (TF-IDF), TextRank and rapid automatic keyword extraction (RAKE). Our method achieves average F1 scores of 30.1% and 40.7% on the two manually annotated datasets.
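To make the comparison above more concrete, the following is a minimal sketch of one of the named baselines (TF-IDF) together with the F1 metric used to score extracted keyphrases against manual annotations. It is not the OEWE method and not the authors' code. The function names `tfidf_keywords` and `precision_recall_f1`, the whitespace tokenizer, the single-word candidates and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=3):
    """Rank each document's tokens by TF-IDF and return the top_k per document.

    Assumption: tokens are lowercase whitespace-separated words; a real
    keyphrase extractor would also generate multi-word candidates.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each token.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    results = []
    for toks in tokenized:
        tf = Counter(toks)
        # Relative term frequency times log inverse document frequency.
        scores = {
            tok: (count / len(toks)) * math.log(n_docs / df[tok])
            for tok, count in tf.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda tok: (-scores[tok], tok))
        results.append(ranked[:top_k])
    return results


def precision_recall_f1(predicted, gold):
    """Exact-match precision, recall and F1 between two keyphrase lists."""
    pred, ref = set(predicted), set(gold)
    true_pos = len(pred & ref)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical toy corpus, not taken from the paper's datasets.
docs = [
    "granite quartz granite feldspar",
    "basalt olivine basalt",
    "quartz sandstone sandstone",
]
top = tfidf_keywords(docs, top_k=2)
scores = precision_recall_f1(top[0], ["granite", "quartz"])
```

In this toy corpus "quartz" appears in two of the three documents, so TF-IDF ranks it below "feldspar" in the first document even though both occur once there. An average F1 of the kind reported above would be obtained by averaging the per-document F1 values over a dataset.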

Keywords:

Author(s) Name: Qiu Qinjun, Xie Zhong, Wu Liang, Li Wenjia

Journal name: Expert Systems with Applications

Conference name:

Publisher name: Elsevier

DOI: 10.1016/j.eswa.2019.02.001

Volume Information: Volume 125, 1 July 2019, Pages 157-169

Paper Link: https://www.sciencedirect.com/science/article/abs/pii/S0957417419301009
