A Sample Extension Method Based on Wikipedia and Its Application in Text Classification - 2018

Research Area:  Data Mining

Abstract:

Text classification is a topic in natural language processing that is particularly useful for Internet information processing. Methods based on supervised learning require a large amount of manually annotated training samples. The annotation of training samples is time consuming, and performance relies heavily on the quality of the training samples. This paper presents a text classification method based on sample extension. The extension is based on the correlation of the labeled sample data and the concepts in Wikipedia. Combined with the rich link relationships between concepts, we selected appropriate articles from Wikipedia to expand the training sample set. By introducing the large amount of rich semantic concept pages that are contained in Wikipedia along with links that are related to different pages, our approach enhances the performance and generalization of the classifier. Experiments demonstrate that the performance of the method proposed in this paper is better than that of both supervised and semi-supervised methods.
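The sample-extension idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how correlation between labeled samples and Wikipedia concepts is measured, so TF-IDF cosine similarity is swapped in here as a stand-in. The function names (`tfidf_vectors`, `extend_samples`), the `threshold` parameter, and the toy texts are all hypothetical, the candidate articles are plain in-memory strings rather than pages fetched from Wikipedia, and the link relationships between concepts are not modeled.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase, split on whitespace, keep purely alphabetic tokens.
    return [w for w in text.lower().split() if w.isalpha()]


def tfidf_vectors(docs):
    # Build one sparse TF-IDF vector (a dict) per document.
    tokenized = [tokenize(d) for d in docs]
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    n = len(docs)
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append({
            t: c * (math.log((1 + n) / (1 + doc_freq[t])) + 1)
            for t, c in term_freq.items()
        })
    return vectors


def cosine(a, b):
    # Cosine similarity between two sparse vectors.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def extend_samples(labeled, candidates, threshold=0.1):
    # ASSUMPTION: a candidate article inherits the label of its most
    # similar labeled sample if the similarity reaches `threshold`.
    texts = [text for text, _ in labeled] + list(candidates)
    vectors = tfidf_vectors(texts)
    labeled_vecs = vectors[:len(labeled)]
    candidate_vecs = vectors[len(labeled):]
    extended = list(labeled)
    for text, cand_vec in zip(candidates, candidate_vecs):
        score, label = max(
            (cosine(cand_vec, vec), lab)
            for vec, (_, lab) in zip(labeled_vecs, labeled)
        )
        if score >= threshold:
            extended.append((text, label))
    return extended


# Toy data standing in for annotated samples and Wikipedia articles.
labeled = [
    ("the team won the football match", "sports"),
    ("the bank raised interest rates", "finance"),
]
candidates = [
    "football clubs play a match every week",
    "central bank sets interest rates policy",
    "volcanoes erupt molten lava",
]
extended = extend_samples(labeled, candidates)
```

In this toy run the two candidates that share vocabulary with a labeled sample are added to the training set under that sample's label, while the unrelated candidate is left out. The enlarged set would then be passed to whatever classifier is being trained.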

Keywords:  

Author(s) Name:  Wenhao Zhu, Yiting Liu, Guannan Hu, Jianyue Ni and Zhiguo Lu

Journal name:  Wireless Personal Communications

Conference name:  

Publisher name:  Springer

DOI:  10.1007/s11277-018-5416-z

Volume Information:  volume 102, pages 3851–3867 (2018)