Multi-co-training for document classification - Python Projects

Multi-co-training for document classification using various document representations: TF–IDF, LDA, and Doc2Vec - 2018

Research Area: Data Mining

Abstract:

The purpose of document classification is to assign the most appropriate label to a specified document. The main challenges in document classification are insufficient label information and unstructured sparse format. A semi-supervised learning (SSL) approach could be an effective solution to the former problem, whereas the consideration of multiple document representation schemes can resolve the latter problem. Co-training is a popular SSL method that attempts to exploit various perspectives in terms of feature subsets for the same example. In this paper, we propose multi-co-training (MCT) for improving the performance of document classification. In order to increase the variety of feature sets for classification, we transform a document using three document representation methods: term frequency–inverse document frequency (TF–IDF) based on the bag-of-words scheme, topic distribution based on latent Dirichlet allocation (LDA), and neural-network-based document embedding known as document to vector (Doc2Vec). The experimental results demonstrate that the proposed MCT is robust to parameter changes and outperforms benchmark methods under various conditions.
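
As a rough illustration of the co-training idea described in the abstract, the Python sketch below lets classifiers trained on different views of the same documents pseudo-label unlabeled examples for a shared label pool. It is not the authors' MCT procedure. The nearest-centroid classifier, the margin-based confidence score, the function names (centroid, predict, co_train), and the toy two-dimensional vectors standing in for TF–IDF and LDA features are all assumptions made for this example.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def predict(view, labels, x_idx):
    """Nearest-centroid prediction for example x_idx in a single view.

    Returns (label, margin). The margin is the distance to the second-nearest
    class centroid minus the distance to the nearest one; a larger margin is
    treated as higher confidence (an assumption of this sketch).
    """
    by_class = {}
    for i, y in labels.items():
        by_class.setdefault(y, []).append(view[i])
    dists = sorted(
        (math.dist(view[x_idx], centroid(vecs)), y) for y, vecs in by_class.items()
    )
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else 0.0
    return dists[0][1], margin


def co_train(views, seed_labels, rounds=10):
    """Grow a shared label pool from a few seed labels.

    In each round, every view pseudo-labels the single unlabeled example it is
    most confident about, and that label becomes visible to all other views.
    """
    labels = dict(seed_labels)
    n = len(views[0])
    for _ in range(rounds):
        for view in views:
            unlabeled = [i for i in range(n) if i not in labels]
            if not unlabeled:
                return labels
            scored = [(predict(view, labels, i), i) for i in unlabeled]
            (label, _margin), best = max(scored, key=lambda s: s[0][1])
            labels[best] = label
    return labels


# Hypothetical toy data: six documents seen through two views.
# view_a stands in for TF-IDF features, view_b for LDA topic proportions.
view_a = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8]]
view_b = [[0.9, 0.1], [0.1, 0.9], [0.7, 0.3], [0.3, 0.7], [0.8, 0.2], [0.2, 0.8]]

# Only documents 0 and 1 start with labels; the rest are unlabeled.
seed = {0: "sports", 1: "politics"}

result = co_train([view_a, view_b], seed)
```

Passing a third list of vectors in views (for example, Doc2Vec-style embeddings) extends the same loop to three views, which is closer in spirit to the three-representation setting the abstract describes.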

Author(s) Name: Donghwa Kim, Deokseong Seo, Pilsung Kang, and Suhyoun Cho

Journal name: Information Sciences

Publisher name: Elsevier

DOI: 10.1016/j.ins.2018.10.006

Volume Information: Volume 477, March 2019, Pages 15-29

Paper Link: https://www.sciencedirect.com/science/article/abs/pii/S0020025518308028#!
