Efficient text document clustering with new similarity measures - 2020

Research Area: Machine Learning

Abstract:

In this paper, two new similarity measures, namely the distance of term frequency-based similarity measure (DTFSM) and the presence of common terms-based similarity measure (PCTSM), are proposed to compute the similarity between two documents and thereby improve the effectiveness of text document clustering. The proposed measures are evaluated on the Reuters-21578 and WebKB datasets by clustering the documents with the K-means and K-means++ algorithms. The results obtained with DTFSM and PCTSM are significantly better than those of other measures for document clustering in terms of accuracy, entropy, recall and F-measure. The proposed similarity measures not only improve the effectiveness of text document clustering, but also reduce the complexity of the similarity computation, measured by the number of operations required during clustering.
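
The abstract does not give the formulas for DTFSM or PCTSM, so they are not reproduced here. As a rough illustration of where a document similarity measure and the entropy metric fit in, the sketch below uses plain cosine similarity over term-frequency vectors as a stand-in baseline, together with one common definition of clustering entropy (the size-weighted entropy of the true labels inside each cluster). The function names, the whitespace tokenizer and the toy documents are all assumptions made for illustration; none of them come from the paper.

```python
import math
from collections import Counter


def term_frequencies(text):
    """Lowercase the text, split on whitespace, and count each term."""
    return Counter(text.lower().split())


def cosine_similarity(tf_a, tf_b):
    """Cosine similarity of two term-frequency Counters.

    This is a baseline stand-in, NOT the paper's DTFSM or PCTSM.
    """
    shared = tf_a.keys() & tf_b.keys()
    dot = sum(tf_a[t] * tf_b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def clustering_entropy(cluster_ids, true_labels):
    """Size-weighted average entropy of true labels per cluster (lower is better).

    Assumed definition; the paper may normalize entropy differently.
    """
    n = len(cluster_ids)
    members_by_cluster = {}
    for cluster, label in zip(cluster_ids, true_labels):
        members_by_cluster.setdefault(cluster, []).append(label)
    total = 0.0
    for members in members_by_cluster.values():
        size = len(members)
        counts = Counter(members)
        h = -sum((k / size) * math.log2(k / size) for k in counts.values())
        total += (size / n) * h
    return total


# Toy documents (hypothetical, not taken from Reuters-21578 or WebKB).
doc_a = term_frequencies("stock market trade")
doc_b = term_frequencies("market trade price")
doc_c = term_frequencies("football match goal")

sim_ab = cosine_similarity(doc_a, doc_b)  # two of three terms are shared
sim_ac = cosine_similarity(doc_a, doc_c)  # no terms are shared

# Entropy of a hypothetical assignment of four documents to two clusters.
entropy_mixed = clustering_entropy([0, 0, 1, 1], ["a", "b", "a", "a"])
```

A clustering algorithm such as K-means would call the similarity function in place of `cosine_similarity`, which is why a measure that needs fewer operations per document pair can reduce the overall clustering cost.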

Keywords:
similarity measure
document
WebKB dataset
accuracy
entropy
recall
F-measure

Author(s) Name: R. Lakshmi and S. Baskar

Journal name: International Journal of Business Intelligence and Data Mining

Publisher name: Inderscience

DOI: 10.1504/IJBIDM.2021.111741

Volume Information: Vol. 18, No. 1, November 6, 2020, pp. 49-72

Paper Link: https://www.inderscienceonline.com/doi/abs/10.1504/IJBIDM.2021.111741
