Efficient text document clustering with new similarity measures - 2020

Research Area:  Machine Learning

Abstract:

In this paper, two new similarity measures, the distance of term frequency-based similarity measure (DTFSM) and the presence of common terms-based similarity measure (PCTSM), are proposed to compute the similarity between two documents and thereby improve the effectiveness of text document clustering. The proposed measures are evaluated on the Reuters-21578 and WebKB datasets by clustering the documents with the K-means and K-means++ algorithms. The results obtained with DTFSM and PCTSM are significantly better than those of other measures in terms of accuracy, entropy, recall and F-measure. The proposed measures not only improve the effectiveness of text document clustering but also reduce the complexity of the similarity computation, measured by the number of operations required during clustering.
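
As a rough illustration of two of the evaluation criteria named in the abstract, the sketch below computes a size-weighted cluster entropy and a majority-label accuracy for a given clustering. The abstract does not give the paper's exact metric definitions, nor the formulas for DTFSM and PCTSM, so this uses commonly seen definitions and is not the authors' method; the function name, cluster ids, and class names are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def cluster_entropy_and_accuracy(assignments, labels):
    """Return (weighted entropy, majority-label accuracy) for a clustering.

    assignments[i] is the cluster id of document i; labels[i] is its true class.
    These are common textbook definitions, assumed here for illustration.
    """
    if len(assignments) != len(labels) or not labels:
        raise ValueError("inputs must be non-empty and of equal length")

    # Group the true labels by the cluster each document was assigned to.
    by_cluster = defaultdict(Counter)
    for cluster_id, label in zip(assignments, labels):
        by_cluster[cluster_id][label] += 1

    total = len(labels)
    entropy = 0.0
    correct = 0
    for counts in by_cluster.values():
        size = sum(counts.values())
        # Entropy of the class distribution inside this cluster.
        h = -sum((c / size) * log2(c / size) for c in counts.values())
        # Weight each cluster's entropy by its share of all documents.
        entropy += (size / total) * h
        # A document counts as correct if it carries its cluster's majority label.
        correct += max(counts.values())
    return entropy, correct / total


# Toy example: six documents in two clusters (hypothetical class names).
assignments = [0, 0, 0, 1, 1, 1]
labels = ["course", "course", "faculty", "faculty", "faculty", "faculty"]
entropy, accuracy = cluster_entropy_and_accuracy(assignments, labels)
```

Lower entropy and higher accuracy both indicate clusters that align more closely with the true classes; a clustering in which every cluster contains a single class scores an entropy of 0 and an accuracy of 1.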

Keywords:  
similarity measure
document
WebKB dataset
accuracy
entropy
recall
F-measure

Author(s) Name:  R. Lakshmi and S. Baskar

Journal name:  International Journal of Business Intelligence and Data Mining

Publisher name:   Inderscience

DOI:  10.1504/IJBIDM.2021.111741

Volume Information:  Vol. 18, No. 1, November 6, 2020, pp. 49-72