kluster: An Efficient Scalable Procedure for Approximating the Number of Clusters in Unsupervised Learning - 2018

Research Area: Big Data

Abstract:

The majority of the clinical observation data stored in large-scale Electronic Health Record (EHR) research data networks are unlabeled. Unsupervised clustering can provide invaluable tools for studying patient sub-groups in these data. Many popular unsupervised clustering algorithms depend on the number of clusters being specified in advance. Multiple statistical methods are available to approximate the number of clusters in a dataset; however, these methods are computationally inefficient when applied to large amounts of data. Scalable analytical procedures are needed to extract knowledge from large clinical datasets. Using simulated, clinical, and public data, we developed and tested the kluster procedure for approximating the number of clusters in a large clinical dataset. The kluster procedure iteratively applies four statistical cluster number approximation methods to small subsets of data drawn randomly with replacement, and recommends the most frequent and the mean number of clusters resulting from the iterations as the potential optimum number of clusters. Our results showed that kluster's most-frequent product, which iteratively applies a model-based clustering strategy using the Bayesian Information Criterion (BIC) to samples of 200–500 data points over 100 iterations, offers a reliable and scalable solution for approximating the number of clusters in unsupervised clustering. We provide the kluster procedure as an R package.
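The resampling loop described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the paper distributes kluster as an R package, and every function name below (`kmeans`, `spherical_bic`, `estimate_k_bic`, `kluster_sketch`) is hypothetical. In particular, the per-sample estimator is a swapped-in simplification, k-means followed by a BIC for a hard-assignment mixture of spherical Gaussians sharing one variance, rather than the full model-based (Gaussian mixture) clustering the paper evaluates. The defaults of 200 sampled points and 100 iterations come from the abstract; `k_max=6` is an arbitrary assumption.

```python
import numpy as np
from collections import Counter


def _sq_dists(X, centers):
    # Squared Euclidean distance from every point (rows) to every center (columns).
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)


def kmeans(X, k, rng, n_init=5, max_iter=100):
    """Lloyd's k-means with k-means++ seeding; returns (labels, SSE) of the best of n_init runs."""
    best_labels, best_sse = None, np.inf
    for _restart in range(n_init):
        # k-means++ seeding: each new center is drawn with probability proportional
        # to its squared distance from the nearest center chosen so far.
        centers = X[[rng.integers(len(X))]]
        while len(centers) < k:
            d2 = _sq_dists(X, centers).min(axis=1)
            p = d2 / d2.sum() if d2.sum() > 0 else None  # None -> uniform draw
            centers = np.vstack([centers, X[rng.choice(len(X), p=p)]])
        # Lloyd iterations: assign to the nearest center, then move centers to cluster means.
        for _step in range(max_iter):
            labels = _sq_dists(X, centers).argmin(axis=1)
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        labels = _sq_dists(X, centers).argmin(axis=1)
        sse = float(((X - centers[labels]) ** 2).sum())
        if sse < best_sse:
            best_labels, best_sse = labels, sse
    return best_labels, best_sse


def spherical_bic(X, labels, sse, k):
    """BIC = -2 log L + p log n (lower is better) for a hard-assignment mixture of k
    spherical Gaussians sharing one variance. Assumption: a simplified stand-in for
    the model-based clustering used in the paper, not the paper's method."""
    n, d = X.shape
    var = max(sse / (n * d), 1e-12)  # maximum-likelihood shared variance
    counts = np.bincount(labels, minlength=k)
    counts = counts[counts > 0]  # empty clusters contribute nothing
    # Mixing-weight term plus the Gaussian term evaluated at the MLE variance.
    log_lik = (counts * np.log(counts / n)).sum() - 0.5 * n * d * (np.log(2 * np.pi * var) + 1)
    n_params = (k - 1) + k * d + 1  # weights, means, one variance
    return -2.0 * log_lik + n_params * np.log(n)


def estimate_k_bic(X, rng, k_max=6):
    """Hypothetical per-sample estimator: the k in 1..k_max with the lowest BIC."""
    scores = []
    for k in range(1, k_max + 1):
        labels, sse = kmeans(X, k, rng)
        scores.append(spherical_bic(X, labels, sse, k))
    return int(np.argmin(scores)) + 1


def kluster_sketch(data, estimate_k=estimate_k_bic, sample_size=200, iterations=100,
                   k_max=6, seed=0):
    """Hypothetical sketch of the resampling procedure in the abstract: estimate k on
    many small samples drawn with replacement, then report the most frequent and
    mean estimate. `data` is a 2-D array with one observation per row."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    estimates = []
    for _ in range(iterations):
        idx = rng.integers(0, len(data), size=sample_size)  # sampling with replacement
        estimates.append(estimate_k(data[idx], rng, k_max))
    # Ties for the mode go to the value seen first (Counter keeps insertion order).
    most_frequent = Counter(estimates).most_common(1)[0][0]
    return most_frequent, float(np.mean(estimates)), estimates


if __name__ == "__main__":
    # Toy data: three well-separated 2-D Gaussian blobs (illustrative only).
    rng = np.random.default_rng(42)
    true_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
    blobs = np.vstack([c + 0.5 * rng.standard_normal((150, 2)) for c in true_centers])
    mode_k, mean_k, _ = kluster_sketch(blobs, iterations=20)
    print(f"most frequent k = {mode_k}, mean k = {mean_k:.2f}")
```

Keeping the resampling loop separate from `estimate_k` means another per-sample cluster-number method could be passed in instead, loosely mirroring how the abstract describes kluster wrapping several approximation methods. Note the sign convention: `spherical_bic` is lower-is-better, whereas R's mclust reports BIC as 2 log L - p log n, where higher is better. The hard-assignment likelihood used here is only a rough proxy for a full Gaussian-mixture BIC.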

Author(s) Name: Hossein Estiri, Behzad Abounia Omran, and Shawn N. Murphy

Journal name: Big Data Research

Publisher name: ELSEVIER

DOI: 10.1016/j.bdr.2018.05.003

Volume Information: Volume 13, September 2018, Pages 38-51

Paper Link: https://www.sciencedirect.com/science/article/abs/pii/S2214579617303611
