Amazing technological breakthrough possible @S-Logix pro@slogix.in

Office Address

  • #5, First Floor, 4th Street Dr. Subbarayan Nagar Kodambakkam, Chennai-600 024 Landmark : Samiyar Madam
  • pro@slogix.in
  • +91- 81240 01111

Social List

Lossless Pruned Naive Bayes for Big Data Classifications - 2018

Lossless Pruned Naive Bayes for Big Data Classifications

Research Area:  Big Data

Abstract:

In a fast growing big data era, volume and varieties of data processed in Internet applications drastically increase. Real-world search engines commonly use text classifiers with thousands of classes to improve relevance or data quality. These large scale classification problems lead to severe runtime performance challenges, so practitioners often resort to fast approximation techniques. However, the increase in classification speed comes at a cost, as approximations are lossy, mis-assigning classes relative to the original reference classification algorithm. To address this problem, we introduce a Lossless Pruned Naive Bayes (LPNB) classification algorithm tailored to real-world, big data applications with thousands of classes. LPNB achieves significant speed-ups by drawing on Information Retrieval (IR) techniques for efficient posting list traversal and pruning. We show empirically that LPNB can classify text up to eleven times faster than standard Naive Bayes on a real-world data set with 7205 classes, with larger gains extrapolated for larger taxonomies. In practice, the achieved acceleration is significant as it can greatly cut required computation time. In addition, it is lossless: the output is identical to standard Naive Bayes, in contrast to extant techniques such as hierarchical classification. The acceleration does not rely on the taxonomy structure, and it can be used for both hierarchical and flat taxonomies.

Keywords:  

Author(s) Name:  Nanfei Sun,Bingjun Sun,Jian (Denny) Lin and Michael Yu-Chi Wu

Journal name:  Big Data Research

Conferrence name:  

Publisher name:  ELSEVIER

DOI:  10.1016/j.bdr.2018.05.007

Volume Information:  Volume 14, December 2018, Pages 27-36