Research Area:  Data Mining
Abstract:  In information retrieval (IR) and related tasks, term weighting approaches typically consider the frequency of the term in the document and in the collection in order to compute a score reflecting the importance of the term for the document. In tasks characterized by the presence of training data (such as text classification) it seems logical that the term weighting function should take into account the distribution (as estimated from training data) of the term across the classes of interest. Although “supervised term weighting” approaches that use this intuition have been described before, they have failed to show consistent improvements. In this article, we analyze the possible reasons for this failure, and call consolidated assumptions into question. Following this criticism, we propose a novel supervised term weighting approach that, instead of relying on any predefined formula, learns a term weighting function optimized on the training set of interest; we dub this approach
Keywords:  
Author(s) Name:  Alejandro Moreo, Andrea Esuli and Fabrizio Sebastiani
Journal name:  IEEE Transactions on Knowledge and Data Engineering
Conference name:  
Publisher name:  IEEE
DOI:  10.1109/TKDE.2018.2883446
Volume Information:  Volume 32, Issue 2, Feb. 2020, pp. 302–31
Paper Link:   https://ieeexplore.ieee.org/document/8550687
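The abstract contrasts unsupervised term weighting, which uses only a term's frequency in the document and in the collection, with supervised term weighting, which also uses the term's distribution across the classes in the training data. The following is a minimal Python sketch of that contrast, assuming a toy two-class corpus. It pairs plain tf-idf with tf-rf (term frequency times "relevance frequency"), chosen here only as one example of a previously published supervised scheme. It is not the learned weighting function this paper proposes. All function names and data below are hypothetical and for illustration only.

```python
import math
from collections import Counter

def document_frequencies(docs):
    """Count, for each term, how many documents contain it."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return df

def tfidf(doc, df, n_docs):
    """Unsupervised weight: raw term frequency times log(N / df).
    Assumes every term in doc also occurs in the collection (df[t] > 0)."""
    tf = Counter(doc)
    return {t: tf[t] * math.log(n_docs / df[t]) for t in tf}

def relevance_frequency(docs, labels, positive):
    """Supervised factor rf(t) = log2(2 + a / max(1, c)), where a (resp. c) is
    the number of positive (resp. negative) training documents containing t."""
    a, c = Counter(), Counter()
    for doc, label in zip(docs, labels):
        (a if label == positive else c).update(set(doc))
    vocab = set(a) | set(c)
    return {t: math.log2(2 + a[t] / max(1, c[t])) for t in vocab}

def tfrf(doc, rf):
    """Supervised weight: raw term frequency times the class-based rf factor.
    Unseen terms get rf = log2(2) = 1.0, matching the formula with a = 0."""
    tf = Counter(doc)
    return {t: tf[t] * rf.get(t, 1.0) for t in tf}

# Hypothetical toy training set: two "sport" and two "politics" documents.
train = [
    ["goal", "match", "team"],
    ["goal", "season", "team"],
    ["vote", "party", "team"],
    ["vote", "election", "team"],
]
labels = ["sport", "sport", "politics", "politics"]

df = document_frequencies(train)
rf = relevance_frequency(train, labels, positive="sport")
doc = ["goal", "team"]
print(tfidf(doc, df, len(train)))
print(tfrf(doc, rf))
```

On this toy data, "team" occurs in every document, so its idf (and hence its tf-idf weight) is zero, while the rf factor still gives it log2(3) ≈ 1.585 because it is equally frequent in both classes. "goal", which occurs only in sport documents, receives the higher supervised weight, 2.0.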