Contextualized weak supervision for text classification - Python Project

Research Area: Machine Learning

Abstract:

Weakly supervised text classification based on a few user-provided seed words has recently attracted much attention from researchers. Existing methods mainly generate pseudo-labels in a context-free manner (e.g., string matching); therefore, the ambiguous, context-dependent nature of human language has long been overlooked. In this paper, we propose a novel framework, ConWea, that provides contextualized weak supervision for text classification. Specifically, we leverage contextualized representations of word occurrences and seed word information to automatically differentiate multiple interpretations of the same word, and thus create a contextualized corpus. This contextualized corpus is further utilized to train the classifier and expand seed words in an iterative manner. This process not only adds new contextualized, highly label-indicative keywords but also disambiguates initial seed words, making our weak supervision fully contextualized. Extensive experiments and case studies on real-world datasets demonstrate the necessity and significant advantages of using contextualized weak supervision, especially when the class labels are fine-grained.
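The disambiguation step described in the abstract can be illustrated with a minimal, self-contained sketch. It assumes that every occurrence of an ambiguous word already has a contextual vector (in the paper these come from contextualized representations; here they are hypothetical 2-D toy vectors). The sketch splits the occurrences into two senses with a small k-means on cosine similarity, rewrites the corpus with sense-tagged tokens such as `penalty$0`, and then assigns pseudo-labels by matching sense-tagged seed words. The function names, the fixed choice of two clusters, the `$` suffix convention, and the toy data are illustrative assumptions, not the authors' implementation. Classifier training and iterative seed-word expansion are omitted.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def two_means(vectors, iters=10):
    """Cluster vectors into 2 groups with k-means using cosine similarity.

    Centroids start at the first vector and the vector least similar to it.
    Returns one cluster id (0 or 1) per input vector.
    """
    centroids = [vectors[0], min(vectors, key=lambda v: cosine(v, vectors[0]))]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector joins its most similar centroid.
        labels = [0 if cosine(v, centroids[0]) >= cosine(v, centroids[1]) else 1
                  for v in vectors]
        # Update step: each non-empty cluster's centroid becomes its members' mean.
        for k in (0, 1):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def contextualize(docs, word, vectors_by_pos):
    """Replace every occurrence of `word` with a sense-tagged token like 'word$0'.

    Hypothetical helper: `vectors_by_pos` maps (doc_index, token_index) to the
    contextual vector of that occurrence; every occurrence must have an entry.
    """
    positions = sorted(vectors_by_pos)
    labels = two_means([vectors_by_pos[p] for p in positions])
    sense = dict(zip(positions, labels))
    return [[f"{tok}${sense[(d, i)]}" if tok == word else tok
             for i, tok in enumerate(doc)]
            for d, doc in enumerate(docs)]


def pseudo_label(docs, seeds):
    """Label each doc with the class whose seed tokens occur most often (None if none)."""
    result = []
    for doc in docs:
        counts = Counter(cls for tok in doc for cls, words in seeds.items() if tok in words)
        result.append(counts.most_common(1)[0][0] if counts else None)
    return result


# Hypothetical toy corpus and 2-D "contextual" vectors for each "penalty" occurrence.
docs = [
    ["the", "referee", "gave", "a", "penalty"],
    ["the", "court", "imposed", "a", "penalty"],
    ["penalty", "kick", "scored"],
    ["death", "penalty", "ruling"],
]
vecs = {(0, 4): [0.9, 0.2], (1, 4): [0.1, 0.9], (2, 0): [1.0, 0.1], (3, 1): [0.2, 1.0]}

ctx = contextualize(docs, "penalty", vecs)
# Sense ids are arbitrary; here sense 0 is the sports reading, mapped by hand.
seeds = {"sports": {"penalty$0"}, "law": {"penalty$1"}}
print(pseudo_label(ctx, seeds))  # ['sports', 'law', 'sports', 'law']
```

Because the two readings of "penalty" become distinct tokens, each sense can serve as a seed for a different class, which a context-free string match on the bare word "penalty" cannot do. In this sketch the mapping from sense id to class is chosen by hand; the abstract states that ConWea disambiguates the initial seed words automatically.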

Keywords:

Author(s) Name: Dheeraj Mekala, Jingbo Shang

Journal name:

Conference name: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Publisher name: ACL

DOI: 10.18653/v1/2020.acl-main.30

Volume Information: 2020, Pages: 323–333

Paper Link: https://aclanthology.org/2020.acl-main.30/
