Amazing technological breakthrough possible @S-Logix pro@slogix.in

Office Address

  • #5, First Floor, 4th Street Dr. Subbarayan Nagar Kodambakkam, Chennai-600 024 Landmark : Samiyar Madam
  • pro@slogix.in
  • +91- 81240 01111

Social List

Global-Local Word Embedding for Text Classification

Global-Local Word Embedding for Text Classification

Great PhD Thesis on Global-Local Word Embedding for Text Classification

Research Area:  Machine Learning

Abstract:

Only humans can understand and comprehend the actual meaning that underlies natural written language, whereas machines can form semantic relationships only after humans have provided the parameters that are necessary to model the meaning. To enable computer models to access the underlying meaning in written language, accurate and sufficient document representation is crucial. Recent word embedding approaches have drawn much attention to text mining research. One of the main benefits of such approaches is the use of global corp uses with the generation of pre-trained word vectors. Although very effective, these approaches have their disadvantages, namely sole reliance on pre-trained word vectors that may neglect the local context and increase word ambiguity. In this thesis, four new document representation approaches are introduced to mitigate the risk of word ambiguity and inject a local context into globally pretrained word vectors. The proposed approaches, which are frameworks for document representation while using word embedding learning features for the task of text classification, are: Content Tree Word Embedding; Composed Maximum Spanning Content Tree; Embedding-based Word Clustering; and Auto encoder-based Word Embedding. The results show improvement in the F_score accuracy measure for a document classification task applied to IMDB Movie Reviews, Hate Speech Identification, 20 Newsgroups, Reuters-21578, and AG News as benchmark datasets in comparison to using three deep learning-based word embedding approaches, namely GloVe, Word2Vec, and fast Text, as well as two other document representations: LSA and Random word embedding.

Name of the Researcher:  Kamkarhaghighi, Mehran

Name of the Supervisor(s):  Makrehchi, Masoud

Year of Completion:  2019

University:  University of Ontario Institute of Technology

Thesis Link:   Home Page Url