A self-supervised representation learning of sentence structure for authorship attribution - 2022

Research Area:  Machine Learning

Abstract:

The syntactic structure of the sentences in a document reveals much about its author's writing style. Sentence representation learning has been widely explored in recent years and has been shown to improve generalization on downstream tasks across many domains. Although probing studies suggest that these learned contextual representations implicitly encode some amount of syntax, explicit syntactic information further improves the performance of deep neural models in authorship attribution. These observations motivated us to investigate explicit representation learning of the syntactic structure of sentences. In this article, we propose a self-supervised framework for learning structural representations of sentences. The self-supervised network contains two components: a lexical sub-network and a syntactic sub-network, which take the sequence of words and their corresponding structural labels as input, respectively. Because of the n-to-1 mapping of words to their structural labels, each word is embedded into a vector representation that mainly carries structural information. We evaluate the learned structural representations of sentences using different probing tasks and subsequently use them in the authorship attribution task. Our experimental results indicate that the structural embeddings significantly improve classification performance when concatenated with existing pre-trained word embeddings.
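The concatenation step mentioned at the end of the abstract can be pictured with a small toy sketch. This is not the authors' model: the vectors are made up, the names `WORD_EMB`, `STRUCT_LABEL`, `STRUCT_EMB`, `combined_embedding`, and `embed_sentence` are hypothetical, and a coarse part-of-speech-like tag stands in for the paper's structural labels as a simplifying assumption. It only illustrates how an n-to-1 word-to-label mapping lets several words share structural information, and how a structural vector might be appended to a lexical one.

```python
# Illustrative sketch only: toy vectors and hypothetical names, not the authors' model.

# Hypothetical pre-trained lexical embeddings (one vector per word).
WORD_EMB = {
    "cat": [0.9, 0.1, 0.3],
    "dog": [0.8, 0.2, 0.4],
    "runs": [0.1, 0.7, 0.5],
}

# n-to-1 mapping: many words share one structural label.
# Assumption: a coarse tag stands in for the paper's structural labels.
STRUCT_LABEL = {"cat": "NOUN", "dog": "NOUN", "runs": "VERB"}

# Hypothetical structural embeddings, one vector per label.
STRUCT_EMB = {"NOUN": [1.0, 0.0], "VERB": [0.0, 1.0]}


def combined_embedding(word):
    """Concatenate a word's lexical vector with its structural vector."""
    return WORD_EMB[word] + STRUCT_EMB[STRUCT_LABEL[word]]


def embed_sentence(words):
    """Embed each word of a sentence as a concatenated vector."""
    return [combined_embedding(w) for w in words]


sentence = embed_sentence(["dog", "runs"])
```

In this toy setup, "cat" and "dog" differ in their lexical part but share the same structural part, which is the effect the n-to-1 mapping is meant to convey; a downstream classifier would then receive the longer concatenated vectors.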

Keywords:  
self-supervised
representation learning
pre-trained word embedding
Machine Learning

Author(s) Name:  Fereshteh Jafariakinabad, Kien A. Hua

Journal name:  ACM Transactions on Knowledge Discovery from Data

Publisher name:  ACM

DOI:  10.1145/3491203

Volume Information:  Volume 16, Issue 4, August 2022, Article No.: 68, pp. 1–16