Common Sense Beyond English: Evaluating and Improving Multilingual Language Models for Commonsense Reasoning - 2021

Research Area:  Machine Learning

Abstract:

Commonsense reasoning research has so far been limited to English. We aim to evaluate and improve popular multilingual language models (ML-LMs) to help advance commonsense reasoning (CSR) beyond English. We collect the Mickey Corpus, consisting of 561k sentences in 11 different languages, which can be used for analyzing and improving ML-LMs. We propose Mickey Probe, a language-agnostic probing task for fairly evaluating the common sense of popular ML-LMs across different languages. In addition, we also create two new datasets, X-CSQA and X-CODAH, by translating their English versions to 15 other languages, so that we can evaluate popular ML-LMs for cross-lingual commonsense reasoning. To improve the performance beyond English, we propose a simple yet effective method -- multilingual contrastive pre-training (MCP). It significantly enhances sentence representations, yielding a large performance gain on both benchmarks.

Keywords:  
Multilingual Language Models
Commonsense Reasoning
Multilingual Contrastive Pre-training
Computation and Language
Artificial Intelligence

Author(s) Name:  Bill Yuchen Lin, Seyeon Lee, Xiaoyang Qiao, Xiang Ren

Journal name:  Computation and Language

Conference name:  

Publisher name:  arXiv:2106.06937

DOI:  10.48550/arXiv.2106.06937

Volume Information: