Latest Research Topic in Preserving Commonsense Knowledge from Pre-trained Language Models

Preserving Commonsense Knowledge from Pre-trained Language Models via Causal Inference - 2023

Research Area: Machine Learning

Abstract:

Fine-tuning has been proven to be a simple and effective technique to transfer the learned knowledge of Pre-trained Language Models (PLMs) to downstream tasks. However, vanilla fine-tuning easily overfits the target data and degrades the generalization ability. Most existing studies attribute it to catastrophic forgetting, and they retain the pre-trained knowledge indiscriminately without identifying what knowledge is transferable. Motivated by this, we frame fine-tuning into a causal graph and discover that the crux of catastrophic forgetting lies in the missing causal effects from the pretrained data. Based on the causal view, we propose a unified objective for fine-tuning to retrieve the causality back. Intriguingly, the unified objective can be seen as the sum of the vanilla fine-tuning objective, which learns new knowledge from target data, and the causal objective, which preserves old knowledge from PLMs. Therefore, our method is flexible and can mitigate negative transfer while preserving knowledge. Since endowing models with commonsense is a long-standing challenge, we implement our method on commonsense QA with a proposed heuristic estimation to verify its effectiveness. In the experiments, our method outperforms state-of-the-art fine-tuning methods on all six commonsense QA datasets and can be implemented as a plug-in module to inflate the performance of existing QA models.
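
The abstract describes the unified objective as the sum of a vanilla fine-tuning term and a knowledge-preserving term. The snippet below is only a minimal sketch of that additive structure, not the paper's causal objective or its heuristic estimation: as a stand-in for the preserving term it uses a KL divergence that pulls the fine-tuned model's predictions toward those of the frozen pre-trained model. The function names and the `weight` parameter are hypothetical and chosen for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Vanilla fine-tuning term: negative log-likelihood of the gold label."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def unified_loss(finetuned_logits, pretrained_logits, label, weight=1.0):
    """Sum of a 'new knowledge' term and an 'old knowledge' term.

    Assumption: the preserving term here is a plain KL penalty toward the
    frozen pre-trained model's predictions, used as a stand-in for the
    paper's causal objective.
    """
    new_knowledge = cross_entropy(finetuned_logits, label)
    old_knowledge = kl_divergence(
        softmax(pretrained_logits), softmax(finetuned_logits)
    )
    return new_knowledge + weight * old_knowledge


if __name__ == "__main__":
    finetuned = [2.0, 0.5, -1.0]   # scores from the model being fine-tuned
    pretrained = [0.0, 3.0, 0.0]   # scores from the frozen pre-trained model
    print(unified_loss(finetuned, pretrained, label=0, weight=0.5))
```

Setting `weight` to 0 recovers plain fine-tuning, while larger values keep the fine-tuned predictions closer to the pre-trained ones; how the paper actually balances and estimates the two terms is not stated in the abstract.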

Keywords:
Pre-trained Language Models
Generalization ability
Pretrained data
Preserving knowledge
QA models

Author(s) Name: Junhao Zheng, Qianli Ma, Shengjie Qiu, Yue Wu, Peitian Ma, Junlong Liu, Huawen Feng, Xichen Shang, Haibin Chen

Journal name: Computation and Language

Conference name:

Publisher name: arXiv:2306.10790

DOI: 10.48550/arXiv.2306.10790

Volume Information:

Paper Link: https://arxiv.org/abs/2306.10790
