
SSA: A Content-Based Sparse Attention Mechanism - 2022



Research Area:  Machine Learning

Abstract:

Recently, many researchers have used attention mechanisms to achieve excellent performance on a variety of neural network applications. However, the attention mechanism also has shortcomings. First, its high computational and storage cost makes it difficult to apply to long sequences. Second, all tokens participate in computing the attention map, which can amplify the influence of noisy tokens and degrade training. Because of these two shortcomings, attention models are usually strictly limited in sequence length and have difficulty exploiting their strengths when modelling long sequences. To address these problems, this paper proposes an efficient sparse attention mechanism (SSA). SSA is built on two separate layers, a local layer and a global layer, which jointly encode local sequence information and the global context. This new sparse attention pattern is effective in accelerating inference. The experiments in this paper validate the effectiveness of SSA by replacing the self-attention structure with the SSA structure in a variety of transformer models. SSA achieves state-of-the-art performance on several major benchmarks and was validated on a variety of datasets and models covering language translation, language modelling and image recognition. With a small improvement in accuracy, inference speed was increased by 24%.
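
Below is a minimal sketch of what such a two-layer (local plus global) sparse attention pattern might look like in PyTorch. The fixed local window, the key-norm rule used to pick "global" tokens, and the additive combination of the two layers are illustrative assumptions for exposition, not the authors' SSA implementation.

import torch
import torch.nn.functional as F

def local_attention(q, k, v, window=4):
    # Local layer: each query attends only to keys within +/- `window` positions.
    n, d = q.shape
    scores = q @ k.T / d ** 0.5
    idx = torch.arange(n)
    too_far = (idx[None, :] - idx[:, None]).abs() > window
    scores = scores.masked_fill(too_far, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

def global_attention(q, k, v, num_global=2):
    # Global layer: every query attends to a few content-selected tokens.
    # Choosing the keys with the largest norm is only an illustrative
    # stand-in for a content-based selection rule.
    d = q.shape[-1]
    top = k.norm(dim=-1).topk(num_global).indices
    scores = q @ k[top].T / d ** 0.5
    return F.softmax(scores, dim=-1) @ v[top]

def sparse_attention(q, k, v, window=4, num_global=2):
    # Combine the two layers; summing their outputs is one simple way to let
    # local sequence information and global context be encoded jointly.
    return local_attention(q, k, v, window) + global_attention(q, k, v, num_global)

q = k = v = torch.randn(16, 8)          # 16 tokens, 8-dimensional
print(sparse_attention(q, k, v).shape)  # torch.Size([16, 8])

Because each query scores only a window of nearby keys plus a handful of global tokens instead of the full sequence, the attention map grows roughly linearly with sequence length, which is the property that lets such patterns speed up inference on long inputs.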

Keywords:  attention mechanism, computational cost, storage consumption, sequence length, local layer, global layer, image recognition, accuracy

Author(s) Name:  Yang Sun, Wei Hu, Fang Liu, Feihu Huang & Yonghao Wang

Journal name:  

Conference name:  Knowledge Science, Engineering and Management

Publisher name:  Springer

DOI:  https://doi.org/10.1007/978-3-031-10989-8_53

Volume Information:  -