A Multiscale Visualization of Attention in the Transformer Model - 2019

Research Area:  Machine Learning


The Transformer is a sequence model that forgoes traditional recurrent architectures in favor of a fully attention-based approach. Besides improving performance, an advantage of using attention is that it can also help to interpret a model by showing how the model assigns weight to different input elements. However, the multi-layer, multi-head attention mechanism in the Transformer model can be difficult to decipher. To make the model more accessible, we introduce an open-source tool that visualizes attention at multiple scales, each of which provides a unique perspective on the attention mechanism. We demonstrate the tool on BERT and OpenAI GPT-2 and present three example use cases: detecting model bias, locating relevant attention heads, and linking neurons to model behavior.
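To make concrete what "assigning weight to different input elements" means, here is a minimal sketch of scaled dot-product attention weights for a single head, written in plain Python. It is not taken from the paper's tool. The function names, token list, and vector values are all hypothetical and chosen only for illustration. A visualization tool of the kind described would display a matrix like this one for each layer and head.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries, keys):
    """Return one row of weights per query token; each row sums to 1."""
    dim = len(keys[0])
    rows = []
    for q in queries:
        # Scaled dot product between this query and every key.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        rows.append(softmax(scores))
    return rows


# Toy 2-dimensional vectors for three tokens (made-up values, not model output).
tokens = ["the", "cat", "sat"]
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Self-attention: the same vectors serve as queries and keys here.
weights = attention_weights(vectors, vectors)

for token, row in zip(tokens, weights):
    print(token, [round(w, 3) for w in row])
```

Each printed row shows how strongly one token attends to every token in the sequence. In a real Transformer the queries and keys come from learned projections, and there is one such matrix per head in every layer, which is why a multiscale view is useful.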


Author(s) Name:  Jesse Vig

Journal name:  Computer Science

Conference name:

Publisher name:  arXiv:1906.05714

DOI:  10.48550/arXiv.1906.05714

Volume Information: