GPipe: Efficient Training of Giant Neural Networks Using Pipeline Parallelism - 2019

Research Area:  Machine Learning

Abstract:

Scaling up deep neural network capacity has been known as an effective approach to improving model quality for several different machine learning tasks. In many cases, increasing model capacity beyond the memory limit of a single accelerator has required developing special algorithms or infrastructure. These solutions are often architecture-specific and do not transfer to other machine learning tasks. To address the need for efficient and task-independent model parallelism, we introduce GPipe, a pipeline parallelism library that allows scaling any network that can be expressed as a sequence of layers. By pipelining different sub-sequences of layers on separate accelerators, GPipe provides the flexibility of scaling a variety of different networks to gigantic sizes efficiently. Moreover, GPipe utilizes a novel batch-splitting pipelining algorithm, resulting in almost linear speedup when a model is partitioned across multiple accelerators. We demonstrate the advantages of GPipe by training large-scale neural networks on two different tasks with distinct network architectures: (i) Image Classification: we train a 557-million-parameter AmoebaNet model and attain a top-1 accuracy of 84.4% on ImageNet-2012; (ii) Multilingual Neural Machine Translation: we train a single 6-billion-parameter, 128-layer Transformer model on a corpus spanning over 100 languages and achieve better quality than all bilingual models.
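The abstract describes the core mechanism only in prose. The network is cut into consecutive groups of layers, each group sits on its own accelerator, and every mini-batch is split into smaller micro-batches so the groups can work at the same time. Below is a minimal single-process Python sketch of that forward-pass schedule. It makes several assumptions. "Accelerators" are simulated as plain lists of scalar toy layers. The partitioner balances layer count rather than compute cost. The backward pass and real device placement are omitted. The names `partition`, `run_stage`, `pipeline_forward` and `bubble_fraction` are hypothetical and are not GPipe's actual API.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical toy layer type: each "layer" maps one scalar activation to another.
Layer = Callable[[float], float]


def partition(layers: Sequence[Layer], num_stages: int) -> List[List[Layer]]:
    """Split the layer sequence into num_stages contiguous groups of near-equal length.

    Simplification: balances layer *count*, not estimated compute cost.
    """
    if not 1 <= num_stages <= len(layers):
        raise ValueError("need 1 <= num_stages <= len(layers)")
    base, extra = divmod(len(layers), num_stages)
    stages, start = [], 0
    for k in range(num_stages):
        size = base + (1 if k < extra else 0)
        stages.append(list(layers[start:start + size]))
        start += size
    return stages


def run_stage(stage: Sequence[Layer], x: float) -> float:
    """Apply one stage's layers in order (stands in for one accelerator)."""
    for layer in stage:
        x = layer(x)
    return x


def pipeline_forward(
    layers: Sequence[Layer], batch: Sequence[float], num_stages: int, num_micro: int
) -> Tuple[List[float], List[List[Tuple[int, int]]]]:
    """Simulate a synchronous pipelined forward pass over micro-batches.

    Returns the outputs and, for every clock tick, the (stage, micro_batch)
    pairs that were active during that tick.
    """
    if not batch or num_micro < 1:
        raise ValueError("need a non-empty batch and num_micro >= 1")
    stages = partition(layers, num_stages)
    # Batch splitting: cut the mini-batch into (at most) num_micro micro-batches.
    size = -(-len(batch) // num_micro)  # ceiling division
    acts = [list(batch[i:i + size]) for i in range(0, len(batch), size)]
    M, K = len(acts), len(stages)
    schedule: List[List[Tuple[int, int]]] = []
    for t in range(M + K - 1):
        active = []
        for k in range(K):
            m = t - k  # stage k handles micro-batch t - k at tick t
            if 0 <= m < M:
                acts[m] = [run_stage(stages[k], x) for x in acts[m]]
                active.append((k, m))
        schedule.append(active)
    return [y for mb in acts for y in mb], schedule


def bubble_fraction(num_stages: int, num_micro: int) -> float:
    """Fraction of ticks each stage is idle in the schedule above."""
    return (num_stages - 1) / (num_micro + num_stages - 1)


if __name__ == "__main__":
    layers = [lambda x, i=i: 2 * x + i for i in range(8)]  # 8 toy layers
    batch = [float(v) for v in range(8)]
    out, schedule = pipeline_forward(layers, batch, num_stages=4, num_micro=4)
    # Pipelining must not change the result of the plain sequential model.
    assert out == [run_stage(layers, x) for x in batch]
    for t, active in enumerate(schedule):
        print(f"tick {t}: {active}")
    print(f"idle fraction per stage: {bubble_fraction(4, 4):.3f}")
```

At tick t, stage k processes micro-batch t - k. With K stages and M micro-batches, the forward pass therefore finishes in M + K - 1 ticks, and each stage is idle for K - 1 of them. Raising the number of micro-batches M shrinks that idle fraction. In this sketch, that is what lets the stages overlap their work instead of waiting on one another.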

Keywords:  

Author(s) Name:  Yanping Huang, Youlong Cheng, Ankur Bapna, Orhan Firat, Mia Xu Chen, Dehao Chen, HyoukJoong Lee, Jiquan Ngiam, Quoc V. Le, Yonghui Wu, Zhifeng Chen

Journal name:  

Conference name:  Proceedings of the 33rd International Conference on Neural Information Processing Systems

Publisher name:  ACM

DOI:  10.5555/3454287.3454297

Volume Information:  Article No.: 10, Pages 103–112