Research Area:  Machine Learning
Deep multimodal fusion, which exploits multiple sources of data for classification or regression, has exhibited a clear advantage over its unimodal counterpart in various applications. Yet current methods, including aggregation-based and alignment-based fusion, remain inadequate at balancing the trade-off between inter-modal fusion and intra-modal processing, creating a bottleneck for further performance improvement. To this end, this paper proposes the Channel-Exchanging-Network (CEN), a parameter-free multimodal fusion framework that dynamically exchanges channels between sub-networks of different modalities. Specifically, the channel exchanging process is self-guided by individual channel importance, measured by the magnitude of the Batch-Normalization (BN) scaling factor during training. The validity of this exchanging process is further guaranteed by sharing convolutional filters while keeping separate BN layers across modalities, which, as an added benefit, allows the multimodal architecture to be almost as compact as a unimodal network. Extensive experiments on semantic segmentation with RGB-D data and image translation from multi-domain input verify the effectiveness of CEN compared to current state-of-the-art methods. Detailed ablation studies further affirm the advantage of each proposed component.
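To make the exchanging mechanism described above concrete, the following is a minimal PyTorch-style sketch (not the authors' released implementation) of how channels whose BN scaling factors fall below a threshold in one modality could be replaced by the corresponding channels of the other modality, using shared convolutional filters but modality-specific BN layers. The threshold value, tensor sizes, and function names are illustrative assumptions.

    import torch
    import torch.nn as nn

    # Shared convolutional filters across modalities, but private BN per modality
    # (illustrative channel count).
    conv_shared = nn.Conv2d(64, 64, kernel_size=3, padding=1)
    bn_rgb = nn.BatchNorm2d(64)
    bn_depth = nn.BatchNorm2d(64)

    def exchange_channels(x_rgb, x_depth, threshold=1e-2):
        # Channel importance is measured by |gamma| of each modality's BN layer;
        # channels below the (assumed) threshold are replaced by the other modality's.
        mask_rgb = bn_rgb.weight.abs() < threshold      # unimportant RGB channels
        mask_depth = bn_depth.weight.abs() < threshold  # unimportant depth channels
        out_rgb, out_depth = x_rgb.clone(), x_depth.clone()
        out_rgb[:, mask_rgb] = x_depth[:, mask_rgb]
        out_depth[:, mask_depth] = x_rgb[:, mask_depth]
        return out_rgb, out_depth

    # Usage sketch: apply the shared convolution with modality-specific BN, then exchange.
    x_rgb = torch.randn(2, 64, 32, 32)
    x_depth = torch.randn(2, 64, 32, 32)
    f_rgb = bn_rgb(conv_shared(x_rgb))
    f_depth = bn_depth(conv_shared(x_depth))
    f_rgb, f_depth = exchange_channels(f_rgb, f_depth)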
Keywords:  
Deep multimodal fusion
Channel-Exchanging-Network (CEN)
Batch-Normalization (BN)
Deep Learning
Machine Learning
Author(s) Name:  Yikai Wang, Wenbing Huang, Fuchun Sun, Tingyang Xu, Yu Rong, Junzhou Huang
Journal name:  
Conference name:  NIPS'20: Proceedings of the 34th International Conference on Neural Information Processing Systems
Publisher name:  Curran Associates Inc.
DOI:  10.5555/3495724.3496130
Volume Information:  
Paper Link:   https://dl.acm.org/doi/abs/10.5555/3495724.3496130