Unsupervised Cross-Modal Alignment of Speech and Text Embedding Spaces - 2018

Research Area:  Machine Learning

Abstract:

Recent research has shown that word embedding spaces learned from text corpora of different languages can be aligned without any parallel data supervision. Inspired by the success in unsupervised cross-lingual word embeddings, in this paper we target learning a cross-modal alignment between the embedding spaces of speech and text learned from corpora of their respective modalities in an unsupervised fashion. The proposed framework learns the individual speech and text embedding spaces, and attempts to align the two spaces via adversarial training, followed by a refinement procedure. We show how our framework could be used to perform spoken word classification and translation, and the experimental results on these two tasks demonstrate that the performance of our unsupervised alignment approach is comparable to its supervised counterpart. Our framework is especially useful for developing automatic speech recognition (ASR) and speech-to-text translation systems for low- or zero-resource languages, which have little parallel audio-text data for training modern supervised ASR and speech-to-text translation models, but account for the majority of the languages spoken across the world.
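As a rough illustration of how an aligned embedding space could support spoken word classification, the sketch below maps a speech embedding into the text space with a linear map and returns the nearest text word by cosine similarity. The abstract does not specify this retrieval rule, so the linear map `W`, the toy 2-D vectors, and the function names are all hypothetical; learning `W` (the adversarial training and refinement steps) is outside the scope of the sketch.

```python
import math


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def classify_spoken_word(speech_vec, W, text_embeddings):
    """Map a speech embedding into the text space and return the closest word.

    Assumption: nearest-neighbour retrieval by cosine similarity; the source
    abstract does not state which retrieval rule the authors use.
    """
    mapped = matvec(W, speech_vec)
    return max(text_embeddings, key=lambda word: cosine(mapped, text_embeddings[word]))


# Hypothetical toy data: W is a 90-degree rotation standing in for a learned
# alignment, and the text space holds two made-up word vectors.
W = [[0.0, -1.0],
     [1.0, 0.0]]
text_embeddings = {"cat": [0.0, 1.0], "dog": [-1.0, 0.0]}

print(classify_spoken_word([1.0, 0.1], W, text_embeddings))
print(classify_spoken_word([0.1, 1.0], W, text_embeddings))
```

With a mapping of this kind, no paired audio-text labels are needed at inference time: the text vocabulary itself acts as the set of class labels.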

Keywords:  Unsupervised, Cross-Modal, Alignment Of Speech, Text Embedding Spaces, Machine Learning, Deep Learning

Author(s) Name:  Yu-An Chung , Wei-Hung Weng , Schrasing Tong , James Glass

Journal name:  

Conference name:  NIPS'18: Proceedings of the 32nd International Conference on Neural Information Processing Systems

Publisher name:  ACM

DOI:  10.5555/3327757.3327837

Volume Information: