Bilingual word embedding fusion for robust unsupervised bilingual lexicon induction - 2023

Research Area:  Machine Learning

Abstract:

Great progress has been made in unsupervised bilingual lexicon induction (UBLI) by aligning the source and target word embeddings independently trained on monolingual corpora. The common assumption of most UBLI models is that the embedding spaces of two languages are approximately isomorphic (i.e., similar in geometric structure). Therefore, the performance is bound by the degree of isomorphism, especially on etymologically and typologically distant languages. Near-zero UBLI results have been reported for them. To address this problem, we propose a method to increase the isomorphism based on bilingual word embedding fusion. In particular, the features from the source embeddings are integrated into the target embeddings, and vice versa. Therefore, the resulting structures of source and target embeddings are similar to each other. The method does not require any form of supervision and can be applied to any language pair. On a benchmark dataset of bilingual lexicon induction, our approach can achieve competitive or superior performance compared to the state-of-the-art methods, with particularly strong results being found on distant languages.
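
The isomorphism assumption mentioned in the abstract can be illustrated with a small synthetic sketch. It is not the paper's fusion method, and it is not unsupervised: it assumes the word pairs are already known, builds a toy "target" space as a rotated, slightly noisy copy of a toy "source" space, and recovers the rotation with orthogonal Procrustes. The function names `procrustes_map` and `induce_lexicon`, the sizes, and the noise level are all hypothetical choices made for illustration.

```python
import numpy as np


def procrustes_map(src, tgt):
    """Orthogonal W minimising ||src @ W - tgt||_F (rows are paired words)."""
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


def induce_lexicon(src, tgt, w):
    """For each source row, return the index of the most cosine-similar target row."""
    mapped = src @ w
    mapped = mapped / np.linalg.norm(mapped, axis=1, keepdims=True)
    tgt_n = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    return np.argmax(mapped @ tgt_n.T, axis=1)


# Toy data (assumption): the target space is an exact rotation of the
# source space plus small noise, i.e. the two spaces are nearly isomorphic.
rng = np.random.default_rng(0)
n_words, dim = 200, 20
src = rng.standard_normal((n_words, dim))
rotation, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
tgt = src @ rotation + 0.01 * rng.standard_normal((n_words, dim))

w = procrustes_map(src, tgt)
pred = induce_lexicon(src, tgt, w)

# Word i in the source is paired with word i in the target by construction.
accuracy = float(np.mean(pred == np.arange(n_words)))
print(f"toy induction accuracy: {accuracy:.2f}")
```

When the two spaces really are related by a rotation, a single orthogonal map is enough to retrieve translations by nearest neighbour. The abstract's point is that this breaks down for distant language pairs, where no such map fits well, which is the gap the proposed embedding fusion aims to narrow.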

Keywords:  

Author(s) Name:  Hailong Cao, Tiejun Zhao, Weixuan Wang, Wei Peng

Journal name:  Information Fusion

Conference name:  

Publisher name:  Elsevier

DOI:  10.1016/j.inffus.2023.101818

Volume Information:  Volume 97 (2023)