Research Area:  Machine Learning
We present a method for automatic query expansion for cross-lingual information retrieval in the medical domain. The method uses machine translation to translate source-language queries into the document language, and linear regression to predict the retrieval performance of each translated query when it is expanded with a candidate term. Candidate terms (in the document language) come from multiple sources: query translation hypotheses obtained from the machine translation system, Wikipedia articles, and PubMed abstracts. A query is expanded with a candidate term only when the model's predicted score for that term exceeds a tuned threshold, so that queries are expanded with strongly related terms only. Our experiments are conducted on the CLEF eHealth 2013–2015 test collection and show significant improvements in both cross-lingual and monolingual settings.
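The selection step described in the abstract can be illustrated with a minimal sketch: fit a linear regression that scores candidate terms, then keep only the terms whose predicted score exceeds a threshold. Everything specific here is an assumption made for illustration and is not taken from the paper: the two toy features, the training values, the threshold of 0.3, and the function names `fit_linear_regression`, `predict_scores`, and `expand_query` are all hypothetical.

```python
import numpy as np


def add_intercept(features):
    """Append a constant column so the model can learn an intercept."""
    return np.hstack([features, np.ones((features.shape[0], 1))])


def fit_linear_regression(features, targets):
    """Fit ordinary least squares; returns weights with the intercept last."""
    weights, *_ = np.linalg.lstsq(add_intercept(features), targets, rcond=None)
    return weights


def predict_scores(weights, features):
    """Predict a retrieval-performance score for each candidate term."""
    return add_intercept(features) @ weights


def expand_query(query_terms, candidates, candidate_features, weights, threshold):
    """Append only the candidates whose predicted score exceeds the threshold."""
    scores = predict_scores(weights, candidate_features)
    selected = [term for term, score in zip(candidates, scores) if score > threshold]
    return list(query_terms) + selected


# Hypothetical training data: two made-up features per (query, candidate) pair
# and a made-up target score for the expanded query.
train_features = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
train_targets = np.array([-0.1, 0.4, 0.1, 0.6])
weights = fit_linear_regression(train_features, train_targets)

# Hypothetical candidates for an already translated query.
query = ["heart", "attack"]
candidates = ["myocardial", "weather"]
candidate_features = np.array([[0.9, 0.8], [0.1, 0.0]])

expanded = expand_query(query, candidates, candidate_features, weights, threshold=0.3)
print(expanded)
```

In this sketch the threshold is a fixed constant; the abstract says only that the threshold is tuned, so a held-out set of queries would be one plausible way to choose it.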
Keywords:  
Author(s) Name:  Shadi Saleh & Pavel Pecina
Journal name:  
Conference name:  European Conference on Information Retrieval
Publisher name:  Springer
DOI:  10.1007/978-3-030-15712-8_33
Volume Information:  
Paper Link:   https://link.springer.com/chapter/10.1007/978-3-030-15712-8_33