Transformer based contextual text representation framework for intelligent information retrieval - 2024

Research Paper On Transformer Based Contextual Text Representation Framework for Intelligent Information Retrieval

Research Area:  Machine Learning

Abstract:

With the advent of transformer-based architectures, contextual text representation allows both the query and the document to be represented in a low-dimensional dense vector space. These vectors are learned, fixed-size embeddings that capture deeper text understanding. In this study, we designed a pipeline for effectively retrieving documents from a large search space by combining the deeper text understanding capabilities of the transformer-based BERT model with a phrase embedding-based query expansion model. To learn the contextual representations, we fine-tuned a deep semantic matching model that encodes the document and the query separately. The encoder is based on the Sentence-BERT (SBERT) architecture, which generates dense vector representations of documents and queries independently. The study also addresses the maximum token length limitation of transformer-based models by summarizing lengthy documents. In addition, to improve the clarity and completeness of short queries and reduce the semantic gap, a phrase embedding-based query expansion model is employed. The documents and their dense vectors are indexed using the Elasticsearch engine and matched against query vectors to retrieve query-specific documents. Finally, a BERT-based cross-encoder model re-ranks the relevant records for each query. It performs full self-attention over the inputs and yields richer text interactions to produce the final results. To assess performance, experiments are conducted on two well-known datasets, TREC-CDS-2014 and OHSUMED. A comparative analysis demonstrates that the proposed framework produces competitive retrieval results.
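
The control flow described in the abstract (expand the query, retrieve candidates with separately encoded vectors, then re-rank query-document pairs jointly) can be illustrated with a minimal sketch. This is not the authors' implementation: `encode` is a toy bag-of-words stand-in for the fine-tuned SBERT encoder, `cross_score` is a toy token-overlap stand-in for the BERT cross-encoder, the in-memory `index` dictionary stands in for Elasticsearch, the `expansions` table is a hypothetical substitute for the phrase embedding-based expansion model, and the summarization step is omitted. All function names and the sample documents are assumptions made for illustration.

```python
import math
from collections import Counter


def encode(text):
    """Toy stand-in for a bi-encoder: a unit-length bag-of-words vector.
    The paper fine-tunes an SBERT model for this step; this is NOT that model."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()} if norm else {}


def cosine(u, v):
    # Both vectors are unit length, so the dot product equals the cosine.
    return sum(w * v.get(tok, 0.0) for tok, w in u.items())


def expand_query(query, expansions):
    """Append related phrases from a hypothetical lookup table.
    The paper uses a phrase embedding-based model instead of a fixed table."""
    extra = [p for term in query.lower().split() for p in expansions.get(term, [])]
    return " ".join([query] + extra)


def retrieve(query_vec, index, k):
    """First stage: rank pre-encoded documents by cosine similarity."""
    ranked = sorted(index.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]


def cross_score(query, doc):
    """Toy stand-in for a cross-encoder: scores the (query, document) pair
    jointly as the fraction of query tokens that occur in the document."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0


def search(query, docs, index, expansions, k=2):
    """Expand the query, retrieve k candidates, then re-rank them."""
    expanded = expand_query(query, expansions)
    candidates = retrieve(encode(expanded), index, k)
    return sorted(candidates, key=lambda i: cross_score(expanded, docs[i]), reverse=True)


# Hypothetical mini-corpus; documents are encoded once, ahead of query time.
docs = {
    "d1": "treatment options for acute myocardial infarction",
    "d2": "heart attack symptoms and emergency care",
    "d3": "seasonal allergy management in children",
}
index = {doc_id: encode(text) for doc_id, text in docs.items()}
expansions = {"heart": ["myocardial infarction"]}  # hypothetical expansion table

results = search("heart attack treatment", docs, index, expansions)
print(results)
```

The point of the two-stage design is cost: document vectors are computed once and compared cheaply at query time, while the more expensive pairwise scorer only sees the few candidates that survive the first stage. In this toy corpus, passing an empty `expansions` table changes the ranking, which mirrors the abstract's motivation for expanding short queries to reduce the semantic gap.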

Author(s) Name:  Amol P. Bhopale, Ashish Tiwari

Journal name:  Expert Systems with Applications

Publisher name:  Elsevier (ScienceDirect)

DOI:  10.1016/j.eswa.2023.121629

Volume Information:  Volume 238 (2024)