Research Area:  Machine Learning
In recent years, Large Language Models (LLMs) have garnered significant attention from the research community due to their exceptional performance and generalization capabilities. In this paper, we introduce a novel method for contextualizing speech recognition models by incorporating LLMs. Our approach casts speech recognition as a mixed-modal language modeling task based on a pretrained LLM. We provide audio features, along with optional text tokens for context, and train the system to complete transcriptions in a decoder-only fashion. As a result, the system is implicitly incentivized to learn how to leverage unstructured contextual information during training. Our empirical results demonstrate a significant improvement in performance, with a 6% WER reduction when additional textual context is provided. Moreover, our method performs competitively against a baseline contextualized RNN-T system trained on a speech dataset more than twenty-five times larger, improving WER by 7.5% overall and by 17% on rare words. Overall, we demonstrate that by adding only a small number of trainable parameters via adapters, we can unlock contextualized speech recognition capability for the pretrained LLM while retaining its text-only input functionality.
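The abstract outlines the architecture at a high level: audio features, together with optional context text, are fed to a pretrained decoder-only LLM, which is trained to complete the transcription while only a small set of adapter parameters is updated. Below is a minimal PyTorch sketch of that idea, not the authors' implementation; the class and parameter names (ContextualSpeechLM, AdapterBlock, audio_proj, the bottleneck size) are illustrative assumptions, and the frozen LLM is assumed to be a Hugging Face-style causal LM that accepts inputs_embeds.

import torch
import torch.nn as nn

class AdapterBlock(nn.Module):
    # Bottleneck adapter: the only trainable parameters added on top of the frozen LLM.
    def __init__(self, d_model: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual bottleneck: x + up(relu(down(x)))
        return x + self.up(torch.relu(self.down(x)))

class ContextualSpeechLM(nn.Module):
    # Decoder-only mixed-modal model over [context text ; audio ; transcript].
    def __init__(self, llm: nn.Module, d_audio: int, d_model: int):
        super().__init__()
        self.llm = llm
        for p in self.llm.parameters():  # freeze the pretrained LLM weights
            p.requires_grad = False
        # Project audio encoder outputs into the LLM token-embedding space.
        self.audio_proj = nn.Linear(d_audio, d_model)
        self.adapter = AdapterBlock(d_model)

    def forward(self, context_emb, audio_feats, target_emb):
        audio_emb = self.adapter(self.audio_proj(audio_feats))
        # Concatenate along the sequence axis; the LLM completes the transcript,
        # so it is implicitly incentivized to exploit the unstructured context.
        seq = torch.cat([context_emb, audio_emb, target_emb], dim=1)
        return self.llm(inputs_embeds=seq)

In training, the cross-entropy loss would be computed only over the transcript positions; because the LLM itself stays frozen, the same model still accepts plain text when no audio or context is supplied, matching the text-only functionality the abstract mentions.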
Keywords:  
speech recognition
large language models
Author(s) Name:  Egor Lakomkin, Chunyang Wu, Yassir Fathullah, Ozlem Kalinli, Michael L. Seltzer, Christian Fuegen
Journal name:  
Conference name:  ICASSP 2024 - IEEE International Conference on Acoustics, Speech and Signal Processing
Publisher name:  IEEE
DOI:  10.1109/ICASSP48485.2024.10446898
Volume Information:  2024
Paper Link:   https://arxiv.org/abs/2309.10917