Are Discrete Units Necessary for Spoken Language Modeling?
Authors
Abstract
Recent work in spoken language modeling shows the possibility of learning a language unsupervisedly from raw audio without any text labels. The approach relies first on transforming the audio into a sequence of discrete units (or pseudo-text) and then training a language model directly on such pseudo-text. Is such a discrete bottleneck necessary, potentially introducing irreversible errors in the encoding of the speech signal, or could we learn a language model without discrete units at all? In this work, we study the role of discrete versus continuous representations in spoken language modeling. We show that discretization is indeed essential for good results in spoken language modeling: it removes linguistically irrelevant information from the continuous features, helping to improve language modeling performance. On the basis of this study, we train a language model on the discrete units of the HuBERT features, reaching new state-of-the-art results on the lexical, syntactic and semantic metrics of the Zero Resource Speech Challenge 2021 (Track 1 - Speech Only).
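The pipeline the abstract describes (mapping continuous audio features to a sequence of discrete units, then collapsing them into a pseudo-text for language modeling) can be sketched with a toy k-means quantizer. This is an illustrative sketch only: the random vectors stand in for real encoder features (e.g. HuBERT frame embeddings), and the codebook size and iteration count are arbitrary choices, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy stand-in for continuous speech features: 500 frames of 32-dim
# vectors. In practice these would come from a pretrained encoder.
features = rng.normal(size=(500, 32))

# Discretization step: quantize each frame to its nearest k-means
# centroid; the centroid index serves as the frame's discrete unit.
n_units = 50  # codebook size is a free hyperparameter
centroids = features[rng.choice(len(features), n_units, replace=False)]
for _ in range(10):  # a few Lloyd iterations
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=-1)
    units = dists.argmin(axis=1)
    for k in range(n_units):
        mask = units == k
        if mask.any():
            centroids[k] = features[mask].mean(axis=0)
units = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=-1).argmin(axis=1)

# Collapse consecutive repeats to obtain the pseudo-text sequence on
# which a standard language model can then be trained.
pseudo_text = [int(u) for i, u in enumerate(units) if i == 0 or u != units[i - 1]]
print(len(units), len(pseudo_text))
```

The repeat-collapsing step mirrors how unit sequences are typically deduplicated before language modeling, since adjacent frames of the same phone tend to map to the same cluster.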
Similar resources
On Using Written Language Training Data for Spoken Language Modeling
We attempted to improve recognition accuracy by reducing the inadequacies of the lexicon and language model. Specifically, we address the following three problems: (1) the best size for the lexicon, (2) conditioning written text for spoken language recognition, and (3) using additional training outside the text distribution. We found that increasing the lexicon from 20,000 words to 40,000 words reduce...
Classification-based spoken text selection for LVCSR language modeling
Large vocabulary continuous speech recognition (LVCSR) has naturally been demanded for transcribing daily conversations, while developing spoken text data to train LVCSR is costly and time-consuming. In this paper, we propose a classification-based method to automatically select social media data for constructing a spoken-style language model in LVCSR. Three classification techniques, SVM, CRF,...
Sequential Dialogue Context Modeling for Spoken Language Understanding
Spoken Language Understanding (SLU) is a key component of goal oriented dialogue systems that would parse user utterances into semantic frame representations. Traditionally SLU does not utilize the dialogue history beyond the previous system turn and contextual ambiguities are resolved by the downstream components. In this paper, we explore novel approaches for modeling dialogue context in a re...
Language modeling for speech recognition of spoken Cantonese
This paper addresses the problem of language modeling for LVCSR of Cantonese spoken in daily communication. As a spoken dialect, Cantonese is not used in written documents and published materials. Thus it is difficult to collect sufficient amount of written Cantonese text data for the training of statistical language models. We propose to solve this problem by translating standard Chinese text,...
Multi-channel sentence classification for spoken dialogue language modeling
In traditional language modeling word prediction is based on the local context (e.g. n-gram). In spoken dialog, language statistics are affected by the multidimensional structure of the human-machine interaction. In this paper we investigate the statistical dependencies of users’ responses with respect to the system’s and user’s channel. The system channel components are the prompts’ text, dial...
Journal
Journal title: IEEE Journal of Selected Topics in Signal Processing
Year: 2022
ISSN: 1941-0484, 1932-4553
DOI: https://doi.org/10.1109/jstsp.2022.3200909