Cross-language mapping for small-vocabulary ASR in under-resourced languages: investigating the impact of source language choice
Abstract
For small-vocabulary applications, a mapped pronunciation lexicon can enable speech recognition in a target under-resourced language using an out-of-the-box recognition engine for a high-resource source language. Existing algorithms for cross-language phoneme mapping enable the fully automatic creation of such lexicons using just a few minutes of audio, making speech-driven applications in any language feasible. What such methods have not considered is whether careful selection of the source language based on the linguistic properties of the target language can improve recognition accuracy; this paper reports on a preliminary exploration of this question. Results from a first case study seem to indicate that phonetic similarity between target and source language does not significantly impact accuracy, underscoring the language independence of such techniques.
Similar resources
Analysis of Mismatched Transcriptions Generated by Humans and Machines for Under-Resourced Languages
When speech data with native transcriptions are scarce in an under-resourced language, automatic speech recognition (ASR) must be trained using other methods. Semi-supervised learning first labels the speech using ASR from other languages, then re-trains the ASR using the generated labels. Mismatched crowdsourcing asks crowd-workers unfamiliar with the language to transcribe it. In this paper, ...
A first LVCSR system for Luxembourgish, an under-resourced European language
Luxembourgish is embedded in a multilingual context on the divide between Romance and Germanic cultures and remains one of Europe’s under-described languages. We describe our efforts in building a large-vocabulary ASR system for such a “minority” language (target language: Luxembourgish) without any transcribed audio training data. Instead, acoustic models are derived from major languages (sou...
Investigating the Effect of Morphology Instruction through Semantic Mapping on Vocabulary Learning of Iranian Intermediate EFL Learners
The aim of this study was to investigate the effect of morphology instruction through semantic mapping on vocabulary learning of Iranian intermediate EFL learners. To do so, 50 out of 70 students were selected from one English language institute by administering a PET test. Then, they were randomly assigned to experimental and control groups. A pretest (teacher-made) was adm...
SMT-based ASR domain adaptation methods for under-resourced languages: Application to Romanian
This study investigates the possibility of using statistical machine translation to create domain-specific language resources. We propose a methodology that aims to create a domain-specific automatic speech recognition (ASR) system for a low-resourced language when in-domain text corpora are available only in a high-resourced language. Several translation scenarios (both unsupervised and semi-su...
Speed Perturbation and Vowel Duration Modeling for ASR in Hausa and Wolof Languages
Automatic Speech Recognition (ASR) for (under-resourced) Sub-Saharan African languages faces several challenges: small amount of transcribed speech, written language normalization issues, few text resources available for language modeling, as well as specific features (tones, morphology, etc.) that need to be taken into account seriously to optimize ASR performance. This paper tries to address ...
Publication date: 2014