Transcribing continuous speech using mismatched crowdsourcing

Authors

  • Preethi Jyothi
  • Mark Hasegawa-Johnson
Abstract

Mismatched crowdsourcing derives speech transcriptions using crowd workers unfamiliar with the language being spoken. This approach has been demonstrated for isolated word transcription tasks, but never yet for continuous speech. In this work, we demonstrate mismatched crowdsourcing of continuous speech with a word error rate of under 45% in a large-vocabulary transcription task of short speech segments. In order to scale mismatched crowdsourcing to continuous speech, we propose a number of new WFST pruning techniques based on explicitly low-entropy models of the acoustic similarities among orthographic symbols as understood within a transcriber community. We also provide an information-theoretic analysis and estimate the amount of information lost in transcription by the mismatched crowd workers to be under 5 bits.
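
The word error rate quoted in the abstract is conventionally the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The sketch below illustrates that standard definition only; the function name and the whitespace tokenization are assumptions made for illustration and are not taken from the paper, whose exact scoring setup is not described here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper: tokenization by whitespace is an assumption.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)
```

Because insertions count as errors, this ratio can exceed 1.0 when the hypothesis is much longer than the reference.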

Similar resources

Analysis of Mismatched Transcriptions Generated by Humans and Machines for Under-Resourced Languages

When speech data with native transcriptions are scarce in an under-resourced language, automatic speech recognition (ASR) must be trained using other methods. Semi-supervised learning first labels the speech using ASR from other languages, then re-trains the ASR using the generated labels. Mismatched crowdsourcing asks crowd-workers unfamiliar with the language to transcribe it. In this paper, ...

Crowdsourcing for Large-Scale Pervasive Sensing

Crowdsourcing, or the act of outsourcing a task to the crowd, has the potential to revolutionize information collection and processing systems by enabling in-depth, large-scale, cost-effective information gathering, and more accurate techniques for information extraction from data. Crowdsourcing provides a powerful mechanism for creating data about the physical world, particularly through the u...

Mismatched Crowdsourcing: Mining Latent Skills to Acquire Speech Transcriptions

Automatic speech recognition (ASR) converts audio to text. ASR is usually trained using a large quantity of labeled data, i.e., audio with text transcription. In many languages, however, text transcription is hard to find, e.g., in both Hokkien and Dinka, we found native speakers who had received all their primary education in some other language, and who therefore had difficulty writing in the...

Clustering-based Phonetic Projection in Mismatched Crowdsourcing Channels for Low-resourced ASR

Acquiring labeled speech for low-resource languages is a difficult task in the absence of native speakers of the language. One solution to this problem involves collecting speech transcriptions from crowd workers who are foreign or non-native speakers of a given target language. From these mismatched transcriptions, one can derive probabilistic phone transcriptions that are defined over the set...

Mismatched Crowdsourcing from Multiple Annotator Languages for Recognizing Zero-Resourced Languages: A Nullspace Clustering Approach

It is extremely challenging to create training labels for building acoustic models of zero-resourced languages, in which conventional resources required for model training – lexicons, transcribed audio, or in extreme cases even orthographic system or a viable phone set design for the language – are unavailable. Here, language mismatched transcripts, in which audio is transcribed in the orthogra...

Publication date: 2015