An Unsupervised Probability Model for Speech-to-Translation Alignment of Low-Resource Languages

Authors

  • Antonios Anastasopoulos
  • David Chiang
  • Long Duong
Abstract

For many low-resource languages, spoken language resources are more likely to be annotated with translations than with transcriptions. Translated speech data is potentially valuable for documenting endangered languages or for training speech translation systems. A first step towards making use of such data would be to automatically align spoken words with their translations. We present a model that combines Dyer et al.’s reparameterization of IBM Model 2 (fast_align) and k-means clustering using Dynamic Time Warping as a distance metric. The two components are trained jointly using expectation-maximization. In an extremely low-resource scenario, our model performs significantly better than both a neural model and a strong baseline.
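
As a rough illustration of the clustering component named in the abstract, the sketch below computes a textbook Dynamic Time Warping distance between two variable-length sequences of feature vectors and assigns a segment to its nearest prototype, as in the assignment step of k-means. It is not the authors' implementation: the function names `dtw_distance` and `nearest_prototype`, the Euclidean frame cost, and the toy one-dimensional features are all assumptions made for the example, and the joint expectation-maximization training with fast_align is not shown.

```python
import math


def dtw_distance(a, b):
    """Classic DTW cost between two sequences of feature vectors.

    Assumption: Euclidean distance is used as the per-frame cost.
    """
    n, m = len(a), len(b)
    # cost[i][j] = cheapest cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # frame of a repeated against b[j-1]
                cost[i][j - 1],      # frame of b repeated against a[i-1]
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def nearest_prototype(segment, prototypes):
    """Index of the prototype closest to `segment` under DTW (hypothetical helper)."""
    distances = [dtw_distance(segment, p) for p in prototypes]
    return distances.index(min(distances))


# Toy one-dimensional "speech" features, invented for illustration only.
segment = [[0.0], [1.0], [1.0], [2.0]]
prototypes = [
    [[5.0], [5.0]],
    [[0.0], [1.0], [2.0]],
]
best = nearest_prototype(segment, prototypes)
```

Because DTW lets one frame align with several frames of the other sequence, the stretched segment above matches the second prototype at zero cost even though the two sequences have different lengths, which is why it suits spoken words of varying duration.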


Similar resources

Spoken Term Discovery for Language Documentation using Translations

Vast amounts of speech data collected for language documentation and research remain untranscribed and unsearchable, but often a small amount of speech may have text translations available. We present a method for partially labeling additional speech with translations in this scenario. We modify an unsupervised speech-to-translation alignment model and obtain prototype speech segments that match...


Investigating Translation Strategies of Culture-Specific Items in Alignment with Nord’s Binary Translation Typology: A Case Study of Unaccustomed Earth

Culture is an extremely complex concept. Translating cultural elements is a demanding task because these elements carry specific meanings and implications belonging exclusively to the language and culture from which they have emerged. Regarding this point, the present article investigated the strategies employed for translating culture-specific items (CSIs) in an English nov...


Image alignment via kernelized feature learning

Machine learning is an application of artificial intelligence that is able to automatically learn and improve from experience without being explicitly programmed. The primary assumption for most of the machine learning algorithms is that the training set (source domain) and the test set (target domain) follow from the same probability distribution. However, in most of the real-world application...


An Attentional Model for Speech Translation Without Transcription

For many low-resource languages, spoken language resources are more likely to be annotated with translations than transcriptions. This bilingual speech data can be used for word-spotting, spoken document retrieval, and even for documentation of endangered languages. We experiment with the neural, attentional model applied to this data. On phone-to-word alignment and translation reranking tasks, ...


Unsupervised Bilingual Morpheme Segmentation and Alignment with Context-rich Hidden Semi-Markov Models

This paper describes an unsupervised dynamic graphical model for morphological segmentation and bilingual morpheme alignment for statistical machine translation. The model extends Hidden Semi-Markov chain models by using factored output nodes and special structures for its conditional probability distributions. It relies on morpho-syntactic and lexical source-side information (part-of-speech, m...


Publication date: 2016