Haitian Creole: How to Build and Ship an MT Engine from Scratch
Abstract
We describe the effort of the Microsoft Translator team to develop a Haitian Creole statistical machine translation (SMT) engine from scratch in a matter of days. Haitian Creole presents a number of difficulties for developing an SMT system. Principal among them are the lack of significant amounts of parallel training data and an inconsistent orthography, both of which lead to data sparseness. We demonstrate, however, that it is possible to build a translation engine of reasonable quality from very little data by engaging with the native language community and reducing data sparseness in creative ways. As such, we show that MT as a technology and as a service can be deployed rapidly in crisis situations.
Similar resources
The Value of Monolingual Crowdsourcing in a Real-World Translation Scenario: Simulation using Haitian Creole Emergency SMS Messages
MonoTrans2 is a translation system that combines machine translation (MT) with human computation using two crowds of monolingual source (Haitian Creole) and target (English) speakers. We report on its use in the WMT 2011 Haitian Creole to English translation task, showing that MonoTrans2 translated 38% of the sentences well compared to Google Translate’s 25%.
Psychometric properties of the newly translated creole multidimensional scale of perceived social support (MSPSS) and perceived adequacy of resource scale (PARS) and the relationship between perceived social support and resources in Haitian mothers in the US.
BACKGROUND: Low-income postpartum mothers with little to no social support have increased maternal and infant morbidity and mortality, especially those with limited English proficiency and limited access to resources. Haitians, a growing minority in the US, are an understudied population excluded from most studies due to the lack of instruments in Creole. The most widely used instruments for me...
Spell Checking Techniques for Replacement of Unknown Words and Data Cleaning for Haitian Creole SMS Translation
We report results on translation of SMS messages from Haitian Creole to English. We show improvements by applying spell checking techniques to unknown words and creating a lattice with the best known spelling equivalents. We also used a small cleaned corpus to train a cleaning model that we applied to the noisy corpora.
Rapid-deployment text-to-speech in the DIPLOMAT system
The DIPLOMAT project at Carnegie Mellon University instantiates a program of rapid-deployment speech-to-speech machine translation; we have developed techniques for quickly producing text-to-speech (TTS) systems for new target languages to support this work. While the resulting systems are not immediately of comparable quality to commercial systems on unrestricted tasks in well-developed langua...
When Is An Embedded MT System "Good Enough" For Filtering?
This paper proposes an end-to-end process analysis template with replicable measures to evaluate the filtering performance of a Scan-OCR-MT system. Preliminary results across three language-specific FALCon systems show that, with one exception, the derived measures consistently yield the same performance ranking: Haitian Creole at the low end, Arabic in the middle, and Spanish at the high e...
Publication date: 2010