manipulative transliteration

Mining Transliterations from Wikipedia Using Pair HMMs

2010

Peter Nabende

This paper describes the use of a pair Hidden Markov Model (pair HMM) system in mining transliteration pairs from noisy Wikipedia data. A pair HMM variant that uses nine transition parameters and emission parameters associated with single-character mappings between the source and target language alphabets is identified and used to estimate transliteration similarity. The system resulted in a pre...

Developing the Transliteration Interface for Arabic Text

2013

Mohammed Messaoudi Abdulsamad Al-Marghilani Hussein Zedan Aladdin Ayesh Fouzi Harrag Aboubekeur Hamdi-Cherif Abdul Malik S. Al-Salman Eyas El-Qawasmeh Aitao Chen Mohammed Elzubeir

In Arabic-English and English-Arabic translation, the interface plays a significant role. Translating Arabic raises many issues that need to be addressed. Existing systems have some problems, and research has been initiated to improve them. Transliteration is an important component of translation. In this study, we propose an interface system for Arabic transliteration. Th...

English-Chinese Transliteration Word Pair Extraction from Parallel Corpora

Journal: Int. J. Comput. Proc. Oriental Lang. 2008

Chengguo Jin Seung-Hoon Na Dong-Il Kim Jong-Hyeok Lee

Bilingual dictionary construction is a time-consuming job; therefore many studies have recently focused on automatically constructing bilingual dictionaries from bilingual texts. In this paper, we propose two novel approaches called dynamic window and tokenizer based on statistical machine transliteration model to efficiently extract English-Chinese transliteration pairs from parallel corpora. ...

Integrating Output from Specialized Modules in Machine Translation: Transliterations in Joshua

Journal: Prague Bull. Math. Linguistics 2010

Ann Irvine Mike Kayser Zhifei Li Wren N. G. Thornton Chris Callison-Burch

In many cases in SMT we want to allow specialized modules to propose translation fragments to the decoder and allow them to compete with translations contained in the phrase table. Transliteration is one module that may produce such specialized output. In this paper, as an example, we build a specialized Urdu transliteration module and integrate its output into an Urdu–English MT system. The mo...

A Statistical Approach to Chinese-to-English Back-Transliteration

2003

Chun-Jen Lee Jason S. Chang Jyh-Shing Roger Jang

This paper describes a statistical approach for modeling Chinese-to-English back-transliteration. Unlike previous approaches, the model does not involve the use of either a pronunciation dictionary for converting source words into phonetic symbols or manually assigned phonetic similarity scores between source and target words. The parameters of the proposed model are automatically learned from ...

Performance Improvement Of Bengali Text Compression Using Transliteration And Huffman Principle

2016

Md. Mamun Hossain Ahsan Habib Mohammad Shahidur Rahman

In this paper, we propose a new compression technique based on transliteration of Bengali text to English. Compared to Bengali, English is a less symbolic language. Thus, transliteration of Bengali text to English reduces the number of characters to be coded. Huffman coding is well known for producing optimal compression. When the Huffman principle is applied to transliterated text, significant perfo...

Moses-based official baseline for NEWS 2016

2016

Marta R. Costa-Jussà

Transliteration is the phonetic translation between two different languages. Many works approach transliteration using machine translation methods. This paper describes the official baseline system for the NEWS 2016 workshop shared task. This baseline is based on a standard phrase-based machine translation system using Moses. Results fall between the best and worst from l...

Transliterating From All Languages

2010

Ann Irvine Chris Callison-Burch Alexandre Klementiev

Much of the previous work on transliteration has depended on resources and attributes specific to particular language pairs. In this work, rather than focus on a single language pair, we create robust models for transliterating from all languages in a large, diverse set to English. We create training data for 150 languages by mining name pairs from Wikipedia. We train 13 systems and analyze the...

Transliteration as Constrained Optimization

2008

Dan Goldwasser Dan Roth

This paper introduces a new method for identifying named-entity (NE) transliterations in bilingual corpora. Recent works have shown the advantage of discriminative approaches to transliteration: given two strings (ws, wt) in the source and target language, a classifier is trained to determine if wt is the transliteration of ws. This paper shows that the transliteration problem can be formulated...

Joint Alignment and Artificial Data Generation: An Empirical Study of Pivot-based Machine Transliteration

2011

Min Zhang Xiangyu Duan Ming Liu Yunqing Xia Haizhou Li

In this paper, we first carry out an investigation on two existing pivot strategies for statistical machine transliteration, namely system-based and model-based strategies, to figure out the reason why the previous model-based strategy performs much worse than the system-based one. We then propose a joint alignment algorithm to optimize transliteration alignments jointly across source, pivot an...
