Automatic Extraction of English-Chinese Transliteration Pairs using Dynamic Window and Tokenizer
نویسندگان
چکیده
Recently, many studies have been focused on extracting transliteration pairs from bilingual texts. Most of these studies are based on the statistical transliteration model. The paper discusses the limitations of previous approaches and proposes novel approaches called dynamic window and tokenizer to overcome these limitations. Experimental results show that the average rates of word and character precision are 99.0% and 99.78%, respectively.
منابع مشابه
English-Chinese Transliteration Word Pair Extraction from Parallel Corpora
Bilingual dictionary construction is a time-consuming job; therefore many studies have recently focused on automatically constructing bilingual dictionaries from bilingual texts. In this paper, we propose two novel approaches called dynamic window and tokenizer based on statistical machine transliteration model to efficiently extract English-Chinese transliteration pairs from parallel corpora. ...
متن کاملRegularity and Flexibility in English-Chinese Name Transliteration
This paper reflects on the nature of English-Chinese personal name transliteration and the limitations of state-of-the-art language-independent automatic transliteration generation systems. English-Chinese name pairs from various sources were analysed and the complex interaction of factors in transliteration is discussed. Proposals are made for fuller error analysis in shared tasks and for expa...
متن کاملAutomatic Extraction of Translational Japanese-KATAKANA and English Word Pairs
The method to automatically extract translational Japanese-KATAKANA and English word pairs from bilingual corpora is proposed. The method applies all the existing transliteration rules to each mora unit in a KATAKANA word, and extract English word which matched or partially-matched to one of these transliteration candidates as translation. For instance, if there is a word ‘グラフ’ (graph) in Japan...
متن کاملMining Transliterations from Wikipedia using Dynamic Bayesian Networks
Transliteration mining is aimed at building high quality multi-lingual named entity (NE) lexicons for improving performance in various Natural Language Processing (NLP) tasks including Machine Translation (MT) and Cross Language Information Retrieval (CLIR). In this paper, we apply two Dynamic Bayesian network (DBN)-based edit distance (ED) approaches in mining transliteration pairs from Wikipe...
متن کاملExtracting English-Korean Transliteration Equivalence from Domain-Specific Dictionaries
Automatic translation knowledge acquisition or automatic bilingual dictionary construction has become an important first step for natural language applications such as machine translation and cross-language information retrieval. Transliterations are used to translate proper names and technical terms especially from languages in Roman alphabets to languages in non-Roman alphabets such as from E...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2008