manipulative transliteration

Transliterated Named Entity Recognition Based on Chinese Word Sketch

Journal: Int. J. Comput. Proc. Oriental Lang. 2008

Petr Simon, Chu-Ren Huang, Shu-Kai Hsieh, Jia-Fei Hong

One of the unique challenges in Chinese language processing is cross-strait named entity recognition. Due to the adoption of different transliteration strategies, foreign name transliterations can vary greatly between the PRC and Taiwan. This situation poses a serious problem for NLP tasks, including data mining, translation, and information retrieval. In this paper, we introduce a novel approach to...

Automatic Extraction of Translational Japanese-KATAKANA and English Word Pairs

Journal: Int. J. Comput. Proc. Oriental Lang. 2002

Keita Tsuji

A method for automatically extracting translational Japanese-KATAKANA and English word pairs from bilingual corpora is proposed. The method applies all existing transliteration rules to each mora unit in a KATAKANA word and extracts, as the translation, the English word that matches or partially matches one of these transliteration candidates. For instance, if there is a word ‘グラフ’ (graph) in Japan...
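The rule-application step described in this abstract can be illustrated with a minimal sketch. The rule table `RULES`, the mora splitter, and the function names are hypothetical and chosen only for illustration; they are not taken from the paper. The sketch performs exact matching only, whereas the abstract also mentions partial matches.

```python
from itertools import product

# Hypothetical rule table: each mora maps to its possible Latin renderings.
RULES = {
    "グ": ["gu", "g"],
    "ラ": ["ra", "la"],
    "フ": ["fu", "f", "ph"],
}

# Small kana attach to the preceding character to form a single mora.
SMALL_KANA = set("ァィゥェォャュョ")


def split_moras(word):
    """Split a katakana word into mora units (simplified assumption)."""
    moras = []
    for ch in word:
        if ch in SMALL_KANA and moras:
            moras[-1] += ch
        else:
            moras.append(ch)
    return moras


def candidates(word, rules):
    """Apply every rule to every mora and join all combinations."""
    options = [rules.get(mora, []) for mora in split_moras(word)]
    return {"".join(parts) for parts in product(*options)}


def match(word, english_words, rules):
    """Return the English words that exactly match some candidate."""
    cands = candidates(word, rules)
    return [w for w in english_words if w.lower() in cands]


print(match("グラフ", ["chart", "graph", "figure"], RULES))
```

With this toy table, ‘グラフ’ yields twelve candidates, one of which is "graph" (g + ra + ph).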

Learning Transliteration Lexicons from the Web

2006

Jin-Shea Kuo, Haizhou Li, Ying-Kuei Yang

This paper presents an adaptive learning framework for Phonetic Similarity Modeling (PSM) that supports the automatic construction of transliteration lexicons. The learning algorithm starts with minimum prior knowledge about machine transliteration, and acquires knowledge iteratively from the Web. We study the active learning and the unsupervised learning strategies that minimize human supervis...

A Deep Learning Approach to Machine Transliteration

2009

Thomas Deselaers, Sasa Hasan, Oliver Bender, Hermann Ney

In this paper we present a novel transliteration technique which is based on deep belief networks. Common approaches use finite state machines or other methods similar to conventional machine translation. Instead of using conventional NLP techniques, the approach presented here builds on deep belief networks, a technique which was shown to work well for other machine learning problems. We show ...

Similarity of Names Across Scripts: Edit Distance Using Learned Costs of N-Grams

2008

Bruno Pouliquen

Any cross-language processing application has to first tackle the problem of transliteration when facing a language using another script. The first solution consists of using existing transliteration tools, but these tools are not usually suitable for all purposes. For some specific script pairs they do not even exist. Our aim is to discriminate transliterations across different scripts in a un...
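As a rough illustration of the idea named in the title, the sketch below computes an edit distance in which substitution costs come from a lookup table. It is a simplified assumption, not the paper's method: the table here is written by hand rather than learned, it covers single characters rather than n-grams, and the names `weighted_edit_distance` and `sub_costs` are hypothetical.

```python
def weighted_edit_distance(source, target, sub_costs, default_cost=1.0):
    """Levenshtein distance with per-pair substitution costs.

    sub_costs maps (source_char, target_char) to a cost; pairs that are
    missing from the table fall back to default_cost.
    """
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * default_cost  # delete every source character
    for j in range(1, cols):
        dist[0][j] = j * default_cost  # insert every target character
    for i in range(1, rows):
        for j in range(1, cols):
            a, b = source[i - 1], target[j - 1]
            sub = 0.0 if a == b else sub_costs.get((a, b), default_cost)
            dist[i][j] = min(
                dist[i - 1][j] + default_cost,  # deletion
                dist[i][j - 1] + default_cost,  # insertion
                dist[i - 1][j - 1] + sub,       # substitution or match
            )
    return dist[-1][-1]


# Hand-written Cyrillic-to-Latin costs, for illustration only.
costs = {("п", "p"): 0.0, ("у", "u"): 0.0, ("т", "t"): 0.0,
         ("и", "i"): 0.0, ("н", "n"): 0.0}
print(weighted_edit_distance("путин", "putin", costs))
print(weighted_edit_distance("путин", "putin", {}))
```

With the cost table, the two spellings are at distance 0.0; with an empty table, every cross-script pair costs the default and the distance is 5.0.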

Data representation methods and use of mined corpora for Indian language transliteration

2015

Anoop Kunchukuttan, Pushpak Bhattacharyya

Our NEWS 2015 shared task submission is a PBSMT-based transliteration system with the following corpus preprocessing enhancements: (i) addition of word-boundary markers, and (ii) language-independent, overlapping character segmentation. We show that the addition of word-boundary markers improves transliteration accuracy substantially, whereas our overlapping segmentation shows promise in our prel...
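The two preprocessing steps named in this abstract can be sketched as follows. The marker symbols `^` and `$`, the segment size of 2, and the function names are assumptions made for illustration; the abstract does not specify them.

```python
def add_boundary_markers(word, start="^", end="$"):
    """Wrap a word's characters in start and end markers (symbols assumed)."""
    return [start] + list(word) + [end]


def overlapping_segments(symbols, size=2):
    """Slide a window of `size` symbols over the sequence, one step at a time."""
    if len(symbols) <= size:
        return ["".join(symbols)]
    return ["".join(symbols[i:i + size])
            for i in range(len(symbols) - size + 1)]


print(add_boundary_markers("ram"))
print(overlapping_segments(list("ram")))
```

For the word "ram", the first call gives `^ r a m $` as separate tokens, and the second gives the overlapping units `ra` and `am`.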

Can Chinese Phonemes Improve Machine Transliteration?: A Comparative Study of English-to-Chinese Transliteration Models

2009

Jong-Hoon Oh, Kiyotaka Uchimoto, Kentaro Torisawa

Inspired by the success of English grapheme-to-phoneme research in speech synthesis, many researchers have proposed phoneme-based English-to-Chinese transliteration models. However, such approaches have suffered severely from errors in Chinese phoneme-to-grapheme conversion. To address this issue, we propose a new English-to-Chinese transliteration model and make systematic comparisons with...

English-Hindi Transliteration Using Context-Informed PB-SMT: the DCU System for NEWS 2009

2009

Rejwanul Haque, Sandipan Dandapat, Ankit K. Srivastava, Sudip Kumar Naskar, Andy Way

This paper presents English-Hindi transliteration in the NEWS 2009 Machine Transliteration Shared Task, adding source context modeling to state-of-the-art log-linear phrase-based statistical machine translation (PB-SMT). Source context features enable us to exploit source similarity in addition to target similarity, as modelled by the language model. We use a memory-based classification framew...

Hindi Transliteration Using Context-Informed PB-SMT: the DCU System for NEWS 2009

2009

Rejwanul Haque, Sandipan Dandapat, Ankit Kumar Srivastava, Sudip Kumar Naskar

This paper presents English-Hindi transliteration in the NEWS 2009 Machine Transliteration Shared Task, adding source context modeling to state-of-the-art log-linear phrase-based statistical machine translation (PB-SMT). Source context features enable us to exploit source similarity in addition to target similarity, as modelled by the language model. We use a memory-based classification framew...

POS Tagging of English-Hindi Code-Mixed Social Media Content

2014

Yogarshi Vyas, Spandana Gella, Jatin Sharma, Kalika Bali, Monojit Choudhury

Code-mixing is frequently observed in user-generated content on social media, especially from multilingual users. The linguistic complexity of such content is compounded by the presence of spelling variations, transliteration, and non-adherence to formal grammar. We describe our initial efforts to create a multi-level annotated corpus of Hindi-English code-mixed text collated from Facebook forums, an...
