Learning to Parse Bilingual Sentences Using Bilingual Corpus and Monolingual CFG

Authors

  • Chung-Chi Huang
  • Jason S. Chang
Abstract

We present a new method for learning to parse a bilingual sentence using an Inversion Transduction Grammar (ITG) trained on a parallel corpus and a monolingual treebank. The method produces a parse tree for a bilingual sentence, showing the syntactic structures shared by the individual sentences and the differences in word order within each syntactic structure. The method involves estimating lexical translation probabilities based on a word-aligning strategy and inferring probabilities for CFG rules. At runtime, a bottom-up CYK-style parser is employed to construct the most probable bilingual parse tree for any given sentence pair. We also describe an implementation of the proposed method. The experimental results indicate that the proposed model produces better word alignments than GIZA++, a state-of-the-art word alignment system, in terms of alignment error rate and F-measure. The bilingual parse trees produced for the parallel corpus can be exploited to extract bilingual phrases and train a decoder for statistical machine translation.
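To make the runtime step more concrete, the following is a minimal sketch of bottom-up, CYK-style Viterbi parsing with a bracketing ITG. It is not the authors' implementation: it assumes a single nonterminal instead of rules inferred from a monolingual treebank, and the function name `itg_viterbi`, the table `t_prob`, the rule weights `p_straight` and `p_inverted`, and the `floor` value for unseen word pairs are all hypothetical. Scores are unnormalized log weights, and both sentences are assumed non-empty.

```python
import math

def itg_viterbi(e, f, t_prob, p_straight=0.5, p_inverted=0.5, floor=1e-6):
    """Best bracketing-ITG parse of (e, f); returns (log score, word links)."""
    n, m = len(e), len(f)

    def lp(pair):
        # Hypothetical lexical table; unseen pairs get a small floor value.
        return math.log(t_prob.get(pair, floor))

    best = {}  # (s, t, u, v) -> (log score, backpointer) for e[s:t] / f[u:v]
    # Lexical cells: word aligned to nothing (None) or to one word.
    for s in range(n):
        for u in range(m + 1):
            best[(s, s + 1, u, u)] = (lp((e[s], None)), None)
    for u in range(m):
        for s in range(n + 1):
            best[(s, s, u, u + 1)] = (lp((None, f[u])), None)
    for s in range(n):
        for u in range(m):
            best[(s, s + 1, u, u + 1)] = (lp((e[s], f[u])), ("lex", s, u))

    rules = (("straight", math.log(p_straight)),
             ("inverted", math.log(p_inverted)))
    # Fill larger cells in order of total span size (bottom-up).
    for size in range(2, n + m + 1):
        for le in range(max(0, size - m), min(n, size) + 1):
            lf = size - le
            for s in range(n - le + 1):
                for u in range(m - lf + 1):
                    t, v = s + le, u + lf
                    cur = best.get((s, t, u, v), (-math.inf, None))
                    for S in range(s, t + 1):
                        for U in range(u, v + 1):
                            for name, rule_lp in rules:
                                if name == "straight":   # same order in e and f
                                    a, b = (s, S, u, U), (S, t, U, v)
                                else:                    # order swapped in f
                                    a, b = (s, S, U, v), (S, t, u, U)
                                if a not in best or b not in best:
                                    continue  # a child span is empty
                                score = rule_lp + best[a][0] + best[b][0]
                                if score > cur[0]:
                                    cur = (score, (name, a, b))
                    best[(s, t, u, v)] = cur

    def links(cell):
        # Follow backpointers and collect (e index, f index) word links.
        bp = best[cell][1]
        if bp is None:
            return set()
        if bp[0] == "lex":
            return {(bp[1], bp[2])}
        return links(bp[1]) | links(bp[2])

    root = (0, n, 0, m)
    return best[root][0], links(root)

# Toy example with made-up probabilities: the adjective and noun swap order.
score, alignment = itg_viterbi(
    ["white", "house"], ["maison", "blanche"],
    {("white", "blanche"): 0.9, ("house", "maison"): 0.9},
)
print(sorted(alignment))  # [(0, 1), (1, 0)]
```

In the toy pair, the root cell is built with the inverted rule, which is how an ITG parse records a word-order difference inside a shared constituent. The nested loops make this sketch O(n^3 m^3), so it is only suitable for short sentences.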


Similar resources

Metalinguistic Awareness and Bilingual vs. Monolingual EFL Learners: Evidence from a Diagonal Bilingual Context

This paper reports a study of 85 Iranian EFL learners in the English Language Department of Urmia University. It explores the possible differences between the performance of 38 Persian monolingual and 47 Turkish-Persian bilingual EFL learners on metalinguistic tasks of ungrammatical structures and translation. The underlying hypothesis is that bilinguals in diagonal bilingual contexts experience a ...


The Use of Hedges and Boosters in Monolingual and Bilingual EFL Learners’ Academic Writings: The Case of Iranian Male and Female Post-graduate MA Articles

Expressing doubt and certainty in academic writing requires a cautious use of hedges and boosters. Despite their importance in academic writing, little is known about how they are used in monolingual and bilingual male and female EFL learners’ academic writing. To shed some light on the issue, the present study investigated the use of hedges and boosters in research articles written by monol...


Learning a second language and working memory: the role of bilinguali

The purpose of the present study was to evaluate the function of working memory in bilingual children, monolingual children, and children with learning disorders. The research project was of the causal-comparative type. Participants included 60 monolingual children, 34 children with learning disabilities, and 62 bilingual children, who completed the Wechsler Intelligence Scale and preschool children's Wec...


Machine Learning Approaches for Dealing with Limited Bilingual Training Data in Statistical Machine Translation

Statistical Machine Translation (SMT) models learn how to translate by examining a bilingual parallel corpus containing sentences aligned with their human-produced translations. However, high quality translation output is dependent on the availability of massive amounts of parallel text in the source and target languages. There are a large number of languages that are considered low-density, ei...


Towards Bilingual Term Extraction in Comparable Patents

To extract bilingual terms from a corpus of comparable patents, we present a novel framework in this paper. The framework includes the following major steps: 1) extract monolingual single-word and multi-word term candidates in monolingual patents; 2) find parallel sentences in comparable patents; 3) extract bilingual single-word and multi-word term candidates; 4) identify correct bilingu...




Publication date: 2006