ProAlign: Shared Task System Description

Authors

  • Dekang Lin
  • Colin Cherry
Abstract

ProAlign combines several different approaches in order to produce high-quality word alignments. Like competitive linking, ProAlign uses a constrained search to find high-scoring alignments. Like EM-based methods, it uses a probability model to rank possible alignments. The goal of this paper is to give a bird's-eye view of the ProAlign system to encourage discussion and comparison.

1 Alignment Algorithm at a Glance

We have submitted the ProAlign alignment system to the WPT'03 shared task. It received an AER of 5.71% on the English-French task and 29.36% on the Romanian-English task. These results are with the no-null data; our output was not formatted to work with explicit nulls.

ProAlign works by iteratively improving an alignment. The algorithm creates an initial alignment using search, constraints, and summed φ² correlation-based scores (Gale and Church, 1991). This is similar to the competitive linking process (Melamed, 2000). It then learns a probability model from the current alignment and conducts a constrained search again, this time scoring alignments according to the probability model. The process continues until results on a validation set begin to indicate over-fitting.

For the purposes of our algorithm, we view an alignment as a set of links between the words in a sentence pair. Before describing the algorithm, we define the following notation. Let E be an English sentence e_1, e_2, ..., e_m and let F be a French sentence f_1, f_2, ..., f_n. We define a link l(e_i, f_j) to exist if e_i and f_j are a translation (or part of a translation) of one another. We define the null link l(e_i, f_0) to exist if e_i does not correspond to a translation of any French word in F. The null link l(e_0, f_j) is defined similarly. An alignment A for two sentences E and F is a set of links such that every word in E and F participates in at least one link, and a word linked to e_0 or f_0 participates in no other links.
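The definition of an alignment above can be checked mechanically. The following is a minimal sketch, not code from the ProAlign system: the `Link` representation (index pairs with 0 standing for the null word) and the function name `is_alignment` are assumptions made for illustration.

```python
from typing import Set, Tuple

# Hypothetical representation: (i, j) links e_i to f_j; index 0 is the null word.
Link = Tuple[int, int]


def is_alignment(links: Set[Link], m: int, n: int) -> bool:
    """Check the two conditions of the definition for sentences of length m and n."""
    # Reject out-of-range indices; treating a null-to-null link as invalid
    # is an assumption, since the text does not mention it.
    for i, j in links:
        if not (0 <= i <= m and 0 <= j <= n) or (i, j) == (0, 0):
            return False
    # Every English word needs at least one link, and a null link must be its only link.
    for i in range(1, m + 1):
        partners = {j for (a, j) in links if a == i}
        if not partners or (0 in partners and len(partners) > 1):
            return False
    # The same two conditions hold for every French word.
    for j in range(1, n + 1):
        partners = {i for (i, b) in links if b == j}
        if not partners or (0 in partners and len(partners) > 1):
            return False
    return True
```

For example, with m = n = 2, the set {(1, 1), (2, 0), (0, 2)} is accepted (the second word on each side is linked to null), while {(1, 1), (1, 0), (2, 2)} is rejected because e_1 is linked to null and to f_1 at once.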
If e occurs in E x times and f occurs in F y times, we say that e and f co-occur xy times in this sentence pair.

ProAlign conducts a best-first search (with constant beam and agenda size) over a constrained space of possible alignments. A state in this space is a partial alignment, and a transition is the addition of a single link to the current state. Any link whose addition creates a state that violates no constraint is a valid transition. Our start state is the empty alignment, where all words in E and F are implicitly linked to null. A terminal state is a state in which no more links can be added without violating a constraint. Our goal is to find the terminal state with the highest probability.

To complete this algorithm, one requires a set of constraints and a method for determining which alignment is most likely. These are presented in the next two sections.

The algorithm takes as input a set of English-French sentence pairs, along with dependency trees for the English sentences. The presence of the English dependency tree allows us to incorporate linguistic features into our model and linguistic intuitions into our constraints.
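The co-occurrence count and the search procedure described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the ProAlign implementation: the one-to-one constraint `one_to_one` is a placeholder for the constraints the paper presents later, `score` stands in for either the φ²-based score or the probability model, and all function names and default parameter values are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count
from typing import Callable, Dict, FrozenSet, Sequence, Tuple

Link = Tuple[int, int]      # (i, j) links e_i to f_j, 1-based
State = FrozenSet[Link]     # a partial alignment; unlinked words are implicitly null


def cooccurrences(E: Sequence[str], F: Sequence[str]) -> Dict[Tuple[str, str], int]:
    """If e occurs x times in E and f occurs y times in F, they co-occur x*y times."""
    ce, cf = Counter(E), Counter(F)
    return {(e, f): ce[e] * cf[f] for e in ce for f in cf}


def one_to_one(state: State, link: Link) -> bool:
    """Placeholder constraint (an assumption): each word takes at most one link."""
    i, j = link
    return all(a != i and b != j for a, b in state)


def best_first_align(m: int, n: int, score: Callable[[State], float],
                     allowed: Callable[[State, Link], bool] = one_to_one,
                     beam: int = 5, agenda_size: int = 100) -> State:
    """Best-first search from the empty alignment to the best terminal state found."""
    tie = count()  # tie-breaker so the heap never has to compare states
    start: State = frozenset()
    agenda = [(-score(start), next(tie), start)]
    seen = {start}
    best_state, best_score = start, float("-inf")
    while agenda:
        neg, _, state = heapq.heappop(agenda)
        # A transition adds one link that keeps every constraint satisfied.
        moves = [(i, j) for i in range(1, m + 1) for j in range(1, n + 1)
                 if (i, j) not in state and allowed(state, (i, j))]
        if not moves:  # terminal state: no link can be added
            if -neg > best_score:
                best_state, best_score = state, -neg
            continue
        # Keep only the `beam` best successors of this state.
        successors = sorted((state | {mv} for mv in moves), key=score, reverse=True)
        for nxt in successors[:beam]:
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(agenda, (-score(nxt), next(tie), nxt))
        # Bound the agenda; a sorted list is still a valid heap.
        agenda = heapq.nsmallest(agenda_size, agenda)
    return best_state
```

A toy call might pass `score = lambda s: sum(link_scores[l] for l in s)` for some dictionary `link_scores` of per-link scores. Because the beam and agenda are pruned, the search is approximate: it returns the best terminal state it reaches, which is not guaranteed to be the global optimum.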

Publication date: 2003