Empirical Methods for Compound Splitting

Authors

  • Philipp Koehn
  • Kevin Knight
Abstract

Compounded words are a challenge for NLP applications such as machine translation (MT). We introduce methods to learn splitting rules from monolingual and parallel corpora. We evaluate them against a gold standard and measure their impact on performance of statistical MT systems. Results show accuracy of 99.1% and performance gains for MT of 0.039 BLEU on a German-English noun phrase translation task.
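The abstract does not spell out the splitting methods, so the following is only a minimal sketch of one way splitting decisions could be learned from a monolingual corpus. It assumes a frequency-based criterion: among all ways of cutting a word into known words, prefer the one whose parts have the highest geometric mean of corpus counts, and leave the word whole if it is itself more frequent. The function names, the `min_len` parameter, and the counts in `COUNTS` are hypothetical and chosen for illustration; linking elements (such as the German filler "s") and the parallel-corpus evidence mentioned in the abstract are not modeled.

```python
from math import prod


def candidate_splits(word, counts, min_len=3):
    """Return the unsplit word plus every cut of it into parts found in `counts`."""
    candidates = [[word]]  # the unsplit word is always a candidate

    def extend(start, parts):
        for end in range(start + min_len, len(word) + 1):
            piece = word[start:end]
            if piece not in counts:
                continue
            if end == len(word):
                if parts:  # skip the one-part case, already listed above
                    candidates.append(parts + [piece])
            else:
                extend(end, parts + [piece])

    extend(0, [])
    return candidates


def score(parts, counts):
    """Geometric mean of the corpus counts of the parts (0 for unseen parts)."""
    return prod(counts.get(p, 0) for p in parts) ** (1.0 / len(parts))


def best_split(word, counts, min_len=3):
    """Pick the highest-scoring candidate; ties keep the earlier candidate."""
    return max(candidate_splits(word, counts, min_len),
               key=lambda parts: score(parts, counts))


# Hypothetical counts, for illustration only.
COUNTS = {"hausboot": 2, "haus": 500, "boot": 80,
          "freitag": 2000, "frei": 900, "tag": 1000}

print(best_split("hausboot", COUNTS))
print(best_split("freitag", COUNTS))
```

With these made-up counts, the rare compound "hausboot" is split because its parts are far more frequent than the whole, while "freitag" stays intact because the whole word outscores "frei" plus "tag". A purely frequency-driven rule like this can still choose wrong splits, which is one reason a second source of evidence, such as a parallel corpus, is useful.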


Similar resources

Processing of Swedish Compounds for Phrase-Based Statistical Machine Translation

We investigated the effects of processing Swedish compounds for phrase-based SMT between Swedish and English. Compounds were split in a pre-processing step using an unsupervised empirical method. After translation into Swedish, compounds were merged, using a novel merging algorithm. We investigated two ways of handling compound parts, by marking them as compound parts or by normalizing them to ...


German Compounds in Factored Statistical Machine Translation

An empirical method for splitting German compounds is explored by varying it in a number of ways to investigate the consequences for factored statistical machine translation between English and German in both directions. Compound splitting is incorporated into translation in a preprocessing step, performed on training data and on German translation input. For translation into German, compounds ...


Assessment of Manning’s resistance coefficient in compound channels

In this paper twelve different empirical resistance coefficients expressed in terms of Manning's roughness are each used in seventeen known compositing methods. The data obtained from ten different cross-sections of the Sefidrood River, Iran, are used for the evaluation of the empirical formulas. The present case-study is selected from a reach with gravel bed topology. Then no remarkable bed f...


Influence of accurate compound noun splitting on bilingual vocabulary extraction

The influence of compound noun splitting on a German-Polish bilingual vocabulary extraction task is investigated. To accomplish this, several unsupervised methods for increasingly accurate compound noun splitting are introduced. Bilingual evidence from a parallel German-Polish corpus and co-occurrence counts from the web are used to disambiguate compound noun analyses directly. These collected ...


Investigating Different Methods of Closed Shell Pistachios Splitting and Effects of Freezing Prior to Drying on Shell Splitting Percentage

In this study, different methods for shell splitting and the effect of freezing prior to drying on shell splitting percentage of pistachio were investigated. A completely randomized design was used to investigate the effects of different freezing temperatures (0, -6, -12 and -18°C), different drying temperatures (80, 90 and 100°C) and different cultivars (Akbari and Kalehghouchi) on shell split...



Publication date: 2003