Automatic Detection of Collocation

Authors

  • Jiangsheng Yu
  • Zhihui Jin
  • Zhenshan Wen
Abstract

Collocation is an important relation between words, with wide application in semantic parsing (e.g., word sense disambiguation), machine translation (e.g., automatic alignment of bilingual corpora), computational lexicons, and related areas. We first summarize the theory behind four methods for detecting collocations: the likelihood interval, the likelihood ratio test, the u test, and the χ² test. We then use them to extract collocations automatically from a large-scale corpus. Through experiments (some results are listed in the appendix), we explore and analyze the relationship between the statistical models. Directions for further research are discussed in the conclusion. The corpus is a half-year collection of People's Daily with word segmentation and POS tagging, containing at least 1,103,455 Chinese sentences.
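
As a rough illustration of two of the tests named in the abstract, the sketch below scores a single word pair with the χ² statistic and the likelihood ratio statistic computed from a 2x2 contingency table. It uses the standard textbook forms of these statistics, which may differ from the paper's own formulation; the function names and the example counts are hypothetical and are not taken from the paper or its corpus.

```python
import math


def contingency(c12, c1, c2, n):
    """Build the 2x2 table for a word pair (w1, w2).

    c12: count of the pair, c1: count of w1, c2: count of w2,
    n: total number of pairs in the corpus (all hypothetical inputs).
    """
    return (c12, c1 - c12, c2 - c12, n - c1 - c2 + c12)


def chi_square(c12, c1, c2, n):
    """Pearson chi-square statistic for a 2x2 table (1 degree of freedom)."""
    o11, o12, o21, o22 = contingency(c12, c1, c2, n)
    den = (o11 + o12) * (o11 + o21) * (o12 + o22) * (o21 + o22)
    if den == 0:
        return 0.0
    return n * (o11 * o22 - o12 * o21) ** 2 / den


def log_likelihood_ratio(c12, c1, c2, n):
    """Likelihood ratio statistic G^2 = 2 * sum(O * ln(O / E))."""
    o11, o12, o21, o22 = contingency(c12, c1, c2, n)
    rows = (o11 + o12, o21 + o22)
    cols = (o11 + o21, o12 + o22)
    observed = ((o11, o12), (o21, o22))
    total = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                e = rows[i] * cols[j] / n  # expected count under independence
                total += o * math.log(o / e)
    return 2.0 * total


# Critical value of the chi-square distribution with 1 degree of freedom
# at the 0.05 significance level.
CRITICAL_005 = 3.841

if __name__ == "__main__":
    # Made-up counts: the pair occurs 50 times, each word 100 times,
    # in a corpus of 1000 pairs.
    for name, stat in (("chi-square", chi_square), ("G^2", log_likelihood_ratio)):
        score = stat(50, 100, 100, 1000)
        print(name, round(score, 2), score > CRITICAL_005)
```

Both statistics are compared against the same critical value because each is asymptotically χ²-distributed with one degree of freedom under the hypothesis that the two words occur independently. A pair whose score exceeds the threshold is a collocation candidate, not a confirmed collocation.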

Similar resources

Towards advanced collocation error correction in Spanish learner corpora

Collocations in the sense of idiosyncratic binary lexical co-occurrences are one of the biggest challenges for any language learner. Even advanced learners make collocation mistakes in that they literally translate collocation elements from their native tongue, create new words as collocation elements, choose a wrong subcategorization for one of the elements, etc. Therefore, automatic collocati...

Writing assistants and automatic lexical error correction: word combinatorics

Genuine lexical writing assistants that attempt to detect lexical errors such as miscollocations are traditionally less common in Computer Assisted Language Learning than spell and grammar checkers. However, there is empirical evidence of the importance of capturing and correcting miscollocations in the writings of language learners, and therefore an increasing number of proposals deals with th...

Reducing Light Change Effects in Automatic Road Detection

Automatic road extraction from aerial images can be very helpful in traffic control and vehicle guidance systems. Most of the road detection approaches are based on image segmentation algorithms. Color-based segmentation is very sensitive to light changes and consequently the change of weather condition affects the recognition rate of road detection systems. In order to reduce the light change ...

Exploiting a learner corpus for the development of a CALL environment for learning Spanish collocations

This paper provides an insight into ongoing research focusing on the exploitation of data from learner corpus in order to enhance the performance of an automatic tool aimed at the correction of collocation errors of L2 Spanish speakers. The procedure adopted for collocation annotation is described together with the main difficulties involved in the annotation task, such as the problem of distin...

Publication date: 2003