Combining Different Features of Idiomaticity for the Automatic Classification of Noun+Verb Expressions in Basque

Authors

  • Antton Gurrutxaga
  • Iñaki Alegria

Abstract

We present an experimental study of how different features help to measure the idiomaticity of noun+verb (NV) expressions in Basque. After testing several techniques for quantifying the four basic properties of multiword expressions (MWEs), namely institutionalization, semantic non-compositionality, morphosyntactic fixedness and lexical fixedness, we test different combinations of these features for classifying expressions into idioms and collocations, using machine learning (ML) and feature selection. The results show the major role of distributional similarity, which measures compositionality, in the extraction and classification of MWEs, especially, as expected, in the case of idioms. Even though cooccurrence and some aspects of morphosyntactic flexibility contribute to this task to a more limited extent, the ML experiments benefit from these sources of knowledge, improving on the results obtained using distributional similarity features alone.
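
The abstract relies on two kinds of feature: distributional similarity as a measure of compositionality, and cooccurrence as a measure of institutionalization. The sketch below shows one common way to compute each; it is not the authors' implementation. It assumes that context-count vectors for the NV expression and for its noun and verb have already been collected from a corpus, and it uses cosine similarity and pointwise mutual information (PMI) as stand-in measures. The function names and all counts are hypothetical, invented for illustration.

```python
import math
from collections import Counter


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse context-count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def compositionality(expr_ctx: Counter, noun_ctx: Counter, verb_ctx: Counter) -> float:
    """Average similarity between the NV expression's contexts and each component's.

    Lower scores suggest a less compositional, i.e. more idiomatic, expression.
    """
    return (cosine(expr_ctx, noun_ctx) + cosine(expr_ctx, verb_ctx)) / 2


def pmi(f_nv: int, f_n: int, f_v: int, n: int) -> float:
    """Pointwise mutual information of a noun+verb pair (a cooccurrence measure).

    f_nv: pair frequency, f_n / f_v: component frequencies, n: corpus size.
    Undefined (raises ValueError) when f_nv is 0.
    """
    return math.log2((f_nv * n) / (f_n * f_v))


# Hypothetical toy context counts, not data from the paper.
noun_ctx = Counter({"ctx_a": 2, "ctx_b": 1})
verb_ctx = Counter({"ctx_b": 2, "ctx_c": 1})
literal_ctx = Counter({"ctx_a": 1, "ctx_b": 2, "ctx_c": 1})  # shares its parts' contexts
idiom_ctx = Counter({"ctx_x": 3, "ctx_b": 1})  # mostly contexts of its own

if __name__ == "__main__":
    for name, ctx in [("literal", literal_ctx), ("idiom", idiom_ctx)]:
        print(name, round(compositionality(ctx, noun_ctx, verb_ctx), 3))
    print("PMI", round(pmi(10, 100, 100, 10_000), 3))
```

In a setup like the one described, scores of this kind would sit alongside morphosyntactic and lexical fixedness features in the vector passed to a classifier; the abstract does not specify how the context vectors are built or which classifier is used.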

Similar resources

Verb Noun Construction MWE Token Classification

We address the problem of classifying multiword expression tokens in running text. We focus our study on Verb-Noun Constructions (VNC) that vary in their idiomaticity depending on context. VNC tokens are classified as either idiomatic or literal. We present a supervised learning approach to the problem. We experiment with different features. Our approach yields the best results to date on MWE c...

Unsupervised Classification of Verb Noun Multi-Word Expression Tokens

We address the problem of classifying multiword expression tokens in running text. We focus our study on Verb-Noun Constructions (VNC) that vary in their idiomaticity depending on context. VNC tokens are classified as either idiomatic or literal. Our approach hinges upon the assumption that a literal VNC will have more in common with its component words than an idiomatic one. Commonality is mea...

Automatic Extraction of NV Expressions in Basque: Basic Issues on Cooccurrence Techniques

Taking as a starting point the development of cooccurrence techniques for several languages, we focus on the aspects that should be considered in an NV extraction task for Basque. In Basque, NV expressions are considered to be those combinations in which a noun, inflected or not, co-occurs with a verb, as in erabakia hartu (‘to make a decision’), kontuan hartu (‘to take into account’) and buruz jak...

Developing a Semantic Similarity Judgment Test for Persian Action Verbs and Non-action Nouns in Patients With Brain Injury and Determining its Content Validity

Objective: Evidence from brain trauma suggests that the two grammatical categories of noun and verb are processed in different regions of the brain due to differences in the complexity of grammatical and semantic information processing. Studies have shown that verbs belonging to different semantic categories lead to neural activity in different areas of the brain, and action verb processing is r...

Automatic classification of normal and abnormal cardiac sounds by combining features based on wavelet transform and cepstral coefficients extracted from PCG signals (Research Article)

Cardiac sounds are produced by the mechanical activities of the heart and provide useful information about the function of the heart valves. Due to the transient and unstable nature of the heart's sound and the limitation of the human hearing system, it is difficult to categorize heart sound signals based on what is heard from a stethoscope. Therefore, providing an automated algorithm for prima...

Publication date: 2013