Predicting Inflectional Paradigms and Lemmata of Unknown Words for Semi-automatic Expansion of Morphological Lexicons

Authors

Nikola Ljubesic

Miquel Esplà-Gomis

Filip Klubicka

Nives Mikelic Preradovic

Abstract

In this paper we describe a semi-automated approach to extending morphological lexicons: we define the prediction of the correct inflectional paradigm and lemma for an unknown word as a supervised ranking task trained on an already existing lexicon. While most ranking approaches rely only on heuristics based on a single information source, our predictor uses hundreds of features calculated on the candidate stem, corpus evidence and statistics calculated from the existing lexicon. Using Croatian as an example, we show that our approach significantly outperforms a heuristic-based baseline, yielding the correct candidate in the first position in 77% of cases and within the first five positions in 95% of cases.
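
The ranking setup described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the two toy paradigms, the paradigm counts, the three features, and the hand-set weights, which stand in for a model that would normally be trained on the existing lexicon.

```python
from dataclasses import dataclass

# Hypothetical toy paradigms: name -> (lemma suffix, inflectional suffixes).
# Heavily simplified; not a real Croatian paradigm inventory.
PARADIGMS = {
    "noun_fem_a": ("a", {"a", "e", "i", "u", "om", "ama"}),
    "noun_masc_0": ("", {"", "a", "u", "om", "e", "i", "ima"}),
}

# Hypothetical lexicon statistic: number of known lemmas per paradigm.
PARADIGM_COUNTS = {"noun_fem_a": 300, "noun_masc_0": 500}

# Hand-set weights standing in for a trained ranking model (assumption).
WEIGHTS = {"attested_ratio": 3.0, "paradigm_prior": 1.0, "stem_ends_in_vowel": -0.5}


@dataclass(frozen=True)
class Candidate:
    stem: str
    lemma: str
    paradigm: str


def generate_candidates(word):
    """Strip every paradigm suffix that matches the end of the word."""
    candidates = set()
    for name, (lemma_suffix, suffixes) in PARADIGMS.items():
        for suffix in suffixes:
            if word.endswith(suffix) and len(word) > len(suffix):
                stem = word[: len(word) - len(suffix)]
                candidates.add(Candidate(stem, stem + lemma_suffix, name))
    return candidates


def features(candidate, corpus_forms):
    """One feature each from corpus evidence, the lexicon, and the stem."""
    _, suffixes = PARADIGMS[candidate.paradigm]
    attested = sum((candidate.stem + s) in corpus_forms for s in suffixes)
    total = sum(PARADIGM_COUNTS.values())
    return {
        "attested_ratio": attested / len(suffixes),
        "paradigm_prior": PARADIGM_COUNTS[candidate.paradigm] / total,
        "stem_ends_in_vowel": float(candidate.stem[-1] in "aeiou"),
    }


def rank(word, corpus_forms):
    """Return candidates sorted by descending weighted feature score."""
    scored = []
    for cand in generate_candidates(word):
        feats = features(cand, corpus_forms)
        score = sum(WEIGHTS[name] * value for name, value in feats.items())
        scored.append((score, cand))
    # Ties are broken alphabetically so the ordering is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1].lemma, pair[1].paradigm))
    return [cand for _, cand in scored]


# Toy corpus of attested word forms (assumption).
corpus = {"knjiga", "knjige", "knjigu", "knjigom", "knjigama"}
ranking = rank("knjigom", corpus)
print(ranking[0].lemma, ranking[0].paradigm)
```

In a semi-automatic workflow of the kind the abstract describes, a lexicographer would presumably inspect only the top few candidates from such a ranking rather than all of them.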

Similar resources

Deriving Morphological Analyzers from Example Inflections

This paper presents a semi-automatic method to derive morphological analyzers from a limited number of example inflections suitable for languages with alphabetic writing systems. The system we present learns the inflectional behavior of morphological paradigms from examples and converts the learned paradigms into a finite-state transducer that is able to map inflected forms of previously unseen...

Automatic Lexical Acquisition for German Based on Morphological Paradigms Diploma Thesis Proposal

The general aim of my diploma thesis is to develop a (semi-)automatic method for the acquisition of a German inflectional lexicon from raw texts. In particular, I want to explore whether inflectional stems can be deduced from word-form occurrences that fit into known morphological paradigm classes.

An Approach to Lexical Development for Inflectional Languages

We describe a method for the semi-automatic development of morphological lexicons. The method aims at using minimal pre-existing resources and only relies upon the existence of a raw text corpus and a database of inflectional classes. No lexicon or list of base forms is assumed. The method is based on a contrastive approach, which generates hypothetical entries based on evidence drawn from a co...

The 300k LIMSI German broadcast news transcription system

This paper describes improvements to the existing LIMSI German broadcast news transcription system, especially its extension from a 65k vocabulary to 300k words. Automatic speech recognition for German is more problematic than for a language such as English in that the inflectional morphology of German and its highly generative process of compounding lead to many more out of vocabulary words fo...

Morpho-syntactic Lexicon Generation Using Graph-based Semi-supervised Learning

Morpho-syntactic lexicons provide information about the morphological and syntactic roles of words in a language. Such lexicons are not available for all languages and even when available, their coverage can be limited. We present a graph-based semi-supervised learning method that uses the morphological, syntactic and semantic relations between words to automatically construct wide coverage lex...

Publication date: 2015
