Merging Data Resources for Inflectional and Derivational Morphology in Czech
Abstract
The paper deals with merging two complementary resources of morphological data previously existing for Czech, namely the inflectional dictionary MorfFlex CZ and the recently developed lexical network DeriNet. The MorfFlex CZ dictionary has been used by a morphological analyzer capable of analyzing/generating several million Czech word forms according to the rules of Czech inflection. The DeriNet network contains several hundred thousand Czech lemmas interconnected with links corresponding to derivational relations (relations between base words and words derived from them). After summarizing basic characteristics of both resources, the process of merging is described, focusing on both rather technical aspects (growth of the data, measuring the quality of newly added derivational relations) and linguistic issues (treating lexical homonymy and vowel/consonant alternations). The resulting resource contains 970 thousand lemmas connected with 715 thousand derivational relations and is publicly available on the web under the CC-BY-NC-SA license. The data were incorporated in the MorphoDiTa library version 2.0 (which provides morphological analysis, generation, tagging and lemmatization for Czech) and can be browsed and searched by two web tools (DeriNet Viewer and DeriNet Search tool).
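The network described above (lemmas linked from base words to the words derived from them) can be pictured with a small sketch. The Python below is illustrative only and is not the DeriNet data format or API: the class name `DerivationalNetwork`, its methods, and the restriction that each derived lemma has exactly one base are assumptions made for this example, and the four Czech lemmas are toy data chosen by hand.

```python
from collections import defaultdict


class DerivationalNetwork:
    """Toy lexical network: lemmas as nodes, base -> derived links as edges.

    Assumption for this sketch: a derived lemma has at most one base,
    so every derivational family forms a rooted tree.
    """

    def __init__(self):
        self.lemmas = set()
        self.base_of = {}                      # derived lemma -> its base lemma
        self.derived_from = defaultdict(list)  # base lemma -> derived lemmas

    def add_lemma(self, lemma):
        """Register a lemma that has no derivational link (yet)."""
        self.lemmas.add(lemma)

    def add_relation(self, base, derived):
        """Link a derived lemma to its base, keeping the structure a tree."""
        if derived in self.base_of:
            raise ValueError(f"{derived!r} already has a base")
        if self.root(base) == derived:
            raise ValueError("relation would create a cycle")
        self.lemmas.update((base, derived))
        self.base_of[derived] = base
        self.derived_from[base].append(derived)

    def root(self, lemma):
        """Follow base links upward until an underived lemma is reached."""
        while lemma in self.base_of:
            lemma = self.base_of[lemma]
        return lemma

    def family(self, lemma):
        """Return all lemmas in the same derivational tree, sorted."""
        stack, members = [self.root(lemma)], []
        while stack:
            current = stack.pop()
            members.append(current)
            stack.extend(self.derived_from.get(current, []))
        return sorted(members)


# Hand-picked toy data: "učit" (to teach) and words derived from it.
net = DerivationalNetwork()
net.add_relation("učit", "učitel")
net.add_relation("učitel", "učitelka")
net.add_relation("učitel", "učitelský")
net.add_lemma("pes")  # a lemma with no derivational links in this toy set

print(net.root("učitelka"))
print(net.family("učitelský"))
print(len(net.lemmas), "lemmas,", len(net.base_of), "relations")
```

Counting `net.lemmas` and `net.base_of` mirrors, at toy scale, the two figures the abstract reports for the merged resource (number of lemmas and number of derivational relations).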
Similar resources
A Procedure for Word Derivational Processes Concerning Lexicon Extension in Highly Inflected Languages
The aim of this paper is to describe an efficient tool (I PAR) for a supervised and semi-automatic extension of a lexicon or morphological database and its easy updating. We will present the underlying algorithms and their implementation that are general enough to capture the main word-forming processes (both inflectional and derivational). They are designed for languages with a rich inflection...
Hindi Derivational Morphological Analyzer
Hindi is an Indian language which is relatively rich in morphology. A few morphological analyzers of this language have been developed. However, they give only inflectional analysis of the language. In this paper, we present our Hindi derivational morphological analyzer. Our algorithm upgrades an existing inflectional analyzer to a derivational analyzer and primarily achieves two goals. First, ...
Relations between Inflectional and Derivation Patterns
One of the main goals of this paper is to describe a formal procedure linking inflectional and derivational processes in Czech and to indicate that they can be applied to other Slavonic languages if appropriate tools and resources are used. The tools developed at the NLP Laboratory FI MU have been used, particularly the morphological analyser ajka and the program I par for processing and mai...
Derivational Relations in Czech WordNet
In the paper we describe enriching Czech WordNet with the derivational relations that in highly inflectional languages like Czech form typical derivational nests (or subnets). Derivational relations are mostly of semantic nature and their regularity in Czech allows us to add them to the WordNet almost automatically. For this purpose we have used the derivational version of morphological analyze...
Building Czech Wordnet
This paper describes the process of building the Czech wordnet. We enumerate the resources and tools used for this purpose and characterize the results obtained so far. There are some problems with Czech as a synthetic language, with its rich inflectional morphology and word derivation. They are mentioned below and some solutions are suggested. The necessary resources for building Czech w...
Publication date: 2016