Parsing Deficiencies of the PC-KIMMO System
Abstract
In this paper, we discuss the possibilities and limitations of the PC-KIMMO system as a recognition device for compound formations in a language like Modern Greek, where compounding interacts with derivation, inflection and lexical phonology. We deal with the computational processing of nominal and verbal compounds and try to show certain limitations of the PC-KIMMO software with respect to the principles of compound formation. Compounds are parsed into their structural constituents, which are morphemes (i.e. stems and affixes) or words, depending on the case. Stress is also taken into consideration, since compounds display peculiar stress properties that differ from those of other words. In particular, we show that stress and syllabification, which are crucial for the analysis of such constructions, cannot be dealt with in a satisfactory way.

1 Morpho-phonological parsing with PC-KIMMO v.2

PC-KIMMO is a morphological parser based on the model of two-level morphology ([10], [11]). The model distinguishes between the word's morphotactics, which specify its morpheme constituents in the particular order in which they occur, and the word's morphophonemics, which account for the different orthographic forms of the morphemes. In its original conception ([10], [11]), the two-level model segments the word into its constituent parts and accounts for word-internal phonology and orthography by means of declarative two-level rules expressing correspondences that hold between a lexical and a surface form. These two-level rules apply in parallel and do not allow any intermediate levels of representation. Because of their relational character (i.e., they represent correspondences between surface and lexical forms), they are bi-directional.

Two-level rules are implemented as finite state transducers. A finite state transducer (FST) functions like a finite state automaton, but it operates on two input strings.
The label on the arc of an FST consists of a valid correspondence pair of symbols of the two input strings.

The lexicon incorporating the morphotactics consists of a list of morphemes. Each lexical entry is characterized by its grammatical category, its morpho-syntactic features, a gloss (additional information), and an alternation index specifying the list of alternative morphemes that may be combined with it. Lexical entries are generally grouped into sublexica, depending on their grammatical category. (1) lists the sublexica used for Greek:

(1) N (noun), V (verb), ADJ (adjective), DET (determiner), P (preposition), PR (pronoun), ADV (non-inflected adverb), CONJ (conjunction), IJ (interjection), PART (particle), CLITIC, ADI (inflected adverb), PRI (inflected pronoun), DAF (derivational suffix), PREFIX, SUFFIX, INFL (inflectional ending).

An example of a lexical entry from the sublexicon of nouns is given in (2):

(2) άνθρωπ [anθrop] «man»
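To make the transducer description in section 1 more concrete, the following Python sketch shows how a single two-level rule could be encoded as a state table whose arcs are labelled with lexical:surface pairs. It is only an illustration under stated assumptions: the toy rule (lexical N surfaces as m exactly when a p follows), the example strings, the names TABLE, FINAL, OTHER and accepts, the "@" wildcard, and the use of state 0 for failure are all hypothetical and are not taken from the paper's Greek rule set or from PC-KIMMO's file format.

```python
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]  # (lexical symbol, surface symbol)

# Hypothetical wildcard standing for "any pair not listed in this row".
# This toy does not check pairs against a set of feasible pairs.
OTHER: Pair = ("@", "@")

# Toy rule (an assumption for illustration): N:m <=> __ p:p
# State 0 means failure; it never appears as a row.
TABLE: Dict[int, Dict[Pair, int]] = {
    # State 1: neutral state.
    1: {("N", "m"): 2, ("N", "n"): 3, OTHER: 1},
    # State 2: just read N:m, so the next pair must be p:p.
    2: {("p", "p"): 1},
    # State 3: just read N:n, so the next pair must not be p:p.
    3: {("p", "p"): 0, ("N", "m"): 2, ("N", "n"): 3, OTHER: 1},
}
FINAL = {1, 3}  # state 2 is not final: a required p:p is still pending


def accepts(pairs: Iterable[Pair]) -> bool:
    """Return True if the sequence of lexical:surface pairs obeys the rule."""
    state = 1
    for pair in pairs:
        row = TABLE[state]
        # Use the explicit arc if there is one, else the wildcard arc,
        # else fail (state 0).
        state = row.get(pair, row.get(OTHER, 0))
        if state == 0:
            return False
    return state in FINAL


def check(lexical: str, surface: str) -> bool:
    """Pair the two strings symbol by symbol (equal length, no null symbols)."""
    if len(lexical) != len(surface):
        return False
    return accepts(zip(lexical, surface))


if __name__ == "__main__":
    print(check("kaNpat", "kampat"))  # N:m before p:p
    print(check("kaNpat", "kanpat"))  # N:n before p:p
    print(check("kaNtat", "kantat"))  # N:n elsewhere
    print(check("kaNtat", "kamtat"))  # N:m without a following p:p
```

Because the table only tests whether a sequence of pairs is permitted, the same table can serve both recognition (surface to lexical) and generation (lexical to surface), which reflects the bi-directional character of two-level rules described above. In a full system several such tables would be consulted in parallel on every pair.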
Publication date: 2002