Limits to pattern-matching, finite-state techniques on lexical analysis for multiword expressions: the case for compound adverbs in Spanish
Abstract
Pattern-matching finite-state techniques applied to the lexical analysis of multiword expressions have received a significant boost from the possibility of intersecting lexical information encoded in lexicon-grammar matrices and reference graphs. However, these methods show important limitations in real-life applications. This paper aims to assess and describe the main drawbacks of this technique when applied to Spanish compound adverbs.
Similar resources
A Lexical Interface for Finite-State Syntax
This document describes the lexical interface for finite-state syntax as it is currently implemented and used for the development of the French constraint grammar. The system includes finite-state transducers for multiword expressions, for capitalised, misspelt or unknown words and for accent recovery. It also encodes general multiword expressions such as dates or idioms. The tokeniser includes a n...
Spanish Adverbial Frozen Expressions
This paper presents an electronic dictionary of Spanish adverbial frozen expressions. It focuses on their formal description in view of natural language processing and presents an experiment on the automatic application of this data to real texts using finite-state techniques. The paper makes an assessment of the advantages and limitations of this method for the identification of these multiwor...
A Non-deterministic Tokeniser for Finite-State Parsing
This paper describes a non-deterministic tokeniser implemented and used for the development of a French finite-state grammar. The tokeniser includes a finite-state automaton for simple tokens and a lexical transducer that encodes a wide variety of multiword expressions, associated with multiple lexical descriptions when required.
Managing Multiword Expressions in a Lexicon-Based Sentiment Analysis System for Spanish
This paper describes our approach to managing multiword expressions in Sentitext, a linguistically-motivated, lexicon-based Sentiment Analysis (SA) system for Spanish whose performance is largely determined by its coverage of MWEs. We defend the view that multiword constructions play a fundamental role in lexical Sentiment Analysis, in at least three ways. First, a significant proportion convey...
Compound Temporal Adverbs in Portuguese and in Spanish
This paper reports on ongoing research on temporal adverbs and deals with the problem of processing a family of Portuguese and Spanish compound temporal adverbs, in a contrastive approach, aiming at building finite-state transducers to translate them from one language into the other. Because of the large number of combinations involved and their complexity, it is not easy to list them in ful...
Publication date: 2007