The Extended Lexicon: Language Processing as Lexical Description

Author

  • Roger Evans
Abstract

In this paper we introduce an approach to lexical description which is sufficiently powerful to support language processing tasks such as part-of-speech tagging or sentence recognition, traditionally considered the province of external algorithmic components. We show how this approach can be implemented in the lexical description language, DATR, and provide examples of modelling extended lexical phenomena. We argue that applying a modelling approach originally designed for lexicons to a wider range of language phenomena brings a new perspective to the relationship between theory-based and empirically-based approaches to language processing.

1 The Extended Lexicon

A lexicon is essentially a structured description of a set of lexical entries. One of the first tasks when developing a lexicon is to decide what the lexical entries are. This task has two dimensions: what kind of linguistic object does a lexical entry describe, and what does it say about it. So for example, one might decide to produce a lexicon which describes individual word instances, and provides the orthographic form and part-of-speech tag for each form. It is the first of these dimensions that is most relevant to the idea of the Extended Lexicon.

Conventionally, there are two main candidates for the type of linguistic object described by a lexicon: word forms (such as sings, singing, sang¹), corresponding to actual words in a text and lexemes (such as SING, WALK, MAN), describing abstract words, from which word forms are somehow derived. Choosing between these two candidates might be a matter of theoretical disposition, or a practical consideration of how the lexicon is populated or used.

¹ Typographical conventions for object types: ABSTRACT, LEXEME, wordform, instance, code.

Figure 1: A simple inheritance-based lexicon
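The relationship between lexemes and the word forms derived from them can be illustrated with a minimal sketch. This is not DATR and not the author's implementation; it is a plain Python approximation of default inheritance, and the class name `Node` and the feature names `pos`, `root`, and `form` are hypothetical labels chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A lexicon node (hypothetical): abstract category, lexeme, or word form."""
    name: str
    parent: Optional["Node"] = None
    features: dict = field(default_factory=dict)

    def lookup(self, feature: str):
        """Return the nearest value for `feature`, searching up the parents."""
        node = self
        while node is not None:
            if feature in node.features:
                return node.features[feature]
            node = node.parent
        raise KeyError(feature)


# Abstract node, then a lexeme, then word forms (names follow the text's examples).
VERB = Node("VERB", features={"pos": "verb"})
SING = Node("SING", parent=VERB, features={"root": "sing"})
sings = Node("sings", parent=SING, features={"form": "sings"})
sang = Node("sang", parent=SING, features={"form": "sang"})

print(sings.lookup("pos"))   # inherited from VERB
print(sang.lookup("root"))   # inherited from SING
```

Each word form states only what is specific to it and inherits the rest from its lexeme, which in turn inherits from the abstract node above it.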
In the Extended Lexicon, we introduce a third kind of linguistic object, called word instances (or just instances), consisting of word forms as they occur in strings (sequences of words, typically sentences). For example, a string such as ‘the cats sat on the mat’ contains two distinct instances of the word ‘the’. The string ‘the cats slept’ contains further (distinct) instances of ‘the’ and ‘cats’. However, the instances in a repetition of ‘the cats sat on the mat’ are the same as those in the original (because instances are defined relative to strings, that is, string types not string tokens). So in an extended lexicon, the lexical entries are word instances, and the lexicon itself is a structured description of a set of word instances.

In order to explore this notion in more detail, it is helpful to introduce a more specific notion of a ‘structured description’. We shall use an inheritance-based lexicon, in which there are internal abstract ‘nodes’ representing information that is shared by several lexical entries and inherited by them. Figure 1 shows the structure of a simple inheritance-based lexicon with some abstract high-level structure (CATEGORY, VERB, NOUN), then a layer of lexemes (WALK, TALK, HOUSE, BANK), and below that a layer of word forms (walks, walking,


Similar resources

First Language Activation during Second Language Lexical Processing in a Sentential Context

Lexicalization-patterns, the way words are mapped onto concepts, differ from one language to another. This study investigated the influence of first language (L1) lexicalization patterns on the processing of second language (L2) words in sentential contexts by both less proficient and more proficient Persian learners of English. The focus was on cases where two different senses of a polys...


Level of Grammatical Proficiency and Acquisition of Functional Projections: The case of Iranian learners of English language

Unlike Lexical Projections, Functional Projections (Extended Projections) are more ‘abstract’ in nature. Therefore, Functional Projections seem to be acquired later than Lexical Projections by L2 learners. The present study investigates Iranian L2 learners’ acquisition of English Extended Projections taking into account their level of grammatical proficiency. Specifically, the aim is ...


Lexical Acquisition

This paper describes several aspects of the process of lexical acquisition for one of the most comprehensive applications in natural language processing: knowledge-based machine translation. Specifically, the paper concentrates on those components of the lexicon that centrally relate to the task of processing meaning. The usability of the comprehensive ontological semantic lexicon exceeds NLP an...


Valency Lexicon of Czech Verbs: Towards Formal Description of Valency and Its Modeling in an Electronic Language Resource

Valency refers to the capacity of a verb (or a word belonging to another part of speech) to take a specific number and type of syntactically dependent language units. Valency information is thus related to particular lexemes, and as such it is necessary to describe valency characteristics for separate lexemes in the form of lexicon entries. A valency lexicon is indispensable for any complex Natura...


A Stylistic Analysis of Lexicon in Ray Bradbury’s The Martian Chronicles

Ray Bradbury’s The Martian Chronicles is a futuristic, science fiction novel that chronicles the colonization of Mars by humans, projecting the United States’ colonial and immigrant past on to a symbolic future. Bradbury’s use of language is mostly picturesque and sensory. The present paper applies a text-oriented analysis of stylistic elements that construct meaning in the text and evoke the n...




Publication date: 2013