Teaching tools for logic-based grammar development

Authors

  • Michael Moortgat
  • Richard Moot
  • Dick Oehrle
  • Willemijn Vermaat
Abstract

A well-known slogan in language technology is ‘parsing-as-deduction’: the syntactic and semantic analysis of a text takes the form of a mathematical proof. Developers of language technology (and students of computational linguistics) want to visualize these mathematical objects, and their dynamic unfolding, in a variety of formats. We discuss a language engineering environment for type-logical computational grammars. The kernel is a theorem prover, implemented in the logic-programming language Prolog. The kernel produces LaTeX source code for its internal computations. The front-end displays these in a number of user-defined typeset formats. We report on our work to make the kernel accessible over the web via dynamic PDF documents.

This paper discusses some uses of the dynamic possibilities offered by Sebastian Rahtz’ hyperref package in the context of a courseware project we have been engaged in. The project provides a grammar development environment for Type-Logical Grammar, one of the formalisms currently used in computational linguistics. Our paper is organized as follows. First, we offer the reader a glimpse of what type-logical grammars look like. In the next section, we discuss the TeX-based visualisation tools of the Grail workbench as it was originally developed for use on a Unix platform. Finally, we report on our current efforts to provide browser-based access to the Grail kernel via dynamic PDF documents.

1. Type-logical grammar

Type-logical grammar (TLG) is a logic-based computational formalism that grew out of the work of the mathematician Jim Lambek in the late fifties. The seminal paper [1] is still highly readable; it is available electronically for those who don’t have easy access to issues of the American Mathematical Monthly from the pre-TeX era. [3] gives an up-to-date survey of the field. The mathematically inclined TeX user will easily appreciate why it is such a pleasure to work with TLG.
As the name suggests, TLG has strong type-theoretic connections. One could think of it as a functional programming language with some special features to handle the peculiarities of natural (as opposed to programming) languages. In a functional language (say, Haskell), expressions are typed. There is some inventory of basic types (integers, booleans, ...); from types T, T′ one can form functional types T → T′. With these functional types, one can do two things. An expression/program of type T → T′ can be used to compute an expression of type T′ by applying it to an argument of the appropriate type T. Or a program of type T → T′ can be obtained by abstracting over a variable of type T in an expression of type T′. Below we give a simple example: the construction of a square function out of a built-in times function. We present this as a logical derivation: the beautiful insight of Curry allows us to freely switch perspective between types and logical formulas, and between type computations and logical derivations in a constructive logic (Positive Intuitionistic Logic).

    times : Int → (Int → Int)    x : Int
    ------------------------------------ (Elim →)
          (times x) : Int → Int              x : Int
          ------------------------------------------ (Elim →)
                      (times x x) : Int
                ----------------------------- (Intro →)
                 λx.(times x x) : Int → Int

How can we transfer these ideas to the field of natural language grammars? The basic types in this setting are for expressions one can think of as ‘complete’ in some intuitive sense: one could have a type np for names (‘Donald Knuth’, ‘the author of The Art of Computer Programming’, ...), a type n for common nouns (‘author’, ‘art’, ...), and a type s for sentences (‘Knuth wrote some books’, ‘TeX is necessary’, ...). Now, where a phrase-structure grammar would have to add a plethora of non-terminals to handle incomplete expressions, in TLG we use functional (implicational) types for these.
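The square derivation above can be mirrored in a few lines of code. The following is only an illustrative sketch in Python (the text itself mentions Haskell; the names `times` and `square` follow the derivation, everything else is an assumption made for the example). Application corresponds to the two (Elim →) steps, and the `lambda` corresponds to the (Intro →) step.

```python
from typing import Callable

# Curried multiplication, mirroring  times : Int → (Int → Int).
def times(x: int) -> Callable[[int], int]:
    return lambda y: x * y

# (Elim →) twice: apply times to x, then apply the result to x again.
# (Intro →): abstract over x to obtain a function of type Int → Int.
square: Callable[[int], int] = lambda x: times(x)(x)

print(square(7))
```

Note that the variable `x` is used twice inside the abstraction, which is exactly the kind of free reuse of resources that is unproblematic in a programming language.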
A determiner like ‘the’ is typed as a function from n expressions (like ‘author’) to np expressions; a verb phrase (like ‘is necessary’) as a function from np expressions into s expressions, and so on.

To adjust the type-logical approach to the natural language domain, we have to introduce two refinements. The syntax of our programming language example obeys the martial law of Polish prefix notation: functions are put before their arguments. Natural languages are not so disciplined: a determiner (in English) comes before the noun it combines with; a verb phrase follows its subject. Instead of one implication, TLG has two to capture these word-order distinctions: an expression of type T/T′ is prefixed to its T′-type argument; an expression of type T′\T is suffixed to it. An example is given below. (The product ◦ is the explicit structure-building operation that goes with use of the slashes. It imposes a tree structure on the derived sentence.)

                           like ⊢ (np\s)/np    TeX ⊢ np
                           ---------------------------- [/E]
    mathematicians ⊢ np         like ◦ TeX ⊢ np\s
    ------------------------------------------------ [\E]
           mathematicians ◦ (like ◦ TeX) ⊢ s

The second refinement has to do with the management of ‘programming resources’. In our Haskell-style example, one can use resources as many times as one wants (or not use them at all). You see an illustration in the last step of the square derivation, where two occurrences of x : Int are withdrawn simultaneously. In natural language, such a cavalier attitude towards occurrences would not be a good idea: a well-formed sentence is not likely to remain well-formed if you remove some words, or repeat some. (You will agree that ‘mathematicians like’ does not convey the message that mathematicians like mathematicians.) Our grammatical type-logic, in other words, insists that every resource is used exactly once. And in addition to resource-sensitivity, there may be certain structural manipulations that are allowable in one language as opposed to another.
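As a rough illustration of the two directional implications, the sketch below checks the derivation of ‘mathematicians like TeX’ with the two elimination rules only. It is written in Python for brevity and is not how the Grail kernel works; the class names `Atom`, `Over`, `Under` and the function `combine` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass(frozen=True)
class Atom:
    name: str

@dataclass(frozen=True)
class Over:          # result / arg : looks for its argument to the right
    result: "Type"
    arg: "Type"

@dataclass(frozen=True)
class Under:         # arg \ result : looks for its argument to the left
    arg: "Type"
    result: "Type"

Type = Union[Atom, Over, Under]

def combine(left: Type, right: Type) -> Optional[Type]:
    """Return the type of `left ◦ right`, or None if neither rule applies."""
    if isinstance(left, Over) and left.arg == right:      # rule [/E]
        return left.result
    if isinstance(right, Under) and right.arg == left:    # rule [\E]
        return right.result
    return None

np, s = Atom("np"), Atom("s")
lexicon = {
    "mathematicians": np,
    "like": Over(Under(np, s), np),   # (np\s)/np
    "TeX": np,
}

vp = combine(lexicon["like"], lexicon["TeX"])        # like ◦ TeX ⊢ np\s
sentence = combine(lexicon["mathematicians"], vp)    # mathematicians ◦ (like ◦ TeX) ⊢ s
print(sentence)
```

Because each word is consumed exactly once by a call to `combine`, the incomplete string ‘mathematicians like’ does not reduce to `s`: `combine(np, lexicon["like"])` returns `None`.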
To control such structural manipulations, there is a module of non-logical axioms (so-called structural postulates) in addition to the logical rules for the slashes. The derivation in Figure 1 contains such a structural move: the inference labeled P2, which uses associativity to rebracket the antecedent tree. At this point, you are perfectly ready to write your first type-logical grammar! Assign types to the words in your lexicon, and decide whether any extra structural reasoning is required. The type-inference machine of TLG does the rest.

    the ⊢ np/n    book ⊢ n    that ⊢ (n\n)/(s/np)    knuth ⊢ np    wrote ⊢ (np\s)/np    [p1 ⊢ np]¹

    wrote ◦ p1 ⊢ np\s                               [/E]
    knuth ◦ (wrote ◦ p1) ⊢ s                        [\E]
    (knuth ◦ wrote) ◦ p1 ⊢ s                        [P2]
    knuth ◦ wrote ⊢ s/np                            [/I]
    that ◦ (knuth ◦ wrote) ⊢ n\n                    [/E]
    book ◦ (that ◦ (knuth ◦ wrote)) ⊢ n             [\E]
    the ◦ (book ◦ (that ◦ (knuth ◦ wrote))) ⊢ np    [/E]

Figure 1: Natural deduction derivation: logical and structural rules.

2. The Grail theorem prover

The Grail system, developed by the second author, is a general grammar development environment for designing and prototyping type-logical grammars. We refer the reader to [4] for a short description of the system, which is available under the GNU General Public License agreement from ftp://ftp.let.uu.nl/pub/users/moot. The original Grail implementation presupposes a Unix environment. It uses the following software components:

  • SICStus Prolog: the programming language for the kernel;
  • Tcl/Tk for the graphical user interface;
  • a standard teTeX environment for the visualization/export of derivations.
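The structural move labeled P2 in Figure 1 can be pictured as a rewrite on antecedent trees. The sketch below is a simplified illustration in Python: trees are encoded as nested pairs, and the function `p2` only rebrackets at the root. Both the encoding and the function names are assumptions made for this example and are not taken from Grail.

```python
# Antecedent trees as nested pairs: a leaf is a string, (l, r) stands for l ◦ r.
def p2(tree):
    """Rebracket  a ◦ (b ◦ c)  into  (a ◦ b) ◦ c; return None if the shape does not match."""
    if isinstance(tree, tuple) and isinstance(tree[1], tuple):
        a, (b, c) = tree
        return ((a, b), c)
    return None

def show(tree):
    """Render a tree with explicit brackets and the product symbol."""
    if isinstance(tree, tuple):
        left, right = tree
        return f"({show(left)} ◦ {show(right)})"
    return tree

before = ("knuth", ("wrote", "p1"))
after = p2(before)
print(show(before), "=>", show(after))
```

After the rebracketing, the hypothesis `p1` sits at the right edge of the tree, which is what allows the subsequent [/I] step in Figure 1 to withdraw it.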
In a Grail session, the user can design a grammar fragment, which in the TLG setting comes down to the following:

  • assign formulas (and meaning programs) to words in the lexicon, or edit formulas already in the lexicon;
  • add or modify structural rewrite rules;
  • run the theorem prover on sample expressions to see which of them are grammatical in the specified grammar fragment, by trying to find a derivation for them.

The theorem prover can operate either automatically or interactively. In interactive mode, the user decides which of several possible subproofs to try first, and can abandon subproofs that the user knows cannot succeed, even though the theorem prover might take a very long time to discover that. Interactive mode is also useful when the user is only interested in some of the proofs. The interactive debugger is based on proof net technology, a proof-theoretic framework specially designed for resource-sensitive deductive systems. Figure 2 shows a proof net for the derivation of the sentence ‘Knuth surpassed himself’. Lexical formulas are unfolded up to atomic literals. Literals are signed with an input or output polarity. A net is well-formed if there is a matching of literals with opposite polarities, and if some extra graph-theoretic conditions are met. The interested reader is referred to [5] for details.

Figure 2: The proof net debugger window

When successful derivations have been found, Grail stores these proof objects in an internal representation format. An example is given in Figure 3. The internal format is not for human consumption, but it contains all the necessary information for the conversion of the proof objects to natural deductions in the form of LaTeX output. The internal representation of derivations may look forbidding; yet, the structure is basically simple.
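The unfolding of lexical formulas into signed literals, mentioned above for proof nets, can be sketched as follows. This Python fragment only checks a necessary condition for a matching to exist (every atom occurs equally often with input and output polarity); it ignores the extra graph-theoretic conditions entirely. The tuple encoding of formulas and the polarity convention used here (the result of an implication keeps the polarity of the whole formula, the argument flips it) are assumptions of the sketch, not a description of the Grail debugger.

```python
from collections import Counter

# Formulas: an atom is a string; ("/", A, B) is A/B and ("\\", B, A) is B\A.
# In both cases A is the result and B the argument.
def literals(formula, polarity):
    """Unfold a formula into signed atoms; polarity is +1 (output) or -1 (input)."""
    if isinstance(formula, str):
        return [(formula, polarity)]
    op, x, y = formula
    result, arg = (x, y) if op == "/" else (y, x)
    # The result keeps the polarity of the whole formula; the argument flips.
    return literals(result, polarity) + literals(arg, -polarity)

def balanced(lexical_formulas, goal):
    """Necessary (not sufficient) condition for a matching of literals."""
    counts = Counter()
    for formula in lexical_formulas:
        for atom, pol in literals(formula, -1):   # lexical formulas are inputs
            counts[atom] += pol
    for atom, pol in literals(goal, +1):          # the goal formula is the output
        counts[atom] += pol
    return all(v == 0 for v in counts.values())

tv = ("/", ("\\", "np", "s"), "np")      # (np\s)/np
print(balanced(["np", tv, "np"], "s"))   # mathematicians like TeX
print(balanced(["np", tv], "s"))         # mathematicians like
```

The count check succeeds for the three-word sentence and fails for ‘mathematicians like’, where one output `np` literal is left without a partner.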
A proof object consists of a conclusion together with a list of proof objects which validate this conclusion. LaTeX output is produced by recursively traversing this structure. A number of parameters guide the production of the LaTeX proofs. The output parameters include, for example, a choice to have proofs presented in the tree-like Prawitz output format, as shown in Figure 4, or in the list-like Fitch output format, as shown in Figure 5. The Fitch list format is handy when the user chooses to include the meaning assembly in a derivation: tree format quickly exceeds the printed page format in these cases.

An extract of the LaTeX source for Figure 4 is shown in Figure 6. The Prawitz derivations are typeset using the proof.sty package of [8]. The \infer command from this package takes an optional rule label, a conclusion, and one or more (&-separated) premises as arguments. The subscripts and superscripts on rule labels and connectives remain empty in this example. They are for extra control information, which the user can enable or disable.

The reader will have noticed that the core notion of ‘proof’ for type-logical grammatical derivations is inherently dynamic: a derivation is a sequence of inference steps, leading from axioms (lexical assumptions) to the desired conclusion. This naturally suggests a dynamic display format, with a stepwise unfolding of the proof object. The tools we use for dynamic display were developed by Bernhard Fisseni, as part of a student project in our computational linguistics program. The basis is an expanded version of \infer from proof.sty, taking advantage of the \stepwise family of commands from the texpower package of [2]. The kernel computes the sequencing order of derivational steps from the internal proof object. The choice for bottom-up or top-down unfolding is left to the user. In the first case, one assembles the desired end result starting from lexical assumptions; the second option decomposes the end result into its atomic (lexical) parts.
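The recursive traversal described above can be sketched in a few lines. The fragment below is an illustration in Python, not the Prolog code of the Grail kernel: the triple `(conclusion, rule_label, premises)` is a hypothetical stand-in for the internal proof object, and the function `to_infer` is a name invented for this example. It emits \infer commands in the argument order described in the text (optional label, conclusion, &-separated premises).

```python
# Hypothetical proof object: (conclusion, rule_label, premises).
# Lexical leaves have an empty premise list and are printed as plain formulas.
def to_infer(proof):
    conclusion, label, premises = proof
    if not premises:
        return conclusion
    body = " & ".join(to_infer(p) for p in premises)
    opt = f"[{label}]" if label else ""
    return f"\\infer{opt}{{{conclusion}}}{{{body}}}"

proof = (
    r"m \circ (l \circ t) \vdash s", r"\backslash E",
    [
        (r"m \vdash np", "", []),
        (r"l \circ t \vdash np \backslash s", "/E",
         [(r"l \vdash (np \backslash s)/np", "", []),
          (r"t \vdash np", "", [])]),
    ],
)
print(to_infer(proof))
```

The printed string is meant to be placed in math mode in a document that loads proof.sty. A Fitch-style list output would traverse the same structure but emit one numbered line per node instead of nesting the premises.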
    N: 1 ; Mean: $\iota$(^K.(write(knuth,K) & book(K))) ;
    rule(dre([]),(the *[](book *[](that *[](knuth *[]wrote)))),np,B(D(^E.H(E)(G))(C)),
     [rule(lex,the,(np /[] n),B,[]),
      rule(dle([]),(book *[] (that *[] (knuth *[] wrote))),n,D(^E.H(E)(G))(C),
       [rule(lex,book,n,C,[]),
        rule(dre([]),(that *[] (knuth *[] wrote)),(n \[] n),D(^E.H(E)(G)),
         [rule(lex,that,((n \[] n) /[] (s /[] np)),D,[]),
          rule(dri([],1),(knuth *[] wrote),(s /[] np),^E.H(E)(G),
           [rule(P2,((knuth *[] wrote) *[] E),s,H(E)(G),
             [rule(dle([]),(knuth *[] (wrote *[] E)),s,H(E)(G),
               [rule(lex,knuth,np,G,[]),
                rule(dre([]),(wrote *[] E),(np \[] s),H(E),
                 [rule(lex,wrote,((np \[] s) /[] np),H,[]),
                  rule(hyp(1),E,np,E,[])])])])])])])]),
    Con: [],Subst: [$\iota$,book,3-^I.^J.^K.(I(K) & J(K)),knuth,write], NV 8

Figure 3: Internal representation for the derivation of Figure 1

For an illustration, we refer the reader ...




Publication date: 2002