Unparsing Expressions with Prefix and Postfix Operators
Abstract
and concrete syntax for infix operators

Type rator represents a binary infix operator, which has a text representation, a precedence, and an associativity:

    ⟨infix⟩≡
    type precedence = int
    datatype associativity = LEFT | RIGHT | NONASSOC
    type rator = string * precedence * associativity

This ML code uses simple integers (int) to represent precedence, an enumeration to represent associativity, and a triple to represent an operator. (In the context of an ML type definition, a * does not represent multiplication; it connects elements of a tuple.) The more general unparser, presented below, shows how to use an arbitrary type, not just string, as an operator's concrete representation.

Precedence and associativity determine how infix expressions are parsed into trees, or equivalently, how they are parenthesized. For example, if operator ⊗ has higher precedence than operator ⊕, then x ⊕ y ⊗ z = x ⊕ (y ⊗ z) and x ⊗ y ⊕ z = (x ⊗ y) ⊕ z. When two operators have the same precedence, associativity is used to disambiguate. If ⊕ is left-associative, x ⊕ y ⊕ z = (x ⊕ y) ⊕ z; if it is right-associative, x ⊕ y ⊕ z = x ⊕ (y ⊕ z). Some languages have non-associative operators; if ⊕ is non-associative, then (x ⊕ y) ⊕ z ≠ x ⊕ y ⊕ z ≠ x ⊕ (y ⊕ z), and x ⊕ y ⊕ z may be illegal. The comma that separates parameters in a C function call is an example of a non-associative operator.

Many expressions are obtained by applying operators to other expressions, but there must always be indivisible constituents of expressions. We call such constituents atoms. They appear at the leaves of abstract syntax trees and as the non-operator tokens in concrete syntax. In most languages the atoms include identifiers, integer and real literals, string literals, and so on. In our initial, simple model, an expression is either an atom or an infix operator applied to two expressions:

    ⟨infix⟩+≡
    datatype ast = ATOM of string
                 | APP of ast * rator * ast

Type ast represents an expression's abstract syntax tree.
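As an illustration of the grouping rules above, here is a minimal sketch that builds a few trees by hand and prints them fully parenthesized. The operators `plus`, `times`, and `arrow`, their precedence numbers, and the helper `show` are assumptions made for this example; they do not come from the paper.

```sml
type precedence = int
datatype associativity = LEFT | RIGHT | NONASSOC
type rator = string * precedence * associativity
datatype ast = ATOM of string
             | APP of ast * rator * ast

(* Hypothetical operators, chosen only for illustration. *)
val plus  : rator = ("+", 1, LEFT)
val times : rator = ("*", 2, LEFT)
val arrow : rator = ("->", 0, RIGHT)

(* x + y * z groups as x + (y * z) because "*" has higher precedence. *)
val e1 = APP (ATOM "x", plus, APP (ATOM "y", times, ATOM "z"))

(* x + y + z groups as (x + y) + z because "+" is left-associative. *)
val e2 = APP (APP (ATOM "x", plus, ATOM "y"), plus, ATOM "z")

(* a -> b -> c groups as a -> (b -> c) because "->" is right-associative. *)
val e3 = APP (ATOM "a", arrow, APP (ATOM "b", arrow, ATOM "c"))

(* Render a tree with every application parenthesized,
   so that the grouping chosen by the parser is visible. *)
fun show (ATOM s) = s
  | show (APP (l, (text, _, _), r)) =
      "(" ^ show l ^ " " ^ text ^ " " ^ show r ^ ")"

val s1 = show e1   (* "(x + (y * z))" *)
val s2 = show e2   (* "((x + y) + z)" *)
val s3 = show e3   (* "(a -> (b -> c))" *)
```

Printing every pair of parentheses is the opposite of what an unparser wants, but it makes the tree shape explicit before any parentheses are removed.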
The datatype definition of ast is something like a production in a context-free grammar; it gives two alternative ways of constructing an ast. These alternatives are always given names (ATOM and APP), which are called constructors. An ML datatype may contain any number of constructors, and there may be data associated with each one. Here, for an atom, we care only about the atom's string representation. For an application, we want the operator and the asts representing the two operands. Because ast is used in its own definition, it is a recursive type, and values of type ast are trees.

Type ast represents the input to the unparser; we also need a type to represent the unparser's output. For simplicity, we treat this output as a sequence of lexemes, where a lexeme represents an atom, an operator, or a parenthesis. Moreover, we undertake to emit only sequences in which atoms and operators alternate, or in which parenthesized sequences take the place of atoms. Let us call such a sequence an image and use a representation that forces it to satisfy the following grammar:

    image        ⇒ lexical-atom { operator lexical-atom }
    lexical-atom ⇒ atom | ( image )

In the corresponding ML, the sequence in braces becomes the type image':

    ⟨infix⟩+≡
    datatype image = IMAGE of lexical_atom * image'
    and image' = EOI   (* end of image *)
               | INFIX of rator * lexical_atom * image'
    and lexical_atom = LEX_ATOM of string
                     | PARENS of image

EOI represents the end of the image, and PARENS represents an image in parentheses. This representation enforces the invariants that expressions and operators alternate, and that the first and last elements of an image are expressions.

Parsing infix expressions

To be correct, an unparser must produce a sequence that parses back into the original abstract syntax tree. We develop an unparsing algorithm by thinking about parsing. To understand how to minimize parentheses, we need to consider where parentheses are needed to get the correct parse.
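To make the image representation concrete, the following sketch builds the image for (x + y) * z and flattens it into a string. The functions `imageToString`, `restToString`, and `atomToString`, the operators `plus` and `times`, and the choice to separate lexemes with single spaces are assumptions for illustration, not part of the paper's code.

```sml
type precedence = int
datatype associativity = LEFT | RIGHT | NONASSOC
type rator = string * precedence * associativity

datatype image = IMAGE of lexical_atom * image'
and image' = EOI   (* end of image *)
           | INFIX of rator * lexical_atom * image'
and lexical_atom = LEX_ATOM of string
                 | PARENS of image

(* Hypothetical operators, chosen only for illustration. *)
val plus  : rator = ("+", 1, LEFT)
val times : rator = ("*", 2, LEFT)

(* The image of (x + y) * z: a parenthesized image standing in the
   place of an atom, followed by an operator and another atom. *)
val im =
  IMAGE (PARENS (IMAGE (LEX_ATOM "x", INFIX (plus, LEX_ATOM "y", EOI))),
         INFIX (times, LEX_ATOM "z", EOI))

(* Flatten an image to text. The three functions mirror the three
   mutually recursive types, one function per type. *)
fun imageToString (IMAGE (a, rest)) = atomToString a ^ restToString rest
and restToString EOI = ""
  | restToString (INFIX ((text, _, _), a, rest)) =
      " " ^ text ^ " " ^ atomToString a ^ restToString rest
and atomToString (LEX_ATOM s) = s
  | atomToString (PARENS inner) = "(" ^ imageToString inner ^ ")"

val text = imageToString im   (* "(x + y) * z" *)
```

Because INFIX always carries both an operator and the lexical atom that follows it, a value that starts or ends with an operator, or that places two atoms side by side, cannot be constructed at all.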
Suppose we have an abstract syntax tree that has been obtained by parsing an image without any parentheses. Then wherever the syntax tree has an APP node whose parent is also an APP node, there are two cases: the child may be the left child or the right child of its parent:
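A minimal sketch of how the left-child and right-child cases might drive parenthesization, assuming the conventional rule that a child application needs no parentheses when its operator binds more tightly than its parent's, or when both have the same precedence and the same associativity and the child sits on the side toward which that associativity groups. The names `noparens` and `unparse`, the reuse of LEFT and RIGHT to say which side a child is on, and the decision to emit a string directly instead of an image are assumptions of this sketch, not the paper's code.

```sml
type precedence = int
datatype associativity = LEFT | RIGHT | NONASSOC
type rator = string * precedence * associativity
datatype ast = ATOM of string
             | APP of ast * rator * ast

(* May a child whose operator is inner appear without parentheses
   on the given side (LEFT or RIGHT) of a parent whose operator is outer? *)
fun noparens ((_, pInner, aInner) : rator,
              (_, pOuter, aOuter) : rator,
              side : associativity) =
  pInner > pOuter orelse
  (pInner = pOuter andalso aInner = aOuter
   andalso aInner = side andalso side <> NONASSOC)

(* Unparse to text, parenthesizing an operand only when noparens says
   the bare form would not parse back to the same tree. *)
fun unparse (ATOM s) = s
  | unparse (APP (l, outer as (text, _, _), r)) =
      let
        fun operand (e as ATOM _, _) = unparse e
          | operand (e as APP (_, inner, _), side) =
              if noparens (inner, outer, side) then unparse e
              else "(" ^ unparse e ^ ")"
      in
        operand (l, LEFT) ^ " " ^ text ^ " " ^ operand (r, RIGHT)
      end

(* Hypothetical operators, chosen only for illustration. *)
val plus  : rator = ("+", 1, LEFT)
val minus : rator = ("-", 1, LEFT)
val times : rator = ("*", 2, LEFT)

val (x, y, z) = (ATOM "x", ATOM "y", ATOM "z")

val u1 = unparse (APP (x, plus, APP (y, times, z)))    (* "x + y * z" *)
val u2 = unparse (APP (APP (x, plus, y), times, z))    (* "(x + y) * z" *)
val u3 = unparse (APP (APP (x, minus, y), minus, z))   (* "x - y - z" *)
val u4 = unparse (APP (x, minus, APP (y, minus, z)))   (* "x - (y - z)" *)
```

The rule is deliberately conservative: whenever two operators share a precedence but differ in associativity, or either is non-associative, the child is parenthesized.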
Journal: Softw., Pract. Exper.
Volume: 28
Publication date: 1998