Extending Regular Expressions with Context Operators and Parse Extraction

Author

  • Steven M. Kearns
Abstract

Regular expressions are used in many applications to specify patterns because any regular expression can be compiled into a very efficient one-pass pattern matcher called a finite automaton. Finding matches is useful, but even more useful is parse extraction, which describes in detail how a pattern matches some input. After matching an address, for example, parse extraction makes it easy to find out the Zip code part of the address. We present an elegant, efficient algorithm for extracting a parse after matching with a finite automaton. In addition, we extend the regular expression language to include new operators for matching arbitrary left context and single character right context. The extended language can be matched as efficiently as the usual regular expression language, but is more expressive. Finally, we suggest how to apply the matching algorithms to match regular expressions containing arbitrary right context and single character left context. In effect, this allows one to specify patterns that seem to require an unlimited amount of look-ahead to match.
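As a rough illustration of the two ideas in the abstract, the sketch below uses Python's standard `re` module. It is not the paper's algorithm: `re` is a backtracking matcher rather than a finite automaton, and the address pattern, the group names, and the helper functions `extract_zip` and `called_names` are assumptions made up for this example. Named groups stand in for parse extraction, and a lookahead assertion stands in for single-character right context.

```python
import re

# Hypothetical address pattern; the group names are illustrative only.
ADDRESS = re.compile(
    r"(?P<street>\d+ [A-Za-z ]+), "
    r"(?P<city>[A-Za-z ]+), "
    r"(?P<state>[A-Z]{2}) "
    r"(?P<zip>\d{5})"
)


def extract_zip(text):
    """Return the Zip code sub-match of the first address found, or None."""
    match = ADDRESS.search(text)
    return match.group("zip") if match else None


# Single-character right context: an identifier counts as a match only
# when the very next character is "(", which is not part of the match.
CALL = re.compile(r"[A-Za-z_]\w*(?=\()")


def called_names(source):
    """Return identifiers that are immediately followed by "("."""
    return CALL.findall(source)


print(extract_zip("12 Main Street, Springfield, IL 62704"))
print(called_names("f(x) + g + h(y)"))
```

Here the named group plays the role of "the Zip code part of the address", and the lookahead shows how a context condition can restrict a match without being included in it.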

Similar resources

A Computational Interpretation of Context-Free Expressions

We phrase parsing with context-free expressions as a type inhabitation problem where values are parse trees and types are context-free expressions. We first show how containment among context-free and regular expressions can be reduced to a reachability problem by using a canonical representation of states. The proofs-as-programs principle yields a computational interpretation of the reachabilit...

RegReg: a Lightweight Generator of Robust Parsers for Irregular Languages

In reverse engineering, parsing may be partially done to extract lightweight source models. Parsing code containing preprocessing directives, syntactical errors and embedded languages is a difficult task using context-free grammars. Several researchers have proposed some form of lexical analyzer to parse such code. We present a lightweight tool, called RegReg, based on a hierarchy of lexers des...

Yacc is dead

We present two novel approaches to parsing context-free languages. The first approach is based on an extension of Brzozowski’s derivative from regular expressions to context-free grammars. The second approach is based on a generalization of the derivative to parser combinators. The payoff of these techniques is a small (less than 250 lines of code), easy-to-implement parsing library capable of ...

Tree Kernel-Based Relation Extraction with Context-Sensitive Structured Parse Tree Information

This paper proposes a tree kernel with context-sensitive structured parse tree information for relation extraction. It resolves two critical problems in previous tree kernels for relation extraction in two ways. First, it automatically determines a dynamic context-sensitive tree span for relation extraction by extending the widely-used Shortest Path-enclosed Tree (SPT) to include necessary conte...

Automata Construction for PSL

The language PSL [1] is a temporal logic standardized by the Accellera standards organization and currently undergoing the process of becoming an IEEE standard. The core of PSL, denoted here LTL WR, is an extension of the linear temporal logic LTL. The extension takes two orthogonal directions. In one direction the logic is interpreted over finite, possibly truncated, as well as infinite words....

Journal:
  • Softw., Pract. Exper.

Volume 21, issue not listed

Pages: not listed

Publication date: 1991