Time/Space Efficient Filtering of Streaming XML Documents Using Incrementally Constructed Path-trie

Authors

  • Kazuhito Hagio
  • Shuichi Mitarai
  • Akira Ishino
  • Masayuki Takeda
Abstract

In this paper, we present a streaming XML document filter named DXAXEN, which is based on incremental construction of a path-trie. It runs very fast and processes a large number of XPath queries efficiently. An experimental comparison with XMLTK, a well-known streaming XML document filter, shows that DXAXEN is 2–5 times faster and needs only 5–20 percent of the memory.
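
The abstract does not describe DXAXEN's internals, so the following is only a minimal sketch of the general idea of a path-trie that grows as the stream is read, not the authors' implementation. It assumes queries are linear XPath expressions using only `/`, `//`, and `*`, and it uses regular-expression matching as a stand-in for whatever query evaluation the paper actually uses. The names `compile_query`, `TrieNode`, and `filter_stream` are hypothetical.

```python
import io
import re
import xml.etree.ElementTree as ET


def compile_query(query):
    """Turn a linear XPath (only /, //, *) into a regex over '/a/b/c' label paths."""
    pattern = ""
    for axis, name in re.findall(r"(//|/)([^/]+)", query):
        if axis == "//":
            pattern += r"(?:/[^/]+)*"  # zero or more intermediate elements
        pattern += "/" + (r"[^/]+" if name == "*" else re.escape(name))
    return re.compile(pattern)


class TrieNode:
    """One distinct root-to-element label path seen so far in the stream."""

    def __init__(self, path, queries):
        self.path = path
        self.children = {}
        # Queries are evaluated once, when this label path first appears.
        self.matched = frozenset(
            q for q, rx in queries.items() if rx.fullmatch(path)
        )


def filter_stream(xml_source, query_strings):
    """Return the set of queries matched by at least one element in the stream."""
    queries = {q: compile_query(q) for q in query_strings}
    stack = [TrieNode("", queries)]  # the empty path matches no query
    matched = set()
    for event, elem in ET.iterparse(xml_source, events=("start", "end")):
        if event == "start":
            parent = stack[-1]
            child = parent.children.get(elem.tag)
            if child is None:  # new path: extend the trie incrementally
                child = TrieNode(parent.path + "/" + elem.tag, queries)
                parent.children[elem.tag] = child
            stack.append(child)
            matched |= child.matched  # repeated paths reuse the cached result
        else:
            stack.pop()
            elem.clear()  # release element content once it has been closed
    return matched


doc = b"<a><b><c/></b><b><c/></b><d/></a>"
queries = ["/a/b/c", "//c", "/a/*", "/a/c", "//d/e"]
print(sorted(filter_stream(io.BytesIO(doc), queries)))
# prints ['//c', '/a/*', '/a/b/c']
```

In this sketch the second `<b><c/></b>` subtree costs only dictionary lookups, because its label paths are already in the trie. The trie grows with the number of distinct label paths rather than with document size, which is one plausible reading of why such a structure can save both time and memory.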

Similar resources

Efficient Filtering of XML Documents with XPath Expressions

We propose a novel index structure, termed XTrie, that supports the efficient filtering of XML documents based on XPath expressions. Our XTrie index structure offers several novel features that make it especially attractive for large-scale publish/subscribe systems. First, XTrie is designed to support effective filtering based on complex XPath expressions (as opposed to simple, single-path speci...

YFilter: Efficient and Scalable Filtering of XML Documents

Soon, much of the data exchanged over the Internet will be encoded in XML, allowing for sophisticated filtering and content-based routing. We have built a filtering engine called YFilter, which filters streaming XML documents according to XQuery or XPath queries that involve both path expressions and predicates. Unlike previous work, YFilter uses a novel NFA-based execution model. In this demon...

A New Approach to Filtering of XML Streaming Data

Information processing and retrieval in many applications require filtering of XML streams. A stream-filter system evaluates queries on a continuous stream of XML documents and delivers matched content to the user. This paper proposes a new algorithm named PFilter for stream filtering systems. PFilter processes a large number of XPath query expressions to provide the desired XML nodes. PFil...

Space-efficient Data Structures for Collections of Textual Data

This thesis focuses on the design of succinct and compressed data structures for collections of string-based data, specifically sequences of semi-structured documents in textual format, sets of strings, and sequences of strings. The study of such collections is motivated by a large number of applications both in theory and practice. For textual semi-structured data, we introduce the concept of ...

Workload-aware Trie Indexes for XML

Well-designed indexes can dramatically improve query performance. In the context of XML, structural indexes have proven to be particularly effective in supporting efficient XPath queries, the core of all XML queries, by capturing the structural correlation between data components in an XML document. The duality of space and performance is an inevitable trade-off at the core of index design. It h...

Publication date: 2008