Research Report Mining Sequential Patterns: Generalizations and Performance Improvements Limited Distribution Notice Mining Sequential Patterns: Generalizations and Performance Improvements

نویسندگان

  • Ramakrishnan Srikant
  • Rakesh Agrawal
چکیده

This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and speciic requests. After outside publication, requests should be lled only by reprints or legally obtained copies of the article (e.g., payment of royalties). ABSTRACT: The problem of mining sequential patterns was recently introduced in AS95]. We are given a database of sequences, where each sequence is a list of transactions ordered by transaction-time, and each transaction is a set of items. The problem is to discover all sequential patterns with a user-speciied minimum support, where the support of a pattern is the number of data-sequences that contain the pattern. An example of a sequential pattern is \5% of customers bought`Foundation' and`Ringworld' in one transaction , followed by`Second Foundation' in a later transaction". We generalize the problem as follows. First, we add time constraints that specify a minimum and/or maximum time period between adjacent elements in a pattern. Second, we relax the restriction that the items in an element of a sequential pattern must come from the same transaction, instead allowing the items to be present in a set of transactions whose transaction-times are within a user-speciied time window. Third, given a user-deened taxonomy (is-a hierarchy) on items, we allow sequential patterns to include items across all levels of the taxonomy. We present GSP, a new algorithm that discovers these generalized sequential patterns. Empirical evaluation using synthetic and real-life data indicates that GSP is much faster than the AprioriAll algorithm presented in AS95]. GSP scales linearly with the number of data-sequences, and has very good scale-up properties with respect to the average data-sequence size.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Mining Sequential Patterns: Generalizations and Performance Improvements

The problem of mining sequential patterns was recently introduced in [3]. We are given a database of sequences, where each sequence is a list of transactions ordered by transaction-time, and each transaction is a set of items. The problem is to discover all sequential patterns with a user-speci ed minimum support, where the support of a pattern is the number of data-sequences that contain the p...

متن کامل

Research Report MINING SEQUENTIAL PATTERNS: GENERALIZATIONS AND PERFORMANCE IMPROVEMENTS

The problem of mining sequential patterns was recently introduced in [AS95]. We are given a database of sequences, where each sequence is a list of transactions ordered by transaction-time, and each transaction is a set of items. The problem is to discover all sequential patterns with a user-speci ed minimum support, where the support of a pattern is the number of data-sequences that contain th...

متن کامل

Mining and Ranking Generators of Sequential Patterns

Sequential pattern mining first proposed by Agrawal and Srikant has received intensive research due to its wide range applicability in many real-life domains. Various improvements have been proposed which include mining a closed set of sequential patterns. Sequential patterns supported by the same sequences in the database can be considered as belonging to an equivalence class. Each equivalence...

متن کامل

Performance Improvements and Efficient Approach for Mining Periodic Sequential Access Patterns

Surfing the Web has become an important daily activity for many users. Discovering and understanding web users’ surfing behavior are essential for the development of successful web monitoring and recommendation systems. To capture users’ web access behavior, one promising approach is web usage mining which discovers interesting and frequent user access patterns from web usage logs. Web usage mi...

متن کامل

Mining Sequential Patterns Across Data Streams

There are extensive endeavors toward mining frequent items or itemsets in a single data stream, but rare efforts have been made to explore sequential patterns among literals in different data streams. In this paper, we define a challenging problem of mining frequent sequential patterns across multiple data streams. We propose an efficient algorithm MILE to manage the mining process. The propose...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1996