EvoMiner: Frequent Subtree Mining in Phylogenetic Databases
Technical Report #11-08, Dept. of Computer Science, Iowa State University

Authors

  • Akshay Deepak
  • David Fernández-Baca
  • Srikanta Tirthapura
  • Michael J Sanderson
  • Michelle M McMahon

Abstract

The problem of mining collections of trees to identify common patterns, called frequent subtrees (FSTs), arises often when trying to make sense of the results of phylogenetic analysis. FST mining generalizes the well-known maximum agreement subtree problem. Here we present EvoMiner, a new algorithm for mining frequent subtrees in collections of phylogenetic trees. EvoMiner is an Apriori-like level-wise method that uses a novel phylogeny-specific constant-time candidate generation scheme, an efficient fingerprinting-based technique for the downward-closure operation, and a lowest-common-ancestor-based support counting step that requires neither costly subtree operations nor database traversal. As a result of these techniques, our algorithm achieves speed-ups of up to 100 times or more over Phylominer, another algorithm for mining phylogenetic trees. EvoMiner can also work in vertical mining mode, which uses less memory at the expense of speed.
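
To make the idea of lowest-common-ancestor-based support counting more concrete, the sketch below counts the support of the smallest non-trivial pattern, a rooted triple ab|c, across a small collection of trees. It is only an illustration under stated assumptions and not EvoMiner's actual procedure: the parent-map tree representation, the function names (`node_depths`, `lca`, `displays_triple`, `support`), and the three example trees are all hypothetical, and the LCA is found by naive pointer climbing rather than a constant-time structure.

```python
# Illustrative sketch only; not the EvoMiner implementation.
# A rooted tree is a dict mapping each node to its parent (root -> None).

def node_depths(parent):
    """Return the depth of every node (root has depth 0)."""
    depth = {}

    def visit(v):
        if v not in depth:
            depth[v] = 0 if parent[v] is None else visit(parent[v]) + 1
        return depth[v]

    for v in parent:
        visit(v)
    return depth


def lca(parent, depth, a, b):
    """Naive lowest common ancestor: repeatedly lift the deeper node."""
    while a != b:
        if depth[a] >= depth[b]:
            a = parent[a]
        else:
            b = parent[b]
    return a


def displays_triple(parent, depth, a, b, c):
    """True if the tree displays ab|c, i.e. LCA(a, b) lies strictly below LCA(a, c)."""
    return depth[lca(parent, depth, a, b)] > depth[lca(parent, depth, a, c)]


def support(trees, a, b, c):
    """Fraction of trees in the collection that display the triple ab|c."""
    hits = 0
    for parent in trees:
        depth = node_depths(parent)
        if displays_triple(parent, depth, a, b, c):
            hits += 1
    return hits / len(trees)


# Three hypothetical trees on leaves a, b, c, d (internal nodes x, y, r).
t1 = {"r": None, "x": "r", "y": "r", "a": "x", "b": "x", "c": "y", "d": "y"}  # ((a,b),(c,d))
t2 = {"r": None, "y": "r", "d": "r", "x": "y", "c": "y", "a": "x", "b": "x"}  # (((a,b),c),d)
t3 = {"r": None, "y": "r", "d": "r", "x": "y", "b": "y", "a": "x", "c": "x"}  # (((a,c),b),d)
trees = [t1, t2, t3]

f = 0.6  # assumed frequency threshold
print(support(trees, "a", "b", "c") >= f)  # ab|c is displayed by t1 and t2
print(support(trees, "a", "c", "b") >= f)  # ac|b is displayed by t3 only
```

In this toy collection ab|c is displayed by two of the three trees and so passes the assumed threshold, while ac|b does not. A level-wise miner would extend frequent small patterns such as this one to larger candidates; that step is not shown here.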

Similar resources

Enumerating All Maximal Frequent Subtrees

Given a collection of leaf-labeled trees on a common leaf set and a fraction f in (1/2,1], a frequent subtree (FST) is a subtree isomorphically included in at least a fraction f of the input trees. The well-known maximum agreement subtree (MAST) problem asks for an FST with f = 1 that has the largest number of leaves. Apart from its intrinsic interest from the algorithmic perspective, MAST has pr...
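
The definition above can be written compactly as follows. The notation is an assumption introduced here for illustration and is not taken from the paper: \mathcal{D} is the input collection of trees, t \preceq T means that t is isomorphically included in T, and L(t) is the leaf set of t.

```latex
\[
\operatorname{supp}(t) \;=\;
\frac{\bigl|\{\, T \in \mathcal{D} : t \preceq T \,\}\bigr|}{|\mathcal{D}|},
\qquad
t \text{ is frequent} \iff \operatorname{supp}(t) \ge f,
\quad f \in (1/2,\, 1].
\]
\[
\mathrm{MAST}(\mathcal{D}) \;\in\;
\operatorname*{arg\,max}_{t \,:\, \operatorname{supp}(t) = 1} \; |L(t)| .
\]
```

With f = 1 every frequent subtree must appear in all input trees, so the largest such subtree is a maximum agreement subtree.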

Preserving Separation of Concerns Through Compilation

Current aspect-oriented (AO) compilation techniques fail to preserve the separation of concerns for postcompilation phases. At the minimum, it makes efficient incremental compilation and unit testing of AO programs challenging. The contribution of this work is an improved approach for aspect-oriented compilation. Our approach rests on a new interface between the AO high-level language (HLL) com...

Frequent Subtree Mining - An Overview

Mining frequent subtrees from databases of labeled trees is a new research field that has many practical applications in areas such as computer networks, Web mining, bioinformatics, XML document mining, etc. These applications share a requirement for the more expressive power of labeled trees to capture the complex relations among data entities. Although frequent subtree mining is a more diffic...

A Bibliography and Index of Our Works on Belief Data: Concept of Error and Multilevel Security

In 1988 we initiated our work on belief data. The work proceeded in two phases: in the first phase we formalized the concept of error in everyday record keeping, and in the second phase we considered multilevel security. The purpose of this report is to create an awareness about our works on belief data and to serve as a guide for the following manuscripts. The first two manuscripts are on the ...

Optimal and Approximate Approaches for Selecting Proxy Agents in Mobile Network Backbones

Selecting Proxy Agents in Mobile Network Backbones. Ahmed Kamal, Dept. of Electrical & Comp. Eng., Iowa State University, Ames, Iowa 50011-3060; Hesham El-Rewini, Department of Computer Science & Engineering, Southern Methodist University, Dallas, Texas 75275-0122; Raza Ul-Mustafa, Dept. of Electrical & Comp. Eng., Iowa State University, Ames, Iowa 50011-3060 raza@iastat...

Publication date: 2011