Using PQ Trees for Comparative Genomics
Abstract
Permutations on strings representing gene clusters on genomes have been studied in [18, 14, 3, 12, 17], and the idea of a maximal permutation pattern was introduced in [12]. In this paper, we present a new tool for the representation and detection of gene clusters in multiple genomes, using PQ trees [6]. This tool describes the inner structure of clusters and the relations between them succinctly, aids in filtering meaningful clusters from apparently meaningless ones, and gives a natural and meaningful way of visualizing complex clusters. We identify a minimal consensus PQ tree and prove that it is equivalent to a maximal π-pattern [12] and that each subgraph of the PQ tree corresponds to a non-maximal permutation pattern. We present a general scheme to handle multiplicity in permutations and also give a linear-time algorithm to construct the minimal consensus PQ tree. Further, we demonstrate the results on whole-genome data sets. In our analysis of the whole genomes of human and rat, we found about 1.5 million common gene clusters but only about 500 minimal consensus PQ trees; with the E. coli K-12 and B. subtilis genomes, we found only about 450 minimal consensus PQ trees out of about 15,000 gene clusters. Finally, we show specific instances of functionally related genes in the two cases.
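To make the idea of a PQ tree encoding a family of gene orders concrete, here is a minimal sketch in Python. It is not the linear-time construction the abstract refers to. It only enumerates, by brute force, the leaf orders that a small hand-written PQ tree admits. The tuple-based tree encoding, the function name `frontiers`, and the gene labels are all assumptions made for illustration.

```python
from itertools import permutations, product

# Hypothetical encoding: a leaf is a gene label (str); an internal node is
# a pair (kind, children) where kind is "P" (children may appear in any
# order) or "Q" (children keep their order, up to reversal).

def frontiers(node):
    """Return the set of leaf orders (tuples of labels) the tree admits."""
    if isinstance(node, str):
        return {(node,)}
    kind, children = node
    if kind == "P":
        child_orders = list(permutations(children))
    elif kind == "Q":
        child_orders = [tuple(children), tuple(reversed(children))]
    else:
        raise ValueError(f"unknown node kind: {kind!r}")
    result = set()
    for order in child_orders:
        # Combine every admissible order of each child subtree.
        for combo in product(*(frontiers(child) for child in order)):
            result.add(tuple(gene for part in combo for gene in part))
    return result

# A toy cluster: genes b, c, d are always adjacent (in any order) and are
# flanked by a and e, read in either direction.
tree = ("Q", ["a", ("P", ["b", "c", "d"]), "e"])
orders = frontiers(tree)
print(len(orders))
print(("a", "c", "b", "d", "e") in orders)
print(("b", "a", "c", "d", "e") in orders)
```

Enumeration like this is exponential in the tree size and is only useful for checking small examples by hand. The point is that one tree stands in for many gene orders while keeping nested clusters such as {b, c, d} contiguous.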
Similar resources
Breakpoint Distance and PQ-Trees
The PQ-tree is a fundamental data structure that can encode large sets of permutations. It has recently been used in comparative genomics to model ancestral genomes with some uncertainty: given a phylogeny for some species, extant genomes are represented by permutations on the leaves of the tree, and each internal node in the phylogenetic tree represents an extinct ancestral genome, represented...
Recognition of Multiple PQ Issues using Modified EMD and Neural Network Classifier
This paper presents a new framework based on modified EMD method for detection of single and multiple PQ issues. In modified EMD, DWT precedes traditional EMD process. This scheme makes EMD better by eliminating the mode mixing problem. This is a two step algorithm; in the first step, input PQ signal is decomposed in low and high frequency components using DWT. In the second stage, the low freq...
8 Comparative Genomics
Data Growth over the Years; Taxonomy Analysis, Most Sequenced Phyla and Genera; Basic Genome Statistics; Thousands of Genome Sequences ...
Statistical and Combinatorial Aspects of Comparative Genomics*
This document presents a survey of the statistical and combinatorial aspects of four areas of comparative genomics: gene order based measures of evolutionary distances between species, construction of phylogenetic trees, detection of horizontal transfer of genes, and detection of ancient whole genome duplications.
Algorithms for Testing and Embedding Planar Graphs
2 Embedding graphs into planarity; 2.1 Embedding algorithms that do not use PQ-trees; 2.1.1 A planarity embedding algorithm based on the Kuratowski theorem; 2.1.2 An embedding algorithm based on open ear decomposition; 2.1.3 A simplified O(n) planar embedding algorithm for biconnected graphs ...
Publication date: 2005