Some Limitations of Public Sequence Data for Phylogenetic Inference (in Plants)
Authors
Abstract
The GenBank database contains essentially all of the nucleotide sequence data generated for published molecular systematic studies, but for the majority of taxa these data remain sparse. GenBank has value for phylogenetic methods that leverage data-mining and rapidly improving computational methods, but the limits imposed by the sparse structure of the data are not well understood. Here we present a tree representing 13,093 land plant genera--an estimated 80% of extant plant diversity--to illustrate the potential of public sequence data for broad phylogenetic inference in plants, and we explore the limits to inference imposed by the structure of these data using theoretical foundations from phylogenetic data decisiveness. We find that despite very high levels of missing data (over 96%), the present data retain the potential to inform over 86.3% of all possible phylogenetic relationships. Most of these relationships, however, are informed by small amounts of data--approximately half are informed by fewer than four loci, and more than 99% are informed by fewer than fifteen. We also apply an information theoretic measure of branch support to assess the strength of phylogenetic signal in the data, revealing many poorly supported branches concentrated near the tips of the tree, where data are sparse and the limiting effects of this sparseness are stronger. We argue that limits to phylogenetic inference and signal imposed by low data coverage may pose significant challenges for comprehensive phylogenetic inference at the species level. Computational requirements provide additional limits for large reconstructions, but these may be overcome by methodological advances, whereas insufficient data coverage can only be remedied by additional sampling effort. 
We conclude that public databases have exceptional value for modern systematics and evolutionary biology, and that a continued emphasis on expanding taxonomic and genomic coverage will play a critical role in developing these resources to their full potential.
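The two kinds of quantity discussed in the abstract, the fraction of missing data and the share of relationships that the sampled loci can inform, can be illustrated on a toy taxon-by-locus sampling table. The sketch below is a simplified stand-in and not the authors' method: it assumes that a four-taxon relationship counts as "informed" when at least one locus has been sequenced for all four taxa, which is only a quartet-based proxy for the decisiveness theory the abstract refers to. The taxon names, the sampling pattern, and the function names are all hypothetical.

```python
from itertools import combinations

def missing_fraction(presence):
    """Fraction of empty cells in the taxon-by-locus matrix.

    presence: dict mapping taxon name -> set of loci sampled for that taxon.
    """
    loci = set().union(*presence.values())
    total_cells = len(presence) * len(loci)
    filled_cells = sum(len(sampled) for sampled in presence.values())
    return 1 - filled_cells / total_cells

def quartet_locus_counts(presence):
    """For every set of four taxa, count the loci sampled for all four."""
    counts = {}
    for quartet in combinations(sorted(presence), 4):
        shared = set.intersection(*(presence[t] for t in quartet))
        counts[quartet] = len(shared)
    return counts

def informed_fraction(presence):
    """Share of quartets covered by at least one locus (assumed proxy)."""
    counts = quartet_locus_counts(presence)
    return sum(1 for c in counts.values() if c > 0) / len(counts)

# Hypothetical sampling pattern: five taxa, three commonly used plant loci.
toy = {
    "A": {"rbcL", "matK"},
    "B": {"rbcL", "matK"},
    "C": {"rbcL"},
    "D": {"rbcL", "ITS"},
    "E": {"ITS", "matK"},
}

print(f"missing data: {missing_fraction(toy):.0%}")
print(f"quartets informed: {informed_fraction(toy):.0%}")
```

In this toy table only the quartet A, B, C, D shares a locus, so a matrix that is more than half full still leaves most four-taxon relationships uninformed. Enumerating every quartet is feasible only for small taxon sets; at the scale of thousands of genera it would have to be replaced by sampling or by a more efficient formulation.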
Similar resources
Ribosomal ITS sequences and plant phylogenetic inference.
One of the most popular sequences for phylogenetic inference at the generic and infrageneric levels in plants is the internal transcribed spacer (ITS) region of the 18S-5.8S-26S nuclear ribosomal cistron. The prominence of this source of nuclear DNA sequence data is underscored by a survey of phylogenetic publications involving comparisons at the genus level or below, which reveals that of 244 ...
Identification of Phomopsis Species on Some Ornamental and Forest Plants in Iran on the Basis of the Morphological and Molecular
Phomopsis is a genus of imperfect plant-pathogenic fungi whose hosts comprise several species in different regions of the world, such as grapes, soybean, acacia, hollyhock, velvetleaf, and several other plants. In this study, samples were collected from hollyhock, velvetleaf, purple bauhinia, and acacia plants suspected to be infected with Phomopsis fungi. Samples were culture...
Incongruence between primary sequence data and the distribution of a mitochondrial atp1 group II intron among ferns and horsetails.
Using DNA sequence data from multiple genes (often from more than one genome compartment) to reconstruct phylogenetic relationships has become routine. Augmenting this approach with genomic structural characters (e.g., intron gain and loss, changes in gene order) as these data become available from comparative studies already has provided critical insight into some long-standing questions about...
Phylogenetic relationships among seed plants: Persistent questions and the limits of molecular data.
Trees inferred from DNA sequence data provide only limited insight into the phylogeny of seed plants because the living lineages (cycads, Ginkgo, conifers, gnetophytes, and angiosperms) represent fewer than half of the major lineages that have been detected in the fossil record. Nevertheless, phylogenetic trees of living seed plants inferred from sequence data can provide a test of relationship...
Subgeneric classification of Linaria (Plantaginaceae; Antirrhineae): molecular phylogeny and morphology revisited
Linaria Mill. (Plantaginaceae) with about 160 spp. is the largest genus of the tribe Antirrhineae. We conducted phylogenetic analyses of nuclear ribosomal DNA internal transcribed spacer region (ITS) and chloroplast DNA (rpl32-trnL) sequence data to test the monophyly of currently recognized sections in Linaria. For this purpose 86 species representing seven sections of Linaria and one species ...
Journal title:
Volume 9, Issue
Pages -
Publication date: 2014