Advances in the recovery of haplotypes from the metagenome
Abstract
High-throughput DNA sequencing has enabled us to look beyond consensus reference sequences to the variation observed in sequences within organisms: their haplotypes. Recovery, or assembly, of haplotypes has proved computationally difficult, and there exist many probabilistic heuristics that attempt to recover the original haplotypes for a single organism of known ploidy. However, existing approaches make simplifications or assumptions that are easily violated when investigating sequence variation within a metagenome. We propose the metahaplome as the set of haplotypes for any particular genomic region of interest within a metagenomic data set and present Hansel and Gretel, a data structure and algorithm that together provide a proof-of-concept framework for the recovery of true haplotypes from a metagenomic data set. The algorithm performs incremental haplotype recovery using smoothed Naive Bayes, a simple, efficient and effective method. Hansel and Gretel offer several advantages over existing solutions: the framework is capable of recovering haplotypes from metagenomes, does not require a priori knowledge about the input data, makes no assumptions regarding the distribution of alleles at variant sites, is robust to error, and uses all available evidence from aligned reads, without altering or discarding observed variation. We evaluate our approach using synthetic metahaplomes constructed from sets of real genes and show that up to 99% of SNPs on a haplotype can be correctly recovered from short reads that originate from a metagenomic data set.
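To make the idea of incremental recovery with smoothed Naive Bayes concrete, the following is a minimal sketch and not the authors' implementation. It assumes reads have already been reduced to observed alleles at numbered SNP sites, stores pairwise co-occurrence counts in a plain dictionary (a stand-in for the Hansel data structure), and greedily picks the next allele using Laplace-smoothed conditional probabilities. The names `count_pairs`, `recover_haplotype`, `lookback` and `alpha` are hypothetical and chosen for illustration only.

```python
from math import log

ALLELES = "ACGT"


def count_pairs(reads):
    """Count how often allele a at site i is seen with allele b at site j.

    Each read is a dict mapping a SNP site index to the allele observed there.
    """
    pairs = {}
    for read in reads:
        sites = sorted(read)
        for x, i in enumerate(sites):
            for j in sites[x + 1:]:
                key = (i, read[i], j, read[j])
                pairs[key] = pairs.get(key, 0) + 1
    return pairs


def recover_haplotype(reads, n_sites, first_allele, lookback=3, alpha=1.0):
    """Greedily extend a haplotype one SNP site at a time.

    At each site, every candidate allele is scored by summing the log of its
    Laplace-smoothed conditional probability given each of the last
    `lookback` alleles already chosen (the Naive Bayes independence
    assumption). `alpha` is the smoothing pseudocount (an assumption here).
    """
    pairs = count_pairs(reads)
    hap = [first_allele]
    for j in range(1, n_sites):
        best, best_score = None, float("-inf")
        for b in ALLELES:
            score = 0.0
            for i in range(max(0, j - lookback), j):
                a = hap[i]
                joint = pairs.get((i, a, j, b), 0)
                total = sum(pairs.get((i, a, j, c), 0) for c in ALLELES)
                score += log((joint + alpha) / (total + alpha * len(ALLELES)))
            if score > best_score:
                best, best_score = b, score
        hap.append(best)
    return "".join(hap)
```

Because the pseudocount keeps every probability above zero, a pair of alleles that was never observed together lowers a candidate's score without ruling it out, which is one way such a method can tolerate sequencing error and uneven coverage.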
Similar resources
Beta-Globin Gene Cluster Haplotypes in Iranian Sickle Cell Patients: Relation to Some Hematologic
Background: Sickle cell anemia is relatively common in Khuzestan province, located in Southwest Iran. The characteristics of sickle cell disease in Iran are apparently different from those in other regions; some of these characteristics might be related to β-chain haplotypes. The purpose of this study was to determine the frequency of β-chain haplotypes in 50 patients with homozygous sickle cell anemia in...
Population structure and variation in Persian sturgeon (Acipenser persicus) from the Caspian Sea as determined from mitochondrial DNA sequences of the control region
Mitochondrial DNA (mtDNA) control region sequences were analyzed to evaluate the population genetic structure of Persian sturgeon (Acipenser persicus) in the Caspian Sea. A total of 45 specimens were collected from different locations in the Caspian Sea. The mtDNA control region was amplified using PCR. Direct sequencing was performed according to the standard method. The results showed that 12 haplotypes...
HAPLOWSER: a whole-genome haplotype browser for personal genome and metagenome
SUMMARY Haplotype assembly is becoming a very important tool in genome sequencing of human and other organisms. Although haplotypes were previously inferred from genome assemblies, there has never been a comparative haplotype browser that depicts a global picture of whole-genome alignments among haplotypes of different organisms. We introduce a whole-genome HAPLotype brOWSER (HAPLOWSER), provid...
Study of mtDNA variation of Russian sturgeon population from the south Caspian Sea using RFLP analysis of PCR amplified ND5/6 gene regions
PCR-based mtDNA analysis (RFLP) was used for the study of population differentiation in the Russian sturgeon (Acipenser gueldenstaedti). The mtDNA ND5/6 gene regions were amplified using PCR techniques followed by RFLP analysis. 39 different composite haplotypes were detected among 62 specimens. 29 haplotypes were rare, occurring only once in two regions (west and east areas of the Southern Caspi...
Publication date: 2016