Mining a database of single amplified genomes from Red Sea brine pool extremophiles—improving reliability of gene function prediction using a profile and pattern matching algorithm (PPMA)
Abstract
Reliable functional annotation of genomic data is the key step in the discovery of novel enzymes. Intrinsic sequencing data quality problems of single amplified genomes (SAGs) and the poor homology of novel extremophile genomes pose significant challenges for attributing functions to the identified coding sequences. The anoxic deep-sea brine pools of the Red Sea are a promising source of novel enzymes with unique evolutionary adaptations. Sequencing data from Red Sea brine pool cultures and SAGs are annotated and stored in the Integrated Data Warehouse of Microbial Genomes (INDIGO). Low sequence homology of the annotated genes (no similarity for 35% of these genes) may translate into false positives when searching for specific functions. The Profile and Pattern Matching (PPM) strategy described here was developed to eliminate false positive annotations of enzyme function before progressing to labor-intensive hyper-saline gene expression and characterization. It uses InterPro-derived Gene Ontology (GO) terms, which represent enzyme function profiles, and annotated relevant PROSITE IDs, which are linked to an amino acid consensus pattern. The PPM algorithm was tested on 15 protein families, which were selected based on scientific and commercial potential. An initial list of 2577 Enzyme Commission (E.C.) numbers was translated into 171 GO terms and 49 consensus patterns. A subset of INDIGO sequences, consisting of 58 SAGs from six different taxa of bacteria and archaea, was selected from six different brine pool environments. These SAGs encode 74,516 genes, which were independently scanned for the GO terms (profile filter) and PROSITE IDs (pattern filter). Following stringent reliability filtering, the non-redundant hits (106 profile hits and 147 pattern hits) are classified as reliable if at least two relevant descriptors (GO terms and/or consensus patterns) are present.
Scripts for annotation, as well as for the PPM algorithm, are available through the INDIGO website.
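The reliability rule stated in the abstract (a hit counts as reliable when at least two relevant descriptors, GO terms and/or PROSITE patterns, are present) can be illustrated with a minimal Python sketch. This is not the authors' script from the INDIGO website. The `GeneAnnotation` class, the function names, and the example identifiers are assumptions made for illustration only, and the sketch leaves out the reliability filtering and redundancy removal that the abstract mentions.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class GeneAnnotation:
    """Hypothetical record: descriptors already assigned to one gene."""
    gene_id: str
    go_terms: Set[str] = field(default_factory=set)      # profile descriptors
    prosite_ids: Set[str] = field(default_factory=set)   # pattern descriptors


def count_relevant_descriptors(gene: GeneAnnotation,
                               relevant_go: Set[str],
                               relevant_prosite: Set[str]) -> int:
    """Count the relevant GO terms plus the relevant PROSITE IDs on a gene."""
    profile_hits = gene.go_terms & relevant_go
    pattern_hits = gene.prosite_ids & relevant_prosite
    return len(profile_hits) + len(pattern_hits)


def reliable_hits(genes: Iterable[GeneAnnotation],
                  relevant_go: Set[str],
                  relevant_prosite: Set[str],
                  min_descriptors: int = 2) -> List[str]:
    """Return IDs of genes carrying at least `min_descriptors` relevant descriptors."""
    return [
        g.gene_id
        for g in genes
        if count_relevant_descriptors(g, relevant_go, relevant_prosite) >= min_descriptors
    ]


# Illustrative identifiers only; these are not the study's descriptor lists.
RELEVANT_GO = {"GO:0004252", "GO:0008236"}
RELEVANT_PROSITE = {"PS00136", "PS00138"}

example_genes = [
    GeneAnnotation("gene_a", {"GO:0004252"}, {"PS00136"}),      # one profile + one pattern
    GeneAnnotation("gene_b", {"GO:0004252"}, set()),            # a single descriptor
    GeneAnnotation("gene_c", set(), {"PS00136", "PS00138"}),    # two patterns
    GeneAnnotation("gene_d", {"GO:0016787"}, {"PS00001"}),      # no relevant descriptor
]

reliable = reliable_hits(example_genes, RELEVANT_GO, RELEVANT_PROSITE)
```

In this sketch the two filters are evaluated independently and then combined by a simple count, so a gene with one GO term and one pattern is treated the same as a gene with two patterns. Whether the original scripts weight the two descriptor types equally is not stated in the abstract.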
Similar resources
INDIGO – INtegrated Data Warehouse of MIcrobial GenOmes with Examples from the Red Sea Extremophiles
BACKGROUND The next generation sequencing technologies substantially increased the throughput of microbial genome sequencing. To functionally annotate newly sequenced microbial genomes, a variety of experimental and computational methods are used. Integration of information from different sources is a powerful approach to enhance such annotation. Functional analysis of microbial genomes, necess...
A Simple Genome Walking Strategy to Isolate Unknown Genomic Regions Using Long Primer and RAPD Primer
Background: Genome walking is a DNA-cloning methodology that is used to isolate unknown genomic regions adjacent to known sequences. However, the existing genome-walking methods have their own limitations. Objectives: Our aim was to provide a simple and efficient genome-walking technology. Material and Methods: In this paper, we dev...
Identification of the first Transgenic Aquatic Animal in Iran by PCR-Based Method and Protein Analysis
In the recent years, there is evidence of training a red type of zebrafish which differs from wild-type in body color. There is not any document how it reaches to the ornamental fish farms of Iran but at first, it was a doubt it belongs to a morphotype or genetic modification (GM). First of all, a set primer was designed to validate zebrafish species. Mitochondrial 16srDNA was selected and ampl...
Numeric Multi-Objective Rule Mining Using Simulated Annealing Algorithm
Abstract as a single objective one. Measures like support, confidence and other interestingness criteria which are used for evaluating a rule, can be thought of as different objectives of association rule mining problem. Support count is the number of records, which satisfies all the conditions that exist in the rule. This objective represents the accuracy of the rules extracted from the da...
Genomic and Transcriptomic Evidence for Carbohydrate Consumption among Microorganisms in a Cold Seep Brine Pool
The detailed lifestyle of microorganisms in deep-sea brine environments remains largely unexplored. Using a carefully calibrated genome binning approach, we reconstructed partial to nearly-complete genomes of 51 microorganisms in biofilms from the Thuwal cold seep brine pool of the Red Sea. The recovered metagenome-assembled genomes (MAGs) belong to six different phyla: Actinobacteria, Proteoba...
Journal title:
Volume 5, Issue
Pages -
Publication date: 2014