Rapid Annotation of Anonymous Sequences from Genome Projects Using Semantic Similarities and a Weighting Scheme in Gene Ontology
Abstract
BACKGROUND Large-scale sequencing projects have now become routine lab practice, which has led to the development of a new generation of tools involving function prediction methods and brought such methods back to the fore. The advent of Gene Ontology, with its structured vocabulary and paradigm, has provided computational biologists with an appropriate means for this task.
METHODOLOGY We present a novel method called ARGOT (Annotation Retrieval of Gene Ontology Terms) that can quickly process thousands of sequences for functional inference. The tool exploits, for the first time, an integrated approach that combines clustering of GO terms, based on their semantic similarities, with a weighting scheme that assesses retrieved hits sharing a certain number of biological features with the sequence to be annotated. These hits may be obtained by different methods; in this work, ARGOT processing is based on BLAST results.
CONCLUSIONS The extensive benchmark involved 10,000 protein sequences, the complete S. cerevisiae genome, and a small subset of proteins for comparison with other available tools. The algorithm was shown to outperform existing methods and to be suitable for function prediction of single proteins owing to its high sensitivity, specificity and coverage.
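The general idea of combining hit weighting with grouping of GO terms by semantic similarity can be illustrated with a minimal Python sketch. This is not ARGOT's actual algorithm, which the abstract does not specify. Everything here is an assumption made for illustration: the toy ontology and its term names, the use of -log10(E-value) as the hit weight, the Jaccard overlap of ancestor sets as a stand-in similarity measure, and the greedy grouping with a threshold of 0.5.

```python
import math

# Hypothetical toy ontology: child term -> set of parent terms (not real GO IDs).
PARENTS = {
    "root": set(),
    "binding": {"root"},
    "catalysis": {"root"},
    "dna_binding": {"binding"},
    "rna_binding": {"binding"},
    "kinase": {"catalysis"},
}

def ancestors(term):
    """Return the term itself plus all of its ancestors in the toy DAG."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def hit_weight(evalue, cap=300.0):
    """Assumed weighting: -log10(E-value), clamped to the range [0, cap]."""
    if evalue <= 0.0:
        return cap
    return max(0.0, min(cap, -math.log10(evalue)))

def score_terms(hits):
    """Sum the hit weights for every GO term annotated on each hit."""
    scores = {}
    for evalue, terms in hits:
        weight = hit_weight(evalue)
        for term in terms:
            scores[term] = scores.get(term, 0.0) + weight
    return scores

def similarity(a, b):
    """Jaccard overlap of ancestor sets, a simple stand-in similarity measure."""
    anc_a, anc_b = ancestors(a), ancestors(b)
    return len(anc_a & anc_b) / len(anc_a | anc_b)

def cluster_terms(scores, threshold=0.5):
    """Greedy grouping: a term joins the first cluster whose seed is similar enough."""
    clusters = []
    for term in sorted(scores, key=scores.get, reverse=True):
        for cluster in clusters:
            if similarity(term, cluster[0]) >= threshold:
                cluster.append(term)
                break
        else:
            clusters.append([term])
    # Rank clusters by their total accumulated weight.
    return sorted(clusters, key=lambda c: sum(scores[t] for t in c), reverse=True)

# Hypothetical BLAST-like hits: (E-value, GO terms annotated on the hit).
hits = [
    (1e-50, {"dna_binding"}),
    (1e-20, {"rna_binding"}),
    (1e-5, {"kinase"}),
]
scores = score_terms(hits)
clusters = cluster_terms(scores)
```

In this toy run the two binding terms share most of their ancestors and fall into one group, while the weakly supported kinase term forms its own lower-ranked group. A real pipeline would read hits from BLAST output and use an information-content-based similarity measure over the actual Gene Ontology graph.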
Related resources
The use of semantic similarity measures for optimally integrating heterogeneous Gene Ontology data from large scale annotation pipelines
With the advancement of new high throughput sequencing technologies, there has been an increase in the number of genome sequencing projects worldwide, which has yielded complete genome sequences of human, animals and plants. Subsequently, several labs have focused on genome annotation, consisting of assigning functions to gene products, mostly using Gene Ontology (GO) terms. As a consequence, t...
Automated Gene Ontology annotation for anonymous sequence data
Gene Ontology (GO) is the most widely accepted attempt to construct a unified and structured vocabulary for the description of genes and their products in any organism. Annotation by GO terms is performed in most of the current genome projects, which besides generality has the advantage of being very convenient for computer based classification methods. However, direct use of GO in small sequen...
Improved Biomolecular Annotation Prediction through Weighting Scheme Methods
Biomolecular annotation databases are very important in modern biomedical and genetic research. Correct interpretation of biological experiments depends on consistency and completeness of such databases. To improve their quality and coverage, computational methods that are able to supply a ranked list of predicted gene or gene products annotations are extremely useful. In this paper we propose ...
A new method to measure the semantic similarity of GO terms
MOTIVATION Although controlled biochemical or biological vocabularies, such as Gene Ontology (GO) (http://www.geneontology.org), address the need for consistent descriptions of genes in different data sources, there is still no effective method to determine the functional similarities of genes based on gene annotation information from heterogeneous data sources. RESULTS To address this critic...
Investigating Semantic Similarity Measures Across the Gene Ontology: The Relationship Between Sequence and Annotation
MOTIVATION Many bioinformatics data resources not only hold data in the form of sequences, but also as annotation. In the majority of cases, annotation is written as scientific natural language: this is suitable for humans, but not particularly useful for machine processing. Ontologies offer a mechanism by which knowledge can be represented in a form capable of such processing. In this paper we...
Journal:
- PLoS ONE
Volume 4, Issue
Pages -
Publication date: 2009