CESSM: Collaborative Evaluation of Semantic Similarity Measures

Authors

  • Catia Pesquita
  • Delphine Pessoa
  • Daniel Faria
  • Francisco M. Couto
Abstract

The application of semantic similarity measures to proteins annotated with Gene Ontology terms has become a common method in bioinformatics. However, the evaluation of these measures is still challenging, since no common standard of evaluation exists. We present an online tool for the automated evaluation of GO-based semantic similarity measures, CESSM, that enables the comparison of new measures against previously published ones considering their relation to sequence, Pfam and EC similarity. The tool also has a collaborative component, by which the authors of published measures can contribute to the enrichment of the evaluation by providing their own results. CESSM is freely available at http://xldb.di.fc.ul.pt/tools/cessm/

BACKGROUND

The creation of the Gene Ontology (GO) [1], a controlled vocabulary for the description of gene product functions, triggered the development of computational methods that take advantage of its structured information. One such method is the application of semantic similarity measures to GO terms, whereby the similarity between two terms is calculated according to their relationship in the ontology. Likewise, semantic similarity measures can also be used to calculate the similarity between gene products, provided they are annotated with GO terms. Several semantic similarity measures based on GO have been proposed in recent years [2-12], but the evaluation of their performance has been identified as a relevant problem in the field [13].

Various evaluation strategies have been proposed. Some investigate the relation between the semantic similarity measure and other gene product or protein similarities, such as sequence [2-7], family [12,7] or expression similarity [8,14,15]. Others assess the feasibility of using semantic similarity measures in such distinct scenarios as the prediction of subnuclear location [16], the characterization of human regulatory pathways [17], or gene clustering [9,10].
This multiplicity of evaluation strategies arises from the lack of a gold standard suitable for this scenario, which drives researchers to use diverse data sets and apply distinct evaluation strategies to them, thus rendering comparison among different works unfeasible. We present CESSM (Collaborative Evaluation of Semantic Similarity Measures), an online tool for the collaborative and automated evaluation of semantic similarity measures in the context of GO. CESSM allows researchers to compare the performance of their novel semantic similarity measures against several existing ones, using the same protein and annotation dataset and according to three distinct aspects: relation with sequence, EC class and Pfam family similarities.
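
The idea of scoring several measures against one reference similarity over the same protein pairs can be illustrated with a minimal sketch. It assumes Pearson correlation as the statistic; the text above does not say which statistic CESSM itself uses. The protein identifiers, scores, and the function names `pearson` and `compare_measures` are hypothetical and serve only as illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def compare_measures(reference, measures):
    """Correlate each measure with the reference over the same protein pairs.

    reference: {(protein_a, protein_b): score}, e.g. sequence similarity
    measures:  {measure_name: {(protein_a, protein_b): score}}
    """
    pairs = sorted(reference)  # fixed pair order shared by all measures
    ref_scores = [reference[p] for p in pairs]
    return {
        name: pearson([scores[p] for p in pairs], ref_scores)
        for name, scores in measures.items()
    }


# Hypothetical toy data: three protein pairs, one reference, two measures.
sequence_similarity = {("P1", "P2"): 0.9, ("P1", "P3"): 0.2, ("P2", "P3"): 0.4}
candidate_measures = {
    "measure_a": {("P1", "P2"): 0.8, ("P1", "P3"): 0.1, ("P2", "P3"): 0.5},
    "measure_b": {("P1", "P2"): 0.3, ("P1", "P3"): 0.6, ("P2", "P3"): 0.5},
}

if __name__ == "__main__":
    for name, r in compare_measures(sequence_similarity, candidate_measures).items():
        print(f"{name}: r = {r:.3f}")
```

Because every measure is scored on the same pairs against the same reference, the resulting coefficients are directly comparable. The same function could be reused with an EC-based or Pfam-based reference in place of sequence similarity.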

Similar resources

Disjunctive shared information between ontology concepts: application to Gene Ontology

BACKGROUND The large-scale effort in developing, maintaining and making biomedical ontologies available motivates the application of similarity measures to compare ontology concepts or, by extension, the entities described therein. A common approach, known as semantic similarity, compares ontology concepts through the information content they share in the ontology. However, different disjunctiv...

Use of Semantic Similarity and Web Usage Mining to Alleviate the Drawbacks of User-Based Collaborative Filtering Recommender Systems

One of the best-known recommendation methods is user-based Collaborative Filtering (CF). This approach compares the active user's item ratings with the historical rating records of other users to find similar users, and recommends items that seem interesting to these similar users and have not yet been rated by the active user. As a way of computing recommendations, the ultimate goal of the user-ba...

Presentation of an efficient automatic short answer grading model based on combination of pseudo relevance feedback and semantic relatedness measures

Automatic short answer grading (ASAG) is the automated process of assessing natural-language answers using computational methods and machine learning algorithms. The development of large-scale smart education systems on the one hand, and the importance of assessment as a key factor in the learning process together with the challenges it faces on the other, have significantly increased the need for ...

Improving the Measurement of Semantic Similarity between Gene Ontology Terms and Gene Products: Insights from an Edge- and IC-Based Hybrid Method

BACKGROUND Explicit comparisons based on the semantic similarity of Gene Ontology terms provide a quantitative way to measure the functional similarity between gene products and are widely applied in large-scale genomic research via integration with other models. Previously, we presented an edge-based method, Relative Specificity Similarity (RSS), which takes the global position of relevant ter...

Publication date: 2009