The Gene Ontology Categorizer
Authors
Abstract
The Gene Ontology Categorizer, developed jointly by the Los Alamos National Laboratory and Procter & Gamble Corp., provides a capability for the categorization task in the Gene Ontology (GO): given a list of genes of interest, what are the best nodes of the GO to summarize or categorize that list? The motivating question is from a drug discovery process, where after some gene expression analysis experiment, we wish to understand the overall effect of some cell treatment or condition by identifying 'where' in the GO the differentially expressed genes fall: 'clustered' together in one place? in two places? uniformly spread throughout the GO? 'high', or 'low'? In order to address this need, we view bio-ontologies more as combinatorially structured databases than facilities for logical inference, and draw on the discrete mathematics of finite partially ordered sets (posets) to develop data representation and algorithms appropriate for the GO. In doing so, we have laid the foundations for a general set of methods to address not just the categorization task, but also other tasks (e.g. distances in ontologies and ontology merger and exchange) in both the GO and other bio-ontologies (such as the Enzyme Commission database or the MEdical Subject Headings) cast as hierarchically structured taxonomic knowledge systems.
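The categorization task described above can be illustrated on a toy poset. The sketch below is not the Gene Ontology Categorizer's actual scoring method, which the abstract does not specify; it assumes a simple stand-in score (coverage of the query list multiplied by specificity of the node), and the ontology, gene names, and function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy ontology: each node maps to its parents.
# It is a DAG rather than a tree, so a node may have several parents.
PARENTS = {
    "root": [],
    "metabolism": ["root"],
    "signaling": ["root"],
    "lipid_metabolism": ["metabolism"],
    "sterol_metabolism": ["lipid_metabolism"],
    "kinase_signaling": ["signaling"],
}

# Hypothetical direct annotations: gene -> nodes it is annotated to.
ANNOTATIONS = {
    "g1": {"sterol_metabolism"},
    "g2": {"lipid_metabolism"},
    "g3": {"sterol_metabolism"},
    "g4": {"kinase_signaling"},
    "g5": {"metabolism"},
}


def up_set(node, parents):
    """Return the node together with all of its ancestors in the poset."""
    seen = set()
    stack = [node]
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.add(cur)
            stack.extend(parents[cur])
    return seen


def genes_at_or_below(parents, annotations):
    """Map each node to the genes annotated to it or to any descendant."""
    covered = defaultdict(set)
    for gene, nodes in annotations.items():
        for node in nodes:
            for ancestor in up_set(node, parents):
                covered[ancestor].add(gene)
    return covered


def rank_nodes(query, parents, annotations):
    """Rank nodes by an assumed score: coverage of the query times specificity."""
    covered = genes_at_or_below(parents, annotations)
    query = set(query)
    scores = []
    for node in parents:
        hits = covered[node] & query
        if not hits:
            continue
        coverage = len(hits) / len(query)            # share of the query under this node
        specificity = len(hits) / len(covered[node])  # share of the node's genes in the query
        scores.append((node, coverage * specificity))
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for node, score in rank_nodes(["g1", "g2", "g3"], PARENTS, ANNOTATIONS):
        print(f"{node}: {score:.3f}")
```

For the query g1, g2, g3, the node lipid_metabolism ranks first under this assumed score: it covers every query gene and nothing else, whereas root also covers the whole query but is penalized for being too general ('high'), and sterol_metabolism is specific but misses g2 ('low').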
Similar resources
POSOLE: Automated Ontological Annotation for Function Prediction
The system we have developed is called POSOLE, or the POSet Ontology Laboratory Environment. POSOLE consists of a set of modules supporting ontology representation, categorization of nodes in the ontology, and analysis. The analysis modules provide support for analysis of the ontological structure, the structure of input queries to the categorization module with respect to that structure, and t...
Automatic assignment of biomedical categories: toward a generic approach
MOTIVATION: We report on the development of a generic text categorization system designed to automatically assign biomedical categories to any input text. Unlike usual automatic text categorization systems, which rely on data-intensive models extracted from large sets of training data, our categorizer is largely data-independent. METHODS: In order to evaluate the robustness of our approach we t...
Pathogens and Genome Normalization for Literature-based Knowledge Discovery
We present a new approach for pathogen and gene product normalization in the biomedical literature. The idea of this approach was motivated by needs such as literature curation, in particular applied to the field of infectious diseases; thus, variants of bacterial species (S. aureus, Staphylococcus aureus, ...) and their gene products (protein ArsC, Arsenical pump modifier, Arsenate reductase, ...
Identification and prioritization genes related to Hypercholesterolemia QTLs using gene ontology and protein interaction networks
Gene identification represents the first step to a better understanding of the physiological role of the underlying protein and disease pathways, which in turn serves as a starting point for developing therapeutic interventions. Familial hypercholesterolemia is a hereditary metabolic disorder characterized by high low-density lipoprotein cholesterol levels. Hypercholesterolemia is a quantitativ...
ToxiCat: Hybrid Named Entity Recognition services to support curation of the Comparative Toxicogenomic Database
We report on the original implementation of named entity recognition (NER) modules based on an automatic text categorization pipeline, so-called ToxiCat (Toxicogenomic Categorizer), developed to perform biomedical documents classification and prioritization for the previous Biocreative campaign in order to speed up the curation of the Comparative Toxicogenomics Database (CTD). ToxiCat NER modul...
Journal title:
- Bioinformatics
Volume 20 Suppl 1 Issue
Pages -
Publication date 2004