Decision Trees for Hierarchical Multilabel Classification: A Case Study in Functional Genomics

Authors

  • Hendrik Blockeel
  • Leander Schietgat
  • Jan Struyf
  • Saso Dzeroski
  • Amanda Clare
Abstract

Hierarchical multilabel classification (HMC) is a variant of classification where instances may belong to multiple classes organized in a hierarchy. The task is relevant for several application domains. This paper presents an empirical study of decision tree approaches to HMC in the area of functional genomics. We compare learning a single HMC tree (which makes predictions for all classes together) to learning a set of regular classification trees (one for each class). Interestingly, on all 12 datasets we use, the HMC tree wins on all fronts: it is faster to learn and to apply, easier to interpret, and its predictive performance is similar to or better than that of the set of regular trees. It turns out that HMC tree learning is more robust to overfitting than regular tree learning.
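
The difference between the two setups compared in the abstract can be illustrated by how the training targets are laid out. The following is a minimal sketch, not the authors' implementation: the toy hierarchy, the class codes, and all function names are hypothetical, and it assumes the usual hierarchy constraint that an instance belonging to a class also belongs to all of that class's ancestors. A single HMC tree would be trained on one label vector per instance, while the set of regular trees would be trained on one separate binary target per class.

```python
# Hypothetical toy hierarchy: child -> parent (None marks a top-level class).
PARENT = {
    "1": None,
    "1/1": "1",
    "1/2": "1",
    "2": None,
    "2/1": "2",
}
CLASSES = sorted(PARENT)  # fixed class order used for every label vector


def with_ancestors(labels):
    """Close a label set under the hierarchy (assumed constraint:
    membership in a class implies membership in all its ancestors)."""
    closed = set()
    for label in labels:
        while label is not None:
            closed.add(label)
            label = PARENT[label]
    return closed


def to_vector(labels):
    """Target for a single HMC model: one 0/1 vector over all classes."""
    closed = with_ancestors(labels)
    return [1 if c in closed else 0 for c in CLASSES]


def to_binary_tasks(dataset):
    """Targets for the per-class alternative: one binary list per class."""
    vectors = [to_vector(labels) for labels in dataset]
    return {c: [v[i] for v in vectors] for i, c in enumerate(CLASSES)}


# Three hypothetical instances, each annotated with its most specific classes.
dataset = [{"1/1", "2"}, {"1/2"}, {"2/1"}]
hmc_targets = [to_vector(labels) for labels in dataset]
binary_tasks = to_binary_tasks(dataset)
```

With five classes, the per-class setup yields five separate learning problems over the same instances, whereas the HMC setup yields one problem with a five-component target. This is only the data layout; how the tree itself is grown is not shown here.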

Similar resources

Evaluation of Distance Measures for Hierarchical Multi-label Classification in Functional Genomics

Hierarchical multi-label classification (HMLC) is a variant of classification where instances may belong to multiple classes that are organized in a hierarchy. The approach we used is based on decision trees and is set in the predictive clustering trees (PCTs) framework, which is implemented in the CLUS system. In this work, we investigate how different distance measures for hierarchies i...


Weighted True Path Rule: a multilabel hierarchical algorithm for gene function prediction

The genome-wide hierarchical classification of gene functions, using biomolecular data from high-throughput biotechnologies, is one of the central topics in bioinformatics and functional genomics. In this paper we present a multilabel hierarchical algorithm inspired by the “true path rule” that governs both the Gene Ontology and the Functional Catalogue (FunCat). In particular we propose an enh...


Feature ranking for multi-label classification using predictive clustering trees

In this work, we present a feature ranking method for multilabel data. The method is motivated by practically relevant multilabel applications, such as semantic annotation of images and videos, functional genomics, and music and text categorization. We propose a feature ranking method based on random forests. Considering the success of the feature ranking using random forest in the task...


Hierarchical Multi-classification with Predictive Clustering Trees in Functional Genomics

This paper investigates how predictive clustering trees can be used to predict gene function in the genome of the yeast Saccharomyces cerevisiae. We consider the MIPS FunCat classification scheme, in which each gene is annotated with one or more classes selected from a given functional class hierarchy. This setting presents two important challenges to machine learning: (1) each instance is labe...


Hierarchical Multilabel Classification Trees for Gene Function Prediction (Extended Abstract)

Prediction of gene function is a so-called hierarchical multilabel classification (HMC) task: a single instance can be labelled with multiple classes rather than just one (i.e., a gene can have multiple functions), and these classes are organized in a hierarchy. Many machine learning methods focus on learning predictive models with a single target variable. One can then learn to predict all cla...


Publication date: 2006