A Comparison of Multi-label Feature Selection Methods using the Problem Transformation Approach

نویسندگان

  • Newton Spolaôr
  • Everton Alvares Cherman
  • Maria Carolina Monard
  • Huei Diana Lee
چکیده

Feature selection is an important task in machine learning, which can effectively reduce the dataset dimensionality by removing irrelevant and/or redundant features. Although a large body of research deals with feature selection in single-label data, in which measures have been proposed to filter out irrelevant features, this is not the case for multi-label data. This work proposes multi-label feature selection methods which use the filter approach. To this end, two standard multi-label feature selection approaches, which transform the multi-label data into single-label data, are used. Besides these two problem transformation approaches, we use ReliefF and Information Gain to measure the goodness of features. This gives rise to four multi-label feature selection methods. A thorough experimental evaluation of these methods was carried out on 10 benchmark datasets. Results show that ReliefF is able to select fewer features without diminishing the quality of the classifiers constructed using the features selected.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

MLIFT: Enhancing Multi-label Classifier with Ensemble Feature Selection

Multi-label classification has gained significant attention during recent years, due to the increasing number of modern applications associated with multi-label data. Despite its short life, different approaches have been presented to solve the task of multi-label classification. LIFT is a multi-label classifier which utilizes a new strategy to multi-label learning by leveraging label-specific ...

متن کامل

Accuracy Based Feature Ranking Metric for Multi-Label Text Classification

In many application domains, such as machine learning, scene and video classification, data mining, medical diagnosis and machine vision, instances belong to more than one categories. Feature selection in single label text classification is used to reduce the dimensionality of datasets by filtering out irrelevant and redundant features. The process of dimensionality reduction in multi-label cla...

متن کامل

A New Framework for Distributed Multivariate Feature Selection

Feature selection is considered as an important issue in classification domain. Selecting a good feature through maximum relevance criterion to class label and minimum redundancy among features affect improving the classification accuracy. However, most current feature selection algorithms just work with the centralized methods. In this paper, we suggest a distributed version of the mRMR featu...

متن کامل

A Classification Method for E-mail Spam Using a Hybrid Approach for Feature Selection Optimization

Spam is an unwanted email that is harmful to communications around the world. Spam leads to a growing problem in a personal email, so it would be essential to detect it. Machine learning is very useful to solve this problem as it shows good results in order to learn all the requisite patterns for classification due to its adaptive existence. Nonetheless, in spam detection, there are a large num...

متن کامل

Feature Selection for Improving Multi-Label Classification using MEKA

The extensive dimensionality in multi-label classification can be overcome by selecting representative words that describe an instance and removing the redundant and insignificant ones. The popular technique of feature selection when applied reduces the size of the dataset and hence speeds up and improves the accuracy of the learning process of classification. This paper looks at the performanc...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Electr. Notes Theor. Comput. Sci.

دوره 292  شماره 

صفحات  -

تاریخ انتشار 2013