Accuracy Based Feature Ranking Metric for Multi-Label Text Classification

Authors

  • Muhammad Nabeel Asim
  • Abdur Rehman
  • Umar Shoaib
Abstract

In many application domains, such as machine learning, scene and video classification, data mining, medical diagnosis, and machine vision, instances belong to more than one category. Feature selection in single-label text classification reduces the dimensionality of datasets by filtering out irrelevant and redundant features. Dimensionality reduction in multi-label classification is a different scenario, because features may belong to more than one class. The label and instance spaces are growing rapidly with the expansion of the Internet, which is challenging for multi-label classification (MLC), and feature selection is crucial for reducing data in MLC. Method adaptation and dataset transformation are the two techniques used to select features in multi-label text classification. In this paper, we present a dataset transformation technique to reduce the dimensionality of multi-label text data. We use two problem transformation approaches, Binary Relevance (BR) and Label Powerset (LP), to transform the data from multi-label to single-label. Feature selection follows the filter approach, which uses the data itself to decide the importance of features without applying a learning algorithm. We use a simple measure, ACC2, for feature selection in multi-label text data: problem transformation lets us apply single-label feature selection measures to multi-label text data, and we compare ACC2 with two other feature selection methods, information gain (IG) and the Relief measure. Experiments are conducted on three benchmark datasets, and the empirical evaluation results are reported. ACC2 performs better than IG and Relief in 80% of the cases in our experiments.

Keywords: Binary relevance (BR); label powerset (LP); ACC2; information gain (IG); Relief-F (RF)
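The transformation and scoring steps named in the abstract can be illustrated with a minimal sketch. The abstract does not give the ACC2 formula; the code assumes the definition commonly used in the feature selection literature, |tpr - fpr|, where tpr and fpr are the fractions of positive and negative documents containing the term. The toy corpus, the function names, the one-vs-rest handling of Label Powerset classes, and the choice to aggregate per-label scores by taking the maximum are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Dict, Hashable, List, Set, Tuple

# Hypothetical toy corpus: each document is a set of terms with a set of labels.
DOCS: List[Set[str]] = [
    {"goal", "match"},
    {"goal", "vote"},
    {"vote", "election"},
    {"match", "weather"},
    {"election", "weather"},
]
LABELS: List[Set[str]] = [
    {"sport"},
    {"sport", "politics"},
    {"politics"},
    {"sport"},
    {"politics"},
]


def binary_relevance(labels: List[Set[str]]) -> Dict[Hashable, List[int]]:
    """BR: one binary target per label (1 if the document carries that label)."""
    names = sorted(set().union(*labels))
    return {name: [int(name in ls) for ls in labels] for name in names}


def label_powerset(labels: List[Set[str]]) -> Dict[Hashable, List[int]]:
    """LP: each distinct label set becomes one class, returned here as
    one-vs-rest binary targets (an assumption, to reuse the same scorer)."""
    classes = [frozenset(ls) for ls in labels]
    return {cls: [int(c == cls) for c in classes] for cls in set(classes)}


def acc2(term: str, docs: List[Set[str]], target: List[int]) -> float:
    """Assumed ACC2 definition: |tpr - fpr| of a term against a binary target."""
    pos = sum(target)
    neg = len(target) - pos
    tp = sum(1 for d, y in zip(docs, target) if y == 1 and term in d)
    fp = sum(1 for d, y in zip(docs, target) if y == 0 and term in d)
    tpr = tp / pos if pos else 0.0
    fpr = fp / neg if neg else 0.0
    return abs(tpr - fpr)


def rank_terms(
    docs: List[Set[str]], targets: Dict[Hashable, List[int]]
) -> List[Tuple[str, float]]:
    """Score each term by its best ACC2 over all targets (max is an assumption),
    then sort by descending score, breaking ties alphabetically."""
    vocab = sorted(set().union(*docs))
    scores = {t: max(acc2(t, docs, y) for y in targets.values()) for t in vocab}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    for term, score in rank_terms(DOCS, binary_relevance(LABELS)):
        print(f"{term:10s} {score:.3f}")
```

In this toy corpus the term "weather" occurs about equally often with both labels, so it ranks last under the Binary Relevance targets. A filter method would then keep only the top-ranked terms before any classifier is trained.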

Similar resources

Multi-topic Text Categorization Based on Ranking Approach

This paper is devoted to the multi-topic (multilabel) text classification problem. We propose two methods for reduction from ranking to the multi-label case. Unlike existing multi-label classification methods based on reduction from the ranking problem, where the complex classification (threshold) function is defined on the input feature space, in our approach we propose the construction of s...

Feature ranking for multi-label classification using predictive clustering trees

In this work, we present a feature ranking method for multilabel data. The method is motivated by practically relevant multilabel applications, such as semantic annotation of images and videos, functional genomics, and music and text categorization. We propose a feature ranking method based on random forests. Considering the success of the feature ranking using random forest in the task...

A New Framework for Distributed Multivariate Feature Selection

Feature selection is considered an important issue in the classification domain. Selecting good features by maximizing relevance to the class label and minimizing redundancy among features improves classification accuracy. However, most current feature selection algorithms work only with centralized methods. In this paper, we suggest a distributed version of the mRMR featu...

MLIFT: Enhancing Multi-label Classifier with Ensemble Feature Selection

Multi-label classification has gained significant attention in recent years, due to the increasing number of modern applications associated with multi-label data. Despite its short life, different approaches have been presented to solve the task of multi-label classification. LIFT is a multi-label classifier which takes a new approach to multi-label learning by leveraging label-specific ...

Exploiting Associations between Class Labels in Multi-label Classification

Multi-label classification has many applications in text categorization, biology, and medical diagnosis, in which multiple class labels can be assigned to each training instance simultaneously. As there are often relationships between the labels, extracting the existing relationships between the labels and taking advantage of them during the training or prediction phases ...

Publication date: 2017