IFSB-ReliefF: A New Instance and Feature Selection Algorithm Based on ReliefF

Abstract:

The growing use of the Internet and of technologies such as sensor networks has led to a large, and partly unnecessary, increase in the volume of information. Although this data has many benefits, it also creates problems: it demands more storage space and more powerful processors, and it must be refined to remove unnecessary records. Data reduction methods provide ways to select useful data from a large amount of duplicate, incomplete, and redundant data, and they are usually applied in the pre-processing phase of machine learning algorithms. Three types of data reduction can be applied: (1) feature reduction, (2) instance reduction, and (3) discretization of feature values. In this paper, a new algorithm based on ReliefF is introduced that reduces both instances and features. The proposed algorithm handles nominal and numeric features as well as data sets with missing values. In addition, the number of instances it selects from each class is proportional to the prior probability of that class. The algorithm can run in parallel on a multi-core CPU, which decreases the runtime significantly and makes it applicable to big data sets. One type of instance reduction is instance selection. Designing an instance selection algorithm raises several issues: how to represent the reduced set, how to build the subset of instances, which distance function to use, how to evaluate the reduction algorithm, how large the reduced data set should be, and how to identify critical and border instances. A subset of instances can be built in three ways: (1) incremental, (2) decremental, and (3) batch. In this paper, we use the batch approach. Another important issue is measuring the similarity of instances with a distance function; we use the Jaccard index and the Manhattan distance. A further issue is deciding how many instances, and of what kind, should be removed and which must remain.
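The two similarity measures and the class-proportional selection named above can be illustrated with a minimal Python sketch. The abstract does not say how the paper combines the measures or rounds the per-class counts, so everything here is an assumption for illustration: the function names are hypothetical, the Jaccard index is applied to sets of nominal values, the Manhattan distance to numeric vectors, and leftover slots go to the classes with the largest fractional share.

```python
from collections import Counter


def manhattan(a, b):
    """Manhattan (L1) distance between two numeric vectors of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b))


def jaccard_index(a, b):
    """Jaccard index of two collections of nominal values, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumption: two empty sets count as identical
    return len(a & b) / len(a | b)


def class_quotas(labels, keep):
    """Split `keep` retained instances across classes in proportion to class priors."""
    n = len(labels)
    counts = Counter(labels)
    exact = {c: keep * cnt / n for c, cnt in counts.items()}
    quotas = {c: int(v) for c, v in exact.items()}  # round down first
    # Give the remaining slots to the classes with the largest fractional part.
    leftover = keep - sum(quotas.values())
    by_fraction = sorted(exact, key=lambda c: exact[c] - quotas[c], reverse=True)
    for c in by_fraction[:leftover]:
        quotas[c] += 1
    return quotas


labels = ["a"] * 7 + ["b"] * 3
print(manhattan([1, 2], [4, 0]))
print(jaccard_index({"red", "small"}, {"red", "large"}))
print(class_quotas(labels, keep=4))
```

With seven instances of class "a" and three of class "b", keeping four instances allots three to "a" and one to "b", which mirrors the 0.7 and 0.3 class priors as closely as whole numbers allow.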
The goal of this paper is to reduce the size of the stored set of instances while maintaining the quality of the data set. We therefore remove highly similar and non-border instances according to a specified reduction rate. The other type of data reduction performed by our algorithm is feature selection. Feature selection methods fall into three categories: wrapper methods, filter methods, and hybrid methods. Many feature selection algorithms have been introduced, and they can be classified by several criteria. For example, based on the type of search for the optimal feature subset, they can be divided into exponential search, sequential search, and random search. The usefulness and relevance of a feature or a subset of features is assessed with evaluation measures, which are grouped by metric type, such as distance, accuracy, consistency, and information. ReliefF is a feature selection algorithm that calculates a weight for each feature and ranks the features accordingly. This paper, however, uses ReliefF to rank both instances and features. The algorithm works as follows. First, the nearest neighbors of each instance are found. Then, based on the evaluation function, a weight is calculated for each instance and each feature. Finally, the instances and features with higher weights are retained and the rest are eliminated. The IFSB-ReliefF (Instance and Feature Selection Based on ReliefF) algorithm is tested on two data sets, and the C4.5 algorithm then classifies the reduced data. Finally, the classification results on the reduced data sets are compared with the results of several instance selection and feature selection algorithms that are run separately.
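The feature-weighting step described above can be sketched with a simplified version of the standard ReliefF update. This is not the paper's IFSB-ReliefF: the abstract does not give the evaluation function used for instances, so the sketch covers feature weights only, assumes numeric features with no missing values, visits every instance once instead of sampling, and uses hypothetical names (`relieff_weights`, `top_features`).

```python
from collections import Counter


def relieff_weights(X, y, k=2):
    """Simplified ReliefF feature weights for numeric data (every instance used once)."""
    n, d = len(X), len(X[0])
    # Range of each feature, used to scale differences into [0, 1].
    spans = []
    for a in range(d):
        col = [row[a] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(a, i, j):
        return abs(X[i][a] - X[j][a]) / spans[a]

    def dist(i, j):
        # Manhattan distance on range-scaled feature values.
        return sum(diff(a, i, j) for a in range(d))

    priors = {c: cnt / n for c, cnt in Counter(y).items()}
    w = [0.0] * d
    for i in range(n):
        for c, p in priors.items():
            # k nearest neighbors of instance i within class c.
            members = [j for j in range(n) if j != i and y[j] == c]
            neigh = sorted(members, key=lambda j: dist(i, j))[:k]
            if not neigh:
                continue
            if c == y[i]:
                scale = -1.0  # near hits: differing values lower the weight
            else:
                scale = p / (1.0 - priors[y[i]])  # near misses, weighted by class prior
            for j in neigh:
                for a in range(d):
                    w[a] += scale * diff(a, i, j) / (n * len(neigh))
    return w


def top_features(w, m):
    """Indices of the m highest-weighted features, best first."""
    return sorted(range(len(w)), key=lambda a: w[a], reverse=True)[:m]


# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.8, 0.8], [0.9, 0.2], [1.0, 0.5]]
y = [0, 0, 0, 1, 1, 1]
weights = relieff_weights(X, y, k=2)
print(weights, top_features(weights, 1))
```

On this toy data the class-separating feature 0 receives a positive weight and the noise feature 1 a negative one, so feature 0 ranks first. Retaining the highest-weighted features and dropping the rest corresponds to the final elimination step in the description.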

Similar resources

ReliefF-based Multi-label Feature Selection

In recent years, multi-label learning has been used to deal with data attributed to multiple labels simultaneously and has been increasingly applied to various applications. Like many other machine learning tasks, multi-label learning also suffers from the curse of dimensionality, so extracting good features using multiple labels of the datasets becomes an important step prior to classification. ...

ReliefF-MI: An extension of ReliefF to multiple instance learning

In machine learning the so-called curse of dimensionality, pertinent to many classification algorithms, denotes the drastic increase in computational complexity and classification error with data having a great number of dimensions. In this context, feature selection techniques try to reduce dimensionality finding a new more compact representation of instances selecting the most informative fea...

ReliefMSS: a variation on a feature ranking ReliefF algorithm

Relief algorithms are successful attribute estimators. They are able to detect conditional dependencies between attributes and provide a unified view on the attribute estimation. In this paper, we propose a variant of ReliefF algorithm: ReliefMSS. We analyse the ReliefMSS parameters and compare ReliefF and ReliefMSS performances as regards the number of iterations, the number of random attribut...

ReliefF-Based EEG Sensor Selection Methods for Emotion Recognition

Electroencephalogram (EEG) signals recorded from sensor electrodes on the scalp can directly detect the brain dynamics in response to different emotional states. Emotion recognition from EEG signals has attracted broad attention, partly due to the rapid development of wearable computing and the needs of a more immersive human-computer interface (HCI) environment. To improve the recognition perf...

A Computer-Aided Diagnosis System for Dynamic Contrast-Enhanced MR Images Based on Level Set Segmentation and ReliefF Feature Selection

This study established a fully automated computer-aided diagnosis (CAD) system for the classification of malignant and benign masses via breast magnetic resonance imaging (BMRI). A breast segmentation method consisting of a preprocessing step to identify the air-breast interfacing boundary and curve fitting for chest wall line (CWL) segmentation was included in the proposed CAD system. The Chan...

Malicious Detection Based on ReliefF and Boosting Multidimensional Features

Aiming at the problem of large overhead and low accuracy on the identification of obfuscated and malicious code, a new algorithm is proposed to detect malicious code by identifying multidimensional features based on ReliefF and Boosting techniques. After a disassembly analysis and static analysis for the clustered malicious code families, the algorithm extracts features from four dimensions: ...

Volume 17, Issue 4

Pages 49-66

Publication date: 2021-02
