Opinion Formation Inspired Search Strategies for Feature Selection
Authors
Abstract
The problem of finding a feature subset that corresponds to a sufficiently small classification error is intensively studied in pattern recognition. This thesis focuses on the wrapper approach to feature selection, which uses an estimate of the classification error as the search criterion. A novel family of optimization methods is introduced, created through simple modifications of opinion formation models known from computational psychology. The novel algorithmic framework, called Social Impact Theory based Optimization (SITO), is based on a set of simple agents. Each agent persuades the others and tries to change their attitudes about the solution of an optimization problem. The optimization capabilities of the novel methods are demonstrated experimentally. Further, one particular instance of the SITO framework, called simplified SITO (SSITO), is applied to wrapper feature selection and compared to other techniques. Although it is very simple and has a small number of parameters, the SSITO method finds significantly better criterion values than the other common optimization techniques. Although its superiority in terms of testing error is statistically significant only when compared to sequential forward floating selection, it evidently outperforms the other techniques as well. The difference between criterion values and testing-error values is caused by the inaccuracy of the error estimate used as the search criterion. Therefore, the second part of the thesis mainly focuses on improvements of the feature selection criterion. We consider in particular the case of a small sample size with a few hundred instances, which is common in many practical applications. We propose the use of complete cross-validation for the 1-nearest-neighbor classifier as the feature selection criterion. Furthermore, a technique for calculating the complete bootstrap for the 1-nearest-neighbor classifier is derived.
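The abstract describes SITO only at a high level: agents hold attitudes about a candidate solution and persuade one another. The following is an illustrative sketch, not the thesis's actual algorithm; the binary attitude encoding, the fitness-weighted "impact" of supporting versus opposing agents, and the probabilistic flip rule are all assumptions made here for the sake of a runnable example.

```python
import random

def sito_sketch(fitness, n_bits, n_agents=10, iters=50, seed=0):
    """Illustrative SITO-style optimizer sketch (maximizes `fitness`).

    Each agent holds a binary attitude vector. For every bit, the
    persuasive impact of agents holding the opposite value (assumed here
    to be proportional to their fitness) is compared with the impact of
    agents sharing the value, and the bit flips with probability equal
    to the opposing share of the total impact.
    """
    rng = random.Random(seed)
    agents = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_agents)]
    fits = [fitness(a) for a in agents]
    best = max(zip(fits, agents))  # (fitness, attitudes) of the best agent so far
    for _ in range(iters):
        new_agents = []
        for i, a in enumerate(agents):
            b = a[:]
            for d in range(n_bits):
                oppose = sum(f for j, (f, o) in enumerate(zip(fits, agents))
                             if j != i and o[d] != a[d])
                support = sum(f for j, (f, o) in enumerate(zip(fits, agents))
                              if j != i and o[d] == a[d])
                total = oppose + support
                if total > 0 and rng.random() < oppose / total:
                    b[d] = 1 - b[d]  # agent is persuaded to change this attitude
            new_agents.append(b)
        agents = new_agents
        fits = [fitness(a) for a in agents]
        cand = max(zip(fits, agents))
        if cand > best:
            best = cand
    return best

# toy usage: maximize the number of ones in an 8-bit vector
f, x = sito_sketch(sum, n_bits=8)
```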
The complete bootstrap and complete cross-validation error estimates, which have lower variance, are applied as novel selection criteria and compared with the standard bootstrap and cross-validation in combination with different optimization techniques. It is shown that the novel estimates significantly improve the testing errors. The second improvement is a reduced initialization, which makes the evaluated feature subsets smaller and thus speeds up feature selection. Finally, the benefits and properties of our approaches are also demonstrated on two important and novel real-world biomedical applications.
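To make the wrapper idea concrete: for a 1-nearest-neighbor classifier, the ordinary leave-one-out error needs no refitting, since each sample is simply classified by its nearest other sample in the candidate feature subspace. The sketch below uses that estimate as the criterion inside a greedy sequential forward search. It shows plain leave-one-out cross-validation, not the complete cross-validation or complete bootstrap derived in the thesis, and the function names are illustrative.

```python
import numpy as np

def loo_error_1nn(X, y):
    """Leave-one-out error of a 1-nearest-neighbor classifier."""
    # pairwise squared Euclidean distances in the current feature subspace
    d = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    np.fill_diagonal(d, np.inf)   # a sample may not be its own neighbor
    nn = np.argmin(d, axis=1)     # index of the nearest other sample
    return float(np.mean(y[nn] != y))

def sfs_wrapper(X, y, k):
    """Greedy sequential forward selection with LOO-1NN error as criterion."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k:
        errs = [(loo_error_1nn(X[:, selected + [f]], y), f) for f in remaining]
        err, f = min(errs)        # add the feature giving the lowest error
        selected.append(f)
        remaining.remove(f)
    return selected, err
```

On a toy data set where the first feature separates the classes and the second is noise, e.g. `X = np.array([[0.0, 5.0], [0.1, 1.0], [1.0, 9.0], [1.1, 2.0]])` with `y = np.array([0, 0, 1, 1])`, `sfs_wrapper(X, y, 1)` selects the informative feature.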
Similar resources
Optimal Feature Selection for Data Classification and Clustering: Techniques and Guidelines
In this paper, principles and existing feature selection methods for classifying and clustering data are introduced. To that end, categorizing frameworks for finding selected subsets, namely search-based and non-search-based procedures, as well as evaluation criteria and data mining tasks, are discussed. In the following, a platform is developed as an intermediate step toward developing an intell...
An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification
The Internet provides easy access to all kinds of library resources. However, classifying documents within a large amount of data is still an issue, and finding particular documents demands time and energy. Classifying similar documents into specific classes can reduce the time needed to search for the required data, particularly text documents. This is further facilitated by using Artificial...
Feature Selection Ensemble
Many strategies have been exploited for the task of feature selection, in an effort to identify more compact and better-quality feature subsets. Such techniques typically involve the use of an individual feature significance evaluation, or a measurement of feature subset consistency, working together with a search algorithm in order to determine a quality subset. Feature selection ensemble ai...
Journal title:
Volume  Issue
Pages: -
Publication date: 2003