Opinion Formation Inspired Search Strategies for Feature Selection
Authors
Abstract
The problem of finding a feature subset that corresponds to a sufficiently small classification error is intensively studied in pattern recognition. This thesis focuses on the wrapper approach to feature selection, which uses an estimate of the classification error as the search criterion. A novel family of optimization methods is introduced, created through simple modifications of opinion formation models known from computational psychology. The novel algorithmic framework, called Social Impact Theory based Optimization (SITO), is based on a set of simple agents. Each agent persuades the others and tries to change their attitudes about the solution of an optimization problem. The optimization capabilities of the novel methods are demonstrated experimentally. Further, one particular instance of the SITO framework, called simplified SITO (SSITO), is applied to wrapper feature selection and compared to other techniques. Although it is very simple and has a small number of parameters, the SSITO method finds significantly better criterion values than the other common optimization techniques. Although its superiority in terms of testing error is statistically significant only when compared to sequential forward floating selection, it evidently outperforms the other techniques as well. The difference between criterion values and testing-error values is caused by the inaccuracy of the error estimate used as the search criterion. Therefore, the second part of the thesis mainly focuses on improvements of the feature selection criterion. We consider in particular the case of a small sample size with a few hundred instances, which is common in many practical applications. We propose the use of complete cross-validation for the 1-nearest-neighbor classifier as the feature selection criterion. Furthermore, a technique for calculating the complete bootstrap for the 1-nearest-neighbor classifier is derived.
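The abstract describes SITO only at a high level: agents hold attitudes about a candidate solution and persuade one another. The following is an illustrative sketch, not the thesis's actual algorithm; the binary attitude encoding, the fitness-weighted "impact" of supporting versus opposing agents, and the probabilistic flip rule are all assumptions made here for the sake of a runnable example.

```python
import random

def sito_sketch(fitness, n_bits, n_agents=10, iters=50, seed=0):
    """Illustrative SITO-style optimizer sketch (maximizes `fitness`).

    Each agent holds a binary attitude vector. For every bit, the
    persuasive impact of agents holding the opposite value (assumed here
    to be proportional to their fitness) is compared with the impact of
    agents sharing the value, and the bit flips with probability equal
    to the opposing share of the total impact.
    """
    rng = random.Random(seed)
    agents = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_agents)]
    fits = [fitness(a) for a in agents]
    best = max(zip(fits, agents))  # (fitness, attitudes) of the best agent so far
    for _ in range(iters):
        new_agents = []
        for i, a in enumerate(agents):
            b = a[:]
            for d in range(n_bits):
                oppose = sum(f for j, (f, o) in enumerate(zip(fits, agents))
                             if j != i and o[d] != a[d])
                support = sum(f for j, (f, o) in enumerate(zip(fits, agents))
                              if j != i and o[d] == a[d])
                total = oppose + support
                if total > 0 and rng.random() < oppose / total:
                    b[d] = 1 - b[d]  # agent is persuaded to change this attitude
            new_agents.append(b)
        agents = new_agents
        fits = [fitness(a) for a in agents]
        cand = max(zip(fits, agents))
        if cand > best:
            best = cand
    return best

# toy usage: maximize the number of ones in an 8-bit vector
f, x = sito_sketch(sum, n_bits=8)
```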
The complete bootstrap and complete cross-validation error estimates, which have lower variance, are applied as novel selection criteria and compared with the standard bootstrap and cross-validation in combination with different optimization techniques. It is shown that the novel estimates significantly improve the testing errors. The second improvement is a reduced initialization, which makes the evaluated feature subsets smaller and thus speeds up feature selection. Finally, the benefits and properties of our approaches are also demonstrated on two important and novel real-world biomedical applications.
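To make the wrapper idea concrete: for a 1-nearest-neighbor classifier, the ordinary leave-one-out error needs no refitting, since each sample is simply classified by its nearest other sample in the candidate feature subspace. The sketch below uses that estimate as the criterion inside a greedy sequential forward search. It shows plain leave-one-out cross-validation, not the complete cross-validation or complete bootstrap derived in the thesis, and the function names are illustrative.

```python
import numpy as np

def loo_error_1nn(X, y):
    """Leave-one-out error of a 1-nearest-neighbor classifier."""
    # pairwise squared Euclidean distances in the current feature subspace
    d = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    np.fill_diagonal(d, np.inf)   # a sample may not be its own neighbor
    nn = np.argmin(d, axis=1)     # index of the nearest other sample
    return float(np.mean(y[nn] != y))

def sfs_wrapper(X, y, k):
    """Greedy sequential forward selection with LOO-1NN error as criterion."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k:
        errs = [(loo_error_1nn(X[:, selected + [f]], y), f) for f in remaining]
        err, f = min(errs)        # add the feature giving the lowest error
        selected.append(f)
        remaining.remove(f)
    return selected, err
```

On a toy data set where the first feature separates the classes and the second is noise, e.g. `X = np.array([[0.0, 5.0], [0.1, 1.0], [1.0, 9.0], [1.1, 2.0]])` with `y = np.array([0, 0, 1, 1])`, `sfs_wrapper(X, y, 1)` selects the informative feature.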
Similar resources
Optimal Feature Selection for Data Classification and Clustering: Techniques and Guidelines
In this paper, principles and existing feature selection methods for classifying and clustering data are introduced. To that end, categorizing frameworks for finding selected subsets, namely search-based and non-search-based procedures, as well as evaluation criteria and data mining tasks, are discussed. In the following, a platform is developed as an intermediate step toward developing an intell...
An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification
The Internet provides easy access to all kinds of library resources. However, classifying documents within a large amount of data is still an issue, and finding particular documents demands time and energy. Classifying similar documents into specific classes can reduce the time needed to search for the required data, particularly text documents. This is further facilitated by using Artificial...
Feature Selection Ensemble
Many strategies have been exploited for the task of feature selection, in an effort to identify more compact and better-quality feature subsets. Such techniques typically involve the use of an individual feature significance evaluation, or a measurement of feature subset consistency, working together with a search algorithm in order to determine a quality subset. Feature selection ensemble ai...
Journal title:
Volume  Issue
Pages: -
Publication date: 2003