Implicit Feature Selection with the Value Difference Metric
Abstract
The nearest neighbour paradigm provides an effective approach to supervised learning. However, it is especially susceptible to the presence of irrelevant attributes. Whilst many approaches have been proposed that select only the most relevant attributes within a data set, these approaches involve pre-processing the data in some way, and can often be computationally complex. The Value Difference Metric (VDM) is a symbolic distance metric used by a number of different nearest neighbour learning algorithms. This paper demonstrates how the VDM can be used to reduce the impact of irrelevant attributes on classification accuracy without the need for pre-processing the data. We illustrate how this metric uses simple probabilistic techniques to weight features in the instance space, and then apply this weighting technique to an alternative symbolic distance metric. The resulting distance metrics are compared in terms of classification accuracy, on a number of real-world and artificial data sets.
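The weighting idea described in the abstract can be illustrated with a minimal Python sketch. It assumes a commonly cited formulation of the VDM, in which the distance between two symbolic values of an attribute is the sum over classes of |P(c | value 1) - P(c | value 2)|^q, optionally scaled by a per-value weight equal to the square root of the sum of squared class probabilities. The function names `fit_vdm` and `vdm_distance`, the exponent `q=2`, the `weighted` switch, and the treatment of unseen values are illustrative assumptions and may differ from the exact variant studied in the paper.

```python
from collections import Counter, defaultdict
from math import sqrt


def fit_vdm(rows, labels):
    """Estimate P(class | attribute a takes value v) from training data."""
    classes = sorted(set(labels))
    counts = defaultdict(Counter)  # (attribute index, value) -> class counts
    for row, label in zip(rows, labels):
        for a, v in enumerate(row):
            counts[(a, v)][label] += 1
    probs = {}
    for key, ctr in counts.items():
        total = sum(ctr.values())
        probs[key] = [ctr[c] / total for c in classes]
    return probs, len(classes)


def vdm_distance(x, y, probs, n_classes, q=2, weighted=True):
    """Sum of per-attribute value differences between instances x and y."""
    # Assumption: a value never seen in training gets an all-zero distribution.
    unseen = [0.0] * n_classes
    dist = 0.0
    for a, (xv, yv) in enumerate(zip(x, y)):
        px = probs.get((a, xv), unseen)
        py = probs.get((a, yv), unseen)
        delta = sum(abs(p - r) ** q for p, r in zip(px, py))
        # Optional weight: larger when the value of x strongly predicts a class.
        w = sqrt(sum(p * p for p in px)) if weighted else 1.0
        dist += w * delta
    return dist


# Toy data: attribute 0 determines the class, attribute 1 is irrelevant.
rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
labels = [0, 0, 1, 1]
probs, n_classes = fit_vdm(rows, labels)

d_irrelevant = vdm_distance(("a", "x"), ("a", "y"), probs, n_classes)
d_relevant = vdm_distance(("a", "x"), ("b", "x"), probs, n_classes)
print(d_irrelevant, d_relevant)  # 0.0 2.0
```

In this toy data set both values of the irrelevant attribute have the same class distribution, so a difference in that attribute adds nothing to the distance, whereas a difference in the relevant attribute adds the maximum. No separate feature selection step is needed for that effect, which is the behaviour the abstract refers to as implicit feature selection.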
Similar resources
Ensemble Classification and Extended Feature Selection for Credit Card Fraud Detection
With the rise of technology, the possibility of fraud in areas such as banking has increased. Credit card fraud is a crucial problem in banking, and its danger is ever increasing. This paper proposes an advanced data mining method that considers both feature selection and decision cost to improve the accuracy of credit card fraud detection. After selecting the best and most effec...
Implicit Feature Selection with the Value Difference Metric
The nearest neighbour paradigm provides an effective approach to supervised learning. However, it is especially susceptible to the presence of irrelevant attributes. Whilst many approaches have been proposed that select only the most relevant attributes within a data set, these approaches involve pre-processing the data in some way, and can often be computationally complex. The Value Difference M...
Modeling and design of a diagnostic and screening algorithm based on hybrid feature selection-enabled linear support vector machine classification
Background: In the current study, a hybrid feature selection approach involving filter and wrapper methods is applied to several bioscience databases with various records, attributes and classes; this strategy therefore enjoys the advantages of both methods, such as fast execution, generality, and accuracy. The purpose is to diagnose disease status and estimate patient survival. Method...
A New Hybrid Feature Subset Selection Algorithm for the Analysis of Ovarian Cancer Data Using Laser Mass Spectrum
Introduction: A major problem in the treatment of cancer is the lack of an appropriate method for the early diagnosis of the disease. The chemical reaction within an organ may be reflected in the form of proteomic patterns in the serum, sputum, or urine. Laser mass spectrometry is a valuable tool for extracting the proteomic patterns from biological samples. A major challenge in extracting such ...
Evaluation of Classifiers in Software Fault-Proneness Prediction
The reliability of software depends on its fault-prone modules: the fewer fault-prone units the software contains, the more we may trust it. Therefore, if we are able to predict the number of fault-prone modules in a piece of software, it will be possible to judge its reliability. In predicting fault-prone software modules, one of the contributing features is software metric by which one ...
Publication date: 1998