Search results for: imbalanced data

Number of results: 2412732

2004
Chao Chen, Andy Liaw

In this paper we propose two ways to deal with the imbalanced data classification problem using random forest. One is based on cost-sensitive learning, and the other is based on a sampling technique. Performance metrics such as precision and recall, false positive rate and false negative rate, F-measure, and weighted accuracy are computed. Both methods are shown to improve the prediction accurac...

Journal: IJKESDP 2011
Hien M. Nguyen, Eric W. Cooper, Katsuari Kamei

Traditional classification algorithms often perform poorly on imbalanced data sets, in which some classes are heavily outnumbered by the remaining classes. For this kind of data, minority class instances, which are usually of much greater interest, are often misclassified. The paper proposes a method to deal with them by changing class distribution through oversampling at the borderline b...

2011
Antonio Maratea, Alfredo Petrosino

Many critical application domains present issues related to learning classification from imbalanced data. Using conventional techniques produces biased results, as the over-represented class dominates the learning process and tends to naturally attract predictions. As a consequence, the false negative rate may be unacceptable and the chosen classifier unusable. We propose a classi...

2011
Yubin Park, Joydeep Ghosh

This paper introduces a novel splitting criterion parametrized by a scalar ‘α’ to build a class-imbalance resistant ensemble of decision trees. The proposed splitting criterion generalizes information gain in C4.5, and its extended form encompasses Gini(CART) and DKM splitting criteria as well. Each decision tree in the ensemble is based on a different splitting criterion enforced by a distinct...

2012
Da Kuang, Charles X. Ling, Jun Du

Mining class-imbalanced data is a common yet challenging problem in data mining and machine learning. When the classes are imbalanced, the error rate of the rare class is usually much higher than that of the majority class. How many samples do we need in order to bound the error of the rare class (and the majority class)? If the misclassification cost of the class is known, can the cost-weighted er...

2011
Alina Lazar, Bradley Shellito

This paper describes a method of improving the prediction of urbanization. The four datasets used in this study were extracted using Geographical Information Systems (GIS). Each dataset contains seven independent variables related to urban development and a class label that distinguishes urban areas from rural areas. Two classification methods, Support Vector Machines (SVM) and Neural Netwo...

2008
Albert Orriols-Puig, Ester Bernadó-Mansilla

This chapter investigates the capabilities of XCS for mining imbalanced datasets. Initial experiments show that, for moderate and high class imbalances, XCS tends to evolve a large proportion of overgeneral classifiers. Theoretical analyses are developed, deriving an imbalance bound up to which XCS should be able to differentiate between accurate and overgeneral classifiers. Some relevant param...

Journal: CoRR 2015
Heng Wang, Zubin Abraham

Common statistical prediction models often require and assume stationarity in the data. However, in many practical applications, changes in the relationship of the response and predictor variables are regularly observed over time, resulting in the deterioration of the predictive performance of these models. This paper presents Linear Four Rates (LFR), a framework for detecting these concept dri...
