smote

Blending Propensity Score Matching and Synthetic Minority Over-sampling Technique for Imbalanced Classification

2014

William A. Rivera, Amit Goel, Peter Kincaid

Real-world data sets often contain disproportionate sample sizes of observed groups, making the task of prediction algorithms very difficult. One of the many ways to combat inherent bias from class-imbalanced data is to perform re-sampling. In this paper we discuss two popular re-sampling approaches proposed in the literature, Synthetic Minority Over-sampling Technique (SMOTE) and Propensity Score Mat...
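As a rough illustration of the SMOTE idea named in this abstract, the sketch below creates synthetic minority points by interpolating between a minority sample and one of its nearest minority neighbours. It is a minimal standard-library sketch, not the blended method of the paper; the function name `smote`, its parameters, and the toy data are assumptions made for illustration.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority neighbours.

    `minority` is a list of equal-length numeric tuples (hypothetical input
    format chosen for this sketch).
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class: the four corners of the unit square.
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority_points, n_synthetic=10, k=2)
```

Because every synthetic point lies on a segment between two existing minority samples, the new data stay inside the region the minority class already occupies.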


High Efficiency on Prediction of Translation Initiation Site (TIS) of RefSeq Sequences

2007

Cristiane Neri Nobre, J. Miguel Ortega, Antônio de Pádua Braga

An important task in the area of gene discovery is the correct prediction of the translation initiation site (TIS). The TIS can correspond to the first AUG, but this is not always the case. This task can be modeled as a classification problem between positive (TIS) and negative patterns. Here we have used Support Vector Machine working with data processed by the class balancing method called Sm...


SMOTE for Regression

2013

Luís Torgo, Rita P. Ribeiro, Bernhard Pfahringer, Paula Branco

Several real-world prediction problems involve forecasting rare values of a target variable. When this variable is nominal, we have a problem of class imbalance that has already been studied thoroughly within machine learning. For regression tasks, where the target variable is continuous, few works exist addressing this type of problem. Still, important application areas involve forecasting rare extr...


Approx-SMOTE: Fast SMOTE for Big Data on Apache Spark

Journal: Neurocomputing, 2021

One of the main goals of Big Data research is to find new data mining methods that are able to process large amounts of data in acceptable times. In Big Data classification, as in traditional classification, class imbalance is a common problem that must be addressed, in this case also looking for a solution that can be applied in an acceptable execution time. In this paper we present Approx-SMOTE, a parallel implementation of the SMOTE algorithm for the Apache Spark framework. The key differen...


Possible explanation on the effect of variable selection on PAM used with SMOTE

2013

In our simulation studies with high-dimensional class-imbalanced data we observed that under the null case SMOTE had hardly any effect on classification with PAM when all the p = 1000 simulated variables were considered. On the other hand, if only a subset of the variables was used (G = 40), SMOTE seemed beneficial in reducing the class-imbalance problem of PAM, decreasing the number of sampl...


An improved survivability prognosis of breast cancer by using sampling and feature selection technique to solve imbalanced patient classification data

2013

Kung-Jeng Wang, Bunjira Makond, Kung-Min Wang

BACKGROUND: Breast cancer is one of the most critical cancers and is a major cause of cancer death among women. It is essential to know the survivability of the patients in order to ease the decision-making process regarding medical treatment and financial preparation. Recently, the breast cancer data sets have been imbalanced (i.e., the number of survival patients outnumbers the number of non-s...


Analysis of imbalanced data set problem: The case of churn prediction for telecommunication

Journal: Artif. Intell. Research, 2017

Chun Gui

Class-imbalanced datasets are common in the mobile Internet industry. We tested three kinds of feature selection techniques: Random Forest (RF), Relative Weight (RW) and Standardized Regression Coefficients (SRC); three kinds of balance methods: over-sampling (OS), under-sampling (US) and synthetic minority over-sampling (SMOTE); and a widely used classification method: RF. The combined model...
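The two simplest balance methods listed in this abstract, random over-sampling and random under-sampling, can be sketched as follows. This is a standard-library illustration under assumed names (`random_oversample`, `random_undersample`) and toy data, not the experimental setup of the paper.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate random rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to match the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, y in zip(rows, labels) if y == cls]
        kept = rng.sample(pool, target)
        out_rows.extend(kept)
        out_labels.extend([cls] * len(kept))
    return out_rows, out_labels


# Toy data: 2 "churn" rows against 8 "stay" rows (hypothetical labels).
rows = list(range(10))
labels = ["churn"] * 2 + ["stay"] * 8
over_rows, over_labels = random_oversample(rows, labels)
under_rows, under_labels = random_undersample(rows, labels)
```

Over-sampling keeps every original row but repeats minority rows, while under-sampling discards majority rows; SMOTE differs from both by generating new minority points instead of copying or dropping existing ones.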


Predicting credit card customer churn in banks using data mining

Journal: IJDATS, 2008

Dudyala Anil Kumar, Vadlamani Ravi

In this paper, we address the customer credit card churn prediction problem via data mining. We developed an ensemble system incorporating majority voting and involving Multilayer Perceptron (MLP), Logistic Regression (LR), decision trees (J48), Random Forest (RF), Radial Basis Function (RBF) network and Support Vector Machine (SVM) as the constituents. The dataset was taken from the Business Intelligenc...
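The majority-voting step mentioned in this abstract can be illustrated with a short sketch that combines the label predictions of several classifiers. The function name `majority_vote` and the toy predictions are assumptions for illustration; how the paper's ensemble handles ties is not stated in the excerpt.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one list by majority vote.

    `predictions` holds one list of labels per classifier, all of the same
    length. On a tie, the label seen first among the tied ones wins; this
    tie rule is an assumption of the sketch.
    """
    combined = []
    for votes in zip(*predictions):
        label, _count = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined


# Toy predictions from three hypothetical classifiers (1 = churn, 0 = stay).
classifier_outputs = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
]
ensemble_labels = majority_vote(classifier_outputs)
```

Each position in the result is decided independently, so a single classifier's mistake on a given customer is outvoted when the others agree.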


Modified SMOTE Using Mutual Information and Different Sorts of Entropies

2018

Sima Sharifirad, Azra Nazari, Mehdi Ghatee

SMOTE is an oversampling technique for balancing datasets and is considered a pre-processing step in learning algorithms. In this paper, four new enhanced SMOTE variants are proposed that include an improved version of KNN in which the attribute weights are first defined by mutual information and then replaced by maximum entropy, Renyi entropy and Tsallis entropy. These fou...


Customer Lifetime Value Prediction in Non-Contractual Freemium Settings: Chasing High-Value Users Using Deep Neural Networks and SMOTE

2017

Rafet Sifa, Julian Runge, Christian Bauckhage, Daniel Klapper

In non-contractual freemium and sharing economy settings, a small share of users often drives the largest part of revenue for firms and co-finances the free provision of the product or service to a large number of users. Successfully retaining and upselling such high-value users can be crucial to firms’ survival. Predictions of customers’ Lifetime Value (LTV) are a much-used tool to identify hi...
