نتایج جستجو برای: smote
تعداد نتایج: 650 فیلتر نتایج به سال:
Real world data sets often contain disproportionate sample sizes of observed groups making the task of prediction algorithms very difficult. One of the many ways to combat inherit bias from class imbalance data is to perform re-sampling. In this paper we discuss two popular re-sampling approaches proposed in literature, Synthetic Minority Over-sampling Technique (SMOTE) and Propensity Score Mat...
An important task in the area of gene discovery is the correct prediction of the translation initiation site (TIS). The TIS can correspond to the first AUG, but this is not always the case. This task can be modeled as a classification problem between positive (TIS) and negative patterns. Here we have used Support Vector Machine working with data processed by the class balancing method called Sm...
Several real world prediction problems involve forecasting rare values of a target variable. When this variable is nominal we have a problem of class imbalance that was already studied thoroughly within machine learning. For regression tasks, where the target variable is continuous, few works exist addressing this type of problem. Still, important application areas involve forecasting rare extr...
One of the main goals Big Data research, is to find new data mining methods that are able process large amounts in acceptable times. In classification, as traditional class imbalance a common problem must be addressed, case also looking for solution can applied an execution time. this paper we present Approx-SMOTE, parallel implementation SMOTE algorithm Apache Spark framework. The key differen...
In our simulation studies with high-dimensional class-imbalanced data we observed that under the null case SMOTE had hardly any effect on classification with PAM, when all the p = 1000 simulated variables where considered. On the other hand, if only a subset of the variables was used (G = 40), SMOTE seemed beneficial in reducing the class-imbalance problem of PAM, decreasing the number of sampl...
BACKGROUND Breast cancer is one of the most critical cancers and is a major cause of cancer death among women. It is essential to know the survivability of the patients in order to ease the decision making process regarding medical treatment and financial preparation. Recently, the breast cancer data sets have been imbalanced (i.e., the number of survival patients outnumbers the number of non-s...
Class-imbalanced datasets are common in the field of mobile Internet industry. We tested three kinds of feature selection techniques-Random Forest (RF), Relative Weight (RW) and Standardized Regression Coefficients (SRC); three kinds of balance methods-over-sampling (OS), under-sampling (US) and synthetic minority over-sampling (SMOTE); a widely used classification method-RF. The combined model...
In this paper, we solve the customer credit card churn prediction via data mining. We developed an ensemble system incorporating majority voting and involving Multilayer Perceptron (MLP), Logistic Regression (LR), decision trees (J48), Random Forest (RF), Radial Basis Function (RBF) network and Support Vector Machine (SVM) as the constituents. The dataset was taken from the Business Intelligenc...
SMOTE is one of the oversampling techniques for balancing the datasets and it is considered as a pre-processing step in learning algorithms. In this paper, four new enhanced SMOTE are proposed that include an improved version of KNN in which the attribute weights are defined by mutual information firstly and then they are replaced by maximum entropy, Renyi entropy and Tsallis entropy. These fou...
In non-contractual freemium and sharing economy settings, a small share of users often drives the largest part of revenue for firms and co-finances the free provision of the product or service to a large number of users. Successfully retaining and upselling such high-value users can be crucial to firms’ survival. Predictions of customers’ Lifetime Value (LTV) are a much used tool to identify hi...
نمودار تعداد نتایج جستجو در هر سال
با کلیک روی نمودار نتایج را به سال انتشار فیلتر کنید