SMOTE

Ovarian Cancer Classification Using Hybrid Synthetic Minority Over-sampling Technique and Neural Network

Journal: Journal of Advances in Computer Research

Moshood A. Hambali (Computer Science Dept., Federal University Wukari, Nigeria), Morufat D. Gbolagade (Computer Science Dept., Al-Hikmah University, Ilorin, Nigeria)

Every woman is at risk of ovarian cancer; about 90 percent of women who develop ovarian cancer are above 40 years of age, with the highest number of ovarian cancers occurring at the age of 60 years and above. Early and correct diagnosis of ovarian cancer can allow proper treatment and, as a result, reduce the mortality rate. In this paper, we proposed a hybrid of synthetic minority over-sampling tec...


Investigating the performance improvement by sampling techniques in EEG data

2010

V. Baby Deepa, M. Kumarasamy

In this paper, SMOTE (Synthetic Minority Over-sampling Technique), an oversampling method, and PCA (Principal Component Analysis) are applied as preprocessing steps to a brain-computer interface dataset, and their effect on performance is examined. The pre-processed data is then classified by SMO and Naïve Bayes. In the EEG recordings, the transient events are detected while predicting the condit...


RBM-SMOTE: Restricted Boltzmann Machines for Synthetic Minority Oversampling Technique

2015

Maciej Zieba, Jakub M. Tomczak, Adam Gonczarek

The problem of imbalanced data, i.e., when the class labels are unequally distributed, is encountered in many real-life applications, e.g., credit scoring and medical diagnostics. Various approaches aimed at dealing with imbalanced data have been proposed. One of the best-known data pre-processing methods is the Synthetic Minority Oversampling Technique (SMOTE). However, SMOTE may generate ...


Classification of rare land cover types: Distinguishing annual and perennial crops in an agricultural catchment in South Korea

2018

Christina Bogner, Bumsuk Seo, Dorian Rohner, Björn Reineking

Many environmental data are inherently imbalanced, with some majority land use and land cover types dominating over rare ones. In cultivated ecosystems minority classes are often the target as they might indicate a beginning land use change. Most standard classifiers perform best on a balanced distribution of classes, and fail to detect minority classes. We used the synthetic minority oversampl...


Enhancing Efficiency and Accuracy of Imbalanced Datasets Using Fuzzy Neural Network

2014

S. Lavanya

In data mining, the class imbalance classification problem is considered to be one of the emergent challenges. This problem occurs when the number of examples that represent one of the classes of the dataset is much lower than for the other classes. To tackle the imbalance problem, preprocessing the datasets with an oversampling method (SMOTE) was previously proposed. Generalized instances ar...


Fuzzy-rough imbalanced learning for the diagnosis of High Voltage Circuit Breaker maintenance: The SMOTE-FRST-2T algorithm

Journal: Eng. Appl. of AI 2016

Enislay Ramentol, I. Gondres, S. Lajes, Rafael Bello, Yailé Caballero Mota, Chris Cornelis, Francisco Herrera

For any electric power system, it is crucial to guarantee reliable performance of its High Voltage Circuit Breaker (HVCB). Determining when the HVCB needs maintenance is an important and non-trivial problem, since these devices are used over extensive periods of time. In this paper, we propose the use of data mining techniques to predict the need for maintenance. In the corresponding ...


Borderline-SMOTE: A New Over-Sampling Method in Imbalanced Data Sets Learning

2005

Hui Han, Wenyuan Wang, Binghuan Mao

In recent years, mining with imbalanced data sets has received more and more attention in both theoretical and practical aspects. This paper introduces the importance of imbalanced data sets and their broad application domains in data mining, and then summarizes the evaluation metrics and the existing methods to evaluate and solve the imbalance problem. Synthetic minority oversampling technique (S...


An Analysis of Classification of Imbalanced Datasets by Using Synthetic Minority Over-Sampling Technique

2016

A. Alfattni

Analysing unbalanced datasets is one of the challenges that practitioners in the machine learning field face. Much research has been carried out to determine the effectiveness of the synthetic minority over-sampling technique (SMOTE) in addressing this issue. The aim of this study was therefore to compare the effectiveness of the SMOTE over different models on unbalance...


Combining Data Filter and Data Sampling for Cross-Company Defect Prediction: An Empirical Study

2017

Xiao Yu, Man Wu, Yan Zhang, Mandi Fu

Cross-company defect prediction (CCDP) is a practical approach that trains a prediction model by exploiting one or multiple projects of a source company and then applies the model to a target company. Unfortunately, a large amount of irrelevant cross-company (CC) data usually makes it difficult to build a prediction model with high performance. On the other hand, the CC data has a highly imbalanced nature betwe...


Comparison of machine learning techniques to predict all-cause mortality using fitness data: the Henry Ford ExercIse Testing (FIT) project

2017

Sherif Sakr, Radwa El Shawi, Amjad M. Ahmed, Waqas T. Qureshi, Clinton A. Brawner, Steven J. Keteyian, Michael J. Blaha, Mouaz H. Al-Mallah

Background: Prior studies have demonstrated that cardiorespiratory fitness (CRF) is a strong marker of cardiovascular health. Machine learning (ML) can enhance the prediction of outcomes through classification techniques that classify the data into predetermined categories. The aim of this study is to present an evaluation and comparison of how machine learning techniques can be applied on medic...
