An approach to dealing with missing values in heterogeneous data using k-nearest neighbors

Authors

  • Davi E. N. Frossard
  • Igor O. Nunes
  • Renato A. Krohling
Abstract

Techniques such as clustering, neural networks and decision making usually rely on algorithms that are not well suited to deal with missing values. However, real-world data frequently contain such cases. The simplest solutions are either to substitute missing values with a best-guess value or to disregard them completely. Unfortunately, both approaches can lead to biased results. In this paper, we propose a technique for dealing with missing values in heterogeneous data using imputation based on the k-nearest neighbors algorithm. It can handle real (which we refer to as crisp henceforward), interval and fuzzy data. The effectiveness of the algorithm is tested on several datasets and the numerical results are promising.
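The abstract does not spell out the distance measure or the aggregation rule, so the following is only a minimal sketch of generic k-nearest-neighbors imputation for crisp (real-valued) attributes, not the authors' method. The function name `knn_impute`, the use of `None` to mark a missing value, the rescaled Euclidean distance over shared attributes, and the mean of the donors' values are all assumptions made for illustration. Interval and fuzzy attributes would need their own distance and averaging definitions, which are not given here.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of the k nearest rows (crisp data only).

    Hypothetical helper for illustration; not the algorithm from the paper.
    """

    def distance(a, b):
        # Euclidean distance over attributes observed in both rows,
        # rescaled by the share of attributes that could be compared.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        squared = sum((x - y) ** 2 for x, y in shared)
        return math.sqrt(squared * len(a) / len(shared))

    imputed = [list(r) for r in rows]  # copy so the input is left untouched
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows that observe attribute j
            # and share at least one observed attribute with this row.
            donors = [
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            donors = [d for d in donors if d[0] != math.inf]
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            if nearest:
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed


# Toy data (made up for this example): the second row lacks its middle value.
data = [
    [1.0, 2.0, 3.0],
    [1.1, None, 3.1],
    [0.9, 2.2, 2.9],
    [8.0, 9.0, 10.0],
]
filled = knn_impute(data, k=2)
```

In this toy input the two closest rows to the incomplete one are the first and third, so the gap is filled with the average of their middle values, while the distant fourth row is ignored. Distances are computed on the original rows rather than on already-imputed values, which keeps the result independent of processing order.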

Similar resources

Dealing with Missing Values in Data

Many existing industrial and research data sets contain missing values due to various reasons, such as manual data entry procedures, equipment errors and incorrect measurements. Problems associated with missing values are loss of efficiency, complications in handling and analyzing the data and bias resulting from differences between missing and complete data. The important factor for selection ...

A Novel Hybrid Approach for Email Spam Detection based on Scatter Search Algorithm and K-Nearest Neighbors

Because cyberspace and Internet predominate in the life of users, in addition to business opportunities and time reductions, threats like information theft, penetration into systems, etc. are included in the field of hardware and software. Security is the top priority to prevent a cyber-attack that users should initially be detecting the type of attacks because virtual environments are not moni...

An Enhanced Approach for Treating Missing Value using Boosted K-NN

Knowledge Discovery in Dataset (KDD) plays a vital role in information analysis and retrieval based applications. Quality of data is the most indispensable component of KDD. The factor which affects the quality of datasets is presence of missing values. The data collected from the real world often contains serious data quality troubles such as incomplete, redundant, inconsistent, and/or noisy d...

A Novel Hybrid Approach to Estimating Missing Values in Databases Using K-nearest Neighbors and Neural Networks

Missing values in datasets and databases can be estimated via statistics, machine learning and artificial intelligence methods. This paper uses a novel hybrid neural network and weighted nearest neighbors to estimate missing values and provides good results with high performance. In this work, four different characteristic datasets were used and missing values were estimated. Error ratio, corre...

A New Hybrid Approach of K-Nearest Neighbors Algorithm with Particle Swarm Optimization for E-Mail Spam Detection

Emails are one of the fastest economic communications. Increasing email users has caused the increase of spam in recent years. As we know, spam not only damages user’s profits, time-consuming and bandwidth, but also has become as a risk to efficiency, reliability, and security of a network. Spam developers are always trying to find ways to escape the existing filters therefore new filters to de...


Journal:
  • CoRR

Volume abs/1608.04037

Publication date: 2016