selection data mining

Handling Sparse Data Sets by Applying Contrast Set Mining in Feature Selection

Journal: JSW 2016

Dijana Oreski Mario Konecki

A data set is sparse if the number of samples in a data set is not sufficient to model the data accurately. Recent research emphasized interest in applying data mining and feature selection techniques to real world problems, many of which are characterized as sparse data sets. The purpose of this research is to define new techniques for feature selection in order to improve classification accur...

Very many variables and limited numbers of observations; The p>>n problem in current statistical applications

2012

Johann Sölkner

New technologies have led to an “explosion” of data available to document states and processes in very many fields. Tools of data mining are being used to extract relevant information. If this information is used in decision making, analytical statistics can provide formal tests comparing the outcomes of different scenarios. Statistics has traditionally dealt with limited information, both in t...

A Comparative Study of Real-Valued Negative Selection to Statistical Anomaly Detection Techniques

2005

Thomas Stibor Jonathan Timmis Claudia Eckert

The (randomized) real-valued negative selection algorithm is an anomaly detection approach, inspired by the negative selection immune system principle. The algorithm was proposed to overcome scaling problems inherent in the Hamming shape-space negative selection algorithm. In this paper, we investigate termination behavior of the real-valued negative selection algorithm with variable-sized detec...

Success Is Hidden in the Students' Data

2012

Dimitrios Kravvaris Katia Kermanidis Eleni Thanou

The contribution of data mining to education as well as research in this area is done on a variety of levels and can affect the instructors’ approach to learning. This particular study focuses on problems associated with classification and attribute selection. An effort to forecast the results takes place before the educational process ends in order to prevent a potential learning failure. The ...

EDULISS: a small-molecule database with data-mining and pharmacophore searching capabilities

2011

Kun-Yi Hsin Hugh P. Morgan Steven R. Shave Andrew C. Hinton Paul Taylor Malcolm D. Walkinshaw

We present the relational database EDULISS (EDinburgh University Ligand Selection System), which stores structural, physicochemical and pharmacophoric properties of small molecules. The database comprises a collection of over 4 million commercially available compounds from 28 different suppliers. A user-friendly web-based interface for EDULISS (available at http://eduliss.bch.ed.ac.uk/) has bee...

Dimensionality and data reduction in telecom churn prediction

Journal: Kybernetes 2014

Wei-Chao Lin Chih-Fong Tsai Shih-Wen Ke

Purpose – Churn prediction is a very important task for successful customer relationship management. In general, churn prediction can be achieved by many data mining techniques. However, during data mining, dimensionality reduction (or feature selection) and data reduction are the two important data preprocessing steps. In particular, the aims of feature selection and data reduction are to filt...

Classification and Comparative Study of Data Mining Classifiers with Feature Selection on Binomial Data Set

2012

Pushpalata Pujari

This paper describes the performance analysis of different data mining classifiers before and after feature selection on a binomial data set. Three data mining classifiers, Logistic Regression, SVM and Neural Network, are considered for classification. The Congressional Voting Records data set, the binomial data set investigated in this study, is taken from UCI machin...

Komparasi Algoritma Klasifikasi Data Mining Menggunakan Optimize Selection untuk Peminatan Program Studi

Journal: Building of Informatics, Technology and Science (BITS) 2022

The selection of a study program is a unique opportunity for students. STMIK IKMI Cirebon is now a KIP Kuliah provider, offering three programs. The research problem is the unavailability of a model of student interest in these programs, so it is necessary to apply a classification algorithm model. The algorithms used as comparison are Decision Tree (C4.5), Naive Bayes, k-Nearest Neighbor and Support Vector Machine. The study applies Optim...

Detection of financial statement fraud and feature selection using data mining techniques

Journal: Decision Support Systems 2011

Pediredla Ravisankar Vadlamani Ravi G. Raghava Rao Indranil Bose

Keywords: Data mining; Financial fraud detection; Feature selection; t-statistic; Neural networks; SVM; GP. Recently, high profile cases of financial statement fraud have been dominating the news. This paper uses data mining techniques such as Multilayer to identify companies that resort to financial statement fraud. Each of these techniques is tested on a dataset involving 202 C...
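
The keywords of this entry pair feature selection with the t-statistic. As a rough illustration of that general idea, and not the paper's actual procedure, the sketch below ranks features by the absolute value of Welch's two-sample t-statistic between two classes. The function names (`welch_t`, `rank_features`), the choice of Welch's form, and the toy data are all assumptions introduced here; nothing in them comes from the paper.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t-statistic for two lists of numbers.

    Assumes each list has at least two values and that the two
    sample variances are not both zero (otherwise this divides by zero).
    """
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def rank_features(rows, labels):
    """Return feature indices ordered by |t| between class 1 and class 0."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        scores.append((abs(welch_t(pos, neg)), j))
    # Largest |t| first: these features separate the classes most clearly.
    return [j for _, j in sorted(scores, reverse=True)]


# Hypothetical toy data: 6 samples, 3 features, binary labels.
rows = [
    [1.0, 5.0, 0.3],
    [1.2, 4.8, 0.9],
    [0.9, 5.1, 0.5],
    [3.0, 5.0, 0.4],
    [3.2, 4.9, 0.8],
    [2.9, 5.2, 0.6],
]
labels = [0, 0, 0, 1, 1, 1]

ranking = rank_features(rows, labels)
print(ranking)  # feature 0 differs most between the two classes
```

A classifier could then be trained on only the top-ranked features; how many to keep is a tuning choice that this sketch leaves open.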

Discovery of Genotype-to-Phenotype Associations: A Grid-enabled Scientific Workflow Setting

2009

Lefteris Koumakis Stelios Sfakianakis Vassilis Moustakis George Potamias

The heterogeneity and scale of the data generated by high-throughput genotyping association studies call for seamless access to the respective distributed data sources. Toward this end, the utilization of state-of-the-art data resource management and integration methodologies such as Grid and Web Services is of paramount importance for the realization of efficient and secure knowledge discovery sce...
