Zeta: A Global Method for Discretization of Continuous Variables

Authors

  • K. M. Ho
  • Paul D. Scott

Abstract

Discretization of continuous variables so they may be used in conjunction with machine learning or statistical techniques that require nominal data is an important problem to be solved in developing generally applicable methods for data mining. This paper introduces a new technique for discretization of such variables based on zeta, a measure of strength of association between nominal variables developed for this purpose. Following a review of existing techniques for discretization, we define zeta, a measure based on minimisation of the error rate when each value of an independent variable must predict a different value of a dependent variable. We then describe both how a continuous variable may be dichotomised by searching for a maximum value of zeta, and how a heuristic extension of this method can partition a continuous variable into more than two categories. A series of experimental evaluations of zeta-discretization, including comparisons with other published methods, shows that zeta-discretization runs considerably faster than other techniques without any loss of accuracy. We conclude that zeta-discretization offers considerable advantages over alternative procedures and discuss some of the ways in which it could be enhanced.
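
The zeta measure and the dichotomisation search described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes zeta is the fraction of cases predicted correctly under the best assignment in which each category of the independent variable predicts a different class (an injective category-to-class mapping), which requires at least as many classes as categories. The function names `zeta` and `dichotomise`, and the choice of midpoints between consecutive distinct values as candidate cut points, are hypothetical. The paper's heuristic extension to more than two intervals is not reproduced here.

```python
from collections.abc import Hashable, Sequence
from itertools import permutations


def zeta(table: list[list[int]]) -> float:
    """Zeta for a contingency table (rows: categories of the independent
    variable, columns: classes of the dependent variable).

    Assumption: zeta is the best accuracy achievable when each row must
    predict a *different* column (an injective row-to-column assignment).
    """
    k, m = len(table), len(table[0])
    if k > m:
        raise ValueError("zeta needs at least as many classes as categories")
    total = sum(map(sum, table))
    # Brute force over all injective assignments; adequate for small tables.
    best = max(
        sum(table[i][cols[i]] for i in range(k))
        for cols in permutations(range(m), k)
    )
    return best / total


def dichotomise(xs: Sequence[float], ys: Sequence[Hashable]) -> tuple[float, float]:
    """Return (cut_point, zeta) for the split x <= cut that maximises zeta."""
    classes = list(dict.fromkeys(ys))  # distinct classes, first-seen order
    idx = {c: j for j, c in enumerate(classes)}
    pairs = sorted(zip(xs, ys), key=lambda p: p[0])
    left = [0] * len(classes)   # class counts for x <= cut
    right = [0] * len(classes)  # class counts for x > cut
    for _, y in pairs:
        right[idx[y]] += 1
    best_cut, best_zeta = None, -1.0
    # Sweep the sorted values, moving one case at a time from right to left.
    for i in range(len(pairs) - 1):
        x, y = pairs[i]
        left[idx[y]] += 1
        right[idx[y]] -= 1
        nxt = pairs[i + 1][0]
        if nxt == x:
            continue  # only cut between distinct values
        z = zeta([left, right])
        if z > best_zeta:
            best_cut, best_zeta = (x + nxt) / 2, z
    if best_cut is None:
        raise ValueError("need at least two distinct x values")
    return best_cut, best_zeta
```

For example, `zeta([[8, 2], [6, 4]])` gives 0.6 rather than the 0.7 that plain modal-class accuracy would give, because the two rows may not both predict the first class. `dichotomise([1, 2, 3, 4, 5, 6], ["a", "a", "a", "b", "b", "b"])` returns `(3.5, 1.0)`. The brute-force search over permutations is only practical for small tables; a dedicated assignment solver would scale better.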

Similar sources

An Efficient Global Discretization Method (Technical Report Number 296)

The development of an effective and efficient method for discretization of continuous variables is an important problem to be solved in developing generally applicable methods for data mining. In Ho and Scott (1997), we describe a new technique for discretization of continuous variables based on zeta, a measure of strength of association between nominal variables. The old zeta method partitions a...

Zeta: A Global Method for Discretization of Continuous Variables

This paper introduces a new technique for discretization of continuous variables based on zeta, a measure of strength of association between nominal variables developed for this purpose. Zeta is defined as the maximal accuracy achievable if each value of an independent variable must predict a different value of a dependent variable. We describe both how a continuous variable may be dichotomised...

A global optimal algorithm for class-dependent discretization of continuous data

This paper presents a new method to convert continuous variables into discrete variables for inductive machine learning. The method can be applied to pattern classification problems in machine learning and data mining. The discretization process is formulated as an optimization problem. We first use the normalized mutual information that measures the interdependence between the class labels and...

Logic-Based Methods for Global Optimization

Logic-based methods provide a strategy for applying convex nonlinear programming to nonconvex global optimization. Such methods assume that the problem becomes convex when selected variables are fixed. The selected variables must be discrete, or else discretized if they are continuous. We provide a tutorial survey of disjunctive programming with convex relaxations, logic-based outer a...

Multi-Colony Ant Algorithm for Continuous Multi-Reservoir Operation Optimization Problem

Ant Colony Optimization (ACO) algorithms were developed primarily for discrete optimization, and hence their application to continuous optimization problems requires the transformation of a continuous search space to a discrete one by discretization of the continuous decision variables. Thus, the allowable continuous range of decision variables is usually discretized into a discrete set of allowab...

Publication date: 1997