Efficient Construction of Regression Trees with Range and Region Splitting

Authors

  • Yasuhiko Morimoto
  • Hiromu Ishii
  • Shinichi Morishita
Abstract

We propose an efficient way of constructing regression trees in order to predict the objective numeric attribute values of given tuples. A regression tree is a rooted binary tree such that each internal node contains a test, which can be expressed as an RDB query, for splitting tuples into two disjoint classes and passing data in each class down to the left or right subtree. The mean of the objective attribute values at the leaf is used as the predicted value of the tuple. To test a numeric attribute, traditional approaches use a guillotine-cut splitting that classifies data into those below a given value and others. Instead, we consider a family ℛ of grid-regions in the plane associated with two given numeric attributes. We propose to use a test that splits data into those that lie inside a region R and those that lie outside. The contributions of this paper are as follows. We present an efficient algorithm for computing R ∈ ℛ that minimizes the mean squared error after the introduction of the test with the region R. Experiments confirmed that the use of region splitting gives a smaller mean squared error of regression trees. Our approach can also generate smaller regression trees.

This research is partially supported by the Advanced Software Enrichment Project of the Information-Technology Promotion Agency.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.

Proceedings of the 23rd VLDB Conference, Athens, Greece, 1997
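The difference between a guillotine-cut test and a region test, as described in the abstract, can be illustrated with a small sketch. This is not the authors' efficient algorithm: it is a brute-force search, and it assumes (hypothetically) that the family of grid-regions is the set of axis-aligned rectangles of grid cells. The function names `sse`, `best_guillotine_split`, `prefix2d`, `rect_sum`, and `best_rectangle_split` are illustrative and do not come from the paper.

```python
def sse(total, total_sq, count):
    """Sum of squared errors around the mean, from aggregate sums."""
    if count == 0:
        return 0.0
    return total_sq - total * total / count


def best_guillotine_split(xs, ys):
    """Best threshold t for the test 'x <= t' on a single numeric attribute.

    Returns (mean squared error after the split, t); t is None if no split exists.
    """
    n = len(xs)
    pairs = sorted(zip(xs, ys))
    tot = sum(ys)
    tot_sq = sum(y * y for y in ys)
    best_err, best_t = float("inf"), None
    s = sq = 0.0
    for i in range(n - 1):
        x, y = pairs[i]
        s += y
        sq += y * y
        if pairs[i + 1][0] == x:  # cannot cut between equal attribute values
            continue
        err = sse(s, sq, i + 1) + sse(tot - s, tot_sq - sq, n - i - 1)
        if err < best_err:
            best_err, best_t = err, x
    return best_err / n, best_t


def prefix2d(grid):
    """2-D prefix sums so that any rectangle sum costs O(1)."""
    rows, cols = len(grid), len(grid[0])
    p = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            p[i + 1][j + 1] = grid[i][j] + p[i][j + 1] + p[i + 1][j] - p[i][j]
    return p


def rect_sum(p, i0, i1, j0, j1):
    """Sum over cells i0..i1 and j0..j1 (inclusive) using prefix sums p."""
    return p[i1 + 1][j1 + 1] - p[i0][j1 + 1] - p[i1 + 1][j0] + p[i0][j0]


def best_rectangle_split(cells, targets, size):
    """Brute-force search over all rectangles of a size x size grid.

    cells: (i, j) grid-cell index of each tuple (two bucketed attributes).
    Returns (mean squared error after the split, (i0, i1, j0, j1)).
    """
    cnt = [[0.0] * size for _ in range(size)]
    tot = [[0.0] * size for _ in range(size)]
    tot_sq = [[0.0] * size for _ in range(size)]
    for (i, j), y in zip(cells, targets):
        cnt[i][j] += 1
        tot[i][j] += y
        tot_sq[i][j] += y * y
    pc, ps, pq = prefix2d(cnt), prefix2d(tot), prefix2d(tot_sq)
    n = len(targets)
    all_s = sum(targets)
    all_q = sum(y * y for y in targets)
    best_err, best_rect = float("inf"), None
    for i0 in range(size):
        for i1 in range(i0, size):
            for j0 in range(size):
                for j1 in range(j0, size):
                    c = rect_sum(pc, i0, i1, j0, j1)
                    if c == 0 or c == n:  # test must split the data
                        continue
                    s = rect_sum(ps, i0, i1, j0, j1)
                    q = rect_sum(pq, i0, i1, j0, j1)
                    err = sse(s, q, c) + sse(all_s - s, all_q - q, n - c)
                    if err < best_err:
                        best_err, best_rect = err, (i0, i1, j0, j1)
    return best_err / n, best_rect
```

When the tuples with unusual objective values sit in an interior cell of the grid, a single rectangle test can isolate them, whereas a single threshold on either attribute cannot. The exhaustive loop examines every rectangle, so it is only practical for coarse grids.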


Similar resources

The Effect of Salicylic Acid and Potassium on Some Characteristics Nut and Physiological Parameters of Pistachio Trees Cv. Owhadi

The effect of three salicylic acid (0, 50 and 100 mg l-1) and K2SO4 (0, 0.1 and 0.2 %) levels on some characteristics nut and physiological parameters of pistachio trees cv. ‘Owhadi’ were investigated. Treatments were applied at endospermic growth stage of seed and cotyledons appearance. The results showed that potassium increase yield, splitting percentage; nut fresh mass and kernel dry mass a...


Simplifying Model Trees with Regression and Splitting Nodes

Model trees are tree-based regression models that associate leaves with linear regression models. A new method for the stepwise induction of model trees (SMOTI) has been developed. Its main characteristic is the construction of trees with two types of nodes: regression nodes, which perform only straight-line regression, and splitting nodes, which partition the feature space. In this way, intern...


Simplification Methods for Model Trees with Regression and Splitting Nodes

Model trees are tree-based regression models that associate leaves with linear regression models. A new method for the stepwise induction of model trees (SMOTI) has been developed. Its main characteristic is the construction of trees with two types of nodes: regression nodes, which perform only straight-line regression, and splitting nodes, which partition the feature space. In this way, intern...


Analysis of the microbial quality in drinking water distribution networks using the logistic regression model in Dasht-e Azadegan county, an arid region in the southwest of Iran

The microbial quality of water plays a key role in community health. The present study aimed to determine the microbial quality of the drinking water distribution networks in the urban and rural areas of Dasht-e Azadegan County, Iran and assess the influential factors in the quality of drinking water. In this descriptive-analytical study, 907 drinking water samples were collected from the urban ...


Mining Tolerance Regions with Model Trees

Many problems encountered in practice involve the prediction of a continuous attribute associated with an example. This problem, known as regression, requires that samples of past experience with known continuous answers are examined and generalized in a regression model to be used in predicting future examples. Regression algorithms deeply investigated in statistics, machine learning and data ...



Publication date: 1997