Efficient Construction of Regression Trees with Range and Region Splitting
Authors
Abstract
We propose an efficient way of constructing regression trees in order to predict the objective numeric attribute values of given tuples. A regression tree is a rooted binary tree in which each internal node contains a test, expressible as an RDB query, that splits tuples into two disjoint classes and passes the data in each class down to the left or right subtree. The mean of the objective attribute values at a leaf is used as the predicted value of a tuple that reaches that leaf. To test a numeric attribute, traditional approaches use guillotine-cut splitting, which classifies data into those below a given value and the others. Instead, we consider a family ℛ of grid regions in the plane associated with two given numeric attributes, and we propose a test that splits data into those that lie inside a region R and those that lie outside. The contributions of this paper are as follows. We present an efficient algorithm for computing the region R ∈ ℛ that minimizes the mean squared error after the introduction of the test with the region R. Experiments confirmed that the use of region splitting gives a smaller mean squared error of regression trees. Our approach can also generate smaller regression trees.

This research is partially supported by the Advanced Software Enrichment Project of the Information-Technology Promotion Agency.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.

Proceedings of the 23rd VLDB Conference, Athens, Greece, 1997
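The contrast the abstract draws between a guillotine cut and a region split can be illustrated with a small sketch. This is not the paper's efficient algorithm: it is a brute-force search, and it assumes the region family consists only of axis-aligned rectangles whose corners lie on observed attribute values, which may be narrower than the family of grid regions the authors consider. All function names and the toy data are hypothetical. Because both classes are predicted by their means, minimizing the sum of squared errors is equivalent to minimizing the mean squared error.

```python
from itertools import combinations_with_replacement


def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_guillotine_split(points, axis):
    """Best single-threshold split on one attribute.

    points: list of (x, y, target) tuples; axis: 0 for x, 1 for y.
    Returns (total SSE after the split, threshold), or (SSE, None) if no
    threshold improves on leaving the node unsplit.
    """
    best = (sse([p[2] for p in points]), None)
    for t in sorted({p[axis] for p in points}):
        below = [p[2] for p in points if p[axis] <= t]
        above = [p[2] for p in points if p[axis] > t]
        if below and above:
            cand = sse(below) + sse(above)
            if cand < best[0]:
                best = (cand, t)
    return best


def best_rectangle_split(points):
    """Brute-force inside/outside split over axis-aligned rectangles.

    Assumption: rectangles stand in for the paper's grid regions.
    Returns (total SSE after the split, (x1, x2, y1, y2)).
    """
    xs = sorted({p[0] for p in points})
    ys = sorted({p[1] for p in points})
    best = (sse([p[2] for p in points]), None)
    for x1, x2 in combinations_with_replacement(xs, 2):
        for y1, y2 in combinations_with_replacement(ys, 2):
            inside, outside = [], []
            for x, y, target in points:
                if x1 <= x <= x2 and y1 <= y <= y2:
                    inside.append(target)
                else:
                    outside.append(target)
            if inside and outside:
                cand = sse(inside) + sse(outside)
                if cand < best[0]:
                    best = (cand, (x1, x2, y1, y2))
    return best


# Toy data (made up): the target is 10 inside a central square, 0 elsewhere.
points = [
    (x, y, 10.0 if 1 <= x <= 3 and 1 <= y <= 3 else 0.0)
    for x in range(5)
    for y in range(5)
]

cut_err_x, cut_x = best_guillotine_split(points, axis=0)
cut_err_y, cut_y = best_guillotine_split(points, axis=1)
rect_err, rect = best_rectangle_split(points)
print("guillotine on x:", cut_err_x, cut_x)
print("guillotine on y:", cut_err_y, cut_y)
print("rectangle:", rect_err, rect)
```

On this toy data a single rectangle isolates the high-valued tuples exactly, whereas any single threshold on either attribute leaves a mixed class. The exhaustive search is far too slow for real data; it only makes the objective concrete.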
Similar resources
The Effect of Salicylic Acid and Potassium on Some Characteristics Nut and Physiological Parameters of Pistachio Trees Cv. Owhadi
The effect of three salicylic acid (0, 50 and 100 mg l-1) and K2SO4 (0, 0.1 and 0.2 %) levels on some characteristics nut and physiological parameters of pistachio trees cv. ‘Owhadi’ were investigated. Treatments were applied at endospermic growth stage of seed and cotyledons appearance. The results showed that potassium increase yield, splitting percentage; nut fresh mass and kernel dry mass a...
Simplifying Model Trees with Regression and Splitting Nodes
Model trees are tree-based regression models that associate leaves with linear regression models. A new method for the stepwise induction of model trees (SMOTI) has been developed. Its main characteristic is the construction of trees with two types of nodes: regression nodes, which perform only straight-line regression, and splitting nodes, which partition the feature space. In this way, intern...
Simplification Methods for Model Trees with Regression and Splitting Nodes
Model trees are tree-based regression models that associate leaves with linear regression models. A new method for the stepwise induction of model trees (SMOTI) has been developed. Its main characteristic is the construction of trees with two types of nodes: regression nodes, which perform only straight-line regression, and splitting nodes, which partition the feature space. In this way, intern...
Analysis of the microbial quality in drinking water distribution networks using the logistic regression model in Dasht-e Azadegan county, an arid region in the southwest of Iran
The microbial quality of water plays a key role in community health. The present study aimed to determine the microbial quality of the drinking water distribution networks in the urban and rural areas of Dasht-e Azadegan County, Iran and assess the influential factors in the quality of drinking water. In this descriptive-analytical study, 907 drinking water samples were collected from the urban ...
Mining Tolerance Regions with Model Trees
Many problems encountered in practice involve the prediction of a continuous attribute associated with an example. This problem, known as regression, requires that samples of past experience with known continuous answers are examined and generalized in a regression model to be used in predicting future examples. Regression algorithms deeply investigated in statistics, machine learning and data ...
Journal title:
Volume, issue
Pages -
Publication date: 1997