Evaluation of a New Incremental Classification Tree Algorithm for Mining High Speed Data Streams

نویسنده

  • N. Sivakumar
چکیده

Abstract—A new model for online machine learning process of high speed data stream is proposed, to minimize the severe restrictions associated with the existing computer learning algorithms. Most of the existing models have three principle steps. In the first step, the system would create a model incrementally. In the second step the time taken by the examples to complete a prescribed procedure with their arrival speed is computed. In the third and final step of the model the size of memory required for computation is predicted in advance. To overcome these restrictions we proposed this new data stream classification algorithm, where the data can be partitioned into stream of trees. In this algorithm, the new data set can be updated with the existing tree. This algorithm, called incremental classification tree algorithm, is proved to be an excellent solution for processing larger data streams. In this paper, we present the experimental results of our new algorithm and prove that our method would eradicate the problems of the existing method.A new model for online machine learning process of high speed data stream is proposed, to minimize the severe restrictions associated with the existing computer learning algorithms. Most of the existing models have three principle steps. In the first step, the system would create a model incrementally. In the second step the time taken by the examples to complete a prescribed procedure with their arrival speed is computed. In the third and final step of the model the size of memory required for computation is predicted in advance. To overcome these restrictions we proposed this new data stream classification algorithm, where the data can be partitioned into stream of trees. In this algorithm, the new data set can be updated with the existing tree. This algorithm, called incremental classification tree algorithm, is proved to be an excellent solution for processing larger data streams. In this paper, we present the experimental results of our new algorithm and prove that our method would eradicate the problems of the existing method.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A New Algorithm for Optimization of Fuzzy Decision Tree in Data Mining

Decision-tree algorithms provide one of the most popular methodologies for symbolic knowledge acquisition. The resulting knowledge, a symbolic decision tree along with a simple inference mechanism, has been praised for comprehensibility. The most comprehensible decision trees have been designed for perfect symbolic data. Classical crisp decision trees (DT) are widely applied to classification t...

متن کامل

Cost-Efficient Mining Techniques for Data Streams

A data stream is a continuous and high-speed flow of data items. High speed refers to the phenomenon that the data rate is high relative to the computational power. The increasing focus of applications that generate and receive data streams stimulates the need for online data stream analysis tools. Mining data streams is a real time process of extracting interesting patterns from high-speed dat...

متن کامل

Discovering Evolutionary Classifier over High Speed Non-static Stream

With the emergence of large-volume and high-speed streaming data, mining data streams has become a focus of increasing interests. The major new challenges in streaming data mining are as follows: (1) since streams may flow in and out indefinitely and in fast speed, it is usually expected that a stream mining process can only scan a data stream once; and (2) since the characteristics of the data...

متن کامل

Info-fuzzy algorithms for mining dynamic data streams

Most data mining algorithms assume static behavior of the incoming data. In the real world, the situation is different and most continuously collected data streams are generated by dynamic processes, which may change over time, in some cases even drastically. The change in the underlying concept, also known as concept drift, causes the data mining model generated from past examples to become le...

متن کامل

Incrementally Optimized Decision Tree for Mining Imperfect Data Streams

The Very Fast Decision Tree (VFDT) is one of the most important classification algorithms for real-time data stream mining. However, imperfections in data streams, such as noise and imbalanced class distribution, do exist in real world applications and they jeopardize the performance of VFDT. Traditional sampling techniques and post-pruning may be impractical for a non-stopping data stream. To ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016