data stream algorithm

Internal Clustering Evaluation of Data Streams

2015

Marwan Hassani Thomas Seidl

Clustering validation is a crucial part of choosing the clustering algorithm that performs best on given input data. Internal clustering validation is efficient and realistic, whereas external validation requires a ground truth which is not provided in most applications. In this paper, we analyze the properties and performance of eleven internal clustering measures. In particular, as the importan...

An Efficient Algorithm for Maintaining Frequent Closed Itemsets over Data Stream

2009

Show-Jane Yen Yue-Shi Lee Cheng-Wei Wu Chin-Lin Lin

Data mining refers to the process of revealing unknown and potentially useful information from a large database. Frequent itemset mining is one of the foundational problems in data mining: discovering the sets of products that customers frequently purchase together in a transaction database. However, a large number of patterns may be generated from a database, and many of the...

Evaluation of a New Incremental Classification Tree Algorithm for Mining High Speed Data Streams

2016

N. Sivakumar

A new model for the online machine learning of high-speed data streams is proposed to minimize the severe restrictions associated with existing machine learning algorithms. Most of the existing models have three principal steps. In the first step, the system would create a model incrementally. In the second step the time taken by the examples to complete a prescribed procedure...

Scaling Out the Performance of Service Monitoring Applications with BlockMon

2013

Davide Simoncelli Maurizio Dusi Francesco Gringoli Saverio Niccolini

To cope with real-time data analysis as the amount of data exchanged over the network increases, one idea is to redesign algorithms originally implemented on the monitoring probe so that they work in a distributed manner over a stream-processing platform. In this paper we present a preliminary performance analysis of a Twitter trending algorithm when running over BlockMon, an open-source monitoring plat...

Interval Count Semi-Joins

2018

Panagiotis Bouros Nikos Mamoulis

Interval joins find applications in several domains, including temporal and spatial databases, uncertain data management, and streaming data processing. In this paper, we study the evaluation of an interval count semi-join (ICSJ) operation that can be used for selecting or ranking intervals based on the number of join pairs they appear in. We extend the state-of-the-art algorithm for interval joi...

Temporal Structure Learning for Clustering Massive Data Streams in Real-Time

2011

Michael Hahsler Margaret H. Dunham

This paper describes one of the first attempts to model the temporal structure of massive data streams in real-time using data stream clustering. Recently, many data stream clustering algorithms have been developed which efficiently find a partition of the data points in a data stream. However, these algorithms disregard the information represented by the temporal order of the data points in th...

Finding Frequent Items in Data Streams

2002

Moses Charikar Kevin Chen Martin Farach-Colton

We present a 1-pass algorithm for estimating the most frequent items in a data stream using very limited storage space. Our method relies on a novel data structure called a count sketch, which allows us to estimate the frequencies of all the items in the stream. Our algorithm achieves better space bounds than the previous best known algorithms for this problem for many natural distributions on ...
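The count sketch mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `CountSketch`, the `depth` and `width` defaults, and the use of a salted BLAKE2b digest in place of the per-row hash functions are all assumptions made for illustration. Each row maps an item to one bucket and one sign, updates add the signed count, and a query takes the median of the signed counters across rows.

```python
import hashlib
from statistics import median


class CountSketch:
    """Illustrative count sketch: `depth` rows of `width` signed counters."""

    def __init__(self, depth=5, width=256):
        # An odd depth keeps the median equal to one of the row votes.
        self.depth = depth
        self.width = width
        self.table = [[0] * width for _ in range(depth)]

    def _bucket_and_sign(self, row, item):
        # Assumption: a salted BLAKE2b digest stands in for the two
        # per-row hash functions (bucket choice and +1/-1 sign).
        digest = hashlib.blake2b(
            repr(item).encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        value = int.from_bytes(digest, "little")
        sign = 1 if value & 1 else -1
        bucket = (value >> 1) % self.width
        return bucket, sign

    def add(self, item, count=1):
        # One pass over the stream: touch one counter per row.
        for row in range(self.depth):
            bucket, sign = self._bucket_and_sign(row, item)
            self.table[row][bucket] += sign * count

    def estimate(self, item):
        # Each row votes with its signed counter; the median damps
        # the noise contributed by colliding items.
        votes = []
        for row in range(self.depth):
            bucket, sign = self._bucket_and_sign(row, item)
            votes.append(sign * self.table[row][bucket])
        return median(votes)
```

For example, after adding one item many times along with a few hundred distinct low-frequency items, `estimate` on the frequent item should land near its true count, with the exact error depending on which items happen to collide. Memory use is fixed at `depth * width` counters regardless of stream length.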

Proposal of Data Processing Platform for Direct Marketing Data

Journal: CoRR 2016

Jorge Luis Rivero Pérez Yaimara Peñate Santana Pedro Harenton Martínez López

Data mining has been widely used to identify potential customers for a new product or service. This article presents a study of previous work on applying data mining methodologies to software projects, specifically direct marketing projects. Several data sets of demographic and historical customer purchase data are available for evaluating algorithms in this area, some...

Mining Concept-Drifting Data Streams

2010

Haixun Wang Philip S. Yu Jiawei Han

Knowledge discovery from infinite data streams is an important and difficult task. We face two challenges: the overwhelming volume and the concept drifts of the streaming data. In this chapter, we introduce a general framework for mining concept-drifting data streams using weighted ensemble classifiers. We train an ensemble of classification models, such as C4.5, RIPPER, naive Bayesian, et...

Frequent items in streaming data: An experimental evaluation of the state-of-the-art

Journal: Data Knowl. Eng. 2009

Nishad Manerikar Themis Palpanas

The problem of detecting frequent items in streaming data is relevant to many different applications across many domains. Several algorithms, diverse in nature, have been proposed in the literature for the solution of the above problem. In this paper, we review these algorithms, and we present the results of the first extensive comparative experimental study of the most prominent algorithms in ...
