Benchmarking Stream Clustering Algorithms within the MOA Framework

Authors

  • Philipp Kranen
  • Hardy Kremer
  • Timm Jansen
  • Thomas Seidl
  • Albert Bifet
  • Geoff Holmes
  • Bernhard Pfahringer
Abstract

In today’s applications, massive, evolving data streams are ubiquitous. To gain useful information from this data, real-time clustering analysis for streams is needed. A multitude of stream clustering algorithms have been introduced. However, assessing the effectiveness of such an algorithm is challenging because, up to now, there has been no tool that allows a direct comparison of these algorithms. We present a novel clustering evaluation framework for data streams. It is an extension of Massive Online Analysis (MOA), a software environment for implementing and evaluating algorithms for online learning from evolving data streams. Our stream clustering evaluation framework includes a collection of online clustering methods and offers tools for extensive evaluation and visualization. Moreover, it allows for bidirectional interaction with WEKA, since it uses the same internal data structures. Our framework is designed for extensibility, allowing straightforward addition of further algorithms, evaluation measures, and data feeds. It is released under the GNU GPL.

Related resources

Subspace MOA: Subspace Stream Clustering Evaluation Using the MOA Framework

Most available static data are becoming increasingly high-dimensional. Therefore, subspace clustering, which aims at finding clusters not only within the full dimension but also within subgroups of dimensions, has gained significant importance. Recently, the OpenSubspace framework was proposed to evaluate and explore subspace clustering algorithms in WEKA with a rich body of most state of the a...

Stream Data Mining: Platforms, Algorithms, Performance Evaluators and Research Trends

Streaming data are a potentially infinite sequence of data arriving at very high speed, and may evolve over time. This poses several challenges for mining large-scale, high-speed data streams in real time. Hence, this field has received a lot of attention from researchers over the past years. This paper discusses various challenges associated with mining such data streams. Several available stream da...

MOA: Massive Online Analysis, a Framework for Stream Classification and Clustering

In today’s applications, massive, evolving data streams are ubiquitous. Massive Online Analysis (MOA) is a software environment for implementing algorithms and running experiments for online learning from evolving data streams. MOA is designed to deal with the challenging problems of scaling up the implementation of state of the art algorithms to real world dataset sizes and of making algorithm...

Effective Evaluation Measures for Subspace Clustering of Data Streams

Nowadays, most streaming data sources are becoming high-dimensional. Accordingly, subspace stream clustering, which aims at finding evolving clusters within subgroups of dimensions, has gained significant importance. However, existing subspace clustering evaluation measures are mainly designed for static data, and cannot reflect the quality of the evolving nature of data streams. On the other ...

Detecting Sentiment Change in Twitter Streaming Data

MOA-TweetReader is a system that reads tweets in real time, detects changes, and finds the terms whose frequency changed. Twitter is a micro-blogging service built to discover what is happening at any moment in time, anywhere in the world. Twitter messages are short, generated constantly, and well suited for knowledge discovery using data stream mining. MOA-TweetReader is a softw...

Publication date: 2010