User-Friendly Clustering for Atmospheric Data Analysis
Abstract
Atmospheric data analysis is an important area of scientific endeavor, with both government and industrial applications. Our work focuses on clustering particle data acquired via an Aerosol Time-of-Flight Mass Spectrometer (ATOFMS), an instrument sold and marketed by TSI, Inc. Most papers and software tools developed by the single-particle mass spectrometry community use the ART-2a clustering algorithm. In this paper we compare the well-known K-means algorithm with ART-2a in this application area. Specifically, we show that despite the entrenched position of ART-2a in this domain, K-means is faster, more scalable, and considerably easier for practitioners to use, while obtaining results of similar accuracy. For data mining practitioners in general, and for those who develop software in particular, our work shows that in an important application area K-means is much easier to use than ART-2a without sacrificing accuracy. For researchers in the single-particle mass spectrometry community, our experiments demonstrate that ART-2a presents some issues that may be of concern. We propose K-means as an attractive alternative.
Related articles
Repeated Record Ordering for Constrained Size Clustering
One of the main techniques used in data mining is data clustering, which has many applications in computer science, biology, and social sciences. Constrained clustering is a type of clustering in which side information provided by the user is incorporated into current clustering algorithms. One of the well researched constrained clustering algorithms is called microaggregation. In a microaggreg...
GAP: A graphical environment for matrix visualization and cluster analysis
GAP is a Java-designed exploratory data analysis (EDA) software for matrix visualization (MV) and clustering of high-dimensional data sets. It provides direct visual perception for exploring structure of a given data matrix and its corresponding proximity matrices, for variables and subjects. Various matrix permutation algorithms and clustering methods with validation indices are implemented fo...
Evolutionary User Clustering Based on Time-Aware Interest Changes in the Recommender System
The abundance of data on the Internet has created problems for users and has caused confusion in finding the right information. Also, users' tastes and preferences change over time. Recommender systems can help users find useful information. Due to changing interests, systems must be able to evolve. In order to solve this problem, users are clustered that determine the most desirable users, it pa...
Web-based Visualization and Analysis of Atmospheric Nucleation Processes
Nucleation phenomena play a pivotal role in many atmospheric and technological processes. However, understanding atmospheric nucleation processes has been difficult due to the lack of effective data exploration tools. In this paper, we present a web-based tool that allows remote users to mine the wealth of particle-based nucleation simulation data through web-based visualization and analysis. T...
User Review Sentiment Classification and Aggregation
User reviews provide a wealth of information but are often overwhelming in volume. In this work we propose a novel approach to extract positive and negative sentiments from user review data leveraging only the overall review scores that are part of the data itself. We then investigate clustering techniques to identify key positive and negative sentiment aspects to provide a user friendly summar...
Publication date: 2005