Cost-Effective Clustering through Active Feature-value Acquisition

Abstract

Many datasets include feature values that are missing but can be acquired at a cost. In this paper, we consider the clustering task for such datasets and address the problem of acquiring the missing feature values that improve clustering quality cost-effectively. Because acquiring all missing information may be unnecessarily expensive, we propose a framework that iteratively selects the feature values yielding the highest improvement in clustering quality per unit cost. The framework can be adapted to different clustering algorithms, and we illustrate it with two popular methods, K-Means and hierarchical agglomerative clustering. Experimental results on several datasets show that the proposed framework improves clustering accuracy over random acquisition. Additional experiments evaluate the framework under different cost structures and explore several alternative formulations of the acquisition strategy.
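The abstract does not say how the improvement in clustering quality is estimated, so the following is only a minimal sketch of the selection loop it describes, not the paper's method. It assumes the caller supplies three hypothetical callables: `cost` (price of one missing cell, assumed strictly positive), `estimate_gain` (an estimate of how much acquiring that cell would improve the clustering, for example by re-running K-Means) and `oracle` (the source that returns the true value once it is paid for). All names here are illustrative.

```python
from typing import Callable, List, Optional, Tuple

Cell = Tuple[int, int]  # (row index, feature index)
Table = List[List[Optional[float]]]  # None marks a missing value


def acquire_greedily(
    data: Table,
    cost: Callable[[Cell], float],
    estimate_gain: Callable[[Table, Cell], float],
    oracle: Callable[[Cell], float],
    budget: float,
) -> List[Cell]:
    """Fill missing cells in place, best estimated gain per unit cost first.

    Assumes every cost is strictly positive. Returns the acquired cells
    in the order they were bought.
    """
    acquired: List[Cell] = []
    while True:
        # Candidates: cells still missing that fit in the remaining budget.
        candidates = [
            (i, j)
            for i, row in enumerate(data)
            for j, value in enumerate(row)
            if value is None and cost((i, j)) <= budget
        ]
        if not candidates:
            return acquired
        # Re-score every round, since each acquisition changes the data
        # that a clustering-based estimator would see.
        best = max(candidates, key=lambda c: estimate_gain(data, c) / cost(c))
        data[best[0]][best[1]] = oracle(best)
        budget -= cost(best)
        acquired.append(best)
```

In practice `estimate_gain` is where the clustering algorithm would enter: it could re-cluster the current table and estimate how much a candidate value is expected to change the result. Passing it in as a parameter keeps the loop independent of whether K-Means or hierarchical agglomerative clustering sits underneath.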


Related resources

Active Information Acquisition for Improved Clustering

Many datasets include feature values that are missing but may be acquired at a cost. In this paper, we consider the clustering task for such datasets, and address the problem of acquiring missing feature values that improve clustering quality in a cost-effective manner. Since acquiring all missing information may be unnecessarily expensive, we propose a framework for iteratively selecting featu...


Active Feature Acquisition with Supervised Matrix Completion

Missing features are a serious problem in many applications: they can lower the quality of training data and significantly degrade learning performance. Because feature acquisition usually involves special devices or complex processes, acquiring all feature values for the whole dataset is expensive. On the other hand, features may be correlated with each other, and some values may ...


Active Feature-Value Acquisition

Most induction algorithms for building predictive models take as input training data in the form of feature vectors. Acquiring the values of features may be costly, and simply acquiring all values may be wasteful, or prohibitively expensive. Active feature-value acquisition (AFA) selects features incrementally in an attempt to improve the predictive model most cost-effectively. This paper prese...


CUSTOMER CLUSTERING BASED ON FACTORS OF CUSTOMER LIFETIME VALUE WITH DATA MINING TECHNIQUE

Organizations have used Customer Lifetime Value (CLV) as an appropriate pattern to classify their customers. Data mining techniques have enabled organizations to analyze their customers’ behaviors more quantitatively. This research has been carried out to cluster customers based on factors of CLV model including length, recency, frequency, and monetary (LRFM) through data mining. Based on LRFM,...


Value of Information Lattice: Exploiting Probabilistic Independence for Effective Feature Subset Acquisition

We address the cost-sensitive feature acquisition problem, where misclassifying an instance is costly but the expected misclassification cost can be reduced by acquiring the values of the missing features. Because acquiring the features is costly as well, the objective is to acquire the right set of features so that the sum of the feature acquisition cost and misclassification cost is minimized...




Publication date: 2008