Dealing with Concept Drift and Class Imbalance in Multi-Label Stream Classification
نویسندگان
چکیده
Streams of objects that are associated with one or more labels at the same time appear in many applications. However, stream classification of multi-label data is largely unexplored. Existing approaches try to tackle the problem by transferring traditional single-label stream classification practices to the multi-label domain. Nevertheless, they fail to consider some of the unique properties of the problem such as within and between class imbalance and multiple concept drift. To deal with these challenges, this paper proposes a novel multilabel stream classification approach that employs two windows for each label, one for positive and one for negative examples. Instance-sharing is exploited for space efficiency, while a time-efficient instantiation based on the k-Nearest Neighbor algorithm is also proposed. Finally, a batch-incremental thresholding technique is proposed to further deal with the class imbalance problem. Results of an empirical comparison against two other methods on three real world datasets are in favor of the proposed approach.
منابع مشابه
Detecting Concept Drift in Data Stream Using Semi-Supervised Classification
Data stream is a sequence of data generated from various information sources at a high speed and high volume. Classifying data streams faces the three challenges of unlimited length, online processing, and concept drift. In related research, to meet the challenge of unlimited stream length, commonly the stream is divided into fixed size windows or gradual forgetting is used. Concept drift refer...
متن کاملExploiting Associations between Class Labels in Multi-label Classification
Multi-label classification has many applications in the text categorization, biology and medical diagnosis, in which multiple class labels can be assigned to each training instance simultaneously. As it is often the case that there are relationships between the labels, extracting the existing relationships between the labels and taking advantage of them during the training or prediction phases ...
متن کاملA Dynamic Ensemble Framework for Mining Textual Streams with Class Imbalance
Textual stream classification has become a realistic and challenging issue since large-scale, high-dimensional, and non-stationary streams with class imbalance have been widely used in various real-life applications. According to the characters of textual streams, it is technically difficult to deal with the classification of textual stream, especially in imbalanced environment. In this paper, ...
متن کاملRecursive least square perceptron model for non-stationary and imbalanced data stream classification
Classifying non-stationary and imbalanced data streams encompasses two important challenges, namely concept drift and class imbalance. ‘‘Concept drift’’ (or nonstationarity) is changes in the underlying function being learnt, and class imbalance is vast difference between the numbers of instances in different classes of data. Class imbalance is an obstacle for the efficiency of most classifiers...
متن کاملSurvey on Pattern Optimization for Novel Class in MCM for Stream Data Classification
The classification of stream data is somehow difficult. Existing data stream classification techniques assume that total number of classes in the stream is fixed. Therefore, instances belonging to a novel class are misclassified by the existing techniques. Because data streams have endless length, conventional multi pass learning algorithms are not appropriate as they would require infinite sto...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2011