Online Streaming Feature Selection Using Geometric Series of the Adjacency Matrix of Features


Abstract:

Feature selection (FS) is an important pre-processing step in machine learning and data mining. Traditional feature selection methods assume that the entire feature space is available from the beginning. However, online streaming features (OSF) are an integral part of many real-world applications. In OSF, the number of training examples is fixed while the number of features grows over time as new features stream in. For instance, in semantic segmentation of images using texture-based features, the number of features can grow without bound. In such dynamically growing scenarios, a rudimentary approach is to wait until all features become available and only then carry out feature selection. However, because good decisions are needed at every time step, a more rational approach is to design an online streaming feature selection (OSFS) method that selects the best feature subset from the information seen so far and updates that subset on the fly as new features stream in. Any OSFS method must satisfy three critical conditions. First, it should not require any domain knowledge about the feature space, because the full feature space is unknown or inaccessible. Second, it should allow efficient incremental updates of the selected features. Third, it should be as accurate as possible at each time instance, so that classification and learning tasks at that time instance are reliable. In this paper, OSFS is approached through the geometric series of the feature adjacency matrix, and a new OSFS algorithm called OSFS-GS is proposed. This algorithm ranks features based on path integrals and the centrality concept on an online feature adjacency graph.
The most appealing characteristics of the proposed algorithm are: 1) all possible subsets of features are considered when evaluating the rank of a given feature; 2) it is extremely efficient, as it reduces the feature ranking problem to calculating the geometric series of an adjacency matrix; and 3) besides the selected feature subset, it maintains a redundant feature subset, which allows good features to be reconsidered at different time instances. The algorithm is compared with three state-of-the-art OSFS algorithms, namely information-investing, fast-OSFS and OSFSMI. The information-investing algorithm is an embedded online feature selection method that treats feature selection as part of the learning process. It selects a new incoming feature if the feature reduces the model entropy by more than the cost of coding the feature. The fast-OSFS algorithm is a filter method that gradually builds a Markov blanket of the feature space using causality-based measures. For any new incoming feature, it executes two processes: an online relevance analysis followed by an online redundancy analysis. OSFSMI is similar to fast-OSFS but uses information theory for feature analysis. The algorithms are extensively evaluated on eight high-dimensional datasets in terms of compactness, classification accuracy and run-time. To simulate the OSF scenario, features are presented one by one. Moreover, to strengthen the comparison, the results are averaged over 30 random streaming orders. Experimental results demonstrate that the OSFS-GS algorithm achieves higher accuracy than the three existing OSFS algorithms.
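The abstract does not give the exact formulation of OSFS-GS, so the following is only a minimal sketch of the general idea of ranking by a geometric series of an adjacency matrix. It relies on the standard identity that, for a matrix A with spectral radius ρ(A) and a scalar r with r·ρ(A) < 1, the series rA + (rA)² + (rA)³ + ... converges to (I − rA)⁻¹ − I, so paths of every length are accounted for with a single matrix inversion. The function name `geometric_series_scores`, the `damping` parameter, the toy adjacency matrix, and the choice to score a feature by the row sum of the series are all assumptions made for illustration; the online update and the redundant feature subset described above are not modelled.

```python
import numpy as np


def geometric_series_scores(A, damping=0.9):
    """Score each node by summing weighted paths of every length through A.

    Hypothetical helper: A is assumed to be a square, non-negative
    feature adjacency matrix; damping in (0, 1) controls how quickly
    longer paths are discounted.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    # Spectral radius of A (largest eigenvalue magnitude).
    rho = np.max(np.abs(np.linalg.eigvals(A)))
    # Scale so that r * rho = damping < 1, which makes the series converge.
    r = damping / rho if rho > 0 else 1.0
    # Closed form of sum_{l >= 1} (r A)^l.
    S = np.linalg.inv(np.eye(n) - r * A) - np.eye(n)
    # Row sums aggregate all paths that start at each feature.
    return S.sum(axis=1)


# Toy symmetric adjacency between three features (made-up weights).
A = np.array([
    [0.0, 1.0, 0.5],
    [1.0, 0.0, 0.2],
    [0.5, 0.2, 0.0],
])

scores = geometric_series_scores(A)
ranking = np.argsort(-scores)  # feature indices, highest score first
print(ranking, scores)
```

In this toy matrix feature 0 has the strongest connections and feature 2 the weakest, and the ranking follows that order. In a streaming setting one would presumably extend `A` by one row and column per arriving feature and recompute or update the scores, but how OSFS-GS does this is not stated in the abstract.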


Similar resources

Online Streaming Feature Selection

In the paper, we consider an interesting and challenging problem, online streaming feature selection, in which the size of the feature set is unknown, and not all features are available for learning while leaving the number of observations constant. In this problem, the candidate features arrive one at a time, and the learner's task is to select a “best so far” set of features from streaming f...


Online Streaming Feature Selection

We study an interesting and challenging problem, online streaming feature selection, in which the size of the feature set is unknown, and not all features are available for learning while leaving the number of observations constant. In this problem, the candidate features arrive one at a time, and the learner's task is to select a “best so far” set of features from streaming features. Standard ...


LOFS: Library of Online Streaming Feature Selection

As an emerging research direction, online streaming feature selection deals with sequentially added dimensions in a feature space while the number of data instances is fixed. Online streaming feature selection provides a new, complementary algorithmic methodology to enrich online feature selection, especially targeting high dimensionality in big data analytics. This paper introduces the first ...


Assessment of the Efficiency of S.P.G.C Refineries Using Network DEA

Data envelopment analysis (DEA) is a powerful tool for measuring relative efficiency of organizational units referred to as decision making units (DMUs). In most cases DMUs have network structures with internal linking activities. Traditional DEA models, however, consider DMUs as black boxes with no regard to their linking activities and therefore do not provide decision makers with the reasons...

The Impact of Skopos on Syntactic Features of the Target Text

The present study is an experimental case study which investigates the impacts, if any, of skopos on syntactic features of the target text. Two test groups each consisting of 10 MA students translated a set of sentences selected from advertising texts in the operative and informative mode. The resulting target texts were then statistically analyzed in terms of the number of words, phrases, si...


The Use of an Appropriate MADM Model for Ranking the Vendors of MCI Equipment Using a Fuzzy Approach

Nowadays, the science of decision making has received more attention because of the complexity of supplier selection problems. One of the efficient tools in economic and human resources development is the extension of communication networks in developing countries. The proper selection of suppliers of TC equipment is therefore of great concern. In this study, a ...



Journal information

Volume 17, Issue 4

Pages 3-14

Publication date: 2021-02
