An Algorithm for Mining Frequent Itemsets from Library Big Data

Author

Xingjian Li

Abstract

Frequent itemset mining plays an important role in college library data analysis. Because library databases contain a large amount of redundant data, the mining process may generate intra-property frequent itemsets, which significantly reduces its efficiency. To address this issue, we propose an improved FP-Growth algorithm, called RFP-Growth, that avoids generating intra-property frequent itemsets. To further boost its efficiency, we implement a MapReduce version of RFP-Growth with an additional pruning strategy. The proposed algorithm was tested on both synthetic and real-world library data, and the experimental results showed that it outperformed existing algorithms.
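The abstract does not define "intra-property" itemsets or give RFP-Growth's details, so the following is only a minimal sketch under stated assumptions. It assumes each item is a hypothetical `property=value` string (e.g. `dept=CS`, `cat=DB`) and that an intra-property itemset is one containing two values of the same property. Under that assumption, it shows a plain single-machine FP-Growth that prunes same-property items from each conditional pattern base, so such itemsets are never generated. The names `Node`, `prop`, `build_tree` and `mine` are illustrative, not the author's; the MapReduce version and its extra pruning strategy are not reproduced.

```python
from collections import defaultdict


class Node:
    """One FP-tree node: an item, its count, a parent link and children."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def prop(item):
    # Hypothetical item encoding assumed for this sketch: "property=value".
    return item.split("=", 1)[0]


def build_tree(transactions, min_support):
    """Build an FP-tree from (items, count) pairs; return header table and supports."""
    support = defaultdict(int)
    for items, count in transactions:
        for item in set(items):
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_support}
    root = Node(None, None)
    header = defaultdict(list)  # item -> every tree node holding that item
    for items, count in transactions:
        # Keep frequent items only, ordered by descending support (ties by name).
        path = sorted((i for i in set(items) if i in frequent),
                      key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return header, frequent


def mine(transactions, min_support, suffix=(), results=None):
    """FP-Growth that never emits an itemset holding two items of one property."""
    if results is None:
        results = {}
    header, frequent = build_tree(transactions, min_support)
    for item in sorted(frequent, key=lambda i: (frequent[i], i)):
        pattern = suffix + (item,)
        results[frozenset(pattern)] = frequent[item]
        blocked = {prop(i) for i in pattern}
        # Conditional pattern base: the prefix path above each node of `item`.
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                # Prune items whose property already occurs in `pattern`;
                # extending with them could only yield intra-property itemsets.
                if prop(parent.item) not in blocked:
                    path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        mine(base, min_support, pattern, results)
    return results
```

For example, calling `mine([(t, 1) for t in data], 2)` on the four hypothetical loan transactions `[dept=CS, cat=DB, cat=AI]`, `[dept=CS, cat=DB]`, `[dept=CS, cat=AI]` and `[dept=Math, cat=DB, cat=AI]` returns the three single items plus `{dept=CS, cat=DB}` and `{dept=CS, cat=AI}`. Plain FP-Growth would also report `{cat=DB, cat=AI}` (support 2), which this variant never builds. Pruning inside the conditional pattern base, rather than filtering results afterwards, is what keeps such itemsets from costing any tree construction.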

Similar resources

Data sanitization in association rule mining based on impact factor

Data sanitization is a process that is used to promote the sharing of transactional databases among organizations and businesses; it alleviates concerns for individuals and organizations regarding the disclosure of sensitive patterns. It transforms the source database into a released database so that counterparts cannot discover the sensitive patterns and so data confidentiality is preserved ag...

A New Algorithm for High Average-utility Itemset Mining

High utility itemset mining (HUIM) is a new emerging field in data mining which has gained growing interest due to its various applications. The goal of this problem is to discover all itemsets whose utility exceeds minimum threshold. The basic HUIM problem does not consider length of itemsets in its utility measurement and utility values tend to become higher for itemsets containing more items...

An Accelerator for Frequent Itemset Mining from Data Streams with Parallel Item Tree

Frequent itemset mining attempts to find frequent subsets in a transaction database. In this era of big data, demand for frequent itemset mining is increasing. Therefore, the combination of fast implementation and low memory consumption, especially for stream data, is needed. In response to this, we optimize an online algorithm, called Skip LC-SS algorithm [1], for hardware. In this paper, we p...

Weighted Itemset Mining from Bigdata using Hadoop

Data items have been extracted using an empirical data mining technique called frequent itemset mining. In the majority of application contexts, items are enriched with weights. Pushing item weights into the itemset extraction process, i.e., mining weighted itemsets rather than traditional itemsets, is an appealing research direction. Although many efficient weighted itemset mining algorithms a...

MINING FUZZY TEMPORAL ITEMSETS WITHIN VARIOUS TIME INTERVALS IN QUANTITATIVE DATASETS

This research aims at proposing a new method for discovering frequent temporal itemsets in continuous subsets of a dataset with quantitative transactions. It is important to note that although these temporal itemsets may have relatively high support or occurrence within particular time intervals, they do not necessarily get similar support across the whole dataset, which makes i...

Volume 9

Publication date: 2014