A Novel Clustering and Matrix Based Computation for Big Data Dimensionality Reduction and Classification

Abstract

For higher dimensional or "Big Data (BD)" clustering and classification, the dimensions of documents have to be considered. The overhead of classifying methods might also be reduced by resolving the volumetric issue of documents. However, the shortened collection can potentially generate noise and abnormalities. Previous noise and abnormality information removal strategies include several different approaches that have already been established over time. To increase classification accuracy, current and newly created classification methods must deal with some of the most difficult issues in BD document categorization and clustering. Hence, the goals of this research are derived from issues that can be solved only by expanding the accuracy of classifiers. Superior clusters may be achieved by using effective "Dimensionality Reduction (DR)". As the first step of this research, we introduce a unique DR approach that preserves the word frequency of the collection, allowing the classification algorithm to obtain improved (or at least equal) accuracy levels with a lower dimensionality set. When clustering "Word Patterns (WPs)" during "WP Clustering (WPC)", we apply a WP "Similarity Function (SF)" for the "Similarity Computation (SC)" used as part of WPC. DR is accomplished with the use of the information gained from the various clusters. Finally, we provide "Similarity Measures" for the SC of high dimensional texts and deliver an SF for classification. With assessment criteria like "Information-Ratio for Dimension-Reduction", "Accuracy", and "Recall", we discovered that the proposed method, WP paired with SC (WP-SC), scaled extremely effectively to higher dimensional "Datasets (DS)" and surpasses the technique AFO-MKSVM. According to the findings, WP-SC produced more favorable outcomes than the LDA-SVM and AFO-MKSVM approaches.
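The abstract does not define the word patterns or the similarity function, so the following is only a minimal sketch of the general idea of reducing dimensionality by clustering words while preserving word frequency. It assumes that a word's pattern is its count vector across documents, that similarity is plain cosine similarity, and that clustering is a greedy single pass with a fixed threshold; none of these choices is confirmed by the paper. The names `cosine`, `cluster_word_patterns`, and `reduce_documents` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def cluster_word_patterns(doc_term, threshold=0.8):
    """Greedy single-pass clustering of word columns (assumed stand-in for WPC).

    doc_term is a list of rows, one per document, holding word counts.
    Returns a list of clusters, each a list of word (column) indices.
    """
    n_words = len(doc_term[0])
    # Assumption: a word's "pattern" is its count vector across documents.
    patterns = [[row[j] for row in doc_term] for j in range(n_words)]
    clusters = []
    for j, pattern in enumerate(patterns):
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(pattern, cluster["centroid"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"centroid": list(pattern), "members": [j]})
        else:
            best["members"].append(j)
            # Running sum works as a centroid because cosine ignores scale.
            best["centroid"] = [a + b for a, b in zip(best["centroid"], pattern)]
    return [c["members"] for c in clusters]

def reduce_documents(doc_term, clusters):
    """Replace each document's word counts by per-cluster sums.

    Summing keeps every document's total word count unchanged.
    """
    return [[sum(row[j] for j in members) for members in clusters]
            for row in doc_term]

# Toy document-term matrix: 4 documents, 4 words (illustrative data only).
doc_term = [
    [2, 4, 0, 0],
    [1, 2, 0, 1],
    [0, 0, 3, 3],
    [0, 0, 1, 2],
]
clusters = cluster_word_patterns(doc_term, threshold=0.8)
reduced = reduce_documents(doc_term, clusters)
```

In this toy input the four word columns collapse into two clusters, so each document is represented by two features instead of four while its total word count stays the same. A classifier could then be trained on `reduced` in place of `doc_term`.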

Similar resources

The Clustering and Classification Data Mining Techniques in Insurance Fraud Detection: The Case of Iranian Car Insurance

Given the ever-increasing spread of fraud in the insurance sector, especially in car insurance, and its negative consequences for insurance companies, applying appropriate and efficient methods to identify and detect fraud in this field is essential. Understanding the patterns in data on previously reported claims can help determine whether a damage claim is genuine or not. One of the most common and widely used ways of discovering patterns in data is the use of ...

Fast Kernel Matrix Computation for Big Data Clustering

Kernel k-Means is a basis for many state of the art global clustering approaches. When the number of samples grows too big, however, it is extremely time-consuming to compute the entire kernel matrix and it is impossible to store it in the memory of a single computer. The algorithm of Approximate Kernel k-Means has been proposed, which works using only a small part of the kernel matrix. The com...

Diagnosis of Diabetes Using an Intelligent Approach Based on Bi-Level Dimensionality Reduction and Classification Algorithms

Objective: Diabetes is one of the most common metabolic diseases. Earlier diagnosis of diabetes and treatment of hyperglycemia and related metabolic abnormalities is of vital importance. Diagnosis of diabetes via proper interpretation of the diabetes data is an important classification problem. Classification systems help the clinicians to predict the risk factors that cause the diabetes or pre...

Dimensionality Reduction by Random Mapping: Fast Similarity Computation for Clustering

When the data vectors are high dimensional, it is computationally infeasible to use data analysis or pattern recognition algorithms which repeatedly compute similarities or distances in the original data space. It is therefore necessary to reduce the dimensionality before, for example, clustering the data. If the dimensionality is very high, like in the WEBSOM method which organizes textual docume...

Dimensionality Reduction for Distance Based Video Clustering

Clustering of video sequences is essential in order to perform video summarization. Because of the high spatial and temporal dimensions of the video data, dimensionality reduction becomes imperative before performing Euclidean distance based clustering. In this paper, we present non-adaptive dimensionality reduction approaches using random projections on the video data. Assuming the data to be ...

Journal

Journal title: Journal of Advanced Research in Applied Sciences and Engineering Technology

Year: 2023

ISSN: 2462-1943

DOI: https://doi.org/10.37934/araset.32.1.238251