Distributed non-negative RESCAL with Automatic Model Selection for Exascale Data

Abstract

With the boom in the development of computer hardware and software, social media, IoT platforms, and communications, there has been exponential growth in the volume of data produced worldwide. Among these data, relational datasets are growing in popularity as they provide unique insights into the evolution of communities and their interactions. Relational datasets are naturally non-negative, sparse, and extra-large. Relational data usually contain triples (subject, relation, object) and are represented as graphs/multigraphs, called knowledge graphs, which need to be embedded into a low-dimensional dense vector space. Among various embedding models, RESCAL allows learning of relational data to extract the posterior distributions over the latent variables and to make predictions of missing relations. However, RESCAL is computationally demanding and requires a fast, distributed implementation to analyze extra-large real-world datasets. Here we introduce a distributed non-negative RESCAL algorithm for heterogeneous CPU/GPU architectures with automatic selection of the number of latent communities (model selection), called pyDRESCALk. We demonstrate the correctness of pyDRESCALk with real-world and large synthetic tensors, and its efficacy by showing near-linear scaling that agrees with the theoretical complexities. Finally, pyDRESCALk determines the number of latent communities in an 11-terabyte dense and 9-exabyte sparse synthetic tensor.
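To make the non-negative RESCAL model concrete, here is a minimal single-process sketch. It assumes the standard RESCAL formulation, in which each relation slice is approximated as X_k ≈ A R_k A^T with a shared entity matrix A and a per-relation matrix R_k, both constrained to be non-negative, and it fits them with Lee-Seung-style multiplicative updates on the Frobenius loss. This is an illustration under those assumptions, not pyDRESCALk's actual update rules. All function names (`nn_rescal`, `reconstruction_error`, `synthetic_tensor`, `mult_update`) are hypothetical. The sketch deliberately omits the distribution across CPU/GPU ranks, sparse storage, and the automatic model selection that pyDRESCALk provides.

```python
import random

# Hypothetical sketch: single-process non-negative RESCAL via multiplicative
# updates. Assumption: this is NOT pyDRESCALk's distributed implementation.

def matmul(a, b):
    """Return the matrix product a @ b for lists of lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def mult_update(m, num, den, eps):
    """Elementwise m * num / (den + eps); keeps non-negative entries non-negative."""
    return [[x * p / (q + eps) for x, p, q in zip(rm, rp, rq)]
            for rm, rp, rq in zip(m, num, den)]

def reconstruction_error(X, A, R):
    """Relative error sqrt(sum_k ||X_k - A R_k A^T||^2 / sum_k ||X_k||^2)."""
    At = transpose(A)
    num = den = 0.0
    for Xk, Rk in zip(X, R):
        approx = matmul(matmul(A, Rk), At)
        for xrow, arow in zip(Xk, approx):
            for x, y in zip(xrow, arow):
                num += (x - y) ** 2
                den += x * x
    return (num / den) ** 0.5

def nn_rescal(X, rank, iters=300, eps=1e-9, seed=0):
    """Hypothetical helper: fit X_k ~= A R_k A^T with A >= 0 and every R_k >= 0."""
    rng = random.Random(seed)
    n = len(X[0])
    A = [[rng.random() for _ in range(rank)] for _ in range(n)]
    R = [[[rng.random() for _ in range(rank)] for _ in range(rank)] for _ in X]
    for _ in range(iters):
        # A-update: numerator and denominator are the positive and negative
        # parts of the loss gradient with respect to A, summed over slices k.
        AtA = matmul(transpose(A), A)
        num = [[0.0] * rank for _ in range(n)]
        den = [[0.0] * rank for _ in range(n)]
        for Xk, Rk in zip(X, R):
            Rkt = transpose(Rk)
            num = add(num, add(matmul(matmul(Xk, A), Rkt),
                               matmul(matmul(transpose(Xk), A), Rk)))
            den = add(den, add(matmul(matmul(matmul(A, Rk), AtA), Rkt),
                               matmul(matmul(matmul(A, Rkt), AtA), Rk)))
        A = mult_update(A, num, den, eps)
        # R_k-update using the new A: R_k <- R_k * (A^T X_k A) / (A^T A R_k A^T A).
        At = transpose(A)
        AtA = matmul(At, A)
        R = [mult_update(Rk, matmul(matmul(At, Xk), A), matmul(matmul(AtA, Rk), AtA), eps)
             for Xk, Rk in zip(X, R)]
    return A, R

def synthetic_tensor(n, rank, slices, seed=42):
    """Build an exactly low-rank non-negative tensor with slices X_k = A0 R0_k A0^T."""
    rng = random.Random(seed)
    A0 = [[rng.random() for _ in range(rank)] for _ in range(n)]
    R0 = [[[rng.random() for _ in range(rank)] for _ in range(rank)] for _ in range(slices)]
    return [matmul(matmul(A0, Rk), transpose(A0)) for Rk in R0]

if __name__ == "__main__":
    X = synthetic_tensor(n=6, rank=2, slices=2)
    A, R = nn_rescal(X, rank=2)
    print(f"relative error: {reconstruction_error(X, A, R):.4f}")
```

Every update multiplies an entry by a ratio of non-negative quantities, so A and every R_k stay non-negative without an explicit projection step. The small `eps` guards against division by zero. Choosing the rank automatically, as the abstract describes for pyDRESCALk, would require running a factorization like this for many candidate ranks and scoring the results. That step is outside the scope of this sketch.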

Similar resources

Non-Negative Tensor Factorization with RESCAL

Non-negative data is generated by a broad selection of applications today, e.g., in gene expression analysis or imaging. Many factorization techniques have been extended to account for this natural constraint and have become very popular due to their decomposition into interpretable latent factors. Generally relational data like protein interaction networks or social network data can also be seen...

Towards Exascale Distributed Data Management

“Exascale eScience infrastructures” will face important and critical challenges from both computational and data perspectives. Increasingly complex and parallel scientific codes will lead to the production of huge amounts of data. The large volume of data and the time needed to locate, access, analyze and visualize data will greatly impact on the scientific productivity of scientists and research...

Negative Selection Based Data Classification with Flexible Boundaries

One of the most important artificial immune algorithms is the negative selection algorithm, which is an anomaly detection and pattern recognition technique; however, recent research has shown the successful application of this algorithm in data classification. Most of the negative selection methods consider deterministic boundaries to distinguish between self and non-self spaces. In this paper, two...

Allocation models for DMUs with negative data

The formulas of cost and allocative efficiencies of decision-making units (DMUs) with positive data cannot be used for DMUs with negative data. On the other hand, these formulas are needed to analyze the productivity and performance of DMUs with negative data. To this end, this study introduces the cost and allocative efficiencies of DMUs with negative data and demonstrates that the introduc...

Distributed Black-Box Software Testing Using Negative Selection

In the software development process, testing is one of the most human-intensive steps. Many researchers try to automate test case generation to reduce the manual labor of this step. Negative selection is a famous algorithm in the field of Artificial Immune Systems (AIS), and many different applications have been developed using its idea. In this paper we have designed a new algorithm based on nega...

Journal

Journal title: Journal of Parallel and Distributed Computing

Year: 2023

ISSN: 1096-0848, 0743-7315

DOI: https://doi.org/10.1016/j.jpdc.2023.04.010