Programming parallel dense matrix factorizations and inversion for new-generation NUMA architectures


Abstract

We propose a methodology to address the programmability issues derived from the emergence of new-generation shared-memory NUMA architectures. For this purpose, we employ dense matrix factorizations and inversion (DMFI) as a use case, and we target two modern architectures (AMD Rome and Huawei Kunpeng 920) that exhibit configurable NUMA topologies. Our methodology pursues performance portability across different NUMA configurations by proposing multi-domain implementations for DMFI plus a hybrid task- and loop-level parallelization that configures multi-threaded executions to fix the core-to-data binding, exploiting locality at the expense of minor code modifications. In addition, we introduce a generalization of the multi-domain implementations that offers support for virtually any NUMA topology in present and future architectures. Our experimentation on the two target architectures with three representative dense linear algebra operations validates the proposal, reveals insights into the necessity of adapting both the codes and their execution to improve data access locality, and reports performance across inter- and intra-socket NUMA configurations that is competitive with state-of-the-art message-passing implementations, while maintaining the ease of development usually associated with shared-memory programming.
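To make the idea of a fixed core-to-data binding more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a 2D block-cyclic assignment of matrix tiles to NUMA domains and a contiguous core numbering per domain; the abstract does not state either choice. All function names and parameter values are hypothetical.

```python
# Hypothetical sketch: bind trailing-update tasks of a tiled Cholesky
# factorization to the cores of the NUMA domain that owns each output tile.
# The layout (2D block-cyclic) and core numbering are assumptions.

def tile_owner(i, j, dom_rows, dom_cols):
    """Return the domain owning tile (i, j) under a 2D block-cyclic layout."""
    return (i % dom_rows) * dom_cols + (j % dom_cols)

def domain_cores(domain, cores_per_domain):
    """Return the core IDs of a domain, assuming contiguous numbering."""
    start = domain * cores_per_domain
    return list(range(start, start + cores_per_domain))

def cholesky_update_tasks(k, n_tiles):
    """List lower-triangular trailing tiles updated at step k (right-looking)."""
    return [(i, j) for j in range(k + 1, n_tiles) for i in range(j, n_tiles)]

def bind_tasks(k, n_tiles, dom_rows, dom_cols, cores_per_domain):
    """Map each trailing-update task to the cores of its tile's owner domain."""
    binding = {}
    for (i, j) in cholesky_update_tasks(k, n_tiles):
        d = tile_owner(i, j, dom_rows, dom_cols)
        binding[(i, j)] = domain_cores(d, cores_per_domain)
    return binding

if __name__ == "__main__":
    # Assumed configuration: 4x4 tiles, 2x2 grid of domains, 16 cores each.
    for tile, cores in sorted(bind_tasks(0, 4, 2, 2, 16).items()):
        print(tile, "-> cores", cores[0], "to", cores[-1])
```

In a sketch like this, each task would be scheduled at the task level onto its owner domain, and the cores inside that domain could then share the tile update through loop-level parallelism, which is one way to read the hybrid scheme the abstract describes.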

Similar resources

Efficient parallel independent subsets and matrix factorizations

A parallel algorithm is given for computation of a maximal linearly independent subset of a set of vectors over a field. The algorithm uses polylogarithmic time and uses a number of processors that differs by only a polylog factor from the number required for fast parallel matrix inversion. It is used to produce efficient parallel algorithms for orthogonalizations of arbitrary matrices over real field...


A Direct Matrix Inversion-Less Analysis for Distribution System Power Flow Considering Distributed Generation

This paper presents a new direct matrix inversion-less analysis for radial distribution systems (RDSs). The method can successfully deal with weakly meshed distribution systems (WMDSs). Being easy to implement, direct methods (DMs) provide an excellent performance. Matrix inversion is the main reason for divergence and low efficiency in power flow algorithms. In this paper, the performance of t...


Invertastic: Large-scale Dense Matrix Inversion

Linear algebraic techniques are widely used in scientific computing, often requiring large-scale parallel resources such as those provided by the ARCHER service. Libraries exist to facilitate the development of appropriate parallel software, but use of these involves intricacies in decomposition of the problem, managing parallel input and output, passing messages and the execution of the linear...


Efficient High-precision Dense Matrix Algebra on Parallel Architectures for Nonlinear Discrete Optimization

We provide a proof point for the idea that matrix-based algorithms for discrete optimization problems, mainly conceived for proving theoretical efficiency, can be easily and efficiently implemented on massively-parallel architectures by exploiting scalable and efficient parallel implementations of algorithms for ultra high-precision dense linear algebra. We have successfully implemented our alg...


Parallel Block Matrix Factorizations for Distributed Memory Multicomputers

Efficient and scalable parallel block algorithms for the LU factorization with partial pivoting, the Cholesky, and QR factorizations in a distributed memory multicomputer environment are presented. The distributed system is viewed as a ring of processors and the algorithms correspond to shared memory algorithms parallelized on block level (explicit parallelism). Performance of the algorithms are...



Journal

Journal title: Journal of Parallel and Distributed Computing

Year: 2023

ISSN: 1096-0848, 0743-7315

DOI: https://doi.org/10.1016/j.jpdc.2023.01.004