cyclic parallel

On parallel implementation of the one-sided Jacobi algorithm for singular value decompositions

1995

Bing Bing Zhou Richard P. Brent

In this paper we give evidence to show that in one-sided Jacobi SVD computation the sorting of column norms in each sweep is very important. Two parallel Jacobi orderings are described. These orderings can generate n(n-1)/2 different index pairs and sort column norms at the same time. The one-sided Jacobi SVD algorithm using these parallel orderings converges in about the same number of sweeps as...
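
As a rough illustration of the technique named in this abstract, here is a minimal serial sketch of a one-sided (Hestenes) Jacobi iteration that keeps the larger-norm column first in every pair. It does not reproduce the parallel orderings of the paper. The function name `one_sided_jacobi_singular_values`, the parameters `sweeps` and `tol`, and the swap-based sorting rule are assumptions made for this example.

```python
import math


def one_sided_jacobi_singular_values(a, sweeps=30, tol=1e-12):
    """Singular values of a dense matrix (list of rows), largest first.

    Illustrative serial row-cyclic sketch; not the parallel ordering
    described in the paper.
    """
    m, n = len(a), len(a[0])
    # Store the matrix column by column so each rotation touches two lists.
    cols = [[a[i][j] for i in range(m)] for j in range(n)]
    for _ in range(sweeps):
        rotated = False
        # One sweep visits all n(n-1)/2 index pairs (p, q) with p < q.
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(x * x for x in cols[p])
                beta = sum(x * x for x in cols[q])
                gamma = sum(x * y for x, y in zip(cols[p], cols[q]))
                # Assumed sorting rule: keep the larger-norm column at index p.
                if alpha < beta:
                    cols[p], cols[q] = cols[q], cols[p]
                    alpha, beta = beta, alpha
                # Skip pairs that are already numerically orthogonal.
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue
                rotated = True
                # Plane rotation that makes the two columns orthogonal.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                for i in range(m):
                    x, y = cols[p][i], cols[q][i]
                    cols[p][i] = c * x - s * y
                    cols[q][i] = s * x + c * y
        if not rotated:
            break
    # At convergence the column norms are the singular values.
    norms = (math.sqrt(sum(x * x for x in col)) for col in cols)
    return sorted(norms, reverse=True)
```

For example, `one_sided_jacobi_singular_values([[3.0, 0.0], [0.0, 4.0]])` returns the two singular values 4 and 3, largest first.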

Efficient Parallel Algorithms for Hierarchically Semiseparable Matrices

2011

Shen Wang Xiaoye Li Jianlin Xia Maarten V. de Hoop

Recently, hierarchically semiseparable (HSS) matrices have been used in the development of fast direct sparse solvers. Key applications of HSS algorithms, coupled with multifrontal solvers, appear in solving certain large-scale computational inverse problems. Here, we develop massively parallel HSS algorithms appearing in these solution methods, namely, parallel HSS construction using the rank ...

Fibroblast responses to cyclic mechanical stretching depend on cell orientation to the stretching direction.

Journal: Journal of Biomechanics 2004

James H-C Wang Guoguang Yang Zhaozhu Li Wei Shen

Fibroblasts in intact tendons align with stretching direction, but they tend to orient randomly in healing tendons. Therefore, a question arises: Do fibroblast responses to mechanical stretching depend on their orientation? To address this question, human patellar tendon fibroblasts were grown in custom-made silicone dishes that possess microgrooved culture surfaces. The direction of the microg...

A Parallel Fast Direct Solver for Block Tridiagonal Systems with

1996

Tuomo Rossi Jari Toivanen

A parallel fast direct solver based on the Divide & Conquer method for linear systems with separable block tridiagonal matrices is considered. Such systems appear, for example, when discretizing the Poisson equation in a rectangular domain using the five-point finite difference scheme or the piecewise linear finite elements on a triangulated rectangular mesh. The Divide & Conquer method has the arithm...

Reducing Software Overheads in Parallel Linear Algebra Libraries

1997

Peter Strazdins

Software overheads can be a significant cause of performance degradation in parallel numerical libraries. This paper examines the nature and extent of software overheads in an implementation of parallel LAPACK on distributed memory multiprocessors, where block-partitioned algorithms w...

Comparison of Message Aggregation Strategies for Parallel Simulations on a High Performance Cluster

2000

C. D. Pham

Parallel simulations of fine grain applications usually generate a large amount of messages. The overhead for sending these messages over an interconnection network can dramatically limit the speedup of a parallel simulation. In this case, message aggregation techniques can increase the granularity of the application and reduce the communication overhead. This paper compares sender-initiated an...

A versatile parallel block-tridiagonal solver for spectral codes

2010

Jungpyo Lee John C. Wright

Three-dimensional (3-D) processor configuration of a parallel solver is introduced to solve a massive block-tridiagonal matrix system in this paper. The purpose of the added parallelization dimension is to retard the saturation of the scaling due to communication overhead and an inefficient parallelization. The semi-empirical formula for the matrix operation count of the typical parallel algori...

Simulation of Earthquake Liquefaction Response on Parallel Computers

2003

J. Lu K. H. Law A. Elgamal

This paper presents a parallel nonlinear finite element program, ParCYCLIC, which is designed for the analysis of cyclic seismically-induced liquefaction problems. Key elements of the computational strategy employed in ParCYCLIC include the deployment of an automatic domain decomposer, the use of the multilevel nested dissection algorithm for the ordering of finite element nodes, and the develo...

Modeling load imbalance and fuzzy barriers for scalable shared-memory multiprocessors

1995

Alexandre E. Eichenberger Santosh G. Abraham

We propose an analytical model that quantifies the overall execution time of a parallel region in the presence of non-deterministic load imbalance introduced by network contention and by random replacement policy in processor caches. We present a novel model that evaluates the expected hit ratio and variance introduced by a cache accessed with a cyclic access stream. We also model the performan...

On Parallel CRC Computations

2004

P. Sissokho T. Whalen

We propose a recursive formula for computing the remainder of a Euclidean division of polynomials (with binary coefficients), which operates in parallel on w bits at a time and takes t new incoming bits at each stage. We use this formula to design a fast parallel Cyclic Redundancy Check (CRC) system which is a look-ahead scheme that trades in arbitrary depth (processing time per cycle) and thro...
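
To make the idea of a CRC as a polynomial remainder over binary coefficients concrete, the sketch below computes an 8-bit CRC one bit at a time and then eight bits at a time through a lookup table. It is a standard table-driven scheme, not the recursive look-ahead formula of the paper. The generator polynomial `POLY`, the zero initial register, and the function names are assumptions chosen for illustration.

```python
POLY = 0x07  # x^8 + x^2 + x + 1 with the x^8 term implicit (assumed generator)


def crc8_bitwise(data: bytes) -> int:
    """Remainder of the message polynomial times x^8, divided by the generator."""
    reg = 0
    for byte in data:
        reg ^= byte
        for _ in range(8):
            # Shift out one coefficient; subtract (XOR) the generator if it was 1.
            if reg & 0x80:
                reg = ((reg << 1) ^ POLY) & 0xFF
            else:
                reg = (reg << 1) & 0xFF
    return reg


# Precompute the effect of eight consecutive shift steps for every register value.
TABLE = [crc8_bitwise(bytes([b])) for b in range(256)]


def crc8_table(data: bytes) -> int:
    """Same remainder, consuming w = 8 incoming bits per step via the table."""
    reg = 0
    for byte in data:
        reg = TABLE[reg ^ byte]
    return reg
```

Both functions return the same remainder for any input. The table version does one lookup per byte in place of eight shift-and-XOR steps, which is the simplest form of the width-versus-depth trade-off that parallel CRC designs explore.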
