Search results for: cyclic parallel

Number of results: 320785

1995
Bing Bing Zhou Richard P. Brent

In this paper we give evidence to show that in one-sided Jacobi SVD computation the sorting of column norms in each sweep is very important. Two parallel Jacobi orderings are described. These orderings can generate n(n-1)/2 different index pairs and sort column norms at the same time. The one-sided Jacobi SVD algorithm using these parallel orderings converges in about the same number of sweeps as...

2011
Shen Wang Xiaoye Li Jianlin Xia Maarten V. de Hoop

Recently, hierarchically semiseparable (HSS) matrices have been used in the development of fast direct sparse solvers. Key applications of HSS algorithms, coupled with multifrontal solvers, appear in solving certain large-scale computational inverse problems. Here, we develop massively parallel HSS algorithms appearing in these solution methods, namely, parallel HSS construction using the rank ...

Journal: Journal of Biomechanics 2004
James H-C Wang Guoguang Yang Zhaozhu Li Wei Shen

Fibroblasts in intact tendons align with stretching direction, but they tend to orient randomly in healing tendons. Therefore, a question arises: Do fibroblast responses to mechanical stretching depend on their orientation? To address this question, human patellar tendon fibroblasts were grown in custom-made silicone dishes that possess microgrooved culture surfaces. The direction of the microg...

1996
Tuomo Rossi Jari Toivanen

A parallel fast direct solver based on the Divide & Conquer method for linear systems with separable block tridiagonal matrices is considered. Such systems appear, for example, when discretizing the Poisson equation in a rectangular domain using the five-point finite difference scheme or the piecewise linear finite elements on a triangulated rectangular mesh. The Divide & Conquer method has the arithm...

1997
Peter Strazdins

Software overheads can be a significant cause of performance degradation in parallel numerical libraries. This paper examines the nature and extent of software overheads in an implementation of parallel LAPACK on distributed memory multiprocessors, where block-partitioned algorithms w...

2000
C. D. Pham

Parallel simulations of fine grain applications usually generate a large amount of messages. The overhead for sending these messages over an interconnection network can dramatically limit the speedup of a parallel simulation. In this case, message aggregation techniques can increase the granularity of the application and reduce the communication overhead. This paper compares sender-initiated an...

2010
Jungpyo Lee John C. Wright

Three-dimensional (3-D) processor configuration of a parallel solver is introduced to solve a massive block-tridiagonal matrix system in this paper. The purpose of the added parallelization dimension is to retard the saturation of the scaling due to communication overhead and an inefficient parallelization. The semi-empirical formula for the matrix operation count of the typical parallel algori...

2003
J. Lu K. H. Law A. Elgamal

This paper presents a parallel nonlinear finite element program, ParCYCLIC, which is designed for the analysis of cyclic seismically-induced liquefaction problems. Key elements of the computational strategy employed in ParCYCLIC include the deployment of an automatic domain decomposer, the use of the multilevel nested dissection algorithm for the ordering of finite element nodes, and the develo...

1995
Alexandre E. Eichenberger Santosh G. Abraham

We propose an analytical model that quantifies the overall execution time of a parallel region in the presence of non-deterministic load imbalance introduced by network contention and by random replacement policy in processor caches. We present a novel model that evaluates the expected hit ratio and variance introduced by a cache accessed with a cyclic access stream. We also model the performan...

2004
P. Sissokho T. Whalen

We propose a recursive formula for computing the remainder of a Euclidean division of polynomials (with binary coefficients), which operates in parallel on w bits at a time and takes t new incoming bits at each stage. We use this formula to design a fast parallel Cyclic Redundancy Check (CRC) system which is a look-ahead scheme that trades in arbitrary depth (processing time per cycle) and thro...
