Search results for: linear speedup

Number of results: 490,347

2007
Tom Bulatewicz, Daniel Andresen, Stephen Welch, Wei Jin, Sanjoy Das, Matthew Miller

Advancements in data collection and high performance computing are making sophisticated model calibration possible throughout the modeling and simulation community. The model calibration process, in which the appropriate input values are estimated for unknown parameters, is typically a computationally intensive task and necessitates the use of distributed software components. These components a...

2009
Xiao-Tong Yuan, Bao-Gang Hu, Ran He

Mean-Shift (MS) is a powerful non-parametric clustering method. Although it can achieve good accuracy, its computational cost is high even on moderately sized data sets. In this paper, for the purpose of algorithm speedup, we develop an agglomerative MS clustering method called Agglo-MS, along with an analysis of its mode-seeking ability and convergence properties. Our method is built upon a...
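
As context for the cost the abstract refers to, here is a minimal mean-shift sketch in NumPy (a generic implementation, not the paper's Agglo-MS; bandwidth and data are illustrative): each mode-seeking step is a kernel-weighted average over the whole data set, which is exactly the per-iteration cost an agglomerative variant tries to reduce.

```python
import numpy as np

def mean_shift_step(x, data, bandwidth):
    """One mean-shift update: Gaussian-kernel-weighted average of all points.

    Each step costs O(n * d) per query point, which is why plain mean-shift
    becomes expensive on even moderate data sets.
    """
    diff = data - x                                            # (n, d)
    w = np.exp(-np.sum(diff**2, axis=1) / (2.0 * bandwidth**2))
    return (w[:, None] * data).sum(axis=0) / w.sum()

def mean_shift(data, bandwidth=1.0, iters=50, tol=1e-5):
    """Run mode seeking from every point; return the converged modes."""
    modes = data.copy()
    for i in range(len(modes)):
        x = modes[i]
        for _ in range(iters):
            x_new = mean_shift_step(x, data, bandwidth)
            if np.linalg.norm(x_new - x) < tol:
                break
            x = x_new
        modes[i] = x
    return modes

# Two well-separated Gaussian blobs collapse onto two modes.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0, 0.3, (100, 2)),
                  rng.normal(5, 0.3, (100, 2))])
modes = mean_shift(data)
print(np.unique(np.round(modes, 1), axis=0))   # roughly the two blob centres
```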

2007
Kshitij Sudan, Nipun Saggar, Asok De

Serial algorithms to evaluate the Gabor transform of a discrete signal are bound by the length of the signal for which the transform can be evaluated. The time taken, if machine memory and other factors are ignored, grows as O(N), making it unsuitable for transforms of large signal lengths. In this paper, we present a parallel algorithm to generate the computationally intensive Gabor transform...
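
For orientation, a naive Gaussian-windowed (Gabor-style) transform sketch is shown below; the window length, hop size, and number of frequency bins are illustrative choices, not the paper's parallel algorithm. It makes visible both the growth of the serial cost with signal length and the independence of the coefficients that a parallel implementation can exploit.

```python
import numpy as np

def gabor_transform(signal, win_len=64, hop=32, n_freqs=64):
    """Naive Gaussian-windowed Fourier transform of a 1-D signal.

    Every time-frequency coefficient is an independent windowed inner product
    with the signal, so the serial cost grows with signal length while the
    coefficients themselves have no data dependencies between them.
    """
    n = np.arange(win_len)
    window = np.exp(-0.5 * ((n - win_len / 2) / (win_len / 6)) ** 2)
    starts = np.arange(0, len(signal) - win_len + 1, hop)
    coeffs = np.zeros((len(starts), n_freqs), dtype=complex)
    for m, s in enumerate(starts):                    # time shifts
        frame = signal[s:s + win_len] * window
        for k in range(n_freqs):                      # frequency bins
            coeffs[m, k] = np.sum(frame * np.exp(-2j * np.pi * k * n / n_freqs))
    return coeffs

# A 1 kHz tone sampled at 8 kHz concentrates its energy in bin 8 (= 1000/8000 * 64).
t = np.arange(2048) / 8000.0
sig = np.sin(2 * np.pi * 1000 * t)
C = gabor_transform(sig)
print(C.shape, np.abs(C).argmax(axis=1)[:5])
```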

2014
Pablo Soto-Quiros

This paper presents a mathematical framework for a family of discrete-time discrete-frequency transforms in terms of matrix signal algebra. The matrix signal algebra is a mathematical environment composed of a signal space, finite-dimensional linear operators, and special matrices, where algebraic methods are used to generate these signal transforms as computational estimators. The matrix signal...
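
A minimal illustration of the "transform as matrix operator" viewpoint (using the familiar DFT rather than the paper's specific family of transforms): the signal lives in a finite-dimensional vector space and the transform is a unitary matrix acting on it.

```python
import numpy as np

def dft_matrix(N):
    """Unitary DFT matrix F with F[k, n] = exp(-2*pi*i*k*n/N) / sqrt(N)."""
    k = np.arange(N).reshape(-1, 1)
    n = np.arange(N).reshape(1, -1)
    return np.exp(-2j * np.pi * k * n / N) / np.sqrt(N)

N = 8
F = dft_matrix(N)
x = np.random.default_rng(1).normal(size=N)

# The transform is a linear operator applied to the signal vector; the
# (rescaled) matrix product matches the FFT of the same signal.
X = F @ x
print(np.allclose(X * np.sqrt(N), np.fft.fft(x)))    # True

# Unitarity: F^H F = I, so the inverse transform is the conjugate transpose.
print(np.allclose(F.conj().T @ F, np.eye(N)))         # True
```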

2015
Xiangru Lian, Yijun Huang, Yuncheng Li, Ji Liu

Asynchronous parallel implementations of stochastic gradient (SG) methods have been broadly used for training deep neural networks and have recently seen many successes in practice. However, existing theories cannot explain their convergence and speedup properties, mainly due to the nonconvexity of most deep learning formulations and the asynchronous parallel mechanism. To fill the gaps in theory and provi...
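
A minimal Hogwild-style sketch of the asynchronous setting (shared parameters updated by several workers without locks), shown here on a convex least-squares toy problem rather than a deep network. It is not the paper's algorithm or analysis, and Python's GIL means it only illustrates the update pattern, not real parallel speedup.

```python
import threading
import numpy as np

# Synthetic least-squares problem: recover w_true from noisy observations.
rng = np.random.default_rng(0)
A = rng.normal(size=(1000, 10))
w_true = rng.normal(size=10)
y = A @ w_true + 0.01 * rng.normal(size=1000)

w = np.zeros(10)   # shared parameters, updated in place without a lock
lr = 0.01

def worker(seed, steps=2000):
    local_rng = np.random.default_rng(seed)
    for _ in range(steps):
        i = local_rng.integers(len(y))         # draw one training example
        grad = (A[i] @ w - y[i]) * A[i]        # gradient of 0.5*(A[i]@w - y[i])^2
        w[:] = w - lr * grad                   # racy, unsynchronized update

threads = [threading.Thread(target=worker, args=(s,)) for s in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("parameter error:", np.linalg.norm(w - w_true))
```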

Journal: J. Comput. Physics, 2015
Martin Tillenius, Elisabeth Larsson, Erik Lehto, Natasha Flyer

Radial basis function-generated finite difference (RBF–FD) methods have recently been proposed as a very interesting option for global-scale geophysical simulations, and have been shown to outperform established pseudo-spectral and discontinuous Galerkin methods on shallow water test problems. In order to be competitive for very large scale simulations, the RBF–FD methods need to be efficiently implem...
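
As background on the method itself (not the paper's implementation), an RBF–FD differentiation stencil reduces to solving a small dense linear system per node; the Gaussian shape parameter and node locations below are illustrative, and the sketch is one-dimensional for brevity.

```python
import numpy as np

def rbf_fd_weights(x_nodes, x_c, eps=2.0):
    """Differentiation weights for d/dx at x_c on a scattered 1-D stencil.

    Solve A w = b, where A[i, j] = phi(|x_i - x_j|) with the Gaussian RBF
    phi(r) = exp(-(eps*r)^2) and b[i] = d/dx phi(|x - x_i|) evaluated at x_c.
    The derivative is then approximated by sum_i w[i] * f(x_i).
    """
    d = x_nodes[:, None] - x_nodes[None, :]
    A = np.exp(-(eps * d) ** 2)
    r = x_c - x_nodes
    b = -2.0 * eps**2 * r * np.exp(-(eps * r) ** 2)
    return np.linalg.solve(A, b)

# Approximate d/dx sin(x) at x = 0.3 on a small scattered stencil.
nodes = np.array([0.0, 0.12, 0.28, 0.41, 0.55])
w = rbf_fd_weights(nodes, 0.3)
print(w @ np.sin(nodes), np.cos(0.3))   # the two values should be close
```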

2012
Peng Du, Stanimire Tomov, Jack Dongarra

In the field of dense linear algebra computations on distributed memory systems, ScaLAPACK has established its importance over the years with its high performance and scalability. Since the introduction of CUDA-based GPGPU computing in 2008, methods to efficiently use such computing power on distributed memory systems equipped with multicore CPUs have attracted much attention. In this work we i...

2006
Kumar Chellapilla, Sidd Puri, Patrice Simard

Convolutional neural networks (CNNs) are well known for producing state-of-the-art recognizers for document processing [1]. However, they can be difficult to implement and are usually slower than traditional multi-layer perceptrons (MLPs). We present three novel approaches to speeding up CNNs: a) unrolling convolution, b) using BLAS (basic linear algebra subroutines), and c) using GPUs (graphic...
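
Of the three approaches, (a) and (b) combine into the now-standard im2col trick: unroll all input patches into the rows of a matrix so the convolution becomes one dense matrix product that BLAS can execute. A generic sketch (not the authors' GPU code; shapes and kernels are illustrative) is given below.

```python
import numpy as np

def im2col(x, kh, kw):
    """Unroll every kh-by-kw patch of a 2-D input into one row of a matrix."""
    H, W = x.shape
    out_h, out_w = H - kh + 1, W - kw + 1
    cols = np.empty((out_h * out_w, kh * kw))
    for i in range(out_h):
        for j in range(out_w):
            cols[i * out_w + j] = x[i:i + kh, j:j + kw].ravel()
    return cols

def conv2d_unrolled(x, kernels):
    """Valid 2-D correlation of x with several kernels as one matrix product.

    Unrolling turns the convolution into a single dense GEMM, which the
    BLAS behind NumPy executes far faster than an explicit pixel loop.
    """
    kh, kw = kernels.shape[1:]
    cols = im2col(x, kh, kw)                        # (out_h*out_w, kh*kw)
    flat = kernels.reshape(len(kernels), -1).T      # (kh*kw, n_kernels)
    out = cols @ flat                               # the BLAS-backed GEMM
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    return out.T.reshape(len(kernels), out_h, out_w)

rng = np.random.default_rng(0)
x = rng.normal(size=(28, 28))
k = rng.normal(size=(4, 5, 5))
print(conv2d_unrolled(x, k).shape)   # (4, 24, 24)
```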

2013
Binanda Sengupta, Abhijit Das

The best known integer-factoring algorithms consist of two stages: the sieving stage and the linear-algebra stage. Efficient parallel implementations of both these stages have been reported in the literature. All these implementations are based on multi-core or distributed parallelization. In this paper, we experimentally demonstrate that SIMD instructions available in many modern processors ca...
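
The sieving stage the abstract refers to has a simple data-parallel structure: subtract log p along arithmetic progressions of an array. The toy log-sieve below uses NumPy's strided bulk updates as a stand-in for the SIMD instructions discussed in the paper; the interval, threshold, and factor base are illustrative.

```python
import numpy as np

def log_sieve(start, length, factor_base):
    """Toy log-sieve over [start, start+length) for factor-base-smooth numbers.

    For every prime power p^k in range, subtract log(p) at each position it
    divides; entries whose residual log stays near zero are smooth candidates.
    Each update is a strided bulk operation over the array, the same
    data-parallel access pattern that SIMD sieving exploits.
    """
    logs = np.log(np.arange(start, start + length, dtype=float))
    for p in factor_base:
        pk = p
        while pk < start + length:
            first = (-start) % pk              # first offset divisible by pk
            logs[first::pk] -= np.log(p)
            pk *= p
    return np.nonzero(logs < 0.5)[0] + start   # near-zero residual => smooth

base = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
smooth = log_sieve(10_000, 5_000, base)
print(len(smooth), smooth[:8])
```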

1995
Cheng-Zhong Xu, Stefan Tschöke, Burkhard Monien

Load distribution is essential for efficient use of processors in parallel branch-and-bound computations because the computation generates and consumes non-uniform subproblems at runtime. This paper presents six decentralized load distribution strategies. They are incorporated in a runtime support system, and evaluated in the solution of set partitioning problems on two parallel computer systems....
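
As a rough illustration of decentralized load distribution (a generic random-stealing scheme, not one of the six strategies evaluated in the paper): each worker keeps its own pool of subproblems and, when idle, steals from a randomly chosen peer. Python's GIL means the sketch shows only the distribution mechanics, not parallel speedup, and the "subproblems" are a toy branching rule rather than real branch-and-bound nodes.

```python
import random
import threading
from collections import deque

NUM_WORKERS = 4
queues = [deque() for _ in range(NUM_WORKERS)]   # one local work pool per worker
done = [0] * NUM_WORKERS

def expand(task):
    """Toy branching rule: a subproblem of size n spawns two children of size n-1."""
    return [task - 1, task - 1] if task > 0 else []

def worker(wid):
    rng = random.Random(wid)
    idle_polls = 0
    while idle_polls < 10_000:                   # give up after many failed polls
        try:
            task = queues[wid].pop()             # depth-first on local work
        except IndexError:
            victim = rng.randrange(NUM_WORKERS)
            try:
                task = queues[victim].popleft()  # steal the oldest subproblem
            except IndexError:
                idle_polls += 1
                continue
        idle_polls = 0
        done[wid] += 1
        queues[wid].extend(expand(task))         # generated subproblems stay local

queues[0].append(12)                             # root problem starts on worker 0
threads = [threading.Thread(target=worker, args=(w,)) for w in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("tasks per worker:", done, "total:", sum(done))   # total should be 8191
```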

[Chart: number of search results per year]