Search results for: linear speedup

Number of results: 490,347

2007
Tom Bulatewicz, Daniel Andresen, Stephen Welch, Wei Jin, Sanjoy Das, Matthew Miller

Advancements in data collection and high performance computing are making sophisticated model calibration possible throughout the modeling and simulation community. The model calibration process, in which the appropriate input values are estimated for unknown parameters, is typically a computationally intensive task and necessitates the use of distributed software components. These components a...

2009
Xiao-Tong Yuan, Bao-Gang Hu, Ran He

Mean-Shift (MS) is a powerful non-parametric clustering method. Although it can achieve good accuracy, its computational cost is high even on moderately sized data sets. In this paper, for the purpose of algorithm speedup, we develop an agglomerative MS clustering method called Agglo-MS, along with an analysis of its mode-seeking ability and convergence properties. Our method is built upon a...
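
As context for the cost the abstract refers to, here is a minimal mean-shift sketch in NumPy (a generic implementation, not the paper's Agglo-MS; bandwidth and data are illustrative): each mode-seeking step is a kernel-weighted average over the whole data set, which is exactly the per-iteration cost an agglomerative variant tries to reduce.

```python
import numpy as np

def mean_shift_step(x, data, bandwidth):
    """One mean-shift update: Gaussian-kernel-weighted average of all points.

    Each step costs O(n * d) per query point, which is why plain mean-shift
    becomes expensive on even moderate data sets.
    """
    diff = data - x                                            # (n, d)
    w = np.exp(-np.sum(diff**2, axis=1) / (2.0 * bandwidth**2))
    return (w[:, None] * data).sum(axis=0) / w.sum()

def mean_shift(data, bandwidth=1.0, iters=50, tol=1e-5):
    """Run mode seeking from every point; return the converged modes."""
    modes = data.copy()
    for i in range(len(modes)):
        x = modes[i]
        for _ in range(iters):
            x_new = mean_shift_step(x, data, bandwidth)
            if np.linalg.norm(x_new - x) < tol:
                break
            x = x_new
        modes[i] = x
    return modes

# Two well-separated Gaussian blobs collapse onto two modes.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0, 0.3, (100, 2)),
                  rng.normal(5, 0.3, (100, 2))])
modes = mean_shift(data)
print(np.unique(np.round(modes, 1), axis=0))   # roughly the two blob centres
```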

2007
Kshitij Sudan, Nipun Saggar, Asok De

Serial algorithms to evaluate the Gabor transform of a discrete signal are bound by the length of the signal for which the transform can be evaluated. The time taken, if machine memory and other factors are ignored, grows as O(N), making it unsuitable for transforms of large signal lengths. In this paper, we present a parallel algorithm to generate the computationally intensive Gabor transform...
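
For orientation, a naive Gaussian-windowed (Gabor-style) transform sketch is shown below; the window length, hop size, and number of frequency bins are illustrative choices, not the paper's parallel algorithm. It makes visible both the growth of the serial cost with signal length and the independence of the coefficients that a parallel implementation can exploit.

```python
import numpy as np

def gabor_transform(signal, win_len=64, hop=32, n_freqs=64):
    """Naive Gaussian-windowed Fourier transform of a 1-D signal.

    Every time-frequency coefficient is an independent windowed inner product
    with the signal, so the serial cost grows with signal length while the
    coefficients themselves have no data dependencies between them.
    """
    n = np.arange(win_len)
    window = np.exp(-0.5 * ((n - win_len / 2) / (win_len / 6)) ** 2)
    starts = np.arange(0, len(signal) - win_len + 1, hop)
    coeffs = np.zeros((len(starts), n_freqs), dtype=complex)
    for m, s in enumerate(starts):                    # time shifts
        frame = signal[s:s + win_len] * window
        for k in range(n_freqs):                      # frequency bins
            coeffs[m, k] = np.sum(frame * np.exp(-2j * np.pi * k * n / n_freqs))
    return coeffs

# A 1 kHz tone sampled at 8 kHz concentrates its energy in bin 8 (= 1000/8000 * 64).
t = np.arange(2048) / 8000.0
sig = np.sin(2 * np.pi * 1000 * t)
C = gabor_transform(sig)
print(C.shape, np.abs(C).argmax(axis=1)[:5])
```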

2014
Pablo Soto-Quiros

This paper presents a mathematical framework for a family of discrete-time discrete-frequency transforms in terms of matrix signal algebra. The matrix signal algebra is a mathematical environment composed of a signal space, finite-dimensional linear operators, and special matrices, where algebraic methods are used to generate these signal transforms as computational estimators. The matrix signal...
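
A minimal illustration of the "transform as matrix operator" viewpoint (using the familiar DFT rather than the paper's specific family of transforms): the signal lives in a finite-dimensional vector space and the transform is a unitary matrix acting on it.

```python
import numpy as np

def dft_matrix(N):
    """Unitary DFT matrix F with F[k, n] = exp(-2*pi*i*k*n/N) / sqrt(N)."""
    k = np.arange(N).reshape(-1, 1)
    n = np.arange(N).reshape(1, -1)
    return np.exp(-2j * np.pi * k * n / N) / np.sqrt(N)

N = 8
F = dft_matrix(N)
x = np.random.default_rng(1).normal(size=N)

# The transform is a linear operator applied to the signal vector; the
# (rescaled) matrix product matches the FFT of the same signal.
X = F @ x
print(np.allclose(X * np.sqrt(N), np.fft.fft(x)))    # True

# Unitarity: F^H F = I, so the inverse transform is the conjugate transpose.
print(np.allclose(F.conj().T @ F, np.eye(N)))         # True
```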

2015
Xiangru Lian, Yijun Huang, Yuncheng Li, Ji Liu

Asynchronous parallel implementations of stochastic gradient (SG) methods have been broadly used for training deep neural networks and have recently seen many successes in practice. However, existing theories cannot explain their convergence and speedup properties, mainly due to the nonconvexity of most deep learning formulations and the asynchronous parallel mechanism. To fill the gaps in theory and provi...
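
A minimal Hogwild-style sketch of the asynchronous setting (shared parameters updated by several workers without locks), shown here on a convex least-squares toy problem rather than a deep network. It is not the paper's algorithm or analysis, and Python's GIL means it only illustrates the update pattern, not real parallel speedup.

```python
import threading
import numpy as np

# Synthetic least-squares problem: recover w_true from noisy observations.
rng = np.random.default_rng(0)
A = rng.normal(size=(1000, 10))
w_true = rng.normal(size=10)
y = A @ w_true + 0.01 * rng.normal(size=1000)

w = np.zeros(10)   # shared parameters, updated in place without a lock
lr = 0.01

def worker(seed, steps=2000):
    local_rng = np.random.default_rng(seed)
    for _ in range(steps):
        i = local_rng.integers(len(y))         # draw one training example
        grad = (A[i] @ w - y[i]) * A[i]        # gradient of 0.5*(A[i]@w - y[i])^2
        w[:] = w - lr * grad                   # racy, unsynchronized update

threads = [threading.Thread(target=worker, args=(s,)) for s in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("parameter error:", np.linalg.norm(w - w_true))
```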

Journal: J. Comput. Physics, 2015
Martin Tillenius, Elisabeth Larsson, Erik Lehto, Natasha Flyer

Radial basis function-generated finite difference (RBF–FD) methods have recently been proposed as a very interesting option for global-scale geophysical simulations, and have been shown to outperform established pseudo-spectral and discontinuous Galerkin methods on shallow water test problems. In order to be competitive for very large scale simulations, the RBF–FD methods need to be efficiently implem...
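
As background on the method itself (not the paper's implementation), an RBF–FD differentiation stencil reduces to solving a small dense linear system per node; the Gaussian shape parameter and node locations below are illustrative, and the sketch is one-dimensional for brevity.

```python
import numpy as np

def rbf_fd_weights(x_nodes, x_c, eps=2.0):
    """Differentiation weights for d/dx at x_c on a scattered 1-D stencil.

    Solve A w = b, where A[i, j] = phi(|x_i - x_j|) with the Gaussian RBF
    phi(r) = exp(-(eps*r)^2) and b[i] = d/dx phi(|x - x_i|) evaluated at x_c.
    The derivative is then approximated by sum_i w[i] * f(x_i).
    """
    d = x_nodes[:, None] - x_nodes[None, :]
    A = np.exp(-(eps * d) ** 2)
    r = x_c - x_nodes
    b = -2.0 * eps**2 * r * np.exp(-(eps * r) ** 2)
    return np.linalg.solve(A, b)

# Approximate d/dx sin(x) at x = 0.3 on a small scattered stencil.
nodes = np.array([0.0, 0.12, 0.28, 0.41, 0.55])
w = rbf_fd_weights(nodes, 0.3)
print(w @ np.sin(nodes), np.cos(0.3))   # the two values should be close
```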

2012
Peng Du, Stanimire Tomov, Jack Dongarra

In the field of dense linear algebra computations on distributed memory systems, ScaLAPACK has established its importance over the years with its high performance and scalability. Since the introduction of CUDA-based GPGPU computing in 2008, methods to efficiently use such computing power on distributed memory systems equipped with multicore CPUs have attracted much attention. In this work we i...

2006
Kumar Chellapilla, Sidd Puri, Patrice Simard

Convolutional neural networks (CNNs) are well known for producing state-of-the-art recognizers for document processing [1]. However, they can be difficult to implement and are usually slower than traditional multi-layer perceptrons (MLPs). We present three novel approaches to speeding up CNNs: a) unrolling convolution, b) using BLAS (basic linear algebra subroutines), and c) using GPUs (graphic...
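
Of the three approaches, (a) and (b) combine into the now-standard im2col trick: unroll all input patches into the rows of a matrix so the convolution becomes one dense matrix product that BLAS can execute. A generic sketch (not the authors' GPU code; shapes and kernels are illustrative) is given below.

```python
import numpy as np

def im2col(x, kh, kw):
    """Unroll every kh-by-kw patch of a 2-D input into one row of a matrix."""
    H, W = x.shape
    out_h, out_w = H - kh + 1, W - kw + 1
    cols = np.empty((out_h * out_w, kh * kw))
    for i in range(out_h):
        for j in range(out_w):
            cols[i * out_w + j] = x[i:i + kh, j:j + kw].ravel()
    return cols

def conv2d_unrolled(x, kernels):
    """Valid 2-D correlation of x with several kernels as one matrix product.

    Unrolling turns the convolution into a single dense GEMM, which the
    BLAS behind NumPy executes far faster than an explicit pixel loop.
    """
    kh, kw = kernels.shape[1:]
    cols = im2col(x, kh, kw)                        # (out_h*out_w, kh*kw)
    flat = kernels.reshape(len(kernels), -1).T      # (kh*kw, n_kernels)
    out = cols @ flat                               # the BLAS-backed GEMM
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    return out.T.reshape(len(kernels), out_h, out_w)

rng = np.random.default_rng(0)
x = rng.normal(size=(28, 28))
k = rng.normal(size=(4, 5, 5))
print(conv2d_unrolled(x, k).shape)   # (4, 24, 24)
```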

2013
Binanda Sengupta, Abhijit Das

The best known integer-factoring algorithms consist of two stages: the sieving stage and the linear-algebra stage. Efficient parallel implementations of both these stages have been reported in the literature. All these implementations are based on multi-core or distributed parallelization. In this paper, we experimentally demonstrate that SIMD instructions available in many modern processors ca...
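
The sieving stage the abstract refers to has a simple data-parallel structure: subtract log p along arithmetic progressions of an array. The toy log-sieve below uses NumPy's strided bulk updates as a stand-in for the SIMD instructions discussed in the paper; the interval, threshold, and factor base are illustrative.

```python
import numpy as np

def log_sieve(start, length, factor_base):
    """Toy log-sieve over [start, start+length) for factor-base-smooth numbers.

    For every prime power p^k in range, subtract log(p) at each position it
    divides; entries whose residual log stays near zero are smooth candidates.
    Each update is a strided bulk operation over the array, the same
    data-parallel access pattern that SIMD sieving exploits.
    """
    logs = np.log(np.arange(start, start + length, dtype=float))
    for p in factor_base:
        pk = p
        while pk < start + length:
            first = (-start) % pk              # first offset divisible by pk
            logs[first::pk] -= np.log(p)
            pk *= p
    return np.nonzero(logs < 0.5)[0] + start   # near-zero residual => smooth

base = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
smooth = log_sieve(10_000, 5_000, base)
print(len(smooth), smooth[:8])
```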

1995
Cheng-Zhong Xu, Stefan Tschöke, Burkhard Monien

Load distribution is essential for efficient use of processors in parallel branch-and-bound computations because the computation generates and consumes non-uniform subproblems at runtime. This paper presents six decentralized load distribution strategies. They are incorporated in a runtime support system, and evaluated in the solution of set partitioning problems on two parallel computer systems....
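
As a rough illustration of decentralized load distribution (a generic random-stealing scheme, not one of the six strategies evaluated in the paper): each worker keeps its own pool of subproblems and, when idle, steals from a randomly chosen peer. Python's GIL means the sketch shows only the distribution mechanics, not parallel speedup, and the "subproblems" are a toy branching rule rather than real branch-and-bound nodes.

```python
import random
import threading
from collections import deque

NUM_WORKERS = 4
queues = [deque() for _ in range(NUM_WORKERS)]   # one local work pool per worker
done = [0] * NUM_WORKERS

def expand(task):
    """Toy branching rule: a subproblem of size n spawns two children of size n-1."""
    return [task - 1, task - 1] if task > 0 else []

def worker(wid):
    rng = random.Random(wid)
    idle_polls = 0
    while idle_polls < 10_000:                   # give up after many failed polls
        try:
            task = queues[wid].pop()             # depth-first on local work
        except IndexError:
            victim = rng.randrange(NUM_WORKERS)
            try:
                task = queues[victim].popleft()  # steal the oldest subproblem
            except IndexError:
                idle_polls += 1
                continue
        idle_polls = 0
        done[wid] += 1
        queues[wid].extend(expand(task))         # generated subproblems stay local

queues[0].append(12)                             # root problem starts on worker 0
threads = [threading.Thread(target=worker, args=(w,)) for w in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("tasks per worker:", done, "total:", sum(done))   # total should be 8191
```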

[Chart: number of search results per year]