moDNN: Memory Optimal Deep Neural Network Training on Graphics Processing Units
Similar resources
Cofactorization on Graphics Processing Units
We show how the cofactorization step, a compute-intensive part of the relation collection phase of the number field sieve (NFS), can be farmed out to a graphics processing unit. Our implementation on a GTX 580 GPU, which is integrated with a state-of-the-art NFS implementation, can serve as a cryptanalytic co-processor for several Intel i7-3770K quad-core CPUs simultaneously. This allows those ...
Rapid Training of Acoustic Models using Graphics Processing Units
Robust and accurate speech recognition systems can only be realized with adequately trained acoustic models. For common languages, state-of-the-art systems are now trained on thousands of hours of speech data. Even with a large cluster of machines the entire training process can take many weeks. To overcome this development bottleneck we propose a new framework for rapid training of acoustic mo...
Biological Sequence Alignment on Graphics Processing Units
Sequence alignment is a common and often repeated task in molecular biology. The need for speeding up this treatment comes from the rapid growth rate of biological sequence databases. In this paper we present a new approach to high performance biological sequence database scanning on graphics processing units. Using modern graphics processing units for high performance computing is facilitated ...
Systolic neighborhood search on graphics processing units
In this paper, we propose a parallel processing model based on systolic computing merged with concepts of evolutionary algorithms. The proposed model works over a Graphics Processing Unit using the structure of threads as cells that form a systolic mesh. Data passes through those cells, each one performing a simple computing operation. The systolic algorithm is implemented using NVIDIA’s comput...
Parallel Genetic Programming on Graphics Processing Units
In program inference, the evaluation of how well a candidate solution solves a certain task is usually a computationally intensive procedure. Most of the time, the evaluation involves either submitting the program to a simulation process or testing its behavior on many input arguments; both situations may turn out to be very time-consuming. Things get worse when the optimization algorithm needs...
Journal
Journal title: IEEE Transactions on Parallel and Distributed Systems
Year: 2019
ISSN: 1045-9219,1558-2183,2161-9883
DOI: 10.1109/tpds.2018.2866582