Effects of depth, width, and initialization: A convergence analysis of layer-wise training for deep linear neural networks
Authors
Abstract
Deep neural networks have been used in various machine learning applications and have achieved tremendous empirical successes. However, training deep neural networks is a challenging task. Many alternatives have been proposed in place of end-to-end back-propagation. Layer-wise training is one of them, which trains a single layer at a time rather than training all layers simultaneously. In this paper, we study layer-wise training using block coordinate gradient descent (BCGD) for deep linear networks. We establish a general convergence analysis for BCGD and identify the optimal learning rate, which results in the fastest decrease of the loss. We also identify the effects of depth, width, and initialization. When an orthogonal-like initialization is employed, we show that the width of the intermediate layers plays no role in gradient-based training beyond a certain threshold. Besides, we show that the use of depth could drastically accelerate training compared to a depth-1 network, even when the computational cost is taken into account. Numerical examples are provided to justify our theoretical findings and to demonstrate the performance of layer-wise training by BCGD.
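To make the setting concrete, the following is a minimal sketch of layer-wise training of a deep linear network with BCGD: only one weight matrix is updated per gradient step while all other layers are held fixed, cycling through the layers. The synthetic data, layer widths, cyclic update order, and the fixed learning rate `lr` are illustrative assumptions, not the paper's prescribed choices; the paper's optimal learning rate and its orthogonal-like initialization are only loosely mimicked here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear regression data (illustrative): targets y = A x.
d_in, d_out, n_samples = 8, 4, 256
X = rng.standard_normal((d_in, n_samples))
A_true = rng.standard_normal((d_out, d_in))
Y = A_true @ X

# Deep linear network f(x) = W[L-1] @ ... @ W[0] @ x with intermediate width m.
depth, m = 4, 16
widths = [d_in] + [m] * (depth - 1) + [d_out]

def orth_like(rows, cols):
    # Orthogonal-like initialization: orthonormal rows or columns.
    Q, _ = np.linalg.qr(rng.standard_normal((max(rows, cols), min(rows, cols))))
    return Q if rows >= cols else Q.T

W = [orth_like(widths[l + 1], widths[l]) for l in range(depth)]

def loss():
    P = np.eye(d_in)
    for Wl in W:
        P = Wl @ P
    return 0.5 * np.mean(np.sum((P @ X - Y) ** 2, axis=0))

lr = 0.1  # fixed step size; the paper derives an optimal rate instead
for epoch in range(100):
    for l in range(depth):  # block coordinate step: update layer l only
        # Frozen products of the layers below and above layer l.
        below = np.eye(d_in)
        for Wl in W[:l]:
            below = Wl @ below
        above = np.eye(d_out)
        for Wl in reversed(W[l + 1:]):
            above = above @ Wl
        Z = below @ X                       # input seen by layer l
        residual = above @ W[l] @ Z - Y     # prediction error with current block
        grad = above.T @ residual @ Z.T / n_samples
        W[l] -= lr * grad

print(f"training loss after layer-wise BCGD: {loss():.3e}")
```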
Similar resources
Greedy Layer-Wise Training of Deep Networks
Complexity theory of circuits strongly suggests that deep architectures can be much more efficient (sometimes exponentially) than shallow architectures, in terms of computational elements required to represent some functions. Deep multi-layer neural networks have many levels of non-linearities allowing them to compactly represent highly non-linear and highly-varying functions. However, until re...
Layer-wise training of deep networks using kernel similarity
Deep learning has shown promising results in many machine learning applications. The hierarchical feature representation built by deep networks enables compact and precise encoding of the data. A kernel analysis of the trained deep networks demonstrated that with deeper layers, simpler and more accurate data representations are obtained. In this paper, we propose an approach for layer-wise t...
Layer-wise training of deep generative models
When using deep, multi-layered architectures to build generative models of data, it is difficult to train all layers at once. We propose a layer-wise training procedure admitting a performance guarantee compared to the global optimum. It is based on an optimistic proxy of future performance, the best latent marginal. We interpret autoencoders in this setting as generative models, by showing tha...
A phonological contrastive analysis of Kurdish and English
Despite the different criticisms of contrastive analysis, it has been proved that its results (when processed) can be useful in a TEFL environment, especially at the level of phonology. This study is an attempt to compare and contrast the sound systems of Kurdish and English for pedagogical aims. The consonants, vowels, stress and intonation of the two languages are described by the same model-ta...
On layer-wise representations in deep neural networks
On Layer-Wise Representations in Deep Neural Networks It is well known that deep neural networks form an efficient internal representation of the learning problem. However, it is unclear how this efficient representation is distributed layer-wise, and how it arises from learning. In this thesis, we develop a kernel-based analysis for deep networks that quantifies the representation at ea...
Journal
Journal title: Analysis and Applications
Year: 2021
ISSN: 1793-6861, 0219-5305
DOI: https://doi.org/10.1142/s0219530521500263