Effects of depth, width, and initialization: A convergence analysis of layer-wise training for deep linear neural networks

Abstract

Deep neural networks have been used in various machine learning applications and have achieved tremendous empirical successes. However, training deep neural networks is a challenging task. Many alternatives have been proposed in place of end-to-end back-propagation. Layer-wise training is one of them, which trains a single layer at a time, rather than all layers simultaneously. In this paper, we study layer-wise training using block coordinate gradient descent (BCGD) for deep linear networks. We establish a general convergence analysis of BCGD and find the optimal learning rate, which results in the fastest decrease of the loss. We identify the effects of depth, width, and initialization. When an orthogonal-like initialization is employed, we show that the width of the intermediate layers plays no role in gradient-based training beyond a certain threshold. Besides, the use of depth could drastically accelerate training when compared to that of a depth-1 network, even when the computational cost is taken into account. Numerical examples are provided to justify our theoretical findings and demonstrate the performance of layer-wise training by BCGD.
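To make the setting concrete, below is a minimal NumPy sketch of layer-wise BCGD on a deep linear network with an orthogonal-like initialization. The function names, the fixed learning rate, and the cyclic sweep order are illustrative assumptions rather than the paper's exact algorithm or its optimal step size.

```python
import numpy as np

def orthogonal_like(rows, cols, rng):
    """Orthogonal-like initialization: orthonormal columns (or rows if the matrix is wide)."""
    a = rng.standard_normal((rows, cols))
    q, _ = np.linalg.qr(a if rows >= cols else a.T)
    return q if rows >= cols else q.T

def bcgd_layerwise(X, Y, widths, lr=1e-2, sweeps=500, seed=0):
    """Layer-wise training of a deep linear network f(x) = W_L ... W_1 x by block
    coordinate gradient descent: each sweep takes one gradient step on a single
    layer (block) while all other layers are frozen.
    Loss: 0.5 * ||W_L ... W_1 X - Y||_F^2, with X (d_in x n) and Y (d_out x n)."""
    rng = np.random.default_rng(seed)
    dims = [X.shape[0]] + list(widths) + [Y.shape[0]]
    W = [orthogonal_like(dims[i + 1], dims[i], rng) for i in range(len(dims) - 1)]
    for _ in range(sweeps):
        for l in range(len(W)):
            A = X
            for j in range(l):                    # product of the layers below l
                A = W[j] @ A
            B = np.eye(dims[-1])
            for j in range(len(W) - 1, l, -1):    # product of the layers above l
                B = B @ W[j]
            R = B @ W[l] @ A - Y                  # residual of the full network
            W[l] -= lr * (B.T @ R @ A.T)          # gradient step w.r.t. layer l only
    return W
```

For instance, with X of shape (d_in, n) and Y of shape (d_out, n), calling bcgd_layerwise(X, Y, widths=[32, 32]) trains a depth-3 linear network one layer at a time; under an orthogonal-like initialization, widening the hidden layers beyond the threshold identified in the paper should not change the training dynamics.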

Similar articles

Greedy Layer-Wise Training of Deep Networks

Complexity theory of circuits strongly suggests that deep architectures can be much more efficient (sometimes exponentially) than shallow architectures, in terms of computational elements required to represent some functions. Deep multi-layer neural networks have many levels of non-linearities allowing them to compactly represent highly non-linear and highly-varying functions. However, until re...

Layer-wise training of deep networks using kernel similarity

Deep learning has shown promising results in many machine learning applications. The hierarchical feature representation built by deep networks enable compact and precise encoding of the data. A kernel analysis of the trained deep networks demonstrated that with deeper layers, more simple and more accurate data representations are obtained. In this paper, we propose an approach for layer-wise t...
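As a rough illustration of the kind of kernel analysis mentioned above, the sketch below computes, for each layer's representation, the alignment between its linear Gram matrix and a label kernel. The choice of a linear kernel and of centered kernel alignment as the similarity measure are assumptions made here for concreteness, not details taken from the cited paper.

```python
import numpy as np

def kernel_alignment(K1, K2):
    """Centered kernel alignment: cosine similarity between two centered Gram matrices."""
    n = K1.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    K1c, K2c = H @ K1 @ H, H @ K2 @ H
    return np.sum(K1c * K2c) / (np.linalg.norm(K1c) * np.linalg.norm(K2c))

def layerwise_alignment(activations, y):
    """For each layer's representation A_l (n_samples x features), measure how well
    its linear Gram matrix A_l A_l^T matches the label kernel y y^T."""
    Ky = np.outer(y, y).astype(float)
    return [kernel_alignment(A @ A.T, Ky) for A in activations]
```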

Layer-wise training of deep generative models

When using deep, multi-layered architectures to build generative models of data, it is difficult to train all layers at once. We propose a layer-wise training procedure admitting a performance guarantee compared to the global optimum. It is based on an optimistic proxy of future performance, the best latent marginal. We interpret autoencoders in this setting as generative models, by showing tha...

A Phonological Contrastive Analysis of Kurdish and English

Despite the different criticisms of contrastive analysis, it has been proved that its results (when processed) can be useful in a TEFL environment, especially at the level of phonology. This study is an attempt to compare and contrast the sound systems of Kurdish and English for pedagogical aims. The consonants, vowels, stress and intonation of the two languages are described by the same model-ta...

On layer-wise representations in deep neural networks

It is well known that deep neural networks form an efficient internal representation of the learning problem. However, it is unclear how this efficient representation is distributed layer-wise, and how it arises from learning. In this thesis, we develop a kernel-based analysis for deep networks that quantifies the representation at ea...


Journal

Journal title: Analysis and Applications

Year: 2021

ISSN: 1793-6861, 0219-5305

DOI: https://doi.org/10.1142/s0219530521500263