Batch normalization (BN) comprises a normalization component followed by an affine transformation and has become essential for training deep neural networks. The standard initialization of each BN layer in a network sets the affine scale and shift to 1 and 0, respectively. However, after training we have observed that these parameters do not deviate much from their initialization. Furthermore, we have noticed that the normalization process can still yield overly large values…
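The two components described above, and the standard scale/shift initialization, can be illustrated with a minimal NumPy sketch (the function name and shapes are ours, not from any particular framework):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Normalization component: zero mean, unit variance per feature
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    # Affine transformation: learnable scale (gamma) and shift (beta)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(64, 8))  # batch of 64, 8 features

# Standard initialization: scale = 1, shift = 0,
# so at initialization BN reduces to pure normalization
gamma = np.ones(8)
beta = np.zeros(8)

y = batch_norm(x, gamma, beta)
```

With this initialization, `y` has per-feature mean near 0 and variance near 1; training would then adjust `gamma` and `beta` away from these defaults.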