The search cost of neural architecture search (NAS) has been largely reduced by differentiable and weight-sharing methods. Such methods optimize a super-network containing all possible edges and operations, and then determine the optimal sub-network by discretization, i.e., pruning off operations and edges with small weights. However, the discretization process, performed on either operations or edges, incurs significant inaccuracy and thus the quality...
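For illustration, the following is a minimal sketch of the weight-sharing and discretization scheme described above, in the style of differentiable NAS (e.g., DARTS); it is not the paper's implementation, and names such as `MixedOp`, `discretize`, and the toy candidate operations are assumptions made for this example.

```python
# A minimal sketch (not the paper's implementation) of a weight-sharing edge
# with architecture parameters, plus the discretization step that prunes the
# edge to its highest-weighted operation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ZeroOp(nn.Module):
    """Candidate operation that outputs zeros (models the 'no connection' choice)."""
    def forward(self, x):
        return torch.zeros_like(x)


# Candidate operations on one edge of the super-network (toy 1-D ops for brevity).
CANDIDATE_OPS = {
    "identity": lambda c: nn.Identity(),
    "linear":   lambda c: nn.Linear(c, c, bias=False),
    "zero":     lambda c: ZeroOp(),
}


class MixedOp(nn.Module):
    """One super-network edge: a softmax-weighted sum over all candidate operations."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([build(channels) for build in CANDIDATE_OPS.values()])
        # Architecture parameters (alpha), one scalar per candidate operation.
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        # Continuous relaxation: mix every candidate operation during search.
        return sum(w * op(x) for w, op in zip(weights, self.ops))


def discretize(mixed_op):
    """Prune the edge down to its single highest-weighted operation."""
    weights = F.softmax(mixed_op.alpha, dim=0)
    best = int(weights.argmax())
    return list(CANDIDATE_OPS.keys())[best], mixed_op.ops[best]


if __name__ == "__main__":
    edge = MixedOp(channels=8)
    x = torch.randn(4, 8)
    y = edge(x)                  # super-network forward pass during search
    name, op = discretize(edge)  # hard selection after search
    print(f"output shape: {tuple(y.shape)}, selected op: {name}")
```

The gap between the softmax-mixed forward pass and the hard `argmax` selection in `discretize` is exactly where the discretization inaccuracy discussed above can arise.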