Existing image captioning models are usually trained with cross-entropy (XE) loss and reinforcement learning (RL), which set ground-truth words as hard targets and force the model to learn from them. However, these widely adopted training strategies may suffer from misalignment in XE training and inappropriate reward assignment in RL training. To tackle these problems, we introduce an attribute-enhanced teacher that serves a b...