Existing neural machine translation (NMT) systems use sequence-to-sequence networks to generate the target sentence word by word, training the model so that the word generated at each time step is as consistent as possible with its counterpart in the reference. However, the trained model tends to focus on ensuring the accuracy of the current word and does not consider its future cost, that is, the expected cost of generating the subsequent (i.e., next) word. To respond to this i...