Abstract Direct decoding for task-oriented dialogue is known to suffer from the explaining-away effect, manifested in models that prefer short and generic responses. Here we argue use of Bayes’ theorem factorize task into two models, distribution context given response, prior response itself. This approach, an instantiation noisy channel model, both mitigates effect allows principled incorporat...