Last week, we saw how we could represent clustering with a probabilistic model. In this model, called a Gaussian mixture model, we model each datapoint x i as originating from some cluster, with a corresponding cluster label y i distributed according to p(y), and the corresponding distribution for that cluster given by a multivariate Gaussian: p(x|y = k) =