model for a here is t o t o m m m m N Pr Pr log 0.154 Pr Pr log 1.17 log + ≈ + , where N is the number of data points. Clearly it will take very little time for this number to reach 3, regardless of how the model priors compare. The more times the learner must use its grammar to encode speech, the less probable that particular data set is. As we have seen, this type of result falls out under th...