Abstract
The initial population in genetic programming (GP) should form a representative sample of all possible solutions (the search space). While large populations accurately approximate the distribution of solutions, small populations tend to incorporate sampling error. This paper analyzes how the size of a GP population affects the sampling error and contributes to answering the question of how to size initial populations. First, we present a probabilistic model expect...