Abstract
Most privacy-preserving machine learning methods are designed around continuous or numeric data, but categorical attributes are common in many application scenarios, including clinical and health records and census survey data. Distance-based methods, in particular, have limited applicability to categorical data, since they do not capture the complexity of relationships among the different values of an attribute. Althoug...