MixedDataImpute: Missing Data Imputation for Continuous and Categorical Data using Nonparametric Bayesian Joint Models

Many datasets include a mix of continuous and categorical variables with missing values. In a paper published in the Journal of the American Statistical Association, we developed a joint model for such mixed data that can be used for multiple imputation. The approach uses a nonparametric Bayesian mixture model as the imputation engine. The mixture model comprises one set of mixture components with multivariate normal kernels for the continuous variables, and a separate set of mixture components with products of independent multinomial kernels for the categorical variables. The model induces dependence between the continuous and categorical variables in two ways: (i) by allowing the means of the multivariate normal distributions to depend on the categorical variables, and (ii) by using a tensor factorization prior that links the component memberships of the two mixtures.
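
The two sources of dependence can be illustrated with a small generative sketch in Python. This is not the package's code or its API. It is a toy simulation under stated assumptions: a shared top-level class stands in for the tensor factorization prior, there are two categorical and two continuous variables, and every name and parameter value below is hypothetical and chosen only for illustration.

```python
import random

# All values below are hypothetical illustration parameters, not estimates.
TOP_WEIGHTS = [0.6, 0.4]                          # weights of the shared top-level classes
CAT_COMP_GIVEN_TOP = [[0.9, 0.1], [0.2, 0.8]]     # P(categorical component | top-level class)
CONT_COMP_GIVEN_TOP = [[0.7, 0.3], [0.1, 0.9]]    # P(continuous component | top-level class)

# CAT_KERNELS[h][j]: level probabilities of categorical variable j in component h
CAT_KERNELS = [
    [[0.8, 0.2], [0.5, 0.3, 0.2]],
    [[0.3, 0.7], [0.1, 0.2, 0.7]],
]

# Continuous components: the mean is an intercept plus a shift applied when x[0] == 1
INTERCEPTS = [[0.0, 0.0], [5.0, -5.0]]
SHIFT_X0 = [[1.0, 0.5], [-1.0, 2.0]]
# Lower-triangular Cholesky factors of each component's 2x2 covariance matrix
CHOL = [[[1.0, 0.0], [0.5, 1.0]], [[2.0, 0.0], [-0.3, 0.5]]]


def draw_categorical(probs, rng):
    """Draw an index from the discrete distribution given by probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def simulate_record(rng):
    """Simulate one record (x, y) from the toy coupled mixture."""
    # Mechanism (ii): a shared class links the two component memberships.
    z = draw_categorical(TOP_WEIGHTS, rng)
    h_cat = draw_categorical(CAT_COMP_GIVEN_TOP[z], rng)
    h_cont = draw_categorical(CONT_COMP_GIVEN_TOP[z], rng)

    # Categorical variables: product of independent multinomial kernels.
    x = [draw_categorical(p, rng) for p in CAT_KERNELS[h_cat]]

    # Mechanism (i): the normal mean depends on a categorical variable.
    mean = [INTERCEPTS[h_cont][d] + SHIFT_X0[h_cont][d] * x[0] for d in range(2)]

    # Bivariate normal draw via the Cholesky factor: y = mean + L e.
    e = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
    L = CHOL[h_cont]
    y = [mean[0] + L[0][0] * e[0],
         mean[1] + L[1][0] * e[0] + L[1][1] * e[1]]
    return x, y


rng = random.Random(2024)
records = [simulate_record(rng) for _ in range(5)]
```

In this sketch the continuous and categorical variables are dependent even after conditioning on the component labels, because the mean shifts with x[0], and they are dependent marginally because both component labels are drawn given the same top-level class.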