NPBayesImpute: Non-parametric Bayesian Multiple Imputation for Categorical Data

These R routines create multiple imputations for categorical data with values that are missing at random, with or without structural zeros. Imputations are based on Dirichlet process mixtures of multinomial distributions, a non-parametric Bayesian modeling approach that allows for flexible joint modeling of all the variables.
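
As a rough illustration of what a Dirichlet process mixture of multinomials looks like, the sketch below writes out a truncated stick-breaking version of the model. The notation is assumed for this illustration and does not come from the package: n records, p categorical variables, d_j levels for variable j, a truncation level K on the number of latent classes, a concentration parameter alpha, and a set S of structural-zero cells. The priors the package actually uses may differ in detail.

```latex
% Assumed notation (not taken from the package):
%   x_{ij}  value of variable j for record i, in {1, ..., d_j}
%   z_i     latent class of record i, in {1, ..., K}
%   S       set of structural-zero combinations (may be empty)
\begin{align*}
x_{ij} \mid z_i = k, \psi &\sim \mathrm{Categorical}(\psi_{kj1}, \dots, \psi_{kjd_j}),
  \qquad i = 1, \dots, n, \; j = 1, \dots, p \\
z_i \mid \pi &\sim \mathrm{Categorical}(\pi_1, \dots, \pi_K) \\
\pi_k &= V_k \prod_{h < k} (1 - V_h), \qquad
  V_k \sim \mathrm{Beta}(1, \alpha) \text{ for } k < K, \quad V_K = 1 \\
% Marginally, the joint distribution is a finite mixture of product-multinomials:
f(c_1, \dots, c_p) &= \sum_{k=1}^{K} \pi_k \prod_{j=1}^{p} \psi_{kjc_j} \\
% With structural zeros, the same mixture is truncated to the feasible cells:
\Pr(x_i = c \mid x_i \notin S) &= \frac{\mathbf{1}\{c \notin S\} \, f(c)}{1 - \sum_{s \in S} f(s)}
\end{align*}
```

Within a latent class the variables are independent, so dependence among variables arises only from mixing over classes. When S is empty the last line reduces to f(c), which corresponds to the case without structural zeros.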

Many datasets consist exclusively of categorical variables, some of which have missing values. When the number of variables is large, it can be challenging to specify models for multiple imputation (MI) of the missing data. One approach is to use Bayesian latent class models for MI. In a series of papers, we showed that these models can capture complex dependencies and hence serve as effective MI engines. This R package implements MI via latent class models for categorical data that include structural zeros (i.e., combinations of variable values that have zero probability). The package also includes an option for MI in categorical data without structural zeros. The package is available on CRAN.