CENTER FOR STATISTICAL RESEARCH & METHODOLOGY SEMINAR
Research & Methodology Directorate
Multiple Imputation of Missing Categorical and Continuous Values
via Bayesian Mixture Models with Local Dependence
Carnegie Mellon University
November 19, 2015
11:00 am – Noon
Seminar Room 5K410 (Red Area)
We present a nonparametric Bayesian joint model for multivariate continuous and categorical variables, developed as a flexible engine for multiple imputation of missing values. Because the approach is Bayesian nonparametric, the model can grow in complexity with the sample size and capture complex and potentially unanticipated features of the data. The prior is carefully constructed to permit many forms of dependence among the observed variables while remaining computationally tractable.
We apply the model to impute missing values due to item nonresponse in an evaluation of the redesign of the Survey of Income and Program Participation (SIPP). The goal is to compare estimates from a field test of the new design with estimates from a comparable sample collected under the old design. We show that accounting for item nonresponse changes some conclusions about the comparability of the distributions in the two datasets. We also perform an extensive repeated-sampling simulation using similar data from complete cases in an existing SIPP panel, comparing our proposed model with a default application of multiple imputation by chained equations (MICE). In this realistic setting, imputations based on the proposed model tend to have better repeated-sampling properties than those from the default application of MICE.