TY  - JOUR
T1  - Sequential identification of nonignorable missing data mechanisms
JF  - Statistica Sinica
Y1  - Submitted
A1  - Mauricio Sadinle
A1  - Jerome P. Reiter
KW  - Identification
KW  - Missing not at random
KW  - Non-parametric saturated
KW  - Partial ignorability
KW  - Sensitivity analysis
AB  - With nonignorable missing data, likelihood-based inference should be based on the joint distribution of the study variables and their missingness indicators. These joint models cannot be estimated from the data alone, thus requiring the analyst to impose restrictions that make the models uniquely obtainable from the distribution of the observed data. We present an approach for constructing classes of identifiable nonignorable missing data models. The main idea is to use a sequence of carefully set up identifying assumptions, whereby we specify potentially different missingness mechanisms for different blocks of variables. We show that the procedure results in models with the desirable property of being non-parametric saturated.
ER  - 

TY  - JOUR
T1  - Bayesian estimation of bipartite matchings for record linkage
JF  - Journal of the American Statistical Association
Y1  - 2017
A1  - Mauricio Sadinle
KW  - Assignment problem
KW  - Bayes estimate
KW  - Data matching
KW  - Fellegi-Sunter decision rule
KW  - Mixture model
KW  - Rejection option
AB  - The bipartite record linkage task consists of merging two disparate datafiles containing information on two overlapping sets of entities. This is non-trivial in the absence of unique identifiers and it is important for a wide variety of applications given that it needs to be solved whenever we have to combine information from different sources. Most statistical techniques currently used for record linkage are derived from a seminal paper by Fellegi and Sunter (1969). These techniques usually assume independence in the matching statuses of record pairs to derive estimation procedures and optimal point estimators.
We argue that this independence assumption is unreasonable and instead target a bipartite matching between the two datafiles as our parameter of interest. Bayesian implementations allow us to quantify uncertainty on the matching decisions and derive a variety of point estimators using different loss functions. We propose partial Bayes estimates that allow uncertain parts of the bipartite matching to be left unresolved. We evaluate our approach to record linkage using a variety of challenging scenarios and show that it outperforms the traditional methodology. We illustrate the advantages of our methods merging two datafiles on casualties from the civil war of El Salvador.
VL  - 112
UR  - http://amstat.tandfonline.com/doi/abs/10.1080/01621459.2016.1148612
IS  - 518
ER  - 

TY  - RPRT
T1  - A Generalized Fellegi-Sunter Framework for Multiple Record Linkage with Application to Homicide Records Systems
Y1  - 2012
A1  - Mauricio Sadinle
A1  - Stephen E. Fienberg
JF  - arXiv
UR  - https://arxiv.org/abs/1205.3217
ER  - 

TY  - CONF
T1  - Logit-Based Confidence Intervals for Single Capture-Recapture Estimation
T2  - American Statistical Association Pittsburgh Chapter Banquet
Y1  - 2012
A1  - Mauricio Sadinle
JF  - American Statistical Association Pittsburgh Chapter Banquet
CY  - Pittsburgh, PA
N1  - April 9, 2012
ER  - 

TY  - CONF
T1  - MulFiles Record Linkage Using a Generalized Fellegi-Sunter Framework
T2  - Conference Presentation Classification Society Annual Meeting, Carnegie Mellon University
Y1  - 2012
A1  - Mauricio Sadinle
JF  - Conference Presentation Classification Society Annual Meeting, Carnegie Mellon University
ER  - 