TY - JOUR T1 - Sequential identification of nonignorable missing data mechanisms JF - Statistica Sinica Y1 - Submitted A1 - Mauricio Sadinle A1 - Jerome P. Reiter KW - Identification KW - Missing not at random KW - Non-parametric saturated KW - Partial ignorability KW - Sensitivity analysis AB - With nonignorable missing data, likelihood-based inference should be based on the joint distribution of the study variables and their missingness indicators. These joint models cannot be estimated from the data alone, thus requiring the analyst to impose restrictions that make the models uniquely obtainable from the distribution of the observed data. We present an approach for constructing classes of identifiable nonignorable missing data models. The main idea is to use a sequence of carefully set up identifying assumptions, whereby we specify potentially different missingness mechanisms for different blocks of variables. We show that the procedure results in models with the desirable property of being non-parametric saturated. ER - TY - JOUR T1 - Sorting Between and Within Industries: A Testable Model of Assortative Matching JF - Annals of Economics and Statistics Y1 - 2018 A1 - John M. Abowd A1 - Francis Kramarz A1 - Sebastien Perez-Duarte A1 - Ian M. Schmutte ER - TY - ABST T1 - Sequential Prediction of Respondent Behaviors Leading to Error in Web-based Surveys Y1 - 2017 A1 - Eck, Adam A1 - Soh, Leen-Kiat ER - TY - RPRT T1 - Sorting Between and Within Industries: A Testable Model of Assortative Matching Y1 - 2017 A1 - John M. Abowd A1 - Francis Kramarz A1 - Sebastien Perez-Duarte A1 - Ian M. Schmutte AB - We test Shimer's (2005) theory of the sorting of workers between and within industrial sectors based on directed search with coordination frictions, deliberately maintaining its static general equilibrium framework. 
We fit the model to sector-specific wage, vacancy and output data, including publicly-available statistics that characterize the distribution of worker and employer wage heterogeneity across sectors. Our empirical method is general and can be applied to a broad class of assignment models. The results indicate that industries are the loci of sorting–more productive workers are employed in more productive industries. The evidence confirms that strong assortative matching can be present even when worker and employer components of wage heterogeneity are weakly correlated. PB - Labor Dynamics Institute UR - http://digitalcommons.ilr.cornell.edu/ldi/40/ ER - TY - JOUR T1 - Stop or continue data collection: A nonignorable missing data approach for continuous variables JF - Journal of Official Statistics Y1 - 2017 A1 - T. Paiva A1 - J. P. Reiter AB - We present an approach to inform decisions about nonresponse follow-up sampling. The basic idea is (i) to create completed samples by imputing nonrespondents' data under various assumptions about the nonresponse mechanisms, (ii) take hypothetical samples of varying sizes from the completed samples, and (iii) compute and compare measures of accuracy and cost for different proposed sample sizes. As part of the methodology, we present a new approach for generating imputations for multivariate continuous data with nonignorable unit nonresponse. We fit mixtures of multivariate normal distributions to the respondents' data, and adjust the probabilities of the mixture components to generate nonrespondents' distributions with desired features. We illustrate the approaches using data from the 2007 U.S. Census of Manufactures. ER - TY - JOUR T1 - Simultaneous edit-imputation and disclosure limitation for business establishment data JF - Journal of Applied Statistics Y1 - 2016 A1 - H. J. Kim A1 - J. P. Reiter A1 - A. F. 
Karr AB - Business establishment microdata typically are required to satisfy agency-specified edit rules, such as balance equations and linear inequalities. Inevitably some establishments' reported data violate the edit rules. Statistical agencies correct faulty values using a process known as edit-imputation. Business establishment data also must be heavily redacted before being shared with the public; indeed, confidentiality concerns lead many agencies not to share establishment microdata as unrestricted access files. When microdata must be heavily redacted, one approach is to create synthetic data, as done in the U.S. Longitudinal Business Database and the German IAB Establishment Panel. This article presents the first implementation of a fully integrated approach to edit-imputation and data synthesis. We illustrate the approach on data from the U.S. Census of Manufactures and present a variety of evaluations of the utility of the synthetic data. The paper also presents assessments of disclosure risks for several intruder attacks. We find that the synthetic data preserve important distributional features from the post-editing confidential microdata, and have low risks for the various attacks. ER - TY - JOUR T1 - Spatial Variation in the Quality of American Community Survey Estimates JF - Demography Y1 - 2016 A1 - Folch, David C. A1 - Arribas-Bel, Daniel A1 - Koschinsky, Julia A1 - Spielman, Seth E. VL - 53 ER - TY - JOUR T1 - Synthetic establishment microdata around the world JF - Statistical Journal of the International Association for Official Statistics Y1 - 2016 A1 - Vilhuber, Lars A1 - Abowd, John M. A1 - Reiter, Jerome P. 
KW - Business data KW - confidentiality KW - differential privacy KW - international comparison KW - Multiple imputation KW - synthetic AB - In contrast to the many public-use microdata samples available for individual and household data from many statistical agencies around the world, there are virtually no establishment or firm microdata available. In large part, this difficulty in providing access to business microdata is due to the skewed and sparse distributions that characterize business data. Synthetic data are simulated data generated from statistical models. We organized sessions at the 2015 World Statistical Congress and the 2015 Joint Statistical Meetings, highlighting work on synthetic establishment microdata. This overview situates those papers, published in this issue, within the broader literature. VL - 32 UR - http://content.iospress.com/download/statistical-journal-of-the-iaos/sji964 IS - 1 ER - TY - JOUR T1 - The SAR model for very large datasets: A reduced-rank approach JF - Econometrics Y1 - 2015 A1 - Burden, S. A1 - Cressie, N. A1 - Steel, D.G. VL - 3 UR - http://www.mdpi.com/2225-1146/3/2/317 IS - 2 ER - TY - JOUR T1 - Semi-parametric selection models for potentially non-ignorable attrition in panel studies with refreshment samples JF - Political Analysis Y1 - 2015 A1 - Y. Si A1 - J.P. Reiter A1 - D.S. Hillygus VL - 23 UR - http://pan.oxfordjournals.org/cgi/reprint/mpu009?%20ijkey=joX8eSl6gyIlQKP&keytype=ref ER - TY - JOUR T1 - Simultaneous Edit-Imputation for Continuous Microdata JF - Journal of the American Statistical Association Y1 - 2015 A1 - Kim, H. J. A1 - Cox, L. H. A1 - Karr, A. F. A1 - Reiter, J. P. A1 - Wang, Q. VL - 110 UR - http://www.tandfonline.com/doi/abs/10.1080/01621459.2015.1040881 ER - TY - JOUR T1 - Small Area Estimation via Multivariate Fay-Herriot Models With Latent Spatial Dependence JF - Australian & New Zealand Journal of Statistics Y1 - 2015 A1 - Porter, A.T. A1 - Wikle, C.K. A1 - Holan, S.H. 
VL - 57 UR - http://arxiv.org/abs/1310.7211 ER - TY - JOUR T1 - Spatio-temporal change of support with application to American Community Survey multi-year period estimates JF - Stat Y1 - 2015 A1 - Bradley, Jonathan R. A1 - Wikle, Christopher K. A1 - Holan, Scott H. KW - Bayesian KW - change-of-support KW - dynamical KW - hierarchical models KW - mixed-effects model KW - Moran's I KW - multi-year period estimate AB - We present hierarchical Bayesian methodology to perform spatio-temporal change of support (COS) for survey data with Gaussian sampling errors. This methodology is motivated by the American Community Survey (ACS), which is an ongoing survey administered by the US Census Bureau that provides timely information on several key demographic variables. The ACS has published 1-year, 3-year, and 5-year period estimates, and margins of errors, for demographic and socio-economic variables recorded over predefined geographies. The spatio-temporal COS methodology considered here provides data users with a way to estimate ACS variables on customized geographies and time periods while accounting for sampling errors. Additionally, 3-year ACS period estimates are to be discontinued, and this methodology can provide predictions of ACS variables for 3-year periods given the available period estimates. The methodology is based on a spatio-temporal mixed-effects model with a low-dimensional spatio-temporal basis function representation, which provides multi-resolution estimates through basis function aggregation in space and time. This methodology includes a novel parameterization that uses a target dynamical process and recently proposed parsimonious Moran's I propagator structures. Our approach is demonstrated through two applications using public-use ACS estimates and is shown to produce good predictions on a hold-out set of 3-year period estimates. Copyright © 2015 John Wiley & Sons, Ltd. 
VL - 4 UR - http://dx.doi.org/10.1002/sta4.94 ER - TY - JOUR T1 - Statistical Disclosure Limitation in the Presence of Edit Rules JF - Journal of Official Statistics Y1 - 2015 A1 - Kim, H.J. A1 - Karr, A.F. A1 - Reiter, J.P. VL - 31 ER - TY - JOUR T1 - A stochastic bioenergetics model based approach to translating large river flow and temperature into fish population responses: the pallid sturgeon example JF - Geological Society Y1 - 2015 A1 - Wildhaber, M.L. A1 - Dey, R. A1 - Wikle, C.K. A1 - Anderson, C.J. A1 - Moran, E.H. A1 - Franz, K.J. VL - 408 ER - TY - JOUR T1 - Stop or continue data collection: A nonignorable missing data approach for continuous variables JF - ArXiv Y1 - 2015 A1 - T. Paiva A1 - J.P. Reiter KW - Methodology AB - We present an approach to inform decisions about nonresponse followup sampling. The basic idea is (i) to create completed samples by imputing nonrespondents' data under various assumptions about the nonresponse mechanisms, (ii) take hypothetical samples of varying sizes from the completed samples, and (iii) compute and compare measures of accuracy and cost for different proposed sample sizes. As part of the methodology, we present a new approach for generating imputations for multivariate continuous data with nonignorable unit nonresponse. We fit mixtures of multivariate normal distributions to the respondents' data, and adjust the probabilities of the mixture components to generate nonrespondents' distributions with desired features. We illustrate the approaches using data from the 2007 U.S. Census of Manufactures. UR - http://arxiv.org/abs/1511.02189 IS - 1511.02189 ER - TY - JOUR T1 - Studying Neighborhoods Using Uncertain Data from the American Community Survey: A Contextual Approach JF - Annals of the Association of American Geographers Y1 - 2015 A1 - Seth E. 
Spielman A1 - Alex Singleton AB - In 2010 the American Community Survey (ACS) replaced the long form of the decennial census as the sole national source of demographic and economic data for small geographic areas such as census tracts. These small area estimates suffer from large margins of error, however, which makes the data difficult to use for many purposes. The value of a large and comprehensive survey like the ACS is that it provides a richly detailed, multivariate, composite picture of small areas. This article argues that one solution to the problem of large margins of error in the ACS is to shift from a variable-based mode of inquiry to one that emphasizes a composite multivariate picture of census tracts. Because the margin of error in a single ACS estimate, like household income, is assumed to be a symmetrically distributed random variable, positive and negative errors are equally likely. Because the variable-specific estimates are largely independent from each other, when looking at a large collection of variables these random errors average to zero. This means that although single variables can be methodologically problematic at the census tract scale, a large collection of such variables provides utility as a contextual descriptor of the place(s) under investigation. This idea is demonstrated by developing a geodemographic typology of all U.S. census tracts. The typology is firmly rooted in the social scientific literature and is organized around a framework of concepts, domains, and measures. The typology is validated using public domain data from the City of Chicago and the U.S. Federal Election Commission. The typology, as well as the data and methods used to create it, is open source and published freely online. VL - 105 UR - http://dx.doi.org/10.1080/00045608.2015.1052335 ER - TY - CONF T1 - Survey Informatics: The Future of Survey Methodology and Survey Statistics Training in the Academy? 
T2 - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) Y1 - 2015 A1 - Allan L. McCutcheon JF - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) CY - Hollywood, Florida UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - RPRT T1 - Synthetic Establishment Microdata Around the World Y1 - 2015 A1 - Vilhuber, Lars A1 - Abowd, John M. A1 - Reiter, Jerome P. AB - In contrast to the many public-use microdata samples available for individual and household data from many statistical agencies around the world, there are virtually no establishment or firm microdata available. In large part, this difficulty in providing access to business microdata is due to the skewed and sparse distributions that characterize business data. Synthetic data are simulated data generated from statistical models. We organized sessions at the 2015 World Statistical Congress and the 2015 Joint Statistical Meetings, highlighting work on synthetic establishment microdata. This overview situates those papers, published in this issue, within the broader literature. PB - Cornell University UR - http://hdl.handle.net/1813/42340 ER - TY - JOUR T1 - Savings from ages 16 to 35: A test to inform Child Development Account policy JF - Poverty & Public Policy Y1 - 2014 A1 - Friedline, T. A1 - Nam, I. VL - 6 UR - http://onlinelibrary.wiley.com/store/10.1002/pop4.59/asset/pop459.pdf IS - 1 ER - TY - JOUR T1 - Seeing the Non-Stars: (Some) Sources of Bias in Past Disambiguation Approaches and a New Public Tool Leveraging Labeled Records JF - Research Policy Y1 - 2014 A1 - Ventura, S. A1 - Nugent, R. A1 - Fuchs, E. N1 - Selected for Special Issue on Big Data ER - TY - ABST T1 - SIPP: From Conventional Questionnaire to Event History Calendar Interviewing Y1 - 2014 A1 - Belli, R.F. 
N1 - Workshop on "Conducting Research using the Survey of Income and Program Participation (SIPP)". Presented at Duke University, Social Science Research Institute, Durham, NC ER - TY - CONF T1 - SMERED: A Bayesian Approach to Graphical Record Linkage and De-duplication T2 - AISTATS 2014 Proceedings, JMLR Y1 - 2014 A1 - Steorts, R. A1 - Hall, R. A1 - Fienberg, S. E. JF - AISTATS 2014 Proceedings, JMLR PB - W&CP VL - 33 ER - TY - RPRT T1 - Sorting Between and Within Industries: A Testable Model of Assortative Matching Y1 - 2014 A1 - Abowd, John M. A1 - Kramarz, Francis A1 - Perez-Duarte, Sebastien A1 - Schmutte, Ian M. AB - We test Shimer's (2005) theory of the sorting of workers between and within industrial sectors based on directed search with coordination frictions, deliberately maintaining its static general equilibrium framework. We fit the model to sector-specific wage, vacancy and output data, including publicly-available statistics that characterize the distribution of worker and employer wage heterogeneity across sectors. Our empirical method is general and can be applied to a broad class of assignment models. The results indicate that industries are the loci of sorting–more productive workers are employed in more productive industries. The evidence confirms that strong assortative matching can be present even when worker and employer components of wage heterogeneity are weakly correlated. PB - Cornell University UR - http://hdl.handle.net/1813/52607 ER - TY - JOUR T1 - Spatial Collective Intelligence? Accuracy, Credibility in Crowdsourced Data JF - Cartography and Geographic Information Science Y1 - 2014 A1 - Spielman, S. E. 
VL - 41 UR - http://go.galegroup.com/ps/i.do?action=interpret&id=GALE|A361943563&v=2.1&u=nysl_sc_cornl&it=r&p=AONE&sw=w&authCount=1 IS - 2 ER - TY - JOUR T1 - Spatial Fay-Herriot Models for Small Area Estimation with Functional Covariates JF - Spatial Statistics Y1 - 2014 A1 - Porter, A. T. A1 - Holan, S.H. A1 - Wikle, C.K. A1 - Cressie, N. VL - 10 UR - http://arxiv.org/pdf/1303.6668v3.pdf ER - TY - ABST T1 - Spatial Fay-Herriot Models for Small Area Estimation With Functional Covariates Y1 - 2014 A1 - Holan, S.H. ER - TY - CONF T1 - Spiny CACTOS: OSN Users Attitudes and Perceptions Towards Cryptographic Access Control Tools T2 - Proceedings of the Workshop on Usable Security (USEC) Y1 - 2014 A1 - Balsa, E. A1 - Brandimarte, L. A1 - Acquisti, A. A1 - Diaz, C. A1 - Gürses, S. JF - Proceedings of the Workshop on Usable Security (USEC) UR - https://www.internetsociety.org/doc/spiny-cactos-osn-users-attitudes-and-perceptions-towards-cryptographic-access-control-tools ER - TY - CONF T1 - Supporting Planners' Work with Uncertain Demographic Data T2 - GIScience Workshop on Uncertainty Visualization Y1 - 2014 A1 - Griffin, A. L. A1 - Spielman, S. E. A1 - Jurjevich, J. A1 - Merrick, M. A1 - Nagle, N. N. A1 - Folch, D. C. JF - GIScience Workshop on Uncertainty Visualization VL - 23 UR - http://cognitivegiscience.psu.edu/uncertainty2014/papers/griffin_demographic.pdf ER - TY - CONF T1 - Supporting Planners' work with Uncertain Demographic Data T2 - Proceedings of IEEE VIS 2014 Y1 - 2014 A1 - Griffin, A. L. A1 - Spielman, S. E. A1 - Nagle, N. N. A1 - Jurjevich, J. A1 - Merrick, M. A1 - Folch, D. C. JF - Proceedings of IEEE VIS 2014 PB - Proceedings of IEEE VIS 2014 UR - http://cognitivegiscience.psu.edu/uncertainty2014/papers/griffin_demographic.pdf ER - TY - CONF T1 - Survey Fusion for Data that Exhibit Multivariate, Spatio-Temporal Dependencies T2 - Joint Statistical Meetings 2014 Y1 - 2014 A1 - Bradley, J.R. 
JF - Joint Statistical Meetings 2014 ER - TY - CONF T1 - Survey Informatics: Ideas, Opportunities, and Discussions T2 - UNL/SRAM/Gallup Symposium Y1 - 2014 A1 - Eck, A. A1 - Soh, L-K JF - UNL/SRAM/Gallup Symposium CY - Omaha, NE UR - http://grc.unl.edu/unlsramgallup-symposium ER - TY - ABST T1 - A Survey of Contemporary Spatial Models for Small Area Estimation Y1 - 2014 A1 - Porter, A.T. ER - TY - JOUR T1 - SynLBD 2.0: Improving the Synthetic Longitudinal Business Database JF - Statistical Journal of the International Association for Official Statistics Y1 - 2014 A1 - S. K. Kinney A1 - J. P. Reiter A1 - J. Miranda VL - 30 ER - TY - CONF T1 - Sleights of Privacy: Framing, Disclosures, and the Limits of Transparency T2 - Proceedings of the Ninth Symposium on Usable Privacy and Security (SOUPS) Y1 - 2013 A1 - Adjerid, I. A1 - Acquisti, A. A1 - Loewenstein, G. JF - Proceedings of the Ninth Symposium on Usable Privacy and Security (SOUPS) PB - ACM CY - New York, NY ER - TY - ABST T1 - Some Historical Remarks on Spatial Statistics, Spatio-Temporal Statistics Y1 - 2013 A1 - Cressie, N. JF - Reading Group, University of Missouri ER - TY - THES T1 - Some Recent Advances in Non- and Semiparametric Bayesian Modeling with Copulas, Mixtures, and Latent Variables (Ph.D. Thesis) T2 - Department of Statistical Science Y1 - 2013 A1 - Jared S. Murray AB - This thesis develops flexible non- and semiparametric Bayesian models for mixed continuous, ordered and unordered categorical data. These methods have a range of possible applications; the applications considered in this thesis are drawn primarily from the social sciences, where multivariate, heterogeneous datasets with complex dependence and missing observations are the norm. The first contribution is an extension of the Gaussian factor model to Gaussian copula factor models, which accommodate continuous and ordinal data with unspecified marginal distributions. 
I describe how this model is the most natural extension of the Gaussian factor model, preserving its essential dependence structure and the interpretability of factor loadings and the latent variables. I adopt an approximate likelihood for posterior inference and prove that, if the Gaussian copula model is true, the approximate posterior distribution of the copula correlation matrix asymptotically converges to the correct parameter under nearly any marginal distributions. I demonstrate with simulations that this method is both robust and efficient, and illustrate its use in an application from political science. The second contribution is a novel nonparametric hierarchical mixture model for continuous, ordered and unordered categorical data. The model includes a hierarchical prior used to couple component indices of two separate models, which are also linked by local multivariate regressions. This structure effectively overcomes the limitations of existing mixture models for mixed data, namely the overly strong local independence assumptions. In the proposed model local independence is replaced by local conditional independence, so that the induced model is able to more readily adapt to structure in the data. I demonstrate the utility of this model as a default engine for multiple imputation of mixed data in a large repeated-sampling study using data from the Survey of Income and Program Participation. I show that it improves substantially on its most popular competitor, multiple imputation by chained equations (MICE), while enjoying certain theoretical properties that MICE lacks. The third contribution is a latent variable model for density regression. Most existing density regression models are quite flexible but somewhat cumbersome to specify and fit, particularly when the regressors are a combination of continuous and categorical variables. 
The majority of these methods rely on extensions of infinite discrete mixture models to incorporate covariate dependence in mixture weights, atoms or both. I take a fundamentally different approach, introducing a continuous latent variable which depends on covariates through a parametric regression. In turn, the observed response depends on the latent variable through an unknown function. I demonstrate that a spline prior for the unknown function is quite effective relative to Dirichlet Process mixture models in density estimation settings (i.e., without covariates) even though these Dirichlet process mixtures have better theoretical properties asymptotically. The spline formulation enjoys a number of computational advantages over more flexible priors on functions. Finally, I demonstrate the utility of this model in regression applications using a dataset on U.S. wages from the Census Bureau, where I estimate the return to schooling as a smooth function of the quantile index. JF - Department of Statistical Science PB - Duke University UR - http://dukespace.lib.duke.edu/dspace/handle/10161/8253 ER - TY - ABST T1 - Spatial Fay-Herriot Models for Small Area Estimation with Functional Covariates Y1 - 2013 A1 - Porter, A.T. ER - TY - CHAP T1 - Spatio-temporal Design: Advances in Efficient Data Acquisition T2 - Spatio-temporal Design: Advances in Efficient Data Acquisition Y1 - 2013 A1 - Holan, S. A1 - Wikle, C. ED - Jorge Mateu ED - Werner Muller KW - semiparametric dynamic design for non-Gaussian spatio-temporal data JF - Spatio-temporal Design: Advances in Efficient Data Acquisition PB - Wiley SN - 9780470974292 ER - TY - ABST T1 - Statistics and the Environment: Overview and Challenges Y1 - 2013 A1 - Wikle, C.K. N1 - Invited Introductory Overview Lecture ER - TY - ABST T1 - Statistics for Spatio-Temporal Data Y1 - 2013 A1 - Cressie, N. JF - Invited One-Day Short Course at the U.S. 
Census Bureau ER - TY - CHAP T1 - Semiparametric Dynamic Design of Monitoring Networks for Non-Gaussian Spatio-Temporal Data T2 - Spatio-temporal Design: Advances in Efficient Data Acquisition Y1 - 2012 A1 - Holan, S. A1 - Wikle, C.K. ED - Jorge Mateu ED - Werner Muller JF - Spatio-temporal Design: Advances in Efficient Data Acquisition PB - Wiley CY - Chichester, UK UR - http://onlinelibrary.wiley.com/doi/10.1002/9781118441862.ch12/summary ER - TY - CONF T1 - Sleight of Privacy T2 - Conference on Web Privacy Measurement Y1 - 2012 A1 - Idris Adjerid A1 - Alessandro Acquisti A1 - Laura Brandimarte JF - Conference on Web Privacy Measurement ER - TY - THES T1 - Smooth Post-Stratification in Multiple Capture Recapture Y1 - 2012 A1 - Zachary Kurtz PB - Carnegie Mellon University N1 - Department of Statistics ER - TY - ABST T1 - Spatio-Temporal Statistics at Mizzou, Truman School of Public Affairs Y1 - 2012 A1 - Wikle, C.K. ER - TY - CONF T1 - Statistics in Service to the Nation T2 - Presentation Samuel S. Wilks Lecture Y1 - 2012 A1 - Stephen E. Fienberg JF - Presentation Samuel S. Wilks Lecture CY - Princeton, NJ N1 - April 23, 2012 ER - TY - JOUR T1 - Secure multiparty linear regression based on homomorphic encryption JF - Journal of Official Statistics Y1 - 2011 A1 - Robert Hall A1 - Stephen E. Fienberg A1 - Yuval Nardi VL - 27 ER -