TY - JOUR
T1 - The role of statistical disclosure limitation in total survey error
JF - Total Survey Error in Practice
Y1 - 2017
A1 - A. F. Karr
KW - big data issues
KW - data quality
KW - data swapping
KW - decision quality
KW - risk-utility paradigms
KW - Statistical Disclosure Limitation
KW - total survey error
AB - This chapter presents the thesis that statistical disclosure limitation (SDL) ought to be viewed as an integral component of total survey error (TSE). TSE and SDL will move forward together by integrating multiple criteria: cost, risk, data quality, and decision quality. The chapter explores the value of unifying two key TSE procedures - editing and imputation - with SDL. It discusses “big data” issues and presents a mathematical formulation that, at least conceptually and at some point in the future, unifies TSE and SDL. Modern approaches to SDL are based explicitly or implicitly on tradeoffs between disclosure risk and data utility. There are three principal classes of SDL methods: reduction/coarsening techniques, perturbative methods, and synthetic data methods. Data swapping is among the most frequently applied SDL methods for categorical data. The chapter sketches how it can be informed by knowledge of TSE.
ER -
TY - JOUR
T1 - Simultaneous edit-imputation and disclosure limitation for business establishment data
JF - Journal of Applied Statistics
Y1 - 2016
A1 - H. J. Kim
A1 - J. P. Reiter
A1 - A. F. Karr
AB - Business establishment microdata typically are required to satisfy agency-specified edit rules, such as balance equations and linear inequalities. Inevitably some establishments' reported data violate the edit rules. Statistical agencies correct faulty values using a process known as edit-imputation. Business establishment data also must be heavily redacted before being shared with the public; indeed, confidentiality concerns lead many agencies not to share establishment microdata as unrestricted access files.
When microdata must be heavily redacted, one approach is to create synthetic data, as done in the U.S. Longitudinal Business Database and the German IAB Establishment Panel. This article presents the first implementation of a fully integrated approach to edit-imputation and data synthesis. We illustrate the approach on data from the U.S. Census of Manufactures and present a variety of evaluations of the utility of the synthetic data. The paper also presents assessments of disclosure risks for several intruder attacks. We find that the synthetic data preserve important distributional features from the post-editing confidential microdata, and have low risks for the various attacks.
ER -
TY - CHAP
T1 - Analytical frameworks for data release: A statistical view
T2 - Confidentiality and Data Access in the Use of Big Data: Theory and Practical Approaches
Y1 - 2014
A1 - A. F. Karr
A1 - J. P. Reiter
PB - Cambridge University Press
CY - New York City, NY
ER -
TY - JOUR
T1 - Why data availability is such a hard problem
JF - Statistical Journal of the International Association for Official Statistics
Y1 - 2014
A1 - A. F. Karr
KW - Data Archive
KW - Data availability
KW - public good
KW - replicability
KW - reproducibility
AB - If data availability were a simple problem, it would already have been resolved. In this paper, I argue that by viewing data availability as a public good, it is possible to both understand the complexities with which it is fraught and identify a path to a solution.
VL - 30
IS - 2
ER -