TY - JOUR
T1 - The role of statistical disclosure limitation in total survey error
JF - Total Survey Error in Practice
Y1 - 2017
A1 - A. F. Karr
KW - big data issues
KW - data quality
KW - data swapping
KW - decision quality
KW - risk-utility paradigms
KW - Statistical Disclosure Limitation
KW - total survey error
AB - This chapter presents the thesis that statistical disclosure limitation (SDL) ought to be viewed as an integral component of total survey error (TSE). TSE and SDL will move forward together by integrating multiple criteria: cost, risk, data quality, and decision quality. The chapter explores the value of unifying two key TSE procedures - editing and imputation - with SDL. It discusses “big data” issues and presents a mathematical formulation that, at least conceptually and at some point in the future, unifies TSE and SDL. Modern approaches to SDL are based explicitly or implicitly on tradeoffs between disclosure risk and data utility. There are three principal classes of SDL methods: reduction/coarsening techniques, perturbative methods, and synthetic data methods. Data swapping is among the most frequently applied SDL methods for categorical data. The chapter sketches how it can be informed by knowledge of TSE.
ER -
TY - JOUR
T1 - Simultaneous edit-imputation and disclosure limitation for business establishment data
JF - Journal of Applied Statistics
Y1 - 2016
A1 - H. J. Kim
A1 - J. P. Reiter
A1 - A. F. Karr
AB - Business establishment microdata typically are required to satisfy agency-specified edit rules, such as balance equations and linear inequalities. Inevitably some establishments' reported data violate the edit rules. Statistical agencies correct faulty values using a process known as edit-imputation. Business establishment data also must be heavily redacted before being shared with the public; indeed, confidentiality concerns lead many agencies not to share establishment microdata as unrestricted access files.
When microdata must be heavily redacted, one approach is to create synthetic data, as done in the U.S. Longitudinal Business Database and the German IAB Establishment Panel. This article presents the first implementation of a fully integrated approach to edit-imputation and data synthesis. We illustrate the approach on data from the U.S. Census of Manufactures and present a variety of evaluations of the utility of the synthetic data. The paper also presents assessments of disclosure risks for several intruder attacks. We find that the synthetic data preserve important distributional features from the post-editing confidential microdata, and have low risks for the various attacks.
ER -
TY - CHAP
T1 - Analytical frameworks for data release: A statistical view
T2 - Confidentiality and Data Access in the Use of Big Data: Theory and Practical Approaches
Y1 - 2014
A1 - A. F. Karr
A1 - J. P. Reiter
PB - Cambridge University Press
CY - New York City, NY
ER -
TY - JOUR
T1 - Why data availability is such a hard problem
JF - Statistical Journal of the International Association for Official Statistics
Y1 - 2014
A1 - A. F. Karr
KW - Data Archive
KW - Data availability
KW - public good
KW - replicability
KW - reproducibility
AB - If data availability were a simple problem, it would already have been resolved. In this paper, I argue that by viewing data availability as a public good, it is possible to both understand the complexities with which it is fraught and identify a path to a solution.
VL - 30
IS - 2
ER -