Data Anonymization: Balancing Data Confidentiality and Data Quality
Instructor: Lawrence Cox, NISS
Course Description: The raw material of many statistical data products is confidential information pertaining to individual persons or entities. Ethical statistical practice, often codified in national laws, requires that release of the data product not reveal this information to unauthorized parties. This is data confidentiality (also called data anonymization). The first emphasis of this course will be a summary of the many data confidentiality methods developed over recent decades, including rounding, perturbation, suppression, tabular adjustment, post-randomization, swapping, and data synthesis. Data confidentiality methods abbreviate, distort, or replace original data, with consequent effects on data quality and usability. The second emphasis will be on identifying and assessing these quality effects. The third emphasis will be on identifying or modifying data confidentiality methods to achieve an acceptable confidentiality-quality balance in the released data product.
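Illustration: As a rough sketch of the confidentiality-quality trade-off described above, the Python snippet below applies one of the listed methods, rounding, to a small table of counts and then measures how far the released values drift from the originals. The cell counts, the rounding base of 5, the unbiased random rounding rule, and the function names random_round and mean_absolute_change are all assumptions chosen for illustration; they are not taken from the course materials.

```python
import random


def random_round(count, base=5, rng=random):
    """Randomly round a nonnegative integer count to a multiple of `base`.

    Illustrative rule (an assumption, not the course's definition):
    round up with probability remainder/base, so the expected released
    value equals the original count.
    """
    remainder = count % base
    if remainder == 0:
        return count  # already a multiple of base, left unchanged
    lower = count - remainder
    if rng.random() < remainder / base:
        return lower + base
    return lower


def mean_absolute_change(original, released):
    """A simple quality measure: average absolute distortion per cell."""
    return sum(abs(o - r) for o, r in zip(original, released)) / len(original)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
counts = [1, 3, 12, 40, 7, 0, 26]  # hypothetical confidential cell counts
released = [random_round(c, base=5, rng=rng) for c in counts]
distortion = mean_absolute_change(counts, released)

print("original:", counts)
print("released:", released)
print("mean absolute change:", distortion)
```

A larger base hides small counts more thoroughly but increases the distortion measure, which is the kind of balance the third emphasis of the course addresses.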