TY - RPRT T1 - Revisiting the Economics of Privacy: Population Statistics and Confidentiality Protection as Public Goods Y1 - 2017 A1 - Abowd, John A1 - Schmutte, Ian M. AB - Revisiting the Economics of Privacy: Population Statistics and Confidentiality Protection as Public Goods Abowd, John; Schmutte, Ian M. We consider the problem of the public release of statistical information about a population, explicitly accounting for the public-good properties of both data accuracy and privacy loss. We first consider the implications of adding the public-good component to recently published models of private data publication under differential privacy guarantees using a Vickrey-Clarke-Groves mechanism and a Lindahl mechanism. We show that data quality will be inefficiently under-supplied. Next, we develop a standard social planner’s problem using the technology set implied by (ε, δ)-differential privacy with (α, β)-accuracy for the Private Multiplicative Weights query release mechanism to study the properties of optimal provision of data accuracy and privacy loss when both are public goods. Using the production possibilities frontier implied by this technology, explicitly parameterized interdependent preferences, and the social welfare function, we display properties of the solution to the social planner’s problem. Our results directly quantify the optimal choice of data accuracy and privacy loss as functions of the technology and preference parameters. Some of these properties can be quantified using population statistics on marginal preferences and correlations between income, data accuracy preferences, and privacy loss preferences that are available from survey data. Our results show that government data custodians should publish more accurate statistics with weaker privacy guarantees than would occur with purely private data publishing. 
Our statistical results using the General Social Survey and the Cornell National Social Survey indicate that the welfare losses from under-providing data accuracy while over-providing privacy protection can be substantial. A complete archive of the data and programs used in this paper is available via http://doi.org/10.5281/zenodo.345385. PB - Cornell University UR - http://hdl.handle.net/1813/39081 ER - TY - RPRT T1 - Revisiting the Economics of Privacy: Population Statistics and Confidentiality Protection as Public Goods Y1 - 2017 A1 - Abowd, John A1 - Schmutte, Ian M. AB - Revisiting the Economics of Privacy: Population Statistics and Confidentiality Protection as Public Goods Abowd, John; Schmutte, Ian M. We consider the problem of determining the optimal accuracy of public statistics when increased accuracy requires a loss of privacy. To formalize this allocation problem, we use tools from statistics and computer science to model the publication technology used by a public statistical agency. We derive the demand for accurate statistics from first principles to generate interdependent preferences that account for the public-good nature of both data accuracy and privacy loss. We first show data accuracy is inefficiently under-supplied by a private provider. Solving the appropriate social planner’s problem produces an implementable publication strategy. We implement the socially optimal publication plan for statistics on income and health status using data from the American Community Survey, National Health Interview Survey, Federal Statistical System Public Opinion Survey and Cornell National Social Survey. Our analysis indicates that welfare losses from providing too much privacy protection and, therefore, too little accuracy can be substantial. 
PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/52612 ER - TY - RPRT T1 - NCRN Meeting Spring 2016: Developing job linkages for the Health and Retirement Study Y1 - 2016 A1 - McCue, Kristin A1 - Abowd, John A1 - Levenstein, Margaret A1 - Patki, Dhiren A1 - Rodgers, Ann A1 - Shapiro, Matthew A1 - Wasi, Nada AB - NCRN Meeting Spring 2016: Developing job linkages for the Health and Retirement Study McCue, Kristin; Abowd, John; Levenstein, Margaret; Patki, Dhiren; Rodgers, Ann; Shapiro, Matthew; Wasi, Nada This paper documents work using probabilistic record linkage to create a crosswalk between jobs reported in the Health and Retirement Study (HRS) and the list of workplaces on the Census Bureau’s Business Register. Matching job records provides an opportunity to join variables that occur uniquely in separate datasets, to validate responses, and to develop missing-data imputation models. Identifying the respondent’s workplace (“establishment”) is valuable for HRS because it allows researchers to incorporate the effects of particular social, economic, and geospatial work environments in studies of respondent health and retirement behavior. The linkage makes use of name and address standardizing techniques tailored to business data that were recently developed in a collaboration between researchers at Census, Cornell, and the University of Michigan. The matching protocol makes no use of the identity of the HRS respondent and strictly protects the confidentiality of information about the respondent’s employer. The paper first describes the clerical review process used to create a set of human-reviewed candidate pairs, and the use of that set to train matching models. It then describes and compares several linking strategies that make use of employer name, address, and phone number. 
Finally, it discusses alternative ways of incorporating information on match uncertainty into estimates based on the linked data, and illustrates their use with a preliminary sample of matched HRS jobs. Presented at the NCRN Meeting Spring 2016 in Washington DC on May 9-10, 2016; see http://www.ncrn.info/event/ncrn-spring-2016-meeting PB - University of Michigan UR - http://hdl.handle.net/1813/43895 ER - TY - RPRT T1 - NCRN Newsletter: Volume 2 - Issue 4 Y1 - 2016 A1 - Vilhuber, Lars A1 - Karr, Alan A1 - Reiter, Jerome A1 - Abowd, John A1 - Nunnelly, Jamie AB -

NCRN Newsletter: Volume 2 - Issue 4 Vilhuber, Lars; Karr, Alan; Reiter, Jerome; Abowd, John; Nunnelly, Jamie Overview of activities at NSF-Census Research Network nodes from September 2015 through December 2015. NCRN Newsletter Vol. 2, Issue 4: January 28, 2016.

PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/42394 ER - TY - RPRT T1 - NCRN Newsletter: Volume 3 - Issue 1 Y1 - 2016 A1 - Vilhuber, Lars A1 - Karr, Alan A1 - Reiter, Jerome A1 - Abowd, John A1 - Nunnelly, Jamie AB - NCRN Newsletter: Volume 3 - Issue 1 Vilhuber, Lars; Karr, Alan; Reiter, Jerome; Abowd, John; Nunnelly, Jamie Overview of activities at NSF-Census Research Network nodes from January 2016 through May 2016. NCRN Newsletter Vol. 3, Issue 1: June 10, 2016 PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/44199 ER - TY - RPRT T1 - NCRN Newsletter: Volume 2 - Issue 1 Y1 - 2015 A1 - Vilhuber, Lars A1 - Karr, Alan A1 - Reiter, Jerome A1 - Abowd, John A1 - Nunnelly, Jamie AB - NCRN Newsletter: Volume 2 - Issue 1 Vilhuber, Lars; Karr, Alan; Reiter, Jerome; Abowd, John; Nunnelly, Jamie Overview of activities at NSF-Census Research Network nodes from October 2014 to January 2015. NCRN Newsletter Vol. 2, Issue 1: January 30, 2015. PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/40193 ER - TY - RPRT T1 - NCRN Newsletter: Volume 2 - Issue 2 Y1 - 2015 A1 - Vilhuber, Lars A1 - Karr, Alan A1 - Reiter, Jerome A1 - Abowd, John A1 - Nunnelly, Jamie AB - NCRN Newsletter: Volume 2 - Issue 2 Vilhuber, Lars; Karr, Alan; Reiter, Jerome; Abowd, John; Nunnelly, Jamie Overview of activities at NSF-Census Research Network nodes from January 2015 to May 2015. NCRN Newsletter Vol. 2, Issue 2: May 12, 2015. PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/40194 ER - TY - RPRT T1 - NCRN Newsletter: Volume 2 - Issue 2 Y1 - 2015 A1 - Vilhuber, Lars A1 - Karr, Alan A1 - Reiter, Jerome A1 - Abowd, John A1 - Nunnelly, Jamie AB - NCRN Newsletter: Volume 2 - Issue 2 Vilhuber, Lars; Karr, Alan; Reiter, Jerome; Abowd, John; Nunnelly, Jamie Overview of activities at NSF-Census Research Network nodes from February 2015 to May 2015. NCRN Newsletter Vol. 2, Issue 2: May 12, 2015. 
PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/44200 ER - TY - RPRT T1 - NCRN Newsletter: Volume 2 - Issue 3 Y1 - 2015 A1 - Vilhuber, Lars A1 - Karr, Alan A1 - Reiter, Jerome A1 - Abowd, John A1 - Nunnelly, Jamie AB -

NCRN Newsletter: Volume 2 - Issue 3 Vilhuber, Lars; Karr, Alan; Reiter, Jerome; Abowd, John; Nunnelly, Jamie Overview of activities at NSF-Census Research Network nodes from June 2015 through August 2015. NCRN Newsletter Vol. 2, Issue 3: September 15, 2015.

PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/42393 ER - TY - RPRT T1 - NCRN Newsletter: Volume 1 - Issue 2 Y1 - 2014 A1 - Vilhuber, Lars A1 - Karr, Alan A1 - Reiter, Jerome A1 - Abowd, John A1 - Nunnelly, Jamie AB - NCRN Newsletter: Volume 1 - Issue 2 Vilhuber, Lars; Karr, Alan; Reiter, Jerome; Abowd, John; Nunnelly, Jamie Overview of activities at NSF-Census Research Network nodes from November 2013 to March 2014. NCRN Newsletter Vol. 1, Issue 2: March 20, 2014 PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/40233 ER - TY - RPRT T1 - NCRN Newsletter: Volume 1 - Issue 3 Y1 - 2014 A1 - Vilhuber, Lars A1 - Karr, Alan A1 - Reiter, Jerome A1 - Abowd, John A1 - Nunnelly, Jamie AB - NCRN Newsletter: Volume 1 - Issue 3 Vilhuber, Lars; Karr, Alan; Reiter, Jerome; Abowd, John; Nunnelly, Jamie Overview of activities at NSF-Census Research Network nodes from March 2014 to July 2014. NCRN Newsletter Vol. 1, Issue 3: July 23, 2014 PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/40234 ER - TY - RPRT T1 - NCRN Newsletter: Volume 1 - Issue 4 Y1 - 2014 A1 - Vilhuber, Lars A1 - Karr, Alan A1 - Reiter, Jerome A1 - Abowd, John A1 - Nunnelly, Jamie AB - NCRN Newsletter: Volume 1 - Issue 4 Vilhuber, Lars; Karr, Alan; Reiter, Jerome; Abowd, John; Nunnelly, Jamie Overview of activities at NSF-Census Research Network nodes from July 2014 to October 2014. NCRN Newsletter Vol. 1, Issue 4: October 15, 2014 PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/40192 ER - TY - RPRT T1 - Encoding Provenance of Social Science Data: Integrating PROV with DDI Y1 - 2013 A1 - Lagoze, Carl A1 - Block, William C A1 - Williams, Jeremy A1 - Abowd, John A1 - Vilhuber, Lars AB - Encoding Provenance of Social Science Data: Integrating PROV with DDI Lagoze, Carl; Block, William C; Williams, Jeremy; Abowd, John; Vilhuber, Lars Provenance is a key component of evaluating the integrity and reusability of data for scholarship. 
While recording and providing access to provenance has always been important, it is even more critical in the web environment, in which data from distributed sources and of varying integrity can be combined and derived. The PROV model, developed under the auspices of the W3C, is a foundation for semantically rich, interoperable, and web-compatible provenance metadata. We report on the results of our experimentation with integrating the PROV model into the DDI metadata for a complex but characteristic example of social science data. We also present some preliminary thinking on how to visualize those provenance graphs in the user interface. Submitted to EDDI13, the 5th Annual European DDI User Conference, December 2013, Paris, France PB - Cornell University UR - http://hdl.handle.net/1813/34443 ER - TY - RPRT T1 - Managing Confidentiality and Provenance across Mixed Private and Publicly-Accessed Data and Metadata Y1 - 2013 A1 - Vilhuber, Lars A1 - Abowd, John A1 - Block, William A1 - Lagoze, Carl A1 - Williams, Jeremy AB - Managing Confidentiality and Provenance across Mixed Private and Publicly-Accessed Data and Metadata Vilhuber, Lars; Abowd, John; Block, William; Lagoze, Carl; Williams, Jeremy Social science researchers are increasingly interested in making use of confidential microdata that contain linkages to the identities of people, corporations, etc. The value of this linking lies in the potential to join these identifiable entities with external data such as genome data, geospatial information, and the like. Leveraging these linkages is an essential aspect of “big data” scholarship. However, the utility of these confidential data for scholarship is compromised by the complex nature of their management and curation. This makes it difficult to fulfill US federal data management mandates and interferes with basic scholarly practices such as validation and reuse of existing results. 
In this paper we describe our work on the CED2AR prototype, a first step in providing researchers with a tool that spans the confidential/publicly-accessible divide, making it possible for researchers to identify, search, access, and cite those data. The particular points of interest in our work are the cloaking of metadata fields and the expression of provenance chains. For the former, we make use of existing fields in the DDI (Data Documentation Initiative) specification and suggest some minor changes to the specification. For the latter, we investigate the integration of DDI with recent work by the W3C PROV working group, which has developed a generalizable and extensible model for expressing data provenance. PB - Cornell University UR - http://hdl.handle.net/1813/34534 ER - TY - RPRT T1 - NCRN Newsletter: Volume 1 - Issue 1 Y1 - 2013 A1 - Vilhuber, Lars A1 - Karr, Alan A1 - Reiter, Jerome A1 - Abowd, John A1 - Nunnelly, Jamie AB - NCRN Newsletter: Volume 1 - Issue 1 Vilhuber, Lars; Karr, Alan; Reiter, Jerome; Abowd, John; Nunnelly, Jamie Overview of activities at NSF-Census Research Network nodes from July 2013 to November 2013. NCRN Newsletter Vol. 1, Issue 1: November 17, 2013 PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/40232 ER - TY - RPRT T1 - Presentation: Predicting Multiple Responses with Boosting and Trees Y1 - 2013 A1 - Li, Ping A1 - Abowd, John AB - Presentation: Predicting Multiple Responses with Boosting and Trees Li, Ping; Abowd, John Presentation by Ping Li and John Abowd at FCSM on November 4, 2013 PB - Cornell University UR - http://hdl.handle.net/1813/40255 ER - TY - RPRT T1 - Presentation: Revisiting the Economics of Privacy: Population Statistics and Privacy as Public Goods Y1 - 2012 A1 - Abowd, John AB - Presentation: Revisiting the Economics of Privacy: Population Statistics and Privacy as Public Goods Abowd, John Anonymization and data quality are intimately linked. 
Although this link has been properly acknowledged in the computer science and statistical disclosure limitation literatures, economics offers a framework for formalizing the linkage and analyzing optimal decisions and equilibrium outcomes. The opinions expressed in this presentation are those of the author and not those of the National Science Foundation or the Census Bureau. PB - Cornell University UR - http://hdl.handle.net/1813/30937 ER -