%0 Journal Article %D Submitted %T A framework for sharing confidential research data, applied to investigating differential pay by race in the U. S. government %A Barrientos, A. F. %A Bolton, A. %A Balmat, T. %A Reiter, J. P. %A Machanavajjhala, A. %A Chen, Y. %A Kneifel, C. %A DeLong, M. %A de Figueiredo, J. M. %X Data stewards seeking to provide access to large-scale social science data face a difficult challenge. They have to share data in ways that protect privacy and confidentiality, are informative for many analyses and purposes, and are relatively straightforward to use by data analysts. We present a framework for addressing this challenge. The framework uses an integrated system that includes fully synthetic data intended for wide access, coupled with means for approved users to access the confidential data via secure remote access solutions, glued together by verification servers that allow users to assess the quality of their analyses with the synthetic data. We apply this framework to data on the careers of employees of the U. S. federal government, studying differentials in pay by race. The integrated system performs as intended, allowing users to explore the synthetic data for potential pay differentials and learn through verifications which findings in the synthetic data hold up in the confidential data and which do not. We find differentials across races; for example, the gap between black and white female federal employees' pay increased over the time period. We present models for generating synthetic careers and differentially private algorithms for verification of regression results. %G eng %0 Conference Paper %B IEEE International Conference on Data Mining %D 2017 %T Differentially private regression diagnostics %A Chen, Y. %A Machanavajjhala, A. %A Reiter, J. P. %A Barrientos, A. %X Many data producers seek to provide users access to confidential data without unduly compromising data subjects' privacy and confidentiality. 
When intense redaction is needed to do so, one general strategy is to require users to do analyses without seeing the confidential data, for example, by releasing fully synthetic data or by allowing users to query remote systems for disclosure-protected outputs of statistical models. With fully synthetic data or redacted outputs, the analyst never really knows how much to trust the resulting findings. In particular, if the user did the same analysis on the confidential data, would regression coefficients of interest be statistically significant or not? We present algorithms for assessing this question that satisfy differential privacy. We describe conditions under which the algorithms should give accurate answers about statistical significance. We illustrate the properties of the methods using artificial and genuine data. %B IEEE International Conference on Data Mining %G eng %0 Report %D 2017 %T Effects of a Government-Academic Partnership: Has the NSF-Census Bureau Research Network Helped Secure the Future of the Federal Statistical System? %A Weinberg, Daniel %A Abowd, John M. %A Belli, Robert F. %A Cressie, Noel %A Folch, David C. %A Holan, Scott H. %A Levenstein, Margaret C. %A Olson, Kristen M. %A Reiter, Jerome P. %A Shapiro, Matthew D. %A Smyth, Jolene %A Soh, Leen-Kiat %A Spencer, Bruce %A Spielman, Seth E. %A Vilhuber, Lars %A Wikle, Christopher %X

The National Science Foundation-Census Bureau Research Network (NCRN) was established in 2011 to create interdisciplinary research nodes on methodological questions of interest and significance to the broader research community and to the Federal Statistical System (FSS), particularly the Census Bureau. The activities to date have covered both fundamental and applied statistical research and have focused at least in part on the training of current and future generations of researchers in skills of relevance to surveys and alternative measurement of economic units, households, and persons. This paper discusses some of the key research findings of the eight nodes, organized into six topics: (1) Improving census and survey data collection methods; (2) Using alternative sources of data; (3) Protecting privacy and confidentiality by improving disclosure avoidance; (4) Using spatial and spatio-temporal statistical modeling to improve estimates; (5) Assessing data cost and quality tradeoffs; and (6) Combining information from multiple sources. It also reports on collaborations across nodes and with federal agencies, new software developed, and educational activities and outcomes. The paper concludes with an evaluation of the ability of the FSS to apply the NCRN’s research outcomes and suggests some next steps, as well as the implications of this research-network model for future federal government renewal initiatives. 
This paper began as a May 8, 2015 presentation to the National Academies of Science’s Committee on National Statistics by two of the principal investigators of the National Science Foundation-Census Bureau Research Network (NCRN) – John Abowd and the late Steve Fienberg (Carnegie Mellon University). The authors acknowledge the contributions of the other principal investigators of the NCRN who are not co-authors of the paper (William Block, William Eddy, Alan Karr, Charles Manski, Nicholas Nagle, and Rebecca Nugent), the co-principal investigators, and the comments of Patrick Cantwell, Constance Citro, Adam Eck, Brian Harris-Kojetin, and Eloise Parker. We note with sorrow the deaths of Stephen Fienberg and Allan McCutcheon, two of the original NCRN principal investigators. The principal investigators also wish to acknowledge Cheryl Eavey’s sterling grant administration on behalf of the NSF. The conclusions reached in this paper are not the responsibility of the National Science Foundation (NSF), the Census Bureau, or any of the institutions to which the authors belong.

%I NCRN Coordinating Office %G eng %U http://hdl.handle.net/1813/52650 %9 Preprint %0 Report %D 2017 %T Unique Entity Estimation with Application to the Syrian Conflict %A Chen, B. %A Shrivastava, A. %A Steorts, R. C. %K Computer Science - Data Structures and Algorithms %K Computer Science - Databases %K Statistics - Applications %X Entity resolution identifies and removes duplicate entities in large, noisy databases and has grown in both usage and new developments as a result of increased data availability. Nevertheless, entity resolution has tradeoffs regarding assumptions of the data generation process, error rates, and computational scalability that make it a difficult task for real applications. In this paper, we focus on a related problem of unique entity estimation, which is the task of estimating the unique number of entities and associated standard errors in a data set with duplicate entities. Unique entity estimation shares many fundamental challenges of entity resolution, namely, that the computational cost of all-to-all entity comparisons is intractable for large databases. To circumvent this computational barrier, we propose an efficient (near-linear time) estimation algorithm based on locality sensitive hashing. Our estimator, under realistic assumptions, is unbiased and has provably low variance compared to existing random sampling based approaches. In addition, we empirically show its superiority over the state-of-the-art estimators on three real applications. The motivation for our work is to derive an accurate estimate of the documented, identifiable deaths in the ongoing Syrian conflict. Our methodology, when applied to the Syrian data set, provides an estimate of $191,874 \pm 1772$ documented, identifiable deaths, which is very close to the Human Rights Data Analysis Group (HRDAG) estimate of 191,369. 
Our work provides an example of challenges and efforts involved in solving a real, noisy challenging problem where modeling assumptions may not hold. %B arXiv %G eng %U https://arxiv.org/abs/1710.02690 %0 Report %D 2016 %T 2017 Economic Census: Towards Synthetic Data Sets %A Caldwell, Carol %A Thompson, Katherine Jenny %I NCRN Coordinating Office %G eng %U http://hdl.handle.net/1813/52165 %9 Preprint %0 Generic %D 2016 %T Data management and analytic use of paradata: SIPP-EHC audit trails %A Lee, Jinyoung %A Seloske, Ben %A Córdova Cazar, Ana Lucía %A Eck, Adam %A Kirchner, Antje %A Belli, Robert F. %G eng %0 Report %D 2016 %T NCRN Meeting Spring 2016: Attitudes Towards Geolocation-Enabled Census Forms %A Brandimarte, Laura %A Chiew, Ernest %A Ventura, Sam %A Acquisti, Alessandro %X Geolocation refers to the automatic identification of the physical locations of Internet users. In an online survey experiment, we studied respondent reactions towards different types of geolocation. After coordinating with US Census Bureau researchers, we designed and administered a replica of a census form to a sample of respondents. We also created slightly different forms by manipulating the type of geolocation implemented. Using the IP address of each respondent, we approximated the geographical coordinates of the respondent and displayed this location on a map on the survey. Across different experimental conditions, we manipulated the map interface between the three interfaces on the Google Maps API: default road map, Satellite View, and Street View. We also provided either a specific, pinpointed location, or a set of two circles of 1- and 2-mile radius. 
Snapshots of responses were captured at every instant information was added, altered, or deleted by respondents when completing the survey. We measured willingness to provide information on the typical Census form, as well as privacy concerns associated with geolocation technologies and attitudes towards the use of online geographical maps to identify one’s exact current location. Presented at the NCRN Meeting Spring 2016 in Washington DC on May 9-10, 2016; see http://www.ncrn.info/event/ncrn-spring-2016-meeting %I Carnegie-Mellon University %G eng %U http://hdl.handle.net/1813/43889 %9 Preprint %0 Report %D 2016 %T NCRN Meeting Spring 2016: Evaluating Data quality in Time Diary Surveys Using Paradata %A Córdova Cazar, Ana Lucía %A Belli, Robert %X Over the past decades, time use researchers have been increasingly interested in analyzing wellbeing in tandem with the use of time (Juster and Stafford, 1985; Krueger et al, 2009). Many methodological issues have arisen in this endeavor, including the concern about the quality of the time use data. Survey researchers have increasingly turned to the analysis of paradata to better understand and model data quality. In particular, it has been argued that paradata may serve as a proxy of the respondents’ cognitive response process, and can be used as an additional tool to assess the impact of data generation on data quality. In this presentation, data quality in the American Time Use Survey (ATUS) will be assessed through the use of paradata and survey responses. Specifically, I will talk about a data quality index I have created, which includes measures of different types of ATUS errors (e.g. low number of reported activities, failures to report an activity), and paradata variables (e.g. response latencies, incompletes). 
The overall objective of this study is to contribute to data quality assessment in the collection of timeline data from national surveys by providing insights on those interviewing dynamics that most impact data quality. These insights will help to improve future instruments and training of interviewers, as well as to reduce costs. Presented at the NCRN Meeting Spring 2016 in Washington DC on May 9-10, 2016; see http://www.ncrn.info/event/ncrn-spring-2016-meeting %I University of Nebraska %G eng %U http://hdl.handle.net/1813/43896 %9 Preprint %0 Report %D 2016 %T NCRN Meeting Spring 2017: 2017 Economic Census: Towards Synthetic Data Sets %A Caldwell, Carol %A Thompson, Katherine Jenny %I NCRN Coordinating Office %G eng %U http://hdl.handle.net/1813/52165 %9 Preprint %0 Report %D 2016 %T NCRN Meeting Spring 2017: Practical Issues in Anonymity %A Clifton, Chris %A Merill, Shawn %A Merill, Keith %I NCRN Coordinating Office %G eng %U http://hdl.handle.net/1813/52166 %9 Preprint %0 Report %D 2016 %T Practical Issues in Anonymity %A Clifton, Chris %A Merill, Shawn %A Merill, Keith %I NCRN Coordinating Office %G eng %U http://hdl.handle.net/1813/52166 %9 Preprint %0 Journal Article %J Journal of Survey Statistics and Methodology %D 2015 %T Accounting for nonignorable unit nonresponse and attrition in panel studies with refreshment samples %A Schifeling, T. %A Cheng, C. %A Hillygus, D. S. %A Reiter, J. P. %X Panel surveys typically suffer from attrition, which can lead to biased inference when basing analysis only on cases that complete all waves of the panel. 
Unfortunately, panel data alone cannot inform the extent of the bias from the attrition, so that analysts using the panel data alone must make strong and untestable assumptions about the missing data mechanism. Many panel studies also include refreshment samples, which are data collected from a random sample of new individuals during some later wave of the panel. Refreshment samples offer information that can be utilized to correct for biases induced by nonignorable attrition while reducing reliance on strong assumptions about the attrition process. To date, these bias correction methods have not dealt with two key practical issues in panel studies: unit nonresponse in the initial wave of the panel and in the refreshment sample itself. As we illustrate, nonignorable unit nonresponse can significantly compromise the analyst’s ability to use the refreshment samples for attrition bias correction. Thus, it is crucial for analysts to assess how sensitive their inferences—corrected for panel attrition—are to different assumptions about the nature of the unit nonresponse. We present an approach that facilitates such sensitivity analyses, both for suspected nonignorable unit nonresponse in the initial wave and in the refreshment sample. We illustrate the approach using simulation studies and an analysis of data from the 2007-2008 Associated Press/Yahoo News election panel study. %B Journal of Survey Statistics and Methodology %V 3 %P 265-295 %G eng %U http://jssam.oxfordjournals.org/content/3/3/265.abstract %N 3 %& 265 %R 10.1093/jssam/smv007 %0 Journal Article %J Statistical Methods and Applications %D 2015 %T Bayesian Hierarchical Statistical SIRS Models %A Zhuang, L. %A Cressie, N. %B Statistical Methods and Applications %V 23 %P 601-646 %G eng %R 10.1007/s10260-014-0280-9 %0 Journal Article %J Statistical Science %D 2015 %T Capturing multivariate spatial dependence: Model, estimate, and then predict %A Cressie, N. %A Burden, S. %A Davis, W. %A Krivitsky, P. 
%A Mokhtarian, P. %A Suesse, T. %A Zammit-Mangion, A. %B Statistical Science %V 30 %P 170-175 %8 06/2015 %G eng %U http://projecteuclid.org/euclid.ss/1433341474 %N 2 %R 10.1214/15-STS517 %0 Conference Paper %B 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) %D 2015 %T Changing ‘Who’ or ‘Where’: Implications for Data Quality in the American Time Use Survey %A Deal, C.E. %A Kirchner, A. %A Cordova-Cazar, A.L. %A Ellyne, L. %A Belli, R.F. %B 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) %C Hollywood, Florida %G eng %U http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx %0 Journal Article %J Bayesian Analysis %D 2015 %T Comment on Article by Ferreira and Gamerman %A Cressie, N. %A Chambers, R. L. %B Bayesian Analysis %V 10 %P 741-748 %8 04/2015 %G eng %U http://projecteuclid.org/euclid.ba/1429880217 %N 3 %R 10.1214/15-BA944B %0 Journal Article %J Bayesian Analysis %D 2015 %T Comment: Spatial sampling designs depend as much on “how much?” and “why?” as on “where?” %A Cressie, N. %A Chambers, R. L. %X A comment on “Optimal design in geostatistics under preferential sampling” by G. da Silva Ferreira and D. Gamerman %B Bayesian Analysis %G eng %0 Journal Article %J Test %D 2015 %T Comparing and selecting spatial predictors using local criteria %A Bradley, J.R. %A Cressie, N. %A Shi, T. %B Test %V 24 %P 1-28 %8 03/2015 %G eng %U http://dx.doi.org/10.1007/s11749-014-0415-1 %N 1 %& 1 %R 10.1007/s11749-014-0415-1 %0 Book Section %B Geometry Driven Statistics %D 2015 %T Evaluation of diagnostics for hierarchical spatial statistical models %A Cressie, N. %A Burden, S. %E I.L. Dryden %E J.T. 
Kent %B Geometry Driven Statistics %7 1 %I Wiley %C Chichester %P 241-256 %@ 978-1118866573 %G eng %U http://niasra.uow.edu.au/content/groups/public/@web/@inf/@math/documents/doc/uow169240.pdf %& 12 %0 Journal Article %J Stat %D 2015 %T Figures of merit for simultaneous inference and comparisons in simulation experiments %A Cressie, N. %A Burden, S. %B Stat %V 4 %P 196-211 %8 08/2015 %G eng %U http://onlinelibrary.wiley.com/doi/10.1002/sta4.88/epdf %N 1 %& 196 %R 10.1002/sta4.88 %0 Journal Article %J Mathematical Geosciences %D 2015 %T Hot enough for you? A spatial exploratory and inferential analysis of North American climate-change projections %A Cressie, N. %A Kang, E.L. %B Mathematical Geosciences %G eng %U http://dx.doi.org/10.1007/s11004-015-9607-9 %R 10.1007/s11004-015-9607-9 %0 Journal Article %J Statistics in Medicine %D 2015 %T Multiple imputation for harmonizing longitudinal non-commensurate measures in individual participant data meta-analysis %A Siddique, J. %A Reiter, J. P. %A Brincks, A. %A Gibbons, R. %A Crespi, C. %A Brown, C. H. %B Statistics in Medicine %G eng %U http://onlinelibrary.wiley.com/doi/10.1002/sim.6562/abstract %R 10.1002/sim.6562 %0 Journal Article %D 2015 %T Multivariate Spatial Covariance Models: A Conditional Approach %A Cressie, N. %A Zammit-Mangion, A. %X Multivariate geostatistics is based on modelling all covariances between all possible combinations of two or more variables at any sets of locations in a continuously indexed domain. Multivariate spatial covariance models need to be built with care, since any covariance matrix that is derived from such a model must be nonnegative-definite. In this article, we develop a conditional approach for spatial-model construction whose validity conditions are easy to check. We start with bivariate spatial covariance models and go on to demonstrate the approach's connection to multivariate models defined by networks of spatial variables. 
In some circumstances, such as modelling respiratory illness conditional on air pollution, the direction of conditional dependence is clear. When it is not, the two directional models can be compared. More generally, the graph structure of the network reduces the number of possible models to compare. Model selection then amounts to finding possible causative links in the network. We demonstrate our conditional approach on surface temperature and pressure data, where the role of the two variables is seen to be asymmetric. %G eng %U https://arxiv.org/abs/1504.01865 %0 Report %D 2015 %T NCRN Meeting Spring 2015: Training Undergraduates, Graduate Students, Postdocs, and Federal Agencies: Methodology, Data, and Science for Federal Statistics %A Cressie, Noel %A Holan, Scott H. %A Wikle, Christopher K. %X NCRN Meeting Spring 2015: Training Undergraduates, Graduate Students, Postdocs, and Federal Agencies: Methodology, Data, and Science for Federal Statistics Cressie, Noel; Holan, Scott H.; Wikle, Christopher K. Presentation at the NCRN Spring 2015 Meeting %I NCRN Coordinating Office %G eng %U http://hdl.handle.net/1813/40179 %9 Preprint %0 Journal Article %J Multivariate Behavioral Research %D 2015 %T A nonparametric, multiple imputation-based method for the retrospective integration of data sets %A M.M. Carrig %A D. Manrique-Vallier %A K. Ranby %A J.P. Reiter %A R. Hoyle %B Multivariate Behavioral Research %V 50 %P 383-397 %G eng %U http://www.tandfonline.com/doi/full/10.1080/00273171.2015.1022641 %N 4 %& 383 %R 10.1080/00273171.2015.1022641 %0 Journal Article %J Journal of Poverty %D 2015 %T Preventive policy strategy for banking the unbanked: Savings accounts for teenagers? %A Friedline, T. %A Despard, M. %A Chowa, G. 
%K financial assets %K savings %K Survey of Income and Program Participation (SIPP) %K teenagers %K unbanked %K young adults %X Concern over percentages of unbanked and underbanked households in the United States and their lack of connectedness to the financial mainstream has led to policy strategies geared toward reaching these households. Using nationally-representative longitudinal data, a preventive strategy for banking households is tested that asks whether young adults are more likely to be banked and own a diversity of financial assets when they are connected to the financial mainstream as teenagers. Young adults are more likely to own checking accounts, savings accounts, certificates of deposit, and stocks when they had savings accounts as teenagers. Policy implications are discussed. %B Journal of Poverty %V 20 %P 2-33 %8 07/2015 %G eng %U http://www.tandfonline.com/doi/full/10.1080/10875549.2015.1015068 %N 1 %& 2 %R 10.1080/10875549.2015.1015068 %0 Journal Article %J Test %D 2015 %T Rejoinder on: Comparing and selecting spatial predictors using local criteria %A Bradley, J.R. %A Cressie, N. %A Shi, T. %B Test %V 24 %P 54-60 %8 03/2015 %G eng %U http://dx.doi.org/10.1007/s11749-014-0414-2 %N 1 %R 10.1007/s11749-014-0414-2 %0 Journal Article %J Econometrics %D 2015 %T The SAR model for very large datasets: A reduced-rank approach %A Burden, S. %A Cressie, N. %A Steel, D.G. %B Econometrics %V 3 %P 317-338 %G eng %U http://www.mdpi.com/2225-1146/3/2/317 %N 2 %R 10.3390/econometrics3020317 %0 Journal Article %J Journal of the American Statistical Association %D 2015 %T Simultaneous Edit-Imputation for Continuous Microdata %A Kim, H. J. %A Cox, L. H. %A Karr, A. F. %A Reiter, J. P. %A Wang, Q. 
%B Journal of the American Statistical Association %V 110 %P 987-999 %G eng %U http://www.tandfonline.com/doi/abs/10.1080/01621459.2015.1040881 %R 10.1080/01621459.2015.1040881 %0 Conference Paper %B 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) %D 2015 %T The Use of Paradata to Evaluate Interview Complexity and Data Quality (in Calendar and Time Diary Surveys) %A Cordova-Cazar, A.L. %A Belli, R.F. %B 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) %C Hollywood, Florida %G eng %U http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx %0 Conference Paper %B American Association for Public Opinion Research 2014 Annual Conference %D 2014 %T Call back later: The association of recruitment contact and error in the American Time Use Survey %A Countryman, A. %A Cordova-Cazar, A.L. %A Deal, C.E. %A Belli, R.F. %B American Association for Public Opinion Research 2014 Annual Conference %C Anaheim, CA %G eng %U http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx %0 Journal Article %J ArXiv %D 2014 %T A Comparison of Spatial Predictors when Datasets Could be Very Large %A Bradley, J. R. %A Cressie, N. %A Shi, T. %K Statistics - Methodology %X

In this article, we review and compare a number of methods of spatial prediction. To demonstrate the breadth of available choices, we consider both traditional and more-recently-introduced spatial predictors. Specifically, in our exposition we review: traditional stationary kriging, smoothing splines, negative-exponential distance-weighting, Fixed Rank Kriging, modified predictive processes, a stochastic partial differential equation approach, and lattice kriging. This comparison is meant to provide a service to practitioners wishing to decide between spatial predictors. Hence, we provide technical material for the unfamiliar, which includes the definition and motivation for each (deterministic and stochastic) spatial predictor. We use a benchmark dataset of CO2 data from NASA's AIRS instrument to address computational efficiencies that include CPU time and memory usage. Furthermore, the predictive performance of each spatial predictor is assessed empirically using a hold-out subset of the AIRS data.

%B ArXiv %G eng %U http://arxiv.org/abs/1410.7748 %N 1410.7748 %0 Book Section %B Lecture Notes in Computer Science %D 2014 %T Enabling statistical analysis of suppressed tabular data, in Privacy in Statistical Databases %A L. Cox %B Lecture Notes in Computer Science %I Springer %C Heidelberg %V 8744 %P 1-10 %G eng %0 Conference Paper %B Midwest Association for Public Opinion Research Annual Conference %D 2014 %T Hours or Minutes: Does One Unit Fit All? %A Cochran, B. %A Smyth, J.D. %B Midwest Association for Public Opinion Research Annual Conference %C Chicago, IL %G eng %U http://www.mapor.org/conferences.html %0 Journal Article %J Statistics in Medicine %D 2014 %T Imputation of confidential data sets with spatial locations using disease mapping models %A T. Paiva %A A. Chakraborty %A J.P. Reiter %A A.E. Gelfand %B Statistics in Medicine %V 33 %P 1928-1945 %G eng %0 Conference Paper %B American Association for Public Opinion Research 2014 Annual Conference %D 2014 %T Interviewer variance and prevalence of verbal behaviors in calendar and conventional interviewing %A Belli, R.F. %A Charoenruk, N. %B American Association for Public Opinion Research 2014 Annual Conference %C Anaheim, CA %G eng %U http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx %0 Conference Paper %B XVIII International Sociological Association World Congress of Sociology %D 2014 %T Interviewer variance of interviewer and respondent behaviors: A comparison between calendar and conventional interviewing %A Belli, R.F. %A Charoenruk, N. %B XVIII International Sociological Association World Congress of Sociology %C Yokohama, Japan %G eng %U https://isaconf.confex.com/isaconf/wc2014/webprogram/Paper34278.html %0 Journal Article %J Journal of Business and Economic Statistics %D 2014 %T Multiple imputation of missing or faulty values under linear constraints %A Kim, H. J. %A Reiter, J. P. %A Wang, Q. %A Cox, L. H. %A Karr, A. F. %X

Many statistical agencies, survey organizations, and research centers collect data that suffer from item nonresponse and erroneous or inconsistent values. These data may be required to satisfy linear constraints, for example, bounds on individual variables and inequalities for ratios or sums of variables. Often these constraints are designed to identify faulty values, which then are blanked and imputed. The data also may exhibit complex distributional features, including nonlinear relationships and highly nonnormal distributions. We present a fully Bayesian, joint model for modeling or imputing data with missing/blanked values under linear constraints that (i) automatically incorporates the constraints in inferences and imputations, and (ii) uses a flexible Dirichlet process mixture of multivariate normal distributions to reflect complex distributional features. Our strategy for estimation is to augment the observed data with draws from a hypothetical population in which the constraints are not present, thereby taking advantage of computationally expedient methods for fitting mixture models. Missing/blanked items are sampled from their posterior distribution using the Hit-and-Run sampler, which guarantees that all imputations satisfy the constraints. We illustrate the approach using manufacturing data from Colombia, examining the potential to preserve joint distributions and a regression from the plant productivity literature. Supplementary materials for this article are available online.

%B Journal of Business and Economic Statistics %V 32 %P 375-386 %G eng %& 375 %R 10.1080/07350015.2014.885435 %0 Conference Paper %B Paper presented at the annual conference of the Midwest Association for Public Opinion Research %D 2014 %T Remembering where: A look at the American Time Use Survey %A Deal, C. %A Cordova-Cazar, A.L. %A Countryman, A. %A Kirchner, A. %A Belli, R.F. %B Paper presented at the annual conference of the Midwest Association for Public Opinion Research %C Chicago, IL %8 11/2014 %G eng %U http://www.mapor.org/conferences.html %0 Book Section %B The Routledge Handbook of Poverty in the United States %D 2014 %T The Rise of Incarceration Among the Poor with Mental Illnesses: How Neoliberal Policies Contribute %A Camp, J. %A Haymes, S. %A Haymes, M. V. d. %A Miller, R.J. %B The Routledge Handbook of Poverty in the United States %I Routledge %G eng %0 Journal Article %J Spatial Statistics %D 2014 %T Spatial Fay-Herriot Models for Small Area Estimation with Functional Covariates %A Porter, A. T. %A Holan, S.H. %A Wikle, C.K. %A Cressie, N. %B Spatial Statistics %V 10 %P 27-42 %G eng %U http://arxiv.org/pdf/1303.6668v3.pdf %( 2013 %0 Journal Article %J Journal of Privacy and Confidentiality %D 2014 %T Top-Coding and Public Use Microdata Samples from the U.S. Census Bureau %A Crimi, N. %A Eddy, W. C. %B Journal of Privacy and Confidentiality %V 6 %P 21–58 %G eng %U http://repository.cmu.edu/jpc/vol6/iss2/2/ %0 Book Section %B Online Panel Surveys: An Interdisciplinary Approach %D 2014 %T The Untold Story of Multi-Mode (Online and Mail) Consumer Panels: From Optimal Recruitment to Retention and Attrition %A McCutcheon, Allan L. %A Rao, K. %A Kaminska, O. %E Callegaro, M. %E Baker, R. %E Bethlehem, J. %E Göritz, A. %E Krosnick, J. %E Lavrakas, P. 
%B Online Panel Surveys: An Interdisciplinary Approach %I Wiley %G eng %R 10.1002/9781118763520.ch5 %0 Conference Paper %B American Association for Public Opinion Research 2014 Annual Conference %D 2014 %T The use of paradata (in time use surveys) to better evaluate data quality %A Cordova-Cazar, A.L. %A Belli, R.F. %B American Association for Public Opinion Research 2014 Annual Conference %C Anaheim, CA %G eng %U http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx %0 Report %D 2014 %T Using Social Media to Measure Labor Market Flows %A Antenucci, Dolan %A Cafarella, Michael J %A Levenstein, Margaret C. %A Ré, Christopher %A Shapiro, Matthew %G eng %U http://www-personal.umich.edu/~shapiro/papers/LaborFlowsSocialMedia.pdf %9 Mimeo %0 Conference Paper %B Proceedings of the Tenth Symposium on Usable Privacy and Security (SOUPS) %D 2014 %T Would a Privacy Fundamentalist Sell their DNA for \$1000... if Nothing Bad Happened Thereafter? A Study of the Westin Categories, Behavior Intentions, and Consequences %A Woodruff, A. %A Pihur, V. %A Acquisti, A. %A Consolvo, S. %A Schmidt, L. %A Brandimarte, L. %B Proceedings of the Tenth Symposium on Usable Privacy and Security (SOUPS) %I ACM %C New York, NY %G eng %U https://www.usenix.org/conference/soups2014/proceedings/presentation/woodruff %0 Generic %D 2013 %T Bayesian inference for the Spatial Random Effects Model %A Cressie, N. %B Department of Statistics, Macquarie University %I Macquarie University %8 July %G eng %0 Conference Paper %B International Workshop on Recent Advances in Statistical Inference: Theory and Case Studies %D 2013 %T Comparing and Selecting Predictors Using Local Criteria %A Cressie, N. 
%B International Workshop on Recent Advances in Statistical Inference: Theory and Case Studies %I International Workshop on Recent Advances in Statistical Inference: Theory and Case Studies %C Padua, Italy %8 March %G eng %0 Conference Paper %B American Association for Public Opinion Research 2013 Annual Conference %D 2013 %T Examining item nonresponse through paradata and respondent characteristics: A multilevel approach %A Cordova-Cazar, A.L. %B American Association for Public Opinion Research 2013 Annual Conference %C Boston, MA %G eng %U http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx %0 Journal Article %J Ohio State Law Journal %D 2013 %T From Facebook Regrets to Facebook Privacy Nudges %A Wang, Y. %A Leon, P. G. %A Chen, X. %A Komanduri, S. %A Norcie, G. %A Scott, K. %A Acquisti, A. %A Cranor, L. F. %A Sadeh, N. %B Ohio State Law Journal %G eng %0 Journal Article %J Statistics Views %D 2013 %T Hierarchical Spatio-Temporal Models and Survey Research %A Wikle, C. %A Holan, S. %A Cressie, N. %B Statistics Views %8 May %G eng %U http://www.statisticsviews.com/details/feature/4730991/Hierarchical-Spatio-Temporal-Models-and-Survey-Research.html %( Wiley %0 Journal Article %J Spatial Statistics %D 2013 %T Hierarchical Statistical Modeling of Big Spatial Datasets Using the Exponential Family of Distributions %A Sengupta, A. %A Cressie, N. %K EM algorithm %K Empirical Bayes %K Geostatistical process %K Maximum likelihood estimation %K MCMC %K SRE model %B Spatial Statistics %V 4 %P 14-44 %G eng %U http://www.sciencedirect.com/science/article/pii/S2211675313000055 %R 10.1016/j.spasta.2013.02.002 %0 Generic %D 2013 %T How can survey estimates of small areas be improved by leveraging social-media data? %A Cressie, N. %A Holan, S. %A Wikle, C. 
%B The Survey Statistician %8 July %G eng %U http://isi.cbs.nl/iass/N68.pdf %0 Conference Paper %B Proceedings of Learning from Authoritative Security Experiment Results (LASER) %D 2013 %T Is it the Typeset or the Type of Statistics? Disfluent Font and Self-Disclosure %A Balebako, R. %A Pe'er, E. %A Brandimarte, L. %A Cranor, L. F. %A Acquisti, A. %B Proceedings of Learning from Authoritative Security Experiment Results (LASER) %I USENIX Association %C New York, NY %G eng %U https://www.usenix.org/laser2013/program/balebako %0 Thesis %B Social Work %D 2013 %T Mental Disorders and Inequality in the United States: Intersection of race, gender, and disability on employment and income %A Camp, J. %B Social Work %I Wayne State University %V Ph.D. %G eng %0 Journal Article %J WebDB %D 2013 %T Ringtail: a generalized nowcasting system. %A Antenucci, Dolan %A Li, Erdong %A Liu, Shaobo %A Zhang, Bochun %A Cafarella, Michael J %A Ré, Christopher %X Social media nowcasting—using online user activity to describe real-world phenomena—is an active area of research to supplement more traditional and costly data collection methods such as phone surveys. Given the potential impact of such research, we would expect general-purpose nowcasting systems to quickly become a standard tool among non-computer scientists, yet it has largely remained a research topic. We believe a major obstacle to widespread adoption is the nowcasting feature selection problem. Typical nowcasting systems require the user to choose a handful of social media objects from a pool of billions of potential candidates, which can be a time-consuming and error-prone process. We have built Ringtail, a nowcasting system that helps the user by automatically suggesting high-quality signals. We demonstrate that Ringtail can make nowcasting easier by suggesting relevant features for a range of topics. 
The user provides just a short topic query (e.g., unemployment) and a small conventional dataset in order for Ringtail to quickly return a usable predictive nowcasting model. %B WebDB %V 6 %P 1358-1361 %G eng %U http://cs.stanford.edu/people/chrismre/papers/Ringtail-VLDB-demo.pdf %& 1358 %0 Journal Article %J WebDB %D 2013 %T Ringtail: Feature Selection for Easier Nowcasting. %A Antenucci, Dolan %A Cafarella, Michael J %A Levenstein, Margaret C. %A Ré, Christopher %A Shapiro, Matthew %X In recent years, social media “nowcasting”—the use of online user activity to predict various ongoing real-world social phenomena—has become a popular research topic; yet, this popularity has not led to widespread actual practice. We believe a major obstacle to widespread adoption is the feature selection problem. Typical nowcasting systems require the user to choose a set of relevant social media objects, which is difficult, time-consuming, and can imply a statistical background that users may not have. We propose Ringtail, which helps the user choose relevant social media signals. It takes a single user input string (e.g., unemployment) and yields a number of relevant signals the user can use to build a nowcasting model. We evaluate Ringtail on six different topics using a corpus of almost 6 billion tweets, showing that features chosen by Ringtail in a wholly-automated way are better or as good as those from a human and substantially better if Ringtail receives some human assistance. In all cases, Ringtail reduces the burden on the user. %B WebDB %P 49-54 %G eng %U http://www.cs.stanford.edu/people/chrismre/papers/webdb_ringtail.pdf %& 49 %0 Generic %D 2013 %T Some Historical Remarks on Spatial Statistics, Spatio-Temporal Statistics %A Cressie, N. %B Reading Group, University of Missouri %8 April %G eng %0 Generic %D 2013 %T Statistics for Spatio-Temporal Data %A Cressie, N. %B Invited One-Day Short Course at the U.S. 
Census Bureau %8 April %G eng %0 Generic %D 2012 %T Confidentiality and Privacy Protection in a Non-US Census Context %A Anne-Sophie Charest %I Carnegie Mellon University %8 April %G eng %0 Thesis %D 2012 %T Creation and Analysis of Differentially-Private Synthetic Datasets %A Anne-Sophie Charest %I Carnegie Mellon University %G eng %9 phd %0 Conference Paper %B Proceedings of the Survey Research Section of the SSC %D 2012 %T Differential Privacy for Synthetic Datasets %A Anne-Sophie Charest %B Proceedings of the Survey Research Section of the SSC %C Guelph, Ontario %G eng %0 Conference Paper %B Privacy in Statistical Databases %D 2012 %T Empirical Evaluation of Statistical Inference from Differentially-Private Contingency Tables %A Anne-Sophie Charest %E Josep Domingo-Ferrer %E Ilenia Tinnirello %B Privacy in Statistical Databases %I Springer %V 7556 %P 257-272 %@ 978-3-642-33627-0 %G eng %R 10.1007/978-3-642-33627-0_20 %0 Generic %D 2012 %T Hierarchical Statistical Modeling of Big Spatial Datasets Using the Exponential Family of Distributions %A Sengupta, A. %A Cressie, N. %I The Ohio State University %G eng %0 Generic %D 2012 %T Inference for Count Data using the Spatial Random Effects Model %A Cressie, N. %8 May %G eng %0 Conference Paper %B Midwest Association for Public Opinion Research 2012 Annual Conference %D 2012 %T Interviewer variance of interviewer and respondent behaviors: A new frontier in analyzing the interviewer-respondent interaction %A Charoenruk, N. %A Parkhurst, B. %A Ay, M. %A Belli, R. F. %B Midwest Association for Public Opinion Research 2012 Annual Conference %C Chicago, IL %8 November %G eng %U http://www.mapor.org/conferences.html %0 Journal Article %J Annals of Applied Statistics %D 0 %T Biomass prediction using density dependent diameter distribution models %A Schliep, E.M. %A A.E. Gelfand %A J.S. Clark %A B.J. 
Tomasek %X Prediction of aboveground biomass, particularly at large spatial scales, is necessary for estimating global-scale carbon sequestration. Since biomass can be measured only by sacrificing trees, total biomass on plots is never observed. Rather, allometric equations are used to convert individual tree diameter to individual biomass, perhaps with noise. The values for all trees on a plot are then summed to obtain a derived total biomass for the plot. Then, with derived total biomasses for a collection of plots, regression models, using appropriate environmental covariates, are employed to attempt explanation and prediction. Not surprisingly, when out-of-sample validation is examined, such a model will predict total biomass well for holdout data because it is obtained using exactly the same derived approach. Apart from the somewhat circular nature of the regression approach, it also fails to employ the actual observed plot level response data. At each plot, we observe a random number of trees, each with an associated diameter, producing a sample of diameters. A model based on this random number of tree diameters provides understanding of how environmental regressors explain abundance of individuals, which in turn explains individual diameters. We incorporate density dependence because the distribution of tree diameters over a plot of fixed size depends upon the number of trees on the plot. After fitting this model, we can obtain predictive distributions for individual-level biomass and plot-level total biomass. We show that predictive distributions for plot-level biomass obtained from a density-dependent model for diameters will be much different from predictive distributions using the regression approach. Moreover, they can be more informative for capturing uncertainty than those obtained from modeling derived plot-level biomass directly. 
We develop a density-dependent diameter distribution model and illustrate with data from the national Forest Inventory and Analysis (FIA) database. We also describe how to scale predictions to larger spatial regions. Our predictions agree (in magnitude) with available wisdom on mean and variation in biomass at the hectare scale. %B Annals of Applied Statistics %V 11 %P 340-361 %G eng %U https://projecteuclid.org/euclid.aoas/1491616884 %N 1 %0 Book Section %B Handbook of research methods in health and social sciences %D 0 %T Calendar and time diary methods: The tools to assess well-being in the 21st century %A Córdova Cazar, Ana Lucía %A Belli, Robert F. %E Liamputtong, P %B Handbook of research methods in health and social sciences %I Springer %G eng %0 Generic %D 0 %T Evaluating Data quality in Time Diary Surveys Using Paradata %A Córdova Cazar, Ana Lucía %A Belli, Robert F. %G eng %0 Generic %D 0 %T An evaluation study of the use of paradata to enhance data quality in the American Time Use Survey (ATUS) %A Córdova Cazar, Ana Lucía %A Belli, Robert F. %G eng %0 Generic %D 0 %T Interviewer Influence on Interviewer-Respondent Interaction During Battery Questions %A Cochran, Beth %A Olson, Kristen %A Smyth, Jolene %G eng %0 Generic %D 0 %T Memory Gaps in the American Time Use Survey. Are Respondents Forgetful or is There More to it? %A Kirchner, Antje %A Belli, Robert F. %A Deal, Caitlin E. %A Córdova-Cazar, Ana Lucia %G eng %0 Generic %D 0 %T Working with the SIPP-EHC audit trails: Parallel and sequential retrieval %A Lee, Jinyoung %A Seloske, Ben %A Córdova Cazar, Ana Lucía %A Eck, Adam %A Belli, Robert F. %G eng