TY - JOUR T1 - A framework for sharing confidential research data, applied to investigating differential pay by race in the U. S. government Y1 - Submitted A1 - Barrientos, A. F. A1 - Bolton, A. A1 - Balmat, T. A1 - Reiter, J. P. A1 - Machanavajjhala, A. A1 - Chen, Y. A1 - Kneifel, C. A1 - DeLong, M. A1 - de Figueiredo, J. M. AB - Data stewards seeking to provide access to large-scale social science data face a difficult challenge. They have to share data in ways that protect privacy and confidentiality, are informative for many analyses and purposes, and are relatively straightforward to use by data analysts. We present a framework for addressing this challenge. The framework uses an integrated system that includes fully synthetic data intended for wide access, coupled with means for approved users to access the confidential data via secure remote access solutions, glued together by verification servers that allow users to assess the quality of their analyses with the synthetic data. We apply this framework to data on the careers of employees of the U. S. federal government, studying differentials in pay by race. The integrated system performs as intended, allowing users to explore the synthetic data for potential pay differentials and learn through verifications which findings in the synthetic data hold up in the confidential data and which do not. We find differentials across races; for example, the gap between black and white female federal employees' pay increased over the time period. We present models for generating synthetic careers and differentially private algorithms for verification of regression results. ER - TY - CONF T1 - Differentially private regression diagnostics T2 - IEEE International Conference on Data Mining Y1 - 2017 A1 - Chen, Y. A1 - Machanavajjhala, A. A1 - Reiter, J. P. A1 - Barrientos, A. AB - Many data producers seek to provide users access to confidential data without unduly compromising data subjects' privacy and confidentiality. 
When intense redaction is needed to do so, one general strategy is to require users to do analyses without seeing the confidential data, for example, by releasing fully synthetic data or by allowing users to query remote systems for disclosure-protected outputs of statistical models. With fully synthetic data or redacted outputs, the analyst never really knows how much to trust the resulting findings. In particular, if the user did the same analysis on the confidential data, would regression coefficients of interest be statistically significant or not? We present algorithms for assessing this question that satisfy differential privacy. We describe conditions under which the algorithms should give accurate answers about statistical significance. We illustrate the properties of the methods using artificial and genuine data. JF - IEEE International Conference on Data Mining ER - TY - RPRT T1 - Effects of a Government-Academic Partnership: Has the NSF-Census Bureau Research Network Helped Secure the Future of the Federal Statistical System? Y1 - 2017 A1 - Weinberg, Daniel A1 - Abowd, John M. A1 - Belli, Robert F. A1 - Cressie, Noel A1 - Folch, David C. A1 - Holan, Scott H. A1 - Levenstein, Margaret C. A1 - Olson, Kristen M. A1 - Reiter, Jerome P. A1 - Shapiro, Matthew D. A1 - Smyth, Jolene A1 - Soh, Leen-Kiat A1 - Spencer, Bruce A1 - Spielman, Seth E. A1 - Vilhuber, Lars A1 - Wikle, Christopher AB -

Effects of a Government-Academic Partnership: Has the NSF-Census Bureau Research Network Helped Secure the Future of the Federal Statistical System? Weinberg, Daniel; Abowd, John M.; Belli, Robert F.; Cressie, Noel; Folch, David C.; Holan, Scott H.; Levenstein, Margaret C.; Olson, Kristen M.; Reiter, Jerome P.; Shapiro, Matthew D.; Smyth, Jolene; Soh, Leen-Kiat; Spencer, Bruce; Spielman, Seth E.; Vilhuber, Lars; Wikle, Christopher The National Science Foundation-Census Bureau Research Network (NCRN) was established in 2011 to create interdisciplinary research nodes on methodological questions of interest and significance to the broader research community and to the Federal Statistical System (FSS), particularly the Census Bureau. The activities to date have covered both fundamental and applied statistical research and have focused at least in part on the training of current and future generations of researchers in skills of relevance to surveys and alternative measurement of economic units, households, and persons. This paper discusses some of the key research findings of the eight nodes, organized into six topics: (1) Improving census and survey data collection methods; (2) Using alternative sources of data; (3) Protecting privacy and confidentiality by improving disclosure avoidance; (4) Using spatial and spatio-temporal statistical modeling to improve estimates; (5) Assessing data cost and quality tradeoffs; and (6) Combining information from multiple sources. It also reports on collaborations across nodes and with federal agencies, new software developed, and educational activities and outcomes. The paper concludes with an evaluation of the ability of the FSS to apply the NCRN’s research outcomes and suggests some next steps, as well as the implications of this research-network model for future federal government renewal initiatives. 
This paper began as a May 8, 2015 presentation to the National Academies of Science’s Committee on National Statistics by two of the principal investigators of the National Science Foundation-Census Bureau Research Network (NCRN) – John Abowd and the late Steve Fienberg (Carnegie Mellon University). The authors acknowledge the contributions of the other principal investigators of the NCRN who are not co-authors of the paper (William Block, William Eddy, Alan Karr, Charles Manski, Nicholas Nagle, and Rebecca Nugent), the co-principal investigators, and the comments of Patrick Cantwell, Constance Citro, Adam Eck, Brian Harris-Kojetin, and Eloise Parker. We note with sorrow the deaths of Stephen Fienberg and Allan McCutcheon, two of the original NCRN principal investigators. The principal investigators also wish to acknowledge Cheryl Eavey’s sterling grant administration on behalf of the NSF. The conclusions reached in this paper are not the responsibility of the National Science Foundation (NSF), the Census Bureau, or any of the institutions to which the authors belong.

PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/52650 ER - TY - RPRT T1 - Unique Entity Estimation with Application to the Syrian Conflict Y1 - 2017 A1 - Chen, B. A1 - Shrivastava, A. A1 - Steorts, R. C. KW - Computer Science - Data Structures and Algorithms KW - Computer Science - Databases KW - Statistics - Applications AB - Entity resolution identifies and removes duplicate entities in large, noisy databases and has grown in both usage and new developments as a result of increased data availability. Nevertheless, entity resolution has tradeoffs regarding assumptions of the data generation process, error rates, and computational scalability that make it a difficult task for real applications. In this paper, we focus on a related problem of unique entity estimation, which is the task of estimating the unique number of entities and associated standard errors in a data set with duplicate entities. Unique entity estimation shares many fundamental challenges of entity resolution, namely, that the computational cost of all-to-all entity comparisons is intractable for large databases. To circumvent this computational barrier, we propose an efficient (near-linear time) estimation algorithm based on locality sensitive hashing. Our estimator, under realistic assumptions, is unbiased and has provably low variance compared to existing random sampling based approaches. In addition, we empirically show its superiority over the state-of-the-art estimators on three real applications. The motivation for our work is to derive an accurate estimate of the documented, identifiable deaths in the ongoing Syrian conflict. Our methodology, when applied to the Syrian data set, provides an estimate of $191,874 \pm 1772$ documented, identifiable deaths, which is very close to the Human Rights Data Analysis Group (HRDAG) estimate of 191,369. 
Our work provides an example of challenges and efforts involved in solving a real, noisy, challenging problem where modeling assumptions may not hold. JF - arXiv UR - https://arxiv.org/abs/1710.02690 ER - TY - RPRT T1 - 2017 Economic Census: Towards Synthetic Data Sets Y1 - 2016 A1 - Caldwell, Carol A1 - Thompson, Katherine Jenny AB - 2017 Economic Census: Towards Synthetic Data Sets Caldwell, Carol; Thompson, Katherine Jenny PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/52165 ER - TY - ABST T1 - Data management and analytic use of paradata: SIPP-EHC audit trails Y1 - 2016 A1 - Lee, Jinyoung A1 - Seloske, Ben A1 - Córdova Cazar, Ana Lucía A1 - Eck, Adam A1 - Kirchner, Antje A1 - Belli, Robert F. ER - TY - RPRT T1 - NCRN Meeting Spring 2016: Attitudes Towards Geolocation-Enabled Census Forms Y1 - 2016 A1 - Brandimarte, Laura A1 - Chiew, Ernest A1 - Ventura, Sam A1 - Acquisti, Alessandro AB - NCRN Meeting Spring 2016: Attitudes Towards Geolocation-Enabled Census Forms Brandimarte, Laura; Chiew, Ernest; Ventura, Sam; Acquisti, Alessandro Geolocation refers to the automatic identification of the physical locations of Internet users. In an online survey experiment, we studied respondent reactions towards different types of geolocation. After coordinating with US Census Bureau researchers, we designed and administered a replica of a census form to a sample of respondents. We also created slightly different forms by manipulating the type of geolocation implemented. Using the IP address of each respondent, we approximated the geographical coordinates of the respondent and displayed this location on a map on the survey. Across different experimental conditions, we manipulated the map interface between the three interfaces on the Google Maps API: default road map, Satellite View, and Street View. We also provided either a specific, pinpointed location, or a set of two circles of 1- and 2-mile radius. 
Snapshots of responses were captured at every instant information was added, altered, or deleted by respondents when completing the survey. We measured willingness to provide information on the typical Census form, as well as privacy concerns associated with geolocation technologies and attitudes towards the use of online geographical maps to identify one’s exact current location. Presented at the NCRN Meeting Spring 2016 in Washington DC on May 9-10, 2016; see http://www.ncrn.info/event/ncrn-spring-2016-meeting PB - Carnegie-Mellon University UR - http://hdl.handle.net/1813/43889 ER - TY - RPRT T1 - NCRN Meeting Spring 2016: Evaluating Data quality in Time Diary Surveys Using Paradata Y1 - 2016 A1 - Córdova Cazar, Ana Lucía A1 - Belli, Robert AB - NCRN Meeting Spring 2016: Evaluating Data quality in Time Diary Surveys Using Paradata Córdova Cazar, Ana Lucía; Belli, Robert Over the past decades, time use researchers have been increasingly interested in analyzing wellbeing in tandem with the use of time (Juster and Stafford, 1985; Krueger et al, 2009). Many methodological issues have arisen in this endeavor, including the concern about the quality of the time use data. Survey researchers have increasingly turned to the analysis of paradata to better understand and model data quality. In particular, it has been argued that paradata may serve as a proxy of the respondents’ cognitive response process, and can be used as an additional tool to assess the impact of data generation on data quality. In this presentation, data quality in the American Time Use Survey (ATUS) will be assessed through the use of paradata and survey responses. Specifically, I will talk about a data quality index I have created, which includes measures of different types of ATUS errors (e.g. low number of reported activities, failures to report an activity), and paradata variables (e.g. response latencies, incompletes). 
The overall objective of this study is to contribute to data quality assessment in the collection of timeline data from national surveys by providing insights on those interviewing dynamics that most impact data quality. These insights will help to improve future instruments and training of interviewers, as well as to reduce costs. Presented at the NCRN Meeting Spring 2016 in Washington DC on May 9-10, 2016; see http://www.ncrn.info/event/ncrn-spring-2016-meeting PB - University of Nebraska UR - http://hdl.handle.net/1813/43896 ER - TY - RPRT T1 - NCRN Meeting Spring 2017: 2017 Economic Census: Towards Synthetic Data Sets Y1 - 2016 A1 - Caldwell, Carol A1 - Thompson, Katherine Jenny AB - NCRN Meeting Spring 2017: 2017 Economic Census: Towards Synthetic Data Sets Caldwell, Carol; Thompson, Katherine Jenny PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/52165 ER - TY - RPRT T1 - NCRN Meeting Spring 2017: Practical Issues in Anonymity Y1 - 2016 A1 - Clifton, Chris A1 - Merill, Shawn A1 - Merill, Keith AB - NCRN Meeting Spring 2017: Practical Issues in Anonymity Clifton, Chris; Merill, Shawn; Merill, Keith PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/52166 ER - TY - RPRT T1 - Practical Issues in Anonymity Y1 - 2016 A1 - Clifton, Chris A1 - Merill, Shawn A1 - Merill, Keith AB - Practical Issues in Anonymity Clifton, Chris; Merill, Shawn; Merill, Keith PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/52166 ER - TY - JOUR T1 - Accounting for nonignorable unit nonresponse and attrition in panel studies with refreshment samples JF - Journal of Survey Statistics and Methodology Y1 - 2015 A1 - Schifeling, T. A1 - Cheng, C. A1 - Hillygus, D. S. A1 - Reiter, J. P. AB - Panel surveys typically suffer from attrition, which can lead to biased inference when basing analysis only on cases that complete all waves of the panel. 
Unfortunately, panel data alone cannot inform the extent of the bias from the attrition, so that analysts using the panel data alone must make strong and untestable assumptions about the missing data mechanism. Many panel studies also include refreshment samples, which are data collected from a random sample of new individuals during some later wave of the panel. Refreshment samples offer information that can be utilized to correct for biases induced by nonignorable attrition while reducing reliance on strong assumptions about the attrition process. To date, these bias correction methods have not dealt with two key practical issues in panel studies: unit nonresponse in the initial wave of the panel and in the refreshment sample itself. As we illustrate, nonignorable unit nonresponse can significantly compromise the analyst’s ability to use the refreshment samples for attrition bias correction. Thus, it is crucial for analysts to assess how sensitive their inferences—corrected for panel attrition—are to different assumptions about the nature of the unit nonresponse. We present an approach that facilitates such sensitivity analyses, both for suspected nonignorable unit nonresponse in the initial wave and in the refreshment sample. We illustrate the approach using simulation studies and an analysis of data from the 2007-2008 Associated Press/Yahoo News election panel study. VL - 3 UR - http://jssam.oxfordjournals.org/content/3/3/265.abstract IS - 3 ER - TY - JOUR T1 - Bayesian Hierarchical Statistical SIRS Models JF - Statistical Methods and Applications Y1 - 2015 A1 - Zhuang, L. A1 - Cressie, N. VL - 23 ER - TY - JOUR T1 - Capturing multivariate spatial dependence: Model, estimate, and then predict JF - Statistical Science Y1 - 2015 A1 - Cressie, N. A1 - Burden, S. A1 - Davis, W. A1 - Krivitsky, P. A1 - Mokhtarian, P. A1 - Seusse, T. A1 - Zammit-Mangion, A. 
VL - 30 UR - http://projecteuclid.org/euclid.ss/1433341474 IS - 2 ER - TY - CONF T1 - Changing ‘Who’ or ‘Where’: Implications for Data Quality in the American Time Use Survey T2 - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) Y1 - 2015 A1 - Deal, C.E. A1 - Kirchner, A. A1 - Cordova-Cazar, A.L. A1 - Ellyne, L. A1 - Belli, R.F. JF - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) CY - Hollywood, Florida UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - JOUR T1 - Comment on Article by Ferreira and Gamerman JF - Bayesian Analysis Y1 - 2015 A1 - Cressie, N. A1 - Chambers, R. L. VL - 10 UR - http://projecteuclid.org/euclid.ba/1429880217 IS - 3 ER - TY - JOUR T1 - Comment: Spatial sampling designs depend as much on “how much?” and “why?” as on “where?” JF - Bayesian Analysis Y1 - 2015 A1 - Cressie, N. A1 - Chambers, R. L. AB - A comment on “Optimal design in geostatistics under preferential sampling” by G. da Silva Ferreira and D. Gamerman ER - TY - JOUR T1 - Comparing and selecting spatial predictors using local criteria JF - Test Y1 - 2015 A1 - Bradley, J.R. A1 - Cressie, N. A1 - Shi, T. VL - 24 UR - http://dx.doi.org/10.1007/s11749-014-0415-1 IS - 1 ER - TY - CHAP T1 - Evaluation of diagnostics for hierarchical spatial statistical models T2 - Geometry Driven Statistics Y1 - 2015 A1 - Cressie, N. A1 - Burden, S. ED - I.L. Dryden ED - J.T. Kent JF - Geometry Driven Statistics PB - Wiley CY - Chichester SN - 978-1118866573 UR - http://niasra.uow.edu.au/content/groups/public/@web/@inf/@math/documents/doc/uow169240.pdf ER - TY - JOUR T1 - Figures of merit for simultaneous inference and comparisons in simulation experiments JF - Stat Y1 - 2015 A1 - Cressie, N. A1 - Burden, S. VL - 4 UR - http://onlinelibrary.wiley.com/doi/10.1002/sta4.88/epdf IS - 1 ER - TY - JOUR T1 - Hot enough for you? 
A spatial exploratory and inferential analysis of North American climate-change projections JF - Mathematical Geosciences Y1 - 2015 A1 - Cressie, N. A1 - Kang, E.L. UR - http://dx.doi.org/10.1007/s11004-015-9607-9 ER - TY - JOUR T1 - Multiple imputation for harmonizing longitudinal non-commensurate measures in individual participant data meta-analysis JF - Statistics in Medicine Y1 - 2015 A1 - Siddique, J. A1 - Reiter, J. P. A1 - Brincks, A. A1 - Gibbons, R. A1 - Crespi, C. A1 - Brown, C. H. UR - http://onlinelibrary.wiley.com/doi/10.1002/sim.6562/abstract ER - TY - JOUR T1 - Multivariate Spatial Covariance Models: A Conditional Approach Y1 - 2015 A1 - Cressie, N. A1 - Zammit-Mangion, A. AB - Multivariate geostatistics is based on modelling all covariances between all possible combinations of two or more variables at any sets of locations in a continuously indexed domain. Multivariate spatial covariance models need to be built with care, since any covariance matrix that is derived from such a model must be nonnegative-definite. In this article, we develop a conditional approach for spatial-model construction whose validity conditions are easy to check. We start with bivariate spatial covariance models and go on to demonstrate the approach's connection to multivariate models defined by networks of spatial variables. In some circumstances, such as modelling respiratory illness conditional on air pollution, the direction of conditional dependence is clear. When it is not, the two directional models can be compared. More generally, the graph structure of the network reduces the number of possible models to compare. Model selection then amounts to finding possible causative links in the network. We demonstrate our conditional approach on surface temperature and pressure data, where the role of the two variables is seen to be asymmetric. 
UR - https://arxiv.org/abs/1504.01865 ER - TY - RPRT T1 - NCRN Meeting Spring 2015: Training Undergraduates, Graduate Students, Postdocs, and Federal Agencies: Methodology, Data, and Science for Federal Statistics Y1 - 2015 A1 - Cressie, Noel A1 - Holan, Scott H. A1 - Wikle, Christopher K. AB - NCRN Meeting Spring 2015: Training Undergraduates, Graduate Students, Postdocs, and Federal Agencies: Methodology, Data, and Science for Federal Statistics Cressie, Noel; Holan, Scott H.; Wikle, Christopher K. Presentation at the NCRN Spring 2015 Meeting PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/40179 ER - TY - JOUR T1 - A nonparametric, multiple imputation-based method for the retrospective integration of data sets JF - Multivariate Behavioral Research Y1 - 2015 A1 - M.M. Carrig A1 - D. Manrique-Vallier A1 - K. Ranby A1 - J.P. Reiter A1 - R. Hoyle VL - 50 UR - http://www.tandfonline.com/doi/full/10.1080/00273171.2015.1022641 IS - 4 ER - TY - JOUR T1 - Preventive policy strategy for banking the unbanked: Savings accounts for teenagers? JF - Journal of Poverty Y1 - 2015 A1 - Friedline, T. A1 - Despard, M. A1 - Chowa, G. KW - financial assets KW - savings KW - Survey of Income and Program Participation (SIPP) KW - teenagers KW - unbanked KW - young adults AB - Concern over percentages of unbanked and underbanked households in the United States and their lack of connectedness to the financial mainstream has led to policy strategies geared toward reaching these households. Using nationally-representative longitudinal data, a preventive strategy for banking households is tested that asks whether young adults are more likely to be banked and own a diversity of financial assets when they are connected to the financial mainstream as teenagers. Young adults are more likely to own checking accounts, savings accounts, certificates of deposit, and stocks when they had savings accounts as teenagers. Policy implications are discussed. 
VL - 20 UR - http://www.tandfonline.com/doi/full/10.1080/10875549.2015.1015068 IS - 1 ER - TY - JOUR T1 - Rejoinder on: Comparing and selecting spatial predictors using local criteria JF - Test Y1 - 2015 A1 - Bradley, J.R. A1 - Cressie, N. A1 - Shi, T. VL - 24 UR - http://dx.doi.org/10.1007/s11749-014-0414-2 IS - 1 ER - TY - JOUR T1 - The SAR model for very large datasets: A reduced-rank approach JF - Econometrics Y1 - 2015 A1 - Burden, S. A1 - Cressie, N. A1 - Steel, D.G. VL - 3 UR - http://www.mdpi.com/2225-1146/3/2/317 IS - 2 ER - TY - JOUR T1 - Simultaneous Edit-Imputation for Continuous Microdata JF - Journal of the American Statistical Association Y1 - 2015 A1 - Kim, H. J. A1 - Cox, L. H. A1 - Karr, A. F. A1 - Reiter, J. P. A1 - Wang, Q. VL - 110 UR - http://www.tandfonline.com/doi/abs/10.1080/01621459.2015.1040881 ER - TY - CONF T1 - The Use of Paradata to Evaluate Interview Complexity and Data Quality (in Calendar and Time Diary Surveys) T2 - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) Y1 - 2015 A1 - Cordova-Cazar, A.L. A1 - Belli, R.F. JF - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) CY - Hollywood, Florida UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - CONF T1 - Call back later: The association of recruitment contact and error in the American Time Use Survey T2 - American Association for Public Opinion Research 2014 Annual Conference Y1 - 2014 A1 - Countryman, A. A1 - Cordova-Cazar, A.L. A1 - Deal, C.E. A1 - Belli, R.F. JF - American Association for Public Opinion Research 2014 Annual Conference CY - Anaheim, CA UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - JOUR T1 - A Comparison of Spatial Predictors when Datasets Could be Very Large JF - ArXiv Y1 - 2014 A1 - Bradley, J. R. A1 - Cressie, N. A1 - Shi, T. KW - Statistics - Methodology AB -

In this article, we review and compare a number of methods of spatial prediction. To demonstrate the breadth of available choices, we consider both traditional and more-recently-introduced spatial predictors. Specifically, in our exposition we review: traditional stationary kriging, smoothing splines, negative-exponential distance-weighting, Fixed Rank Kriging, modified predictive processes, a stochastic partial differential equation approach, and lattice kriging. This comparison is meant to provide a service to practitioners wishing to decide between spatial predictors. Hence, we provide technical material for the unfamiliar, which includes the definition and motivation for each (deterministic and stochastic) spatial predictor. We use a benchmark dataset of CO2 data from NASA's AIRS instrument to address computational efficiencies that include CPU time and memory usage. Furthermore, the predictive performance of each spatial predictor is assessed empirically using a hold-out subset of the AIRS data.

UR - http://arxiv.org/abs/1410.7748 IS - 1410.7748 ER - TY - CHAP T1 - Enabling statistical analysis of suppressed tabular data, in Privacy in Statistical Databases T2 - Lecture Notes in Computer Science Y1 - 2014 A1 - L. Cox JF - Lecture Notes in Computer Science PB - Springer CY - Heidelberg VL - 8744 ER - TY - CONF T1 - Hours or Minutes: Does One Unit Fit All? T2 - Midwest Association for Public Opinion Research Annual Conference Y1 - 2014 A1 - Cochran, B. A1 - Smyth, J.D. JF - Midwest Association for Public Opinion Research Annual Conference CY - Chicago, IL UR - http://www.mapor.org/conferences.html ER - TY - JOUR T1 - Imputation of confidential data sets with spatial locations using disease mapping models JF - Statistics in Medicine Y1 - 2014 A1 - T. Paiva A1 - A. Chakraborty A1 - J.P. Reiter A1 - A.E. Gelfand VL - 33 ER - TY - CONF T1 - Interviewer variance and prevalence of verbal behaviors in calendar and conventional interviewing T2 - American Association for Public Opinion Research 2014 Annual Conference Y1 - 2014 A1 - Belli, R.F. A1 - Charoenruk, N., JF - American Association for Public Opinion Research 2014 Annual Conference CY - Anaheim, CA UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - CONF T1 - Interviewer variance of interviewer and respondent behaviors: A comparison between calendar and conventional interviewing T2 - XVIII International Sociological Association World Congress of Sociology Y1 - 2014 A1 - Belli, R.F. A1 - Charoenruk, N., JF - XVIII International Sociological Association World Congress of Sociology CY - Yokohama, Japan UR - https://isaconf.confex.com/isaconf/wc2014/webprogram/Paper34278.html ER - TY - JOUR T1 - Multiple imputation of missing or faulty values under linear constraints JF - Journal of Business and Economic Statistics Y1 - 2014 A1 - Kim, H. J. A1 - Reiter, J. P. A1 - Wang, Q. A1 - Cox, L. H. A1 - Karr, A. F. AB -

Many statistical agencies, survey organizations, and research centers collect data that suffer from item nonresponse and erroneous or inconsistent values. These data may be required to satisfy linear constraints, for example, bounds on individual variables and inequalities for ratios or sums of variables. Often these constraints are designed to identify faulty values, which then are blanked and imputed. The data also may exhibit complex distributional features, including nonlinear relationships and highly nonnormal distributions. We present a fully Bayesian, joint model for modeling or imputing data with missing/blanked values under linear constraints that (i) automatically incorporates the constraints in inferences and imputations, and (ii) uses a flexible Dirichlet process mixture of multivariate normal distributions to reflect complex distributional features. Our strategy for estimation is to augment the observed data with draws from a hypothetical population in which the constraints are not present, thereby taking advantage of computationally expedient methods for fitting mixture models. Missing/blanked items are sampled from their posterior distribution using the Hit-and-Run sampler, which guarantees that all imputations satisfy the constraints. We illustrate the approach using manufacturing data from Colombia, examining the potential to preserve joint distributions and a regression from the plant productivity literature. Supplementary materials for this article are available online.

VL - 32 ER - TY - CONF T1 - Remembering where: A look at the American Time Use Survey T2 - Paper presented at the annual conference of the Midwest Association for Public Opinion Research Y1 - 2014 A1 - Deal, C. A1 - Cordova-Cazar, A.L. A1 - Countryman, A. A1 - Kirchner, A. A1 - Belli, R.F. JF - Paper presented at the annual conference of the Midwest Association for Public Opinion Research CY - Chicago, IL UR - http://www.mapor.org/conferences.html ER - TY - CHAP T1 - The Rise of Incarceration Among the Poor with Mental Illnesses: How Neoliberal Policies Contribute T2 - The Routledge Handbook of Poverty in the United States Y1 - 2014 A1 - Camp, J. A1 - Haymes, S. A1 - Haymes, M. V. d. A1 - Miller, R.J. JF - The Routledge Handbook of Poverty in the United States PB - Routledge ER - TY - JOUR T1 - Spatial Fay-Herriot Models for Small Area Estimation with Functional Covariates JF - Spatial Statistics Y1 - 2014 A1 - Porter, A. T., A1 - Holan, S.H., A1 - Wikle, C.K., A1 - Cressie, N. VL - 10 UR - http://arxiv.org/pdf/1303.6668v3.pdf ER - TY - JOUR T1 - Top-Coding and Public Use Microdata Samples from the U.S. Census Bureau JF - Journal of Privacy and Confidentiality Y1 - 2014 A1 - Crimi, N. A1 - Eddy, W. C. VL - 6 UR - http://repository.cmu.edu/jpc/vol6/iss2/2/ ER - TY - CHAP T1 - The Untold Story of Multi-Mode (Online and Mail) Consumer Panels: From Optimal Recruitment to Retention and Attrition T2 - Online Panel Surveys: An Interdisciplinary Approach Y1 - 2014 A1 - McCutcheon, Allan L. A1 - Rao, K., A1 - Kaminska, O. ED - Callegaro, M. ED - Baker, R. ED - Bethlehem, J. ED - Göritz, A. ED - Krosnick, J. ED - Lavrakas, P. JF - Online Panel Surveys: An Interdisciplinary Approach PB - Wiley ER - TY - CONF T1 - The use of paradata (in time use surveys) to better evaluate data quality T2 - American Association for Public Opinion Research 2014 Annual Conference Y1 - 2014 A1 - Cordova-Cazar, A.L. A1 - Belli, R.F. 
JF - American Association for Public Opinion Research 2014 Annual Conference CY - Anaheim, CA UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - RPRT T1 - Using Social Media to Measure Labor Market Flows Y1 - 2014 A1 - Antenucci, Dolan A1 - Cafarella, Michael J A1 - Levenstein, Margaret C. A1 - Ré, Christopher A1 - Shapiro, Matthew UR - http://www-personal.umich.edu/~shapiro/papers/LaborFlowsSocialMedia.pdf ER - TY - CONF T1 - Would a Privacy Fundamentalist Sell their DNA for \$1000... if Nothing Bad Happened Thereafter? A Study of the Western Categories, Behavior Intentions, and Consequences T2 - Proceedings of the Tenth Symposium on Usable Privacy and Security (SOUPS) Y1 - 2014 A1 - Woodruff, A. A1 - Pihur, V. A1 - Acquisti, A. A1 - Consolvo, S. A1 - Schmidt, L. A1 - Brandimarte, L. JF - Proceedings of the Tenth Symposium on Usable Privacy and Security (SOUPS) PB - ACM CY - New York, NY UR - https://www.usenix.org/conference/soups2014/proceedings/presentation/woodruff N1 - IAPP SOUPS Privacy Award Winner ER - TY - ABST T1 - Bayesian inference for the Spatial Random Effects Model Y1 - 2013 A1 - Cressie, N. JF - Department of Statistics, Macquarie University PB - Macquarie University ER - TY - CONF T1 - Comparing and Selecting Predictors Predictors Using Local Criteria T2 - International Workshop on Recent Advances in Statistical Inference: Theory and Case Studies Y1 - 2013 A1 - Cressie, N. JF - International Workshop on Recent Advances in Statistical Inference: Theory and Case Studies PB - International Workshop on Recent Advances in Statistical Inference: Theory and Case Studies CY - Padua, Italy ER - TY - CONF T1 - Examining item nonresponse through paradata and respondent characteristics: A multilevel approach T2 - American Association for Public Opinion Research 2013 Annual Conference Y1 - 2013 A1 - Cordova-Cazar, A.L. 
JF - American Association for Public Opinion Research 2013 Annual Conference CY - Boston, MA UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - JOUR T1 - From Facebook Regrets to Facebook Privacy Nudges JF - Ohio State Law Journal Y1 - 2013 A1 - Wang, Y. A1 - Leon, P. G. A1 - Chen, X. A1 - Komanduri, S. A1 - Norcie, G. A1 - Scott, K. A1 - Acquisti, A. A1 - Cranor, L. F. A1 - Sadeh, N. N1 - Invited paper ER - TY - JOUR T1 - Hierarchical Spatio-Temporal Models and Survey Research JF - Statistics Views Y1 - 2013 A1 - Wikle, C. A1 - Holan, S. A1 - Cressie, N. UR - http://www.statisticsviews.com/details/feature/4730991/Hierarchical-Spatio-Temporal-Models-and-Survey-Research.html ER - TY - JOUR T1 - Hierarchical Statistical Modeling of Big Spatial Datasets Using the Exponential Family of Distributions JF - Spatial Statistics Y1 - 2013 A1 - Sengupta, A. A1 - Cressie, N. KW - EM algorithm KW - Empirical Bayes KW - Geostatistical process KW - Maximum likelihood estimation KW - MCMC KW - SRE model VL - 4 UR - http://www.sciencedirect.com/science/article/pii/S2211675313000055 ER - TY - ABST T1 - How can survey estimates of small areas be improved by leveraging social-media data? Y1 - 2013 A1 - Cressie, N. A1 - Holan, S. A1 - Wikle, C. JF - The Survey Statistician UR - http://isi.cbs.nl/iass/N68.pdf ER - TY - CONF T1 - Is it the Typeset or the Type of Statistics? Disfluent Font and Self-Disclosure T2 - Proceedings of Learning from Authoritative Security Experiment Results (LASER) Y1 - 2013 A1 - Balebako, R. A1 - Pe'er, E. A1 - Brandimarte, L. A1 - Cranor, L. F. A1 - Acquisti, A. JF - Proceedings of Learning from Authoritative Security Experiment Results (LASER) PB - USENIX Association CY - New York, NY UR - https://www.usenix.org/laser2013/program/balebako ER - TY - THES T1 - Mental Disorders and Inequality in the United States: Intersection of race, gender, and disability on employment and income T2 - Social Work Y1 - 2013 A1 - Camp, J. 
JF - Social Work PB - Wayne State University VL - Ph.D. ER - TY - JOUR T1 - Ringtail: a generalized nowcasting system. JF - WebDB Y1 - 2013 A1 - Antenucci, Dolan A1 - Li, Erdong A1 - Liu, Shaobo A1 - Zhang, Bochun A1 - Cafarella, Michael J A1 - Ré, Christopher AB - Social media nowcasting—using online user activity to describe real-world phenomena—is an active area of research to supplement more traditional and costly data collection methods such as phone surveys. Given the potential impact of such research, we would expect general-purpose nowcasting systems to quickly become a standard tool among non-computer scientists, yet it has largely remained a research topic. We believe a major obstacle to widespread adoption is the nowcasting feature selection problem. Typical nowcasting systems require the user to choose a handful of social media objects from a pool of billions of potential candidates, which can be a time-consuming and error-prone process. We have built Ringtail, a nowcasting system that helps the user by automatically suggesting high-quality signals. We demonstrate that Ringtail can make nowcasting easier by suggesting relevant features for a range of topics. The user provides just a short topic query (e.g., unemployment) and a small conventional dataset in order for Ringtail to quickly return a usable predictive nowcasting model. VL - 6 UR - http://cs.stanford.edu/people/chrismre/papers/Ringtail-VLDB-demo.pdf ER - TY - JOUR T1 - Ringtail: Feature Selection for Easier Nowcasting. JF - WebDB Y1 - 2013 A1 - Antenucci, Dolan A1 - Cafarella, Michael J A1 - Levenstein, Margaret C. A1 - Ré, Christopher A1 - Shapiro, Matthew AB - In recent years, social media “nowcasting”—the use of online user activity to predict various ongoing real-world social phenomena—has become a popular research topic; yet, this popularity has not led to widespread actual practice. We believe a major obstacle to widespread adoption is the feature selection problem. 
Typical nowcasting systems require the user to choose a set of relevant social media objects, which is difficult, time-consuming, and can imply a statistical background that users may not have. We propose Ringtail, which helps the user choose relevant social media signals. It takes a single user input string (e.g., unemployment) and yields a number of relevant signals the user can use to build a nowcasting model. We evaluate Ringtail on six different topics using a corpus of almost 6 billion tweets, showing that features chosen by Ringtail in a wholly-automated way are better or as good as those from a human and substantially better if Ringtail receives some human assistance. In all cases, Ringtail reduces the burden on the user. UR - http://www.cs.stanford.edu/people/chrismre/papers/webdb_ringtail.pdf ER - TY - ABST T1 - Some Historical Remarks on Spatial Statistics, Spatio-Temporal Statistics Y1 - 2013 A1 - Cressie, N. JF - Reading Group, University of Missouri ER - TY - ABST T1 - Statistics for Spatio-Temporal Data Y1 - 2013 A1 - Cressie, N. JF - Invited One-Day Short Course at the U.S. 
Census Bureau ER - TY - ABST T1 - Confidentiality and Privacy Protection in a Non-US Census Context Y1 - 2012 A1 - Anne-Sophie Charest PB - Carnegie Mellon University ER - TY - THES T1 - Creation and Analysis of Differentially-Private Synthesis Datasets Y1 - 2012 A1 - Anne-Sophie Charest PB - Carnegie Mellon University N1 - PhD Thesis, Department of Statistics ER - TY - CONF T1 - Differential Privacy for Synthetic Datasets T2 - Proceedings of the Survey Research Section of the SSC Y1 - 2012 A1 - Anne-Sophie Charest JF - Proceedings of the Survey Research Section of the SSC CY - Guelph, Ontario N1 - Invited session on Confidentiality of the Annual Meeting of the Statistical Society of Canada ER - TY - CONF T1 - Empirical Evaluation of Statistical Inference from Differentially-Private Contingency Tables T2 - Privacy in Statistical Databases Y1 - 2012 A1 - Anne-Sophie Charest ED - Josep Domingo-Ferrer ED - Ilenia Tinnirello JF - Privacy in Statistical Databases PB - Springer VL - 7556 SN - 978-3-642-33627-0 N1 - Print ISBN is 978-3-642-33626-3 ER - TY - ABST T1 - Hierarchical Statistical Modeling of Big Spatial Datasets Using the Exponential Family of Distributions Y1 - 2012 A1 - Sengupta, A. A1 - Cressie, N. PB - The Ohio State University ER - TY - ABST T1 - Inference for Count Data using the Spatial Random Effects Model Y1 - 2012 A1 - Cressie, N. ER - TY - CONF T1 - Interviewer variance of interviewer and respondent behaviors: A new frontier in analyzing the interviewer-respondent interaction T2 - Midwest Association for Public Opinion Research 2012 Annual Conference Y1 - 2012 A1 - Charoenruk, N. A1 - Parkhurst, B. A1 - Ay, M. A1 - Belli, R. F. JF - Midwest Association for Public Opinion Research 2012 Annual Conference CY - Chicago, IL UR - http://www.mapor.org/conferences.html N1 - Annual conference of the Midwest Association for Public Opinion Research, Chicago, Illinois. 
ER - TY - JOUR T1 - Biomass prediction using density dependent diameter distribution models JF - Annals of Applied Statistics Y1 - 0 A1 - Schliep, E.M. A1 - A.E. Gelfand A1 - J.S. Clark A1 - B.J. Tomasek AB - Prediction of aboveground biomass, particularly at large spatial scales, is necessary for estimating global-scale carbon sequestration. Since biomass can be measured only by sacrificing trees, total biomass on plots is never observed. Rather, allometric equations are used to convert individual tree diameter to individual biomass, perhaps with noise. The values for all trees on a plot are then summed to obtain a derived total biomass for the plot. Then, with derived total biomasses for a collection of plots, regression models, using appropriate environmental covariates, are employed to attempt explanation and prediction. Not surprisingly, when out-of-sample validation is examined, such a model will predict total biomass well for holdout data because it is obtained using exactly the same derived approach. Apart from the somewhat circular nature of the regression approach, it also fails to employ the actual observed plot level response data. At each plot, we observe a random number of trees, each with an associated diameter, producing a sample of diameters. A model based on this random number of tree diameters provides understanding of how environmental regressors explain abundance of individuals, which in turn explains individual diameters. We incorporate density dependence because the distribution of tree diameters over a plot of fixed size depends upon the number of trees on the plot. After fitting this model, we can obtain predictive distributions for individual-level biomass and plot-level total biomass. We show that predictive distributions for plot-level biomass obtained from a density-dependent model for diameters will be much different from predictive distributions using the regression approach. 
Moreover, they can be more informative for capturing uncertainty than those obtained from modeling derived plot-level biomass directly. We develop a density-dependent diameter distribution model and illustrate with data from the national Forest Inventory and Analysis (FIA) database. We also describe how to scale predictions to larger spatial regions. Our predictions agree (in magnitude) with available wisdom on mean and variation in biomass at the hectare scale. VL - 11 UR - https://projecteuclid.org/euclid.aoas/1491616884 IS - 1 ER - TY - CHAP T1 - Calendar and time diary methods: The tools to assess well-being in the 21st century T2 - Handbook of research methods in health and social sciences Y1 - 0 A1 - Córdova Cazar, Ana Lucía A1 - Belli, Robert F. ED - Liamputtong, P JF - Handbook of research methods in health and social sciences PB - Springer ER - TY - ABST T1 - Evaluating Data quality in Time Diary Surveys Using Paradata Y1 - 0 A1 - Córdova Cazar, Ana Lucía A1 - Belli, Robert F. ER - TY - ABST T1 - An evaluation study of the use of paradata to enhance data quality in the American Time Use Survey (ATUS) Y1 - 0 A1 - Córdova Cazar, Ana Lucía A1 - Belli, Robert F. ER - TY - ABST T1 - Interviewer Influence on Interviewer-Respondent Interaction During Battery Questions Y1 - 0 A1 - Cochran, Beth A1 - Olson, Kristen A1 - Smyth, Jolene ER - TY - ABST T1 - Memory Gaps in the American Time Use Survey. Are Respondents Forgetful or is There More to it? Y1 - 0 A1 - Kirchner, Antje A1 - Belli, Robert F. A1 - Deal, Caitlin E. A1 - Córdova-Cazar, Ana Lucia ER - TY - ABST T1 - Working with the SIPP-EHC audit trails: Parallel and sequential retrieval Y1 - 0 A1 - Lee, Jinyoung A1 - Seloske, Ben A1 - Córdova Cazar, Ana Lucía A1 - Eck, Adam A1 - Belli, Robert F. ER -