TY - JOUR T1 - Data fusion for correcting measurement errors Y1 - Submitted A1 - J. P. Reiter A1 - T. Schifeling A1 - M. De Yoreo AB - Often in surveys, key items are subject to measurement errors. Given just the data, it can be difficult to determine the distribution of this error process, and hence to obtain accurate inferences that involve the error-prone variables. In some settings, however, analysts have access to a data source on different individuals with high quality measurements of the error-prone survey items. We present a data fusion framework for leveraging this information to improve inferences in the error-prone survey. The basic idea is to posit models about the rates at which individuals make errors, coupled with models for the values reported when errors are made. This can avoid the unrealistic assumption of conditional independence typically used in data fusion. We apply the approach on the reported values of educational attainments in the American Community Survey, using the National Survey of College Graduates as the high quality data source. In doing so, we account for the informative sampling design used to select the National Survey of College Graduates. We also present a process for assessing the sensitivity of various analyses to different choices for the measurement error models. Supplemental material is available online. ER - TY - JOUR T1 - Sequential identification of nonignorable missing data mechanisms JF - Statistica Sinica Y1 - Submitted A1 - Mauricio Sadinle A1 - Jerome P. Reiter KW - Identification KW - Missing not at random KW - Non-parametric saturated KW - Partial ignorability KW - Sensitivity analysis AB - With nonignorable missing data, likelihood-based inference should be based on the joint distribution of the study variables and their missingness indicators. 
These joint models cannot be estimated from the data alone, thus requiring the analyst to impose restrictions that make the models uniquely obtainable from the distribution of the observed data. We present an approach for constructing classes of identifiable nonignorable missing data models. The main idea is to use a sequence of carefully set up identifying assumptions, whereby we specify potentially different missingness mechanisms for different blocks of variables. We show that the procedure results in models with the desirable property of being non-parametric saturated. ER - TY - JOUR T1 - The Earned Income Tax Credit and Food Insecurity: Who Benefits? Y1 - forthcoming A1 - Shaefer, H.L. A1 - Wilson, R. ER - TY - JOUR T1 - The Response of Consumer Spending to Changes in Gasoline Prices Y1 - forthcoming A1 - Gelman, Michael A1 - Gorodnichenko, Yuriy A1 - Kariv, Shachar A1 - Koustas, Dmitri A1 - Shapiro, Matthew D A1 - Silverman, Daniel A1 - Tadelis, Steven AB - This paper estimates how overall consumer spending responds to changes in gasoline prices. It uses the differential impact across consumers of the sudden, large drop in gasoline prices in 2014 for identification. This estimation strategy is implemented using comprehensive, daily transaction-level data for a large panel of individuals. The estimated marginal propensity to consume (MPC) is approximately one, a higher estimate than estimates found in less comprehensive or well-measured data. This estimate takes into account the elasticity of demand for gasoline and potential slow adjustment to changes in prices. The high MPC implies that changes in gasoline prices have large aggregate effects. ER - TY - JOUR T1 - Sorting Between and Within Industries: A Testable Model of Assortative Matching JF - Annals of Economics and Statistics Y1 - 2018 A1 - John M. Abowd A1 - Francis Kramarz A1 - Sebastien Perez-Duarte A1 - Ian M. 
Schmutte ER - TY - JOUR T1 - Adaptively-Tuned Particle Swarm Optimization with Application to Spatial Design JF - Stat Y1 - 2017 A1 - Simpson, M. A1 - Wikle, C.K. A1 - Holan, S.H. AB - Particle swarm optimization (PSO) algorithms are a class of heuristic optimization algorithms that are attractive for complex optimization problems. We propose using PSO to solve spatial design problems, e.g. choosing new locations to add to an existing monitoring network. Additionally, we introduce two new classes of PSO algorithms that perform well in a wide variety of circumstances, called adaptively tuned PSO and adaptively tuned bare bones PSO. To illustrate these algorithms, we apply them to a common spatial design problem: choosing new locations to add to an existing monitoring network. Specifically, we consider a network in the Houston, TX, area for monitoring ambient ozone levels, which have been linked to out-of-hospital cardiac arrest rates. Published 2017. This article has been contributed to by US Government employees and their work is in the public domain in the USA VL - 6 UR - http://onlinelibrary.wiley.com/doi/10.1002/sta4.142/abstract IS - 1 ER - TY - JOUR T1 - Bayesian estimation of bipartite matchings for record linkage JF - Journal of the American Statistical Association Y1 - 2017 A1 - Mauricio Sadinle KW - Assignment problem KW - Bayes estimate KW - Data matching KW - Fellegi-Sunter decision rule KW - Mixture model KW - Rejection option AB - The bipartite record linkage task consists of merging two disparate datafiles containing information on two overlapping sets of entities. This is non-trivial in the absence of unique identifiers and it is important for a wide variety of applications given that it needs to be solved whenever we have to combine information from different sources. Most statistical techniques currently used for record linkage are derived from a seminal paper by Fellegi and Sunter (1969). 
These techniques usually assume independence in the matching statuses of record pairs to derive estimation procedures and optimal point estimators. We argue that this independence assumption is unreasonable and instead target a bipartite matching between the two datafiles as our parameter of interest. Bayesian implementations allow us to quantify uncertainty on the matching decisions and derive a variety of point estimators using different loss functions. We propose partial Bayes estimates that allow uncertain parts of the bipartite matching to be left unresolved. We evaluate our approach to record linkage using a variety of challenging scenarios and show that it outperforms the traditional methodology. We illustrate the advantages of our methods merging two datafiles on casualties from the civil war of El Salvador. VL - 112 UR - http://amstat.tandfonline.com/doi/abs/10.1080/01621459.2016.1148612 IS - 518 ER - TY - JOUR T1 - Cost-Benefit Analysis for a Quinquennial Census: The 2016 Population Census of South Africa JF - Journal of Official Statistics Y1 - 2017 A1 - Spencer, Bruce D. A1 - May, Julian A1 - Kenyon, Steven A1 - Seeskin, Zachary KW - demographic statistics KW - fiscal allocations KW - loss function KW - population estimates KW - post-censal estimates AB - The question of whether to carry out a quinquennial Census is faced by national statistical offices in increasingly many countries, including Canada, Nigeria, Ireland, Australia, and South Africa. We describe uses and limitations of cost-benefit analysis in this decision problem in the case of the 2016 Census of South Africa. The government of South Africa needed to decide whether to conduct a 2016 Census or to rely on increasingly inaccurate postcensal estimates accounting for births, deaths, and migration since the previous (2011) Census. 
The cost-benefit analysis compared predicted costs of the 2016 Census to the benefits of improved allocation of intergovernmental revenue, which was considered by the government to be a critical use of the 2016 Census, although not the only important benefit. Without the 2016 Census, allocations would be based on population estimates. Accuracy of the postcensal estimates was estimated from the performance of past estimates, and the hypothetical expected reduction in errors in allocation due to the 2016 Census was estimated. A loss function was introduced to quantify the improvement in allocation. With this evidence, the government was able to decide not to conduct the 2016 Census, but instead to improve data and capacity for producing post-censal estimates. VL - 33 SN - 2001-7367 UR - https://www.degruyter.com/view/j/jos.2017.33.issue-1/jos-2017-0013/jos-2017-0013.xml IS - 1 ER - TY - JOUR T1 - Do Interviewer Post-survey Evaluations of Respondents Measure Who Respondents Are or What They Do? A Behavior Coding Study JF - Public Opinion Quarterly Y1 - 2017 A1 - Kirchner, Antje A1 - Olson, Kristen A1 - Smyth, Jolene D. AB - Survey interviewers are often tasked with assessing the quality of respondents’ answers after completing a survey interview. These interviewer observations have been used to proxy for measurement error in interviewer-administered surveys. How interviewers formulate these evaluations and how well they proxy for measurement error has received little empirical attention. According to dual-process theories of impression formation, individuals form impressions about others based on the social categories of the observed person (e.g., sex, race) and individual behaviors observed during an interaction. Although initial impressions start with heuristic, rule-of-thumb evaluations, systematic processing is characterized by extensive incorporation of available evidence. 
In a survey context, if interviewers default to heuristic information processing when evaluating respondent engagement, then we expect their evaluations to be primarily based on respondent characteristics and stereotypes associated with those characteristics. Under systematic processing, on the other hand, interviewers process and evaluate respondents based on observable respondent behaviors occurring during the question-answering process. We use the Work and Leisure Today Survey, including survey data and behavior codes, to examine proxy measures of heuristic and systematic processing by interviewers as predictors of interviewer postsurvey evaluations of respondents’ cooperativeness, interest, friendliness, and talkativeness. Our results indicate that CATI interviewers base their evaluations on actual behaviors during an interview (i.e., systematic processing) rather than perceived characteristics of the respondent or the interviewer (i.e., heuristic processing). These results are reassuring for the many surveys that collect interviewer observations as proxies for data quality. UR - https://doi.org/10.1093/poq/nfx026 ER - TY - RPRT T1 - Effects of a Government-Academic Partnership: Has the NSF-Census Bureau Research Network Helped Secure the Future of the Federal Statistical System? Y1 - 2017 A1 - Weinberg, Daniel A1 - Abowd, John M. A1 - Belli, Robert F. A1 - Cressie, Noel A1 - Folch, David C. A1 - Holan, Scott H. A1 - Levenstein, Margaret C. A1 - Olson, Kristen M. A1 - Reiter, Jerome P. A1 - Shapiro, Matthew D. A1 - Smyth, Jolene A1 - Soh, Leen-Kiat A1 - Spencer, Bruce A1 - Spielman, Seth E. A1 - Vilhuber, Lars A1 - Wikle, Christopher AB -

The National Science Foundation-Census Bureau Research Network (NCRN) was established in 2011 to create interdisciplinary research nodes on methodological questions of interest and significance to the broader research community and to the Federal Statistical System (FSS), particularly the Census Bureau. The activities to date have covered both fundamental and applied statistical research and have focused at least in part on the training of current and future generations of researchers in skills of relevance to surveys and alternative measurement of economic units, households, and persons. This paper discusses some of the key research findings of the eight nodes, organized into six topics: (1) Improving census and survey data collection methods; (2) Using alternative sources of data; (3) Protecting privacy and confidentiality by improving disclosure avoidance; (4) Using spatial and spatio-temporal statistical modeling to improve estimates; (5) Assessing data cost and quality tradeoffs; and (6) Combining information from multiple sources. It also reports on collaborations across nodes and with federal agencies, new software developed, and educational activities and outcomes. The paper concludes with an evaluation of the ability of the FSS to apply the NCRN’s research outcomes and suggests some next steps, as well as the implications of this research-network model for future federal government renewal initiatives. 
This paper began as a May 8, 2015 presentation to the National Academies of Sciences’ Committee on National Statistics by two of the principal investigators of the National Science Foundation-Census Bureau Research Network (NCRN) – John Abowd and the late Stephen Fienberg (Carnegie Mellon University). The authors acknowledge the contributions of the other principal investigators of the NCRN who are not co-authors of the paper (William Block, William Eddy, Alan Karr, Charles Manski, Nicholas Nagle, and Rebecca Nugent), the co-principal investigators, and the comments of Patrick Cantwell, Constance Citro, Adam Eck, Brian Harris-Kojetin, and Eloise Parker. We note with sorrow the deaths of Stephen Fienberg and Allan McCutcheon, two of the original NCRN principal investigators. The principal investigators also wish to acknowledge Cheryl Eavey’s sterling grant administration on behalf of the NSF. The conclusions reached in this paper are not the responsibility of the National Science Foundation (NSF), the Census Bureau, or any of the institutions to which the authors belong.

PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/52650 ER - TY - RPRT T1 - Formal Privacy Models and Title 13 Y1 - 2017 A1 - Nissim, Kobbi A1 - Gasser, Urs A1 - Smith, Adam A1 - Vadhan, Salil A1 - O'Brien, David A1 - Wood, Alexandra AB - A new collaboration between academia and the Census Bureau to further the Bureau’s use of formal privacy models. PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/52164 ER - TY - JOUR T1 - Itemwise conditionally independent nonresponse modeling for incomplete multivariate data JF - Biometrika Y1 - 2017 A1 - M. Sadinle A1 - J.P. Reiter KW - Loglinear model KW - Missing not at random KW - Missingness mechanism KW - Nonignorable KW - Nonparametric saturated KW - Sensitivity analysis AB - We introduce a nonresponse mechanism for multivariate missing data in which each study variable and its nonresponse indicator are conditionally independent given the remaining variables and their nonresponse indicators. This is a nonignorable missingness mechanism, in that nonresponse for any item can depend on values of other items that are themselves missing. We show that, under this itemwise conditionally independent nonresponse assumption, one can define and identify nonparametric saturated classes of joint multivariate models for the study variables and their missingness indicators. We also show how to perform sensitivity analysis to violations of the conditional independence assumptions encoded by this missingness mechanism. Throughout, we illustrate the use of this modeling approach with data analyses. VL - 104 UR - https://doi.org/10.1093/biomet/asw063 IS - 1 ER - TY - JOUR T1 - Itemwise conditionally independent nonresponse modeling for multivariate categorical data JF - Biometrika Y1 - 2017 A1 - Sadinle, M. A1 - Reiter, J. P. 
KW - Identification KW - Missing not at random KW - Non-parametric saturated KW - Partial ignorability KW - Sensitivity analysis AB - With nonignorable missing data, likelihood-based inference should be based on the joint distribution of the study variables and their missingness indicators. These joint models cannot be estimated from the data alone, thus requiring the analyst to impose restrictions that make the models uniquely obtainable from the distribution of the observed data. We present an approach for constructing classes of identifiable nonignorable missing data models. The main idea is to use a sequence of carefully set up identifying assumptions, whereby we specify potentially different missingness mechanisms for different blocks of variables. We show that the procedure results in models with the desirable property of being non-parametric saturated. VL - 104 ER - TY - JOUR T1 - Modeling Endogenous Mobility in Earnings Determination JF - Journal of Business & Economic Statistics Y1 - 2017 A1 - John M. Abowd A1 - Kevin L. Mckinney A1 - Ian M. Schmutte AB - We evaluate the bias from endogenous job mobility in fixed-effects estimates of worker- and firm-specific earnings heterogeneity using longitudinally linked employer-employee data from the LEHD infrastructure file system of the U.S. Census Bureau. First, we propose two new residual diagnostic tests of the assumption that mobility is exogenous to unmodeled determinants of earnings. Both tests reject exogenous mobility. We relax exogenous mobility by modeling the matched data as an evolving bipartite graph using a Bayesian latent-type framework. Our results suggest that allowing endogenous mobility increases the variation in earnings explained by individual heterogeneity and reduces the proportion due to employer and match effects. To assess external validity, we match our estimates of the wage components to out-of-sample estimates of revenue per worker. 
The mobility-bias corrected estimates attribute much more of the variation in revenue per worker to variation in match quality and worker quality than the uncorrected estimates. UR - http://dx.doi.org/10.1080/07350015.2017.1356727 ER - TY - RPRT T1 - Modeling Endogenous Mobility in Wage Determination Y1 - 2017 A1 - John M. Abowd A1 - Kevin L. McKinney A1 - Ian M. Schmutte AB - We evaluate the bias from endogenous job mobility in fixed-effects estimates of worker- and firm-specific earnings heterogeneity using longitudinally linked employer-employee data from the LEHD infrastructure file system of the U.S. Census Bureau. First, we propose two new residual diagnostic tests of the assumption that mobility is exogenous to unmodeled determinants of earnings. Both tests reject exogenous mobility. We relax the exogenous mobility assumptions by modeling the evolution of the matched data as an evolving bipartite graph using a Bayesian latent class framework. Our results suggest that endogenous mobility biases estimated firm effects toward zero. To assess validity, we match our estimates of the wage components to out-of-sample estimates of revenue per worker. The corrected estimates attribute much more of the variation in revenue per worker to variation in match quality and worker quality than the uncorrected estimates. UR - http://digitalcommons.ilr.cornell.edu/ldi/28/ ER - TY - RPRT T1 - NCRN Meeting Spring 2017: Formal Privacy Models and Title 13 Y1 - 2017 A1 - Nissim, Kobbi A1 - Gasser, Urs A1 - Smith, Adam A1 - Vadhan, Salil A1 - O'Brien, David A1 - Wood, Alexandra AB - A new collaboration between academia and the Census Bureau to further the Bureau’s use of formal privacy models. 
PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/52164 ER - TY - RPRT T1 - Presentation: Introduction to Stan for Markov Chain Monte Carlo Y1 - 2017 A1 - Simpson, Matthew AB - An introduction to Stan (http://mc-stan.org/): a probabilistic programming language that implements Hamiltonian Monte Carlo (HMC), variational Bayes, and (penalized) maximum likelihood estimation. Presentation given at the U.S. Census Bureau on April 25, 2017. PB - University of Missouri UR - http://hdl.handle.net/1813/52656 ER - TY - RPRT T1 - Proceedings from the 2016 NSF–Sloan Workshop on Practical Privacy Y1 - 2017 A1 - Vilhuber, Lars A1 - Schmutte, Ian AB - On October 14, 2016, we hosted a workshop that brought together economists, survey statisticians, and computer scientists with expertise in the field of privacy preserving methods: Census Bureau staff working on implementing cutting-edge methods in the Bureau’s flagship public-use products mingled with academic researchers from a variety of universities. The four products discussed as part of the workshop were 1. the American Community Survey (ACS); 2. Longitudinal Employer-Household Data (LEHD), in particular the LEHD Origin-Destination Employment Statistics (LODES); 3. the 2020 Decennial Census; and 4. the 2017 Economic Census. The goal of the workshop was to 1. Discuss the specific challenges that have arisen in ongoing efforts to apply formal privacy models to Census data products by drawing together expertise of academic and governmental researchers; 2. Produce short written memos that summarize concrete suggestions for practical applications to specific Census Bureau priority areas. 
PB - Cornell University UR - http://hdl.handle.net/1813/46197 ER - TY - RPRT T1 - Proceedings from the 2017 Cornell-Census-NSF-Sloan Workshop on Practical Privacy Y1 - 2017 A1 - Vilhuber, Lars A1 - Schmutte, Ian M. AB - These proceedings report on a workshop hosted at the U.S. Census Bureau on May 8, 2017. Our purpose was to gather experts from various backgrounds together to continue discussing the development of formal privacy systems for Census Bureau data products. This workshop was a successor to a previous workshop held in October 2016 (Vilhuber & Schmutte 2017). At our prior workshop, we hosted computer scientists, survey statisticians, and economists, all of whom were experts in data privacy. At that time we discussed the practical implementation of cutting-edge methods for publishing data with formal, provable privacy guarantees, with a focus on applications to Census Bureau data products. The teams developing those applications were just starting out when our first workshop took place, and we spent our time brainstorming solutions to the various problems researchers were encountering, or anticipated encountering. For these cutting-edge formal privacy models, there had been very little effort in the academic literature to apply those methods in real-world settings with large, messy data. We therefore brought together an expanded group of specialists from academia and government who could shed light on technical challenges, subject matter challenges and address how data users might react to changes in data availability and publishing standards. In May 2017, we organized a follow-up workshop, which these proceedings report on. We reviewed progress made in four different areas. The four topics discussed as part of the workshop were 1. the 2020 Decennial Census; 2. the American Community Survey (ACS); 3. the 2017 Economic Census; 4. 
measuring the demand for privacy and for data quality. As in our earlier workshop, our goals were to 1. Discuss the specific challenges that have arisen in ongoing efforts to apply formal privacy models to Census data products by drawing together expertise of academic and governmental researchers; 2. Produce short written memos that summarize concrete suggestions for practical applications to specific Census Bureau priority areas. Comments can be provided at https://goo.gl/ZAh3YE PB - Cornell University UR - http://hdl.handle.net/1813/52473 ER - TY - RPRT T1 - Proceedings from the Synthetic LBD International Seminar Y1 - 2017 A1 - Vilhuber, Lars A1 - Kinney, Saki A1 - Schmutte, Ian M. AB - On May 9, 2017, we hosted a seminar to discuss the conditions necessary to implement the SynLBD approach with interested parties, with the goal of providing a straightforward toolkit to implement the same procedure on other data. The proceedings summarize the discussions during the workshop. PB - Cornell University UR - http://hdl.handle.net/1813/52472 ER - TY - RPRT T1 - Revisiting the Economics of Privacy: Population Statistics and Confidentiality Protection as Public Goods Y1 - 2017 A1 - John M. Abowd A1 - Ian M. Schmutte AB - We consider the problem of determining the optimal accuracy of public statistics when increased accuracy requires a loss of privacy. To formalize this allocation problem, we use tools from statistics and computer science to model the publication technology used by a public statistical agency. We derive the demand for accurate statistics from first principles to generate interdependent preferences that account for the public-good nature of both data accuracy and privacy loss. We first show data accuracy is inefficiently under-supplied by a private provider. Solving the appropriate social planner’s problem produces an implementable publication strategy. 
We implement the socially optimal publication plan for statistics on income and health status using data from the American Community Survey, National Health Interview Survey, Federal Statistical System Public Opinion Survey and Cornell National Social Survey. Our analysis indicates that welfare losses from providing too much privacy protection and, therefore, too little accuracy can be substantial. JF - Labor Dynamics Institute Document UR - http://digitalcommons.ilr.cornell.edu/ldi/37/ ER - TY - RPRT T1 - Revisiting the Economics of Privacy: Population Statistics and Confidentiality Protection as Public Goods Y1 - 2017 A1 - Abowd, John A1 - Schmutte, Ian M. AB - We consider the problem of the public release of statistical information about a population–explicitly accounting for the public-good properties of both data accuracy and privacy loss. We first consider the implications of adding the public-good component to recently published models of private data publication under differential privacy guarantees using a Vickrey-Clarke-Groves mechanism and a Lindahl mechanism. We show that data quality will be inefficiently under-supplied. Next, we develop a standard social planner’s problem using the technology set implied by (ε, δ)-differential privacy with (α, β)-accuracy for the Private Multiplicative Weights query release mechanism to study the properties of optimal provision of data accuracy and privacy loss when both are public goods. Using the production possibilities frontier implied by this technology, explicitly parameterized interdependent preferences, and the social welfare function, we display properties of the solution to the social planner’s problem. Our results directly quantify the optimal choice of data accuracy and privacy loss as functions of the technology and preference parameters. 
Some of these properties can be quantified using population statistics on marginal preferences and correlations between income, data accuracy preferences, and privacy loss preferences that are available from survey data. Our results show that government data custodians should publish more accurate statistics with weaker privacy guarantees than would occur with purely private data publishing. Our statistical results using the General Social Survey and the Cornell National Social Survey indicate that the welfare losses from under-providing data accuracy while over-providing privacy protection can be substantial. A complete archive of the data and programs used in this paper is available via http://doi.org/10.5281/zenodo.345385. PB - Cornell University UR - http://hdl.handle.net/1813/39081 ER - TY - RPRT T1 - Revisiting the Economics of Privacy: Population Statistics and Confidentiality Protection as Public Goods Y1 - 2017 A1 - Abowd, John A1 - Schmutte, Ian M. AB - We consider the problem of determining the optimal accuracy of public statistics when increased accuracy requires a loss of privacy. To formalize this allocation problem, we use tools from statistics and computer science to model the publication technology used by a public statistical agency. We derive the demand for accurate statistics from first principles to generate interdependent preferences that account for the public-good nature of both data accuracy and privacy loss. We first show data accuracy is inefficiently under-supplied by a private provider. Solving the appropriate social planner’s problem produces an implementable publication strategy. 
We implement the socially optimal publication plan for statistics on income and health status using data from the American Community Survey, National Health Interview Survey, Federal Statistical System Public Opinion Survey and Cornell National Social Survey. Our analysis indicates that welfare losses from providing too much privacy protection and, therefore, too little accuracy can be substantial. PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/52612 ER - TY - ABST T1 - Sequential Prediction of Respondent Behaviors Leading to Error in Web-based Surveys Y1 - 2017 A1 - Eck, Adam A1 - Soh, Leen-Kiat ER - TY - RPRT T1 - Sorting Between and Within Industries: A Testable Model of Assortative Matching Y1 - 2017 A1 - John M. Abowd A1 - Francis Kramarz A1 - Sebastien Perez-Duarte A1 - Ian M. Schmutte AB - We test Shimer's (2005) theory of the sorting of workers between and within industrial sectors based on directed search with coordination frictions, deliberately maintaining its static general equilibrium framework. We fit the model to sector-specific wage, vacancy and output data, including publicly-available statistics that characterize the distribution of worker and employer wage heterogeneity across sectors. Our empirical method is general and can be applied to a broad class of assignment models. The results indicate that industries are the loci of sorting–more productive workers are employed in more productive industries. The evidence confirms that strong assortative matching can be present even when worker and employer components of wage heterogeneity are weakly correlated. PB - Labor Dynamics Institute UR - http://digitalcommons.ilr.cornell.edu/ldi/40/ ER - TY - RPRT T1 - Unique Entity Estimation with Application to the Syrian Conflict Y1 - 2017 A1 - Chen, B. A1 - Shrivastava, A. A1 - Steorts, R. C. 
KW - Computer Science - Data Structures and Algorithms KW - Computer Science - Databases KW - Statistics - Applications AB - Entity resolution identifies and removes duplicate entities in large, noisy databases and has grown in both usage and new developments as a result of increased data availability. Nevertheless, entity resolution has tradeoffs regarding assumptions of the data generation process, error rates, and computational scalability that make it a difficult task for real applications. In this paper, we focus on a related problem of unique entity estimation, which is the task of estimating the unique number of entities and associated standard errors in a data set with duplicate entities. Unique entity estimation shares many fundamental challenges of entity resolution, namely, that the computational cost of all-to-all entity comparisons is intractable for large databases. To circumvent this computational barrier, we propose an efficient (near-linear time) estimation algorithm based on locality sensitive hashing. Our estimator, under realistic assumptions, is unbiased and has provably low variance compared to existing random sampling based approaches. In addition, we empirically show its superiority over the state-of-the-art estimators on three real applications. The motivation for our work is to derive an accurate estimate of the documented, identifiable deaths in the ongoing Syrian conflict. Our methodology, when applied to the Syrian data set, provides an estimate of $191,874 \pm 1772$ documented, identifiable deaths, which is very close to the Human Rights Data Analysis Group (HRDAG) estimate of 191,369. Our work provides an example of challenges and efforts involved in solving a real, noisy challenging problem where modeling assumptions may not hold. 
JF - arXiv UR - https://arxiv.org/abs/1710.02690 ER - TY - JOUR T1 - A Bayesian Approach to Graphical Record Linkage and Deduplication JF - Journal of the American Statistical Association Y1 - 2016 A1 - Rebecca C. Steorts A1 - Rob Hall A1 - Stephen E. Fienberg AB - We propose an unsupervised approach for linking records across arbitrarily many files, while simultaneously detecting duplicate records within files. Our key innovation involves the representation of the pattern of links between records as a bipartite graph, in which records are directly linked to latent true individuals, and only indirectly linked to other records. This flexible representation of the linkage structure naturally allows us to estimate the attributes of the unique observable people in the population, calculate transitive linkage probabilities across records (and represent this visually), and propagate the uncertainty of record linkage into later analyses. Our method makes it particularly easy to integrate record linkage with post-processing procedures such as logistic regression, capture–recapture, etc. Our linkage structure lends itself to an efficient, linear-time, hybrid Markov chain Monte Carlo algorithm, which overcomes many obstacles encountered by previous record linkage approaches, despite the high-dimensional parameter space. We illustrate our method using longitudinal data from the National Long Term Care Survey and with data from the Italian Survey on Household and Wealth, where we assess the accuracy of our method and show it to be better in terms of error rates and empirical scalability than other approaches in the literature. Supplementary materials for this article are available online. VL - 111 UR - http://dx.doi.org/10.1080/01621459.2015.1105807 ER - TY - JOUR T1 - Bayesian latent pattern mixture models for handling attrition in panel studies with refreshment samples JF - Annals of Applied Statistics Y1 - 2016 A1 - Y. Si A1 - J. P. Reiter A1 - D. S. 
Hillygus VL - 10 UR - http://projecteuclid.org/euclid.aoas/1458909910 ER - TY - ABST T1 - Data management and analytic use of paradata: SIPP-EHC audit trails Y1 - 2016 A1 - Lee, Jinyoung A1 - Seloske, Ben A1 - Córdova Cazar, Ana Lucía A1 - Eck, Adam A1 - Kirchner, Antje A1 - Belli, Robert F. ER - TY - JOUR T1 - Differentially private publication of data on wages and job mobility JF - Statistical Journal of the International Association for Official Statistics Y1 - 2016 A1 - Schmutte, Ian M. KW - Demand for public statistics KW - differential privacy KW - job mobility KW - matched employer-employee data KW - optimal confidentiality protection KW - optimal data accuracy KW - technology for statistical agencies AB - Brazil, like many countries, is reluctant to publish business-level data, because of legitimate concerns about the establishments' confidentiality. A trusted data curator can increase the utility of data, while managing the risk to establishments, either by releasing synthetic data, or by infusing noise into published statistics. This paper evaluates the application of a differentially private mechanism to publish statistics on wages and job mobility computed from Brazilian employer-employee matched data. The publication mechanism can result in both the publication of specific statistics as well as the generation of synthetic data. I find that the tradeoff between the privacy guaranteed to individuals in the data, and the accuracy of published statistics, is potentially much better than the worst-case theoretical accuracy guarantee. However, the synthetic data fare quite poorly in analyses that are outside the set of queries to which it was trained. Note that this article only explores and characterizes the feasibility of these publication strategies, and will not directly result in the publication of any data. 
VL - 32 UR - http://content.iospress.com/articles/statistical-journal-of-the-iaos/sji962 IS - 1 ER - TY - JOUR T1 - Do Interviewers with High Cooperation Rates Behave Differently? Interviewer Cooperation Rates and Interview Behaviors JF - Survey Practice Y1 - 2016 A1 - Olson, Kristen A1 - Kirchner, Antje A1 - Smyth, Jolene D. AB - Interviewers are required to be flexible in responding to respondent concerns during recruitment, but standardized during administration of the questionnaire. These skill sets may be at odds. Recent research has shown a U-shaped relationship between interviewer cooperation rates and interviewer variance: the least and the most successful interviewers during recruitment have the largest interviewer variance components. Little is known about why this association occurs. We posit four hypotheses for this association: 1) interviewers with higher cooperation rates are more conscientious interviewers altogether, 2) interviewers with higher cooperation rates continue to use rapport behaviors from the cooperation request throughout an interview, 3) interviewers with higher cooperation rates display more confidence which translates into different interview behavior, and 4) interviewers with higher cooperation rates continue their flexible interviewing style throughout the interview and deviate more from standardized interviewing. We use behavior codes from the Work and Leisure Today Survey (n=450, AAPOR RR3=6.3%) to evaluate interviewer behavior. Our results largely support the confidence hypothesis. Interviewers with higher cooperation rates do not show evidence of being “better” interviewers. VL - 9 UR - http://www.surveypractice.org/index.php/SurveyPractice/article/view/351 IS - 2 ER - TY - RPRT T1 - Estimating Compensating Wage Differentials with Endogenous Job Mobility Y1 - 2016 A1 - Kurt Lavetti A1 - Ian M. 
Schmutte AB - We demonstrate a strategy for using matched employer-employee data to correct endogenous job mobility bias when estimating compensating wage differentials. Applied to fatality rates in the census of formal-sector jobs in Brazil between 2003 and 2010, we show why common approaches to eliminating ability bias can greatly amplify endogenous job mobility bias. By extending the search-theoretic hedonic wage framework, we establish conditions necessary to interpret our estimates as preferences. We present empirical analyses supporting the predictions of the model and identifying conditions, demonstrating that the standard models are misspecified, and that our proposed model eliminates latent ability and endogenous mobility biases. UR - http://digitalcommons.ilr.cornell.edu/ldi/29/ ER - TY - JOUR T1 - How Should We Define Low-Wage Work? An Analysis Using the Current Population Survey JF - Monthly Labor Review Y1 - 2016 A1 - Fusaro, V. A1 - Shaefer, H. Luke AB - Low-wage work is a central concept in considerable research, yet it lacks an agreed-upon definition. Using data from the Current Population Survey’s Annual Social and Economic Supplement, the analysis presented in this article suggests that defining low-wage work on the basis of alternative hourly wage cutoffs changes the size of the low-wage population, but does not noticeably alter time trends in the rate of change. The analysis also indicates that different definitions capture groups of workers with substantively different demographic, social, and economic characteristics. Although the individuals in any of the categories examined might reasonably be considered low-wage workers, a single definition obscures these distinctions. UR - http://www.bls.gov/opub/mlr/2016/article/pdf/how-should-we-define-low-wage-work.pdf ER - TY - JOUR T1 - Incorporating marginal prior information into latent class models JF - Bayesian Analysis Y1 - 2016 A1 - Schifeling, T. S. A1 - Reiter, J. P. 
VL - 11 UR - https://projecteuclid.org/euclid.ba/1434649584 ER - TY - JOUR T1 - Measuring Poverty Using the Supplemental Poverty Measure in the Panel Study of Income Dynamics, 1998 to 2010 JF - Journal of Economic and Social Measurement Y1 - 2016 A1 - Kimberlin, S. A1 - Shaefer, H.L. A1 - Kim, J. AB - The Supplemental Poverty Measure (SPM) was recently introduced by the U.S. Census Bureau as an alternative measure of poverty that addresses many shortcomings of the official poverty measure (OPM) to better reflect the resources households have available to meet their basic needs. The Census SPM is available only in the Current Population Survey (CPS). This paper describes a method for constructing SPM poverty estimates in the Panel Study of Income Dynamics (PSID), for the biennial years 1998 through 2010. A public-use dataset of individual-level SPM status produced in this analysis will be available for download on the PSID website. Annual SPM poverty estimates from the PSID are presented for the years 1998, 2000, 2002, 2004, 2006, 2008, and 2010 and compared to SPM estimates for the same years derived from CPS data by the Census Bureau and independent researchers. We find that SPM poverty rates in the PSID are somewhat lower than those found in the CPS, though trends over time and impact of specific SPM components are similar across the two datasets. VL - 41 UR - http://content.iospress.com/articles/journal-of-economic-and-social-measurement/jem425 IS - 1 ER - TY - ABST T1 - Mismatches Y1 - 2016 A1 - Smyth, Jolene A1 - Olson, Kristen ER - TY - RPRT T1 - Modeling Endogenous Mobility in Earnings Determination Y1 - 2016 A1 - Abowd, John M. A1 - McKinney, Kevin L. A1 - Schmutte, Ian M. AB - 
We evaluate the bias from endogenous job mobility in fixed-effects estimates of worker- and firm-specific earnings heterogeneity using longitudinally linked employer-employee data from the LEHD infrastructure file system of the U.S. Census Bureau. First, we propose two new residual diagnostic tests of the assumption that mobility is exogenous to unmodeled determinants of earnings. Both tests reject exogenous mobility. We relax the exogenous mobility assumptions by modeling the evolution of the matched data as an evolving bipartite graph using a Bayesian latent class framework. Our results suggest that endogenous mobility biases estimated firm effects toward zero. To assess validity, we match our estimates of the wage components to out-of-sample estimates of revenue per worker. The corrected estimates attribute much more of the variation in revenue per worker to variation in match quality and worker quality than the uncorrected estimates. Replication code can be found at DOI: http://doi.org/10.5281/zenodo.zenodo.376600 and our Github repository endogenous-mobility-replication. PB - Cornell University UR - http://hdl.handle.net/1813/40306 ER - TY - RPRT T1 - NCRN Meeting Spring 2016: A 2016 View of 2020 Census Quality, Costs, Benefits Y1 - 2016 A1 - Spencer, Bruce D. AB - Census costs affect data quality and data quality affects census benefits. Although measuring census data quality is difficult enough ex post, census planning requires it to be done well in advance. The topic of this talk is the prediction of the cost-quality curve, its uncertainty, and its relation to benefits from census data. 
Presented at the NCRN Meeting Spring 2016 in Washington DC on May 9-10, 2016; see http://www.ncrn.info/event/ncrn-spring-2016-meeting PB - Northwestern University UR - http://hdl.handle.net/1813/43897 ER - TY - RPRT T1 - NCRN Meeting Spring 2016: Developing job linkages for the Health and Retirement Study Y1 - 2016 A1 - McCue, Kristin A1 - Abowd, John A1 - Levenstein, Margaret A1 - Patki, Dhiren A1 - Rodgers, Ann A1 - Shapiro, Matthew A1 - Wasi, Nada AB - This paper documents work using probabilistic record linkage to create a crosswalk between jobs reported in the Health and Retirement Study (HRS) and the list of workplaces on Census Bureau’s Business Register. Matching job records provides an opportunity to join variables that occur uniquely in separate datasets, to validate responses, and to develop missing data imputation models. Identifying the respondent’s workplace (“establishment”) is valuable for HRS because it allows researchers to incorporate the effects of particular social, economic, and geospatial work environments in studies of respondent health and retirement behavior. The linkage makes use of name and address standardizing techniques tailored to business data that were recently developed in a collaboration between researchers at Census, Cornell, and the University of Michigan. The matching protocol makes no use of the identity of the HRS respondent and strictly protects the confidentiality of information about the respondent’s employer. The paper first describes the clerical review process used to create a set of human-reviewed candidate pairs, and use of that set to train matching models. It then describes and compares several linking strategies that make use of employer name, address, and phone number. 
Finally it discusses alternative ways of incorporating information on match uncertainty into estimates based on the linked data, and illustrates their use with a preliminary sample of matched HRS jobs. Presented at the NCRN Meeting Spring 2016 in Washington DC on May 9-10, 2016; see http://www.ncrn.info/event/ncrn-spring-2016-meeting PB - University of Michigan UR - http://hdl.handle.net/1813/43895 ER - TY - JOUR T1 - Spatial Variation in the Quality of American Community Survey Estimates JF - Demography Y1 - 2016 A1 - Folch, David C. A1 - Arribas-Bel, Daniel A1 - Koschinsky, Julia A1 - Spielman, Seth E. VL - 53 ER - TY - THES T1 - Topics on Official Statistics and Statistical Policy T2 - Statistics Y1 - 2016 A1 - Zachary Seeskin AB - My dissertation studies decision questions for government statistical agencies, both regarding data collection and how to combine data from multiple sources. Informed decisions regarding expenditure on data collection require information about the effects of data quality on data use. For the first topic, I study two important uses of decennial census data in the U.S.: for apportioning the House of Representatives and for allocating federal funds. Estimates of distortions in these two uses are developed for different levels of census accuracy. Then, I thoroughly investigate the sensitivity of findings to the census error distribution and to the choice of how to measure the distortions. The chapter concludes with a proposed framework for partial cost-benefit analysis that charges a share of the cost of the census to allocation programs. Then, I investigate an approximation to make analysis of the effects of census error on allocations feasible when allocations also depend on non-census statistics, as is the case for many formula-based allocations. The approximation conditions on the realized values of the non-census statistics instead of using the joint distribution over both census and non-census statistics. 
The research studies how using the approximation affects conclusions. I find that in some simple cases, the approximation always either overstates or equals the true effects of census error. Understatement is possible in other cases, but theory suggests that the largest possible understatements are about one-third the amount of the largest possible overstatements. In simulations with a more complex allocation formula, the approximation tends to overstate the effects of census error with the overstatement increasing with error in non-census statistics but decreasing with error in census statistics. In the final chapter, I evaluate the use of 2008-2010 property tax data from CoreLogic, Inc. (CoreLogic), aggregated from county and township governments from around the country, to improve 2010 American Community Survey (ACS) estimates of property tax amounts for single-family homes. Particularly, I evaluate the potential to use CoreLogic to reduce respondent burden, to study survey response error and to improve adjustments for survey nonresponse. The coverage of the CoreLogic data varies between counties as does the correspondence between ACS and CoreLogic property taxes. This geographic variation implies that different approaches toward using CoreLogic are needed in different areas of the country. Further, large differences between CoreLogic and ACS property taxes in certain counties seem to be due to conceptual differences between what is collected in the two data sources. I examine three counties, Clark County, NV, Philadelphia County, PA and St. Louis County, MO, and compare how estimates would change with different approaches using the CoreLogic data. Mean county property tax estimates are highly sensitive to whether ACS or CoreLogic data are used to construct estimates. 
Using CoreLogic data in imputation modeling for nonresponse adjustment of ACS estimates modestly improves the predictive power of imputation models, although estimates of county property taxes and property taxes by mortgage status are not very sensitive to the imputation method. JF - Statistics PB - Northwestern University CY - Evanston, Illinois VL - PHD UR - http://search.proquest.com/docview/1826016819 ER - TY - JOUR T1 - Using Data Mining to Predict the Occurrence of Respondent Retrieval Strategies in Calendar Interviewing: The Quality of Retrospective Reports JF - Journal of Official Statistics Y1 - 2016 A1 - Belli, Robert F. A1 - Miller, L. Dee A1 - Baghal, Tarek Al A1 - Soh, Leen-Kiat AB - Determining which verbal behaviors of interviewers and respondents are dependent on one another is a complex problem that can be facilitated via data-mining approaches. Data are derived from the interviews of 153 respondents of the Panel Study of Income Dynamics (PSID) who were interviewed about their life-course histories. Behavioral sequences of interviewer-respondent interactions that were most predictive of respondents spontaneously using parallel, timing, duration, and sequential retrieval strategies in their generation of answers were examined. We also examined which behavioral sequences were predictive of retrospective reporting data quality as shown by correspondence between calendar responses with responses collected in prior waves of the PSID. The verbal behaviors of immediately preceding interviewer and respondent turns of speech were assessed in terms of their co-occurrence with each respondent retrieval strategy. Interviewers’ use of parallel probes is associated with poorer data quality, whereas interviewers’ use of timing and duration probes, especially in tandem, is associated with better data quality. 
Respondents’ use of timing and duration strategies is also associated with better data quality and both strategies are facilitated by interviewer timing probes. Data mining alongside regression techniques is valuable to examine which interviewer-respondent interactions will benefit data quality. VL - 32 IS - 3 ER - TY - JOUR T1 - Accounting for nonignorable unit nonresponse and attrition in panel studies with refreshment samples JF - Journal of Survey Statistics and Methodology Y1 - 2015 A1 - Schifeling, T. A1 - Cheng, C. A1 - Hillygus, D. S. A1 - Reiter, J. P. AB - Panel surveys typically suffer from attrition, which can lead to biased inference when basing analysis only on cases that complete all waves of the panel. Unfortunately, panel data alone cannot inform the extent of the bias from the attrition, so that analysts using the panel data alone must make strong and untestable assumptions about the missing data mechanism. Many panel studies also include refreshment samples, which are data collected from a random sample of new individuals during some later wave of the panel. Refreshment samples offer information that can be utilized to correct for biases induced by nonignorable attrition while reducing reliance on strong assumptions about the attrition process. To date, these bias correction methods have not dealt with two key practical issues in panel studies: unit nonresponse in the initial wave of the panel and in the refreshment sample itself. As we illustrate, nonignorable unit nonresponse can significantly compromise the analyst’s ability to use the refreshment samples for attrition bias correction. Thus, it is crucial for analysts to assess how sensitive their inferences, corrected for panel attrition, are to different assumptions about the nature of the unit nonresponse. We present an approach that facilitates such sensitivity analyses, both for suspected nonignorable unit nonresponse in the initial wave and in the refreshment sample. 
We illustrate the approach using simulation studies and an analysis of data from the 2007-2008 Associated Press/Yahoo News election panel study. VL - 3 UR - http://jssam.oxfordjournals.org/content/3/3/265.abstract IS - 3 ER - TY - JOUR T1 - Bayesian Analysis of Spatially-Dependent Functional Responses with Spatially-Dependent Multi-Dimensional Functional Predictors JF - Statistica Sinica Y1 - 2015 A1 - Yang, W. H. A1 - Wikle, C.K. A1 - Holan, S.H. A1 - Sudduth, K. A1 - Meyers, D.B. VL - 25 UR - http://www3.stat.sinica.edu.tw/preprint/SS-13-245w_Preprint.pdf ER - TY - JOUR T1 - Bayesian Latent Pattern Mixture Models for Handling Attrition in Panel Studies With Refreshment Samples JF - ArXiv Y1 - 2015 A1 - Yajuan Si A1 - Jerome P. Reiter A1 - D. Sunshine Hillygus KW - Categorical KW - Dirichlet process KW - Multiple imputation KW - Non-ignorable KW - Panel attrition KW - Refreshment sample AB - Many panel studies collect refreshment samples: new, randomly sampled respondents who complete the questionnaire at the same time as a subsequent wave of the panel. With appropriate modeling, these samples can be leveraged to correct inferences for biases caused by non-ignorable attrition. We present such a model when the panel includes many categorical survey variables. The model relies on a Bayesian latent pattern mixture model, in which an indicator for attrition and the survey variables are modeled jointly via a latent class model. We allow the multinomial probabilities within classes to depend on the attrition indicator, which offers additional flexibility over standard applications of latent class models. We present results of simulation studies that illustrate the benefits of this flexibility. We apply the model to correct attrition bias in an analysis of data from the 2007-2008 Associated Press/Yahoo News election panel study. 
UR - http://arxiv.org/abs/1509.02124 IS - 1509.02124 ER - TY - RPRT T1 - Blocking Methods Applied to Casualty Records from the Syrian Conflict Y1 - 2015 A1 - Sadosky, Peter A1 - Shrivastava, Anshumali A1 - Price, Megan A1 - Steorts, Rebecca JF - ArXiv UR - http://arxiv.org/abs/1510.07714 ER - TY - JOUR T1 - Capturing multivariate spatial dependence: Model, estimate, and then predict JF - Statistical Science Y1 - 2015 A1 - Cressie, N. A1 - Burden, S. A1 - Davis, W. A1 - Krivitsky, P. A1 - Mokhtarian, P. A1 - Seusse, T. A1 - Zammit-Mangion, A. VL - 30 UR - http://projecteuclid.org/euclid.ss/1433341474 IS - 2 ER - TY - JOUR T1 - Comparing and selecting spatial predictors using local criteria JF - Test Y1 - 2015 A1 - Bradley, J.R. A1 - Cressie, N. A1 - Shi, T. VL - 24 UR - http://dx.doi.org/10.1007/s11749-014-0415-1 IS - 1 ER - TY - RPRT T1 - Cost-Benefit Analysis for a Quinquennial Census: The 2016 Population Census of South Africa. Y1 - 2015 A1 - Spencer, Bruce D. A1 - May, Julian A1 - Kenyon, Steven A1 - Seeskin, Zachary H. KW - demographic statistics KW - fiscal allocations KW - loss function KW - population estimates KW - post-censal estimates AB -

The question of whether to carry out a quinquennial census is being faced by national statistical offices in increasingly many countries, including Canada, Nigeria, Ireland, Australia, and South Africa. The authors describe uses, and limitations, of cost-benefit analysis for this decision problem in the case of the 2016 census of South Africa. The government of South Africa needed to decide whether to conduct a 2016 census or to rely on increasingly inaccurate post-censal estimates accounting for births, deaths, and migration since the previous (2011) census. The cost-benefit analysis compared predicted costs of the 2016 census to the benefits from improved allocation of intergovernmental revenue, which was considered by the government to be a critical use of the 2016 census, although not the only important benefit. Without the 2016 census, allocations would be based on population estimates. Accuracy of the post-censal estimates was estimated from the performance of past estimates, and the hypothetical expected reduction in errors in allocation due to the 2016 census was estimated. A loss function was introduced to quantify the improvement in allocation. With this evidence, the government was able to decide not to conduct the 2016 census, but instead to improve data and capacity for producing post-censal estimates.

JF - IPR Working Paper Series PB - Northwestern University, Institute for Policy Research UR - http://www.ipr.northwestern.edu/publications/papers/2015/ipr-wp-15-06.html ER - TY - CONF T1 - Determining Potential for Breakoff in Time Diary Survey Using Paradata T2 - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) Y1 - 2015 A1 - Wettlaufer, D. A1 - Arunachalam, H. A1 - Atkin, G. A1 - Eck, A. A1 - Soh, L.-K. A1 - Belli, R.F. JF - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) CY - Hollywood, Florida UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - CONF T1 - Do Interviewers with High Cooperation Rates Behave Differently? Interviewer Cooperation Rates and Interview Behaviors T2 - International Conference on Total Survey Error Y1 - 2015 A1 - Olson, K. A1 - Smyth, J.D. A1 - Kirchner, A. JF - International Conference on Total Survey Error CY - Baltimore, MD UR - http://www.niss.org/events/2015-international-total-survey-error-conference ER - TY - CONF T1 - Do Interviewers with High Cooperation Rates Behave Differently? Interviewer Cooperation Rates and Interview Behaviors T2 - Joint Statistical Meetings Y1 - 2015 A1 - Olson, K. A1 - Smyth, J.D. A1 - Kirchner, A. JF - Joint Statistical Meetings CY - Seattle, WA UR - http://www.amstat.org/meetings/jsm/2015/program.cfm ER - TY - RPRT T1 - Economic Analysis and Statistical Disclosure Limitation Y1 - 2015 A1 - Abowd, John M. A1 - Schmutte, Ian M. AB -

This paper explores the consequences for economic research of methods used by data publishers to protect the privacy of their respondents. We review the concept of statistical disclosure limitation for an audience of economists who may be unfamiliar with these methods. We characterize what it means for statistical disclosure limitation to be ignorable. When it is not ignorable, we consider the effects of statistical disclosure limitation for a variety of research designs common in applied economic research. Because statistical agencies do not always report the methods they use to protect confidentiality, we also characterize settings in which statistical disclosure limitation methods are discoverable; that is, they can be learned from the released data. We conclude with advice for researchers, journal editors, and statistical agencies.

PB - Cornell University UR - http://hdl.handle.net/1813/40581 ER - TY - JOUR T1 - Economic Analysis and Statistical Disclosure Limitation JF - Brookings Papers on Economic Activity Y1 - 2015 A1 - Abowd, John M. A1 - Schmutte, Ian M. AB - This paper explores the consequences for economic research of methods used by data publishers to protect the privacy of their respondents. We review the concept of statistical disclosure limitation for an audience of economists who may be unfamiliar with these methods. We characterize what it means for statistical disclosure limitation to be ignorable. When it is not ignorable, we consider the effects of statistical disclosure limitation for a variety of research designs common in applied economic research. Because statistical agencies do not always report the methods they use to protect confidentiality, we also characterize settings in which statistical disclosure limitation methods are discoverable; that is, they can be learned from the released data. We conclude with advice for researchers, journal editors, and statistical agencies. VL - Spring 2015 UR - http://www.brookings.edu/about/projects/bpea/papers/2015/economic-analysis-statistical-disclosure-limitation ER - TY - JOUR T1 - The Effect of CATI Questionnaire Design Features on Response Timing JF - Journal of Survey Statistics and Methodology Y1 - 2015 A1 - Olson, K. A1 - Smyth, J.D. VL - 3 IS - 3 ER - TY - RPRT T1 - Effects of Census Accuracy on Apportionment of Congress and Allocations of Federal Funds. Y1 - 2015 A1 - Seeskin, Zachary H. A1 - Spencer, Bruce D. AB -

How much accuracy is needed in the 2020 census depends on the cost of attaining accuracy and on the consequences of imperfect accuracy. The cost target for the 2020 census of the United States has been specified, and the Census Bureau is developing projections of the accuracy attainable for that cost. It is desirable to have information about the consequences of the accuracy that might be attainable for that cost or for alternative cost levels. To assess the consequences of imperfect census accuracy, Seeskin and Spencer consider alternative profiles of accuracy for states and assess their implications for apportionment of the U.S. House of Representatives and for allocation of federal funds. An error in allocation is defined as the difference between the allocation computed under imperfect data and the allocation computed with perfect data. Estimates of expected sums of absolute values of errors are presented for House apportionment and for federal funds allocations.

JF - IPR Working Paper Series PB - Northwestern University, Institute for Policy Research UR - http://www.ipr.northwestern.edu/publications/papers/2015/ipr-wp-15-05.html ER - TY - JOUR T1 - Entity Resolution with Empirically Motivated Priors JF - Bayesian Anal. Y1 - 2015 A1 - Steorts, Rebecca C. AB - Databases often contain corrupted, degraded, and noisy data with duplicate entries across and within each database. Such problems arise in citations, medical databases, genetics, human rights databases, and a variety of other applied settings. The target of statistical inference can be viewed as an unsupervised problem of determining the edges of a bipartite graph that links the observed records to unobserved latent entities. Bayesian approaches provide attractive benefits, naturally providing uncertainty quantification via posterior probabilities. We propose a novel record linkage approach based on empirical Bayesian principles. Specifically, the empirical Bayesian-type step consists of taking the empirical distribution function of the data as the prior for the latent entities. This approach improves on the earlier HB approach not only by avoiding the prior specification problem but also by allowing both categorical and string-valued variables. Our extension to string-valued variables also involves the proposal of a new probabilistic mechanism by which observed record values for string fields can deviate from the values of their associated latent entities. Categorical fields that deviate from their corresponding true value are simply drawn from the empirical distribution function. We apply our proposed methodology to a simulated data set of German names and an Italian household survey on income and wealth, showing our method performs favorably compared to several standard methods in the literature. We also consider the robustness of our methods to changes in the hyper-parameters. 
VL - 10 IS - 5 UR - http://dx.doi.org/10.1214/15-BA965SI ER - TY - JOUR T1 - Expanding the Discourse on Antipoverty Policy: Reconsidering a Negative Income Tax JF - Journal of Poverty Y1 - 2015 A1 - Jessica Wiederspan A1 - Elizabeth Rhodes A1 - H. Luke Shaefer KW - economic well-being KW - poverty alleviation KW - public policy KW - social welfare policy AB - This article proposes that advocates for the poor consider the replacement of the current means-tested safety net in the United States with a Negative Income Tax (NIT), a guaranteed income program that lifts families’ incomes above a minimum threshold. The article highlights gaps in service provision that leave millions in poverty, explains how a NIT could help fill those gaps, and compares current expenditures on major means-tested programs to estimated expenditures necessary for a NIT. Finally, it addresses the financial and political concerns that are likely to arise in the event that a NIT proposal gains traction among policy makers. VL - 19 UR - http://dx.doi.org/10.1080/10875549.2014.991889 ER - TY - RPRT T1 - How individuals smooth spending: Evidence from the 2013 government shutdown using account data Y1 - 2015 A1 - Gelman, Michael A1 - Kariv, Shachar A1 - Shapiro, Matthew D A1 - Silverman, Dan A1 - Tadelis, Steven AB - Using comprehensive account records, this paper examines how individuals adjusted spending and saving in response to a temporary drop in income due to the 2013 U.S. government shutdown. The shutdown cut paychecks by 40% for affected employees, which was recovered within 2 weeks. Though the shock was short-lived and completely reversed, spending dropped sharply implying a naïve estimate of the marginal propensity to spend of 0.58. This estimate overstates how consumption responded. While many individuals had low liquidity, they used multiple strategies to smooth consumption including delay of recurring payments such as mortgages and credit card balances.
PB - National Bureau of Economic Research ER - TY - CONF T1 - I Know What You Did Next: Predicting Respondent’s Next Activity Using Machine Learning T2 - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) Y1 - 2015 A1 - Arunachalam, H. A1 - Atkin, G. A1 - Eck, A. A1 - Wettlaufer, D. A1 - Soh, L.-K. A1 - Belli, R.F. JF - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) CY - Hollywood, Florida UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - RPRT T1 - Introduction to The Survey of Income and Program Participation (SIPP) Y1 - 2015 A1 - Shaefer, H. Luke AB - Introduction to The Survey of Income and Program Participation (SIPP) Shaefer, H. Luke Goals for the SIPP Workshop: provide you with an introduction to the SIPP and get you up and running on the public-use SIPP files; offer some advanced tools for 2008 Panel SIPP data analysis; get you some experience analyzing SIPP data; introduce you to the SIPP EHC (SIPP Redesign); introduce you to the SIPP Synthetic Beta (SSB). Presentation made on May 15, 2015 at the Census Bureau, and previously in 2014 at Duke University and the University of Michigan PB - University of Michigan UR - http://hdl.handle.net/1813/40169 ER - TY - RPRT T1 - Modeling Endogenous Mobility in Wage Determination Y1 - 2015 A1 - Abowd, John M. A1 - McKinney, Kevin L. A1 - Schmutte, Ian M. AB - Modeling Endogenous Mobility in Wage Determination Abowd, John M.; McKinney, Kevin L.; Schmutte, Ian M. We evaluate the bias from endogenous job mobility in fixed-effects estimates of worker- and firm-specific earnings heterogeneity using longitudinally linked employer-employee data from the LEHD infrastructure file system of the U.S. Census Bureau. First, we propose two new residual diagnostic tests of the assumption that mobility is exogenous to unmodeled determinants of earnings. Both tests reject exogenous mobility.
We relax the exogenous mobility assumptions by modeling the evolution of the matched data as an evolving bipartite graph using a Bayesian latent class framework. Our results suggest that endogenous mobility biases estimated firm effects toward zero. To assess validity, we match our estimates of the wage components to out-of-sample estimates of revenue per worker. The corrected estimates attribute much more of the variation in revenue per worker to variation in match quality and worker quality than the uncorrected estimates. PB - Cornell University UR - http://hdl.handle.net/1813/40306 ER - TY - RPRT T1 - Modeling Endogenous Mobility in Wage Determination Y1 - 2015 A1 - Abowd, John M. A1 - McKinney, Kevin L. A1 - Schmutte, Ian M. AB - Modeling Endogenous Mobility in Wage Determination Abowd, John M.; McKinney, Kevin L.; Schmutte, Ian M. We evaluate the bias from endogenous job mobility in fixed-effects estimates of worker- and firm-specific earnings heterogeneity using longitudinally linked employer-employee data from the LEHD infrastructure file system of the U.S. Census Bureau. First, we propose two new residual diagnostic tests of the assumption that mobility is exogenous to unmodeled determinants of earnings. Both tests reject exogenous mobility. We relax exogenous mobility by modeling the matched data as an evolving bipartite graph using a Bayesian latent-type framework. Our results suggest that allowing endogenous mobility increases the variation in earnings explained by individual heterogeneity and reduces the proportion due to employer and match effects. To assess external validity, we match our estimates of the wage components to out-of-sample estimates of revenue per worker. The mobility-bias corrected estimates attribute much more of the variation in revenue per worker to variation in match quality and worker quality than the uncorrected estimates.
PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/52608 ER - TY - JOUR T1 - Multiple imputation for harmonizing longitudinal non-commensurate measures in individual participant data meta-analysis JF - Statistics in Medicine Y1 - 2015 A1 - Siddique, J. A1 - Reiter, J. P. A1 - Brincks, A. A1 - Gibbons, R. A1 - Crespi, C. A1 - Brown, C. H. UR - http://onlinelibrary.wiley.com/doi/10.1002/sim.6562/abstract ER - TY - RPRT T1 - NCRN Meeting Spring 2015: Geography and Usability of the American Community Survey Y1 - 2015 A1 - Spielman, Seth AB - NCRN Meeting Spring 2015: Geography and Usability of the American Community Survey Spielman, Seth Presentation at the NCRN Meeting Spring 2015 PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/40183 ER - TY - RPRT T1 - NCRN Meeting Spring 2015: Revisiting the Economics of Privacy: Population Statistics and Confidentiality Protection as Public Goods Y1 - 2015 A1 - Abowd, John M. A1 - Schmutte, Ian AB - NCRN Meeting Spring 2015: Revisiting the Economics of Privacy: Population Statistics and Confidentiality Protection as Public Goods Abowd, John M.; Schmutte, Ian Presentation at the NCRN Meeting Spring 2015 PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/40184 ER - TY - CONF T1 - Predicting Breakoff Using Sequential Machine Learning Methods T2 - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) Y1 - 2015 A1 - Soh, L.-K. A1 - Eck, A. A1 - McCutcheon, A.L. JF - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) CY - Hollywood, Florida UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - THES T1 - Probabilistic Hashing Techniques For Big Data T2 - Computer Science Y1 - 2015 A1 - Anshumali Shrivastava AB - We investigate probabilistic hashing techniques for addressing computational and memory challenges in large scale machine learning and data mining systems. 
In this thesis, we show that the traditional idea of hashing goes far beyond near-neighbor search and there are some striking new possibilities. We show that hashing can improve state-of-the-art large-scale learning algorithms, and it goes beyond the conventional notions of pairwise similarities. Despite being a very well-studied topic in the literature, we found several opportunities for fundamentally improving some of the well-known textbook hashing algorithms. In particular, we show that the traditional way of computing minwise hashes is unnecessarily expensive and, without losing anything, we can achieve an order-of-magnitude speedup. We also found that for cosine similarity search there is a better scheme than SimHash. In the end, we show that the existing locality sensitive hashing framework itself is very restrictive, and we cannot have efficient algorithms for some important measures like inner products, which are ubiquitous in machine learning. We propose asymmetric locality sensitive hashing (ALSH), an extended framework, where we show provable and practical efficient algorithms for Maximum Inner Product Search (MIPS). Having such efficient solutions to MIPS directly scales up many popular machine learning algorithms. We believe that this thesis provides significant improvements to some of the heavily used subroutines in big-data systems, which we hope will be adopted. JF - Computer Science PB - Cornell University VL - Ph.D. UR - https://ecommons.cornell.edu/handle/1813/40886 ER - TY - THES T1 - Ranking Firms Using Revealed Preference and Other Essays About Labor Markets T2 - Department of Economics Y1 - 2015 A1 - Isaac Sorkin KW - economics KW - labor markets AB - This dissertation contains essays on three questions about the labor market. Chapter 1 considers the question: why do some firms pay so much and some so little? Firms account for a substantial portion of earnings inequality.
Although the standard explanation is that there are search frictions that support an equilibrium with rents, this chapter finds that compensating differentials for nonpecuniary characteristics are at least as important. To reach this finding, this chapter develops a structural search model and estimates it on U.S. administrative data. The model analyzes the revealed preference information in the labor market: specifically, how workers move between the 1.5 million firms in the data. With on the order of 1.5 million parameters, standard estimation approaches are infeasible, and so the chapter develops a new estimation approach that is feasible on such big data. Chapter 2 considers the question: why do men and women work at different firms? Men work for higher-paying firms than women. The chapter builds on chapter 1 to consider two explanations for why men and women work in different firms. First, men and women might search from different offer distributions. Second, men and women might have different rankings of firms. Estimation finds that the main explanation for why men and women are sorted is that women search from a lower-paying offer distribution than men. Indeed, men and women are estimated to have quite similar rankings of firms. Chapter 3 considers the question: what are the long-run effects of the minimum wage? An empirical consensus suggests that there are small employment effects of minimum wage increases. This chapter argues that these are short-run elasticities. Long-run elasticities, which may differ from short-run elasticities, are more policy relevant. This chapter develops a dynamic industry equilibrium model of labor demand. The model makes two points. First, long-run regressions have been misinterpreted because even if the short- and long-run employment elasticities differ, standard methods would not detect a difference using U.S. variation.
Second, the model offers a reconciliation of the small estimated short-run employment effects with the commonly found pass-through of minimum wage increases to product prices. JF - Department of Economics PB - University of Michigan CY - Ann Arbor, MI UR - http://hdl.handle.net/2027.42/116747 ER - TY - CONF T1 - Recording What the Respondent Says: Does Question Format Matter? T2 - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) Y1 - 2015 A1 - Smyth, J.D. A1 - Olson, K. JF - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) CY - Hollywood, Florida UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - JOUR T1 - Reducing the Margins of Error in the American Community Survey Through Data-Driven Regionalization JF - PlosOne Y1 - 2015 A1 - Folch, D. A1 - Spielman, S. E. UR - http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0115626 ER - TY - JOUR T1 - Rejoinder on: Comparing and selecting spatial predictors using local criteria JF - Test Y1 - 2015 A1 - Bradley, J.R. A1 - Cressie, N. A1 - Shi, T. VL - 24 UR - http://dx.doi.org/10.1007/s11749-014-0414-2 IS - 1 ER - TY - JOUR T1 - The SAR model for very large datasets: A reduced-rank approach JF - Econometrics Y1 - 2015 A1 - Burden, S. A1 - Cressie, N. A1 - Steel, D.G. VL - 3 UR - http://www.mdpi.com/2225-1146/3/2/317 IS - 2 ER - TY - JOUR T1 - Semi-parametric selection models for potentially non-ignorable attrition in panel studies with refreshment samples JF - Political Analysis Y1 - 2015 A1 - Y. Si A1 - J.P. Reiter A1 - D.S. Hillygus VL - 23 UR - http://pan.oxfordjournals.org/cgi/reprint/mpu009?%20ijkey=joX8eSl6gyIlQKP&keytype=ref ER - TY - JOUR T1 - Studying Neighborhoods Using Uncertain Data from the American Community Survey: A Contextual Approach JF - Annals of the Association of American Geographers Y1 - 2015 A1 - Seth E. 
Spielman A1 - Alex Singleton AB - In 2010 the American Community Survey (ACS) replaced the long form of the decennial census as the sole national source of demographic and economic data for small geographic areas such as census tracts. These small area estimates suffer from large margins of error, however, which makes the data difficult to use for many purposes. The value of a large and comprehensive survey like the ACS is that it provides a richly detailed, multivariate, composite picture of small areas. This article argues that one solution to the problem of large margins of error in the ACS is to shift from a variable-based mode of inquiry to one that emphasizes a composite multivariate picture of census tracts. Because the margin of error in a single ACS estimate, like household income, is assumed to be a symmetrically distributed random variable, positive and negative errors are equally likely. Because the variable-specific estimates are largely independent from each other, when looking at a large collection of variables these random errors average to zero. This means that although single variables can be methodologically problematic at the census tract scale, a large collection of such variables provides utility as a contextual descriptor of the place(s) under investigation. This idea is demonstrated by developing a geodemographic typology of all U.S. census tracts. The typology is firmly rooted in the social scientific literature and is organized around a framework of concepts, domains, and measures. The typology is validated using public domain data from the City of Chicago and the U.S. Federal Election Commission. The typology, as well as the data and methods used to create it, is open source and published freely online. 
VL - 105 UR - http://dx.doi.org/10.1080/00045608.2015.1052335 ER - TY - JOUR T1 - Understanding the Dynamics of $2-a-Day Poverty in the United States JF - The Russell Sage Foundation Journal of the Social Sciences Y1 - 2015 A1 - Shaefer, H. Luke A1 - Edin, Kathryn A1 - Talbert, E. VL - 1 IS - Severe Deprivation ER - TY - JOUR T1 - Understanding the Human Condition through Survey Informatics JF - IEEE Computer Y1 - 2015 A1 - Eck, A. A1 - Leen-Kiat, S. A1 - McCutcheon, A. L. A1 - Smyth, J.D. A1 - Belli, R.F. VL - 48 IS - 11 ER - TY - CONF T1 - Using Data Mining to Examine Interviewer-Respondent Interactions in Calendar Interviews T2 - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) Y1 - 2015 A1 - Belli, R.F. A1 - Miller, L.D. A1 - Soh, L.-K. A1 - T. Al Baghal JF - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) CY - Hollywood, Florida UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - CONF T1 - Using Machine Learning Techniques to Predict Respondent Type from A Priori Demographic Information T2 - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) Y1 - 2015 A1 - Atkin, G. A1 - Arunachalam, H. A1 - Eck, A. A1 - Wettlaufer, D. A1 - Soh, L.-K. A1 - Belli, R.F. JF - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) CY - Hollywood, Florida UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - CONF T1 - Why Do Interviewers Speed Up? An Examination of Changes in Interviewer Behaviors over the Course of the Survey Field Period T2 - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) Y1 - 2015 A1 - Olson, K. A1 - Smyth, J.D. 
JF - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) CY - Hollywood, Florida UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - JOUR T1 - Causes and Patterns of Uncertainty in the American Community Survey JF - Applied Geography Y1 - 2014 A1 - Spielman, S. E. A1 - Folch, D. A1 - Nagle, N. VL - 46 UR - http://www.sciencedirect.com/science/article/pii/S0143622813002518 ER - TY - JOUR T1 - The Co-Evolution of Residential Segregation and the Built Environment at the Turn of the 20th Century: a Schelling Model JF - Transactions in GIS Y1 - 2014 A1 - Spielman, S. E. A1 - Harrison, P. VL - 18 UR - http://onlinelibrary.wiley.com/enhanced/doi/10.1111/tgis.12014/ ER - TY - CHAP T1 - A Comparison of Blocking Methods for Record Linkage T2 - Privacy in Statistical Databases Y1 - 2014 A1 - Steorts, R. A1 - Ventura, S. A1 - Sadinle, M. A1 - Fienberg, S. E. A1 - Domingo-Ferrer, J. JF - Privacy in Statistical Databases PB - Springer VL - 8744 UR - http://link.springer.com/chapter/10.1007/978-3-319-11257-2_20 ER - TY - JOUR T1 - A Comparison of Spatial Predictors when Datasets Could be Very Large JF - ArXiv Y1 - 2014 A1 - Bradley, J. R. A1 - Cressie, N. A1 - Shi, T. KW - Statistics - Methodology AB -

In this article, we review and compare a number of methods of spatial prediction. To demonstrate the breadth of available choices, we consider both traditional and more-recently-introduced spatial predictors. Specifically, in our exposition we review: traditional stationary kriging, smoothing splines, negative-exponential distance-weighting, Fixed Rank Kriging, modified predictive processes, a stochastic partial differential equation approach, and lattice kriging. This comparison is meant to provide a service to practitioners wishing to decide between spatial predictors. Hence, we provide technical material for the unfamiliar, which includes the definition and motivation for each (deterministic and stochastic) spatial predictor. We use a benchmark dataset of CO2 data from NASA's AIRS instrument to address computational efficiencies that include CPU time and memory usage. Furthermore, the predictive performance of each spatial predictor is assessed empirically using a hold-out subset of the AIRS data.

UR - http://arxiv.org/abs/1410.7748 IS - 1410.7748 ER - TY - JOUR T1 - Dasymetric Modeling and Uncertainty JF - The Annals of the Association of American Geographers Y1 - 2014 A1 - Nagle, N. A1 - Buttenfield, B. A1 - Leyk, S. A1 - Spielman, S. E. VL - 104 UR - http://www.tandfonline.com/doi/abs/10.1080/00045608.2013.843439 ER - TY - CONF T1 - Designing an Intelligent Time Diary Instrument: Visualization, Dynamic Feedback, and Error Prevention and Mitigation T2 - UNL/SRAM/Gallup Symposium Y1 - 2014 A1 - Atkin, G. A1 - Arunachalam, H. A1 - Eck, A. A1 - Soh, L.-K. A1 - Belli, R.F. JF - UNL/SRAM/Gallup Symposium CY - Omaha, NE UR - http://grc.unl.edu/unlsramgallup-symposium ER - TY - CONF T1 - Designing an Intelligent Time Diary Instrument: Visualization, Dynamic Feedback, and Error Prevention and Mitigation T2 - American Association for Public Opinion Research 2014 Annual Conference Y1 - 2014 A1 - Atkin, G. A1 - Arunachalam, H. A1 - Eck, A. A1 - Soh, L.-K. A1 - Belli, R. JF - American Association for Public Opinion Research 2014 Annual Conference CY - Anaheim, CA. UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - JOUR T1 - Detecting Duplicates in a Homicide Registry Using a Bayesian Partitioning Approach JF - Annals of Applied Statistics Y1 - 2014 A1 - Sadinle, M. VL - 8 ER - TY - CONF T1 - The Effect of CATI Questionnaire Design Features on Response Timing T2 - American Association for Public Opinion Research 2014 Annual Conference Y1 - 2014 A1 - Olson, K. A1 - Smyth, Jolene JF - American Association for Public Opinion Research 2014 Annual Conference CY - Anaheim, CA UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - JOUR T1 - Entity Resolution with Empirically Motivated Priors JF - ArXiv Y1 - 2014 A1 - Steorts, R. C. KW - Statistics - Methodology AB - Databases often contain corrupted, degraded, and noisy data with duplicate entries across and within each database. 
Such problems arise in citations, medical databases, genetics, human rights databases, and a variety of other applied settings. The target of statistical inference can be viewed as an unsupervised problem of determining the edges of a bipartite graph that links the observed records to unobserved latent entities. Bayesian approaches provide attractive benefits, naturally providing uncertainty quantification via posterior probabilities. We propose a novel record linkage approach based on empirical Bayesian principles. Specifically, the empirical Bayesian--type step consists of taking the empirical distribution function of the data as the prior for the latent entities. This approach improves on the earlier HB approach not only by avoiding the prior specification problem but also by allowing both categorical and string-valued variables. Our extension to string-valued variables also involves the proposal of a new probabilistic mechanism by which observed record values for string fields can deviate from the values of their associated latent entities. Categorical fields that deviate from their corresponding true value are simply drawn from the empirical distribution function. We apply our proposed methodology to a simulated data set of German names and an Italian household survey, showing our method performs favorably compared to several standard methods in the literature. We also consider the robustness of our methods to changes in the hyper-parameters. UR - http://arxiv.org/abs/1409.0643 IS - 1409.0643 ER - TY - JOUR T1 - Harnessing Naturally Occurring Data to Measure the Response of Spending to Income JF - Science Y1 - 2014 A1 - Gelman, M. A1 - Kariv, S. A1 - Shapiro, M.D. A1 - Silverman, D. A1 - Tadelis, S. AB - This paper presents a new data infrastructure for measuring economic activity. The infrastructure records transactions and account balances, yielding measurements with scope and accuracy that have little precedent in economics. 
The data are drawn from a diverse population that overrepresents males and younger adults but contains large numbers of underrepresented groups. The data infrastructure permits evaluation of a benchmark theory in economics that predicts that individuals should use a combination of cash management, saving, and borrowing to make the timing of income irrelevant for the timing of spending. As in previous studies and in contrast to the predictions of the theory, there is a response of spending to the arrival of anticipated income. The data also show, however, that this apparent excess sensitivity of spending results largely from the coincident timing of regular income and regular spending. The remaining excess sensitivity is concentrated among individuals with less liquidity. Link to data at Berkeley Econometrics Lab (EML): https://eml.berkeley.edu/cgi-bin/HarnessingDataScience2014.cgi VL - 345 UR - http://www.sciencemag.org/content/345/6193/212.full IS - 11 ER - TY - CONF T1 - Having a Lasting Impact: The Effects of Interviewer Errors on Data Quality T2 - Midwest Association for Public Opinion Research Annual Conference Y1 - 2014 A1 - Timm, A. A1 - Olson, K. A1 - Smyth, J.D. JF - Midwest Association for Public Opinion Research Annual Conference CY - Chicago, IL UR - http://www.mapor.org/conferences.html ER - TY - CONF T1 - Hours or Minutes: Does One Unit Fit All? T2 - Midwest Association for Public Opinion Research Annual Conference Y1 - 2014 A1 - Cochran, B. A1 - Smyth, J.D. JF - Midwest Association for Public Opinion Research Annual Conference CY - Chicago, IL UR - http://www.mapor.org/conferences.html ER - TY - JOUR T1 - I Cheated, but only a Little–Partial Confessions to Unethical Behavior JF - Journal of Personality and Social Psychology Y1 - 2014 A1 - Peer, E. A1 - Acquisti, A. A1 - Shalvi, S. 
VL - 106 ER - TY - JOUR T1 - Identifying Regions based on Flexible User Defined Constraints JF - International Journal of Geographic Information Science Y1 - 2014 A1 - Folch, D. A1 - Spielman, S. E. VL - 28 UR - http://www.tandfonline.com/doi/abs/10.1080/13658816.2013.848986 ER - TY - CONF T1 - Making sense of paradata: Challenges faced and lessons learned T2 - American Association for Public Opinion Research 2014 Annual Conference Y1 - 2014 A1 - Eck, A. A1 - Stuart, L. A1 - Atkin, G. A1 - Soh, L-K A1 - McCutcheon, A.L. A1 - Belli, R.F. JF - American Association for Public Opinion Research 2014 Annual Conference CY - Anaheim, CA UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - CONF T1 - Making Sense of Paradata: Challenges Faced and Lessons Learned T2 - UNL/SRAM/Gallup Symposium Y1 - 2014 A1 - Eck, A. A1 - Stuart, L. A1 - Atkin, G. A1 - Soh, L-K A1 - McCutcheon, A.L. A1 - Belli, R.F. JF - UNL/SRAM/Gallup Symposium CY - Omaha, NE UR - http://grc.unl.edu/unlsramgallup-symposium ER - TY - RPRT T1 - NCRN Meeting Fall 2014: Constrained Smoothed Bayesian Estimation Y1 - 2014 A1 - Steorts, Rebecca A1 - Shalizi, Cosma AB - NCRN Meeting Fall 2014: Constrained Smoothed Bayesian Estimation Steorts, Rebecca; Shalizi, Cosma Presentation from NCRN Fall 2014 meeting PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/37748 ER - TY - RPRT T1 - NCRN Meeting Fall 2014: Decomposing Medical-Care Expenditure Growth Y1 - 2014 A1 - Dunn, Abe A1 - Liebman, Eli A1 - Shapiro, Adam AB - NCRN Meeting Fall 2014: Decomposing Medical-Care Expenditure Growth Dunn, Abe; Liebman, Eli; Shapiro, Adam PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/37411 ER - TY - RPRT T1 - NCRN Meeting Fall 2014: Designer Census Geographies Y1 - 2014 A1 - Spielman, Seth AB - NCRN Meeting Fall 2014: Designer Census Geographies Spielman, Seth Presentation from NCRN Fall 2014 meeting PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/37747 ER - TY - 
RPRT T1 - NCRN Meeting Fall 2014: Respondent-Driven Sampling Estimation and the National HIV Behavioral Surveillance System Y1 - 2014 A1 - Spiller, Michael (Trey) AB - NCRN Meeting Fall 2014: Respondent-Driven Sampling Estimation and the National HIV Behavioral Surveillance System Spiller, Michael (Trey) PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/37414 ER - TY - RPRT T1 - A New Method for Protecting Interrelated Time Series with Bayesian Prior Distributions and Synthetic Data Y1 - 2014 A1 - Schneider, Matthew J. A1 - Abowd, John M. AB - A New Method for Protecting Interrelated Time Series with Bayesian Prior Distributions and Synthetic Data Schneider, Matthew J.; Abowd, John M. Organizations disseminate statistical summaries of administrative data via the Web for unrestricted public use. They balance the trade-off between confidentiality protection and inference quality. Recent developments in disclosure avoidance techniques include the incorporation of synthetic data, which capture the essential features of underlying data by releasing altered data generated from a posterior predictive distribution. The United States Census Bureau collects millions of interrelated time series micro-data that are hierarchical and contain many zeros and suppressions. Rule-based disclosure avoidance techniques often require the suppression of count data for small magnitudes and the modification of data based on a small number of entities. Motivated by this problem, we use zero-inflated extensions of Bayesian Generalized Linear Mixed Models (BGLMM) with privacy-preserving prior distributions to develop methods for protecting and releasing synthetic data from time series about thousands of small groups of entities without suppression based on the magnitudes or number of entities. We find that as the prior distributions of the variance components in the BGLMM become more precise toward zero, confidentiality protection increases and inference quality deteriorates.
We evaluate our methodology using a strict privacy measure, empirical differential privacy, and a newly defined risk measure, Probability of Range Identification (PoRI), which directly measures attribute disclosure risk. We illustrate our results with the U.S. Census Bureau’s Quarterly Workforce Indicators. PB - Cornell University UR - http://hdl.handle.net/1813/40828 ER - TY - JOUR T1 - The Past, Present, and Future of Geodemographic Research in the United States and United Kingdom JF - The Professional Geographer Y1 - 2014 A1 - Singleton, A. A1 - Spielman, S. E. VL - 4 ER - TY - RPRT T1 - Reducing Uncertainty in the American Community Survey through Data-Driven Regionalization Y1 - 2014 A1 - Spielman, Seth A1 - Folch, David AB - Reducing Uncertainty in the American Community Survey through Data-Driven Regionalization Spielman, Seth; Folch, David The American Community Survey (ACS) is the largest US survey of households and is the principal source for neighborhood scale information about the US population and economy. The ACS is used to allocate billions in federal spending and is a critical input to social scientific research in the US. However, estimates from the ACS can be highly unreliable. For example, in over 72% of census tracts the estimated number of children under 5 in poverty has a margin of error greater than the estimate. Uncertainty of this magnitude complicates the use of social data in policy making, research, and governance. This article develops a spatial optimization algorithm that is capable of reducing the margins of error in survey data via the creation of new composite geographies, a process called regionalization. Regionalization is a complex combinatorial problem. Here, rather than focusing on the technical aspects of regionalization, we demonstrate how to use a purpose-built open source regionalization algorithm to post-process survey data in order to reduce the margins of error to some user-specified threshold.
PB - University of Colorado at Boulder / University of Tennessee UR - http://hdl.handle.net/1813/38121 ER - TY - CONF T1 - SMERED: A Bayesian Approach to Graphical Record Linkage and De-duplication T2 - AISTATS 2014 Proceedings, JMLR Y1 - 2014 A1 - Steorts, R. A1 - Hall, R. A1 - Fienberg, S. E. JF - AISTATS 2014 Proceedings, JMLR PB - W&CP VL - 33 ER - TY - RPRT T1 - Sorting Between and Within Industries: A Testable Model of Assortative Matching Y1 - 2014 A1 - Abowd, John M. A1 - Kramarz, Francis A1 - Perez-Duarte, Sebastien A1 - Schmutte, Ian M. AB - We test Shimer's (2005) theory of the sorting of workers between and within industrial sectors based on directed search with coordination frictions, deliberately maintaining its static general equilibrium framework. We fit the model to sector-specific wage, vacancy and output data, including publicly-available statistics that characterize the distribution of worker and employer wage heterogeneity across sectors. Our empirical method is general and can be applied to a broad class of assignment models. The results indicate that industries are the loci of sorting: more productive workers are employed in more productive industries. The evidence confirms that strong assortative matching can be present even when worker and employer components of wage heterogeneity are weakly correlated. PB - Cornell University UR - http://hdl.handle.net/1813/52607 ER - TY - JOUR T1 - Spatial Collective Intelligence? Accuracy, Credibility in Crowdsourced Data JF - Cartography and Geographic Information Science Y1 - 2014 A1 - Spielman, S. E. 
VL - 41 UR - http://go.galegroup.com/ps/i.do?action=interpret&id=GALE|A361943563&v=2.1&u=nysl_sc_cornl&it=r&p=AONE&sw=w&authCount=1 IS - 2 ER - TY - CONF T1 - Supporting Planners' Work with Uncertain Demographic Data T2 - GIScience Workshop on Uncertainty Visualization Y1 - 2014 A1 - Griffin, A. L. A1 - Spielman, S. E. A1 - Jurjevich, J. A1 - Merrick, M. A1 - Nagle, N. N. A1 - Folch, D. C. JF - GIScience Workshop on Uncertainty Visualization VL - 23 UR - http://cognitivegiscience.psu.edu/uncertainty2014/papers/griffin_demographic.pdf ER - TY - CONF T1 - Supporting Planners' Work with Uncertain Demographic Data T2 - Proceedings of IEEE VIS 2014 Y1 - 2014 A1 - Griffin, A. L. A1 - Spielman, S. E. A1 - Nagle, N. N. A1 - Jurjevich, J. A1 - Merrick, M. A1 - Folch, D. C. JF - Proceedings of IEEE VIS 2014 PB - Proceedings of IEEE VIS 2014 UR - http://cognitivegiscience.psu.edu/uncertainty2014/papers/griffin_demographic.pdf ER - TY - CONF T1 - Survey Informatics: Ideas, Opportunities, and Discussions T2 - UNL/SRAM/Gallup Symposium Y1 - 2014 A1 - Eck, A. A1 - Soh, L-K JF - UNL/SRAM/Gallup Symposium CY - Omaha, NE UR - http://grc.unl.edu/unlsramgallup-symposium ER - TY - RPRT T1 - Uncertain Uncertainty: Spatial Variation in the Quality of American Community Survey Estimates Y1 - 2014 A1 - Folch, David C. A1 - Arribas-Bel, Daniel A1 - Koschinsky, Julia A1 - Spielman, Seth E. AB - The U.S. Census Bureau's American Community Survey (ACS) is the foundation of social science research, much federal resource allocation and the development of public policy and private sector decisions. However, the high uncertainty associated with some of the ACS's most frequently used estimates can jeopardize the accuracy of inferences based on these data. 
While there is a high-level understanding in the research community that problems exist in the data, the sources and implications of these problems have been largely overlooked. Using 2006-2010 ACS median household income at the census tract scale as the test case (where a third of small-area estimates have higher-than-recommended errors), we explore the patterns in the uncertainty of ACS data. We consider various potential sources of uncertainty in the data, ranging from response level to geographic location to characteristics of the place. We find that there exist systematic patterns in the uncertainty in both the spatial and attribute dimensions. Using a regression framework, we identify the factors that are most frequently correlated with the error at national, regional and metropolitan area scales, and find these correlates are not consistent across the various locations tested. The implication is that data quality varies in different places, making cross-sectional analysis both within and across regions less reliable. We also present general advice for data users and potential solutions to the challenges identified. PB - University of Colorado at Boulder / University of Tennessee UR - http://hdl.handle.net/1813/38122 ER - TY - JOUR T1 - An updated method for calculating income and payroll taxes from PSID data using the NBER’s TAXSIM, for PSID survey years 1999 through 2011 JF - Unpublished manuscript, University of Michigan. Accessed May Y1 - 2014 A1 - Kimberlin, Sara A1 - Kim, Jiyoun A1 - Shaefer, Luke AB - This paper describes a method to calculate income and payroll taxes from Panel Study of Income Dynamics data using the NBER’s Internet TAXSIM version 9 (http://users.nber.org/~taxsim/taxsim9/), for PSID survey years 1999, 2001, 2003, 2005, 2007, 2009, and 2011 (tax years n-1). These methods are implemented in two Stata programs, designed to be used with the PSID public-use zipped Main Interview data files: PSID_TAXSIM_1of2.do and PSID_TAXSIM_2of2.do. 
The main program (2of2) was written by Sara Kimberlin (skimberlin@berkeley.edu) and generates all TAXSIM input variables, runs TAXSIM, adjusts tax estimates using additional information available in PSID data, and calculates total PSID family unit taxes. A separate program (1of2) was written by Jiyoon (June) Kim (junekim@umich.edu) in collaboration with Luke Shaefer (lshaefer@umich.edu) to calculate mortgage interest for itemized deductions; this program needs to be run first, before the main program. Jonathan Latner contributed code to use the programs with the PSID zipped data. The overall methods build on the strategy for using TAXSIM with PSID data outlined by Butrica & Burkhauser (1997), with some expansions and modifications. Note that the methods described below are designed to prioritize accuracy of income taxes calculated for low-income households, particularly refundable tax credits such as the Earned Income Tax Credit (EITC) and the Additional Child Tax Credit. Income tax liability is generally low for low-income households, and the amount of refundable tax credits is often substantially larger than tax liabilities for this population. Payroll tax can also be substantial for low-income households. Thus the methods below focus on maximizing accuracy of income tax and payroll tax calculations for low-income families, with less attention to tax items that largely impact higher-income households (e.g. the treatment of capital gains). VL - 6 ER - TY - RPRT T1 - Using Social Media to Measure Labor Market Flows Y1 - 2014 A1 - Antenucci, Dolan A1 - Cafarella, Michael J A1 - Levenstein, Margaret C. A1 - Ré, Christopher A1 - Shapiro, Matthew UR - http://www-personal.umich.edu/~shapiro/papers/LaborFlowsSocialMedia.pdf ER - TY - CONF T1 - Would a Privacy Fundamentalist Sell their DNA for $1000... if Nothing Bad Happened Thereafter? 
A Study of the Westin Categories, Behavior Intentions, and Consequences T2 - Proceedings of the Tenth Symposium on Usable Privacy and Security (SOUPS) Y1 - 2014 A1 - Woodruff, A. A1 - Pihur, V. A1 - Acquisti, A. A1 - Consolvo, S. A1 - Schmidt, L. A1 - Brandimarte, L. JF - Proceedings of the Tenth Symposium on Usable Privacy and Security (SOUPS) PB - ACM CY - New York, NY UR - https://www.usenix.org/conference/soups2014/proceedings/presentation/woodruff N1 - IAPP SOUPS Privacy Award Winner ER - TY - RPRT T1 - A Bayesian Approach to Graphical Record Linkage and De-duplication Y1 - 2013 A1 - Steorts, Rebecca C. A1 - Hall, Rob A1 - Fienberg, Stephen E. AB - We propose an unsupervised approach for linking records across arbitrarily many files, while simultaneously detecting duplicate records within files. Our key innovation involves the representation of the pattern of links between records as a bipartite graph, in which records are directly linked to latent true individuals, and only indirectly linked to other records. This flexible representation of the linkage structure naturally allows us to estimate the attributes of the unique observable people in the population, calculate transitive linkage probabilities across records (and represent this visually), and propagate the uncertainty of record linkage into later analyses. Our method makes it particularly easy to integrate record linkage with post-processing procedures such as logistic regression, capture–recapture, etc. Our linkage structure lends itself to an efficient, linear-time, hybrid Markov chain Monte Carlo algorithm, which overcomes many obstacles encountered by previous record linkage approaches, despite the high-dimensional parameter space. 
We illustrate our method using longitudinal data from the National Long Term Care Survey and with data from the Italian Survey on Household and Wealth, where we assess the accuracy of our method and show it to be better in terms of error rates and empirical scalability than other approaches in the literature. Supplementary materials for this article are available online. JF - arXiv UR - https://arxiv.org/abs/1312.4645 ER - TY - RPRT T1 - b-Bit Minwise Hashing in Practice Y1 - 2013 A1 - Li, Ping A1 - Shrivastava, Anshumali A1 - König, Arnd Christian AB - Minwise hashing is a standard technique in the context of search for approximating set similarities. The recent work [26, 32] demonstrated a potential use of b-bit minwise hashing [23, 24] for efficient search and learning on massive, high-dimensional, binary data (which are typical for many applications in Web search and text mining). In this paper, we focus on a number of critical issues which must be addressed before one can apply b-bit minwise hashing to the volumes of data often used in industrial applications. PB - Cornell University UR - http://hdl.handle.net/1813/37986 ER - TY - CONF T1 - b-Bit Minwise Hashing in Practice T2 - Internetware'13 Y1 - 2013 A1 - Ping Li A1 - Anshumali Shrivastava A1 - König, Arnd Christian AB - Minwise hashing is a standard technique in the context of search for approximating set similarities. The recent work [26, 32] demonstrated a potential use of b-bit minwise hashing [23, 24] for efficient search and learning on massive, high-dimensional, binary data (which are typical for many applications in Web search and text mining). In this paper, we focus on a number of critical issues which must be addressed before one can apply b-bit minwise hashing to the volumes of data often used in industrial applications. 
Minwise hashing requires an expensive preprocessing step that computes k (e.g., 500) minimal values after applying the corresponding permutations for each data vector. We developed a parallelization scheme using GPUs and observed that the preprocessing time can be reduced by a factor of 20 to 80 and becomes substantially smaller than the data loading time. Reducing the preprocessing time is highly beneficial in practice, e.g., for duplicate Web page detection (where minwise hashing is a major step in the crawling pipeline) or for increasing the testing speed of online classifiers. Another critical issue is that for very large data sets it becomes impossible to store a (fully) random permutation matrix, due to its space requirements. Our paper is the first study to demonstrate that b-bit minwise hashing implemented using simple hash functions, e.g., the 2-universal (2U) and 4-universal (4U) hash families, can produce very similar learning results as using fully random permutations. Experiments on datasets of up to 200GB are presented. JF - Internetware'13 UR - http://www.nudt.edu.cn/internetware2013/ ER - TY - CONF T1 - Beyond Pairwise: Provably Fast Algorithms for Approximate K-Way Similarity Search T2 - Neural Information Processing Systems (NIPS) Y1 - 2013 A1 - Anshumali Shrivastava A1 - Ping Li JF - Neural Information Processing Systems (NIPS) ER - TY - CONF T1 - The Co-Evolution of Residential Segregation and the Built Environment at the Turn of the 20th Century: A Schelling Model T2 - Transactions in GIS Y1 - 2013 A1 - S.E. Spielman A1 - Patrick Harrison JF - Transactions in GIS ER - TY - JOUR T1 - Do single mothers in the United States use the Earned Income Tax Credit to reduce unsecured debt? JF - Review of Economics of the Household Y1 - 2013 A1 - Shaefer, H. Luke A1 - Song, Xiaoqing A1 - Williams Shanks, Trina R. KW - Earned Income Tax Credit Single Mothers Unsecured Debt AB -

The Earned Income Tax Credit (EITC) is a refundable credit for low income workers mainly targeted at families with children. This study uses the Survey of Income and Program Participation’s topical modules on Assets and Liabilities to examine associations between the EITC expansions during the early 1990s and the unsecured debt of the households of single mothers. We use two difference-in-differences comparisons over the study period 1988–1999, first comparing single mothers to single childless women, and then comparing single mothers with two or more children to single mothers with exactly one child. In both cases we find that the EITC expansions are associated with a relative decline in the unsecured debt of affected households of single mothers. While not direct evidence of a causal relationship, this is suggestive evidence that single mothers may have used part of their EITC to limit the growth of their unsecured debt during this period.

N1 - NCRN ER - TY - JOUR T1 - On estimation of mean squared errors of benchmarked and empirical Bayes estimators JF - Statistica Sinica Y1 - 2013 A1 - Rebecca C. Steorts A1 - Malay Ghosh VL - 23 ER - TY - CONF T1 - Examining the relationship between error and behavior in the American Time Use Survey using audit trail paradata T2 - American Association for Public Opinion Research 2013 Annual Conference Y1 - 2013 A1 - Ruther, N. A1 - T. Al Baghal A1 - A. Eck A1 - L. Stuart A1 - L. Phillips A1 - R. Belli A1 - Soh, L-K JF - American Association for Public Opinion Research 2013 Annual Conference CY - Boston, MA UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - RPRT T1 - Fast Near Neighbor Search in High-Dimensional Binary Data Y1 - 2013 A1 - Shrivastava, Anshumali A1 - Li, Ping AB - Numerous applications in search, databases, machine learning, and computer vision can benefit from efficient algorithms for near neighbor search. This paper proposes a simple framework for fast near neighbor search in high-dimensional binary data, which are common in practice (e.g., text). We develop a very simple and effective strategy for sub-linear time near neighbor search, by creating hash tables directly using the bits generated by b-bit minwise hashing. The advantages of our method are demonstrated through thorough comparisons with two strong baselines: spectral hashing and sign (1-bit) random projections. PB - Cornell University UR - http://hdl.handle.net/1813/37987 ER - TY - JOUR T1 - From Facebook Regrets to Facebook Privacy Nudges JF - Ohio State Law Journal Y1 - 2013 A1 - Wang, Y. A1 - Leon, P. G. A1 - Chen, X. A1 - Komanduri, S. A1 - Norcie, G. A1 - Scott, K. A1 - Acquisti, A. A1 - Cranor, L. F. A1 - Sadeh, N. 
N1 - Invited paper ER - TY - JOUR T1 - A Generalized Fellegi-Sunter Framework for Multiple Record Linkage with Application to Homicide Record Systems JF - Journal of the American Statistical Association Y1 - 2013 A1 - Sadinle, M. A1 - Fienberg, S. E. VL - 108 UR - http://dx.doi.org/10.1080/01621459.2012.757231 ER - TY - JOUR T1 - Handling Attrition in Longitudinal Studies: The Case for Refreshment Samples JF - Statist. Sci. Y1 - 2013 A1 - Deng, Yiting A1 - Hillygus, D. Sunshine A1 - Reiter, Jerome P. A1 - Si, Yajuan A1 - Zheng, Siyu AB - Panel studies typically suffer from attrition, which reduces sample size and can result in biased inferences. It is impossible to know whether or not the attrition causes bias from the observed panel data alone. Refreshment samples—new, randomly sampled respondents given the questionnaire at the same time as a subsequent wave of the panel—offer information that can be used to diagnose and adjust for bias due to attrition. We review and bolster the case for the use of refreshment samples in panel studies. We include examples of both a fully Bayesian approach for analyzing the concatenated panel and refreshment data, and a multiple imputation approach for analyzing only the original panel. For the latter, we document a positive bias in the usual multiple imputation variance estimator. We present models appropriate for three waves and two refreshment samples, including nonterminal attrition. We illustrate the three-wave analysis using the 2007–2008 Associated Press–Yahoo! News Election Poll. VL - 28 UR - http://dx.doi.org/10.1214/13-STS414 ER - TY - JOUR T1 - Hierarchical Statistical Modeling of Big Spatial Datasets Using the Exponential Family of Distributions JF - Spatial Statistics Y1 - 2013 A1 - Sengupta, A. A1 - Cressie, N. 
KW - EM algorithm KW - Empirical Bayes KW - Geostatistical process KW - Maximum likelihood estimation KW - MCMC KW - SRE model VL - 4 UR - http://www.sciencedirect.com/science/article/pii/S2211675313000055 ER - TY - JOUR T1 - Identifying Neighborhoods Using High Resolution Population Data JF - Annals of the Association of American Geographers Y1 - 2013 A1 - S.E. Spielman A1 - J. Logan VL - 103 ER - TY - JOUR T1 - Neighborhood contexts, health, and behavior: understanding the role of scale and residential sorting JF - Environment and Planning B Y1 - 2013 A1 - Spielman, S. E. A1 - Linkletter, C. A1 - Yoo, E.-H. VL - 3 ER - TY - JOUR T1 - Nonparametric Bayesian multiple imputation for incomplete categorical variables in large-scale assessment surveys JF - Journal of Educational and Behavioral Statistics Y1 - 2013 A1 - Si, Y. A1 - Reiter, J.P. VL - 38 UR - http://www.stat.duke.edu/~jerry/Papers/StatinMed14.pdf ER - TY - CONF T1 - Predicting the occurrence of respondent retrieval strategies in calendar interviewing: The quality of autobiographical recall in surveys T2 - Biennial conference of the Society for Applied Research in Memory and Cognition Y1 - 2013 A1 - Belli, R.F. A1 - Miller, L.D. A1 - Soh, L-K A1 - T. Al Baghal JF - Biennial conference of the Society for Applied Research in Memory and Cognition CY - Rotterdam, Netherlands UR - http://static1.squarespace.com/static/504170d6e4b0b97fe5a59760/t/52457a8be4b0012b7a5f462a/1380285067247/SARMAC_X_PaperJune27.pdf ER - TY - CONF T1 - Predicting the occurrence of respondent retrieval strategies in calendar interviewing: The quality of retrospective reports T2 - American Association for Public Opinion Research 2013 Annual Conference Y1 - 2013 A1 - Belli, R.F. A1 - Miller, L.D. A1 - Soh, L-K A1 - T. 
Al Baghal JF - American Association for Public Opinion Research 2013 Annual Conference CY - Boston, MA UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - RPRT T1 - Reconsidering the Consequences of Worker Displacements: Survey versus Administrative Measurements Y1 - 2013 A1 - Flaaen, Aaron A1 - Shapiro, Matthew A1 - Isaac Sorkin AB - Displaced workers suffer persistent earnings losses. This stark finding has been established by following workers in administrative data after mass layoffs under the presumption that these are involuntary job losses owing to economic distress. Using linked survey and administrative data, this paper examines this presumption by matching worker-supplied reasons for separations with what is happening at the firm. The paper documents substantially different earnings dynamics in mass layoffs depending on the reason the worker gives for the separation. Using a new methodology for accounting for the increase in the probability of separation among all types of survey response during a mass layoff, the paper finds earnings loss estimates that are surprisingly close to those using only administrative data. Finally, the survey-administrative link allows the decomposition of earnings losses due to subsequent nonemployment into non-participation and unemployment. Including the zero earnings of those identified as being unemployed substantially increases the estimate of earnings losses. PB - University of Michigan UR - http://www-personal.umich.edu/~shapiro/papers/ReconsideringDisplacements.pdf ER - TY - JOUR T1 - Ringtail: Feature Selection for Easier Nowcasting JF - WebDB Y1 - 2013 A1 - Antenucci, Dolan A1 - Cafarella, Michael J A1 - Levenstein, Margaret C. 
A1 - Ré, Christopher A1 - Shapiro, Matthew AB - In recent years, social media “nowcasting”—the use of online user activity to predict various ongoing real-world social phenomena—has become a popular research topic; yet, this popularity has not led to widespread actual practice. We believe a major obstacle to widespread adoption is the feature selection problem. Typical nowcasting systems require the user to choose a set of relevant social media objects, which is difficult, time-consuming, and can imply a statistical background that users may not have. We propose Ringtail, which helps the user choose relevant social media signals. It takes a single user input string (e.g., unemployment) and yields a number of relevant signals the user can use to build a nowcasting model. We evaluate Ringtail on six different topics using a corpus of almost 6 billion tweets, showing that features chosen by Ringtail in a wholly-automated way are better or as good as those from a human and substantially better if Ringtail receives some human assistance. In all cases, Ringtail reduces the burden on the user. UR - http://www.cs.stanford.edu/people/chrismre/papers/webdb_ringtail.pdf ER - TY - JOUR T1 - Rising extreme poverty in the United States and the response of means-tested transfers. JF - Social Service Review Y1 - 2013 A1 - H. Luke Shaefer A1 - Edin, K. AB - This study documents an increase in the prevalence of extreme poverty among US households with children between 1996 and 2011 and assesses the response of major federal means-tested transfer programs. Extreme poverty is defined using a World Bank metric of global poverty: $2 or less, per person, per day. Using the 1996–2008 panels of the Survey of Income and Program Participation (SIPP), we estimate that in mid-2011, 1.65 million households with 3.55 million children were living in extreme poverty in a given month, based on cash income, constituting 4.3 percent of all nonelderly households with children. 
The prevalence of extreme poverty has risen sharply since 1996, particularly among those most affected by the 1996 welfare reform. Adding SNAP benefits to household income reduces the number of extremely poor households with children by 48.0 percent in mid-2011. Adding SNAP, refundable tax credits, and housing subsidies reduces it by 62.8 percent. VL - 87 UR - http://www.jstor.org/stable/10.1086/671012 IS - 2 ER - TY - JOUR T1 - Two-stage Bayesian benchmarking as applied to small area estimation JF - TEST Y1 - 2013 A1 - Rebecca C. Steorts A1 - Malay Ghosh KW - small area estimation VL - 22 IS - 4 ER - TY - THES T1 - User Modeling via Machine Learning and Rule-based Reasoning to Understand and Predict Errors in Survey Systems Y1 - 2013 A1 - Stuart, Leonard Cleve PB - University of Nebraska-Lincoln UR - http://digitalcommons.unl.edu/computerscidiss/70/ ER - TY - JOUR T1 - Using High Resolution Population Data to Identify Neighborhoods and Determine their Boundaries JF - Annals of the Association of American Geographers Y1 - 2013 A1 - Spielman, S. E. A1 - Logan, J. VL - 103 UR - http://www.tandfonline.com/doi/abs/10.1080/00045608.2012.685049 ER - TY - CONF T1 - What are you doing now?: Audit trails, Activity level responses and error in the American Time Use Survey T2 - American Association for Public Opinion Research Y1 - 2013 A1 - T. Al Baghal A1 - Phillips, A.L. A1 - Ruther, N. A1 - Belli, R.F. A1 - Stuart, L. A1 - Eck, A. A1 - Soh, L-K JF - American Association for Public Opinion Research CY - Boston, MA UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - CONF T1 - Bayesian Parametric and Nonparametric Inference for Multiple Record Linkage T2 - Modern Nonparametric Methods in Machine Learning Workshop Y1 - 2012 A1 - Hall, R. A1 - Steorts, R. A1 - Fienberg, S. E. 
JF - Modern Nonparametric Methods in Machine Learning Workshop PB - NIPS UR - http://www.stat.cmu.edu/NCRN/PUBLIC/files/beka_nips_finalsub4.pdf ER - TY - CONF T1 - On Estimation of Mean Squared Errors of Benchmarked and Empirical Bayes Estimators T2 - 2012 Joint Statistical Meetings Y1 - 2012 A1 - Rebecca C. Steorts A1 - Malay Ghosh JF - 2012 Joint Statistical Meetings CY - San Diego, CA ER - TY - CONF T1 - Exploring interviewer and respondent interactions: An innovative behavior coding approach T2 - Midwest Association for Public Opinion Research 2012 Annual Conference Y1 - 2012 A1 - Walton, L. A1 - Stange, M. A1 - Powell, R. A1 - Belli, R.F. JF - Midwest Association for Public Opinion Research 2012 Annual Conference CY - Chicago, IL UR - http://www.mapor.org/conferences.html ER - TY - ABST T1 - Extreme Poverty in the United States, 1996 to 2011 Y1 - 2012 A1 - Shaefer, H. Luke A1 - Edin, Kathryn PB - University of Michigan UR - http://www.npc.umich.edu/publications/policy_briefs/brief28/policybrief28.pdf N1 - NCRN ER - TY - CONF T1 - Fast Multi-task Learning for Query Spelling Correction T2 - The 21st ACM International Conference on Information and Knowledge Management (CIKM 2012) Y1 - 2012 A1 - Xu Sun A1 - Anshumali Shrivastava A1 - Ping Li JF - The 21st ACM International Conference on Information and Knowledge Management (CIKM 2012) UR - http://dx.doi.org/10.1145/2396761.2396800 ER - TY - CONF T1 - Fast Near Neighbor Search in High-Dimensional Binary Data T2 - The European Conference on Machine Learning (ECML 2012) Y1 - 2012 A1 - Anshumali Shrivastava A1 - Ping Li JF - The European Conference on Machine Learning (ECML 2012) ER - TY - RPRT T1 - A Generalized Fellegi-Sunter Framework for Multiple Record Linkage with Application to Homicide Records Systems Y1 - 2012 A1 - Mauricio Sadinle A1 - Stephen E. 
Fienberg JF - arXiv UR - https://arxiv.org/abs/1205.3217 ER - TY - CONF T1 - GPU-based minwise hashing T2 - Proceedings of the 21st World Wide Web Conference (WWW 2012) (Companion Volume) Y1 - 2012 A1 - Ping Li A1 - Anshumali Shrivastava A1 - Arnd Christian König JF - Proceedings of the 21st World Wide Web Conference (WWW 2012) (Companion Volume) UR - http://doi.acm.org/10.1145/2187980.2188129 ER - TY - ABST T1 - Hierarchical Statistical Modeling of Big Spatial Datasets Using the Exponential Family of Distributions Y1 - 2012 A1 - Sengupta, A. A1 - Cressie, N. PB - The Ohio State University ER - TY - CONF T1 - Logit-Based Confidence Intervals for Single Capture-Recapture Estimation T2 - American Statistical Association Pittsburgh Chapter Banquet Y1 - 2012 A1 - Mauricio Sadinle JF - American Statistical Association Pittsburgh Chapter Banquet CY - Pittsburgh, PA N1 - April 9, 2012 ER - TY - CONF T1 - Maintaining Quality in the Face of Rapid Program Expansion T2 - 2012 Joint Statistical Meetings Y1 - 2012 A1 - Cosma Shalizi A1 - Rebecca Nugent JF - 2012 Joint Statistical Meetings CY - San Diego, CA ER - TY - CONF T1 - Multi-File Record Linkage Using a Generalized Fellegi-Sunter Framework T2 - Conference Presentation Classification Society Annual Meeting, Carnegie Mellon University Y1 - 2012 A1 - Mauricio Sadinle JF - Conference Presentation Classification Society Annual Meeting, Carnegie Mellon University ER - TY - CONF T1 - Query spelling correction using multi-task learning T2 - Proceedings of the 21st World Wide Web Conference (WWW 2012)(Companion Volume) Y1 - 2012 A1 - Xu Sun A1 - Anshumali Shrivastava A1 - Ping Li JF - Proceedings of the 21st World Wide Web Conference (WWW 2012)(Companion Volume) UR - http://doi.acm.org/10.1145/2187980.2188153 ER - TY - JOUR T1 - Testing for Membership to the IFRA and the NBU Classes of Distributions JF - Journal of Machine Learning Research - Proceedings Track for the Fifteenth International Conference on 
Artificial Intelligence and Statistics (AISTATS 2012) Y1 - 2012 A1 - Radhendushka Srivastava A1 - Ping Li A1 - Debasis Sengupta VL - 22 UR - http://jmlr.csail.mit.edu/proceedings/papers/v22/srivastava12.html ER - TY - CONF T1 - Thinking inside the box: Mapping the microstructure of urban environment (and why it matters) T2 - AutoCarto 2012 Y1 - 2012 A1 - Seth Spielman A1 - David Folch A1 - John Logan A1 - Nicholas Nagle KW - cartography JF - AutoCarto 2012 CY - Columbus, Ohio UR - http://www.cartogis.org/docs/proceedings/2012/Spielman_etal_AutoCarto2012.pdf ER - TY - JOUR T1 - The welfare reforms of the 1990s and the stratification of material well-being among low-income households with children JF - Children and Youth Services Review Y1 - 2012 A1 - Shaefer, H. Luke A1 - Ybarra, Marci AB -

We examine the incidence of material hardship experienced by low-income households with children, before and after the major changes to U.S. anti-poverty programs during the 1990s. We use the Survey of Income and Program Participation (SIPP) to examine a series of measures of household material hardship that were collected in the years 1992, 1995, 1998, 2003 and 2005. We stratify our sample to differentiate between the 1) deeply poor (<50% of poverty), who saw a decline in public assistance over this period; and two groups that saw some forms of public assistance increase: 2) other poor households (50–99% of poverty), and 3) the near poor (100–150% of poverty). We report bivariate trends over the study period, as well as presenting multivariate difference-in-differences estimates. We find suggestive evidence that material hardship—in the form of difficulty meeting essential household expenses, and falling behind on utilities costs—has generally increased among the deeply poor but has remained roughly the same for the middle group (50–99% of poverty), and decreased among the near poor (100–150% of poverty). Multivariate difference-in-differences estimates suggest that these trends have resulted in intensified stratification of the material well-being of low-income households with children.

VL - 34 N1 - NCRN ER - TY - CONF T1 - Approaches to Multiple Record Linkage T2 - Proceedings of the 58th World Statistical Congress Y1 - 2011 A1 - Sadinle, M. A1 - Hall, R. A1 - Fienberg, S. E. JF - Proceedings of the 58th World Statistical Congress PB - International Statistical Institute CY - Dublin UR - http://2011.isiproceedings.org/papers/450092.pdf ER - TY - RPRT T1 - Do Single Mothers in the United States use the Earned Income Tax Credit to Reduce Unsecured Debt? Y1 - 2011 A1 - Shaefer, H. Luke A1 - Song, Xiaoqing A1 - Williams Shanks, Trina R. AB - The Earned Income Tax Credit (EITC) is a refundable credit for low-income workers that is mainly targeted at families with children. This study uses the Survey of Income and Program Participation’s (SIPP) topical modules on Assets & Liabilities to examine the effects of EITC expansions during the early 1990s on the unsecured debt of the households of single mothers. We use two difference-in-differences comparisons over the study period 1988 to 1999, first comparing single mothers to single childless women, and then comparing single mothers with two or more children to single mothers with exactly one child. In both cases we find that the EITC expansions are associated with a relative decline in the unsecured debt of affected households of single mothers. This suggests that single mothers may have used part of their EITC to limit the growth of their unsecured debt during this period. PB - University of Michigan UR - http://hdl.handle.net/1813/34516 ER - TY - ABST T1 - Are Self-Description Scales Better than Agree/Disagree Scales in Mail and Telephone Surveys? 
Y1 - 0 A1 - Timbrook, Jerry A1 - Smyth, Jolene D. A1 - Olson, Kristen ER - TY - JOUR T1 - Bayesian estimation of bipartite matchings for record linkage JF - Journal of the American Statistical Association Y1 - 0 A1 - Mauricio Sadinle AB - The bipartite record linkage task consists of merging two disparate datafiles containing information on two overlapping sets of entities. This is non-trivial in the absence of unique identifiers, and it is important for a wide variety of applications, given that it needs to be solved whenever we have to combine information from different sources. Most statistical techniques currently used for record linkage are derived from a seminal paper by Fellegi and Sunter (1969). These techniques usually assume independence in the matching statuses of record pairs to derive estimation procedures and optimal point estimators. We argue that this independence assumption is unreasonable and instead target a bipartite matching between the two datafiles as our parameter of interest. Bayesian implementations allow us to quantify uncertainty on the matching decisions and derive a variety of point estimators using different loss functions. We propose partial Bayes estimates that allow uncertain parts of the bipartite matching to be left unresolved. We evaluate our approach to record linkage using a variety of challenging scenarios and show that it outperforms the traditional methodology. We illustrate the advantages of our methods merging two datafiles on casualties from the civil war of El Salvador. ER - TY - JOUR T1 - Biomass prediction using density dependent diameter distribution models JF - Annals of Applied Statistics Y1 - 0 A1 - Schliep, E.M. A1 - A.E. Gelfand A1 - J.S. Clark A1 - B.J. Tomasek AB - Prediction of aboveground biomass, particularly at large spatial scales, is necessary for estimating global-scale carbon sequestration. Since biomass can be measured only by sacrificing trees, total biomass on plots is never observed. 
Rather, allometric equations are used to convert individual tree diameter to individual biomass, perhaps with noise. The values for all trees on a plot are then summed to obtain a derived total biomass for the plot. Then, with derived total biomasses for a collection of plots, regression models, using appropriate environmental covariates, are employed to attempt explanation and prediction. Not surprisingly, when out-of-sample validation is examined, such a model will predict total biomass well for holdout data because it is obtained using exactly the same derived approach. Apart from the somewhat circular nature of the regression approach, it also fails to employ the actual observed plot level response data. At each plot, we observe a random number of trees, each with an associated diameter, producing a sample of diameters. A model based on this random number of tree diameters provides understanding of how environmental regressors explain abundance of individuals, which in turn explains individual diameters. We incorporate density dependence because the distribution of tree diameters over a plot of fixed size depends upon the number of trees on the plot. After fitting this model, we can obtain predictive distributions for individual-level biomass and plot-level total biomass. We show that predictive distributions for plot-level biomass obtained from a density-dependent model for diameters will be much different from predictive distributions using the regression approach. Moreover, they can be more informative for capturing uncertainty than those obtained from modeling derived plot-level biomass directly. We develop a density-dependent diameter distribution model and illustrate with data from the national Forest Inventory and Analysis (FIA) database. We also describe how to scale predictions to larger spatial regions. Our predictions agree (in magnitude) with available wisdom on mean and variation in biomass at the hectare scale. 
VL - 11 UR - https://projecteuclid.org/euclid.aoas/1491616884 IS - 1 ER - TY - ABST T1 - "During the LAST YEAR, Did You...": The Effect of Emphasis in CATI Survey Questions on Data Quality Y1 - 0 A1 - Olson, Kristen A1 - Smyth, Jolene D. ER - TY - ABST T1 - The Effect of Question Characteristics, Respondents and Interviewers on Question Reading Time and Question Reading Behaviors in CATI Surveys Y1 - 0 A1 - Olson, Kristen A1 - Smyth, Jolene A1 - Kirchner, Antje ER - TY - ABST T1 - The Effects of Respondent and Question Characteristics on Respondent Behaviors Y1 - 0 A1 - Ganshert, Amanda A1 - Olson, Kristen A1 - Smyth, Jolene ER - TY - ABST T1 - Going off Script: How Interviewer Behavior Affects Respondent Behaviors in Telephone Surveys Y1 - 0 A1 - Kirchner, Antje A1 - Olson, Kristen A1 - Smyth, Jolene ER - TY - ABST T1 - How do Low Versus High Response Scale Ranges Impact the Administration and Answering of Behavioral Frequency Questions in Telephone Surveys? Y1 - 0 A1 - Sarwar, Mazen A1 - Olson, Kristen A1 - Smyth, Jolene ER - TY - ABST T1 - How do Mismatches Affect Interviewer/Respondent Interactions in the Question/Answer Process? Y1 - 0 A1 - Smyth, Jolene D. A1 - Olson, Kristen ER - TY - ABST T1 - Interviewer Influence on Interviewer-Respondent Interaction During Battery Questions Y1 - 0 A1 - Cochran, Beth A1 - Olson, Kristen A1 - Smyth, Jolene ER - TY - ABST T1 - Response Scales: Effects on Data Quality for Interviewer Administered Surveys Y1 - 0 A1 - Sarwar, Mazen A1 - Olson, Kristen A1 - Smyth, Jolene ER - TY - ABST T1 - Using audit trails to evaluate an event history calendar survey instrument Y1 - 0 A1 - Lee, Jinyoung A1 - Seloske, Ben A1 - Belli, Robert F. ER - TY - ABST T1 - Why do Mobile Interviews Take Longer? 
A Behavior Coding Perspective Y1 - 0 A1 - Timbrook, Jerry A1 - Smyth, Jolene A1 - Olson, Kristen ER - TY - ABST T1 - Working with the SIPP-EHC audit trails: Parallel and sequential retrieval Y1 - 0 A1 - Lee, Jinyoung A1 - Seloske, Ben A1 - Córdova Cazar, Ana Lucía A1 - Eck, Adam A1 - Belli, Robert F. ER -