TY - JOUR T1 - Data fusion for correcting measurement errors Y1 - Submitted A1 - J. P. Reiter A1 - T. Schifeling A1 - M. De Yoreo AB - Often in surveys, key items are subject to measurement errors. Given just the data, it can be difficult to determine the distribution of this error process, and hence to obtain accurate inferences that involve the error-prone variables. In some settings, however, analysts have access to a data source on different individuals with high quality measurements of the error-prone survey items. We present a data fusion framework for leveraging this information to improve inferences in the error-prone survey. The basic idea is to posit models about the rates at which individuals make errors, coupled with models for the values reported when errors are made. This can avoid the unrealistic assumption of conditional independence typically used in data fusion. We apply the approach on the reported values of educational attainments in the American Community Survey, using the National Survey of College Graduates as the high quality data source. In doing so, we account for the informative sampling design used to select the National Survey of College Graduates. We also present a process for assessing the sensitivity of various analyses to different choices for the measurement error models. Supplemental material is available online. ER - TY - JOUR T1 - Sequential identification of nonignorable missing data mechanisms JF - Statistica Sinica Y1 - Submitted A1 - Mauricio Sadinle A1 - Jerome P. Reiter KW - Identification KW - Missing not at random KW - Non-parametric saturated KW - Partial ignorability KW - Sensitivity analysis AB - With nonignorable missing data, likelihood-based inference should be based on the joint distribution of the study variables and their missingness indicators. 
These joint models cannot be estimated from the data alone, thus requiring the analyst to impose restrictions that make the models uniquely obtainable from the distribution of the observed data. We present an approach for constructing classes of identifiable nonignorable missing data models. The main idea is to use a sequence of carefully set up identifying assumptions, whereby we specify potentially different missingness mechanisms for different blocks of variables. We show that the procedure results in models with the desirable property of being non-parametric saturated. ER - TY - JOUR T1 - The Earned Income Tax Credit and Food Insecurity: Who Benefits? Y1 - forthcoming A1 - Shaefer, H.L. A1 - Wilson, R. ER - TY - JOUR T1 - The Response of Consumer Spending to Changes in Gasoline Prices Y1 - forthcoming A1 - Gelman, Michael A1 - Gorodnichenko, Yuriy A1 - Kariv, Shachar A1 - Koustas, Dmitri A1 - Shapiro, Matthew D A1 - Silverman, Daniel A1 - Tadelis, Steven AB - This paper estimates how overall consumer spending responds to changes in gasoline prices. It uses the differential impact across consumers of the sudden, large drop in gasoline prices in 2014 for identification. This estimation strategy is implemented using comprehensive, daily transaction-level data for a large panel of individuals. The estimated marginal propensity to consume (MPC) is approximately one, a higher estimate than estimates found in less comprehensive or well-measured data. This estimate takes into account the elasticity of demand for gasoline and potential slow adjustment to changes in prices. The high MPC implies that changes in gasoline prices have large aggregate effects. ER - TY - JOUR T1 - Sorting Between and Within Industries: A Testable Model of Assortative Matching JF - Annals of Economics and Statistics Y1 - 2018 A1 - John M. Abowd A1 - Francis Kramarz A1 - Sebastien Perez-Duarte A1 - Ian M. 
Schmutte ER - TY - JOUR T1 - Adaptively-Tuned Particle Swarm Optimization with Application to Spatial Design JF - Stat Y1 - 2017 A1 - Simpson, M. A1 - Wikle, C.K. A1 - Holan, S.H. AB - Particle swarm optimization (PSO) algorithms are a class of heuristic optimization algorithms that are attractive for complex optimization problems. We propose using PSO to solve spatial design problems, e.g. choosing new locations to add to an existing monitoring network. Additionally, we introduce two new classes of PSO algorithms that perform well in a wide variety of circumstances, called adaptively tuned PSO and adaptively tuned bare bones PSO. To illustrate these algorithms, we apply them to a common spatial design problem: choosing new locations to add to an existing monitoring network. Specifically, we consider a network in the Houston, TX, area for monitoring ambient ozone levels, which have been linked to out-of-hospital cardiac arrest rates. Published 2017. This article has been contributed to by US Government employees and their work is in the public domain in the USA VL - 6 UR - http://onlinelibrary.wiley.com/doi/10.1002/sta4.142/abstract IS - 1 ER - TY - JOUR T1 - Bayesian estimation of bipartite matchings for record linkage JF - Journal of the American Statistical Association Y1 - 2017 A1 - Mauricio Sadinle KW - Assignment problem KW - Bayes estimate KW - Data matching KW - Fellegi-Sunter decision rule KW - Mixture model KW - Rejection option AB - The bipartite record linkage task consists of merging two disparate datafiles containing information on two overlapping sets of entities. This is non-trivial in the absence of unique identifiers and it is important for a wide variety of applications given that it needs to be solved whenever we have to combine information from different sources. Most statistical techniques currently used for record linkage are derived from a seminal paper by Fellegi and Sunter (1969). 
These techniques usually assume independence in the matching statuses of record pairs to derive estimation procedures and optimal point estimators. We argue that this independence assumption is unreasonable and instead target a bipartite matching between the two datafiles as our parameter of interest. Bayesian implementations allow us to quantify uncertainty on the matching decisions and derive a variety of point estimators using different loss functions. We propose partial Bayes estimates that allow uncertain parts of the bipartite matching to be left unresolved. We evaluate our approach to record linkage using a variety of challenging scenarios and show that it outperforms the traditional methodology. We illustrate the advantages of our methods merging two datafiles on casualties from the civil war of El Salvador. VL - 112 UR - http://amstat.tandfonline.com/doi/abs/10.1080/01621459.2016.1148612 IS - 518 ER - TY - JOUR T1 - Cost-Benefit Analysis for a Quinquennial Census: The 2016 Population Census of South Africa JF - Journal of Official Statistics Y1 - 2017 A1 - Spencer, Bruce D. A1 - May, Julian A1 - Kenyon, Steven A1 - Seeskin, Zachary KW - demographic statistics KW - fiscal allocations KW - loss function KW - population estimates KW - post-censal estimates AB - The question of whether to carry out a quinquennial Census is faced by national statistical offices in increasingly many countries, including Canada, Nigeria, Ireland, Australia, and South Africa. We describe uses and limitations of cost-benefit analysis in this decision problem in the case of the 2016 Census of South Africa. The government of South Africa needed to decide whether to conduct a 2016 Census or to rely on increasingly inaccurate postcensal estimates accounting for births, deaths, and migration since the previous (2011) Census. 
The cost-benefit analysis compared predicted costs of the 2016 Census to the benefits of improved allocation of intergovernmental revenue, which was considered by the government to be a critical use of the 2016 Census, although not the only important benefit. Without the 2016 Census, allocations would be based on population estimates. Accuracy of the postcensal estimates was estimated from the performance of past estimates, and the hypothetical expected reduction in errors in allocation due to the 2016 Census was estimated. A loss function was introduced to quantify the improvement in allocation. With this evidence, the government was able to decide not to conduct the 2016 Census, but instead to improve data and capacity for producing post-censal estimates. VL - 33 SN - 2001-7367 UR - https://www.degruyter.com/view/j/jos.2017.33.issue-1/jos-2017-0013/jos-2017-0013.xml IS - 1 ER - TY - JOUR T1 - Do Interviewer Post-survey Evaluations of Respondents Measure Who Respondents Are or What They Do? A Behavior Coding Study JF - Public Opinion Quarterly Y1 - 2017 A1 - Kirchner, Antje A1 - Olson, Kristen A1 - Smyth, Jolene D. AB - Survey interviewers are often tasked with assessing the quality of respondents’ answers after completing a survey interview. These interviewer observations have been used to proxy for measurement error in interviewer-administered surveys. How interviewers formulate these evaluations and how well they proxy for measurement error has received little empirical attention. According to dual-process theories of impression formation, individuals form impressions about others based on the social categories of the observed person (e.g., sex, race) and individual behaviors observed during an interaction. Although initial impressions start with heuristic, rule-of-thumb evaluations, systematic processing is characterized by extensive incorporation of available evidence. 
In a survey context, if interviewers default to heuristic information processing when evaluating respondent engagement, then we expect their evaluations to be primarily based on respondent characteristics and stereotypes associated with those characteristics. Under systematic processing, on the other hand, interviewers process and evaluate respondents based on observable respondent behaviors occurring during the question-answering process. We use the Work and Leisure Today Survey, including survey data and behavior codes, to examine proxy measures of heuristic and systematic processing by interviewers as predictors of interviewer postsurvey evaluations of respondents’ cooperativeness, interest, friendliness, and talkativeness. Our results indicate that CATI interviewers base their evaluations on actual behaviors during an interview (i.e., systematic processing) rather than perceived characteristics of the respondent or the interviewer (i.e., heuristic processing). These results are reassuring for the many surveys that collect interviewer observations as proxies for data quality. UR - https://doi.org/10.1093/poq/nfx026 ER - TY - RPRT T1 - Effects of a Government-Academic Partnership: Has the NSF-Census Bureau Research Network Helped Secure the Future of the Federal Statistical System? Y1 - 2017 A1 - Weinberg, Daniel A1 - Abowd, John M. A1 - Belli, Robert F. A1 - Cressie, Noel A1 - Folch, David C. A1 - Holan, Scott H. A1 - Levenstein, Margaret C. A1 - Olson, Kristen M. A1 - Reiter, Jerome P. A1 - Shapiro, Matthew D. A1 - Smyth, Jolene A1 - Soh, Leen-Kiat A1 - Spencer, Bruce A1 - Spielman, Seth E. A1 - Vilhuber, Lars A1 - Wikle, Christopher AB -

The National Science Foundation-Census Bureau Research Network (NCRN) was established in 2011 to create interdisciplinary research nodes on methodological questions of interest and significance to the broader research community and to the Federal Statistical System (FSS), particularly the Census Bureau. The activities to date have covered both fundamental and applied statistical research and have focused at least in part on the training of current and future generations of researchers in skills of relevance to surveys and alternative measurement of economic units, households, and persons. This paper discusses some of the key research findings of the eight nodes, organized into six topics: (1) Improving census and survey data collection methods; (2) Using alternative sources of data; (3) Protecting privacy and confidentiality by improving disclosure avoidance; (4) Using spatial and spatio-temporal statistical modeling to improve estimates; (5) Assessing data cost and quality tradeoffs; and (6) Combining information from multiple sources. It also reports on collaborations across nodes and with federal agencies, new software developed, and educational activities and outcomes. The paper concludes with an evaluation of the ability of the FSS to apply the NCRN’s research outcomes and suggests some next steps, as well as the implications of this research-network model for future federal government renewal initiatives. 
This paper began as a May 8, 2015 presentation to the National Academies of Sciences’ Committee on National Statistics by two of the principal investigators of the National Science Foundation-Census Bureau Research Network (NCRN) – John Abowd and the late Stephen Fienberg (Carnegie Mellon University). The authors acknowledge the contributions of the other principal investigators of the NCRN who are not co-authors of the paper (William Block, William Eddy, Alan Karr, Charles Manski, Nicholas Nagle, and Rebecca Nugent), the co-principal investigators, and the comments of Patrick Cantwell, Constance Citro, Adam Eck, Brian Harris-Kojetin, and Eloise Parker. We note with sorrow the deaths of Stephen Fienberg and Allan McCutcheon, two of the original NCRN principal investigators. The principal investigators also wish to acknowledge Cheryl Eavey’s sterling grant administration on behalf of the NSF. The conclusions reached in this paper are not the responsibility of the National Science Foundation (NSF), the Census Bureau, or any of the institutions to which the authors belong.

PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/52650 ER - TY - RPRT T1 - Formal Privacy Models and Title 13 Y1 - 2017 A1 - Nissim, Kobbi A1 - Gasser, Urs A1 - Smith, Adam A1 - Vadhan, Salil A1 - O'Brien, David A1 - Wood, Alexandra AB - A new collaboration between academia and the Census Bureau to further the Bureau’s use of formal privacy models. PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/52164 ER - TY - JOUR T1 - Itemwise conditionally independent nonresponse modeling for incomplete multivariate data JF - Biometrika Y1 - 2017 A1 - M. Sadinle A1 - J.P. Reiter KW - Loglinear model KW - Missing not at random KW - Missingness mechanism KW - Nonignorable KW - Nonparametric saturated KW - Sensitivity analysis AB - We introduce a nonresponse mechanism for multivariate missing data in which each study variable and its nonresponse indicator are conditionally independent given the remaining variables and their nonresponse indicators. This is a nonignorable missingness mechanism, in that nonresponse for any item can depend on values of other items that are themselves missing. We show that, under this itemwise conditionally independent nonresponse assumption, one can define and identify nonparametric saturated classes of joint multivariate models for the study variables and their missingness indicators. We also show how to perform sensitivity analysis to violations of the conditional independence assumptions encoded by this missingness mechanism. Throughout, we illustrate the use of this modeling approach with data analyses. VL - 104 UR - https://doi.org/10.1093/biomet/asw063 IS - 1 ER - TY - JOUR T1 - Itemwise conditionally independent nonresponse modeling for multivariate categorical data JF - Biometrika Y1 - 2017 A1 - Sadinle, M. A1 - Reiter, J. P. 
KW - Identification KW - Missing not at random KW - Non-parametric saturated KW - Partial ignorability KW - Sensitivity analysis AB - With nonignorable missing data, likelihood-based inference should be based on the joint distribution of the study variables and their missingness indicators. These joint models cannot be estimated from the data alone, thus requiring the analyst to impose restrictions that make the models uniquely obtainable from the distribution of the observed data. We present an approach for constructing classes of identifiable nonignorable missing data models. The main idea is to use a sequence of carefully set up identifying assumptions, whereby we specify potentially different missingness mechanisms for different blocks of variables. We show that the procedure results in models with the desirable property of being non-parametric saturated. VL - 104 ER - TY - JOUR T1 - Modeling Endogenous Mobility in Earnings Determination JF - Journal of Business & Economic Statistics Y1 - 2017 A1 - John M. Abowd A1 - Kevin L. Mckinney A1 - Ian M. Schmutte AB - We evaluate the bias from endogenous job mobility in fixed-effects estimates of worker- and firm-specific earnings heterogeneity using longitudinally linked employer-employee data from the LEHD infrastructure file system of the U.S. Census Bureau. First, we propose two new residual diagnostic tests of the assumption that mobility is exogenous to unmodeled determinants of earnings. Both tests reject exogenous mobility. We relax exogenous mobility by modeling the matched data as an evolving bipartite graph using a Bayesian latent-type framework. Our results suggest that allowing endogenous mobility increases the variation in earnings explained by individual heterogeneity and reduces the proportion due to employer and match effects. To assess external validity, we match our estimates of the wage components to out-of-sample estimates of revenue per worker. 
The mobility-bias corrected estimates attribute much more of the variation in revenue per worker to variation in match quality and worker quality than the uncorrected estimates. UR - http://dx.doi.org/10.1080/07350015.2017.1356727 ER - TY - RPRT T1 - Modeling Endogenous Mobility in Wage Determination Y1 - 2017 A1 - John M. Abowd A1 - Kevin L. McKinney A1 - Ian M. Schmutte AB - We evaluate the bias from endogenous job mobility in fixed-effects estimates of worker- and firm-specific earnings heterogeneity using longitudinally linked employer-employee data from the LEHD infrastructure file system of the U.S. Census Bureau. First, we propose two new residual diagnostic tests of the assumption that mobility is exogenous to unmodeled determinants of earnings. Both tests reject exogenous mobility. We relax the exogenous mobility assumptions by modeling the evolution of the matched data as an evolving bipartite graph using a Bayesian latent class framework. Our results suggest that endogenous mobility biases estimated firm effects toward zero. To assess validity, we match our estimates of the wage components to out-of-sample estimates of revenue per worker. The corrected estimates attribute much more of the variation in revenue per worker to variation in match quality and worker quality than the uncorrected estimates. UR - http://digitalcommons.ilr.cornell.edu/ldi/28/ ER - TY - RPRT T1 - NCRN Meeting Spring 2017: Formal Privacy Models and Title 13 Y1 - 2017 A1 - Nissim, Kobbi A1 - Gasser, Urs A1 - Smith, Adam A1 - Vadhan, Salil A1 - O'Brien, David A1 - Wood, Alexandra AB - A new collaboration between academia and the Census Bureau to further the Bureau’s use of formal privacy models. 
PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/52164 ER - TY - RPRT T1 - Presentation: Introduction to Stan for Markov Chain Monte Carlo Y1 - 2017 A1 - Simpson, Matthew AB - An introduction to Stan (http://mc-stan.org/): a probabilistic programming language that implements Hamiltonian Monte Carlo (HMC), variational Bayes, and (penalized) maximum likelihood estimation. Presentation given at the U.S. Census Bureau on April 25, 2017. PB - University of Missouri UR - http://hdl.handle.net/1813/52656 ER - TY - RPRT T1 - Proceedings from the 2016 NSF–Sloan Workshop on Practical Privacy Y1 - 2017 A1 - Vilhuber, Lars A1 - Schmutte, Ian AB - On October 14, 2016, we hosted a workshop that brought together economists, survey statisticians, and computer scientists with expertise in the field of privacy preserving methods: Census Bureau staff working on implementing cutting-edge methods in the Bureau’s flagship public-use products mingled with academic researchers from a variety of universities. The four products discussed as part of the workshop were 1. the American Community Survey (ACS); 2. Longitudinal Employer-Household Data (LEHD), in particular the LEHD Origin-Destination Employment Statistics (LODES); 3. the 2020 Decennial Census; and 4. the 2017 Economic Census. The goal of the workshop was to 1. Discuss the specific challenges that have arisen in ongoing efforts to apply formal privacy models to Census data products by drawing together expertise of academic and governmental researchers; 2. Produce short written memos that summarize concrete suggestions for practical applications to specific Census Bureau priority areas. 
PB - Cornell University UR - http://hdl.handle.net/1813/46197 ER - TY - RPRT T1 - Proceedings from the 2017 Cornell-Census-NSF-Sloan Workshop on Practical Privacy Y1 - 2017 A1 - Vilhuber, Lars A1 - Schmutte, Ian M. AB - These proceedings report on a workshop hosted at the U.S. Census Bureau on May 8, 2017. Our purpose was to gather experts from various backgrounds together to continue discussing the development of formal privacy systems for Census Bureau data products. This workshop was a successor to a previous workshop held in October 2016 (Vilhuber & Schmutte 2017). At our prior workshop, we hosted computer scientists, survey statisticians, and economists, all of whom were experts in data privacy. At that time we discussed the practical implementation of cutting-edge methods for publishing data with formal, provable privacy guarantees, with a focus on applications to Census Bureau data products. The teams developing those applications were just starting out when our first workshop took place, and we spent our time brainstorming solutions to the various problems researchers were encountering, or anticipated encountering. For these cutting-edge formal privacy models, there had been very little effort in the academic literature to apply those methods in real-world settings with large, messy data. We therefore brought together an expanded group of specialists from academia and government who could shed light on technical challenges, subject matter challenges and address how data users might react to changes in data availability and publishing standards. In May 2017, we organized a follow-up workshop, which these proceedings report on. We reviewed progress made in four different areas. The four topics discussed as part of the workshop were 1. the 2020 Decennial Census; 2. the American Community Survey (ACS); 3. the 2017 Economic Census; 4. 
measuring the demand for privacy and for data quality. As in our earlier workshop, our goals were to 1. Discuss the specific challenges that have arisen in ongoing efforts to apply formal privacy models to Census data products by drawing together expertise of academic and governmental researchers; 2. Produce short written memos that summarize concrete suggestions for practical applications to specific Census Bureau priority areas. Comments can be provided at https://goo.gl/ZAh3YE PB - Cornell University UR - http://hdl.handle.net/1813/52473 ER - TY - RPRT T1 - Proceedings from the Synthetic LBD International Seminar Y1 - 2017 A1 - Vilhuber, Lars A1 - Kinney, Saki A1 - Schmutte, Ian M. AB - On May 9, 2017, we hosted a seminar to discuss the conditions necessary to implement the SynLBD approach with interested parties, with the goal of providing a straightforward toolkit to implement the same procedure on other data. The proceedings summarize the discussions during the workshop. PB - Cornell University UR - http://hdl.handle.net/1813/52472 ER - TY - RPRT T1 - Revisiting the Economics of Privacy: Population Statistics and Confidentiality Protection as Public Goods Y1 - 2017 A1 - John M. Abowd A1 - Ian M. Schmutte AB - We consider the problem of determining the optimal accuracy of public statistics when increased accuracy requires a loss of privacy. To formalize this allocation problem, we use tools from statistics and computer science to model the publication technology used by a public statistical agency. We derive the demand for accurate statistics from first principles to generate interdependent preferences that account for the public-good nature of both data accuracy and privacy loss. We first show data accuracy is inefficiently under-supplied by a private provider. Solving the appropriate social planner’s problem produces an implementable publication strategy. 
We implement the socially optimal publication plan for statistics on income and health status using data from the American Community Survey, National Health Interview Survey, Federal Statistical System Public Opinion Survey and Cornell National Social Survey. Our analysis indicates that welfare losses from providing too much privacy protection and, therefore, too little accuracy can be substantial. JF - Labor Dynamics Institute Document UR - http://digitalcommons.ilr.cornell.edu/ldi/37/ ER - TY - RPRT T1 - Revisiting the Economics of Privacy: Population Statistics and Confidentiality Protection as Public Goods Y1 - 2017 A1 - Abowd, John A1 - Schmutte, Ian M. AB - We consider the problem of the public release of statistical information about a population–explicitly accounting for the public-good properties of both data accuracy and privacy loss. We first consider the implications of adding the public-good component to recently published models of private data publication under differential privacy guarantees using a Vickrey-Clarke-Groves mechanism and a Lindahl mechanism. We show that data quality will be inefficiently under-supplied. Next, we develop a standard social planner’s problem using the technology set implied by (ε, δ)-differential privacy with (α, β)-accuracy for the Private Multiplicative Weights query release mechanism to study the properties of optimal provision of data accuracy and privacy loss when both are public goods. Using the production possibilities frontier implied by this technology, explicitly parameterized interdependent preferences, and the social welfare function, we display properties of the solution to the social planner’s problem. Our results directly quantify the optimal choice of data accuracy and privacy loss as functions of the technology and preference parameters. 
Some of these properties can be quantified using population statistics on marginal preferences and correlations between income, data accuracy preferences, and privacy loss preferences that are available from survey data. Our results show that government data custodians should publish more accurate statistics with weaker privacy guarantees than would occur with purely private data publishing. Our statistical results using the General Social Survey and the Cornell National Social Survey indicate that the welfare losses from under-providing data accuracy while over-providing privacy protection can be substantial. A complete archive of the data and programs used in this paper is available via http://doi.org/10.5281/zenodo.345385. PB - Cornell University UR - http://hdl.handle.net/1813/39081 ER - TY - RPRT T1 - Revisiting the Economics of Privacy: Population Statistics and Confidentiality Protection as Public Goods Y1 - 2017 A1 - Abowd, John A1 - Schmutte, Ian M. AB - We consider the problem of determining the optimal accuracy of public statistics when increased accuracy requires a loss of privacy. To formalize this allocation problem, we use tools from statistics and computer science to model the publication technology used by a public statistical agency. We derive the demand for accurate statistics from first principles to generate interdependent preferences that account for the public-good nature of both data accuracy and privacy loss. We first show data accuracy is inefficiently under-supplied by a private provider. Solving the appropriate social planner’s problem produces an implementable publication strategy. 
We implement the socially optimal publication plan for statistics on income and health status using data from the American Community Survey, National Health Interview Survey, Federal Statistical System Public Opinion Survey and Cornell National Social Survey. Our analysis indicates that welfare losses from providing too much privacy protection and, therefore, too little accuracy can be substantial. PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/52612 ER - TY - ABST T1 - Sequential Prediction of Respondent Behaviors Leading to Error in Web-based Surveys Y1 - 2017 A1 - Eck, Adam A1 - Soh, Leen-Kiat ER - TY - RPRT T1 - Sorting Between and Within Industries: A Testable Model of Assortative Matching Y1 - 2017 A1 - John M. Abowd A1 - Francis Kramarz A1 - Sebastien Perez-Duarte A1 - Ian M. Schmutte AB - We test Shimer's (2005) theory of the sorting of workers between and within industrial sectors based on directed search with coordination frictions, deliberately maintaining its static general equilibrium framework. We fit the model to sector-specific wage, vacancy and output data, including publicly-available statistics that characterize the distribution of worker and employer wage heterogeneity across sectors. Our empirical method is general and can be applied to a broad class of assignment models. The results indicate that industries are the loci of sorting–more productive workers are employed in more productive industries. The evidence confirms that strong assortative matching can be present even when worker and employer components of wage heterogeneity are weakly correlated. PB - Labor Dynamics Institute UR - http://digitalcommons.ilr.cornell.edu/ldi/40/ ER - TY - RPRT T1 - Unique Entity Estimation with Application to the Syrian Conflict Y1 - 2017 A1 - Chen, B. A1 - Shrivastava, A. A1 - Steorts, R. C. 
KW - Computer Science - Data Structures and Algorithms KW - Computer Science - Databases KW - Statistics - Applications AB - Entity resolution identifies and removes duplicate entities in large, noisy databases and has grown in both usage and new developments as a result of increased data availability. Nevertheless, entity resolution has tradeoffs regarding assumptions of the data generation process, error rates, and computational scalability that make it a difficult task for real applications. In this paper, we focus on a related problem of unique entity estimation, which is the task of estimating the unique number of entities and associated standard errors in a data set with duplicate entities. Unique entity estimation shares many fundamental challenges of entity resolution, namely, that the computational cost of all-to-all entity comparisons is intractable for large databases. To circumvent this computational barrier, we propose an efficient (near-linear time) estimation algorithm based on locality sensitive hashing. Our estimator, under realistic assumptions, is unbiased and has provably low variance compared to existing random sampling based approaches. In addition, we empirically show its superiority over the state-of-the-art estimators on three real applications. The motivation for our work is to derive an accurate estimate of the documented, identifiable deaths in the ongoing Syrian conflict. Our methodology, when applied to the Syrian data set, provides an estimate of $191,874 \pm 1772$ documented, identifiable deaths, which is very close to the Human Rights Data Analysis Group (HRDAG) estimate of 191,369. Our work provides an example of challenges and efforts involved in solving a real, noisy challenging problem where modeling assumptions may not hold. 
JF - arXiv UR - https://arxiv.org/abs/1710.02690 ER - TY - JOUR T1 - A Bayesian Approach to Graphical Record Linkage and Deduplication JF - Journal of the American Statistical Association Y1 - 2016 A1 - Rebecca C. Steorts A1 - Rob Hall A1 - Stephen E. Fienberg AB - We propose an unsupervised approach for linking records across arbitrarily many files, while simultaneously detecting duplicate records within files. Our key innovation involves the representation of the pattern of links between records as a bipartite graph, in which records are directly linked to latent true individuals, and only indirectly linked to other records. This flexible representation of the linkage structure naturally allows us to estimate the attributes of the unique observable people in the population, calculate transitive linkage probabilities across records (and represent this visually), and propagate the uncertainty of record linkage into later analyses. Our method makes it particularly easy to integrate record linkage with post-processing procedures such as logistic regression, capture–recapture, etc. Our linkage structure lends itself to an efficient, linear-time, hybrid Markov chain Monte Carlo algorithm, which overcomes many obstacles encountered by previous record linkage approaches, despite the high-dimensional parameter space. We illustrate our method using longitudinal data from the National Long Term Care Survey and with data from the Italian Survey on Household and Wealth, where we assess the accuracy of our method and show it to be better in terms of error rates and empirical scalability than other approaches in the literature. Supplementary materials for this article are available online. VL - 111 UR - http://dx.doi.org/10.1080/01621459.2015.1105807 ER - TY - JOUR T1 - Bayesian latent pattern mixture models for handling attrition in panel studies with refreshment samples JF - Annals of Applied Statistics Y1 - 2016 A1 - Y. Si A1 - J. P. Reiter A1 - D. S. 
Hillygus VL - 10 UR - http://projecteuclid.org/euclid.aoas/1458909910 ER - TY - ABST T1 - Data management and analytic use of paradata: SIPP-EHC audit trails Y1 - 2016 A1 - Lee, Jinyoung A1 - Seloske, Ben A1 - Córdova Cazar, Ana Lucía A1 - Eck, Adam A1 - Kirchner, Antje A1 - Belli, Robert F. ER - TY - JOUR T1 - Differentially private publication of data on wages and job mobility JF - Statistical Journal of the International Association for Official Statistics Y1 - 2016 A1 - Schmutte, Ian M. KW - Demand for public statistics KW - differential privacy KW - job mobility KW - matched employer-employee data KW - optimal confidentiality protection KW - optimal data accuracy KW - technology for statistical agencies AB - Brazil, like many countries, is reluctant to publish business-level data, because of legitimate concerns about the establishments' confidentiality. A trusted data curator can increase the utility of data, while managing the risk to establishments, either by releasing synthetic data, or by infusing noise into published statistics. This paper evaluates the application of a differentially private mechanism to publish statistics on wages and job mobility computed from Brazilian employer-employee matched data. The publication mechanism can result in both the publication of specific statistics as well as the generation of synthetic data. I find that the tradeoff between the privacy guaranteed to individuals in the data, and the accuracy of published statistics, is potentially much better than the worst-case theoretical accuracy guarantee. However, the synthetic data fare quite poorly in analyses that are outside the set of queries to which it was trained. Note that this article only explores and characterizes the feasibility of these publication strategies, and will not directly result in the publication of any data. 
VL - 32 UR - http://content.iospress.com/articles/statistical-journal-of-the-iaos/sji962 IS - 1 ER - TY - JOUR T1 - Do Interviewers with High Cooperation Rates Behave Differently? Interviewer Cooperation Rates and Interview Behaviors JF - Survey Practice Y1 - 2016 A1 - Olson, Kristen A1 - Kirchner, Antje A1 - Smyth, Jolene D. AB - Interviewers are required to be flexible in responding to respondent concerns during recruitment, but standardized during administration of the questionnaire. These skill sets may be at odds. Recent research has shown a U-shaped relationship between interviewer cooperation rates and interviewer variance: the least and the most successful interviewers during recruitment have the largest interviewer variance components. Little is known about why this association occurs. We posit four hypotheses for this association: 1) interviewers with higher cooperation rates are more conscientious interviewers altogether, 2) interviewers with higher cooperation rates continue to use rapport behaviors from the cooperation request throughout an interview, 3) interviewers with higher cooperation rates display more confidence which translates into different interview behavior, and 4) interviewers with higher cooperation rates continue their flexible interviewing style throughout the interview and deviate more from standardized interviewing. We use behavior codes from the Work and Leisure Today Survey (n=450, AAPOR RR3=6.3%) to evaluate interviewer behavior. Our results largely support the confidence hypothesis. Interviewers with higher cooperation rates do not show evidence of being “better” interviewers. VL - 9 UR - http://www.surveypractice.org/index.php/SurveyPractice/article/view/351 IS - 2 ER - TY - RPRT T1 - Estimating Compensating Wage Differentials with Endogenous Job Mobility Y1 - 2016 A1 - Kurt Lavetti A1 - Ian M. 
Schmutte AB - We demonstrate a strategy for using matched employer-employee data to correct endogenous job mobility bias when estimating compensating wage differentials. Applied to fatality rates in the census of formal-sector jobs in Brazil between 2003 and 2010, we show why common approaches to eliminating ability bias can greatly amplify endogenous job mobility bias. By extending the search-theoretic hedonic wage framework, we establish conditions necessary to interpret our estimates as preferences. We present empirical analyses supporting the predictions of the model and identifying conditions, demonstrating that the standard models are misspecified, and that our proposed model eliminates latent ability and endogenous mobility biases. UR - http://digitalcommons.ilr.cornell.edu/ldi/29/ ER - TY - JOUR T1 - How Should We Define Low-Wage Work? An Analysis Using the Current Population Survey JF - Monthly Labor Review Y1 - 2016 A1 - Fusaro, V. A1 - Shaefer, H. Luke AB - Low-wage work is a central concept in considerable research, yet it lacks an agreed-upon definition. Using data from the Current Population Survey’s Annual Social and Economic Supplement, the analysis presented in this article suggests that defining low-wage work on the basis of alternative hourly wage cutoffs changes the size of the low-wage population, but does not noticeably alter time trends in the rate of change. The analysis also indicates that different definitions capture groups of workers with substantively different demographic, social, and economic characteristics. Although the individuals in any of the categories examined might reasonably be considered low-wage workers, a single definition obscures these distinctions. UR - http://www.bls.gov/opub/mlr/2016/article/pdf/how-should-we-define-low-wage-work.pdf ER - TY - JOUR T1 - Incorporating marginal prior information into latent class models JF - Bayesian Analysis Y1 - 2016 A1 - Schifeling, T. S. A1 - Reiter, J. P. 
VL - 11 UR - https://projecteuclid.org/euclid.ba/1434649584 ER - TY - JOUR T1 - Measuring Poverty Using the Supplemental Poverty Measure in the Panel Study of Income Dynamics, 1998 to 2010 JF - Journal of Economic and Social Measurement Y1 - 2016 A1 - Kimberlin, S. A1 - Shaefer, H.L. A1 - Kim, J. AB - The Supplemental Poverty Measure (SPM) was recently introduced by the U.S. Census Bureau as an alternative measure of poverty that addresses many shortcomings of the official poverty measure (OPM) to better reflect the resources households have available to meet their basic needs. The Census SPM is available only in the Current Population Survey (CPS). This paper describes a method for constructing SPM poverty estimates in the Panel Study of Income Dynamics (PSID), for the biennial years 1998 through 2010. A public-use dataset of individual-level SPM status produced in this analysis will be available for download on the PSID website. Annual SPM poverty estimates from the PSID are presented for the years 1998, 2000, 2002, 2004, 2006, 2008, and 2010 and compared to SPM estimates for the same years derived from CPS data by the Census Bureau and independent researchers. We find that SPM poverty rates in the PSID are somewhat lower than those found in the CPS, though trends over time and impact of specific SPM components are similar across the two datasets. VL - 41 UR - http://content.iospress.com/articles/journal-of-economic-and-social-measurement/jem425 IS - 1 ER - TY - ABST T1 - Mismatches Y1 - 2016 A1 - Smyth, Jolene A1 - Olson, Kristen ER - TY - RPRT T1 - Modeling Endogenous Mobility in Earnings Determination Y1 - 2016 A1 - Abowd, John M. A1 - McKinney, Kevin L. A1 - Schmutte, Ian M. AB - 
We evaluate the bias from endogenous job mobility in fixed-effects estimates of worker- and firm-specific earnings heterogeneity using longitudinally linked employer-employee data from the LEHD infrastructure file system of the U.S. Census Bureau. First, we propose two new residual diagnostic tests of the assumption that mobility is exogenous to unmodeled determinants of earnings. Both tests reject exogenous mobility. We relax the exogenous mobility assumptions by modeling the evolution of the matched data as an evolving bipartite graph using a Bayesian latent class framework. Our results suggest that endogenous mobility biases estimated firm effects toward zero. To assess validity, we match our estimates of the wage components to out-of-sample estimates of revenue per worker. The corrected estimates attribute much more of the variation in revenue per worker to variation in match quality and worker quality than the uncorrected estimates. Replication code can be found at DOI: http://doi.org/10.5281/zenodo.zenodo.376600 and our Github repository endogenous-mobility-replication. PB - Cornell University UR - http://hdl.handle.net/1813/40306 ER - TY - RPRT T1 - NCRN Meeting Spring 2016: A 2016 View of 2020 Census Quality, Costs, Benefits Y1 - 2016 A1 - Spencer, Bruce D. AB - Census costs affect data quality and data quality affects census benefits. Although measuring census data quality is difficult enough ex post, census planning requires it to be done well in advance. The topic of this talk is the prediction of the cost-quality curve, its uncertainty, and its relation to benefits from census data. 
Presented at the NCRN Meeting Spring 2016 in Washington DC on May 9-10, 2016; see http://www.ncrn.info/event/ncrn-spring-2016-meeting PB - Northwestern University UR - http://hdl.handle.net/1813/43897 ER - TY - RPRT T1 - NCRN Meeting Spring 2016: Developing job linkages for the Health and Retirement Study Y1 - 2016 A1 - McCue, Kristin A1 - Abowd, John A1 - Levenstein, Margaret A1 - Patki, Dhiren A1 - Rodgers, Ann A1 - Shapiro, Matthew A1 - Wasi, Nada AB - This paper documents work using probabilistic record linkage to create a crosswalk between jobs reported in the Health and Retirement Study (HRS) and the list of workplaces on Census Bureau’s Business Register. Matching job records provides an opportunity to join variables that occur uniquely in separate datasets, to validate responses, and to develop missing data imputation models. Identifying the respondent’s workplace (“establishment”) is valuable for HRS because it allows researchers to incorporate the effects of particular social, economic, and geospatial work environments in studies of respondent health and retirement behavior. The linkage makes use of name and address standardizing techniques tailored to business data that were recently developed in a collaboration between researchers at Census, Cornell, and the University of Michigan. The matching protocol makes no use of the identity of the HRS respondent and strictly protects the confidentiality of information about the respondent’s employer. The paper first describes the clerical review process used to create a set of human-reviewed candidate pairs, and use of that set to train matching models. It then describes and compares several linking strategies that make use of employer name, address, and phone number. 
Finally it discusses alternative ways of incorporating information on match uncertainty into estimates based on the linked data, and illustrates their use with a preliminary sample of matched HRS jobs. Presented at the NCRN Meeting Spring 2016 in Washington DC on May 9-10, 2016; see http://www.ncrn.info/event/ncrn-spring-2016-meeting PB - University of Michigan UR - http://hdl.handle.net/1813/43895 ER - TY - JOUR T1 - Spatial Variation in the Quality of American Community Survey Estimates JF - Demography Y1 - 2016 A1 - Folch, David C. A1 - Arribas-Bel, Daniel A1 - Koschinsky, Julia A1 - Spielman, Seth E. VL - 53 ER - TY - THES T1 - Topics on Official Statistics and Statistical Policy T2 - Statistics Y1 - 2016 A1 - Zachary Seeskin AB - My dissertation studies decision questions for government statistical agencies, both regarding data collection and how to combine data from multiple sources. Informed decisions regarding expenditure on data collection require information about the effects of data quality on data use. For the first topic, I study two important uses of decennial census data in the U.S.: for apportioning the House of Representatives and for allocating federal funds. Estimates of distortions in these two uses are developed for different levels of census accuracy. Then, I thoroughly investigate the sensitivity of findings to the census error distribution and to the choice of how to measure the distortions. The chapter concludes with a proposed framework for partial cost-benefit analysis that charges a share of the cost of the census to allocation programs. Then, I investigate an approximation to make analysis of the effects of census error on allocations feasible when allocations also depend on non-census statistics, as is the case for many formula-based allocations. The approximation conditions on the realized values of the non-census statistics instead of using the joint distribution over both census and non-census statistics. 
The research studies how using the approximation affects conclusions. I find that in some simple cases, the approximation always either overstates or equals the true effects of census error. Understatement is possible in other cases, but theory suggests that the largest possible understatements are about one-third the amount of the largest possible overstatements. In simulations with a more complex allocation formula, the approximation tends to overstate the effects of census error with the overstatement increasing with error in non-census statistics but decreasing with error in census statistics. In the final chapter, I evaluate the use of 2008-2010 property tax data from CoreLogic, Inc. (CoreLogic), aggregated from county and township governments from around the country, to improve 2010 American Community Survey (ACS) estimates of property tax amounts for single-family homes. Particularly, I evaluate the potential to use CoreLogic to reduce respondent burden, to study survey response error and to improve adjustments for survey nonresponse. The coverage of the CoreLogic data varies between counties as does the correspondence between ACS and CoreLogic property taxes. This geographic variation implies that different approaches toward using CoreLogic are needed in different areas of the country. Further, large differences between CoreLogic and ACS property taxes in certain counties seem to be due to conceptual differences between what is collected in the two data sources. I examine three counties, Clark County, NV, Philadelphia County, PA and St. Louis County, MO, and compare how estimates would change with different approaches using the CoreLogic data. Mean county property tax estimates are highly sensitive to whether ACS or CoreLogic data are used to construct estimates. 
Using CoreLogic data in imputation modeling for nonresponse adjustment of ACS estimates modestly improves the predictive power of imputation models, although estimates of county property taxes and property taxes by mortgage status are not very sensitive to the imputation method. JF - Statistics PB - Northwestern University CY - Evanston, Illinois VL - PHD UR - http://search.proquest.com/docview/1826016819 ER - TY - JOUR T1 - Using Data Mining to Predict the Occurrence of Respondent Retrieval Strategies in Calendar Interviewing: The Quality of Retrospective Reports JF - Journal of Official Statistics Y1 - 2016 A1 - Belli, Robert F. A1 - Miller, L. Dee A1 - Baghal, Tarek Al A1 - Soh, Leen-Kiat AB - Determining which verbal behaviors of interviewers and respondents are dependent on one another is a complex problem that can be facilitated via data-mining approaches. Data are derived from the interviews of 153 respondents of the Panel Study of Income Dynamics (PSID) who were interviewed about their life-course histories. Behavioral sequences of interviewer-respondent interactions that were most predictive of respondents spontaneously using parallel, timing, duration, and sequential retrieval strategies in their generation of answers were examined. We also examined which behavioral sequences were predictive of retrospective reporting data quality as shown by correspondence between calendar responses with responses collected in prior waves of the PSID. The verbal behaviors of immediately preceding interviewer and respondent turns of speech were assessed in terms of their co-occurrence with each respondent retrieval strategy. Interviewers’ use of parallel probes is associated with poorer data quality, whereas interviewers’ use of timing and duration probes, especially in tandem, is associated with better data quality. 
Respondents’ use of timing and duration strategies is also associated with better data quality and both strategies are facilitated by interviewer timing probes. Data mining alongside regression techniques is valuable to examine which interviewer-respondent interactions will benefit data quality. VL - 32 IS - 3 ER - TY - JOUR T1 - Accounting for nonignorable unit nonresponse and attrition in panel studies with refreshment samples JF - Journal of Survey Statistics and Methodology Y1 - 2015 A1 - Schifeling, T. A1 - Cheng, C. A1 - Hillygus, D. S. A1 - Reiter, J. P. AB - Panel surveys typically suffer from attrition, which can lead to biased inference when basing analysis only on cases that complete all waves of the panel. Unfortunately, panel data alone cannot inform the extent of the bias from the attrition, so that analysts using the panel data alone must make strong and untestable assumptions about the missing data mechanism. Many panel studies also include refreshment samples, which are data collected from a random sample of new individuals during some later wave of the panel. Refreshment samples offer information that can be utilized to correct for biases induced by nonignorable attrition while reducing reliance on strong assumptions about the attrition process. To date, these bias correction methods have not dealt with two key practical issues in panel studies: unit nonresponse in the initial wave of the panel and in the refreshment sample itself. As we illustrate, nonignorable unit nonresponse can significantly compromise the analyst’s ability to use the refreshment samples for attrition bias correction. Thus, it is crucial for analysts to assess how sensitive their inferences, corrected for panel attrition, are to different assumptions about the nature of the unit nonresponse. We present an approach that facilitates such sensitivity analyses, both for suspected nonignorable unit nonresponse in the initial wave and in the refreshment sample. 
We illustrate the approach using simulation studies and an analysis of data from the 2007-2008 Associated Press/Yahoo News election panel study. VL - 3 UR - http://jssam.oxfordjournals.org/content/3/3/265.abstract IS - 3 ER - TY - JOUR T1 - Bayesian Analysis of Spatially-Dependent Functional Responses with Spatially-Dependent Multi-Dimensional Functional Predictors JF - Statistica Sinica Y1 - 2015 A1 - Yang, W. H. A1 - Wikle, C.K. A1 - Holan, S.H. A1 - Sudduth, K. A1 - Meyers, D.B. VL - 25 UR - http://www3.stat.sinica.edu.tw/preprint/SS-13-245w_Preprint.pdf ER - TY - JOUR T1 - Bayesian Latent Pattern Mixture Models for Handling Attrition in Panel Studies With Refreshment Samples JF - ArXiv Y1 - 2015 A1 - Yajuan Si A1 - Jerome P. Reiter A1 - D. Sunshine Hillygus KW - Categorical KW - Dirichlet process KW - Multiple imputation KW - Non-ignorable KW - Panel attrition KW - Refreshment sample AB - Many panel studies collect refreshment samples: new, randomly sampled respondents who complete the questionnaire at the same time as a subsequent wave of the panel. With appropriate modeling, these samples can be leveraged to correct inferences for biases caused by non-ignorable attrition. We present such a model when the panel includes many categorical survey variables. The model relies on a Bayesian latent pattern mixture model, in which an indicator for attrition and the survey variables are modeled jointly via a latent class model. We allow the multinomial probabilities within classes to depend on the attrition indicator, which offers additional flexibility over standard applications of latent class models. We present results of simulation studies that illustrate the benefits of this flexibility. We apply the model to correct attrition bias in an analysis of data from the 2007-2008 Associated Press/Yahoo News election panel study. 
UR - http://arxiv.org/abs/1509.02124 IS - 1509.02124 ER - TY - RPRT T1 - Blocking Methods Applied to Casualty Records from the Syrian Conflict Y1 - 2015 A1 - Sadosky, Peter A1 - Shrivastava, Anshumali A1 - Price, Megan A1 - Steorts, Rebecca JF - ArXiv UR - http://arxiv.org/abs/1510.07714 ER - TY - JOUR T1 - Capturing multivariate spatial dependence: Model, estimate, and then predict JF - Statistical Science Y1 - 2015 A1 - Cressie, N. A1 - Burden, S. A1 - Davis, W. A1 - Krivitsky, P. A1 - Mokhtarian, P. A1 - Seusse, T. A1 - Zammit-Mangion, A. VL - 30 UR - http://projecteuclid.org/euclid.ss/1433341474 IS - 2 ER - TY - JOUR T1 - Comparing and selecting spatial predictors using local criteria JF - Test Y1 - 2015 A1 - Bradley, J.R. A1 - Cressie, N. A1 - Shi, T. VL - 24 UR - http://dx.doi.org/10.1007/s11749-014-0415-1 IS - 1 ER - TY - RPRT T1 - Cost-Benefit Analysis for a Quinquennial Census: The 2016 Population Census of South Africa. Y1 - 2015 A1 - Spencer, Bruce D. A1 - May, Julian A1 - Kenyon, Steven A1 - Seeskin, Zachary H. KW - demographic statistics KW - fiscal allocations KW - loss function KW - population estimates KW - post-censal estimates AB -

The question of whether to carry out a quinquennial census is being faced by national statistical offices in increasingly many countries, including Canada, Nigeria, Ireland, Australia, and South Africa. The authors describe uses, and limitations, of cost-benefit analysis for this decision problem in the case of the 2016 census of South Africa. The government of South Africa needed to decide whether to conduct a 2016 census or to rely on increasingly inaccurate post-censal estimates accounting for births, deaths, and migration since the previous (2011) census. The cost-benefit analysis compared predicted costs of the 2016 census to the benefits from improved allocation of intergovernmental revenue, which was considered by the government to be a critical use of the 2016 census, although not the only important benefit. Without the 2016 census, allocations would be based on population estimates. Accuracy of the post-censal estimates was estimated from the performance of past estimates, and the hypothetical expected reduction in errors in allocation due to the 2016 census was estimated. A loss function was introduced to quantify the improvement in allocation. With this evidence, the government was able to decide not to conduct the 2016 census, but instead to improve data and capacity for producing post-censal estimates.

JF - IPR Working Paper Series PB - Northwestern University, Institute for Policy Research UR - http://www.ipr.northwestern.edu/publications/papers/2015/ipr-wp-15-06.html ER - TY - CONF T1 - Determining Potential for Breakoff in Time Diary Survey Using Paradata T2 - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) Y1 - 2015 A1 - Wettlaufer, D. A1 - Arunachalam, H. A1 - Atkin, G. A1 - Eck, A. A1 - Soh, L.-K. A1 - Belli, R.F. JF - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) CY - Hollywood, Florida UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - CONF T1 - Do Interviewers with High Cooperation Rates Behave Differently? Interviewer Cooperation Rates and Interview Behaviors T2 - International Conference on Total Survey Error Y1 - 2015 A1 - Olson, K. A1 - Smyth, J.D. A1 - Kirchner, A. JF - International Conference on Total Survey Error CY - Baltimore, MD UR - http://www.niss.org/events/2015-international-total-survey-error-conference ER - TY - CONF T1 - Do Interviewers with High Cooperation Rates Behave Differently? Interviewer Cooperation Rates and Interview Behaviors T2 - Joint Statistical Meetings Y1 - 2015 A1 - Olson, K. A1 - Smyth, J.D. A1 - Kirchner, A. JF - Joint Statistical Meetings CY - Seattle, WA UR - http://www.amstat.org/meetings/jsm/2015/program.cfm ER - TY - RPRT T1 - Economic Analysis and Statistical Disclosure Limitation Y1 - 2015 A1 - Abowd, John M. A1 - Schmutte, Ian M. AB -

This paper explores the consequences for economic research of methods used by data publishers to protect the privacy of their respondents. We review the concept of statistical disclosure limitation for an audience of economists who may be unfamiliar with these methods. We characterize what it means for statistical disclosure limitation to be ignorable. When it is not ignorable, we consider the effects of statistical disclosure limitation for a variety of research designs common in applied economic research. Because statistical agencies do not always report the methods they use to protect confidentiality, we also characterize settings in which statistical disclosure limitation methods are discoverable; that is, they can be learned from the released data. We conclude with advice for researchers, journal editors, and statistical agencies.

PB - Cornell University UR - http://hdl.handle.net/1813/40581 ER - TY - JOUR T1 - Economic Analysis and Statistical Disclosure Limitation JF - Brookings Papers on Economic Activity Y1 - 2015 A1 - Abowd, John M. A1 - Schmutte, Ian M. AB - This paper explores the consequences for economic research of methods used by data publishers to protect the privacy of their respondents. We review the concept of statistical disclosure limitation for an audience of economists who may be unfamiliar with these methods. We characterize what it means for statistical disclosure limitation to be ignorable. When it is not ignorable, we consider the effects of statistical disclosure limitation for a variety of research designs common in applied economic research. Because statistical agencies do not always report the methods they use to protect confidentiality, we also characterize settings in which statistical disclosure limitation methods are discoverable; that is, they can be learned from the released data. We conclude with advice for researchers, journal editors, and statistical agencies. VL - Spring 2015 UR - http://www.brookings.edu/about/projects/bpea/papers/2015/economic-analysis-statistical-disclosure-limitation ER - TY - JOUR T1 - The Effect of CATI Questionnaire Design Features on Response Timing JF - Journal of Survey Statistics and Methodology Y1 - 2015 A1 - Olson, K. A1 - Smyth, J.D. VL - 3 IS - 3 ER - TY - RPRT T1 - Effects of Census Accuracy on Apportionment of Congress and Allocations of Federal Funds. Y1 - 2015 A1 - Seeskin, Zachary H. A1 - Spencer, Bruce D. AB -

How much accuracy is needed in the 2020 census depends on the cost of attaining accuracy and on the consequences of imperfect accuracy. The cost target for the 2020 census of the United States has been specified, and the Census Bureau is developing projections of the accuracy attainable for that cost. It is desirable to have information about the consequences of the accuracy that might be attainable for that cost or for alternative cost levels. To assess the consequences of imperfect census accuracy, Seeskin and Spencer consider alternative profiles of accuracy for states and assess their implications for apportionment of the U.S. House of Representatives and for allocation of federal funds. An error in allocation is defined as the difference between the allocation computed under imperfect data and the allocation computed with perfect data. Estimates of expected sums of absolute values of errors are presented for House apportionment and for federal funds allocations.

JF - IPR Working Paper Series PB - Northwestern University, Institute for Policy Research UR - http://www.ipr.northwestern.edu/publications/papers/2015/ipr-wp-15-05.html ER - TY - JOUR T1 - Entity Resolution with Empirically Motivated Priors JF - Bayesian Anal. Y1 - 2015 A1 - Steorts, Rebecca C. AB - Databases often contain corrupted, degraded, and noisy data with duplicate entries across and within each database. Such problems arise in citations, medical databases, genetics, human rights databases, and a variety of other applied settings. The target of statistical inference can be viewed as an unsupervised problem of determining the edges of a bipartite graph that links the observed records to unobserved latent entities. Bayesian approaches provide attractive benefits, naturally providing uncertainty quantification via posterior probabilities. We propose a novel record linkage approach based on empirical Bayesian principles. Specifically, the empirical Bayesian-type step consists of taking the empirical distribution function of the data as the prior for the latent entities. This approach improves on the earlier HB approach not only by avoiding the prior specification problem but also by allowing both categorical and string-valued variables. Our extension to string-valued variables also involves the proposal of a new probabilistic mechanism by which observed record values for string fields can deviate from the values of their associated latent entities. Categorical fields that deviate from their corresponding true value are simply drawn from the empirical distribution function. We apply our proposed methodology to a simulated data set of German names and an Italian household survey on income and wealth, showing our method performs favorably compared to several standard methods in the literature. We also consider the robustness of our methods to changes in the hyper-parameters. 
VL - 10 IS - 5 UR - http://dx.doi.org/10.1214/15-BA965SI ER - TY - JOUR T1 - Expanding the Discourse on Antipoverty Policy: Reconsidering a Negative Income Tax JF - Journal of Poverty Y1 - 2015 A1 - Jessica Wiederspan A1 - Elizabeth Rhodes A1 - H. Luke Shaefer KW - economic well-being KW - poverty alleviation KW - public policy KW - social welfare policy AB - This article proposes that advocates for the poor consider the replacement of the current means-tested safety net in the United States with a Negative Income Tax (NIT), a guaranteed income program that lifts families’ incomes above a minimum threshold. The article highlights gaps in service provision that leave millions in poverty, explains how a NIT could help fill those gaps, and compares current expenditures on major means-tested programs to estimated expenditures necessary for a NIT. Finally, it addresses the financial and political concerns that are likely to arise in the event that a NIT proposal gains traction among policy makers. VL - 19 UR - http://dx.doi.org/10.1080/10875549.2014.991889 ER - TY - RPRT T1 - How individuals smooth spending: Evidence from the 2013 government shutdown using account data Y1 - 2015 A1 - Gelman, Michael A1 - Kariv, Shachar A1 - Shapiro, Matthew D A1 - Silverman, Dan A1 - Tadelis, Steven AB - Using comprehensive account records, this paper examines how individuals adjusted spending and saving in response to a temporary drop in income due to the 2013 U.S. government shutdown. The shutdown cut paychecks by 40% for affected employees, which was recovered within 2 weeks. Though the shock was short-lived and completely reversed, spending dropped sharply implying a naïve estimate of the marginal propensity to spend of 0.58. This estimate overstates how consumption responded. While many individuals had low liquidity, they used multiple strategies to smooth consumption including delay of recurring payments such as mortgages and credit card balances.
PB - National Bureau of Economic Research ER - TY - CONF T1 - I Know What You Did Next: Predicting Respondent’s Next Activity Using Machine Learning T2 - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) Y1 - 2015 A1 - Arunachalam, H. A1 - Atkin, G. A1 - Eck, A. A1 - Wettlaufer, D. A1 - Soh, L.-K. A1 - Belli, R.F. JF - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) CY - Hollywood, Florida UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - RPRT T1 - Introduction to The Survey of Income and Program Participation (SIPP) Y1 - 2015 A1 - Shaefer, H. Luke AB - Introduction to The Survey of Income and Program Participation (SIPP) Shaefer, H. Luke Goals for the SIPP Workshop: provide you with an introduction to the SIPP and get you up and running on the public-use SIPP files; offer some advanced tools for 2008 Panel SIPP data analysis; get you some experience analyzing SIPP data; introduce you to the SIPP EHC (SIPP Redesign); introduce you to the SIPP Synthetic Beta (SSB). Presentation made on May 15, 2015 at the Census Bureau, and previously in 2014 at Duke University and the University of Michigan PB - University of Michigan UR - http://hdl.handle.net/1813/40169 ER - TY - RPRT T1 - Modeling Endogenous Mobility in Wage Determination Y1 - 2015 A1 - Abowd, John M. A1 - McKinney, Kevin L. A1 - Schmutte, Ian M. AB - Modeling Endogenous Mobility in Wage Determination Abowd, John M.; McKinney, Kevin L.; Schmutte, Ian M. We evaluate the bias from endogenous job mobility in fixed-effects estimates of worker- and firm-specific earnings heterogeneity using longitudinally linked employer-employee data from the LEHD infrastructure file system of the U.S. Census Bureau. First, we propose two new residual diagnostic tests of the assumption that mobility is exogenous to unmodeled determinants of earnings. Both tests reject exogenous mobility.
We relax the exogenous mobility assumptions by modeling the evolution of the matched data as an evolving bipartite graph using a Bayesian latent class framework. Our results suggest that endogenous mobility biases estimated firm effects toward zero. To assess validity, we match our estimates of the wage components to out-of-sample estimates of revenue per worker. The corrected estimates attribute much more of the variation in revenue per worker to variation in match quality and worker quality than the uncorrected estimates. PB - Cornell University UR - http://hdl.handle.net/1813/40306 ER - TY - RPRT T1 - Modeling Endogenous Mobility in Wage Determination Y1 - 2015 A1 - Abowd, John M. A1 - McKinney, Kevin L. A1 - Schmutte, Ian M. AB - Modeling Endogenous Mobility in Wage Determination Abowd, John M.; McKinney, Kevin L.; Schmutte, Ian M. We evaluate the bias from endogenous job mobility in fixed-effects estimates of worker- and firm-specific earnings heterogeneity using longitudinally linked employer-employee data from the LEHD infrastructure file system of the U.S. Census Bureau. First, we propose two new residual diagnostic tests of the assumption that mobility is exogenous to unmodeled determinants of earnings. Both tests reject exogenous mobility. We relax exogenous mobility by modeling the matched data as an evolving bipartite graph using a Bayesian latent-type framework. Our results suggest that allowing endogenous mobility increases the variation in earnings explained by individual heterogeneity and reduces the proportion due to employer and match effects. To assess external validity, we match our estimates of the wage components to out-of-sample estimates of revenue per worker. The mobility-bias corrected estimates attribute much more of the variation in revenue per worker to variation in match quality and worker quality than the uncorrected estimates.
PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/52608 ER - TY - JOUR T1 - Multiple imputation for harmonizing longitudinal non-commensurate measures in individual participant data meta-analysis JF - Statistics in Medicine Y1 - 2015 A1 - Siddique, J. A1 - Reiter, J. P. A1 - Brincks, A. A1 - Gibbons, R. A1 - Crespi, C. A1 - Brown, C. H. UR - http://onlinelibrary.wiley.com/doi/10.1002/sim.6562/abstract ER - TY - RPRT T1 - NCRN Meeting Spring 2015: Geography and Usability of the American Community Survey Y1 - 2015 A1 - Spielman, Seth AB - NCRN Meeting Spring 2015: Geography and Usability of the American Community Survey Spielman, Seth Presentation at the NCRN Meeting Spring 2015 PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/40183 ER - TY - RPRT T1 - NCRN Meeting Spring 2015: Revisiting the Economics of Privacy: Population Statistics and Confidentiality Protection as Public Goods Y1 - 2015 A1 - Abowd, John M. A1 - Schmutte, Ian AB - NCRN Meeting Spring 2015: Revisiting the Economics of Privacy: Population Statistics and Confidentiality Protection as Public Goods Abowd, John M.; Schmutte, Ian Presentation at the NCRN Meeting Spring 2015 PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/40184 ER - TY - CONF T1 - Predicting Breakoff Using Sequential Machine Learning Methods T2 - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) Y1 - 2015 A1 - Soh, L.-K. A1 - Eck, A. A1 - McCutcheon, A.L. JF - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) CY - Hollywood, Florida UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - THES T1 - Probabilistic Hashing Techniques For Big Data T2 - Computer Science Y1 - 2015 A1 - Anshumali Shrivastava AB - We investigate probabilistic hashing techniques for addressing computational and memory challenges in large scale machine learning and data mining systems. 
In this thesis, we show that the traditional idea of hashing goes far beyond near-neighbor search and there are some striking new possibilities. We show that hashing can improve state-of-the-art large-scale learning algorithms, and it goes beyond the conventional notions of pairwise similarities. Despite being a very well-studied topic in the literature, we found several opportunities for fundamentally improving some of the well-known textbook hashing algorithms. In particular, we show that the traditional way of computing minwise hashes is unnecessarily expensive and, without losing anything, we can achieve an order-of-magnitude speedup. We also found that for cosine similarity search there is a better scheme than SimHash. In the end, we show that the existing locality sensitive hashing framework itself is very restrictive, and we cannot have efficient algorithms for some important measures like inner products, which are ubiquitous in machine learning. We propose asymmetric locality sensitive hashing (ALSH), an extended framework, where we show provable and practical efficient algorithms for Maximum Inner Product Search (MIPS). Having such efficient solutions to MIPS directly scales up many popular machine learning algorithms. We believe that this thesis provides significant improvements to some of the heavily used subroutines in big-data systems, which we hope will be adopted. JF - Computer Science PB - Cornell University VL - Ph.D. UR - https://ecommons.cornell.edu/handle/1813/40886 ER - TY - THES T1 - Ranking Firms Using Revealed Preference and Other Essays About Labor Markets T2 - Department of Economics Y1 - 2015 A1 - Isaac Sorkin KW - economics KW - labor markets AB - This dissertation contains essays on three questions about the labor market. Chapter 1 considers the question: why do some firms pay so much and some so little? Firms account for a substantial portion of earnings inequality.
Although the standard explanation is that there are search frictions that support an equilibrium with rents, this chapter finds that compensating differentials for nonpecuniary characteristics are at least as important. To reach this finding, this chapter develops a structural search model and estimates it on U.S. administrative data. The model analyzes the revealed preference information in the labor market: specifically, how workers move between the 1.5 million firms in the data. With on the order of 1.5 million parameters, standard estimation approaches are infeasible, and so the chapter develops a new estimation approach that is feasible on such big data. Chapter 2 considers the question: why do men and women work at different firms? Men work for higher-paying firms than women. The chapter builds on chapter 1 to consider two explanations for why men and women work in different firms. First, men and women might search from different offer distributions. Second, men and women might have different rankings of firms. Estimation finds that the main explanation for why men and women are sorted is that women search from a lower-paying offer distribution than men. Indeed, men and women are estimated to have quite similar rankings of firms. Chapter 3 considers the question: what are the long-run effects of the minimum wage? An empirical consensus suggests that there are small employment effects of minimum wage increases. This chapter argues that these are short-run elasticities. Long-run elasticities, which may differ from short-run elasticities, are more policy relevant. This chapter develops a dynamic industry equilibrium model of labor demand. The model makes two points. First, long-run regressions have been misinterpreted because even if the short- and long-run employment elasticities differ, standard methods would not detect a difference using U.S. variation.
Second, the model offers a reconciliation of the small estimated short-run employment effects with the commonly found pass-through of minimum wage increases to product prices. JF - Department of Economics PB - University of Michigan CY - Ann Arbor, MI UR - http://hdl.handle.net/2027.42/116747 ER - TY - CONF T1 - Recording What the Respondent Says: Does Question Format Matter? T2 - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) Y1 - 2015 A1 - Smyth, J.D. A1 - Olson, K. JF - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) CY - Hollywood, Florida UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - JOUR T1 - Reducing the Margins of Error in the American Community Survey Through Data-Driven Regionalization JF - PlosOne Y1 - 2015 A1 - Folch, D. A1 - Spielman, S. E. UR - http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0115626 ER - TY - JOUR T1 - Rejoinder on: Comparing and selecting spatial predictors using local criteria JF - Test Y1 - 2015 A1 - Bradley, J.R. A1 - Cressie, N. A1 - Shi, T. VL - 24 UR - http://dx.doi.org/10.1007/s11749-014-0414-2 IS - 1 ER - TY - JOUR T1 - The SAR model for very large datasets: A reduced-rank approach JF - Econometrics Y1 - 2015 A1 - Burden, S. A1 - Cressie, N. A1 - Steel, D.G. VL - 3 UR - http://www.mdpi.com/2225-1146/3/2/317 IS - 2 ER - TY - JOUR T1 - Semi-parametric selection models for potentially non-ignorable attrition in panel studies with refreshment samples JF - Political Analysis Y1 - 2015 A1 - Y. Si A1 - J.P. Reiter A1 - D.S. Hillygus VL - 23 UR - http://pan.oxfordjournals.org/cgi/reprint/mpu009?%20ijkey=joX8eSl6gyIlQKP&keytype=ref ER - TY - JOUR T1 - Studying Neighborhoods Using Uncertain Data from the American Community Survey: A Contextual Approach JF - Annals of the Association of American Geographers Y1 - 2015 A1 - Seth E. 
Spielman A1 - Alex Singleton AB - In 2010 the American Community Survey (ACS) replaced the long form of the decennial census as the sole national source of demographic and economic data for small geographic areas such as census tracts. These small area estimates suffer from large margins of error, however, which makes the data difficult to use for many purposes. The value of a large and comprehensive survey like the ACS is that it provides a richly detailed, multivariate, composite picture of small areas. This article argues that one solution to the problem of large margins of error in the ACS is to shift from a variable-based mode of inquiry to one that emphasizes a composite multivariate picture of census tracts. Because the margin of error in a single ACS estimate, like household income, is assumed to be a symmetrically distributed random variable, positive and negative errors are equally likely. Because the variable-specific estimates are largely independent from each other, when looking at a large collection of variables these random errors average to zero. This means that although single variables can be methodologically problematic at the census tract scale, a large collection of such variables provides utility as a contextual descriptor of the place(s) under investigation. This idea is demonstrated by developing a geodemographic typology of all U.S. census tracts. The typology is firmly rooted in the social scientific literature and is organized around a framework of concepts, domains, and measures. The typology is validated using public domain data from the City of Chicago and the U.S. Federal Election Commission. The typology, as well as the data and methods used to create it, is open source and published freely online. 
VL - 105 UR - http://dx.doi.org/10.1080/00045608.2015.1052335 ER - TY - JOUR T1 - Understanding the Dynamics of $2-a-Day Poverty in the United States JF - The Russell Sage Foundation Journal of the Social Sciences Y1 - 2015 A1 - Shaefer, H. Luke A1 - Edin, Kathryn A1 - Talbert, E. VL - 1 IS - Severe Deprivation ER - TY - JOUR T1 - Understanding the Human Condition through Survey Informatics JF - IEEE Computer Y1 - 2015 A1 - Eck, A. A1 - Leen-Kiat, S. A1 - McCutcheon, A. L. A1 - Smyth, J.D. A1 - Belli, R.F. VL - 48 IS - 11 ER - TY - CONF T1 - Using Data Mining to Examine Interviewer-Respondent Interactions in Calendar Interviews T2 - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) Y1 - 2015 A1 - Belli, R.F. A1 - Miller, L.D. A1 - Soh, L.-K. A1 - T. Al Baghal JF - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) CY - Hollywood, Florida UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - CONF T1 - Using Machine Learning Techniques to Predict Respondent Type from A Priori Demographic Information T2 - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) Y1 - 2015 A1 - Atkin, G. A1 - Arunachalam, H. A1 - Eck, A. A1 - Wettlaufer, D. A1 - Soh, L.-K. A1 - Belli, R.F. JF - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) CY - Hollywood, Florida UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - CONF T1 - Why Do Interviewers Speed Up? An Examination of Changes in Interviewer Behaviors over the Course of the Survey Field Period T2 - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) Y1 - 2015 A1 - Olson, K. A1 - Smyth, J.D. 
JF - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) CY - Hollywood, Florida UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - JOUR T1 - Causes and Patterns of Uncertainty in the American Community Survey JF - Applied Geography Y1 - 2014 A1 - Spielman, S. E. A1 - Folch, D. A1 - Nagle, N. VL - 46 UR - http://www.sciencedirect.com/science/article/pii/S0143622813002518 ER - TY - JOUR T1 - The Co-Evolution of Residential Segregation and the Built Environment at the Turn of the 20th Century: a Schelling Model JF - Transactions in GIS Y1 - 2014 A1 - Spielman, S. E. A1 - Harrison, P. VL - 18 UR - http://onlinelibrary.wiley.com/enhanced/doi/10.1111/tgis.12014/ ER - TY - CHAP T1 - A Comparison of Blocking Methods for Record Linkage T2 - Privacy in Statistical Databases Y1 - 2014 A1 - Steorts, R. A1 - Ventura, S. A1 - Sadinle, M. A1 - Fienberg, S. E. A1 - Domingo-Ferrer, J. JF - Privacy in Statistical Databases PB - Springer VL - 8744 UR - http://link.springer.com/chapter/10.1007/978-3-319-11257-2_20 ER - TY - JOUR T1 - A Comparison of Spatial Predictors when Datasets Could be Very Large JF - ArXiv Y1 - 2014 A1 - Bradley, J. R. A1 - Cressie, N. A1 - Shi, T. KW - Statistics - Methodology AB -

In this article, we review and compare a number of methods of spatial prediction. To demonstrate the breadth of available choices, we consider both traditional and more-recently-introduced spatial predictors. Specifically, in our exposition we review: traditional stationary kriging, smoothing splines, negative-exponential distance-weighting, Fixed Rank Kriging, modified predictive processes, a stochastic partial differential equation approach, and lattice kriging. This comparison is meant to provide a service to practitioners wishing to decide between spatial predictors. Hence, we provide technical material for the unfamiliar, which includes the definition and motivation for each (deterministic and stochastic) spatial predictor. We use a benchmark dataset of CO2 data from NASA's AIRS instrument to address computational efficiencies that include CPU time and memory usage. Furthermore, the predictive performance of each spatial predictor is assessed empirically using a hold-out subset of the AIRS data.

UR - http://arxiv.org/abs/1410.7748 IS - 1410.7748 ER - TY - JOUR T1 - Dasymetric Modeling and Uncertainty JF - The Annals of the Association of American Geographers Y1 - 2014 A1 - Nagle, N. A1 - Buttenfield, B. A1 - Leyk, S. A1 - Spielman, S. E. VL - 104 UR - http://www.tandfonline.com/doi/abs/10.1080/00045608.2013.843439 ER - TY - CONF T1 - Designing an Intelligent Time Diary Instrument: Visualization, Dynamic Feedback, and Error Prevention and Mitigation T2 - UNL/SRAM/Gallup Symposium Y1 - 2014 A1 - Atkin, G. A1 - Arunachalam, H. A1 - Eck, A. A1 - Soh, L.-K. A1 - Belli, R.F. JF - UNL/SRAM/Gallup Symposium CY - Omaha, NE UR - http://grc.unl.edu/unlsramgallup-symposium ER - TY - CONF T1 - Designing an Intelligent Time Diary Instrument: Visualization, Dynamic Feedback, and Error Prevention and Mitigation T2 - American Association for Public Opinion Research 2014 Annual Conference Y1 - 2014 A1 - Atkin, G. A1 - Arunachalam, H. A1 - Eck, A. A1 - Soh, L.-K. A1 - Belli, R. JF - American Association for Public Opinion Research 2014 Annual Conference CY - Anaheim, CA. UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - JOUR T1 - Detecting Duplicates in a Homicide Registry Using a Bayesian Partitioning Approach JF - Annals of Applied Statistics Y1 - 2014 A1 - Sadinle, M. VL - 8 ER - TY - CONF T1 - The Effect of CATI Questionnaire Design Features on Response Timing T2 - American Association for Public Opinion Research 2014 Annual Conference Y1 - 2014 A1 - Olson, K. A1 - Smyth, Jolene JF - American Association for Public Opinion Research 2014 Annual Conference CY - Anaheim, CA UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - JOUR T1 - Entity Resolution with Empirically Motivated Priors JF - ArXiv Y1 - 2014 A1 - Steorts, R. C. KW - Statistics - Methodology AB - Databases often contain corrupted, degraded, and noisy data with duplicate entries across and within each database. 
Such problems arise in citations, medical databases, genetics, human rights databases, and a variety of other applied settings. The target of statistical inference can be viewed as an unsupervised problem of determining the edges of a bipartite graph that links the observed records to unobserved latent entities. Bayesian approaches provide attractive benefits, naturally providing uncertainty quantification via posterior probabilities. We propose a novel record linkage approach based on empirical Bayesian principles. Specifically, the empirical Bayesian--type step consists of taking the empirical distribution function of the data as the prior for the latent entities. This approach improves on the earlier HB approach not only by avoiding the prior specification problem but also by allowing both categorical and string-valued variables. Our extension to string-valued variables also involves the proposal of a new probabilistic mechanism by which observed record values for string fields can deviate from the values of their associated latent entities. Categorical fields that deviate from their corresponding true value are simply drawn from the empirical distribution function. We apply our proposed methodology to a simulated data set of German names and an Italian household survey, showing our method performs favorably compared to several standard methods in the literature. We also consider the robustness of our methods to changes in the hyper-parameters. UR - http://arxiv.org/abs/1409.0643 IS - 1409.0643 ER - TY - JOUR T1 - Harnessing Naturally Occurring Data to Measure the Response of Spending to Income JF - Science Y1 - 2014 A1 - Gelman, M. A1 - Kariv, S. A1 - Shapiro, M.D. A1 - Silverman, D. A1 - Tadelis, S. AB - This paper presents a new data infrastructure for measuring economic activity. The infrastructure records transactions and account balances, yielding measurements with scope and accuracy that have little precedent in economics. 
The data are drawn from a diverse population that overrepresents males and younger adults but contains large numbers of underrepresented groups. The data infrastructure permits evaluation of a benchmark theory in economics that predicts that individuals should use a combination of cash management, saving, and borrowing to make the timing of income irrelevant for the timing of spending. As in previous studies and in contrast to the predictions of the theory, there is a response of spending to the arrival of anticipated income. The data also show, however, that this apparent excess sensitivity of spending results largely from the coincident timing of regular income and regular spending. The remaining excess sensitivity is concentrated among individuals with less liquidity. Link to data at Berkeley Econometrics Lab (EML): https://eml.berkeley.edu/cgi-bin/HarnessingDataScience2014.cgi VL - 345 UR - http://www.sciencemag.org/content/345/6193/212.full IS - 11 ER - TY - CONF T1 - Having a Lasting Impact: The Effects of Interviewer Errors on Data Quality T2 - Midwest Association for Public Opinion Research Annual Conference Y1 - 2014 A1 - Timm, A. A1 - Olson, K. A1 - Smyth, J.D. JF - Midwest Association for Public Opinion Research Annual Conference CY - Chicago, IL UR - http://www.mapor.org/conferences.html ER - TY - CONF T1 - Hours or Minutes: Does One Unit Fit All? T2 - Midwest Association for Public Opinion Research Annual Conference Y1 - 2014 A1 - Cochran, B. A1 - Smyth, J.D. JF - Midwest Association for Public Opinion Research Annual Conference CY - Chicago, IL UR - http://www.mapor.org/conferences.html ER - TY - JOUR T1 - I Cheated, but only a Little–Partial Confessions to Unethical Behavior JF - Journal of Personality and Social Psychology Y1 - 2014 A1 - Peer, E. A1 - Acquisti, A. A1 - Shalvi, S. 
VL - 106 ER - TY - JOUR T1 - Identifying Regions based on Flexible User Defined Constraints JF - International Journal of Geographic Information Science Y1 - 2014 A1 - Folch, D. A1 - Spielman, S. E. VL - 28 UR - http://www.tandfonline.com/doi/abs/10.1080/13658816.2013.848986 ER - TY - CONF T1 - Making sense of paradata: Challenges faced and lessons learned T2 - American Association for Public Opinion Research 2014 Annual Conference Y1 - 2014 A1 - Eck, A. A1 - Stuart, L. A1 - Atkin, G. A1 - Soh, L-K A1 - McCutcheon, A.L. A1 - Belli, R.F. JF - American Association for Public Opinion Research 2014 Annual Conference CY - Anaheim, CA UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - CONF T1 - Making Sense of Paradata: Challenges Faced and Lessons Learned T2 - UNL/SRAM/Gallup Symposium Y1 - 2014 A1 - Eck, A. A1 - Stuart, L. A1 - Atkin, G. A1 - Soh, L-K A1 - McCutcheon, A.L. A1 - Belli, R.F. JF - UNL/SRAM/Gallup Symposium CY - Omaha, NE UR - http://grc.unl.edu/unlsramgallup-symposium ER - TY - RPRT T1 - NCRN Meeting Fall 2014: Constrained Smoothed Bayesian Estimation Y1 - 2014 A1 - Steorts, Rebecca A1 - Shalizi, Cosma AB - NCRN Meeting Fall 2014: Constrained Smoothed Bayesian Estimation Steorts, Rebecca; Shalizi, Cosma Presentation from NCRN Fall 2014 meeting PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/37748 ER - TY - RPRT T1 - NCRN Meeting Fall 2014: Decomposing Medical-Care Expenditure Growth Y1 - 2014 A1 - Dunn, Abe A1 - Liebman, Eli A1 - Shapiro, Adam AB - NCRN Meeting Fall 2014: Decomposing Medical-Care Expenditure Growth Dunn, Abe; Liebman, Eli; Shapiro, Adam PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/37411 ER - TY - RPRT T1 - NCRN Meeting Fall 2014: Designer Census Geographies Y1 - 2014 A1 - Spielman, Seth AB - NCRN Meeting Fall 2014: Designer Census Geographies Spielman, Seth Presentation from NCRN Fall 2014 meeting PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/37747 ER - TY - 
RPRT T1 - NCRN Meeting Fall 2014: Respondent-Driven Sampling Estimation and the National HIV Behavioral Surveillance System Y1 - 2014 A1 - Spiller, Michael (Trey) AB - NCRN Meeting Fall 2014: Respondent-Driven Sampling Estimation and the National HIV Behavioral Surveillance System Spiller, Michael (Trey) PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/37414 ER - TY - RPRT T1 - A New Method for Protecting Interrelated Time Series with Bayesian Prior Distributions and Synthetic Data Y1 - 2014 A1 - Schneider, Matthew J. A1 - Abowd, John M. AB - A New Method for Protecting Interrelated Time Series with Bayesian Prior Distributions and Synthetic Data Schneider, Matthew J.; Abowd, John M. Organizations disseminate statistical summaries of administrative data via the Web for unrestricted public use. They balance the trade-off between confidentiality protection and inference quality. Recent developments in disclosure avoidance techniques include the incorporation of synthetic data, which capture the essential features of underlying data by releasing altered data generated from a posterior predictive distribution. The United States Census Bureau collects millions of interrelated time series micro-data that are hierarchical and contain many zeros and suppressions. Rule-based disclosure avoidance techniques often require the suppression of count data for small magnitudes and the modification of data based on a small number of entities. Motivated by this problem, we use zero-inflated extensions of Bayesian Generalized Linear Mixed Models (BGLMM) with privacy-preserving prior distributions to develop methods for protecting and releasing synthetic data from time series about thousands of small groups of entities without suppression based on the magnitudes or number of entities. We find that as the prior distributions of the variance components in the BGLMM become more precise toward zero, confidentiality protection increases and inference quality deteriorates.
We evaluate our methodology using a strict privacy measure, empirical differential privacy, and a newly defined risk measure, Probability of Range Identification (PoRI), which directly measures attribute disclosure risk. We illustrate our results with the U.S. Census Bureau’s Quarterly Workforce Indicators. PB - Cornell University UR - http://hdl.handle.net/1813/40828 ER - TY - JOUR T1 - The Past, Present, and Future of Geodemographic Research in the United States and United Kingdom JF - The Professional Geographer Y1 - 2014 A1 - Singleton, A. A1 - Spielman, S. E. VL - 4 ER - TY - RPRT T1 - Reducing Uncertainty in the American Community Survey through Data-Driven Regionalization Y1 - 2014 A1 - Spielman, Seth A1 - Folch, David AB - Reducing Uncertainty in the American Community Survey through Data-Driven Regionalization Spielman, Seth; Folch, David The American Community Survey (ACS) is the largest US survey of households and is the principal source for neighborhood scale information about the US population and economy. The ACS is used to allocate billions in federal spending and is a critical input to social scientific research in the US. However, estimates from the ACS can be highly unreliable. For example, in over 72% of census tracts the estimated number of children under 5 in poverty has a margin of error greater than the estimate. Uncertainty of this magnitude complicates the use of social data in policy making, research, and governance. This article develops a spatial optimization algorithm that is capable of reducing the margins of error in survey data via the creation of new composite geographies, a process called regionalization. Regionalization is a complex combinatorial problem. Here, rather than focusing on the technical aspects of regionalization, we demonstrate how to use a purpose-built open source regionalization algorithm to post-process survey data in order to reduce the margins of error to some user-specified threshold.
PB - University of Colorado at Boulder / University of Tennessee UR - http://hdl.handle.net/1813/38121 ER - TY - CONF T1 - SMERED: A Bayesian Approach to Graphical Record Linkage and De-duplication T2 - AISTATS 2014 Proceedings, JMLR Y1 - 2014 A1 - Steorts, R. A1 - Hall, R. A1 - Fienberg, S. E. JF - AISTATS 2014 Proceedings, JMLR PB - W&CP VL - 33 ER - TY - RPRT T1 - Sorting Between and Within Industries: A Testable Model of Assortative Matching Y1 - 2014 A1 - Abowd, John M. A1 - Kramarz, Francis A1 - Perez-Duarte, Sebastien A1 - Schmutte, Ian M. AB - We test Shimer's (2005) theory of the sorting of workers between and within industrial sectors based on directed search with coordination frictions, deliberately maintaining its static general equilibrium framework. We fit the model to sector-specific wage, vacancy and output data, including publicly-available statistics that characterize the distribution of worker and employer wage heterogeneity across sectors. Our empirical method is general and can be applied to a broad class of assignment models. The results indicate that industries are the loci of sorting: more productive workers are employed in more productive industries. The evidence confirms that strong assortative matching can be present even when worker and employer components of wage heterogeneity are weakly correlated. PB - Cornell University UR - http://hdl.handle.net/1813/52607 ER - TY - JOUR T1 - Spatial Collective Intelligence? Accuracy, Credibility in Crowdsourced Data JF - Cartography and Geographic Information Science Y1 - 2014 A1 - Spielman, S. E. 
VL - 41 UR - http://go.galegroup.com/ps/i.do?action=interpret&id=GALE|A361943563&v=2.1&u=nysl_sc_cornl&it=r&p=AONE&sw=w&authCount=1 IS - 2 ER - TY - CONF T1 - Supporting Planners' Work with Uncertain Demographic Data T2 - GIScience Workshop on Uncertainty Visualization Y1 - 2014 A1 - Griffin, A. L. A1 - Spielman, S. E. A1 - Jurjevich, J. A1 - Merrick, M. A1 - Nagle, N. N. A1 - Folch, D. C. JF - GIScience Workshop on Uncertainty Visualization VL - 23 UR - http://cognitivegiscience.psu.edu/uncertainty2014/papers/griffin_demographic.pdf ER - TY - CONF T1 - Supporting Planners' Work with Uncertain Demographic Data T2 - Proceedings of IEEE VIS 2014 Y1 - 2014 A1 - Griffin, A. L. A1 - Spielman, S. E. A1 - Nagle, N. N. A1 - Jurjevich, J. A1 - Merrick, M. A1 - Folch, D. C. JF - Proceedings of IEEE VIS 2014 PB - Proceedings of IEEE VIS 2014 UR - http://cognitivegiscience.psu.edu/uncertainty2014/papers/griffin_demographic.pdf ER - TY - CONF T1 - Survey Informatics: Ideas, Opportunities, and Discussions T2 - UNL/SRAM/Gallup Symposium Y1 - 2014 A1 - Eck, A. A1 - Soh, L-K JF - UNL/SRAM/Gallup Symposium CY - Omaha, NE UR - http://grc.unl.edu/unlsramgallup-symposium ER - TY - RPRT T1 - Uncertain Uncertainty: Spatial Variation in the Quality of American Community Survey Estimates Y1 - 2014 A1 - Folch, David C. A1 - Arribas-Bel, Daniel A1 - Koschinsky, Julia A1 - Spielman, Seth E. AB - The U.S. Census Bureau's American Community Survey (ACS) is the foundation of social science research, much federal resource allocation and the development of public policy and private sector decisions. However, the high uncertainty associated with some of the ACS's most frequently used estimates can jeopardize the accuracy of inferences based on these data. 
While there is a high-level understanding in the research community that problems exist in the data, the sources and implications of these problems have been largely overlooked. Using 2006-2010 ACS median household income at the census tract scale as the test case (where a third of small-area estimates have higher-than-recommended errors), we explore the patterns in the uncertainty of ACS data. We consider various potential sources of uncertainty in the data, ranging from response level to geographic location to characteristics of the place. We find that there exist systematic patterns in the uncertainty in both the spatial and attribute dimensions. Using a regression framework, we identify the factors that are most frequently correlated with the error at national, regional and metropolitan area scales, and find these correlates are not consistent across the various locations tested. The implication is that data quality varies in different places, making cross-sectional analysis both within and across regions less reliable. We also present general advice for data users and potential solutions to the challenges identified. PB - University of Colorado at Boulder / University of Tennessee UR - http://hdl.handle.net/1813/38122 ER - TY - JOUR T1 - An updated method for calculating income and payroll taxes from PSID data using the NBER’s TAXSIM, for PSID survey years 1999 through 2011 JF - Unpublished manuscript, University of Michigan. Accessed May Y1 - 2014 A1 - Kimberlin, Sara A1 - Kim, Jiyoun A1 - Shaefer, Luke AB - This paper describes a method to calculate income and payroll taxes from Panel Study of Income Dynamics data using the NBER’s Internet TAXSIM version 9 (http://users.nber.org/~taxsim/taxsim9/), for PSID survey years 1999, 2001, 2003, 2005, 2007, 2009, and 2011 (tax years n-1). These methods are implemented in two Stata programs, designed to be used with the PSID public-use zipped Main Interview data files: PSID_TAXSIM_1of2.do and PSID_TAXSIM_2of2.do. 
The main program (2of2) was written by Sara Kimberlin (skimberlin@berkeley.edu) and generates all TAXSIM input variables, runs TAXSIM, adjusts tax estimates using additional information available in PSID data, and calculates total PSID family unit taxes. A separate program (1of2) was written by Jiyoon (June) Kim (junekim@umich.edu) in collaboration with Luke Shaefer (lshaefer@umich.edu) to calculate mortgage interest for itemized deductions; this program needs to be run first, before the main program. Jonathan Latner contributed code to use the programs with the PSID zipped data. The overall methods build on the strategy for using TAXSIM with PSID data outlined by Butrica & Burkhauser (1997), with some expansions and modifications. Note that the methods described below are designed to prioritize accuracy of income taxes calculated for low-income households, particularly refundable tax credits such as the Earned Income Tax Credit (EITC) and the Additional Child Tax Credit. Income tax liability is generally low for low-income households, and the amount of refundable tax credits is often substantially larger than tax liabilities for this population. Payroll tax can also be substantial for low-income households. Thus the methods below focus on maximizing accuracy of income tax and payroll tax calculations for low-income families, with less attention to tax items that largely impact higher-income households (e.g. the treatment of capital gains). VL - 6 ER - TY - RPRT T1 - Using Social Media to Measure Labor Market Flows Y1 - 2014 A1 - Antenucci, Dolan A1 - Cafarella, Michael J A1 - Levenstein, Margaret C. A1 - Ré, Christopher A1 - Shapiro, Matthew UR - http://www-personal.umich.edu/~shapiro/papers/LaborFlowsSocialMedia.pdf ER - TY - CONF T1 - Would a Privacy Fundamentalist Sell their DNA for $1000... if Nothing Bad Happened Thereafter? 
A Study of the Westin Categories, Behavior Intentions, and Consequences T2 - Proceedings of the Tenth Symposium on Usable Privacy and Security (SOUPS) Y1 - 2014 A1 - Woodruff, A. A1 - Pihur, V. A1 - Acquisti, A. A1 - Consolvo, S. A1 - Schmidt, L. A1 - Brandimarte, L. JF - Proceedings of the Tenth Symposium on Usable Privacy and Security (SOUPS) PB - ACM CY - New York, NY UR - https://www.usenix.org/conference/soups2014/proceedings/presentation/woodruff N1 - IAPP SOUPS Privacy Award Winner ER - TY - RPRT T1 - A Bayesian Approach to Graphical Record Linkage and De-duplication Y1 - 2013 A1 - Steorts, Rebecca C. A1 - Hall, Rob A1 - Fienberg, Stephen E. AB - We propose an unsupervised approach for linking records across arbitrarily many files, while simultaneously detecting duplicate records within files. Our key innovation involves the representation of the pattern of links between records as a bipartite graph, in which records are directly linked to latent true individuals, and only indirectly linked to other records. This flexible representation of the linkage structure naturally allows us to estimate the attributes of the unique observable people in the population, calculate transitive linkage probabilities across records (and represent this visually), and propagate the uncertainty of record linkage into later analyses. Our method makes it particularly easy to integrate record linkage with post-processing procedures such as logistic regression, capture–recapture, etc. Our linkage structure lends itself to an efficient, linear-time, hybrid Markov chain Monte Carlo algorithm, which overcomes many obstacles encountered by previous record linkage approaches, despite the high-dimensional parameter space. 
We illustrate our method using longitudinal data from the National Long Term Care Survey and with data from the Italian Survey on Household and Wealth, where we assess the accuracy of our method and show it to be better in terms of error rates and empirical scalability than other approaches in the literature. Supplementary materials for this article are available online. JF - arXiv UR - https://arxiv.org/abs/1312.4645 ER - TY - RPRT T1 - b-Bit Minwise Hashing in Practice Y1 - 2013 A1 - Li, Ping A1 - Shrivastava, Anshumali A1 - König, Arnd Christian AB - Minwise hashing is a standard technique in the context of search for approximating set similarities. The recent work [26, 32] demonstrated a potential use of b-bit minwise hashing [23, 24] for efficient search and learning on massive, high-dimensional, binary data (which are typical for many applications in Web search and text mining). In this paper, we focus on a number of critical issues which must be addressed before one can apply b-bit minwise hashing to the volumes of data often used in industrial applications. PB - Cornell University UR - http://hdl.handle.net/1813/37986 ER - TY - CONF T1 - b-Bit Minwise Hashing in Practice T2 - Internetware'13 Y1 - 2013 A1 - Ping Li A1 - Anshumali Shrivastava A1 - König, Arnd Christian AB - Minwise hashing is a standard technique in the context of search for approximating set similarities. The recent work [26, 32] demonstrated a potential use of b-bit minwise hashing [23, 24] for efficient search and learning on massive, high-dimensional, binary data (which are typical for many applications in Web search and text mining). In this paper, we focus on a number of critical issues which must be addressed before one can apply b-bit minwise hashing to the volumes of data often used in industrial applications. 
Minwise hashing requires an expensive preprocessing step that computes k (e.g., 500) minimal values after applying the corresponding permutations for each data vector. We developed a parallelization scheme using GPUs and observed that the preprocessing time can be reduced by a factor of 20 to 80 and becomes substantially smaller than the data loading time. Reducing the preprocessing time is highly beneficial in practice, e.g., for duplicate Web page detection (where minwise hashing is a major step in the crawling pipeline) or for increasing the testing speed of online classifiers. Another critical issue is that for very large data sets it becomes impossible to store a (fully) random permutation matrix, due to its space requirements. Our paper is the first study to demonstrate that b-bit minwise hashing implemented using simple hash functions, e.g., the 2-universal (2U) and 4-universal (4U) hash families, can produce very similar learning results as using fully random permutations. Experiments on datasets of up to 200GB are presented. JF - Internetware'13 UR - http://www.nudt.edu.cn/internetware2013/ ER - TY - CONF T1 - Beyond Pairwise: Provably Fast Algorithms for Approximate K-Way Similarity Search T2 - Neural Information Processing Systems (NIPS) Y1 - 2013 A1 - Anshumali Shrivastava A1 - Ping Li JF - Neural Information Processing Systems (NIPS) ER - TY - CONF T1 - The Co-Evolution of Residential Segregation and the Built Environment at the Turn of the 20th Century: A Schelling Model T2 - Transactions in GIS Y1 - 2013 A1 - S.E. Spielman A1 - Patrick Harrison JF - Transactions in GIS ER - TY - JOUR T1 - Do single mothers in the United States use the Earned Income Tax Credit to reduce unsecured debt? JF - Review of Economics of the Household Y1 - 2013 A1 - Shaefer, H. Luke A1 - Song, Xiaoqing A1 - Williams Shanks, Trina R. KW - Earned Income Tax Credit Single Mothers Unsecured Debt AB -

The Earned Income Tax Credit (EITC) is a refundable credit for low income workers mainly targeted at families with children. This study uses the Survey of Income and Program Participation’s topical modules on Assets and Liabilities to examine associations between the EITC expansions during the early 1990s and the unsecured debt of the households of single mothers. We use two difference-in-differences comparisons over the study period 1988–1999, first comparing single mothers to single childless women, and then comparing single mothers with two or more children to single mothers with exactly one child. In both cases we find that the EITC expansions are associated with a relative decline in the unsecured debt of affected households of single mothers. While not direct evidence of a causal relationship, this is suggestive evidence that single mothers may have used part of their EITC to limit the growth of their unsecured debt during this period.

N1 - NCRN ER - TY - JOUR T1 - On estimation of mean squared errors of benchmarked and empirical Bayes estimators JF - Statistica Sinica Y1 - 2013 A1 - Rebecca C. Steorts A1 - Malay Ghosh VL - 23 ER - TY - CONF T1 - Examining the relationship between error and behavior in the American Time Use Survey using audit trail paradata T2 - American Association for Public Opinion Research 2013 Annual Conference Y1 - 2013 A1 - Ruther, N. A1 - T. Al Baghal A1 - A. Eck A1 - L. Stuart A1 - L. Phillips A1 - R. Belli A1 - Soh, L-K JF - American Association for Public Opinion Research 2013 Annual Conference CY - Boston, MA UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - RPRT T1 - Fast Near Neighbor Search in High-Dimensional Binary Data Y1 - 2013 A1 - Shrivastava, Anshumali A1 - Li, Ping AB - Numerous applications in search, databases, machine learning, and computer vision can benefit from efficient algorithms for near neighbor search. This paper proposes a simple framework for fast near neighbor search in high-dimensional binary data, which are common in practice (e.g., text). We develop a very simple and effective strategy for sub-linear time near neighbor search, by creating hash tables directly using the bits generated by b-bit minwise hashing. The advantages of our method are demonstrated through thorough comparisons with two strong baselines: spectral hashing and sign (1-bit) random projections. PB - Cornell University UR - http://hdl.handle.net/1813/37987 ER - TY - JOUR T1 - From Facebook Regrets to Facebook Privacy Nudges JF - Ohio State Law Journal Y1 - 2013 A1 - Wang, Y. A1 - Leon, P. G. A1 - Chen, X. A1 - Komanduri, S. A1 - Norcie, G. A1 - Scott, K. A1 - Acquisti, A. A1 - Cranor, L. F. A1 - Sadeh, N. 
N1 - Invited paper ER - TY - JOUR T1 - A Generalized Fellegi-Sunter Framework for Multiple Record Linkage with Application to Homicide Record Systems JF - Journal of the American Statistical Association Y1 - 2013 A1 - Sadinle, M. A1 - Fienberg, S. E. VL - 108 UR - http://dx.doi.org/10.1080/01621459.2012.757231 ER - TY - JOUR T1 - Handling Attrition in Longitudinal Studies: The Case for Refreshment Samples JF - Statist. Sci. Y1 - 2013 A1 - Deng, Yiting A1 - Hillygus, D. Sunshine A1 - Reiter, Jerome P. A1 - Si, Yajuan A1 - Zheng, Siyu AB - Panel studies typically suffer from attrition, which reduces sample size and can result in biased inferences. It is impossible to know whether or not the attrition causes bias from the observed panel data alone. Refreshment samples—new, randomly sampled respondents given the questionnaire at the same time as a subsequent wave of the panel—offer information that can be used to diagnose and adjust for bias due to attrition. We review and bolster the case for the use of refreshment samples in panel studies. We include examples of both a fully Bayesian approach for analyzing the concatenated panel and refreshment data, and a multiple imputation approach for analyzing only the original panel. For the latter, we document a positive bias in the usual multiple imputation variance estimator. We present models appropriate for three waves and two refreshment samples, including nonterminal attrition. We illustrate the three-wave analysis using the 2007–2008 Associated Press–Yahoo! News Election Poll. VL - 28 UR - http://dx.doi.org/10.1214/13-STS414 ER - TY - JOUR T1 - Hierarchical Statistical Modeling of Big Spatial Datasets Using the Exponential Family of Distributions JF - Spatial Statistics Y1 - 2013 A1 - Sengupta, A. A1 - Cressie, N. 
KW - EM algorithm KW - Empirical Bayes KW - Geostatistical process KW - Maximum likelihood estimation KW - MCMC KW - SRE model VL - 4 UR - http://www.sciencedirect.com/science/article/pii/S2211675313000055 ER - TY - JOUR T1 - Identifying Neighborhoods Using High Resolution Population Data JF - Annals of the Association of American Geographers Y1 - 2013 A1 - S.E. Spielman A1 - J. Logan VL - 103 ER - TY - JOUR T1 - Neighborhood contexts, health, and behavior: understanding the role of scale and residential sorting JF - Environment and Planning B Y1 - 2013 A1 - Spielman, S. E. A1 - Linkletter, C. A1 - Yoo, E.-H. VL - 3 ER - TY - JOUR T1 - Nonparametric Bayesian multiple imputation for incomplete categorical variables in large-scale assessment surveys JF - Journal of Educational and Behavioral Statistics Y1 - 2013 A1 - Si, Y. A1 - Reiter, J.P. VL - 38 UR - http://www.stat.duke.edu/~jerry/Papers/StatinMed14.pdf ER - TY - CONF T1 - Predicting the occurrence of respondent retrieval strategies in calendar interviewing: The quality of autobiographical recall in surveys T2 - Biennial conference of the Society for Applied Research in Memory and Cognition Y1 - 2013 A1 - Belli, R.F. A1 - Miller, L.D. A1 - Soh, L-K A1 - T. Al Baghal JF - Biennial conference of the Society for Applied Research in Memory and Cognition CY - Rotterdam, Netherlands UR - http://static1.squarespace.com/static/504170d6e4b0b97fe5a59760/t/52457a8be4b0012b7a5f462a/1380285067247/SARMAC_X_PaperJune27.pdf ER - TY - CONF T1 - Predicting the occurrence of respondent retrieval strategies in calendar interviewing: The quality of retrospective reports T2 - American Association for Public Opinion Research 2013 Annual Conference Y1 - 2013 A1 - Belli, R.F. A1 - Miller, L.D. A1 - Soh, L-K A1 - T. 
Al Baghal JF - American Association for Public Opinion Research 2013 Annual Conference CY - Boston, MA UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - RPRT T1 - Reconsidering the Consequences of Worker Displacements: Survey versus Administrative Measurements Y1 - 2013 A1 - Flaaen, Aaron A1 - Shapiro, Matthew A1 - Isaac Sorkin AB - Displaced workers suffer persistent earnings losses. This stark finding has been established by following workers in administrative data after mass layoffs under the presumption that these are involuntary job losses owing to economic distress. Using linked survey and administrative data, this paper examines this presumption by matching worker-supplied reasons for separations with what is happening at the firm. The paper documents substantially different earnings dynamics in mass layoffs depending on the reason the worker gives for the separation. Using a new methodology for accounting for the increase in the probability of separation among all types of survey response during a mass layoff, the paper finds earnings loss estimates that are surprisingly close to those using only administrative data. Finally, the survey-administrative link allows the decomposition of earnings losses due to subsequent nonemployment into non-participation and unemployment. Including the zero earnings of those identified as being unemployed substantially increases the estimate of earnings losses. PB - University of Michigan UR - http://www-personal.umich.edu/~shapiro/papers/ReconsideringDisplacements.pdf ER - TY - JOUR T1 - Ringtail: Feature Selection for Easier Nowcasting JF - WebDB Y1 - 2013 A1 - Antenucci, Dolan A1 - Cafarella, Michael J A1 - Levenstein, Margaret C. 
A1 - Ré, Christopher A1 - Shapiro, Matthew AB - In recent years, social media “nowcasting”—the use of online user activity to predict various ongoing real-world social phenomena—has become a popular research topic; yet, this popularity has not led to widespread actual practice. We believe a major obstacle to widespread adoption is the feature selection problem. Typical nowcasting systems require the user to choose a set of relevant social media objects, which is difficult, time-consuming, and can imply a statistical background that users may not have. We propose Ringtail, which helps the user choose relevant social media signals. It takes a single user input string (e.g., unemployment) and yields a number of relevant signals the user can use to build a nowcasting model. We evaluate Ringtail on six different topics using a corpus of almost 6 billion tweets, showing that features chosen by Ringtail in a wholly-automated way are better or as good as those from a human and substantially better if Ringtail receives some human assistance. In all cases, Ringtail reduces the burden on the user. UR - http://www.cs.stanford.edu/people/chrismre/papers/webdb_ringtail.pdf ER - TY - JOUR T1 - Rising extreme poverty in the United States and the response of means-tested transfers. JF - Social Service Review Y1 - 2013 A1 - H. Luke Shaefer A1 - Edin, K. AB - This study documents an increase in the prevalence of extreme poverty among US households with children between 1996 and 2011 and assesses the response of major federal means-tested transfer programs. Extreme poverty is defined using a World Bank metric of global poverty: $2 or less, per person, per day. Using the 1996–2008 panels of the Survey of Income and Program Participation (SIPP), we estimate that in mid-2011, 1.65 million households with 3.55 million children were living in extreme poverty in a given month, based on cash income, constituting 4.3 percent of all nonelderly households with children. 
The prevalence of extreme poverty has risen sharply since 1996, particularly among those most affected by the 1996 welfare reform. Adding SNAP benefits to household income reduces the number of extremely poor households with children by 48.0 percent in mid-2011. Adding SNAP, refundable tax credits, and housing subsidies reduces it by 62.8 percent. VL - 87 UR - http://www.jstor.org/stable/10.1086/671012 IS - 2 ER - TY - JOUR T1 - Two-stage Bayesian benchmarking as applied to small area estimation JF - TEST Y1 - 2013 A1 - Rebecca C. Steorts A1 - Malay Ghosh KW - small area estimation VL - 22 IS - 4 ER - TY - THES T1 - User Modeling via Machine Learning and Rule-based Reasoning to Understand and Predict Errors in Survey Systems Y1 - 2013 A1 - Stuart, Leonard Cleve PB - University of Nebraska-Lincoln UR - http://digitalcommons.unl.edu/computerscidiss/70/ ER - TY - JOUR T1 - Using High Resolution Population Data to Identify Neighborhoods and Determine their Boundaries JF - Annals of the Association of American Geographers Y1 - 2013 A1 - Spielman, S. E. A1 - Logan, J. VL - 103 UR - http://www.tandfonline.com/doi/abs/10.1080/00045608.2012.685049 ER - TY - CONF T1 - What are you doing now?: Audit trails, Activity level responses and error in the American Time Use Survey T2 - American Association for Public Opinion Research Y1 - 2013 A1 - T. Al Baghal A1 - Phillips, A.L. A1 - Ruther, N. A1 - Belli, R.F. A1 - Stuart, L. A1 - Eck, A. A1 - Soh, L-K JF - American Association for Public Opinion Research CY - Boston, MA UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - CONF T1 - Bayesian Parametric and Nonparametric Inference for Multiple Record Linkage T2 - Modern Nonparametric Methods in Machine Learning Workshop Y1 - 2012 A1 - Hall, R. A1 - Steorts, R. A1 - Fienberg, S. E. 
JF - Modern Nonparametric Methods in Machine Learning Workshop PB - NIPS UR - http://www.stat.cmu.edu/NCRN/PUBLIC/files/beka_nips_finalsub4.pdf ER - TY - CONF T1 - On Estimation of Mean Squared Errors of Benchmarked and Empirical Bayes Estimators T2 - 2012 Joint Statistical Meetings Y1 - 2012 A1 - Rebecca C. Steorts A1 - Malay Ghosh JF - 2012 Joint Statistical Meetings CY - San Diego, CA ER - TY - CONF T1 - Exploring interviewer and respondent interactions: An innovative behavior coding approach T2 - Midwest Association for Public Opinion Research 2012 Annual Conference Y1 - 2012 A1 - Walton, L. A1 - Stange, M. A1 - Powell, R. A1 - Belli, R.F. JF - Midwest Association for Public Opinion Research 2012 Annual Conference CY - Chicago, IL UR - http://www.mapor.org/conferences.html ER - TY - ABST T1 - Extreme Poverty in the United States, 1996 to 2011 Y1 - 2012 A1 - Shaefer, H. Luke A1 - Edin, Kathryn PB - University of Michigan UR - http://www.npc.umich.edu/publications/policy_briefs/brief28/policybrief28.pdf N1 - NCRN ER - TY - CONF T1 - Fast Multi-task Learning for Query Spelling Correction T2 - The 21st ACM International Conference on Information and Knowledge Management (CIKM 2012) Y1 - 2012 A1 - Xu Sun A1 - Anshumali Shrivastava A1 - Ping Li JF - The 21st ACM International Conference on Information and Knowledge Management (CIKM 2012) UR - http://dx.doi.org/10.1145/2396761.2396800 ER - TY - CONF T1 - Fast Near Neighbor Search in High-Dimensional Binary Data T2 - The European Conference on Machine Learning (ECML 2012) Y1 - 2012 A1 - Anshumali Shrivastava A1 - Ping Li JF - The European Conference on Machine Learning (ECML 2012) ER - TY - RPRT T1 - A Generalized Fellegi-Sunter Framework for Multiple Record Linkage with Application to Homicide Records Systems Y1 - 2012 A1 - Mauricio Sadinle A1 - Stephen E. 
Fienberg JF - arXiv UR - https://arxiv.org/abs/1205.3217 ER - TY - CONF T1 - GPU-based minwise hashing T2 - Proceedings of the 21st World Wide Web Conference (WWW 2012) (Companion Volume) Y1 - 2012 A1 - Ping Li A1 - Anshumali Shrivastava A1 - Arnd Christian König JF - Proceedings of the 21st World Wide Web Conference (WWW 2012) (Companion Volume) UR - http://doi.acm.org/10.1145/2187980.2188129 ER - TY - ABST T1 - Hierarchical Statistical Modeling of Big Spatial Datasets Using the Exponential Family of Distributions Y1 - 2012 A1 - Sengupta, A. A1 - Cressie, N. PB - The Ohio State University ER - TY - CONF T1 - Logit-Based Confidence Intervals for Single Capture-Recapture Estimation T2 - American Statistical Association Pittsburgh Chapter Banquet Y1 - 2012 A1 - Mauricio Sadinle JF - American Statistical Association Pittsburgh Chapter Banquet CY - Pittsburgh, PA N1 - April 9, 2012 ER - TY - CONF T1 - Maintaining Quality in the Face of Rapid Program Expansion T2 - 2012 Joint Statistical Meetings Y1 - 2012 A1 - Cosma Shalizi A1 - Rebecca Nugent JF - 2012 Joint Statistical Meetings CY - San Diego, CA ER - TY - CONF T1 - Multi-File Record Linkage Using a Generalized Fellegi-Sunter Framework T2 - Conference Presentation Classification Society Annual Meeting, Carnegie Mellon University Y1 - 2012 A1 - Mauricio Sadinle JF - Conference Presentation Classification Society Annual Meeting, Carnegie Mellon University ER - TY - CONF T1 - Query spelling correction using multi-task learning T2 - Proceedings of the 21st World Wide Web Conference (WWW 2012)(Companion Volume) Y1 - 2012 A1 - Xu Sun A1 - Anshumali Shrivastava A1 - Ping Li JF - Proceedings of the 21st World Wide Web Conference (WWW 2012)(Companion Volume) UR - http://doi.acm.org/10.1145/2187980.2188153 ER - TY - JOUR T1 - Testing for Membership to the IFRA and the NBU Classes of Distributions JF - Journal of Machine Learning Research - Proceedings Track for the Fifteenth International Conference on 
Artificial Intelligence and Statistics (AISTATS 2012) Y1 - 2012 A1 - Radhendushka Srivastava A1 - Ping Li A1 - Debasis Sengupta VL - 22 UR - http://jmlr.csail.mit.edu/proceedings/papers/v22/srivastava12.html ER - TY - CONF T1 - Thinking inside the box: Mapping the microstructure of urban environment (and why it matters) T2 - AutoCarto 2012 Y1 - 2012 A1 - Seth Spielman A1 - David Folch A1 - John Logan A1 - Nicholas Nagle KW - cartography JF - AutoCarto 2012 CY - Columbus, Ohio UR - http://www.cartogis.org/docs/proceedings/2012/Spielman_etal_AutoCarto2012.pdf ER - TY - JOUR T1 - The welfare reforms of the 1990s and the stratification of material well-being among low-income households with children JF - Children and Youth Services Review Y1 - 2012 A1 - Shaefer, H. Luke A1 - Ybarra, Marci AB -

We examine the incidence of material hardship experienced by low-income households with children, before and after the major changes to U.S. anti-poverty programs during the 1990s. We use the Survey of Income and Program Participation (SIPP) to examine a series of measures of household material hardship that were collected in the years 1992, 1995, 1998, 2003 and 2005. We stratify our sample to differentiate between the 1) deeply poor (<50% of poverty), who saw a decline in public assistance over this period; and two groups that saw some forms of public assistance increase: 2) other poor households (50–99% of poverty), and 3) the near poor (100–150% of poverty). We report bivariate trends over the study period, as well as presenting multivariate difference-in-differences estimates. We find suggestive evidence that material hardship—in the form of difficulty meeting essential household expenses, and falling behind on utilities costs—has generally increased among the deeply poor but has remained roughly the same for the middle group (50–99% of poverty), and decreased among the near poor (100–150% of poverty). Multivariate difference-in-differences estimates suggest that these trends have resulted in intensified stratification of the material well-being of low-income households with children.

VL - 34 N1 - NCRN ER - TY - CONF T1 - Approaches to Multiple Record Linkage T2 - Proceedings of the 58th World Statistical Congress Y1 - 2011 A1 - Sadinle, M. A1 - Hall, R. A1 - Fienberg, S. E. JF - Proceedings of the 58th World Statistical Congress PB - International Statistical Institute CY - Dublin UR - http://2011.isiproceedings.org/papers/450092.pdf ER - TY - RPRT T1 - Do Single Mothers in the United States use the Earned Income Tax Credit to Reduce Unsecured Debt? Y1 - 2011 A1 - Shaefer, H. Luke A1 - Song, Xiaoqing A1 - Williams Shanks, Trina R. AB - The Earned Income Tax Credit (EITC) is a refundable credit for low-income workers that is mainly targeted at families with children. This study uses the Survey of Income and Program Participation’s (SIPP) topical modules on Assets & Liabilities to examine the effects of EITC expansions during the early 1990s on the unsecured debt of the households of single mothers. We use two difference-in-differences comparisons over the study period 1988 to 1999, first comparing single mothers to single childless women, and then comparing single mothers with two or more children to single mothers with exactly one child. In both cases we find that the EITC expansions are associated with a relative decline in the unsecured debt of affected households of single mothers. This suggests that single mothers may have used part of their EITC to limit the growth of their unsecured debt during this period. PB - University of Michigan UR - http://hdl.handle.net/1813/34516 ER - TY - ABST T1 - Are Self-Description Scales Better than Agree/Disagree Scales in Mail and Telephone Surveys? 
Y1 - 0 A1 - Timbrook, Jerry A1 - Smyth, Jolene D. A1 - Olson, Kristen ER - TY - JOUR T1 - Bayesian estimation of bipartite matchings for record linkage JF - Journal of the American Statistical Association Y1 - 0 A1 - Mauricio Sadinle AB - The bipartite record linkage task consists of merging two disparate datafiles containing information on two overlapping sets of entities. This is non-trivial in the absence of unique identifiers, and it is important for a wide variety of applications, given that it needs to be solved whenever we have to combine information from different sources. Most statistical techniques currently used for record linkage are derived from a seminal paper by Fellegi and Sunter (1969). These techniques usually assume independence in the matching statuses of record pairs to derive estimation procedures and optimal point estimators. We argue that this independence assumption is unreasonable and instead target a bipartite matching between the two datafiles as our parameter of interest. Bayesian implementations allow us to quantify uncertainty on the matching decisions and derive a variety of point estimators using different loss functions. We propose partial Bayes estimates that allow uncertain parts of the bipartite matching to be left unresolved. We evaluate our approach to record linkage using a variety of challenging scenarios and show that it outperforms the traditional methodology. We illustrate the advantages of our methods merging two datafiles on casualties from the civil war of El Salvador. ER - TY - JOUR T1 - Biomass prediction using density dependent diameter distribution models JF - Annals of Applied Statistics Y1 - 0 A1 - Schliep, E.M. A1 - A.E. Gelfand A1 - J.S. Clark A1 - B.J. Tomasek AB - Prediction of aboveground biomass, particularly at large spatial scales, is necessary for estimating global-scale carbon sequestration. Since biomass can be measured only by sacrificing trees, total biomass on plots is never observed. 
Rather, allometric equations are used to convert individual tree diameter to individual biomass, perhaps with noise. The values for all trees on a plot are then summed to obtain a derived total biomass for the plot. Then, with derived total biomasses for a collection of plots, regression models, using appropriate environmental covariates, are employed to attempt explanation and prediction. Not surprisingly, when out-of-sample validation is examined, such a model will predict total biomass well for holdout data because it is obtained using exactly the same derived approach. Apart from the somewhat circular nature of the regression approach, it also fails to employ the actual observed plot level response data. At each plot, we observe a random number of trees, each with an associated diameter, producing a sample of diameters. A model based on this random number of tree diameters provides understanding of how environmental regressors explain abundance of individuals, which in turn explains individual diameters. We incorporate density dependence because the distribution of tree diameters over a plot of fixed size depends upon the number of trees on the plot. After fitting this model, we can obtain predictive distributions for individual-level biomass and plot-level total biomass. We show that predictive distributions for plot-level biomass obtained from a density-dependent model for diameters will be much different from predictive distributions using the regression approach. Moreover, they can be more informative for capturing uncertainty than those obtained from modeling derived plot-level biomass directly. We develop a density-dependent diameter distribution model and illustrate with data from the national Forest Inventory and Analysis (FIA) database. We also describe how to scale predictions to larger spatial regions. Our predictions agree (in magnitude) with available wisdom on mean and variation in biomass at the hectare scale. 
VL - 11 UR - https://projecteuclid.org/euclid.aoas/1491616884 IS - 1 ER - TY - ABST T1 - "During the LAST YEAR, Did You...": The Effect of Emphasis in CATI Survey Questions on Data Quality Y1 - 0 A1 - Olson, Kristen A1 - Smyth, Jolene D. ER - TY - ABST T1 - The Effect of Question Characteristics, Respondents and Interviewers on Question Reading Time and Question Reading Behaviors in CATI Surveys Y1 - 0 A1 - Olson, Kristen A1 - Smyth, Jolene A1 - Kirchner, Antje ER - TY - ABST T1 - The Effects of Respondent and Question Characteristics on Respondent Behaviors Y1 - 0 A1 - Ganshert, Amanda A1 - Olson, Kristen A1 - Smyth, Jolene ER - TY - ABST T1 - Going off Script: How Interviewer Behavior Affects Respondent Behaviors in Telephone Surveys Y1 - 0 A1 - Kirchner, Antje A1 - Olson, Kristen A1 - Smyth, Jolene ER - TY - ABST T1 - How do Low Versus High Response Scale Ranges Impact the Administration and Answering of Behavioral Frequency Questions in Telephone Surveys? Y1 - 0 A1 - Sarwar, Mazen A1 - Olson, Kristen A1 - Smyth, Jolene ER - TY - ABST T1 - How do Mismatches Affect Interviewer/Respondent Interactions in the Question/Answer Process? Y1 - 0 A1 - Smyth, Jolene D. A1 - Olson, Kristen ER - TY - ABST T1 - Interviewer Influence on Interviewer-Respondent Interaction During Battery Questions Y1 - 0 A1 - Cochran, Beth A1 - Olson, Kristen A1 - Smyth, Jolene ER - TY - ABST T1 - Response Scales: Effects on Data Quality for Interviewer Administered Surveys Y1 - 0 A1 - Sarwar, Mazen A1 - Olson, Kristen A1 - Smyth, Jolene ER - TY - ABST T1 - Using audit trails to evaluate an event history calendar survey instrument Y1 - 0 A1 - Lee, Jinyoung A1 - Seloske, Ben A1 - Belli, Robert F. ER - TY - ABST T1 - Why do Mobile Interviews Take Longer? 
A Behavior Coding Perspective Y1 - 0 A1 - Timbrook, Jerry A1 - Smyth, Jolene A1 - Olson, Kristen ER - TY - ABST T1 - Working with the SIPP-EHC audit trails: Parallel and sequential retrieval Y1 - 0 A1 - Lee, Jinyoung A1 - Seloske, Ben A1 - Córdova Cazar, Ana Lucía A1 - Eck, Adam A1 - Belli, Robert F. ER -