%0 Journal Article %D forthcoming %T The Response of Consumer Spending to Changes in Gasoline Prices %A Gelman, Michael %A Gorodnichenko, Yuriy %A Kariv, Shachar %A Koustas, Dmitri %A Shapiro, Matthew D %A Silverman, Daniel %A Tadelis, Steven %X This paper estimates how overall consumer spending responds to changes in gasoline prices. It uses the differential impact across consumers of the sudden, large drop in gasoline prices in 2014 for identification. This estimation strategy is implemented using comprehensive, daily transaction-level data for a large panel of individuals. The estimated marginal propensity to consume (MPC) is approximately one, higher than estimates found in less comprehensive or less well-measured data. This estimate takes into account the elasticity of demand for gasoline and potential slow adjustment to changes in prices. The high MPC implies that changes in gasoline prices have large aggregate effects. %G eng %0 Report %D 2017 %T Recalculating - How Uncertainty in Local Labor Market Definitions Affects Empirical Findings %A Foote, Andrew %A Kutzbach, Mark J. %A Vilhuber, Lars %X This paper evaluates the use of commuting zones as a local labor market definition. We revisit Tolbert and Sizer (1996) and demonstrate the sensitivity of definitions to two features of the methodology. We show how these features impact empirical estimates using a well-known application of commuting zones. We conclude with advice to researchers using commuting zones on how to demonstrate the robustness of empirical findings to uncertainty in definitions. The analysis, conclusions, and opinions expressed herein are those of the author(s) alone and do not necessarily represent the views of the U.S. Census Bureau or the Federal Deposit Insurance Corporation. 
All results have been reviewed to ensure that no confidential information is disclosed, and no confidential data was used in this paper. This document is released to inform interested parties of ongoing research and to encourage discussion of work in progress. Much of the work developing this paper occurred while Mark Kutzbach was an employee of the U.S. Census Bureau. %I Cornell University %G eng %U http://hdl.handle.net/1813/52649 %9 Preprint %0 Journal Article %J Journal of the Royal Statistical Society -- Series B. %D 2017 %T Regionalization of Multiscale Spatial Processes using a Criterion for Spatial Aggregation Error %A Bradley, J.R. %A Wikle, C.K. %A Holan, S.H. %K American Community Survey %K empirical orthogonal functions %K MAUP %K Reduced rank %K Spatial basis functions %K Survey data %X The modifiable areal unit problem and the ecological fallacy are known problems that occur when modeling multiscale spatial processes. We investigate how these forms of spatial aggregation error can guide a regionalization over a spatial domain of interest. By "regionalization" we mean a specification of geographies that define the spatial support for areal data. This topic has been studied vigorously by geographers, but has been given less attention by spatial statisticians. Thus, we propose a criterion for spatial aggregation error (CAGE), which we minimize to obtain an optimal regionalization. To define CAGE we draw a connection between spatial aggregation error and a new multiscale representation of the Karhunen-Loeve (K-L) expansion. This relationship between CAGE and the multiscale K-L expansion leads to illuminating theoretical developments including: connections between spatial aggregation error, squared prediction error, spatial variance, and a novel extension of Obled-Creutin eigenfunctions. 
The effectiveness of our approach is demonstrated through an analysis of two datasets, one using the American Community Survey and one related to environmental ocean winds. %B Journal of the Royal Statistical Society -- Series B. %G eng %U https://arxiv.org/abs/1502.01974 %0 Report %D 2017 %T Revisiting the Economics of Privacy: Population Statistics and Confidentiality Protection as Public Goods %A Abowd, John %A Schmutte, Ian M. %X We consider the problem of determining the optimal accuracy of public statistics when increased accuracy requires a loss of privacy. To formalize this allocation problem, we use tools from statistics and computer science to model the publication technology used by a public statistical agency. We derive the demand for accurate statistics from first principles to generate interdependent preferences that account for the public-good nature of both data accuracy and privacy loss. We first show data accuracy is inefficiently under-supplied by a private provider. Solving the appropriate social planner’s problem produces an implementable publication strategy. We implement the socially optimal publication plan for statistics on income and health status using data from the American Community Survey, National Health Interview Survey, Federal Statistical System Public Opinion Survey and Cornell National Social Survey. Our analysis indicates that welfare losses from providing too much privacy protection and, therefore, too little accuracy can be substantial. %I NCRN Coordinating Office %G eng %U http://hdl.handle.net/1813/52612 %9 Preprint %0 Report %D 2017 %T Revisiting the Economics of Privacy: Population Statistics and Confidentiality Protection as Public Goods %A John M. Abowd %A Ian M. 
Schmutte %X We consider the problem of determining the optimal accuracy of public statistics when increased accuracy requires a loss of privacy. To formalize this allocation problem, we use tools from statistics and computer science to model the publication technology used by a public statistical agency. We derive the demand for accurate statistics from first principles to generate interdependent preferences that account for the public-good nature of both data accuracy and privacy loss. We first show data accuracy is inefficiently under-supplied by a private provider. Solving the appropriate social planner’s problem produces an implementable publication strategy. We implement the socially optimal publication plan for statistics on income and health status using data from the American Community Survey, National Health Interview Survey, Federal Statistical System Public Opinion Survey and Cornell National Social Survey. Our analysis indicates that welfare losses from providing too much privacy protection and, therefore, too little accuracy can be substantial. %B Labor Dynamics Institute Document %8 04/2017 %G eng %U http://digitalcommons.ilr.cornell.edu/ldi/37/ %0 Report %D 2017 %T Revisiting the Economics of Privacy: Population Statistics and Confidentiality Protection as Public Goods %A Abowd, John %A Schmutte, Ian M. %X We consider the problem of the public release of statistical information about a population, explicitly accounting for the public-good properties of both data accuracy and privacy loss. We first consider the implications of adding the public-good component to recently published models of private data publication under differential privacy guarantees using a Vickrey-Clarke-Groves mechanism and a Lindahl mechanism. We show that data quality will be inefficiently under-supplied. 
Next, we develop a standard social planner’s problem using the technology set implied by (ε, δ)-differential privacy with (α, β)-accuracy for the Private Multiplicative Weights query release mechanism to study the properties of optimal provision of data accuracy and privacy loss when both are public goods. Using the production possibilities frontier implied by this technology, explicitly parameterized interdependent preferences, and the social welfare function, we display properties of the solution to the social planner’s problem. Our results directly quantify the optimal choice of data accuracy and privacy loss as functions of the technology and preference parameters. Some of these properties can be quantified using population statistics on marginal preferences and correlations between income, data accuracy preferences, and privacy loss preferences that are available from survey data. Our results show that government data custodians should publish more accurate statistics with weaker privacy guarantees than would occur with purely private data publishing. Our statistical results using the General Social Survey and the Cornell National Social Survey indicate that the welfare losses from under-providing data accuracy while over-providing privacy protection can be substantial. A complete archive of the data and programs used in this paper is available via http://doi.org/10.5281/zenodo.345385. %I Cornell University %G eng %U http://hdl.handle.net/1813/39081 %9 Preprint %0 Journal Article %J Total Survey Error in Practice %D 2017 %T The role of statistical disclosure limitation in total survey error %A A. F. Karr %K big data issues %K data quality %K data swapping %K decision quality %K risk-utility paradigms %K Statistical Disclosure Limitation %K total survey error %X This chapter presents the thesis that statistical disclosure limitation (SDL) ought to be viewed as an integral component of total survey error (TSE). 
TSE and SDL will move forward together by integrating multiple criteria: cost, risk, data quality, and decision quality. The chapter explores the value of unifying two key TSE procedures - editing and imputation - with SDL. It discusses “big data” issues and presents a mathematical formulation that, at least conceptually and at some point in the future, unifies TSE and SDL. Modern approaches to SDL are based explicitly or implicitly on tradeoffs between disclosure risk and data utility. There are three principal classes of SDL methods: reduction/coarsening techniques; perturbative methods; and synthetic data methods. Data swapping is among the most frequently applied SDL methods for categorical data. The chapter sketches how it can be informed by knowledge of TSE. %B Total Survey Error in Practice %P 71-94 %G eng %R 10.1002/9781119041702.ch4 %0 Report %D 2016 %T Regression Modeling and File Matching Using Possibly Erroneous Matching Variables %A Dalzell, N. M. %A Reiter, J. P. %K Statistics - Applications %X Many analyses require linking records from two databases comprising overlapping sets of individuals. In the absence of unique identifiers, the linkage procedure often involves matching on a set of categorical variables, such as demographics, common to both files. Typically, however, the resulting matches are inexact: some cross-classifications of the matching variables do not generate unique links across files. Further, the matching variables can be subject to reporting errors, which introduce additional uncertainty in analyses. We present a Bayesian file matching methodology designed to estimate regression models and match records simultaneously when categorical matching variables are subject to reporting error. 
The method relies on a hierarchical model that includes (1) the regression of interest involving variables from the two files given a vector indicating the links, (2) a model for the linking vector given the true values of the matching variables, (3) a measurement error model for reported values of the matching variables given their true values, and (4) a model for the true values of the matching variables. We describe algorithms for sampling from the posterior distribution of the model. We illustrate the methodology using artificial data and data from education records in the state of North Carolina. %I ArXiv %G eng %U http://arxiv.org/abs/1608.06309 %0 Journal Article %J Statistical Journal of the International Association for Official Statistics %D 2016 %T Releasing synthetic magnitude micro data constrained to fixed marginal totals %A Wei, Lan %A Reiter, Jerome P. %K Confidential %K Disclosure %K establishment %K mixture %K poisson %K risk %X We present approaches to generating synthetic microdata for multivariate data that take on non-negative integer values, such as magnitude data in economic surveys. The basic idea is to estimate a mixture of Poisson distributions to describe the multivariate distribution, and release draws from the posterior predictive distribution of the model. We develop approaches that guarantee the synthetic data sum to marginal totals computed from the original data, as well as approaches that do not enforce this equality. For both cases, we present methods for assessing disclosure risks inherent in releasing synthetic magnitude microdata. We illustrate the methodology using economic data from a survey of manufacturing establishments. 
%B Statistical Journal of the International Association for Official Statistics %V 32 %P 93-108 %8 02/2016 %G eng %U http://content.iospress.com/download/statistical-journal-of-the-iaos/sji959 %N 1 %& 93 %R 10.3233/SJI-160959 %0 Thesis %B Department of Economics %D 2015 %T Ranking Firms Using Revealed Preference and Other Essays About Labor Markets %A Isaac Sorkin %K economics %K labor markets %X This dissertation contains essays on three questions about the labor market. Chapter 1 considers the question: why do some firms pay so much and some so little? Firms account for a substantial portion of earnings inequality. Although the standard explanation is that there are search frictions that support an equilibrium with rents, this chapter finds that compensating differentials for nonpecuniary characteristics are at least as important. To reach this finding, this chapter develops a structural search model and estimates it on U.S. administrative data. The model analyzes the revealed preference information in the labor market: specifically, how workers move between the 1.5 million firms in the data. With on the order of 1.5 million parameters, standard estimation approaches are infeasible, and so the chapter develops a new estimation approach that is feasible on such big data. Chapter 2 considers the question: why do men and women work at different firms? Men work for higher-paying firms than women. The chapter builds on chapter 1 to consider two explanations for why men and women work in different firms. First, men and women might search from different offer distributions. Second, men and women might have different rankings of firms. Estimation finds that the main explanation for why men and women are sorted is that women search from a lower-paying offer distribution than men. Indeed, men and women are estimated to have quite similar rankings of firms. Chapter 3 considers the question: what are the long-run effects of the minimum wage? 
An empirical consensus suggests that there are small employment effects of minimum wage increases. This chapter argues that these are short-run elasticities. Long-run elasticities, which may differ from short-run elasticities, are more policy relevant. This chapter develops a dynamic industry equilibrium model of labor demand. The model makes two points. First, long-run regressions have been misinterpreted because even if the short- and long-run employment elasticities differ, standard methods would not detect a difference using U.S. variation. Second, the model offers a reconciliation of the small estimated short-run employment effects with the commonly found pass-through of minimum wage increases to product prices. %B Department of Economics %I University of Michigan %C Ann Arbor, MI %G eng %U http://hdl.handle.net/2027.42/116747 %9 Ph.D. %0 Journal Article %J The Stata Journal %D 2015 %T Record Linkage using Stata: Pre-processing, Linking and Reviewing Utilities %A Wasi, Nada %A Flaaen, Aaron %X In this article, we describe Stata utilities that facilitate probabilistic record linkage—the technique typically used for merging two datasets with no common record identifier. While the preprocessing tools are developed specifically for linking two company databases, the other tools can be used for many different types of linkage. Specifically, the stnd_compname and stnd_address commands parse and standardize company names and addresses to improve the match quality when linking. The reclink2 command is a generalized version of Blasnik's reclink (2010, Statistical Software Components S456876, Department of Economics, Boston College) that allows for many-to-one matching. Finally, clrevmatch is an interactive tool that allows the user to review matched results in an efficient and seamless manner. 
Rather than exporting results to another file format (for example, Excel), inputting clerical reviews, and importing back into Stata, one can use the clrevmatch tool to conduct all of these steps within Stata. This helps improve the speed and flexibility of matching, which often involves multiple runs. %B The Stata Journal %V 15 %P 1-15 %G eng %U http://www.stata-journal.com/article.html?article=dm0082 %N 3 %0 Conference Paper %B 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) %D 2015 %T Recording What the Respondent Says: Does Question Format Matter? %A Smyth, J.D. %A Olson, K. %B 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) %C Hollywood, Florida %G eng %U http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx %0 Journal Article %J PLoS ONE %D 2015 %T Reducing the Margins of Error in the American Community Survey Through Data-Driven Regionalization %A Folch, D. %A Spielman, S. E. %B PLoS ONE %8 02/2015 %G eng %U http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0115626 %R 10.1371/journal.pone.0115626 %0 Journal Article %J ArXiv %D 2015 %T Regionalization of Multiscale Spatial Processes using a Criterion for Spatial Aggregation Error %A Bradley, J. R. %A Wikle, C.K. %A Holan, S. H. %X The modifiable areal unit problem and the ecological fallacy are known problems that occur when modeling multiscale spatial processes. We investigate how these forms of spatial aggregation error can guide a regionalization over a spatial domain of interest. By "regionalization" we mean a specification of geographies that define the spatial support for areal data. This topic has been studied vigorously by geographers, but has been given less attention by spatial statisticians. Thus, we propose a criterion for spatial aggregation error (CAGE), which we minimize to obtain an optimal regionalization. 
To define CAGE we draw a connection between spatial aggregation error and a new multiscale representation of the Karhunen-Loeve (K-L) expansion. This relationship between CAGE and the multiscale K-L expansion leads to illuminating theoretical developments including: connections between spatial aggregation error, squared prediction error, spatial variance, and a novel extension of Obled-Creutin eigenfunctions. The effectiveness of our approach is demonstrated through an analysis of two datasets, one using the American Community Survey and one related to environmental ocean winds. %B ArXiv %G eng %U http://arxiv.org/abs/1502.01974 %N 1502.01974 %0 Journal Article %J Test %D 2015 %T Rejoinder on: Comparing and selecting spatial predictors using local criteria %A Bradley, J.R. %A Cressie, N. %A Shi, T. %B Test %V 24 %P 54-60 %8 03/2015 %G eng %U http://dx.doi.org/10.1007/s11749-014-0414-2 %N 1 %R 10.1007/s11749-014-0414-2 %0 Thesis %B Statistics Department %D 2015 %T Relaxations of differential privacy and risk utility evaluations of synthetic data and fidelity measures %A McClure, D. %X Many organizations collect data that would be useful to public researchers, but cannot be shared due to promises of confidentiality to those that participated in the study. This thesis evaluates the risks and utility of several existing release methods, as well as develops new ones with different risk/utility tradeoffs. In Chapter 2, I present a new risk metric, called model-specific probabilistic differential privacy (MPDP), which is a relaxed version of differential privacy that allows the risk of a release to be based on the worst-case among plausible datasets instead of all possible datasets. In addition, I develop a generic algorithm called local sensitivity random sampling (LSRS) that, under certain assumptions, is guaranteed to give releases that meet MPDP for any query with computable local sensitivity. 
I demonstrate, using several well-known queries, that LSRS releases have much higher utility than the standard differentially private release mechanism, the Laplace Mechanism, at only marginally higher risk. In Chapter 3, using synthesis models, I empirically characterize the risks of releasing synthetic data under the standard “all but one” assumption on intruder background knowledge, as well as the effect that decreasing the number of observations the intruder knows beforehand has on that risk. I find in these examples that even in the “all but one” case, there is no risk except to extreme outliers, and even then the risk is mild. I find that the effect that removing observations from an intruder’s background knowledge has on risk depends heavily on how well that intruder can fill in those missing observations: the risk remains fairly constant if he/she can fill them in well, and the risk drops quickly if he/she cannot. In Chapter 4, I characterize the risk/utility tradeoffs for an augmentation of synthetic data called fidelity measures (see Section 1.2.3). Fidelity measures were proposed in Reiter et al. (2009) to quantify the degree to which the results of an analysis performed on a released synthetic dataset match the results of the same analysis performed on the confidential data. I compare the risk/utility of two different fidelity measures, the confidence interval overlap (Karr et al., 2006) and a new fidelity measure I call the mean predicted probability difference (MPPD). Simultaneously, I compare the risk/utility tradeoffs of two different private release mechanisms, LSRS and a heuristic release method called “safety zones”. I find that the confidence interval overlap can be applied to a wider variety of analyses and is more specific than MPPD, but MPPD is more robust to the influence of individual observations in the confidential data, which means it can be released with less noise than the confidence interval overlap with the same level of risk. 
I also find that while safety zones are much simpler to compute and generally have good utility (whereas the utility of LSRS depends on the value of ε), they are also much more vulnerable to context-specific attacks that, while not easy for an intruder to implement, are difficult to anticipate. %B Statistics Department %I Duke University %V PhD %G eng %U http://hdl.handle.net/10161/11365 %0 Conference Paper %B 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) %D 2015 %T The Role of Device Type and Respondent Characteristics in Internet Panel Survey Breakoff %A Allan L. McCutcheon %B 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) %C Hollywood, Florida %G eng %U http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx %0 Report %D 2014 %T Reducing Uncertainty in the American Community Survey through Data-Driven Regionalization %A Spielman, Seth %A Folch, David %X The American Community Survey (ACS) is the largest US survey of households and is the principal source for neighborhood-scale information about the US population and economy. The ACS is used to allocate billions in federal spending and is a critical input to social scientific research in the US. However, estimates from the ACS can be highly unreliable. For example, in over 72% of census tracts the estimated number of children under 5 in poverty has a margin of error greater than the estimate. Uncertainty of this magnitude complicates the use of social data in policy making, research, and governance. This article develops a spatial optimization algorithm that is capable of reducing the margins of error in survey data via the creation of new composite geographies, a process called regionalization. Regionalization is a complex combinatorial problem. 
Here, rather than focusing on the technical aspects of regionalization, we demonstrate how to use a purpose-built open-source regionalization algorithm to post-process survey data in order to reduce the margins of error to some user-specified threshold. %I University of Colorado at Boulder / University of Tennessee %G eng %U http://hdl.handle.net/1813/38121 %9 Preprint %0 Conference Paper %B Paper presented at the annual conference of the Midwest Association for Public Opinion Research %D 2014 %T Remembering where: A look at the American Time Use Survey %A Deal, C. %A Cordova-Cazar, A.L. %A Countryman, A. %A Kirchner, A. %A Belli, R.F. %B Paper presented at the annual conference of the Midwest Association for Public Opinion Research %C Chicago, IL %8 11/2014 %G eng %U http://www.mapor.org/conferences.html %0 Journal Article %J Behavior Research Methods %D 2014 %T Reputation as a Sufficient Condition for Data Quality on Amazon Mechanical Turk %A Peer, E. %A Vosgerau, J. %A Acquisti, A. %B Behavior Research Methods %V 46 %P 1023–1031 %8 December %G eng %0 Book Section %B The Routledge Handbook of Poverty in the United States %D 2014 %T The Rise of Incarceration Among the Poor with Mental Illnesses: How Neoliberal Policies Contribute %A Camp, J. %A Haymes, S. %A Haymes, M. V. d. %A Miller, R.J. %B The Routledge Handbook of Poverty in the United States %I Routledge %G eng %0 Conference Paper %B Midwest Association for Public Opinion Research Annual Conference %D 2014 %T The Role of Device Type in Internet Panel Survey Breakoff %A McCutcheon, A.L. %B Midwest Association for Public Opinion Research Annual Conference %C Chicago, IL %G eng %U http://www.mapor.org/conferences.html %0 Generic %D 2013 %T Recent Advances in Spatial Methods for Federal Surveys %A Holan, S.H. 
%8 September %G eng %0 Report %D 2013 %T Reconsidering the Consequences of Worker Displacements: Survey versus Administrative Measurements %A Flaaen, Aaron %A Shapiro, Matthew %A Isaac Sorkin %X Displaced workers suffer persistent earnings losses. This stark finding has been established by following workers in administrative data after mass layoffs under the presumption that these are involuntary job losses owing to economic distress. Using linked survey and administrative data, this paper examines this presumption by matching worker-supplied reasons for separations with what is happening at the firm. The paper documents substantially different earnings dynamics in mass layoffs depending on the reason the worker gives for the separation. Using a new methodology for accounting for the increase in the probability of separation among all types of survey response during a mass layoff, the paper finds earnings loss estimates that are surprisingly close to those using only administrative data. Finally, the survey-administrative link allows the decomposition of earnings losses due to subsequent nonemployment into non-participation and unemployment. Including the zero earnings of those identified as being unemployed substantially increases the estimate of earnings losses. %I University of Michigan %G eng %U http://www-personal.umich.edu/~shapiro/papers/ReconsideringDisplacements.pdf %9 mimeo %0 Generic %D 2013 %T A Reduced Rank Model for Analyzing Multivariate Spatial Datasets %A Bradley, J.R. %B University of Missouri-Kansas City %I University of Missouri-Kansas City %8 November %G eng %0 Journal Article %J WebDB %D 2013 %T Ringtail: a generalized nowcasting system. 
%A Antenucci, Dolan %A Li, Erdong %A Liu, Shaobo %A Zhang, Bochun %A Cafarella, Michael J %A Ré, Christopher %X Social media nowcasting—using online user activity to describe real-world phenomena—is an active area of research to supplement more traditional and costly data collection methods such as phone surveys. Given the potential impact of such research, we would expect general-purpose nowcasting systems to quickly become a standard tool among non-computer scientists, yet it has largely remained a research topic. We believe a major obstacle to widespread adoption is the nowcasting feature selection problem. Typical nowcasting systems require the user to choose a handful of social media objects from a pool of billions of potential candidates, which can be a time-consuming and error-prone process. We have built Ringtail, a nowcasting system that helps the user by automatically suggesting high-quality signals. We demonstrate that Ringtail can make nowcasting easier by suggesting relevant features for a range of topics. The user provides just a short topic query (e.g., unemployment) and a small conventional dataset in order for Ringtail to quickly return a usable predictive nowcasting model. %B WebDB %V 6 %P 1358-1361 %G eng %U http://cs.stanford.edu/people/chrismre/papers/Ringtail-VLDB-demo.pdf %& 1358 %0 Journal Article %J WebDB %D 2013 %T Ringtail: Feature Selection for Easier Nowcasting. %A Antenucci, Dolan %A Cafarella, Michael J %A Levenstein, Margaret C. %A Ré, Christopher %A Shapiro, Matthew %X In recent years, social media “nowcasting”—the use of online user activity to predict various ongoing real-world social phenomena—has become a popular research topic; yet, this popularity has not led to widespread actual practice. We believe a major obstacle to widespread adoption is the feature selection problem. 
Typical nowcasting systems require the user to choose a set of relevant social media objects, which is difficult, time-consuming, and can imply a statistical background that users may not have. We propose Ringtail, which helps the user choose relevant social media signals. It takes a single user input string (e.g., unemployment) and yields a number of relevant signals the user can use to build a nowcasting model. We evaluate Ringtail on six different topics using a corpus of almost 6 billion tweets, showing that features chosen by Ringtail in a wholly-automated way are better or as good as those from a human and substantially better if Ringtail receives some human assistance. In all cases, Ringtail reduces the burden on the user. %B WebDB %P 49-54 %G eng %U http://www.cs.stanford.edu/people/chrismre/papers/webdb_ringtail.pdf %& 49 %0 Journal Article %J Social Service Review %D 2013 %T Rising extreme poverty in the United States and the response of means-tested transfers. %A H. Luke Shaefer %A Edin, K. %X This study documents an increase in the prevalence of extreme poverty among US households with children between 1996 and 2011 and assesses the response of major federal means-tested transfer programs. Extreme poverty is defined using a World Bank metric of global poverty: $2 or less, per person, per day. Using the 1996–2008 panels of the Survey of Income and Program Participation (SIPP), we estimate that in mid-2011, 1.65 million households with 3.55 million children were living in extreme poverty in a given month, based on cash income, constituting 4.3 percent of all nonelderly households with children. The prevalence of extreme poverty has risen sharply since 1996, particularly among those most affected by the 1996 welfare reform. Adding SNAP benefits to household income reduces the number of extremely poor households with children by 48.0 percent in mid-2011. Adding SNAP, refundable tax credits, and housing subsidies reduces it by 62.8 percent. 
%B Social Service Review %V 87 %P 250-268 %8 06/2013 %G eng %U http://www.jstor.org/stable/10.1086/671012 %N 2 %& 250 %R 10.1086/671012 %0 Journal Article %J Applied Stochastic Models in Business and Industry %D 2012 %T Rejoinder: An approach for identifying and predicting economic recessions in real time using time frequency functional models %A Holan, S. %A Yang, W. %A Matteson, D. %A Wikle, C. %B Applied Stochastic Models in Business and Industry %V 28 %P 504-505 %G eng %U http://onlinelibrary.wiley.com/doi/10.1002/asmb.1955/full %R 10.1002/asmb.1955 %0 Generic %D 0 %T Relation of questionnaire navigation patterns and data quality: Keystroke data analysis %A Lee, Jinyoung %G eng %0 Generic %D 0 %T Respondent retrieval strategies inform the structure of autobiographical knowledge %A Belli, R. F. %G eng %0 Generic %D 0 %T Response Scales: Effects on Data Quality for Interviewer Administered Surveys %A Sarwar, Mazen %A Olson, Kristen %A Smyth, Jolene %G eng