Effects of a Government-Academic Partnership: Has the NSF-Census Bureau Research Network Helped Secure the Future of the Federal Statistical System? Weinberg, Daniel; Abowd, John M.; Belli, Robert F.; Cressie, Noel; Folch, David C.; Holan, Scott H.; Levenstein, Margaret C.; Olson, Kristen M.; Reiter, Jerome P.; Shapiro, Matthew D.; Smyth, Jolene; Soh, Leen-Kiat; Spencer, Bruce; Spielman, Seth E.; Vilhuber, Lars; Wikle, Christopher The National Science Foundation-Census Bureau Research Network (NCRN) was established in 2011 to create interdisciplinary research nodes on methodological questions of interest and significance to the broader research community and to the Federal Statistical System (FSS), particularly the Census Bureau. The activities to date have covered both fundamental and applied statistical research and have focused at least in part on the training of current and future generations of researchers in skills of relevance to surveys and alternative measurement of economic units, households, and persons. This paper discusses some of the key research findings of the eight nodes, organized into six topics: (1) Improving census and survey data collection methods; (2) Using alternative sources of data; (3) Protecting privacy and confidentiality by improving disclosure avoidance; (4) Using spatial and spatio-temporal statistical modeling to improve estimates; (5) Assessing data cost and quality tradeoffs; and (6) Combining information from multiple sources. It also reports on collaborations across nodes and with federal agencies, new software developed, and educational activities and outcomes. The paper concludes with an evaluation of the ability of the FSS to apply the NCRN’s research outcomes and suggests some next steps, as well as the implications of this research-network model for future federal government renewal initiatives. 
This paper began as a May 8, 2015, presentation to the National Academies’ Committee on National Statistics by two of the principal investigators of the National Science Foundation-Census Bureau Research Network (NCRN): John Abowd and the late Steve Fienberg (Carnegie Mellon University). The authors acknowledge the contributions of the other principal investigators of the NCRN who are not co-authors of the paper (William Block, William Eddy, Alan Karr, Charles Manski, Nicholas Nagle, and Rebecca Nugent) and of the co-principal investigators, as well as the comments of Patrick Cantwell, Constance Citro, Adam Eck, Brian Harris-Kojetin, and Eloise Parker. We note with sorrow the deaths of Stephen Fienberg and Allan McCutcheon, two of the original NCRN principal investigators. The principal investigators also wish to acknowledge Cheryl Eavey’s sterling grant administration on behalf of the NSF. The conclusions reached in this paper are not the responsibility of the National Science Foundation (NSF), the Census Bureau, or any of the institutions to which the authors belong.

How Will Statistical Agencies Operate When All Data Are Private Abowd, John M The dual problems of respecting citizen privacy and protecting the confidentiality of their data have become hopelessly conflated in the “Big Data” era. There are orders of magnitude more data outside an agency’s firewall than inside it—compromising the integrity of traditional statistical disclosure limitation methods. And increasingly the information processed by the agency was “asked” in a context wholly outside the agency’s operations—blurring the distinction between what was asked and what is published. Already, private businesses like Microsoft, Google and Apple recognize that cybersecurity (safeguarding the integrity and access controls for internal data) and privacy protection (ensuring that what is published does not reveal too much about any person or business) are two sides of the same coin. This is a paradigm-shifting moment for statistical agencies.

PB - Cornell University VL - 7 UR - http://repository.cmu.edu/jpc/vol7/iss3/1/ IS - 3 ER - TY - JOUR T1 - Itemwise conditionally independent nonresponse modeling for incomplete multivariate data JF - Biometrika Y1 - 2017 A1 - M. Sadinle A1 - J.P. Reiter KW - Loglinear model KW - Missing not at random KW - Missingness mechanism KW - Nonignorable KW - Nonparametric saturated KW - Sensitivity analysis AB - We introduce a nonresponse mechanism for multivariate missing data in which each study variable and its nonresponse indicator are conditionally independent given the remaining variables and their nonresponse indicators. This is a nonignorable missingness mechanism, in that nonresponse for any item can depend on values of other items that are themselves missing. We show that, under this itemwise conditionally independent nonresponse assumption, one can define and identify nonparametric saturated classes of joint multivariate models for the study variables and their missingness indicators. We also show how to perform sensitivity analysis to violations of the conditional independence assumptions encoded by this missingness mechanism. Throughout, we illustrate the use of this modeling approach with data analyses. VL - 104 UR - https://doi.org/10.1093/biomet/asw063 IS - 1 ER - TY - JOUR T1 - Itemwise conditionally independent nonresponse modeling for multivariate categorical data JF - Biometrika Y1 - 2017 A1 - Sadinle, M. A1 - Reiter, J. P. KW - Identification KW - Missing not at random KW - Non-parametric saturated KW - Partial ignorability KW - Sensitivity analysis AB - With nonignorable missing data, likelihood-based inference should be based on the joint distribution of the study variables and their missingness indicators. These joint models cannot be estimated from the data alone, thus requiring the analyst to impose restrictions that make the models uniquely obtainable from the distribution of the observed data. 
We present an approach for constructing classes of identifiable nonignorable missing data models. The main idea is to use a sequence of carefully set up identifying assumptions, whereby we specify potentially different missingness mechanisms for different blocks of variables. We show that the procedure results in models with the desirable property of being non-parametric saturated. VL - 104 ER - TY - RPRT T1 - Making Confidential Data Part of Reproducible Research Y1 - 2017 A1 - Lars Vilhuber A1 - Carl Lagoze PB - Labor Dynamics Institute, Cornell University UR - http://digitalcommons.ilr.cornell.edu/ldi/41/ ER - TY - RPRT T1 - Making Confidential Data Part of Reproducible Research Y1 - 2017 A1 - Vilhuber, Lars A1 - Lagoze, Carl AB - Making Confidential Data Part of Reproducible Research Vilhuber, Lars; Lagoze, Carl Disclaimer and acknowledgements: While this column mentions the Census Bureau several times, any opinions and conclusions expressed herein are those of the authors and do not necessarily represent the views of the U.S. Census Bureau or the other statistical agencies mentioned herein. PB - Cornell University UR - http://hdl.handle.net/1813/52474 ER - TY - JOUR T1 - Making Confidential Data Part of Reproducible Research JF - Chance Y1 - 2017 A1 - Vilhuber, Lars A1 - Lagoze, Carl UR - http://chance.amstat.org/2017/09/reproducible-research/ ER - TY - JOUR T1 - Modeling Endogenous Mobility in Earnings Determination JF - Journal of Business & Economic Statistics Y1 - 2017 A1 - John M. Abowd A1 - Kevin L. Mckinney A1 - Ian M. Schmutte AB - We evaluate the bias from endogenous job mobility in fixed-effects estimates of worker- and firm-specific earnings heterogeneity using longitudinally linked employer-employee data from the LEHD infrastructure file system of the U.S. Census Bureau. First, we propose two new residual diagnostic tests of the assumption that mobility is exogenous to unmodeled determinants of earnings. Both tests reject exogenous mobility. 
We relax exogenous mobility by modeling the matched data as an evolving bipartite graph using a Bayesian latent-type framework. Our results suggest that allowing endogenous mobility increases the variation in earnings explained by individual heterogeneity and reduces the proportion due to employer and match effects. To assess external validity, we match our estimates of the wage components to out-of-sample estimates of revenue per worker. The mobility-bias corrected estimates attribute much more of the variation in revenue per worker to variation in match quality and worker quality than the uncorrected estimates. UR - http://dx.doi.org/10.1080/07350015.2017.1356727 ER - TY - RPRT T1 - Modeling Endogenous Mobility in Wage Determination Y1 - 2017 A1 - John M. Abowd A1 - Kevin L. Mckinney A1 - Ian M. Schmutte AB - We evaluate the bias from endogenous job mobility in fixed-effects estimates of worker- and firm-specific earnings heterogeneity using longitudinally linked employer-employee data from the LEHD infrastructure file system of the U.S. Census Bureau. First, we propose two new residual diagnostic tests of the assumption that mobility is exogenous to unmodeled determinants of earnings. Both tests reject exogenous mobility. We relax the exogenous mobility assumptions by modeling the evolution of the matched data as an evolving bipartite graph using a Bayesian latent class framework. Our results suggest that endogenous mobility biases estimated firm effects toward zero. To assess validity, we match our estimates of the wage components to out-of-sample estimates of revenue per worker. The corrected estimates attribute much more of the variation in revenue per worker to variation in match quality and worker quality than the uncorrected estimates. 
UR - http://digitalcommons.ilr.cornell.edu/ldi/28/ ER - TY - JOUR T1 - Multiple imputation of missing categorical and continuous outcomes via Bayesian mixture models with local dependence JF - Journal of the American Statistical Association Y1 - 2017 A1 - J. S. Murray A1 - J. P. Reiter KW - Hierarchical mixture model KW - Missing data KW - Nonparametric Bayes KW - Stick-breaking process AB - We present a nonparametric Bayesian joint model for multivariate continuous and categorical variables, with the intention of developing a flexible engine for multiple imputation of missing values. The model fuses Dirichlet process mixtures of multinomial distributions for categorical variables with Dirichlet process mixtures of multivariate normal distributions for continuous variables. We incorporate dependence between the continuous and categorical variables by (i) modeling the means of the normal distributions as component-specific functions of the categorical variables and (ii) forming distinct mixture components for the categorical and continuous data with probabilities that are linked via a hierarchical model. This structure allows the model to capture complex dependencies between the categorical and continuous data with minimal tuning by the analyst. We apply the model to impute missing values due to item nonresponse in an evaluation of the redesign of the Survey of Income and Program Participation (SIPP). The goal is to compare estimates from a field test with the new design to estimates from selected individuals from a panel collected under the old design. We show that accounting for the missing data changes some conclusions about the comparability of the distributions in the two datasets. We also perform an extensive repeated sampling simulation using similar data from complete cases in an existing SIPP panel, comparing our proposed model to a default application of multiple imputation by chained equations. 
Imputations based on the proposed model tend to have better repeated sampling properties than the default application of chained equations in this realistic setting. VL - 111 IS - 516 ER - TY - JOUR T1 - Multi-rubric Models for Ordinal Spatial Data with Application to Online Ratings from Yelp Y1 - 2017 A1 - Linero, A.R. A1 - Bradley, J.R. A1 - Desai, A. KW - Bayesian hierarchical model KW - Data augmentation KW - Nonparametric Bayes KW - ordinal data KW - recommender systems KW - spatial prediction AB - Interest in online rating data has increased in recent years. Such data consist of ordinal ratings of products or local businesses provided by users of a website, such as Yelp or Amazon. One source of heterogeneity in ratings is that users apply different standards when supplying their ratings; even if two users benefit from a product the same amount, they may translate their benefit into ratings in different ways. In this article we propose an ordinal data model, which we refer to as a multi-rubric model, which treats the criteria used to convert a latent utility into a rating as user-specific random effects, with the distribution of these random effects being modeled nonparametrically. We demonstrate that this approach is capable of accounting for this type of variability in addition to usual sources of heterogeneity due to item quality, user biases, interactions between items and users, and the spatial structure of the users and items. We apply the model developed here to publicly available data from the website Yelp and demonstrate that it produces interpretable clusterings of users according to their rating behavior, in addition to providing better predictions of ratings and better summaries of overall item quality. 
UR - https://arxiv.org/abs/1706.03012 ER - TY - RPRT T1 - NCRN Meeting Spring 2017 Y1 - 2017 A1 - Vilhuber, Lars AB - NCRN Meeting Spring 2017 Vilhuber, Lars PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/52163 ER - TY - RPRT T1 - NCRN Meeting Spring 2017: Formal Privacy Models and Title 13 Y1 - 2017 A1 - Nissim, Kobbi A1 - Gasser, Urs A1 - Smith, Adam A1 - Vadhan, Salil A1 - O'Brien, David A1 - Wood, Alexandra AB - NCRN Meeting Spring 2017: Formal Privacy Models and Title 13 Nissim, Kobbi; Gasser, Urs; Smith, Adam; Vadhan, Salil; O'Brien, David; Wood, Alexandra A new collaboration between academia and the Census Bureau to further the Bureau’s use of formal privacy models. PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/52164 ER - TY - RPRT T1 - NCRN Meeting Spring 2017: Welcome Y1 - 2017 A1 - Vilhuber, Lars AB - NCRN Meeting Spring 2017: Welcome Vilhuber, Lars PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/52163 ER - TY - RPRT T1 - NCRN Newsletter: Volume 3 - Issue 3 Y1 - 2017 A1 - Vilhuber, Lars A1 - Knight-Ingram, Dory AB - NCRN Newsletter: Volume 3 - Issue 3 Vilhuber, Lars; Knight-Ingram, Dory Overview of activities at NSF-Census Research Network nodes from December 2016 through February 2017. NCRN Newsletter Vol. 3, Issue 3: March 10, 2017 PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/46686 ER - TY - RPRT T1 - NCRN Newsletter: Volume 3 - Issue 4 Y1 - 2017 A1 - Vilhuber, Lars A1 - Knight-Ingram, Dory AB - NCRN Newsletter: Volume 3 - Issue 4 Vilhuber, Lars; Knight-Ingram, Dory The NCRN Newsletter is published quarterly by the NCRN Coordinating Office. 
PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/52259 ER - TY - RPRT T1 - Presentation: Introduction to Stan for Markov Chain Monte Carlo Y1 - 2017 A1 - Simpson, Matthew AB - Presentation: Introduction to Stan for Markov Chain Monte Carlo Simpson, Matthew An introduction to Stan (http://mc-stan.org/): a probabilistic programming language that implements Hamiltonian Monte Carlo (HMC), variational Bayes, and (penalized) maximum likelihood estimation. Presentation given at the U.S. Census Bureau on April 25, 2017. PB - University of Missouri UR - http://hdl.handle.net/1813/52656 ER - TY - RPRT T1 - Proceedings from the 2016 NSF–Sloan Workshop on Practical Privacy Y1 - 2017 A1 - Vilhuber, Lars A1 - Schmutte, Ian AB - Proceedings from the 2016 NSF–Sloan Workshop on Practical Privacy Vilhuber, Lars; Schmutte, Ian On October 14, 2016, we hosted a workshop that brought together economists, survey statisticians, and computer scientists with expertise in the field of privacy-preserving methods: Census Bureau staff working on implementing cutting-edge methods in the Bureau’s flagship public-use products mingled with academic researchers from a variety of universities. The four products discussed as part of the workshop were 1. the American Community Survey (ACS); 2. the Longitudinal Employer-Household Dynamics (LEHD) program, in particular the LEHD Origin-Destination Employment Statistics (LODES); 3. the 2020 Decennial Census; and 4. the 2017 Economic Census. The goals of the workshop were to 1. discuss the specific challenges that have arisen in ongoing efforts to apply formal privacy models to Census data products by drawing together expertise of academic and governmental researchers; and 2. produce short written memos that summarize concrete suggestions for practical applications to specific Census Bureau priority areas. 
PB - Cornell University UR - http://hdl.handle.net/1813/46197 ER - TY - RPRT T1 - Proceedings from the 2017 Cornell-Census-NSF-Sloan Workshop on Practical Privacy Y1 - 2017 A1 - Vilhuber, Lars A1 - Schmutte, Ian M. AB - Proceedings from the 2017 Cornell-Census-NSF-Sloan Workshop on Practical Privacy Vilhuber, Lars; Schmutte, Ian M. These proceedings report on a workshop hosted at the U.S. Census Bureau on May 8, 2017. Our purpose was to gather experts from various backgrounds together to continue discussing the development of formal privacy systems for Census Bureau data products. This workshop was a successor to a previous workshop held in October 2016 (Vilhuber & Schmutte 2017). At our prior workshop, we hosted computer scientists, survey statisticians, and economists, all of whom were experts in data privacy. At that time we discussed the practical implementation of cutting-edge methods for publishing data with formal, provable privacy guarantees, with a focus on applications to Census Bureau data products. The teams developing those applications were just starting out when our first workshop took place, and we spent our time brainstorming solutions to the various problems researchers were encountering, or anticipated encountering. For these cutting-edge formal privacy models, there had been very little effort in the academic literature to apply those methods in real-world settings with large, messy data. We therefore brought together an expanded group of specialists from academia and government who could shed light on technical challenges and subject matter challenges, and address how data users might react to changes in data availability and publishing standards. In May 2017, we organized a follow-up workshop, which these proceedings report on. We reviewed progress made in four different areas. The four topics discussed as part of the workshop were 1. the 2020 Decennial Census; 2. the American Community Survey (ACS); 3. the 2017 Economic Census; 4. measuring the demand for privacy and for data quality.
As in our earlier workshop, our goals were to 1. discuss the specific challenges that have arisen in ongoing efforts to apply formal privacy models to Census data products by drawing together expertise of academic and governmental researchers; 2. produce short written memos that summarize concrete suggestions for practical applications to specific Census Bureau priority areas. Comments can be provided at https://goo.gl/ZAh3YE PB - Cornell University UR - http://hdl.handle.net/1813/52473 ER - TY - RPRT T1 - Proceedings from the Synthetic LBD International Seminar Y1 - 2017 A1 - Vilhuber, Lars A1 - Kinney, Saki A1 - Schmutte, Ian M. AB - Proceedings from the Synthetic LBD International Seminar Vilhuber, Lars; Kinney, Saki; Schmutte, Ian M. On May 9, 2017, we hosted a seminar to discuss the conditions necessary to implement the SynLBD approach with interested parties, with the goal of providing a straightforward toolkit to implement the same procedure on other data. The proceedings summarize the discussions during the workshop. PB - Cornell University UR - http://hdl.handle.net/1813/52472 ER - TY - RPRT T1 - Recalculating - How Uncertainty in Local Labor Market Definitions Affects Empirical Findings Y1 - 2017 A1 - Foote, Andrew A1 - Kutzbach, Mark J. A1 - Vilhuber, Lars AB - Recalculating - How Uncertainty in Local Labor Market Definitions Affects Empirical Findings Foote, Andrew; Kutzbach, Mark J.; Vilhuber, Lars This paper evaluates the use of commuting zones as a local labor market definition. We revisit Tolbert and Sizer (1996) and demonstrate the sensitivity of definitions to two features of the methodology. We show how these features impact empirical estimates using a well-known application of commuting zones. We conclude with advice to researchers using commuting zones on how to demonstrate the robustness of empirical findings to uncertainty in definitions. 
The analysis, conclusions, and opinions expressed herein are those of the author(s) alone and do not necessarily represent the views of the U.S. Census Bureau or the Federal Deposit Insurance Corporation. All results have been reviewed to ensure that no confidential information is disclosed, and no confidential data was used in this paper. This document is released to inform interested parties of ongoing research and to encourage discussion of work in progress. Much of the work developing this paper occurred while Mark Kutzbach was an employee of the U.S. Census Bureau. PB - Cornell University UR - http://hdl.handle.net/1813/52649 ER - TY - JOUR T1 - Regionalization of Multiscale Spatial Processes using a Criterion for Spatial Aggregation Error JF - Journal of the Royal Statistical Society: Series B Y1 - 2017 A1 - Bradley, J.R. A1 - Wikle, C.K. A1 - Holan, S.H. KW - American Community Survey KW - empirical orthogonal functions KW - MAUP KW - Reduced rank KW - Spatial basis functions KW - Survey data AB - The modifiable areal unit problem and the ecological fallacy are known problems that occur when modeling multiscale spatial processes. We investigate how these forms of spatial aggregation error can guide a regionalization over a spatial domain of interest. By "regionalization" we mean a specification of geographies that define the spatial support for areal data. This topic has been studied vigorously by geographers, but has been given less attention by spatial statisticians. Thus, we propose a criterion for spatial aggregation error (CAGE), which we minimize to obtain an optimal regionalization. To define CAGE we draw a connection between spatial aggregation error and a new multiscale representation of the Karhunen-Loève (K-L) expansion. 
This relationship between CAGE and the multiscale K-L expansion leads to illuminating theoretical developments including: connections between spatial aggregation error, squared prediction error, spatial variance, and a novel extension of Obled-Creutin eigenfunctions. The effectiveness of our approach is demonstrated through an analysis of two datasets, one using the American Community Survey and one related to environmental ocean winds. UR - https://arxiv.org/abs/1502.01974 ER - TY - RPRT T1 - Revisiting the Economics of Privacy: Population Statistics and Confidentiality Protection as Public Goods Y1 - 2017 A1 - John M. Abowd A1 - Ian M. Schmutte AB - We consider the problem of determining the optimal accuracy of public statistics when increased accuracy requires a loss of privacy. To formalize this allocation problem, we use tools from statistics and computer science to model the publication technology used by a public statistical agency. We derive the demand for accurate statistics from first principles to generate interdependent preferences that account for the public-good nature of both data accuracy and privacy loss. We first show data accuracy is inefficiently under-supplied by a private provider. Solving the appropriate social planner’s problem produces an implementable publication strategy. We implement the socially optimal publication plan for statistics on income and health status using data from the American Community Survey, National Health Interview Survey, Federal Statistical System Public Opinion Survey and Cornell National Social Survey. Our analysis indicates that welfare losses from providing too much privacy protection and, therefore, too little accuracy can be substantial. JF - Labor Dynamics Institute Document UR - http://digitalcommons.ilr.cornell.edu/ldi/37/ ER - TY - RPRT T1 - Revisiting the Economics of Privacy: Population Statistics and Confidentiality Protection as Public Goods Y1 - 2017 A1 - Abowd, John A1 - Schmutte, Ian M. 
AB - Revisiting the Economics of Privacy: Population Statistics and Confidentiality Protection as Public Goods Abowd, John; Schmutte, Ian M. We consider the problem of the public release of statistical information about a population, explicitly accounting for the public-good properties of both data accuracy and privacy loss. We first consider the implications of adding the public-good component to recently published models of private data publication under differential privacy guarantees using a Vickrey-Clarke-Groves mechanism and a Lindahl mechanism. We show that data quality will be inefficiently under-supplied. Next, we develop a standard social planner’s problem using the technology set implied by (ε, δ)-differential privacy with (α, β)-accuracy for the Private Multiplicative Weights query release mechanism to study the properties of optimal provision of data accuracy and privacy loss when both are public goods. Using the production possibilities frontier implied by this technology, explicitly parameterized interdependent preferences, and the social welfare function, we display properties of the solution to the social planner’s problem. Our results directly quantify the optimal choice of data accuracy and privacy loss as functions of the technology and preference parameters. Some of these properties can be quantified using population statistics on marginal preferences and correlations between income, data accuracy preferences, and privacy loss preferences that are available from survey data. Our results show that government data custodians should publish more accurate statistics with weaker privacy guarantees than would occur with purely private data publishing. Our statistical results using the General Social Survey and the Cornell National Social Survey indicate that the welfare losses from under-providing data accuracy while over-providing privacy protection can be substantial. 
A complete archive of the data and programs used in this paper is available via http://doi.org/10.5281/zenodo.345385. PB - Cornell University UR - http://hdl.handle.net/1813/39081 ER - TY - RPRT T1 - Revisiting the Economics of Privacy: Population Statistics and Confidentiality Protection as Public Goods Y1 - 2017 A1 - Abowd, John A1 - Schmutte, Ian M. AB - Revisiting the Economics of Privacy: Population Statistics and Confidentiality Protection as Public Goods Abowd, John; Schmutte, Ian M. We consider the problem of determining the optimal accuracy of public statistics when increased accuracy requires a loss of privacy. To formalize this allocation problem, we use tools from statistics and computer science to model the publication technology used by a public statistical agency. We derive the demand for accurate statistics from first principles to generate interdependent preferences that account for the public-good nature of both data accuracy and privacy loss. We first show data accuracy is inefficiently under-supplied by a private provider. Solving the appropriate social planner’s problem produces an implementable publication strategy. We implement the socially optimal publication plan for statistics on income and health status using data from the American Community Survey, National Health Interview Survey, Federal Statistical System Public Opinion Survey and Cornell National Social Survey. Our analysis indicates that welfare losses from providing too much privacy protection and, therefore, too little accuracy can be substantial. PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/52612 ER - TY - JOUR T1 - The role of statistical disclosure limitation in total survey error JF - Total Survey Error in Practice Y1 - 2017 A1 - A. F. 
Karr KW - big data issues KW - data quality KW - data swapping KW - decision quality KW - risk-utility paradigms KW - Statistical Disclosure Limitation KW - total survey error AB - This chapter presents the thesis that statistical disclosure limitation (SDL) ought to be viewed as an integral component of total survey error (TSE). TSE and SDL will move forward together by integrating multiple criteria: cost, risk, data quality, and decision quality. The chapter explores the value of unifying two key TSE procedures (editing and imputation) with SDL. It discusses “big data” issues and contains a mathematical formulation that, at least conceptually and at some point in the future, does unify TSE and SDL. Modern approaches to SDL are based explicitly or implicitly on tradeoffs between disclosure risk and data utility. There are three principal classes of SDL methods: reduction/coarsening techniques; perturbative methods; and synthetic data methods. Data swapping is among the most frequently applied SDL methods for categorical data. The chapter sketches how it can be informed by knowledge of TSE. ER - TY - ABST T1 - Sequential Prediction of Respondent Behaviors Leading to Error in Web-based Surveys Y1 - 2017 A1 - Eck, Adam A1 - Soh, Leen-Kiat ER - TY - RPRT T1 - Sorting Between and Within Industries: A Testable Model of Assortative Matching Y1 - 2017 A1 - John M. Abowd A1 - Francis Kramarz A1 - Sebastien Perez-Duarte A1 - Ian M. Schmutte AB - We test Shimer's (2005) theory of the sorting of workers between and within industrial sectors based on directed search with coordination frictions, deliberately maintaining its static general equilibrium framework. We fit the model to sector-specific wage, vacancy and output data, including publicly available statistics that characterize the distribution of worker and employer wage heterogeneity across sectors. Our empirical method is general and can be applied to a broad class of assignment models. 
The results indicate that industries are the loci of sorting: more productive workers are employed in more productive industries. The evidence confirms that strong assortative matching can be present even when worker and employer components of wage heterogeneity are weakly correlated. PB - Labor Dynamics Institute UR - http://digitalcommons.ilr.cornell.edu/ldi/40/ ER - TY - JOUR T1 - Stop or continue data collection: A nonignorable missing data approach for continuous variables JF - Journal of Official Statistics Y1 - 2017 A1 - T. Paiva A1 - J. P. Reiter AB - We present an approach to inform decisions about nonresponse follow-up sampling. The basic idea is (i) to create completed samples by imputing nonrespondents' data under various assumptions about the nonresponse mechanisms, (ii) to take hypothetical samples of varying sizes from the completed samples, and (iii) to compute and compare measures of accuracy and cost for different proposed sample sizes. As part of the methodology, we present a new approach for generating imputations for multivariate continuous data with nonignorable unit nonresponse. We fit mixtures of multivariate normal distributions to the respondents' data, and adjust the probabilities of the mixture components to generate nonrespondents' distributions with desired features. We illustrate the approaches using data from the 2007 U.S. Census of Manufactures. ER - TY - RPRT T1 - Two Perspectives on Commuting: A Comparison of Home to Work Flows Across Job-Linked Survey and Administrative Files Y1 - 2017 A1 - Green, Andrew A1 - Kutzbach, Mark J. A1 - Vilhuber, Lars AB - Two Perspectives on Commuting: A Comparison of Home to Work Flows Across Job-Linked Survey and Administrative Files Green, Andrew; Kutzbach, Mark J.; Vilhuber, Lars Commuting flows and workplace employment data have a wide constituency of users, including urban and regional planners, social science and transportation researchers, and businesses. The U.S. 
Census Bureau releases two national data products that give the magnitude and characteristics of home-to-work flows. The American Community Survey (ACS) tabulates households’ responses on employment, workplace, and commuting behavior. The Longitudinal Employer-Household Dynamics (LEHD) program tabulates administrative records on jobs in the LEHD Origin-Destination Employment Statistics (LODES). Design differences across the datasets lead to divergence in a comparable statistic: county-to-county aggregate commute flows. To understand differences in the public use data, this study compares ACS and LEHD source files, using identifying information and probabilistic matching to join person and job records. In our assessment, we compare commuting statistics for job frames linked on person, employment status, employer, and workplace, and we identify person and job characteristics as well as design features of the data frames that explain aggregate differences. We find a lower rate of within-county commuting and farther commutes in LODES. We attribute these greater distances to differences in workplace reporting and to uncertainty of establishment assignments in LEHD for workers at multi-unit employers. Minor contributing factors include differences in residence location and ACS workplace edits. The results of this analysis and the data infrastructure developed will support further work to understand and enhance commuting statistics in both datasets. PB - Cornell University UR - http://hdl.handle.net/1813/52611 ER - TY - RPRT T1 - Unique Entity Estimation with Application to the Syrian Conflict Y1 - 2017 A1 - Chen, B. A1 - Shrivastava, A. A1 - Steorts, R. C. KW - Computer Science - Data Structures and Algorithms KW - Computer Science - Databases KW - Statistics - Applications AB - Entity resolution identifies and removes duplicate entities in large, noisy databases and has grown in both usage and new developments as a result of increased data availability.
Nevertheless, entity resolution has tradeoffs regarding assumptions of the data generation process, error rates, and computational scalability that make it a difficult task for real applications. In this paper, we focus on a related problem of unique entity estimation, which is the task of estimating the number of unique entities and associated standard errors in a data set with duplicate entities. Unique entity estimation shares many fundamental challenges of entity resolution, namely, that the computational cost of all-to-all entity comparisons is intractable for large databases. To circumvent this computational barrier, we propose an efficient (near-linear time) estimation algorithm based on locality sensitive hashing. Our estimator, under realistic assumptions, is unbiased and has provably low variance compared to existing random-sampling-based approaches. In addition, we empirically show its superiority over the state-of-the-art estimators on three real applications. The motivation for our work is to derive an accurate estimate of the number of documented, identifiable deaths in the ongoing Syrian conflict. Our methodology, when applied to the Syrian data set, provides an estimate of 191,874 ± 1,772 documented, identifiable deaths, which is very close to the Human Rights Data Analysis Group (HRDAG) estimate of 191,369. Our work provides an example of challenges and efforts involved in solving a real, noisy, challenging problem where modeling assumptions may not hold. JF - arXiv UR - https://arxiv.org/abs/1710.02690 ER - TY - JOUR T1 - Utility Cost of Formal Privacy for Releasing National Employer-Employee Statistics JF - Proceedings of the 2017 ACM International Conference on Management of Data Y1 - 2017 A1 - Samuel Haney A1 - Ashwin Machanavajjhala A1 - John M. Abowd A1 - Matthew Graham A1 - Mark Kutzbach AB - National statistical agencies around the world publish tabular summaries based on combined employer-employee (ER-EE) data.
The privacy of both individuals and business establishments that feature in these data is protected by law in most countries. These data are currently released using a variety of statistical disclosure limitation (SDL) techniques that do not reveal the exact characteristics of particular employers and employees, but lack provable privacy guarantees limiting inferential disclosures. In this work, we present novel algorithms for releasing tabular summaries of linked ER-EE data with formal, provable guarantees of privacy. We show that state-of-the-art differentially private algorithms add too much noise for the output to be useful. Instead, we identify the privacy requirements mandated by current interpretations of the relevant laws, and formalize them using the Pufferfish framework. We then develop new privacy definitions that are customized to ER-EE data and satisfy the statutory privacy requirements. We implement the experiments in this paper on production data gathered by the U.S. Census Bureau. An empirical evaluation of utility for these data shows that for reasonable values of the privacy-loss parameter ε ≥ 1, the additive error introduced by our provably private algorithms is comparable to, and in some cases better than, the error introduced by existing SDL techniques that have no provable privacy guarantees. For some complex queries currently published, however, our algorithms do not have utility comparable to the existing traditional SDL algorithms. Those queries are fodder for future research.
SN - 978-1-4503-4197-4 UR - http://dl.acm.org/citation.cfm?doid=3035918.3035940 ER - TY - RPRT T1 - Utility Cost of Formal Privacy for Releasing National Employer-Employee Statistics Y1 - 2017 A1 - Haney, Samuel A1 - Machanavajjhala, Ashwin A1 - Abowd, John M A1 - Graham, Matthew A1 - Kutzbach, Mark A1 - Vilhuber, Lars AB - National statistical agencies around the world publish tabular summaries based on combined employer-employee (ER-EE) data. The privacy of both individuals and business establishments that feature in these data is protected by law in most countries. These data are currently released using a variety of statistical disclosure limitation (SDL) techniques that do not reveal the exact characteristics of particular employers and employees, but lack provable privacy guarantees limiting inferential disclosures. In this work, we present novel algorithms for releasing tabular summaries of linked ER-EE data with formal, provable guarantees of privacy. We show that state-of-the-art differentially private algorithms add too much noise for the output to be useful. Instead, we identify the privacy requirements mandated by current interpretations of the relevant laws, and formalize them using the Pufferfish framework. We then develop new privacy definitions that are customized to ER-EE data and satisfy the statutory privacy requirements. We implement the experiments in this paper on production data gathered by the U.S. Census Bureau. An empirical evaluation of utility for these data shows that for reasonable values of the privacy-loss parameter ε ≥ 1, the additive error introduced by our provably private algorithms is comparable to, and in some cases better than, the error introduced by existing SDL techniques that have no provable privacy guarantees.
For some complex queries currently published, however, our algorithms do not have utility comparable to the existing traditional SDL algorithms. Those queries are fodder for future research. PB - Cornell University UR - http://hdl.handle.net/1813/49652 ER - TY - JOUR T1 - Visualizing uncertainty in areal data estimates with bivariate choropleth maps, map pixelation, and glyph rotation JF - Stat Y1 - 2017 A1 - Lucchesi, L.R. A1 - Wikle, C.K. AB - In statistics, we quantify uncertainty to help determine the accuracy of estimates, yet this crucial piece of information is rarely included on maps visualizing areal data estimates. We develop and present three approaches to include uncertainty on maps: (1) the bivariate choropleth map repurposed to visualize uncertainty; (2) the pixelation of counties to include values within an estimate's margin of error; and (3) the rotation of a glyph, located at a county's centroid, to represent an estimate's uncertainty. The second method is presented as both a static map and a visuanimation. We use American Community Survey estimates and their corresponding margins of error to demonstrate the methods and highlight the importance of visualizing uncertainty in areal data. An extensive online supplement provides the R code necessary to produce the maps presented in this article as well as alternative versions of them. VL - 6 UR - http://onlinelibrary.wiley.com/doi/10.1002/sta4.150/abstract IS - 1 ER - TY - RPRT T1 - 2017 Economic Census: Towards Synthetic Data Sets Y1 - 2016 A1 - Caldwell, Carol A1 - Thompson, Katherine Jenny PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/52165 ER - TY - JOUR T1 - Assessing disclosure risks for synthetic data with arbitrary intruder knowledge JF - Statistical Journal of the International Association for Official Statistics Y1 - 2016 A1 - McClure, D. A1 - Reiter, J. P.
KW - confidentiality KW - Disclosure KW - risk KW - synthetic AB - Several statistical agencies release synthetic microdata, i.e., data with all confidential values replaced with draws from statistical models, in order to protect data subjects' confidentiality. While fully synthetic data are safe from record linkage attacks, intruders might be able to use the released synthetic values to estimate confidential values for individuals in the collected data. We demonstrate and investigate this potential risk using two simple but informative scenarios: a single continuous variable possibly with outliers, and a three-way contingency table possibly with small counts in some cells. Beginning with the case that the intruder knows all but one value in the confidential data, we examine the effect on risk of decreasing the number of observations the intruder knows beforehand. We generally find that releasing synthetic data (1) can pose little risk to records in the middle of the distribution, and (2) can pose some risks to extreme outliers, although arguably these risks are mild. We also find that the effect of removing observations from an intruder's background knowledge heavily depends on how well that intruder can fill in those missing observations: the risk remains fairly constant if he/she can fill them in well, and drops quickly if he/she cannot. VL - 32 UR - http://content.iospress.com/download/statistical-journal-of-the-iaos/sji957 IS - 1 ER - TY - JOUR T1 - A Bayesian nonparametric Markovian model for nonstationary time series JF - Statistics and Computing Y1 - 2016 A1 - De Yoreo, M. A1 - Kottas, A. KW - Autoregressive Models KW - Bayesian Nonparametrics KW - Dirichlet Process Mixtures KW - Markov chain Monte Carlo KW - Nonstationarity KW - Time Series AB - Stationary time series models built from parametric distributions are, in general, limited in scope due to the assumptions imposed on the residual distribution and autoregression relationship. 
We present a modeling approach for univariate time series data, which makes no assumptions of stationarity, and can accommodate complex dynamics and capture nonstandard distributions. The model for the transition density arises from the conditional distribution implied by a Bayesian nonparametric mixture of bivariate normals. This implies a flexible autoregressive form for the conditional transition density, defining a time-homogeneous, nonstationary, Markovian model for real-valued data indexed in discrete-time. To obtain a more computationally tractable algorithm for posterior inference, we utilize a square-root-free Cholesky decomposition of the mixture kernel covariance matrix. Results from simulated data suggest the model is able to recover challenging transition and predictive densities. We also illustrate the model on time intervals between eruptions of the Old Faithful geyser. Extensions to accommodate higher order structure and to develop a state-space model are also discussed. ER - TY - JOUR T1 - A Bayesian Approach to Graphical Record Linkage and Deduplication JF - Journal of the American Statistical Association Y1 - 2016 A1 - Rebecca C. Steorts A1 - Rob Hall A1 - Stephen E. Fienberg AB - We propose an unsupervised approach for linking records across arbitrarily many files, while simultaneously detecting duplicate records within files. Our key innovation involves the representation of the pattern of links between records as a bipartite graph, in which records are directly linked to latent true individuals, and only indirectly linked to other records. This flexible representation of the linkage structure naturally allows us to estimate the attributes of the unique observable people in the population, calculate transitive linkage probabilities across records (and represent this visually), and propagate the uncertainty of record linkage into later analyses.
Our method makes it particularly easy to integrate record linkage with post-processing procedures such as logistic regression, capture–recapture, etc. Our linkage structure lends itself to an efficient, linear-time, hybrid Markov chain Monte Carlo algorithm, which overcomes many obstacles encountered by previous record linkage approaches, despite the high-dimensional parameter space. We illustrate our method using longitudinal data from the National Long Term Care Survey and with data from the Italian Survey on Household and Wealth, where we assess the accuracy of our method and show it to be better in terms of error rates and empirical scalability than other approaches in the literature. Supplementary materials for this article are available online. VL - 111 UR - http://dx.doi.org/10.1080/01621459.2015.1105807 ER - TY - JOUR T1 - Bayesian Hierarchical Models with Conjugate Full-Conditional Distributions for Dependent Data from the Natural Exponential Family JF - Journal of the American Statistical Association - T&M. Y1 - 2016 A1 - Bradley, J.R. A1 - Holan, S.H. A1 - Wikle, C.K. AB - We introduce a Bayesian approach for analyzing (possibly) high-dimensional dependent data that are distributed according to a member of the natural exponential family of distributions. This problem requires extensive methodological advancements, as jointly modeling high-dimensional dependent data leads to the so-called "big n problem." The computational complexity of the "big n problem" is further exacerbated when allowing for non-Gaussian data models, as is the case here. Thus, we develop new computationally efficient distribution theory for this setting. In particular, we introduce what we call the "conjugate multivariate distribution," which is motivated by the univariate distribution introduced in Diaconis and Ylvisaker (1979).
Furthermore, we provide substantial theoretical and methodological development, including results regarding conditional distributions, an asymptotic relationship with the multivariate normal distribution, conjugate prior distributions, and full-conditional distributions for a Gibbs sampler. The results in this manuscript are extremely general, and can be adapted to many different settings. We demonstrate the proposed methodology through simulated examples and analyses based on estimates obtained from the US Census Bureau's American Community Survey (ACS). UR - https://arxiv.org/abs/1701.07506 ER - TY - JOUR T1 - Bayesian latent pattern mixture models for handling attrition in panel studies with refreshment samples JF - Annals of Applied Statistics Y1 - 2016 A1 - Y. Si A1 - J. P. Reiter A1 - D. S. Hillygus VL - 10 UR - http://projecteuclid.org/euclid.aoas/1458909910 ER - TY - JOUR T1 - Bayesian Lattice Filters for Time-Varying Autoregression and Time-Frequency Analysis JF - Bayesian Analysis Y1 - 2016 A1 - Yang, W.H. A1 - Holan, S.H. A1 - Wikle, C.K. AB - Modeling nonstationary processes is of paramount importance to many scientific disciplines including environmental science, ecology, and finance, among others. Consequently, flexible methodology that provides accurate estimation across a wide range of processes is a subject of ongoing interest. We propose a novel approach to model-based time-frequency estimation using time-varying autoregressive models. In this context, we take a fully Bayesian approach and allow both the autoregressive coefficients and innovation variance to vary over time. Importantly, our estimation method uses the lattice filter and is cast within the partial autocorrelation domain. The marginal posterior distributions are of standard form and, as a convenient by-product of our estimation method, our approach avoids undesirable matrix inversions. As such, estimation is extremely computationally efficient and stable.
To illustrate the effectiveness of our approach, we conduct a comprehensive simulation study that compares our method with other competing methods and find that, in most cases, our approach performs better in terms of average squared error between the estimated and true time-varying spectral density. Lastly, we demonstrate our methodology through three modeling applications, namely insect communication signals, environmental data (wind components), and macroeconomic data (US gross domestic product (GDP) and consumption). UR - https://arxiv.org/abs/1408.2757 ER - TY - RPRT T1 - Bayesian mixture modeling for multivariate conditional distributions Y1 - 2016 A1 - Maria DeYoreo A1 - Jerome P. Reiter AB - We present a Bayesian mixture model for estimating the joint distribution of mixed ordinal, nominal, and continuous data conditional on a set of fixed variables. The model uses multivariate normal and categorical mixture kernels for the random variables. It induces dependence between the random and fixed variables through the means of the multivariate normal mixture kernels and via a truncated local Dirichlet process. The latter encourages observations with similar values of the fixed variables to share mixture components. Using a simulation of data fusion, we illustrate that the model can estimate underlying relationships in the data and the distributions of the missing values more accurately than a mixture model applied to the random and fixed variables jointly. We use the model to analyze consumers' reading behaviors using a quota sample, i.e., a sample where the empirical distribution of some variables is fixed by design and so should not be modeled as random, conducted by the book publisher HarperCollins.
PB - ArXiv UR - http://arxiv.org/abs/1606.04457 ER - TY - RPRT T1 - A Bayesian nonparametric Markovian model for nonstationary time series Y1 - 2016 A1 - Maria DeYoreo A1 - Athanasios Kottas AB - Stationary time series models built from parametric distributions are, in general, limited in scope due to the assumptions imposed on the residual distribution and autoregression relationship. We present a modeling approach for univariate time series data, which makes no assumptions of stationarity, and can accommodate complex dynamics and capture nonstandard distributions. The model for the transition density arises from the conditional distribution implied by a Bayesian nonparametric mixture of bivariate normals. This implies a flexible autoregressive form for the conditional transition density, defining a time-homogeneous, nonstationary, Markovian model for real-valued data indexed in discrete-time. To obtain a more computationally tractable algorithm for posterior inference, we utilize a square-root-free Cholesky decomposition of the mixture kernel covariance matrix. Results from simulated data suggest the model is able to recover challenging transition and predictive densities. We also illustrate the model on time intervals between eruptions of the Old Faithful geyser. Extensions to accommodate higher order structure and to develop a state-space model are also discussed. PB - ArXiv UR - http://arxiv.org/abs/1601.04331 ER - TY - JOUR T1 - A Bayesian Partial Identification Approach to Inferring the Prevalence of Accounting Misconduct JF - Journal of the American Statistical Association Y1 - 2016 A1 - P. R. Hahn A1 - J. S. Murray A1 - I. Manolopoulou AB - This article describes the use of flexible Bayesian regression models for estimating a partially identified probability function. Our approach permits efficient sensitivity analysis concerning the posterior impact of priors on the partially identified component of the regression model. 
The new methodology is illustrated on an important problem where only partially observed data are available: inferring the prevalence of accounting misconduct among publicly traded U.S. businesses. Supplementary materials for this article are available online. VL - 111 UR - http://www.tandfonline.com/doi/full/10.1080/01621459.2015.1084307 IS - 513 ER - TY - JOUR T1 - Bayesian Simultaneous Edit and Imputation for Multivariate Categorical Data JF - Journal of the American Statistical Association Y1 - 2016 A1 - Daniel Manrique-Vallier A1 - Jerome P. Reiter AB - In categorical data, it is typically the case that some combinations of variables are theoretically impossible, such as a three-year-old child who is married or a man who is pregnant. In practice, however, reported values often include such structural zeros due to, for example, respondent mistakes or data processing errors. To purge data of such errors, many statistical organizations use a process known as edit-imputation. The basic idea is first to select reported values to change according to some heuristic or loss function, and second to replace those values with plausible imputations. This two-stage process typically does not fully utilize information in the data when determining locations of errors, nor does it appropriately reflect uncertainty resulting from the edits and imputations. We present an alternative approach to editing and imputation for categorical microdata with structural zeros that addresses these shortcomings. Specifically, we use a Bayesian hierarchical model that couples a stochastic model for the measurement error process with a Dirichlet process mixture of multinomial distributions for the underlying, error-free values. The latter model is restricted to have support only on the set of theoretically possible combinations. We illustrate this integrated approach to editing and imputation using simulation studies with data from the 2000 U.S.
census, and compare it to a two-stage edit-imputation routine. Supplementary material is available online. UR - http://dx.doi.org/10.1080/01621459.2016.1231612 ER - TY - JOUR T1 - Bayesian Spatial Change of Support for Count-Valued Survey Data with Application to the American Community Survey JF - Journal of the American Statistical Association Y1 - 2016 A1 - Bradley, J.R. A1 - Wikle, C.K. A1 - Holan, S.H. AB - We introduce Bayesian spatial change of support methodology for count-valued survey data with known survey variances. Our proposed methodology is motivated by the American Community Survey (ACS), an ongoing survey administered by the U.S. Census Bureau that provides timely information on several key demographic variables. Specifically, the ACS produces 1-year, 3-year, and 5-year "period-estimates," and corresponding margins of error, for published demographic and socio-economic variables recorded over predefined geographies within the United States. Despite the availability of these predefined geographies, it is often of interest to data users to specify customized user-defined spatial supports. In particular, it is useful to estimate demographic variables defined on "new" spatial supports in "real-time." This problem is known as spatial change of support (COS), which is typically performed under the assumption that the data follow a Gaussian distribution. However, count-valued survey data are naturally non-Gaussian and, hence, we consider modeling these data using a Poisson distribution. Additionally, survey data are often accompanied by estimates of error, which we incorporate into our analysis. We interpret Poisson count-valued data in small areas as an aggregation of events from a spatial point process. This approach provides us with the flexibility necessary to allow ACS users to consider a variety of spatial supports in "real-time."
We demonstrate the effectiveness of our approach through a simulated example as well as through an analysis using public-use ACS data. UR - https://arxiv.org/abs/1405.7227 ER - TY - JOUR T1 - Categorical data fusion using auxiliary information JF - Annals of Applied Statistics Y1 - 2016 A1 - B. K. Fosdick A1 - M. De Yoreo A1 - J. P. Reiter KW - Imputation KW - Integration KW - Latent Class KW - Matching AB - In data fusion, analysts seek to combine information from two databases composed of disjoint sets of individuals, in which some variables appear in both databases and other variables appear in only one database. Most data fusion techniques rely on variants of conditional independence assumptions. When inappropriate, these assumptions can result in unreliable inferences. We propose a data fusion technique that allows analysts to easily incorporate auxiliary information on the dependence structure of variables not observed jointly; we refer to this auxiliary information as glue. With this technique, we fuse two marketing surveys from the book publisher HarperCollins using glue from the online, rapid-response polling company CivicScience. The fused data enable estimation of associations between people's preferences for authors and for learning about new books. The analysis also serves as a case study on the potential for using online surveys to aid data fusion. VL - 10 UR - http://projecteuclid.org/euclid.aoas/1483606845 ER - TY - JOUR T1 - Computation of the Autocovariances for Time Series with Multiple Long-Range Persistencies JF - Computational Statistics and Data Analysis Y1 - 2016 A1 - McElroy, T.S. A1 - Holan, S.H. AB - Gegenbauer processes allow for flexible and convenient modeling of time series data with multiple spectral peaks, where the qualitative description of these peaks is via the concept of cyclical long-range dependence. The Gegenbauer class is extensive, including ARFIMA, seasonal ARFIMA, and GARMA processes as special cases.
Model estimation is challenging for Gegenbauer processes when multiple zeros and poles occur in the spectral density, because the autocovariance function is laborious to compute. The method of splitting, essentially computing autocovariances by convolving long-memory and short-memory dynamics, is only tractable when a single long-memory pole exists. An additive decomposition of the spectrum into a sum of spectra is proposed, where each summand has a single singularity, so that a computationally efficient splitting method can be applied to each term and then aggregated. This approach differs from handling all the poles in the spectral density at once, via an analysis of truncation error. The proposed technique allows for fast estimation of time series with multiple long-range dependences, which is illustrated numerically and through several case studies. UR - http://www.sciencedirect.com/science/article/pii/S0167947316300202 ER - TY - ABST T1 - Data management and analytic use of paradata: SIPP-EHC audit trails Y1 - 2016 A1 - Lee, Jinyoung A1 - Seloske, Ben A1 - Córdova Cazar, Ana Lucía A1 - Eck, Adam A1 - Kirchner, Antje A1 - Belli, Robert F. ER - TY - JOUR T1 - Differentially private publication of data on wages and job mobility JF - Statistical Journal of the International Association for Official Statistics Y1 - 2016 A1 - Schmutte, Ian M. KW - Demand for public statistics KW - differential privacy KW - job mobility KW - matched employer-employee data KW - optimal confidentiality protection KW - optimal data accuracy KW - technology for statistical agencies AB - Brazil, like many countries, is reluctant to publish business-level data, because of legitimate concerns about the establishments' confidentiality. A trusted data curator can increase the utility of data, while managing the risk to establishments, either by releasing synthetic data, or by infusing noise into published statistics.
This paper evaluates the application of a differentially private mechanism to publish statistics on wages and job mobility computed from Brazilian employer-employee matched data. The publication mechanism can result in both the publication of specific statistics and the generation of synthetic data. I find that the tradeoff between the privacy guaranteed to individuals in the data and the accuracy of published statistics is potentially much better than the worst-case theoretical accuracy guarantee. However, the synthetic data fare quite poorly in analyses that are outside the set of queries on which they were trained. Note that this article only explores and characterizes the feasibility of these publication strategies, and will not directly result in the publication of any data. VL - 32 UR - http://content.iospress.com/articles/statistical-journal-of-the-iaos/sji962 IS - 1 ER - TY - RPRT T1 - Differentially Private Verification of Regression Model Results Y1 - 2016 A1 - Reiter, Jerry PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/52167 ER - TY - JOUR T1 - Do Interviewers with High Cooperation Rates Behave Differently? Interviewer Cooperation Rates and Interview Behaviors JF - Survey Practice Y1 - 2016 A1 - Olson, Kristen A1 - Kirchner, Antje A1 - Smyth, Jolene D. AB - Interviewers are required to be flexible in responding to respondent concerns during recruitment, but standardized during administration of the questionnaire. These skill sets may be at odds. Recent research has shown a U-shaped relationship between interviewer cooperation rates and interviewer variance: the least and the most successful interviewers during recruitment have the largest interviewer variance components. Little is known about why this association occurs.
We posit four hypotheses for this association: 1) interviewers with higher cooperation rates are more conscientious interviewers altogether, 2) interviewers with higher cooperation rates continue to use rapport behaviors from the cooperation request throughout an interview, 3) interviewers with higher cooperation rates display more confidence, which translates into different interview behavior, and 4) interviewers with higher cooperation rates continue their flexible interviewing style throughout the interview and deviate more from standardized interviewing. We use behavior codes from the Work and Leisure Today Survey (n=450, AAPOR RR3=6.3%) to evaluate interviewer behavior. Our results largely support the confidence hypothesis. Interviewers with higher cooperation rates do not show evidence of being “better” interviewers. VL - 9 UR - http://www.surveypractice.org/index.php/SurveyPractice/article/view/351 IS - 2 ER - TY - RPRT T1 - Estimating Compensating Wage Differentials with Endogenous Job Mobility Y1 - 2016 A1 - Kurt Lavetti A1 - Ian M. Schmutte AB - We demonstrate a strategy for using matched employer-employee data to correct endogenous job mobility bias when estimating compensating wage differentials. Applied to fatality rates in the census of formal-sector jobs in Brazil between 2003 and 2010, we show why common approaches to eliminating ability bias can greatly amplify endogenous job mobility bias. By extending the search-theoretic hedonic wage framework, we establish conditions necessary to interpret our estimates as preferences. We present empirical analyses supporting the predictions of the model and identifying conditions, demonstrating that the standard models are misspecified, and that our proposed model eliminates latent ability and endogenous mobility biases.
UR - http://digitalcommons.ilr.cornell.edu/ldi/29/ ER - TY - JOUR T1 - Generating Partially Synthetic Geocoded Public Use Data with Decreased Disclosure Risk Using Differential Smoothing JF - Journal of the Royal Statistical Society - Series A Y1 - 2016 A1 - Quick, H. A1 - Holan, S.H. A1 - Wikle, C.K. AB - When collecting geocoded confidential data with the intent to disseminate, agencies often resort to altering the geographies prior to making data publicly available due to data privacy obligations. An alternative to releasing aggregated and/or perturbed data is to release multiply-imputed synthetic data, where sensitive values are replaced with draws from statistical models designed to capture important distributional features in the collected data. One issue that has received relatively little attention, however, is how to handle spatially outlying observations in the collected data, as common spatial models often have a tendency to overfit these observations. The goal of this work is to bring this issue to the forefront and propose a solution, which we refer to as "differential smoothing." After implementing our method on simulated data, highlighting the effectiveness of our approach under various scenarios, we illustrate the framework using data consisting of sale prices of homes in San Francisco. UR - https://arxiv.org/abs/1507.05529 ER - TY - RPRT T1 - Hours Off the Clock Y1 - 2016 A1 - Green, Andrew AB - To what extent do workers work more hours than they are paid for? The relationship between hours worked and hours paid, and the conditions under which employers can demand more hours “off the clock,” is not well understood. The answer to this question impacts worker welfare, as well as wage and hour regulation. In addition, work off the clock has important implications for the measurement and cyclical movement of productivity and wages. 
In this paper, I construct a unique administrative dataset of hours paid by employers, linked to a survey of workers on their reported hours worked, to measure work off the clock. Using cross-sectional variation in local labor markets, I find only a small cyclical component to work off the clock. The results point to labor hoarding rather than efficiency wage theory, indicating that work off the clock cannot explain the counter-cyclical movement of productivity. I find that workers employed by small firms and in industries with a high rate of wage and hour violations are associated with larger gaps between hours worked and hours paid. These findings suggest the importance of tracking hours of work for enforcement of labor regulations. PB - Cornell University UR - http://hdl.handle.net/1813/52610 ER - TY - JOUR T1 - How Should We Define Low-Wage Work? An Analysis Using the Current Population Survey JF - Monthly Labor Review Y1 - 2016 A1 - Fusaro, V. A1 - Shaefer, H. Luke AB - Low-wage work is a central concept in considerable research, yet it lacks an agreed-upon definition. Using data from the Current Population Survey’s Annual Social and Economic Supplement, the analysis presented in this article suggests that defining low-wage work on the basis of alternative hourly wage cutoffs changes the size of the low-wage population, but does not noticeably alter time trends in the rate of change. The analysis also indicates that different definitions capture groups of workers with substantively different demographic, social, and economic characteristics. Although the individuals in any of the categories examined might reasonably be considered low-wage workers, a single definition obscures these distinctions. UR - http://www.bls.gov/opub/mlr/2016/article/pdf/how-should-we-define-low-wage-work.pdf ER - TY - RPRT T1 - How Will Statistical Agencies Operate When All Data Are Private? Y1 - 2016 A1 - Abowd, John M.
AB - The dual problems of respecting citizen privacy and protecting the confidentiality of their data have become hopelessly conflated in the “Big Data” era. There are orders of magnitude more data outside an agency’s firewall than inside it, compromising the integrity of traditional statistical disclosure limitation methods. And increasingly the information processed by the agency was “asked” in a context wholly outside the agency’s operations, blurring the distinction between what was asked and what is published. Already, private businesses like Microsoft, Google and Apple recognize that cybersecurity (safeguarding the integrity and access controls for internal data) and privacy protection (ensuring that what is published does not reveal too much about any person or business) are two sides of the same coin. This is a paradigm-shifting moment for statistical agencies. PB - Cornell University UR - http://hdl.handle.net/1813/44663 ER - TY - JOUR T1 - Incorporating marginal prior information into latent class models JF - Bayesian Analysis Y1 - 2016 A1 - Schifeling, T. S. A1 - Reiter, J. P. VL - 11 UR - https://projecteuclid.org/euclid.ba/1434649584 ER - TY - JOUR T1 - Measuring Poverty Using the Supplemental Poverty Measure in the Panel Study of Income Dynamics, 1998 to 2010 JF - Journal of Economic and Social Measurement Y1 - 2016 A1 - Kimberlin, S. A1 - Shaefer, H.L. A1 - Kim, J. AB - The Supplemental Poverty Measure (SPM) was recently introduced by the U.S. Census Bureau as an alternative measure of poverty that addresses many shortcomings of the official poverty measure (OPM) to better reflect the resources households have available to meet their basic needs. The Census SPM is available only in the Current Population Survey (CPS). This paper describes a method for constructing SPM poverty estimates in the Panel Study of Income Dynamics (PSID), for the biennial years 1998 through 2010. 
A public-use dataset of individual-level SPM status produced in this analysis will be available for download on the PSID website. Annual SPM poverty estimates from the PSID are presented for the years 1998, 2000, 2002, 2004, 2006, 2008, and 2010 and compared to SPM estimates for the same years derived from CPS data by the Census Bureau and independent researchers. We find that SPM poverty rates in the PSID are somewhat lower than those found in the CPS, though trends over time and the impact of specific SPM components are similar across the two datasets. VL - 41 UR - http://content.iospress.com/articles/journal-of-economic-and-social-measurement/jem425 IS - 1 ER - TY - ABST T1 - Mismatches Y1 - 2016 A1 - Smyth, Jolene A1 - Olson, Kristen ER - TY - RPRT T1 - Modeling Endogenous Mobility in Earnings Determination Y1 - 2016 A1 - Abowd, John M. A1 - McKinney, Kevin L. A1 - Schmutte, Ian M. AB - We evaluate the bias from endogenous job mobility in fixed-effects estimates of worker- and firm-specific earnings heterogeneity using longitudinally linked employer-employee data from the LEHD infrastructure file system of the U.S. Census Bureau. First, we propose two new residual diagnostic tests of the assumption that mobility is exogenous to unmodeled determinants of earnings. Both tests reject exogenous mobility. We relax the exogenous mobility assumption by modeling the matched data as an evolving bipartite graph using a Bayesian latent class framework. Our results suggest that endogenous mobility biases estimated firm effects toward zero. To assess validity, we match our estimates of the wage components to out-of-sample estimates of revenue per worker. The corrected estimates attribute much more of the variation in revenue per worker to variation in match quality and worker quality than the uncorrected estimates. 
Replication code can be found at DOI: http://doi.org/10.5281/zenodo.376600 and in our GitHub repository endogenous-mobility-replication. PB - Cornell University UR - http://hdl.handle.net/1813/40306 ER - TY - JOUR T1 - Multiple Imputation of Missing Categorical and Continuous Values via Bayesian Mixture Models with Local Dependence JF - Journal of the American Statistical Association Y1 - 2016 A1 - Jared S. Murray A1 - Jerome P. Reiter AB - We present a nonparametric Bayesian joint model for multivariate continuous and categorical variables, with the intention of developing a flexible engine for multiple imputation of missing values. The model fuses Dirichlet process mixtures of multinomial distributions for categorical variables with Dirichlet process mixtures of multivariate normal distributions for continuous variables. We incorporate dependence between the continuous and categorical variables by (i) modeling the means of the normal distributions as component-specific functions of the categorical variables and (ii) forming distinct mixture components for the categorical and continuous data with probabilities that are linked via a hierarchical model. This structure allows the model to capture complex dependencies between the categorical and continuous data with minimal tuning by the analyst. We apply the model to impute missing values due to item nonresponse in an evaluation of the redesign of the Survey of Income and Program Participation (SIPP). The goal is to compare estimates from a field test with the new design to estimates from selected individuals from a panel collected under the old design. We show that accounting for the missing data changes some conclusions about the comparability of the distributions in the two datasets. We also perform an extensive repeated sampling simulation using similar data from complete cases in an existing SIPP panel, comparing our proposed model to a default application of multiple imputation by chained equations. 
Imputations based on the proposed model tend to have better repeated sampling properties than the default application of chained equations in this realistic setting. UR - http://dx.doi.org/10.1080/01621459.2016.1174132 ER - TY - JOUR T1 - Multivariate Spatio-Temporal Survey Fusion with Application to the American Community Survey and Local Area Unemployment Statistics JF - Stat Y1 - 2016 A1 - Bradley, J.R. A1 - Holan, S.H. A1 - Wikle, C.K AB - There are often multiple surveys available that estimate and report related demographic variables of interest that are referenced over space and/or time. Not all surveys produce the same information, and thus, combining these surveys typically leads to higher quality estimates. That is, not every survey has the same level of precision nor do they always provide estimates of the same variables. In addition, various surveys often produce estimates with incomplete spatio-temporal coverage. By combining surveys using a Bayesian approach, we can account for different margins of error and leverage dependencies to produce estimates of every variable considered at every spatial location and every time point. Specifically, our strategy is to use a hierarchical modelling approach, where the first stage of the model incorporates the margin of error associated with each survey. Then, in a lower stage of the hierarchical model, the multivariate spatio-temporal mixed effects model is used to incorporate multivariate spatio-temporal dependencies of the processes of interest. We adopt a fully Bayesian approach for combining surveys; that is, given all of the available surveys, the conditional distributions of the latent processes of interest are used for statistical inference. To demonstrate our proposed methodology, we jointly analyze period estimates from the US Census Bureau's American Community Survey, and estimates obtained from the Bureau of Labor Statistics Local Area Unemployment Statistics program. 
Copyright © 2016 John Wiley & Sons, Ltd. UR - http://onlinelibrary.wiley.com/doi/10.1002/sta4.120/full ER - TY - RPRT T1 - NCRN Meeting Fall 2016 Y1 - 2016 A1 - Vilhuber, Lars AB - Held at U.S. Census Bureau headquarters, Washington, DC. PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/45885 ER - TY - RPRT T1 - NCRN Meeting Fall 2016: Audit Trails, Parallel Navigation, and the SIPP Y1 - 2016 A1 - Lee, Jinyoung AB - Thanks to Dr. Robert Belli, Ana Lucía Córdova Cazar, and Ben Seloske for the team effort. PB - University of Nebraska UR - http://hdl.handle.net/1813/45823 ER - TY - RPRT T1 - NCRN Meeting Fall 2016: Scanner Data and Economic Statistics: A Unified Approach Y1 - 2016 A1 - Redding, Stephen J. A1 - Weinstein, David E. PB - University of Michigan UR - http://hdl.handle.net/1813/45821 ER - TY - RPRT T1 - NCRN Meeting Spring 2016 Y1 - 2016 A1 - Vilhuber, Lars AB - Held at U.S. Census Bureau headquarters, Washington, DC. PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/45899 ER - TY - RPRT T1 - NCRN Meeting Spring 2016: A 2016 View of 2020 Census Quality, Costs, Benefits Y1 - 2016 A1 - Spencer, Bruce D. AB - Census costs affect data quality, and data quality affects census benefits. Although measuring census data quality is difficult enough ex post, census planning requires it to be done well in advance. The topic of this talk is the prediction of the cost-quality curve, its uncertainty, and its relation to benefits from census data. 
Presented at the NCRN Meeting Spring 2016 in Washington DC on May 9-10, 2016; see http://www.ncrn.info/event/ncrn-spring-2016-meeting PB - Northwestern University UR - http://hdl.handle.net/1813/43897 ER - TY - RPRT T1 - NCRN Meeting Spring 2016: Attitudes Towards Geolocation-Enabled Census Forms Y1 - 2016 A1 - Brandimarte, Laura A1 - Chiew, Ernest A1 - Ventura, Sam A1 - Acquisti, Alessandro AB - Geolocation refers to the automatic identification of the physical locations of Internet users. In an online survey experiment, we studied respondent reactions towards different types of geolocation. After coordinating with US Census Bureau researchers, we designed and administered a replica of a census form to a sample of respondents. We also created slightly different forms by manipulating the type of geolocation implemented. Using the IP address of each respondent, we approximated the geographical coordinates of the respondent and displayed this location on a map on the survey. Across experimental conditions, we varied the map interface among three Google Maps API views: default road map, Satellite View, and Street View. We also provided either a specific, pinpointed location or a set of two circles with radii of 1 and 2 miles. Snapshots of responses were captured every time respondents added, altered, or deleted information while completing the survey. We measured willingness to provide information on the typical Census form, as well as privacy concerns associated with geolocation technologies and attitudes towards the use of online geographical maps to identify one’s exact current location. 
Presented at the NCRN Meeting Spring 2016 in Washington DC on May 9-10, 2016; see http://www.ncrn.info/event/ncrn-spring-2016-meeting PB - Carnegie-Mellon University UR - http://hdl.handle.net/1813/43889 ER - TY - RPRT T1 - NCRN Meeting Spring 2016: Developing job linkages for the Health and Retirement Study Y1 - 2016 A1 - McCue, Kristin A1 - Abowd, John A1 - Levenstein, Margaret A1 - Patki, Dhiren A1 - Rodgers, Ann A1 - Shapiro, Matthew A1 - Wasi, Nada AB - This paper documents work using probabilistic record linkage to create a crosswalk between jobs reported in the Health and Retirement Study (HRS) and the list of workplaces on the Census Bureau’s Business Register. Matching job records provides an opportunity to join variables that occur uniquely in separate datasets, to validate responses, and to develop missing data imputation models. Identifying the respondent’s workplace (“establishment”) is valuable for HRS because it allows researchers to incorporate the effects of particular social, economic, and geospatial work environments in studies of respondent health and retirement behavior. The linkage makes use of name and address standardizing techniques tailored to business data that were recently developed in a collaboration between researchers at Census, Cornell, and the University of Michigan. The matching protocol makes no use of the identity of the HRS respondent and strictly protects the confidentiality of information about the respondent’s employer. The paper first describes the clerical review process used to create a set of human-reviewed candidate pairs, and the use of that set to train matching models. It then describes and compares several linking strategies that make use of employer name, address, and phone number. 
Finally, it discusses alternative ways of incorporating information on match uncertainty into estimates based on the linked data, and illustrates their use with a preliminary sample of matched HRS jobs. Presented at the NCRN Meeting Spring 2016 in Washington DC on May 9-10, 2016; see http://www.ncrn.info/event/ncrn-spring-2016-meeting PB - University of Michigan UR - http://hdl.handle.net/1813/43895 ER - TY - RPRT T1 - NCRN Meeting Spring 2016: Evaluating Data Quality in Time Diary Surveys Using Paradata Y1 - 2016 A1 - Córdova Cazar, Ana Lucía A1 - Belli, Robert AB - Over the past decades, time use researchers have been increasingly interested in analyzing wellbeing in tandem with the use of time (Juster and Stafford, 1985; Krueger et al., 2009). Many methodological issues have arisen in this endeavor, including concern about the quality of time use data. Survey researchers have increasingly turned to the analysis of paradata to better understand and model data quality. In particular, it has been argued that paradata may serve as a proxy for the respondents’ cognitive response process, and can be used as an additional tool to assess the impact of data generation on data quality. In this presentation, data quality in the American Time Use Survey (ATUS) will be assessed through the use of paradata and survey responses. Specifically, I will talk about a data quality index I have created, which includes measures of different types of ATUS errors (e.g. low number of reported activities, failures to report an activity), and paradata variables (e.g. response latencies, incompletes). The overall objective of this study is to contribute to data quality assessment in the collection of timeline data from national surveys by providing insights on those interviewing dynamics that most impact data quality. 
These insights will help to improve future instruments and training of interviewers, as well as to reduce costs. Presented at the NCRN Meeting Spring 2016 in Washington DC on May 9-10, 2016; see http://www.ncrn.info/event/ncrn-spring-2016-meeting PB - University of Nebraska UR - http://hdl.handle.net/1813/43896 ER - TY - RPRT T1 - NCRN Meeting Spring 2016: The ATUS and SIPP-EHC: Recent Developments Y1 - 2016 A1 - Belli, Robert F. AB - One of the main objectives of the NCRN award to the University of Nebraska node is to investigate data quality associated with timeline interviewing as conducted with the American Time Use Survey (ATUS) time diary and the Survey of Income and Program Participation event history calendar (SIPP-EHC). Specifically, our efforts are focused on the relationships between interviewing dynamics, as extracted from analyses of paradata, and measures of data quality. With the ATUS, our recent efforts have revealed that respondents differ in how they handle difficulty with remembering activities, with some overcoming these difficulties and others succumbing to them. With the SIPP-EHC, we are still in the initial stages of extracting variables from the paradata that are associated with interviewing dynamics. Our work has also involved the development of a CATI time diary in which we are able to analyze audio streams to capture interviewing dynamics. I will conclude this talk by discussing challenges that have yet to be overcome with our work, and our vision of moving forward with the eventual development of self-administered timeline instruments that will be respondent-friendly due to the assistance of intelligent-agent-driven virtual interviewers. 
Presented at the NCRN Meeting Spring 2016 in Washington DC on May 9-10, 2016; see http://www.ncrn.info/event/ncrn-spring-2016-meeting PB - University of Nebraska UR - http://hdl.handle.net/1813/43893 ER - TY - RPRT T1 - NCRN Meeting Spring 2017: 2017 Economic Census: Towards Synthetic Data Sets Y1 - 2016 A1 - Caldwell, Carol A1 - Thompson, Katherine Jenny PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/52165 ER - TY - RPRT T1 - NCRN Meeting Spring 2017: Differentially Private Verification of Regression Model Results Y1 - 2016 A1 - Reiter, Jerry PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/52167 ER - TY - RPRT T1 - NCRN Meeting Spring 2017: Practical Issues in Anonymity Y1 - 2016 A1 - Clifton, Chris A1 - Merill, Shawn A1 - Merill, Keith PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/52166 ER - TY - RPRT T1 - NCRN Newsletter: Volume 2 - Issue 4 Y1 - 2016 A1 - Vilhuber, Lars A1 - Karr, Alan A1 - Reiter, Jerome A1 - Abowd, John A1 - Nunnelly, Jamie AB - Overview of activities at NSF-Census Research Network nodes from September 2015 through December 2015. NCRN Newsletter Vol. 2, Issue 4: January 28, 2016.
PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/42394 ER - TY - RPRT T1 - NCRN Newsletter: Volume 3 - Issue 1 Y1 - 2016 A1 - Vilhuber, Lars A1 - Karr, Alan A1 - Reiter, Jerome A1 - Abowd, John A1 - Nunnelly, Jamie AB - Overview of activities at NSF-Census Research Network nodes from January 2016 through May 2016. NCRN Newsletter Vol. 3, Issue 1: June 10, 2016. PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/44199 ER - TY - RPRT T1 - NCRN Newsletter: Volume 3 - Issue 2 Y1 - 2016 A1 - Vilhuber, Lars A1 - Knight-Ingram, Dory AB - Overview of activities at NSF-Census Research Network nodes from June 2016 through December 2016. NCRN Newsletter Vol. 3, Issue 2: December 23, 2016. PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/46171 ER - TY - JOUR T1 - Noise infusion as a confidentiality protection measure for graph-based statistics JF - Statistical Journal of the International Association for Official Statistics Y1 - 2016 A1 - Abowd, John M. A1 - McKinney, Kevin L. AB - We use the bipartite graph representation of longitudinally linked employer-employee data, and the associated projections onto the employer and employee nodes, respectively, to characterize the set of potential statistical summaries that the trusted custodian might produce. We consider noise infusion as the primary confidentiality protection method. We show that a relatively straightforward extension of the dynamic noise-infusion method used in the U.S. 
Census Bureau's Quarterly Workforce Indicators can be adapted to provide the same confidentiality guarantees for the graph-based statistics: all inputs have been modified by a minimum percentage deviation (i.e., no actual respondent data are used) and, as the number of entities contributing to a particular statistic increases, the accuracy of that statistic approaches the unprotected value. Our method also ensures that the protected statistics will be identical in all releases based on the same inputs. VL - 32 UR - http://content.iospress.com/articles/statistical-journal-of-the-iaos/sji958 IS - 1 ER - TY - RPRT T1 - The NSF-Census Research Network in 2016: Taking stock, looking forward Y1 - 2016 A1 - Vilhuber, Lars AB - An overview of the activities of the NSF-Census Research Network as of 2016, given on Saturday, May 21, 2016, at a workshop on spatial and spatio-temporal design and analysis for official statistics, hosted by the Spatio-Temporal Statistics NSF Census Research Network (STSN) at the University of Missouri, and sponsored by the NSF-Census Research Network (NCRN). PB - University of Missouri UR - http://hdl.handle.net/1813/46210 ER - TY - JOUR T1 - Parallel associations and the structure of autobiographical knowledge JF - Journal of Applied Research in Memory and Cognition Y1 - 2016 A1 - Belli, R.F. A1 - T. Al Baghal KW - Autobiographical memory; Autobiographical knowledge; Autobiographical periods; Episodic memory; Retrospective reports AB - The self-memory system (SMS) model of autobiographical knowledge posits that memories are structured thematically, organized both hierarchically and temporally. This model has been challenged on several fronts, including the absence of parallel linkages across pathways. Calendar survey interviewing shows the frequent and varied use of parallel associations in autobiographical recall. 
Parallel associations in these data are commonplace, and are driven more by respondents’ generative retrieval than by interviewers’ probing. Parallel associations represent a number of autobiographical knowledge themes that are interrelated across life domains. The content of parallel associations is nearly evenly split between general and transitional events, supporting the importance of transitions in autobiographical memory. Associations in respondents’ memories (both parallel and sequential) demonstrate complex interactions with interviewer verbal behaviors during generative retrieval. In addition to discussing the implications of these results for the SMS model, we also draw implications for transition theory and the basic-systems model. VL - 5 IS - 2 ER - TY - RPRT T1 - Practical Issues in Anonymity Y1 - 2016 A1 - Clifton, Chris A1 - Merill, Shawn A1 - Merill, Keith PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/52166 ER - TY - JOUR T1 - Probabilistic Record Linkage and Deduplication after Indexing, Blocking, and Filtering JF - Journal of Privacy and Confidentiality Y1 - 2016 A1 - Murray, J. S. AB - Probabilistic record linkage, the task of merging two or more databases in the absence of a unique identifier, is a perennial and challenging problem. It is closely related to the problem of deduplicating a single database, which can be cast as linking a single database against itself. In both cases the number of possible links grows rapidly in the size of the databases under consideration, and in most applications it is necessary to first reduce the number of record pairs that will be compared. Spurred by practical considerations, a range of methods have been developed for this task. These methods go under a variety of names, including indexing and blocking, and have seen significant development. 
However, methods for inferring linkage structure that account for indexing, blocking, and additional filtering steps have not seen commensurate development. In this paper we review the implications of indexing, blocking and filtering within the popular Fellegi-Sunter framework, and propose a new model to account for particular forms of indexing and filtering. VL - 7 UR - http://repository.cmu.edu/jpc/vol7/iss1/2 IS - 1 ER - TY - RPRT T1 - Regression Modeling and File Matching Using Possibly Erroneous Matching Variables Y1 - 2016 A1 - Dalzell, N. M. A1 - Reiter, J. P. KW - Statistics - Applications AB - Many analyses require linking records from two databases comprising overlapping sets of individuals. In the absence of unique identifiers, the linkage procedure often involves matching on a set of categorical variables, such as demographics, common to both files. Typically, however, the resulting matches are inexact: some cross-classifications of the matching variables do not generate unique links across files. Further, the matching variables can be subject to reporting errors, which introduce additional uncertainty in analyses. We present a Bayesian file matching methodology designed to estimate regression models and match records simultaneously when categorical matching variables are subject to reporting error. The method relies on a hierarchical model that includes (1) the regression of interest involving variables from the two files given a vector indicating the links, (2) a model for the linking vector given the true values of the matching variables, (3) a measurement error model for reported values of the matching variables given their true values, and (4) a model for the true values of the matching variables. We describe algorithms for sampling from the posterior distribution of the model. We illustrate the methodology using artificial data and data from education records in the state of North Carolina. 
PB - ArXiv UR - http://arxiv.org/abs/1608.06309 ER - TY - JOUR T1 - Releasing synthetic magnitude micro data constrained to fixed marginal totals JF - Statistical Journal of the International Association for Official Statistics Y1 - 2016 A1 - Wei, Lan A1 - Reiter, Jerome P. KW - Confidential KW - Disclosure KW - establishment KW - mixture KW - poisson KW - risk AB - We present approaches to generating synthetic microdata for multivariate data that take on non-negative integer values, such as magnitude data in economic surveys. The basic idea is to estimate a mixture of Poisson distributions to describe the multivariate distribution, and release draws from the posterior predictive distribution of the model. We develop approaches that guarantee the synthetic data sum to marginal totals computed from the original data, as well as approaches that do not enforce this equality. For both cases, we present methods for assessing disclosure risks inherent in releasing synthetic magnitude microdata. We illustrate the methodology using economic data from a survey of manufacturing establishments. VL - 32 UR - http://content.iospress.com/download/statistical-journal-of-the-iaos/sji959 IS - 1 ER - TY - JOUR T1 - Simultaneous edit-imputation and disclosure limitation for business establishment data JF - Journal of Applied Statistics Y1 - 2016 A1 - H. J. Kim A1 - J. P. Reiter A1 - A. F. Karr AB - Business establishment microdata typically are required to satisfy agency-specified edit rules, such as balance equations and linear inequalities. Inevitably some establishments' reported data violate the edit rules. Statistical agencies correct faulty values using a process known as edit-imputation. Business establishment data also must be heavily redacted before being shared with the public; indeed, confidentiality concerns lead many agencies not to share establishment microdata as unrestricted access files. 
When microdata must be heavily redacted, one approach is to create synthetic data, as done in the U.S. Longitudinal Business Database and the German IAB Establishment Panel. This article presents the first implementation of a fully integrated approach to edit-imputation and data synthesis. We illustrate the approach on data from the U.S. Census of Manufactures and present a variety of evaluations of the utility of the synthetic data. The paper also presents assessments of disclosure risks for several intruder attacks. We find that the synthetic data preserve important distributional features from the post-editing confidential microdata, and have low risks for the various attacks. ER - TY - JOUR T1 - Spatial Variation in the Quality of American Community Survey Estimates JF - Demography Y1 - 2016 A1 - Folch, David C. A1 - Arribas-Bel, Daniel A1 - Koschinsky, Julia A1 - Spielman, Seth E. VL - 53 ER - TY - JOUR T1 - Synthetic establishment microdata around the world JF - Statistical Journal of the International Association for Official Statistics Y1 - 2016 A1 - Vilhuber, Lars A1 - Abowd, John M. A1 - Reiter, Jerome P. KW - Business data KW - confidentiality KW - differential privacy KW - international comparison KW - Multiple imputation KW - synthetic AB - In contrast to the many public-use microdata samples available for individual and household data from many statistical agencies around the world, there are virtually no establishment or firm microdata available. In large part, this difficulty in providing access to business microdata is due to the skewed and sparse distributions that characterize business data. Synthetic data are simulated data generated from statistical models. We organized sessions at the 2015 World Statistical Congress and the 2015 Joint Statistical Meetings, highlighting work on synthetic establishment microdata. This overview situates those papers, published in this issue, within the broader literature. 
VL - 32 UR - http://content.iospress.com/download/statistical-journal-of-the-iaos/sji964 IS - 1 ER - TY - THES T1 - Topics on Official Statistics and Statistical Policy T2 - Statistics Y1 - 2016 A1 - Seeskin, Zachary AB - My dissertation studies decision questions for government statistical agencies, both regarding data collection and how to combine data from multiple sources. Informed decisions regarding expenditure on data collection require information about the effects of data quality on data use. For the first topic, I study two important uses of decennial census data in the U.S.: for apportioning the House of Representatives and for allocating federal funds. Estimates of distortions in these two uses are developed for different levels of census accuracy. Then, I thoroughly investigate the sensitivity of findings to the census error distribution and to the choice of how to measure the distortions. The chapter concludes with a proposed framework for partial cost-benefit analysis that charges a share of the cost of the census to allocation programs. Then, I investigate an approximation to make analysis of the effects of census error on allocations feasible when allocations also depend on non-census statistics, as is the case for many formula-based allocations. The approximation conditions on the realized values of the non-census statistics instead of using the joint distribution over both census and non-census statistics. The research studies how using the approximation affects conclusions. I find that in some simple cases, the approximation always either overstates or equals the true effects of census error. Understatement is possible in other cases, but theory suggests that the largest possible understatements are about one-third the amount of the largest possible overstatements. 
In simulations with a more complex allocation formula, the approximation tends to overstate the effects of census error, with the overstatement increasing with error in non-census statistics but decreasing with error in census statistics. In the final chapter, I evaluate the use of 2008-2010 property tax data from CoreLogic, Inc. (CoreLogic), aggregated from county and township governments from around the country, to improve 2010 American Community Survey (ACS) estimates of property tax amounts for single-family homes. In particular, I evaluate the potential to use CoreLogic to reduce respondent burden, to study survey response error and to improve adjustments for survey nonresponse. The coverage of the CoreLogic data varies between counties, as does the correspondence between ACS and CoreLogic property taxes. This geographic variation implies that different approaches toward using CoreLogic are needed in different areas of the country. Further, large differences between CoreLogic and ACS property taxes in certain counties seem to be due to conceptual differences between what is collected in the two data sources. I examine three counties (Clark County, NV; Philadelphia County, PA; and St. Louis County, MO) and compare how estimates would change with different approaches using the CoreLogic data. Mean county property tax estimates are highly sensitive to whether ACS or CoreLogic data are used to construct estimates. Using CoreLogic data in imputation modeling for nonresponse adjustment of ACS estimates modestly improves the predictive power of imputation models, although estimates of county property taxes and property taxes by mortgage status are not very sensitive to the imputation method. 
JF - Statistics PB - Northwestern University CY - Evanston, Illinois VL - PHD UR - http://search.proquest.com/docview/1826016819 ER - TY - JOUR T1 - Using Data Mining to Predict the Occurrence of Respondent Retrieval Strategies in Calendar Interviewing: The Quality of Retrospective Reports JF - Journal of Official Statistics Y1 - 2016 A1 - Belli, Robert F. A1 - Miller, L. Dee A1 - Al Baghal, Tarek A1 - Soh, Leen-Kiat AB - Determining which verbal behaviors of interviewers and respondents are dependent on one another is a complex problem that can be facilitated via data-mining approaches. Data are derived from the interviews of 153 respondents of the Panel Study of Income Dynamics (PSID) who were interviewed about their life-course histories. Behavioral sequences of interviewer-respondent interactions that were most predictive of respondents spontaneously using parallel, timing, duration, and sequential retrieval strategies in their generation of answers were examined. We also examined which behavioral sequences were predictive of retrospective reporting data quality as shown by correspondence between calendar responses and responses collected in prior waves of the PSID. The verbal behaviors of immediately preceding interviewer and respondent turns of speech were assessed in terms of their co-occurrence with each respondent retrieval strategy. Interviewers’ use of parallel probes is associated with poorer data quality, whereas interviewers’ use of timing and duration probes, especially in tandem, is associated with better data quality. Respondents’ use of timing and duration strategies is also associated with better data quality and both strategies are facilitated by interviewer timing probes. Data mining alongside regression techniques is valuable to examine which interviewer-respondent interactions will benefit data quality. 
VL - 32 IS - 3 ER - TY - JOUR T1 - Using partially synthetic microdata to protect sensitive cells in business statistics JF - Statistical Journal of the International Association for Official Statistics Y1 - 2016 A1 - Miranda, Javier A1 - Vilhuber, Lars KW - confidentiality protection KW - gross job flows KW - local labor markets KW - Statistical Disclosure Limitation KW - Synthetic data KW - time-series AB - We describe and analyze a method that blends records from both observed and synthetic microdata into public-use tabulations on establishment statistics. The resulting tables use synthetic data only in potentially sensitive cells. We describe different algorithms, and present preliminary results when applied to the Census Bureau's Business Dynamics Statistics and Synthetic Longitudinal Business Database, highlighting accuracy and protection afforded by the method when compared to existing public-use tabulations (with suppressions). VL - 32 UR - http://content.iospress.com/download/statistical-journal-of-the-iaos/sji963 IS - 1 ER - TY - RPRT T1 - Why Statistical Agencies Need to Take Privacy-loss Budgets Seriously, and What It Means When They Do Y1 - 2016 A1 - Abowd, John M. UR - http://digitalcommons.ilr.cornell.edu/ldi/32/ ER - TY - JOUR T1 - Accounting for nonignorable unit nonresponse and attrition in panel studies with refreshment samples JF - Journal of Survey Statistics and Methodology Y1 - 2015 A1 - Schifeling, T. A1 - Cheng, C. A1 - Hillygus, D. S. A1 - Reiter, J. P. AB - Panel surveys typically suffer from attrition, which can lead to biased inference when basing analysis only on cases that complete all waves of the panel. Unfortunately, panel data alone cannot inform the extent of the bias from the attrition, so that analysts using the panel data alone must make strong and untestable assumptions about the missing data mechanism. 
Many panel studies also include refreshment samples, which are data collected from a random sample of new individuals during some later wave of the panel. Refreshment samples offer information that can be utilized to correct for biases induced by nonignorable attrition while reducing reliance on strong assumptions about the attrition process. To date, these bias correction methods have not dealt with two key practical issues in panel studies: unit nonresponse in the initial wave of the panel and in the refreshment sample itself. As we illustrate, nonignorable unit nonresponse can significantly compromise the analyst’s ability to use the refreshment samples for attrition bias correction. Thus, it is crucial for analysts to assess how sensitive their inferences—corrected for panel attrition—are to different assumptions about the nature of the unit nonresponse. We present an approach that facilitates such sensitivity analyses, both for suspected nonignorable unit nonresponse in the initial wave and in the refreshment sample. We illustrate the approach using simulation studies and an analysis of data from the 2007-2008 Associated Press/Yahoo News election panel study. VL - 3 UR - http://jssam.oxfordjournals.org/content/3/3/265.abstract IS - 3 ER - TY - JOUR T1 - Bayesian Analysis of Spatially-Dependent Functional Responses with Spatially-Dependent Multi-Dimensional Functional Predictors JF - Statistica Sinica Y1 - 2015 A1 - Yang, W. H. A1 - Wikle, C.K. A1 - Holan, S.H. A1 - Sudduth, K. A1 - Meyers, D.B. VL - 25 UR - http://www3.stat.sinica.edu.tw/preprint/SS-13-245w_Preprint.pdf ER - TY - JOUR T1 - Bayesian Binomial Mixture Models for Estimating Abundance in Ecological Monitoring Studies JF - Annals of Applied Statistics Y1 - 2015 A1 - Wu, G. A1 - Holan, S.H. A1 - Nilon, C.H. A1 - Wikle, C.K. 
VL - 9 UR - http://projecteuclid.org/euclid.aoas/1430226082 ER - TY - JOUR T1 - Bayesian Hierarchical Statistical SIRS Models JF - Statistical Methods and Applications Y1 - 2015 A1 - Zhuang, L. A1 - Cressie, N. VL - 23 ER - TY - JOUR T1 - Bayesian Latent Pattern Mixture Models for Handling Attrition in Panel Studies With Refreshment Samples JF - ArXiv Y1 - 2015 A1 - Si, Yajuan A1 - Reiter, Jerome P. A1 - Hillygus, D. Sunshine KW - Categorical KW - Dirichlet process KW - Multiple imputation KW - Non-ignorable KW - Panel attrition KW - Refreshment sample AB - Many panel studies collect refreshment samples---new, randomly sampled respondents who complete the questionnaire at the same time as a subsequent wave of the panel. With appropriate modeling, these samples can be leveraged to correct inferences for biases caused by non-ignorable attrition. We present such a model when the panel includes many categorical survey variables. The model relies on a Bayesian latent pattern mixture model, in which an indicator for attrition and the survey variables are modeled jointly via a latent class model. We allow the multinomial probabilities within classes to depend on the attrition indicator, which offers additional flexibility over standard applications of latent class models. We present results of simulation studies that illustrate the benefits of this flexibility. We apply the model to correct attrition bias in an analysis of data from the 2007-2008 Associated Press/Yahoo News election panel study. UR - http://arxiv.org/abs/1509.02124 IS - 1509.02124 ER - TY - JOUR T1 - Bayesian Lattice Filters for Time-Varying Autoregression and Time-Frequency Analysis JF - ArXiv Y1 - 2015 A1 - Yang, W. H. A1 - Holan, S. H. A1 - Wikle, C.K. AB - Modeling nonstationary processes is of paramount importance to many scientific disciplines including environmental science, ecology, and finance, among others. 
Consequently, flexible methodology that provides accurate estimation across a wide range of processes is a subject of ongoing interest. We propose a novel approach to model-based time-frequency estimation using time-varying autoregressive models. In this context, we take a fully Bayesian approach and allow both the autoregressive coefficients and innovation variance to vary over time. Importantly, our estimation method uses the lattice filter and is cast within the partial autocorrelation domain. The marginal posterior distributions are of standard form and, as a convenient by-product of our estimation method, our approach avoids undesirable matrix inversions. As such, estimation is extremely computationally efficient and stable. To illustrate the effectiveness of our approach, we conduct a comprehensive simulation study that compares our method with other competing methods and find that, in most cases, our approach performs better in terms of average squared error between the estimated and true time-varying spectral density. Lastly, we demonstrate our methodology through three modeling applications; namely, insect communication signals, environmental data (wind components), and macroeconomic data (US gross domestic product (GDP) and consumption). UR - http://arxiv.org/abs/1408.2757 IS - 1408.2757 ER - TY - JOUR T1 - Bayesian Lattice Filters for Time-Varying Autoregression and Time–Frequency Analysis JF - Bayesian Analysis Y1 - 2015 A1 - Yang, W. H. A1 - Holan, Scott H. A1 - Wikle, Christopher K. KW - locally stationary KW - model selection KW - nonstationary partial autocorrelation KW - piecewise stationary KW - sequential estimation KW - time-varying spectral density AB - Modeling nonstationary processes is of paramount importance to many scientific disciplines including environmental science, ecology, and finance, among others. Consequently, flexible methodology that provides accurate estimation across a wide range of processes is a subject of ongoing interest. 
We propose a novel approach to model-based time–frequency estimation using time-varying autoregressive models. In this context, we take a fully Bayesian approach and allow both the autoregressive coefficients and innovation variance to vary over time. Importantly, our estimation method uses the lattice filter and is cast within the partial autocorrelation domain. The marginal posterior distributions are of standard form and, as a convenient by-product of our estimation method, our approach avoids undesirable matrix inversions. As such, estimation is extremely computationally efficient and stable. To illustrate the effectiveness of our approach, we conduct a comprehensive simulation study that compares our method with other competing methods and find that, in most cases, our approach performs better in terms of average squared error between the estimated and true time-varying spectral density. Lastly, we demonstrate our methodology through three modeling applications; namely, insect communication signals, environmental data (wind components), and macroeconomic data (US gross domestic product (GDP) and consumption). UR - http://projecteuclid.org/euclid.ba/1445263834 ER - TY - JOUR T1 - Bayesian Marked Point Process Modeling for Generating Fully Synthetic Public Use Data with Point-Referenced Geography JF - Spatial Statistics Y1 - 2015 A1 - Quick, Harrison A1 - Holan, Scott H. A1 - Wikle, Christopher K. A1 - Reiter, Jerome P. VL - 14 UR - http://www.sciencedirect.com/science/article/pii/S2211675315000718 ER - TY - JOUR T1 - Bayesian Marked Point Process Modeling for Generating Fully Synthetic Public Use Data with Point-Referenced Geography JF - ArXiv Y1 - 2015 A1 - Quick, H. A1 - Holan, S. H. A1 - Wikle, C. K. A1 - Reiter, J. P. AB - Many data stewards collect confidential data that include fine geography. 
When sharing these data with others, data stewards strive to disseminate data that are informative for a wide range of spatial and non-spatial analyses while simultaneously protecting the confidentiality of data subjects' identities and attributes. Typically, data stewards meet this challenge by coarsening the resolution of the released geography and, as needed, perturbing the confidential attributes. When done with high intensity, these redaction strategies can result in released data with poor analytic quality. We propose an alternative dissemination approach based on fully synthetic data. We generate data using marked point process models that can maintain both the statistical properties and the spatial dependence structure of the confidential data. We illustrate the approach using data consisting of mortality records from Durham, North Carolina. UR - http://arxiv.org/abs/1407.7795 IS - 1407.7795 ER - TY - JOUR T1 - Bayesian Semiparametric Hierarchical Empirical Likelihood Spatial Models JF - Journal of Statistical Planning and Inference Y1 - 2015 A1 - Porter, A.T. A1 - Holan, S.H. A1 - Wikle, C.K. VL - 165 ER - 
TY - JOUR T1 - Bayesian Spatial Change of Support for Count-Valued Survey Data with Application to the American Community Survey JF - Journal of the American Statistical Association Y1 - 2015 A1 - Bradley, Jonathan R. A1 - Wikle, Christopher K. A1 - Holan, Scott H. KW - Aggregation KW - American Community Survey KW - Bayesian hierarchical model KW - Givens angle prior KW - Markov chain Monte Carlo KW - Multiscale model KW - Non-Gaussian. AB - We introduce Bayesian spatial change of support methodology for count-valued survey data with known survey variances. Our proposed methodology is motivated by the American Community Survey (ACS), an ongoing survey administered by the U.S. Census Bureau that provides timely information on several key demographic variables. 
Specifically, the ACS produces 1-year, 3-year, and 5-year “period-estimates,” and corresponding margins of error, for published demographic and socio-economic variables recorded over predefined geographies within the United States. Despite the availability of these predefined geographies, it is often of interest to data users to specify customized user-defined spatial supports. In particular, it is useful to estimate demographic variables defined on “new” spatial supports in “real-time.” This problem is known as spatial change of support (COS), which is typically performed under the assumption that the data follow a Gaussian distribution. However, count-valued survey data are naturally non-Gaussian and, hence, we consider modeling these data using a Poisson distribution. Additionally, survey data are often accompanied by estimates of error, which we incorporate into our analysis. We interpret Poisson count-valued data in small areas as an aggregation of events from a spatial point process. This approach provides us with the flexibility necessary to allow ACS users to consider a variety of spatial supports in “real-time.” We show the effectiveness of our approach through a simulated example as well as through an analysis using public-use ACS data. UR - http://www.tandfonline.com/doi/abs/10.1080/01621459.2015.1117471 ER - TY - JOUR T1 - Bayesian Spatial Change of Support for Count–Valued Survey Data JF - ArXiv Y1 - 2015 A1 - Bradley, J. R. A1 - Wikle, C.K. A1 - Holan, S. H. AB - We introduce Bayesian spatial change of support methodology for count-valued survey data with known survey variances. Our proposed methodology is motivated by the American Community Survey (ACS), an ongoing survey administered by the U.S. Census Bureau that provides timely information on several key demographic variables. 
Specifically, the ACS produces 1-year, 3-year, and 5-year "period-estimates," and corresponding margins of error, for published demographic and socio-economic variables recorded over predefined geographies within the United States. Despite the availability of these predefined geographies, it is often of interest to data users to specify customized user-defined spatial supports. In particular, it is useful to estimate demographic variables defined on "new" spatial supports in "real-time." This problem is known as spatial change of support (COS), which is typically performed under the assumption that the data follow a Gaussian distribution. However, count-valued survey data are naturally non-Gaussian and, hence, we consider modeling these data using a Poisson distribution. Additionally, survey data are often accompanied by estimates of error, which we incorporate into our analysis. We interpret Poisson count-valued data in small areas as an aggregation of events from a spatial point process. This approach provides us with the flexibility necessary to allow ACS users to consider a variety of spatial supports in "real-time." We demonstrate the effectiveness of our approach through a simulated example as well as through an analysis using public-use ACS data. UR - http://arxiv.org/abs/1405.7227 IS - 1405.7227 ER - TY - RPRT T1 - Blocking Methods Applied to Casualty Records from the Syrian Conflict Y1 - 2015 A1 - Sadosky, Peter A1 - Shrivastava, Anshumali A1 - Price, Megan A1 - Steorts, Rebecca JF - ArXiv UR - http://arxiv.org/abs/1510.07714 ER - TY - JOUR T1 - Capturing multivariate spatial dependence: Model, estimate, and then predict JF - Statistical Science Y1 - 2015 A1 - Cressie, N. A1 - Burden, S. A1 - Davis, W. A1 - Krivitsky, P. A1 - Mokhtarian, P. A1 - Suesse, T. A1 - Zammit-Mangion, A. VL - 30 UR - http://projecteuclid.org/euclid.ss/1433341474 IS - 2 ER - TY - RPRT T1 - Categorical data fusion using auxiliary information Y1 - 2015 A1 - Fosdick, B. K. 
A1 - DeYoreo, Maria A1 - Reiter, J. P. AB - In data fusion, analysts seek to combine information from two databases composed of disjoint sets of individuals, in which some variables appear in both databases and other variables appear in only one database. Most data fusion techniques rely on variants of conditional independence assumptions. When inappropriate, these assumptions can result in unreliable inferences. We propose a data fusion technique that allows analysts to easily incorporate auxiliary information on the dependence structure of variables not observed jointly; we refer to this auxiliary information as glue. With this technique, we fuse two marketing surveys from the book publisher HarperCollins using glue from the online, rapid-response polling company CivicScience. The fused data enable estimation of associations between people's preferences for authors and for learning about new books. The analysis also serves as a case study on the potential for using online surveys to aid data fusion. PB - arXiv UR - http://arxiv.org/abs/1506.05886 ER - TY - JOUR T1 - Change in Visible Impervious Surface Area in Southeastern Michigan Before and After the “Great Recession:” Spatial Differentiation in Remotely Sensed Land-Cover Dynamics JF - Population and Environment Y1 - 2015 A1 - Wilson, C. R. A1 - Brown, D. G. VL - 36 UR - http://link.springer.com/article/10.1007%2Fs11111-014-0219-y IS - 3 ER - TY - CONF T1 - Changing ‘Who’ or ‘Where’: Implications for Data Quality in the American Time Use Survey T2 - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) Y1 - 2015 A1 - Deal, C.E. A1 - Kirchner, A. A1 - Cordova-Cazar, A.L. A1 - Ellyne, L. A1 - Belli, R.F. 
JF - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) CY - Hollywood, Florida UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - JOUR T1 - Comment on Article by Ferreira and Gamerman JF - Bayesian Analysis Y1 - 2015 A1 - Cressie, N. A1 - Chambers, R. L. VL - 10 UR - http://projecteuclid.org/euclid.ba/1429880217 IS - 3 ER - TY - JOUR T1 - Comment on “Semiparametric Bayesian Density Estimation with Disparate Data Sources: A Meta-Analysis of Global Childhood Undernutrition” by Finucane, M. M., Paciorek, C. J., Stevens, G. A., and Ezzati, M. JF - Journal of the American Statistical Association Y1 - 2015 A1 - Wikle, C.K. A1 - Holan, S.H. ER - TY - JOUR T1 - Comment: Spatial sampling designs depend as much on “how much?” and “why?” as on “where?” JF - Bayesian Analysis Y1 - 2015 A1 - Cressie, N. A1 - Chambers, R. L. AB - A comment on “Optimal design in geostatistics under preferential sampling” by G. da Silva Ferreira and D. Gamerman ER - TY - JOUR T1 - Communicating Uncertainty in Official Economic Statistics: An Appraisal Fifty Years after Morgenstern JF - Journal of Economic Literature Y1 - 2015 A1 - Manski, Charles F. KW - B22: History of Economic Thought: Macroeconomics KW - C82: Methodology for Collecting, Estimating, and Organizing Macroeconomic Data; Data Access KW - E23: Macroeconomics: Production AB - Federal statistical agencies in the United States and analogous agencies elsewhere commonly report official economic statistics as point estimates, without accompanying measures of error. Users of the statistics may incorrectly view them as error free or may incorrectly conjecture error magnitudes. This paper discusses strategies to mitigate misinterpretation of official statistics by communicating uncertainty to the public. Sampling error can be measured using established statistical principles. The challenge is to satisfactorily measure the various forms of nonsampling error. 
I find it useful to distinguish transitory statistical uncertainty, permanent statistical uncertainty, and conceptual uncertainty. I illustrate how each arises as the Bureau of Economic Analysis periodically revises GDP estimates, the Census Bureau generates household income statistics from surveys with nonresponse, and the Bureau of Labor Statistics seasonally adjusts employment statistics. I anchor my discussion of communication of uncertainty in the contribution of Oskar Morgenstern (1963a), who argued forcefully for agency publication of error estimates for official economic statistics. (JEL B22, C82, E23) VL - 53 UR - http://www.aeaweb.org/articles.php?doi=10.1257/jel.53.3.631 ER - TY - JOUR T1 - Comparing and selecting spatial predictors using local criteria JF - Test Y1 - 2015 A1 - Bradley, J.R. A1 - Cressie, N. A1 - Shi, T. VL - 24 UR - http://dx.doi.org/10.1007/s11749-014-0415-1 IS - 1 ER - TY - THES T1 - A Comparison of Multiple Imputation Methods for Categorical Data (Master's Thesis) T2 - Statistical Science Y1 - 2015 A1 - Akande, O. JF - Statistical Science PB - Duke University ER - TY - RPRT T1 - Cost-Benefit Analysis for a Quinquennial Census: The 2016 Population Census of South Africa. Y1 - 2015 A1 - Spencer, Bruce D. A1 - May, Julian A1 - Kenyon, Steven A1 - Seeskin, Zachary H. KW - demographic statistics KW - fiscal allocations KW - loss function KW - population estimates KW - post-censal estimates AB -

The question of whether to carry out a quinquennial census is being faced by national statistical offices in increasingly many countries, including Canada, Nigeria, Ireland, Australia, and South Africa. The authors describe uses, and limitations, of cost-benefit analysis for this decision problem in the case of the 2016 census of South Africa. The government of South Africa needed to decide whether to conduct a 2016 census or to rely on increasingly inaccurate post-censal estimates accounting for births, deaths, and migration since the previous (2011) census. The cost-benefit analysis compared predicted costs of the 2016 census to the benefits from improved allocation of intergovernmental revenue, which was considered by the government to be a critical use of the 2016 census, although not the only important benefit. Without the 2016 census, allocations would be based on population estimates. Accuracy of the post-censal estimates was estimated from the performance of past estimates, and the hypothetical expected reduction in errors in allocation due to the 2016 census was estimated. A loss function was introduced to quantify the improvement in allocation. With this evidence, the government was able to decide not to conduct the 2016 census, but instead to improve data and capacity for producing post-censal estimates.

JF - IPR Working Paper Series PB - Northwestern University, Institute for Policy Research UR - http://www.ipr.northwestern.edu/publications/papers/2015/ipr-wp-15-06.html ER - TY - CONF T1 - Determining Potential for Breakoff in Time Diary Survey Using Paradata T2 - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) Y1 - 2015 A1 - Wettlaufer, D. A1 - Arunachalam, H. A1 - Atkin, G. A1 - Eck, A. A1 - Soh, L.-K. A1 - Belli, R.F. JF - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) CY - Hollywood, Florida UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - JOUR T1 - Dirichlet Process Mixture Models for Nested Categorical Data JF - ArXiv Y1 - 2015 A1 - Hu, J. A1 - Reiter, J.P. A1 - Wang, Q. AB - We present a Bayesian model for estimating the joint distribution of multivariate categorical data when units are nested within groups. Such data arise frequently in social science settings, for example, people living in households. The model assumes that (i) each group is a member of a group-level latent class, and (ii) each unit is a member of a unit-level latent class nested within its group-level latent class. This structure allows the model to capture dependence among units in the same group. It also facilitates simultaneous modeling of variables at both group and unit levels. We develop a version of the model that assigns zero probability to groups and units with physically impossible combinations of variables. We apply the model to estimate multivariate relationships in a subset of the American Community Survey. Using the estimated model, we generate synthetic household data that could be disseminated as redacted public use files with high analytic validity and low disclosure risks. Supplementary materials for this article are available online. 
UR - http://arxiv.org/pdf/1412.2282v3.pdf IS - 1412.2282 ER - TY - THES T1 - Dirichlet Process Mixture Models for Nested Categorical Data (Ph.D. Thesis) T2 - Statistical Science Y1 - 2015 A1 - Hu, J. JF - Statistical Science PB - Duke University UR - http://dukespace.lib.duke.edu/dspace/handle/10161/9933 ER - TY - CONF T1 - Do Interviewers with High Cooperation Rates Behave Differently? Interviewer Cooperation Rates and Interview Behaviors T2 - International Conference on Total Survey Error Y1 - 2015 A1 - Olson, K. A1 - Smyth, J.D. A1 - Kirchner, A. JF - International Conference on Total Survey Error CY - Baltimore, MD UR - http://www.niss.org/events/2015-international-total-survey-error-conference ER - TY - CONF T1 - Do Interviewers with High Cooperation Rates Behave Differently? Interviewer Cooperation Rates and Interview Behaviors T2 - Joint Statistical Meetings Y1 - 2015 A1 - Olson, K. A1 - Smyth, J.D. A1 - Kirchner, A. JF - Joint Statistical Meetings CY - Seattle, WA UR - http://www.amstat.org/meetings/jsm/2015/program.cfm ER - TY - THES T1 - Dynamic Models of Human Capital Accumulation (Ph.D. Thesis) T2 - Economics Y1 - 2015 A1 - Ransom, T. JF - Economics PB - Duke University UR - http://dukespace.lib.duke.edu/dspace/handle/10161/9929 ER - TY - RPRT T1 - Economic Analysis and Statistical Disclosure Limitation Y1 - 2015 A1 - Abowd, John M. A1 - Schmutte, Ian M. AB -

Economic Analysis and Statistical Disclosure Limitation Abowd, John M.; Schmutte, Ian M. This paper explores the consequences for economic research of methods used by data publishers to protect the privacy of their respondents. We review the concept of statistical disclosure limitation for an audience of economists who may be unfamiliar with these methods. We characterize what it means for statistical disclosure limitation to be ignorable. When it is not ignorable, we consider the effects of statistical disclosure limitation for a variety of research designs common in applied economic research. Because statistical agencies do not always report the methods they use to protect confidentiality, we also characterize settings in which statistical disclosure limitation methods are discoverable; that is, they can be learned from the released data. We conclude with advice for researchers, journal editors, and statistical agencies.

PB - Cornell University UR - http://hdl.handle.net/1813/40581 ER - TY - JOUR T1 - Economic Analysis and Statistical Disclosure Limitation JF - Brookings Papers on Economic Activity Y1 - 2015 A1 - Abowd, John M. A1 - Schmutte, Ian M. AB - Economic Analysis and Statistical Disclosure Limitation Abowd, John M.; Schmutte, Ian M. This paper explores the consequences for economic research of methods used by data publishers to protect the privacy of their respondents. We review the concept of statistical disclosure limitation for an audience of economists who may be unfamiliar with these methods. We characterize what it means for statistical disclosure limitation to be ignorable. When it is not ignorable, we consider the effects of statistical disclosure limitation for a variety of research designs common in applied economic research. Because statistical agencies do not always report the methods they use to protect confidentiality, we also characterize settings in which statistical disclosure limitation methods are discoverable; that is, they can be learned from the released data. We conclude with advice for researchers, journal editors, and statistical agencies. VL - Spring 2015 UR - http://www.brookings.edu/about/projects/bpea/papers/2015/economic-analysis-statistical-disclosure-limitation ER - TY - JOUR T1 - The Effect of CATI Questionnaire Design Features on Response Timing JF - Journal of Survey Statistics and Methodology Y1 - 2015 A1 - Olson, K. A1 - Smyth, J.D. VL - 3 IS - 3 ER - TY - RPRT T1 - Effects of Census Accuracy on Apportionment of Congress and Allocations of Federal Funds. Y1 - 2015 A1 - Seeskin, Zachary H. A1 - Spencer, Bruce D. AB -

How much accuracy is needed in the 2020 census depends on the cost of attaining accuracy and on the consequences of imperfect accuracy. The cost target for the 2020 census of the United States has been specified, and the Census Bureau is developing projections of the accuracy attainable for that cost. It is desirable to have information about the consequences of the accuracy that might be attainable for that cost or for alternative cost levels. To assess the consequences of imperfect census accuracy, Seeskin and Spencer consider alternative profiles of accuracy for states and assess their implications for apportionment of the U.S. House of Representatives and for allocation of federal funds. An error in allocation is defined as the difference between the allocation computed under imperfect data and the allocation computed with perfect data. Estimates of expected sums of absolute values of errors are presented for House apportionment and for federal funds allocations.

NCRN Meeting Spring 2015: A Vision for the Future of Data Access Reiter, J.P. Presentation at the NCRN Meeting Spring 2015

PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/40181 ER - TY - Generic T1 - NCRN Meeting Spring 2015: Broadening data access through synthetic data Y1 - 2015 A1 - Vilhuber, Lars AB -

NCRN Meeting Spring 2015: Broadening data access through synthetic data Vilhuber, Lars Presentation at the NCRN Meeting Spring 2015

PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/40185 ER - TY - RPRT T1 - NCRN Meeting Spring 2015: Building and Training the Next Generation of Survey Methodologists and Researchers Y1 - 2015 A1 - Nugent, Rebecca AB - NCRN Meeting Spring 2015: Building and Training the Next Generation of Survey Methodologists and Researchers Nugent, Rebecca Presentation at the NCRN Meetings Spring 2015 PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/40188 ER - TY - RPRT T1 - NCRN Meeting Spring 2015: Can Government-Academic Partnerships Help Secure the Future of the Federal Statistical System? Examples from the NSF-Census Research Network Y1 - 2015 A1 - Abowd, John M. A1 - Fienberg, Stephen E. AB - NCRN Meeting Spring 2015: Can Government-Academic Partnerships Help Secure the Future of the Federal Statistical System? Examples from the NSF-Census Research Network Abowd, John M.; Fienberg, Stephen E. May 8, 2015 CNSTAT Public Seminar PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/40186 ER - TY - RPRT T1 - NCRN Meeting Spring 2015: Comment on: Can Government-Academic Partnerships Help Secure the Future of the Federal Statistical System? Examples from the NSF-Census Research Network Y1 - 2015 A1 - Groshen, Erica L. AB - NCRN Meeting Spring 2015: Comment on: Can Government-Academic Partnerships Help Secure the Future of the Federal Statistical System? Examples from the NSF-Census Research Network Groshen, Erica L. Public Seminar Presentation by Erica L. 
Groshen at the Spring 2015 NCRN/CNSTAT Meetings PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/40187 ER - TY - RPRT T1 - NCRN Meeting Spring 2015: Geographic Aspects of Direct and Indirect Estimators for Small Area Estimation Y1 - 2015 A1 - Nagle, Nicholas AB - NCRN Meeting Spring 2015: Geographic Aspects of Direct and Indirect Estimators for Small Area Estimation Nagle, Nicholas Presentation at the NCRN Meeting Spring 2015 PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/40182 ER - TY - RPRT T1 - NCRN Meeting Spring 2015: Geography and Usability of the American Community Survey Y1 - 2015 A1 - Spielman, Seth AB - NCRN Meeting Spring 2015: Geography and Usability of the American Community Survey Spielman, Seth Presentation at the NCRN Meeting Spring 2015 PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/40183 ER - TY - RPRT T1 - NCRN Meeting Spring 2015: Models for Multiscale Spatially-Referenced Count Data Y1 - 2015 A1 - Holan, Scott A1 - Bradley, Jonathan R. A1 - Wikle, Christopher K. AB - NCRN Meeting Spring 2015: Models for Multiscale Spatially-Referenced Count Data Holan, Scott; Bradley, Jonathan R.; Wikle, Christopher K. Presentation at the NCRN Meeting Spring 2015 PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/40176 ER - TY - RPRT T1 - NCRN Meeting Spring 2015: Regionalization of Multiscale Spatial Processes Using a Criterion for Spatial Aggregation Error Y1 - 2015 A1 - Wikle, Christopher K. A1 - Bradley, Jonathan A1 - Holan, Scott AB - NCRN Meeting Spring 2015: Regionalization of Multiscale Spatial Processes Using a Criterion for Spatial Aggregation Error Wikle, Christopher K.; Bradley, Jonathan; Holan, Scott Develop and implement a statistical criterion to diagnose spatial aggregation error that can facilitate the choice of regionalizations of spatial data. 
Presentation at NCRN Meeting Spring 2015 PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/40177 ER - TY - RPRT T1 - NCRN Meeting Spring 2015: Revisiting the Economics of Privacy: Population Statistics and Confidentiality Protection as Public Goods Y1 - 2015 A1 - Abowd, John M. A1 - Schmutte, Ian AB - NCRN Meeting Spring 2015: Revisiting the Economics of Privacy: Population Statistics and Confidentiality Protection as Public Goods Abowd, John M.; Schmutte, Ian Presentation at the NCRN Meeting Spring 2015 PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/40184 ER - TY - RPRT T1 - NCRN Meeting Spring 2015: Survey Informatics: The Future of Survey Methodology and Survey Statistics Training in the Academy? Y1 - 2015 A1 - McCutcheon, Allan L. AB -

NCRN Meeting Spring 2015: Survey Informatics: The Future of Survey Methodology and Survey Statistics Training in the Academy? McCutcheon, Allan L. Presentation at the NCRN Meeting Spring 2015

NCRN Newsletter: Volume 2 - Issue 3 Vilhuber, Lars; Karr, Alan; Reiter, Jerome; Abowd, John; Nunnelly, Jamie Overview of activities at NSF-Census Research Network nodes from June 2015 through August 2015. NCRN Newsletter Vol. 2, Issue 3: September 15, 2015.

PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/42393 ER - TY - RPRT T1 - Noise Infusion as a Confidentiality Protection Measure for Graph-Based Statistics Y1 - 2015 A1 - Abowd, John M. A1 - McKinney, Kevin L. AB - Noise Infusion as a Confidentiality Protection Measure for Graph-Based Statistics Abowd, John M.; McKinney, Kevin L. We use the bipartite graph representation of longitudinally linked employer-employee data, and the associated projections onto the employer and employee nodes, respectively, to characterize the set of potential statistical summaries that the trusted custodian might produce. We consider noise infusion as the primary confidentiality protection method. We show that a relatively straightforward extension of the dynamic noise-infusion method used in the U.S. Census Bureau’s Quarterly Workforce Indicators can be adapted to provide the same confidentiality guarantees for the graph-based statistics: all inputs have been modified by a minimum percentage deviation (i.e., no actual respondent data are used) and, as the number of entities contributing to a particular statistic increases, the accuracy of that statistic approaches the unprotected value. Our method also ensures that the protected statistics will be identical in all releases based on the same inputs. PB - Cornell University UR - http://hdl.handle.net/1813/42338 ER - TY - JOUR T1 - Nonparametric Bayesian models with focused clustering for mixed ordinal and nominal data JF - ArXiv Y1 - 2015 A1 - DeYoreo, Maria A1 - Reiter, J. P. A1 - Hillygus, D. S. AB - Dirichlet process mixtures can be useful models of multivariate categorical data and effective tools for multiple imputation of missing categorical values. In some contexts, however, these models can fit certain variables well at the expense of others in ways beyond the analyst's control.
For example, when the data include some variables with non-trivial amounts of missing values, the mixture model may fit the marginal distributions of the nearly and fully complete variables at the expense of the variables with high fractions of missing data. Motivated by this setting, we present a Dirichlet process mixture model for mixed ordinal and nominal data that allows analysts to split variables into two groups: focus variables and remainder variables. The model uses three sets of clusters, one set for ordinal focus variables, one for nominal focus variables, and one for all remainder variables. The model uses a multivariate ordered probit specification for the ordinal variables and independent multinomial kernels for the nominal variables. The three sets of clusters are linked using an infinite tensor factorization prior, as well as via dependence of the means of the latent continuous focus variables on the remainder variables. This effectively specifies a rich, complex model for the focus variables and a simpler model for remainder variables, yet still potentially captures associations among the variables. In the multiple imputation context, focus variables include key variables with high rates of missing values, and remainder variables include variables without much missing data. Using simulations, we illustrate advantages and limitations of using focused clustering compared to mixture models that do not distinguish variables. We apply the model to handle missing values in an analysis of the 2012 American National Election Study. PB - arXiv UR - http://arxiv.org/abs/1508.03758 IS - 1508.03758 ER - TY - JOUR T1 - Nonparametric Bayesian models with focused clustering for mixed ordinal and nominal data JF - Bayesian Analysis Y1 - 2015 A1 - M. De Yoreo A1 - J. P. Reiter A1 - D. S. Hillygus AB - Dirichlet process mixtures can be useful models of multivariate categorical data and effective tools for multiple imputation of missing categorical values. 
In some contexts, however, these models can fit certain variables well at the expense of others in ways beyond the analyst's control. For example, when the data include some variables with non-trivial amounts of missing values, the mixture model may fit the marginal distributions of the nearly and fully complete variables at the expense of the variables with high fractions of missing data. Motivated by this setting, we present a Dirichlet process mixture model for mixed ordinal and nominal data that allows analysts to split variables into two groups: focus variables and remainder variables. The model uses three sets of clusters, one set for ordinal focus variables, one for nominal focus variables, and one for all remainder variables. The model uses a multivariate ordered probit specification for the ordinal variables and independent multinomial kernels for the nominal variables. The three sets of clusters are linked using an infinite tensor factorization prior, as well as via dependence of the means of the latent continuous focus variables on the remainder variables. This effectively specifies a rich, complex model for the focus variables and a simpler model for remainder variables, yet still potentially captures associations among the variables. In the multiple imputation context, focus variables include key variables with high rates of missing values, and remainder variables include variables without much missing data. Using simulations, we illustrate advantages and limitations of using focused clustering compared to mixture models that do not distinguish variables. We apply the model to handle missing values in an analysis of the 2012 American National Election Study. ER - TY - JOUR T1 - A nonparametric, multiple imputation-based method for the retrospective integration of data sets JF - Multivariate Behavioral Research Y1 - 2015 A1 - M.M. Carrig A1 - D. Manrique-Vallier A1 - K. Ranby A1 - J.P. Reiter A1 - R. 
Hoyle VL - 50 UR - http://www.tandfonline.com/doi/full/10.1080/00273171.2015.1022641 IS - 4 ER - TY - JOUR T1 - Perceptions, behaviors and satisfaction related to public safety for persons with disabilities in the United States JF - Criminal Justice Review Y1 - 2015 A1 - Brucker, D. VL - 1 IS - 18 ER - TY - CONF T1 - Predicting Breakoff Using Sequential Machine Learning Methods T2 - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) Y1 - 2015 A1 - Soh, L.-K. A1 - Eck, A. A1 - McCutcheon, A.L. JF - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) CY - Hollywood, Florida UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - RPRT T1 - Presentation: NADDI 2015: Crowdsourcing DDI Development: New Features from the CED2AR Project Y1 - 2015 A1 - Perry, Benjamin A1 - Kambhampaty, Venkata A1 - Brumsted, Kyle A1 - Vilhuber, Lars A1 - Block, William AB - Presentation: NADDI 2015: Crowdsourcing DDI Development: New Features from the CED2AR Project Perry, Benjamin; Kambhampaty, Venkata; Brumsted, Kyle; Vilhuber, Lars; Block, William Recent years have shown the power of user-sourced information, as evidenced by the success of Wikipedia and its many emulators. This sort of unstructured discussion is currently not feasible within otherwise successful metadata repositories. Creating and augmenting metadata is a labor-intensive endeavor. Harnessing collective knowledge from actual data users can supplement officially generated metadata. As part of our Comprehensive Extensible Data Documentation and Access Repository (CED2AR) infrastructure, we demonstrate a prototype of crowdsourced DDI, using DDI-C and supplemental XML. The system allows any number of network-connected instances (web or desktop deployments) of the CED2AR DDI editor to create and modify metadata concurrently.
The backend transparently handles changes, and the frontend can separate official edits (by designated curators of the data and the metadata) from crowdsourced content. We briefly discuss offline edit contributions as well. CED2AR uses DDI-C and supplemental XML together with Git for a very portable and lightweight implementation. This distributed network implementation allows for large-scale metadata curation without the need for a hardware-intensive computing environment, and can leverage existing cloud services, such as GitHub or Bitbucket. Ben Perry (Cornell/NCRN) presents joint work with Venkata Kambhampaty, Kyle Brumsted, Lars Vilhuber, & William C. Block at NADDI 2015. PB - Cornell University UR - http://hdl.handle.net/1813/40172 ER - TY - JOUR T1 - Preventive policy strategy for banking the unbanked: Savings accounts for teenagers? JF - Journal of Poverty Y1 - 2015 A1 - Friedline, T. A1 - Despard, M. A1 - Chowa, G. KW - financial assets KW - savings KW - Survey of Income and Program Participation (SIPP) KW - teenagers KW - unbanked KW - young adults AB - Concern over the percentages of unbanked and underbanked households in the United States and their lack of connectedness to the financial mainstream has led to policy strategies geared toward reaching these households. Using nationally representative longitudinal data, a preventive strategy for banking households is tested that asks whether young adults are more likely to be banked and own a diversity of financial assets when they are connected to the financial mainstream as teenagers. Young adults are more likely to own checking accounts, savings accounts, certificates of deposit, and stocks when they had savings accounts as teenagers. Policy implications are discussed.
VL - 20 UR - http://www.tandfonline.com/doi/full/10.1080/10875549.2015.1015068 IS - 1 ER - TY - JOUR T1 - Privacy and human behavior in the age of information JF - Science Y1 - 2015 A1 - Alessandro Acquisti A1 - Laura Brandimarte A1 - George Loewenstein KW - confidentiality KW - privacy AB - This Review summarizes and draws connections between diverse streams of empirical research on privacy behavior. We use three themes to connect insights from social and behavioral sciences: people’s uncertainty about the consequences of privacy-related behaviors and their own preferences over those consequences; the context-dependence of people’s concern, or lack thereof, about privacy; and the degree to which privacy concerns are malleable—manipulable by commercial and governmental interests. Organizing our discussion by these themes, we offer observations concerning the role of public policy in the protection of privacy in the information age. VL - 347 UR - http://www.sciencemag.org/content/347/6221/509 IS - 6221 ER - TY - THES T1 - Probabilistic Hashing Techniques For Big Data T2 - Computer Science Y1 - 2015 A1 - Anshumali Shrivastava AB - We investigate probabilistic hashing techniques for addressing computational and memory challenges in large-scale machine learning and data mining systems. In this thesis, we show that the traditional idea of hashing goes far beyond near-neighbor search and that there are some striking new possibilities. We show that hashing can improve state-of-the-art large-scale learning algorithms, and that it goes beyond the conventional notions of pairwise similarities. Despite hashing being a very well-studied topic in the literature, we found several opportunities for fundamentally improving some of the well-known textbook hashing algorithms. In particular, we show that the traditional way of computing minwise hashes is unnecessarily expensive and that, without losing anything, we can achieve an order-of-magnitude speedup.
We also found that for cosine similarity search there is a better scheme than SimHash. Finally, we show that the existing locality-sensitive hashing framework itself is very restrictive, and that it does not admit efficient algorithms for some important measures, such as inner products, which are ubiquitous in machine learning. We propose asymmetric locality-sensitive hashing (ALSH), an extended framework in which we show provably and practically efficient algorithms for Maximum Inner Product Search (MIPS). Having such efficient solutions to MIPS directly scales up many popular machine learning algorithms. We believe that this thesis provides significant improvements to some of the heavily used subroutines in big-data systems, which we hope will be adopted. JF - Computer Science PB - Cornell University VL - Ph.D. UR - https://ecommons.cornell.edu/handle/1813/40886 ER - TY - THES T1 - Ranking Firms Using Revealed Preference and Other Essays About Labor Markets T2 - Department of Economics Y1 - 2015 A1 - Isaac Sorkin KW - economics KW - labor markets AB - This dissertation contains essays on three questions about the labor market. Chapter 1 considers the question: why do some firms pay so much and some so little? Firms account for a substantial portion of earnings inequality. Although the standard explanation is that there are search frictions that support an equilibrium with rents, this chapter finds that compensating differentials for nonpecuniary characteristics are at least as important. To reach this finding, this chapter develops a structural search model and estimates it on U.S. administrative data. The model analyzes the revealed preference information in the labor market: specifically, how workers move between the 1.5 million firms in the data. With on the order of 1.5 million parameters, standard estimation approaches are infeasible, so the chapter develops a new estimation approach that is feasible on such big data.
Chapter 2 considers the question: why do men and women work at different firms? Men work for higher-paying firms than women. The chapter builds on Chapter 1 to consider two explanations for why men and women work in different firms. First, men and women might search from different offer distributions. Second, men and women might have different rankings of firms. Estimation finds that the main explanation for why men and women are sorted into different firms is that women search from a lower-paying offer distribution than men. Indeed, men and women are estimated to have quite similar rankings of firms. Chapter 3 considers the question: what are the long-run effects of the minimum wage? An empirical consensus suggests that minimum wage increases have small employment effects. This chapter argues that these are short-run elasticities. Long-run elasticities, which may differ from short-run elasticities, are more policy relevant. This chapter develops a dynamic industry equilibrium model of labor demand. The model makes two points. First, long-run regressions have been misinterpreted because, even if the short- and long-run employment elasticities differ, standard methods would not detect a difference using U.S. variation. Second, the model reconciles the small estimated short-run employment effects with the commonly found pass-through of minimum wage increases to product prices. JF - Department of Economics PB - University of Michigan CY - Ann Arbor, MI UR - http://hdl.handle.net/2027.42/116747 ER - TY - JOUR T1 - Record Linkage using STATA: Pre-processing, Linking and Reviewing Utilities JF - The Stata Journal Y1 - 2015 A1 - Wasi, Nada A1 - Flaaen, Aaron AB - In this article, we describe Stata utilities that facilitate probabilistic record linkage, the technique typically used for merging two datasets with no common record identifier.
While the preprocessing tools were developed specifically for linking two company databases, the other tools can be used for many different types of linkage. Specifically, the stnd_compname and stnd_address commands parse and standardize company names and addresses to improve the match quality when linking. The reclink2 command is a generalized version of Blasnik's reclink (2010, Statistical Software Components S456876, Department of Economics, Boston College) that allows for many-to-one matching. Finally, clrevmatch is an interactive tool that allows the user to review matched results in an efficient and seamless manner. Rather than exporting results to another file format (for example, Excel), inputting clerical reviews, and importing back into Stata, one can use the clrevmatch tool to conduct all of these steps within Stata. This helps improve the speed and flexibility of matching, which often involves multiple runs. VL - 15 UR - http://www.stata-journal.com/article.html?article=dm0082 IS - 3 ER - TY - CONF T1 - Recording What the Respondent Says: Does Question Format Matter? T2 - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) Y1 - 2015 A1 - Smyth, J.D. A1 - Olson, K. JF - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) CY - Hollywood, Florida UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - JOUR T1 - Reducing the Margins of Error in the American Community Survey Through Data-Driven Regionalization JF - PLOS ONE Y1 - 2015 A1 - Folch, D. A1 - Spielman, S. E. UR - http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0115626 ER - TY - JOUR T1 - Regionalization of Multiscale Spatial Processes using a Criterion for Spatial Aggregation Error JF - ArXiv Y1 - 2015 A1 - Bradley, J. R. A1 - Wikle, C.K. A1 - Holan, S. H.
AB - The modifiable areal unit problem and the ecological fallacy are known problems that occur when modeling multiscale spatial processes. We investigate how these forms of spatial aggregation error can guide a regionalization over a spatial domain of interest. By "regionalization" we mean a specification of geographies that define the spatial support for areal data. This topic has been studied vigorously by geographers, but has been given less attention by spatial statisticians. Thus, we propose a criterion for spatial aggregation error (CAGE), which we minimize to obtain an optimal regionalization. To define CAGE we draw a connection between spatial aggregation error and a new multiscale representation of the Karhunen-Loeve (K-L) expansion. This relationship between CAGE and the multiscale K-L expansion leads to illuminating theoretical developments including: connections between spatial aggregation error, squared prediction error, spatial variance, and a novel extension of Obled-Creutin eigenfunctions. The effectiveness of our approach is demonstrated through an analysis of two datasets, one using the American Community Survey and one related to environmental ocean winds. UR - http://arxiv.org/abs/1502.01974 IS - 1502.01974 ER - TY - JOUR T1 - Rejoinder on: Comparing and selecting spatial predictors using local criteria JF - Test Y1 - 2015 A1 - Bradley, J.R. A1 - Cressie, N. A1 - Shi, T. VL - 24 UR - http://dx.doi.org/10.1007/s11749-014-0414-2 IS - 1 ER - TY - THES T1 - Relaxations of differential privacy and risk utility evaluations of synthetic data and fidelity measures T2 - Statistics Department Y1 - 2015 A1 - McClure, D. AB - Many organizations collect data that would be useful to public researchers, but cannot be shared due to promises of confidentiality to those that participated in the study. This thesis evaluates the risks and utility of several existing release methods, as well as develops new ones with different risk/utility tradeoffs. 
In Chapter 2, I present a new risk metric, called model-specific probabilistic differential privacy (MPDP), which is a relaxed version of differential privacy that allows the risk of a release to be based on the worst case among plausible datasets instead of all possible datasets. In addition, I develop a generic algorithm called local sensitivity random sampling (LSRS) that, under certain assumptions, is guaranteed to give releases that meet MPDP for any query with computable local sensitivity. I demonstrate, using several well-known queries, that LSRS releases have much higher utility than the standard differentially private release mechanism, the Laplace Mechanism, at only marginally higher risk. In Chapter 3, using two synthesis models, I empirically characterize the risks of releasing synthetic data under the standard “all but one” assumption on intruder background knowledge, as well as the effect that decreasing the number of observations the intruder knows beforehand has on that risk. I find in these examples that even in the “all but one” case, there is no risk except to extreme outliers, and even then the risk is mild. I find that the effect that removing observations from an intruder’s background knowledge has on risk depends heavily on how well that intruder can fill in those missing observations: the risk remains fairly constant if he/she can fill them in well, and the risk drops quickly if he/she cannot. In Chapter 4, I characterize the risk/utility tradeoffs for an augmentation of synthetic data called fidelity measures (see Section 1.2.3). Fidelity measures were proposed in Reiter et al. (2009) to quantify the degree to which the results of an analysis performed on a released synthetic dataset match the results of the same analysis performed on the confidential data.
I compare the risk/utility tradeoffs of two different fidelity measures, the confidence interval overlap (Karr et al., 2006) and a new fidelity measure I call the mean predicted probability difference (MPPD). Simultaneously, I compare the risk/utility tradeoffs of two different private release mechanisms, LSRS and a heuristic release method called “safety zones”. I find that the confidence interval overlap can be applied to a wider variety of analyses and is more specific than MPPD, but MPPD is more robust to the influence of individual observations in the confidential data, which means it can be released with less noise than the confidence interval overlap at the same level of risk. I also find that while safety zones are much simpler to compute and generally have good utility (whereas the utility of LSRS depends on the value of ε), they are also much more vulnerable to context-specific attacks that, while not easy for an intruder to implement, are difficult to anticipate. JF - Statistics Department PB - Duke University VL - PhD UR - http://hdl.handle.net/10161/11365 ER - TY - CONF T1 - The Role of Device Type and Respondent Characteristics in Internet Panel Survey Breakoff T2 - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) Y1 - 2015 A1 - Allan L. McCutcheon JF - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) CY - Hollywood, Florida UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - JOUR T1 - The SAR model for very large datasets: A reduced-rank approach JF - Econometrics Y1 - 2015 A1 - Burden, S. A1 - Cressie, N. A1 - Steel, D.G. VL - 3 UR - http://www.mdpi.com/2225-1146/3/2/317 IS - 2 ER - TY - JOUR T1 - Semi-parametric selection models for potentially non-ignorable attrition in panel studies with refreshment samples JF - Political Analysis Y1 - 2015 A1 - Y. Si A1 - J.P. Reiter A1 - D.S.
Hillygus VL - 23 UR - http://pan.oxfordjournals.org/cgi/reprint/mpu009?%20ijkey=joX8eSl6gyIlQKP&keytype=ref ER - TY - JOUR T1 - Simultaneous Edit-Imputation for Continuous Microdata JF - Journal of the American Statistical Association Y1 - 2015 A1 - Kim, H. J. A1 - Cox, L. H. A1 - Karr, A. F. A1 - Reiter, J. P. A1 - Wang, Q. VL - 110 UR - http://www.tandfonline.com/doi/abs/10.1080/01621459.2015.1040881 ER - TY - JOUR T1 - Small Area Estimation via Multivariate Fay-Herriot Models With Latent Spatial Dependence JF - Australian & New Zealand Journal of Statistics Y1 - 2015 A1 - Porter, A.T. A1 - Wikle, C.K. A1 - Holan, S.H. VL - 57 UR - http://arxiv.org/abs/1310.7211 ER - TY - JOUR T1 - Spatio-temporal change of support with application to American Community Survey multi-year period estimates JF - Stat Y1 - 2015 A1 - Bradley, Jonathan R. A1 - Wikle, Christopher K. A1 - Holan, Scott H. KW - Bayesian KW - change-of-support KW - dynamical KW - hierarchical models KW - mixed-effects model KW - Moran's I KW - multi-year period estimate AB - We present hierarchical Bayesian methodology to perform spatio-temporal change of support (COS) for survey data with Gaussian sampling errors. This methodology is motivated by the American Community Survey (ACS), which is an ongoing survey administered by the US Census Bureau that provides timely information on several key demographic variables. The ACS has published 1-year, 3-year, and 5-year period estimates, and margins of error, for demographic and socio-economic variables recorded over predefined geographies. The spatio-temporal COS methodology considered here provides data users with a way to estimate ACS variables on customized geographies and time periods while accounting for sampling errors. Additionally, 3-year ACS period estimates are to be discontinued, and this methodology can provide predictions of ACS variables for 3-year periods given the available period estimates.
The methodology is based on a spatio-temporal mixed-effects model with a low-dimensional spatio-temporal basis function representation, which provides multi-resolution estimates through basis function aggregation in space and time. This methodology includes a novel parameterization that uses a target dynamical process and recently proposed parsimonious Moran's I propagator structures. Our approach is demonstrated through two applications using public-use ACS estimates and is shown to produce good predictions on a hold-out set of 3-year period estimates. VL - 4 UR - http://dx.doi.org/10.1002/sta4.94 ER - TY - JOUR T1 - Statistical Disclosure Limitation in the Presence of Edit Rules JF - Journal of Official Statistics Y1 - 2015 A1 - Kim, H.J. A1 - Karr, A.F. A1 - Reiter, J.P. VL - 31 ER - TY - JOUR T1 - A stochastic bioenergetics model based approach to translating large river flow and temperature in to fish population responses: the pallid sturgeon example JF - Geological Society Y1 - 2015 A1 - Wildhaber, M.L. A1 - Dey, R. A1 - Wikle, C.K. A1 - Anderson, C.J. A1 - Moran, E.H. A1 - Franz, K.J. VL - 408 ER - TY - JOUR T1 - Stop or continue data collection: A nonignorable missing data approach for continuous variables JF - ArXiv Y1 - 2015 A1 - T. Paiva A1 - J.P. Reiter KW - Methodology AB - We present an approach to inform decisions about nonresponse followup sampling. The basic idea is (i) to create completed samples by imputing nonrespondents' data under various assumptions about the nonresponse mechanisms, (ii) take hypothetical samples of varying sizes from the completed samples, and (iii) compute and compare measures of accuracy and cost for different proposed sample sizes. As part of the methodology, we present a new approach for generating imputations for multivariate continuous data with nonignorable unit nonresponse.
We fit mixtures of multivariate normal distributions to the respondents' data, and adjust the probabilities of the mixture components to generate nonrespondents' distributions with desired features. We illustrate the approaches using data from the 2007 U.S. Census of Manufactures. UR - http://arxiv.org/abs/1511.02189 IS - 1511.02189 ER - TY - JOUR T1 - Studying Neighborhoods Using Uncertain Data from the American Community Survey: A Contextual Approach JF - Annals of the Association of American Geographers Y1 - 2015 A1 - Seth E. Spielman A1 - Alex Singleton AB - In 2010 the American Community Survey (ACS) replaced the long form of the decennial census as the sole national source of demographic and economic data for small geographic areas such as census tracts. These small area estimates suffer from large margins of error, however, which makes the data difficult to use for many purposes. The value of a large and comprehensive survey like the ACS is that it provides a richly detailed, multivariate, composite picture of small areas. This article argues that one solution to the problem of large margins of error in the ACS is to shift from a variable-based mode of inquiry to one that emphasizes a composite multivariate picture of census tracts. Because the margin of error in a single ACS estimate, like household income, is assumed to be a symmetrically distributed random variable, positive and negative errors are equally likely. Because the variable-specific estimates are largely independent from each other, when looking at a large collection of variables these random errors average to zero. This means that although single variables can be methodologically problematic at the census tract scale, a large collection of such variables provides utility as a contextual descriptor of the place(s) under investigation. This idea is demonstrated by developing a geodemographic typology of all U.S. census tracts.
The typology is firmly rooted in the social scientific literature and is organized around a framework of concepts, domains, and measures. The typology is validated using public domain data from the City of Chicago and the U.S. Federal Election Commission. The typology, as well as the data and methods used to create it, is open source and published freely online. VL - 105 UR - http://dx.doi.org/10.1080/00045608.2015.1052335 ER - TY - CONF T1 - Survey Informatics: The Future of Survey Methodology and Survey Statistics Training in the Academy? T2 - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) Y1 - 2015 A1 - Allan L. McCutcheon JF - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) CY - Hollywood, Florida UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - RPRT T1 - Synthetic Establishment Microdata Around the World Y1 - 2015 A1 - Vilhuber, Lars A1 - Abowd, John M. A1 - Reiter, Jerome P. AB - In contrast to the many public-use microdata samples available for individual and household data from many statistical agencies around the world, there are virtually no establishment or firm microdata available. In large part, this difficulty in providing access to business microdata is due to the skewed and sparse distributions that characterize business data. Synthetic data are simulated data generated from statistical models. We organized sessions at the 2015 World Statistical Congress and the 2015 Joint Statistical Meetings, highlighting work on synthetic establishment microdata. This overview situates those papers, published in this issue, within the broader literature.
PB - Cornell University UR - http://hdl.handle.net/1813/42340 ER - TY - JOUR T1 - Understanding the Dynamics of $2-a-Day Poverty in the United States JF - The Russell Sage Foundation Journal of the Social Sciences Y1 - 2015 A1 - Shaefer, H. Luke A1 - Edin, Kathryn A1 - Talbert, E. VL - 1 IS - Severe Deprivation ER - TY - JOUR T1 - Understanding the Human Condition through Survey Informatics JF - IEEE Computer Y1 - 2015 A1 - Eck, A. A1 - Soh, L.-K. A1 - McCutcheon, A. L. A1 - Smyth, J.D. A1 - Belli, R.F. VL - 48 IS - 11 ER - TY - CONF T1 - The Use of Paradata to Evaluate Interview Complexity and Data Quality (in Calendar and Time Diary Surveys) T2 - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) Y1 - 2015 A1 - Cordova-Cazar, A.L. A1 - Belli, R.F. JF - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) CY - Hollywood, Florida UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - CONF T1 - Using Data Mining to Examine Interviewer-Respondent Interactions in Calendar Interviews T2 - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) Y1 - 2015 A1 - Belli, R.F. A1 - Miller, L.D. A1 - Soh, L.-K. A1 - T. Al Baghal JF - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) CY - Hollywood, Florida UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - CONF T1 - Using Machine Learning Techniques to Predict Respondent Type from A Priori Demographic Information T2 - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) Y1 - 2015 A1 - Atkin, G. A1 - Arunachalam, H. A1 - Eck, A. A1 - Wettlaufer, D. A1 - Soh, L.-K. A1 - Belli, R.F.
JF - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) CY - Hollywood, Florida UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - RPRT T1 - Using Partially Synthetic Microdata to Protect Sensitive Cells in Business Statistics Y1 - 2015 A1 - Vilhuber, Lars A1 - Miranda, Javier AB - We describe and analyze a method that blends records from both observed and synthetic microdata into public-use tabulations on establishment statistics. The resulting tables use synthetic data only in potentially sensitive cells. We describe different algorithms, and present preliminary results when applied to the Census Bureau's Business Dynamics Statistics and Synthetic Longitudinal Business Database, highlighting accuracy and protection afforded by the method when compared to existing public-use tabulations (with suppressions). PB - Cornell University UR - http://hdl.handle.net/1813/42339 ER - TY - CONF T1 - Web Surveys, Online Panels, and Paradata: Automating Responsive Design T2 - 2015 Joint Program in Survey Methodology (JPSM) Distinguished Lecture Y1 - 2015 A1 - Allan L. McCutcheon JF - 2015 Joint Program in Survey Methodology (JPSM) Distinguished Lecture CY - University of Maryland, College Park, MD UR - http://www.jpsm.umd.edu/ ER - TY - JOUR T1 - Who’s Left Out? Characteristics of Households in Economic Need not Receiving Public Support JF - Journal of Sociology and Social Welfare Y1 - 2015 A1 - Fusaro, V. VL - 42 IS - 3 ER - TY - CONF T1 - Why Do Interviewers Speed Up? An Examination of Changes in Interviewer Behaviors over the Course of the Survey Field Period T2 - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) Y1 - 2015 A1 - Olson, K. A1 - Smyth, J.D.
JF - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) CY - Hollywood, Florida UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - CONF T1 - Achieving balance: Understanding the relationship between complexity and response quality T2 - American Association for Public Opinion Research 2014 Annual Conference Y1 - 2014 A1 - Powell, R.J. A1 - Kirchner, A. JF - American Association for Public Opinion Research 2014 Annual Conference CY - Anaheim, CA UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - JOUR T1 - Agent Based Models: Statistical Challenges and Opportunities JF - Statistics Views Y1 - 2014 A1 - Wikle, C.K. PB - Wiley UR - http://www.statisticsviews.com/details/feature/6354691/Agent-Based-Models-Statistical-Challenges-and-Opportunities.html ER - TY - CHAP T1 - Analytical frameworks for data release: A statistical view T2 - Confidentiality and Data Access in the Use of Big Data: Theory and Practical Approaches Y1 - 2014 A1 - A. F. Karr A1 - J. P. Reiter JF - Confidentiality and Data Access in the Use of Big Data: Theory and Practical Approaches PB - Cambridge University Press CY - New York City, NY ER - TY - ABST T1 - An Approach for Identifying and Predicting Economic Recessions in Real-Time Using Time-Frequency Functional Models, Seminar on Bayesian Inference in Econometrics and Statistics (SBIES) Y1 - 2014 A1 - Holan, S.H. ER - TY - CONF T1 - An Approach for Identifying and Predicting Economic Recessions in Real-Time Using Time-Frequency Functional Models T2 - Joint Statistical Meetings 2014 Y1 - 2014 A1 - Holan, S.H. JF - Joint Statistical Meetings 2014 PB - Joint Statistical Meetings CY - Boston, MA UR - http://www.amstat.org/meetings/jsm/2014/onlineprogram/AbstractDetails.cfm?abstractid=310841 ER - TY - JOUR T1 - Asymptotic Theory of Cepstral Random Fields JF - Annals of Statistics Y1 - 2014 A1 - McElroy, T. A1 - Holan, S. 
PB - University of Missouri VL - 42 UR - http://arxiv.org/pdf/1112.1977v4.pdf ER - TY - CHAP T1 - Autobiographical memory dynamics in survey research T2 - SAGE Handbook of Applied Memory Y1 - 2014 A1 - Belli, R. F. ED - T. J. Perfect ED - D. S. Lindsay JF - SAGE Handbook of Applied Memory PB - Sage UR - http://dx.doi.org/10.4135/9781446294703 ER - TY - ABST T1 - A Bayesian Approach to Estimating Agricultural Yield Based on Multiple Repeated Surveys Y1 - 2014 A1 - Holan, S.H. ER - TY - CONF T1 - Bayesian Dynamic Time-Frequency Estimation T2 - Twelfth World Meeting of ISBA Y1 - 2014 A1 - Holan, S.H. JF - Twelfth World Meeting of ISBA PB - ISBA CY - Cancun, Mexico ER - TY - JOUR T1 - Bayesian estimation of disclosure risks for multiply imputed, synthetic data JF - Journal of Privacy and Confidentiality Y1 - 2014 A1 - Reiter, J. P. A1 - Wang, Q. A1 - Zhang, B. AB - Agencies seeking to disseminate public use microdata, i.e., data on individual records, can replace confidential values with multiple draws from statistical models estimated with the collected data. We present a framework for evaluating disclosure risks inherent in releasing multiply-imputed, synthetic data. The basic idea is to mimic an intruder who computes posterior distributions of confidential values given the released synthetic data and prior knowledge. We illustrate the methodology with artificial fully synthetic data and with partial synthesis of the Survey of Youth in Custody. VL - 6 UR - http://repository.cmu.edu/jpc/vol6/iss1/2 IS - 1 ER - TY - JOUR T1 - Bayesian estimation of discrete multivariate latent structure models with structural zeros JF - Journal of Computational and Graphical Statistics Y1 - 2014 A1 - Manrique-Vallier, D. A1 - Reiter, J.P. VL - 23 ER - TY - JOUR T1 - Bayesian multiple imputation for large-scale categorical data with structural zeros JF - Survey Methodology Y1 - 2014 A1 - D. Manrique-Vallier A1 - J.P.
Reiter VL - 40 UR - http://www.stat.duke.edu/~jerry/Papers/SurvMeth14.pdf ER - TY - RPRT T1 - Bayesian Nonparametric Modeling for Multivariate Ordinal Regression Y1 - 2014 A1 - DeYoreo, M. A1 - Kottas, A. KW - Statistics - Methodology AB - Univariate or multivariate ordinal responses are often assumed to arise from a latent continuous parametric distribution, with covariate effects which enter linearly. We introduce a Bayesian nonparametric modeling approach for univariate and multivariate ordinal regression, which is based on mixture modeling for the joint distribution of latent responses and covariates. The modeling framework enables highly flexible inference for ordinal regression relationships, avoiding assumptions of linearity or additivity in the covariate effects. In standard parametric ordinal regression models, computational challenges arise from identifiability constraints and estimation of parameters requiring nonstandard inferential techniques. A key feature of the nonparametric model is that it achieves inferential flexibility, while avoiding these difficulties. In particular, we establish full support of the nonparametric mixture model under fixed cut-off points that relate through discretization the latent continuous responses with the ordinal responses. The practical utility of the modeling approach is illustrated through application to two data sets from econometrics, an example involving regression relationships for ozone concentration, and a multirater agreement problem. PB - ArXiv UR - http://arxiv.org/abs/1408.1027 ER - TY - ABST T1 - Big Data Methodology Applied to Small Area Estimation Y1 - 2014 A1 - Porter, A.T. ER - TY - CONF T1 - Call back later: The association of recruitment contact and error in the American Time Use Survey T2 - American Association for Public Opinion Research 2014 Annual Conference Y1 - 2014 A1 - Countryman, A. A1 - Cordova-Cazar, A.L. A1 - Deal, C.E. A1 - Belli, R.F.
JF - American Association for Public Opinion Research 2014 Annual Conference CY - Anaheim, CA UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - JOUR T1 - A CAR model for multiple outcomes on mismatched lattices JF - Spatial and Spatio-Temporal Epidemiology Y1 - 2014 A1 - Porter, A.T. A1 - Oleson, J. VL - 11 UR - http://www.sciencedirect.com/science/article/pii/S1877584514000604 ER - TY - JOUR T1 - Causes and Patterns of Uncertainty in the American Community Survey JF - Applied Geography Y1 - 2014 A1 - Spielman, S. E. A1 - Folch, D. A1 - Nagle, N. VL - 46 UR - http://www.sciencedirect.com/science/article/pii/S0143622813002518 ER - TY - RPRT T1 - CED²AR: The Comprehensive Extensible Data Documentation and Access Repository Y1 - 2014 A1 - Lagoze, Carl A1 - Vilhuber, Lars A1 - Williams, Jeremy A1 - Perry, Benjamin A1 - Block, William C. AB - We describe the design, implementation, and deployment of the Comprehensive Extensible Data Documentation and Access Repository (CED²AR). This is a metadata repository system that allows researchers to search, browse, access, and cite confidential data and metadata through either a web-based user interface or programmatically through a search API, all the while reusing and linking to existing archive- and provider-generated metadata. CED²AR is distinguished from other metadata repository-based applications due to requirements that derive from its social science context. These include the need to cloak confidential data and metadata and manage complex provenance chains. Presented at 2014 IEEE/ACM Joint Conference on Digital Libraries (JCDL), Sept 8-12, 2014. PB - Cornell University UR - http://hdl.handle.net/1813/44702 ER - TY - RPRT T1 - The Cepstral Model for Multivariate Time Series: The Vector Exponential Model.
Y1 - 2014 A1 - Holan, S.H. A1 - McElroy, T.S. A1 - Wu, G. AB - Vector autoregressive (VAR) models have become a staple in the analysis of multivariate time series and are formulated in the time domain as difference equations, with an implied covariance structure. In many contexts, it is desirable to work with a stable, or at least stationary, representation. To fit such models, one must impose restrictions on the coefficient matrices to ensure that certain determinants are nonzero, which, except in special cases, may prove burdensome. To circumvent these difficulties, we propose a flexible frequency domain model expressed in terms of the spectral density matrix. Specifically, this paper treats the modeling of covariance stationary vector-valued (i.e., multivariate) time series via an extension of the exponential model for the spectrum of a scalar time series. We discuss the modeling advantages of the vector exponential model and its computational facets, such as how to obtain Wold coefficients from given cepstral coefficients. Finally, we demonstrate the utility of our approach through simulation as well as two illustrative data examples focusing on multi-step ahead forecasting and estimation of squared coherence. PB - arXiv UR - http://arxiv.org/abs/1406.0801 ER - TY - CONF T1 - Changes in interviewer-related error over the course of the field period: An empirical examination using paradata T2 - Joint Statistical Meetings Y1 - 2014 A1 - Olson, K. A1 - Kirchner, A. JF - Joint Statistical Meetings CY - Boston, MA ER - TY - CONF T1 - Changes in interviewer-related error over the course of the field period: An empirical examination using paradata T2 - American Association for Public Opinion Research 2014 Annual Conference Y1 - 2014 A1 - Olson, K. A1 - Kirchner, A.
JF - American Association for Public Opinion Research 2014 Annual Conference CY - Anaheim, CA UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - JOUR T1 - The Co-Evolution of Residential Segregation and the Built Environment at the Turn of the 20th Century: a Schelling Model JF - Transactions in GIS Y1 - 2014 A1 - Spielman, S. E. A1 - Harrison, P. VL - 18 UR - http://onlinelibrary.wiley.com/enhanced/doi/10.1111/tgis.12014/ ER - TY - RPRT T1 - Collaborative Editing of DDI Metadata: The Latest from the CED2AR Project Y1 - 2014 A1 - Perry, Benjamin A1 - Kambhampaty, Venkata A1 - Brumsted, Kyle A1 - Vilhuber, Lars A1 - Block, William AB - Benjamin Perry's presentation on "Collaborative Editing and Versioning of DDI Metadata: The Latest from Cornell's NCRN CED²AR Software" at the 6th Annual European DDI User Conference in London, 12/02/2014. PB - Cornell University UR - http://hdl.handle.net/1813/38200 ER - TY - CONF T1 - Commitment, concealment, and confusion: An empirical assessment of interviewer and respondent behaviors in survey interviews T2 - 39th Annual Conference of the Midwest Association for Public Opinion Research Y1 - 2014 A1 - Kirchner, A. A1 - Olson, K. JF - 39th Annual Conference of the Midwest Association for Public Opinion Research CY - Chicago, IL UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - RPRT T1 - Communicating Uncertainty in Official Economic Statistics Y1 - 2014 A1 - Manski, Charles AB - Federal statistical agencies in the United States and analogous agencies elsewhere commonly report official economic statistics as point estimates, without accompanying measures of error.
Users of the statistics may incorrectly view them as error-free or may incorrectly conjecture error magnitudes. This paper discusses strategies to mitigate misinterpretation of official statistics by communicating uncertainty to the public. Sampling error can be measured using established statistical principles. The challenge is to satisfactorily measure the various forms of nonsampling error. I find it useful to distinguish transitory statistical uncertainty, permanent statistical uncertainty, and conceptual uncertainty. I illustrate how each arises as the Bureau of Economic Analysis periodically revises GDP estimates, the Census Bureau generates household income statistics from surveys with nonresponse, and the Bureau of Labor Statistics seasonally adjusts employment statistics. PB - Northwestern University UR - http://hdl.handle.net/1813/36323 ER - TY - RPRT T1 - Communicating Uncertainty in Official Economic Statistics: An Appraisal Fifty Years after Morgenstern Y1 - 2014 A1 - Manski, Charles F. AB - Federal statistical agencies in the United States and analogous agencies elsewhere commonly report official economic statistics as point estimates, without accompanying measures of error. Users of the statistics may incorrectly view them as error-free or may incorrectly conjecture error magnitudes. This paper discusses strategies to mitigate misinterpretation of official statistics by communicating uncertainty to the public. Sampling error can be measured using established statistical principles. The challenge is to satisfactorily measure the various forms of nonsampling error. I find it useful to distinguish transitory statistical uncertainty, permanent statistical uncertainty, and conceptual uncertainty.
I illustrate how each arises as the Bureau of Economic Analysis periodically revises GDP estimates, the Census Bureau generates household income statistics from surveys with nonresponse, and the Bureau of Labor Statistics seasonally adjusts employment statistics. I anchor my discussion of communication of uncertainty in the contribution of Morgenstern (1963), who argued forcefully for agency publication of error estimates for official economic statistics. PB - Northwestern University UR - http://hdl.handle.net/1813/40830 ER - TY - THES T1 - Comparing models of Demographic Subpopulations (Master's Thesis) Y1 - 2014 A1 - Moehl, J. PB - University of Tennessee UR - http://trace.tennessee.edu/utk_gradthes/2835/; http://trace.tennessee.edu/cgi/viewcontent.cgi?article=4005&context=utk_gradthes ER - TY - CHAP T1 - A Comparison of Blocking Methods for Record Linkage T2 - Privacy in Statistical Databases Y1 - 2014 A1 - Steorts, R. A1 - Ventura, S. A1 - Sadinle, M. A1 - Fienberg, S. E. A1 - Domingo-Ferrer, J. JF - Privacy in Statistical Databases PB - Springer VL - 8744 UR - http://link.springer.com/chapter/10.1007/978-3-319-11257-2_20 ER - TY - JOUR T1 - A Comparison of Spatial Predictors when Datasets Could be Very Large JF - ArXiv Y1 - 2014 A1 - Bradley, J. R. A1 - Cressie, N. A1 - Shi, T. KW - Statistics - Methodology AB - In this article, we review and compare a number of methods of spatial prediction. To demonstrate the breadth of available choices, we consider both traditional and more-recently-introduced spatial predictors. Specifically, in our exposition we review: traditional stationary kriging, smoothing splines, negative-exponential distance-weighting, Fixed Rank Kriging, modified predictive processes, a stochastic partial differential equation approach, and lattice kriging. This comparison is meant to provide a service to practitioners wishing to decide between spatial predictors. 
Hence, we provide technical material for the unfamiliar, which includes the definition and motivation for each (deterministic and stochastic) spatial predictor. We use a benchmark dataset of CO2 data from NASA's AIRS instrument to address computational efficiencies that include CPU time and memory usage. Furthermore, the predictive performance of each spatial predictor is assessed empirically using a hold-out subset of the AIRS data. UR - http://arxiv.org/abs/1410.7748 IS - 1410.7748 ER - TY - JOUR T1 - Dasymetric Modeling and Uncertainty JF - The Annals of the Association of American Geographers Y1 - 2014 A1 - Nagle, N. A1 - Buttenfield, B. A1 - Leyk, S. A1 - Spielman, S. E. VL - 104 UR - http://www.tandfonline.com/doi/abs/10.1080/00045608.2013.843439 ER - TY - THES T1 - Data Fusion Methods for Improved Demographic Resolution of Population Distribution Datasets (Ph.D. Thesis) Y1 - 2014 A1 - Rose, A. PB - University of Tennessee ER - TY - CONF T1 - Data Quality among Devices to Complete Surveys: Comparing Personal Computers, Smartphones and Tablets T2 - Midwest Association for Public Opinion Research Annual Meeting Y1 - 2014 A1 - Wang, Mengyang A1 - McCutcheon, Allan L. JF - Midwest Association for Public Opinion Research Annual Meeting CY - Chicago, IL UR - http://www.mapor.org/conferences.html ER - TY - JOUR T1 - Deprivation Among U.S. Children With Disabilities Who Receive Supplemental Security Income JF - Journal of Disability Policy Studies Y1 - 2014 A1 - Ghosh, S. A1 - Parish, S. L. ER - TY - CONF T1 - Designing an Intelligent Time Diary Instrument: Visualization, Dynamic Feedback, and Error Prevention and Mitigation T2 - UNL/SRAM/Gallup Symposium Y1 - 2014 A1 - Atkin, G. A1 - Arunachalam, H. A1 - Eck, A. A1 - Soh, L.-K. A1 - Belli, R.F.
JF - UNL/SRAM/Gallup Symposium CY - Omaha, NE UR - http://grc.unl.edu/unlsramgallup-symposium ER - TY - CONF T1 - Designing an Intelligent Time Diary Instrument: Visualization, Dynamic Feedback, and Error Prevention and Mitigation T2 - American Association for Public Opinion Research 2014 Annual Conference Y1 - 2014 A1 - Atkin, G. A1 - Arunachalam, H. A1 - Eck, A. A1 - Soh, L.-K. A1 - Belli, R.F. JF - American Association for Public Opinion Research 2014 Annual Conference CY - Anaheim, CA UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - JOUR T1 - Detecting Duplicates in a Homicide Registry Using a Bayesian Partitioning Approach JF - Annals of Applied Statistics Y1 - 2014 A1 - Sadinle, M. VL - 8 ER - TY - CHAP T1 - Disclosure risk evaluation for fully synthetic data T2 - Privacy in Statistical Databases Y1 - 2014 A1 - J. Hu A1 - J.P. Reiter A1 - Q. Wang JF - Privacy in Statistical Databases PB - Springer CY - Heidelberg VL - 8744 ER - TY - JOUR T1 - The Economics of Privacy JF - Journal of Economic Literature Y1 - 2014 A1 - Acquisti, A. A1 - Taylor, C. N1 - Commissioned article. To appear. ER - TY - CONF T1 - The Effect of CATI Questionnaire Design Features on Response Timing T2 - American Association for Public Opinion Research 2014 Annual Conference Y1 - 2014 A1 - Olson, K. A1 - Smyth, Jolene JF - American Association for Public Opinion Research 2014 Annual Conference CY - Anaheim, CA UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - CONF T1 - The effects of unfamiliar terms on interviewer and respondent behaviors: Are subsequent questions affected? T2 - Paper presented at the Midwest Association for Public Opinion Research annual meeting Y1 - 2014 A1 - Lee, J. A1 - Olson, K.
JF - Paper presented at the Midwest Association for Public Opinion Research annual meeting CY - Chicago, IL UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - CHAP T1 - Enabling statistical analysis of suppressed tabular data, in Privacy in Statistical Databases T2 - Lecture Notes in Computer Science Y1 - 2014 A1 - L. Cox JF - Lecture Notes in Computer Science PB - Springer CY - Heidelberg VL - 8744 ER - TY - JOUR T1 - Entity Resolution with Empirically Motivated Priors JF - ArXiv Y1 - 2014 A1 - Steorts, R. C. KW - Statistics - Methodology AB - Databases often contain corrupted, degraded, and noisy data with duplicate entries across and within each database. Such problems arise in citations, medical databases, genetics, human rights databases, and a variety of other applied settings. The target of statistical inference can be viewed as an unsupervised problem of determining the edges of a bipartite graph that links the observed records to unobserved latent entities. Bayesian approaches provide attractive benefits, naturally providing uncertainty quantification via posterior probabilities. We propose a novel record linkage approach based on empirical Bayesian principles. Specifically, the empirical Bayesian--type step consists of taking the empirical distribution function of the data as the prior for the latent entities. This approach improves on the earlier HB approach not only by avoiding the prior specification problem but also by allowing both categorical and string-valued variables. Our extension to string-valued variables also involves the proposal of a new probabilistic mechanism by which observed record values for string fields can deviate from the values of their associated latent entities. Categorical fields that deviate from their corresponding true value are simply drawn from the empirical distribution function. 
We apply our proposed methodology to a simulated data set of German names and an Italian household survey, showing our method performs favorably compared to several standard methods in the literature. We also consider the robustness of our methods to changes in the hyper-parameters. UR - http://arxiv.org/abs/1409.0643 IS - 1409.0643 ER - TY - CONF T1 - Fast Estimation of Time Series with Multiple Long-Range Persistencies T2 - ASA Proceedings of the Joint Statistical Meetings Y1 - 2014 A1 - McElroy, T.S. A1 - Holan, S.H. JF - ASA Proceedings of the Joint Statistical Meetings PB - American Statistical Association CY - Alexandria, VA ER - TY - CONF T1 - Flexible Bayesian Methodology for Multivariate Spatial Small Area Estimation T2 - Joint Statistical Meetings 2014 Y1 - 2014 A1 - Porter, A.T. JF - Joint Statistical Meetings 2014 CY - Boston, MA ER - TY - RPRT T1 - Flexible prior specification for partially identified nonlinear regression with binary responses Y1 - 2014 A1 - P. R. Hahn A1 - J. S. Murray A1 - I. Manolopoulou AB - This paper adapts tree-based Bayesian regression models for estimating a partially identified probability function. In doing so, ideas from the recent literature on Bayesian partial identification are applied within a sophisticated applied regression context. Our approach permits efficient sensitivity analysis concerning the posterior impact of priors over the partially identified component of the regression model. The new methodology is illustrated on an important problem where we only have partially observed data -- inferring the prevalence of accounting misconduct among publicly traded U.S. businesses. PB - arXiv UR - https://arxiv.org/abs/1407.8430v1 IS - 1407.8430 ER - TY - CONF T1 - A Fully Bayesian Approach for Generating Synthetic Marks and Geographies for Confidential Data T2 - International Indian Statistical Association Y1 - 2014 A1 - Quick, H. 
JF - International Indian Statistical Association PB - IISA ER - TY - JOUR T1 - The generalized multiset sampler JF - Journal of Computational and Graphical Statistics Y1 - 2014 A1 - H. Kim A1 - S. N. MacEachern UR - http://dx.doi.org/10.1080/10618600.2014.962701 ER - TY - CONF T1 - ‘Good Respondent, Bad Respondent’? Assessing Response Quality in Internet Surveys T2 - American Association for Public Opinion Research 2014 Annual Conference Y1 - 2014 A1 - Kirchner, A. A1 - Powell, R. JF - American Association for Public Opinion Research 2014 Annual Conference CY - Anaheim, CA UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - JOUR T1 - Harnessing Naturally Occurring Data to Measure the Response of Spending to Income JF - Science Y1 - 2014 A1 - Gelman, M. A1 - Kariv, S. A1 - Shapiro, M.D. A1 - Silverman, D. A1 - Tadelis, S. AB - This paper presents a new data infrastructure for measuring economic activity. The infrastructure records transactions and account balances, yielding measurements with scope and accuracy that have little precedent in economics. The data are drawn from a diverse population that overrepresents males and younger adults but contains large numbers of underrepresented groups. The data infrastructure permits evaluation of a benchmark theory in economics that predicts that individuals should use a combination of cash management, saving, and borrowing to make the timing of income irrelevant for the timing of spending. As in previous studies and in contrast to the predictions of the theory, there is a response of spending to the arrival of anticipated income. The data also show, however, that this apparent excess sensitivity of spending results largely from the coincident timing of regular income and regular spending. The remaining excess sensitivity is concentrated among individuals with less liquidity. 
Link to data at Berkeley Econometrics Lab (EML): https://eml.berkeley.edu/cgi-bin/HarnessingDataScience2014.cgi VL - 345 UR - http://www.sciencemag.org/content/345/6193/212.full IS - 6193 ER - TY - CONF T1 - Having a Lasting Impact: The Effects of Interviewer Errors on Data Quality T2 - Midwest Association for Public Opinion Research Annual Conference Y1 - 2014 A1 - Timm, A. A1 - Olson, K. A1 - Smyth, J.D. JF - Midwest Association for Public Opinion Research Annual Conference CY - Chicago, IL UR - http://www.mapor.org/conferences.html ER - TY - CHAP T1 - Hierarchical Linkage Clustering with Distributions of Distances for Large Scale Record Linkage T2 - Privacy in Statistical Databases (Lecture Notes in Computer Science) Y1 - 2014 A1 - Ventura, S. A1 - Nugent, R. A1 - Fuchs, E. ED - Domingo-Ferrer, J. JF - Privacy in Statistical Databases (Lecture Notes in Computer Science) PB - Springer VL - 8744 ER - TY - CONF T1 - Hours or Minutes: Does One Unit Fit All? T2 - Midwest Association for Public Opinion Research Annual Conference Y1 - 2014 A1 - Cochran, B. A1 - Smyth, J.D. JF - Midwest Association for Public Opinion Research Annual Conference CY - Chicago, IL UR - http://www.mapor.org/conferences.html ER - TY - ICOMM T1 - How to Make a Better Map—Using Neuroscience Y1 - 2014 A1 - Bliss, Laura KW - Nicholas Nagle KW - Seth Spielman AB - The work of Seth Spielman and Nicholas Nagle was noted in this article in CityLab, a publication from The Atlantic magazine, available at http://www.citylab.com/design/2014/11/how-to-make-a-better-map-according-to-science/382898/. PB - CityLab UR - http://www.citylab.com/design/2014/11/how-to-make-a-better-map-according-to-science/382898/ ER - TY - JOUR T1 - I Cheated, but only a Little–Partial Confessions to Unethical Behavior JF - Journal of Personality and Social Psychology Y1 - 2014 A1 - Peer, E. A1 - Acquisti, A. A1 - Shalvi, S. 
VL - 106 ER - TY - JOUR T1 - Identifying Regions based on Flexible User-Defined Constraints JF - International Journal of Geographical Information Science Y1 - 2014 A1 - Folch, D. A1 - Spielman, S. E. VL - 28 UR - http://www.tandfonline.com/doi/abs/10.1080/13658816.2013.848986 ER - TY - JOUR T1 - Imputation of confidential data sets with spatial locations using disease mapping models JF - Statistics in Medicine Y1 - 2014 A1 - T. Paiva A1 - A. Chakraborty A1 - J.P. Reiter A1 - A.E. Gelfand VL - 33 ER - TY - RPRT T1 - Interval Estimates for Official Statistics with Survey Nonresponse Y1 - 2014 A1 - Manski, C. ER - TY - CONF T1 - Interviewer variance and prevalence of verbal behaviors in calendar and conventional interviewing T2 - American Association for Public Opinion Research 2014 Annual Conference Y1 - 2014 A1 - Belli, R.F. A1 - Charoenruk, N. JF - American Association for Public Opinion Research 2014 Annual Conference CY - Anaheim, CA UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - CONF T1 - Interviewer variance of interviewer and respondent behaviors: A comparison between calendar and conventional interviewing T2 - XVIII International Sociological Association World Congress of Sociology Y1 - 2014 A1 - Belli, R.F. A1 - Charoenruk, N. JF - XVIII International Sociological Association World Congress of Sociology CY - Yokohama, Japan UR - https://isaconf.confex.com/isaconf/wc2014/webprogram/Paper34278.html ER - TY - JOUR T1 - Longitudinal mixed membership trajectory models for disability survey data JF - Annals of Applied Statistics Y1 - 2014 A1 - Manrique-Vallier, D. VL - 8 ER - TY - CONF T1 - Making sense of paradata: Challenges faced and lessons learned T2 - American Association for Public Opinion Research 2014 Annual Conference Y1 - 2014 A1 - Eck, A. A1 - Stuart, L. A1 - Atkin, G. A1 - Soh, L-K A1 - McCutcheon, A.L. A1 - Belli, R.F. 
JF - American Association for Public Opinion Research 2014 Annual Conference CY - Anaheim, CA UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - CONF T1 - Making Sense of Paradata: Challenges Faced and Lessons Learned T2 - UNL/SRAM/Gallup Symposium Y1 - 2014 A1 - Eck, A. A1 - Stuart, L. A1 - Atkin, G. A1 - Soh, L-K A1 - McCutcheon, A.L. A1 - Belli, R.F. JF - UNL/SRAM/Gallup Symposium CY - Omaha, NE UR - http://grc.unl.edu/unlsramgallup-symposium ER - TY - JOUR T1 - Multiple imputation by ordered monotone blocks with application to the Anthrax Vaccine Adsorbed Trial JF - Journal of Computational and Graphical Statistics Y1 - 2014 A1 - Li, Fan A1 - Baccini, Michela A1 - Mealli, Fabrizia A1 - Zell, Elizabeth R. A1 - Frangakis, Constantine E. A1 - Rubin, Donald B. VL - 23 UR - http://www.tandfonline.com/doi/abs/10.1080/10618600.2013.826583 ER - TY - THES T1 - Multiple Imputation Methods for Nonignorable Nonresponse, Adaptive Survey Design, and Dissemination of Synthetic Geographies (Ph.D. thesis) T2 - Department of Statistical Sciences Y1 - 2014 A1 - Paiva, Thais JF - Department of Statistical Sciences PB - Duke University VL - Ph.D. UR - http://dukespace.lib.duke.edu/dspace/handle/10161/9406 ER - TY - JOUR T1 - Multiple imputation of missing or faulty values under linear constraints JF - Journal of Business and Economic Statistics Y1 - 2014 A1 - Kim, H. J. A1 - Reiter, J. P. A1 - Wang, Q. A1 - Cox, L. H. A1 - Karr, A. F. AB - Many statistical agencies, survey organizations, and research centers collect data that suffer from item nonresponse and erroneous or inconsistent values. These data may be required to satisfy linear constraints, for example, bounds on individual variables and inequalities for ratios or sums of variables. Often these constraints are designed to identify faulty values, which then are blanked and imputed. 
The data also may exhibit complex distributional features, including nonlinear relationships and highly nonnormal distributions. We present a fully Bayesian, joint model for modeling or imputing data with missing/blanked values under linear constraints that (i) automatically incorporates the constraints in inferences and imputations, and (ii) uses a flexible Dirichlet process mixture of multivariate normal distributions to reflect complex distributional features. Our strategy for estimation is to augment the observed data with draws from a hypothetical population in which the constraints are not present, thereby taking advantage of computationally expedient methods for fitting mixture models. Missing/blanked items are sampled from their posterior distribution using the Hit-and-Run sampler, which guarantees that all imputations satisfy the constraints. We illustrate the approach using manufacturing data from Colombia, examining the potential to preserve joint distributions and a regression from the plant productivity literature. Supplementary materials for this article are available online. VL - 32 ER - TY - RPRT T1 - NCRN Meeting Fall 2014 Y1 - 2014 A1 - Vilhuber, Lars AB - NCRN Meeting Fall 2014 Vilhuber, Lars Held at the ILR NYC Conference Center. 
PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/45868 ER - TY - RPRT T1 - NCRN Meeting Fall 2014: Bayesian Marked Point Process Modeling for Generating Fully Synthetic Public Use Data with Point-Referenced Geography Y1 - 2014 A1 - Quick, Harrison A1 - Holan, Scott A1 - Wikle, Christopher A1 - Reiter, Jerry AB - NCRN Meeting Fall 2014: Bayesian Marked Point Process Modeling for Generating Fully Synthetic Public Use Data with Point-Referenced Geography Quick, Harrison; Holan, Scott; Wikle, Christopher; Reiter, Jerry Presentation from NCRN Fall 2014 meeting PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/37750 ER - TY - RPRT T1 - NCRN Meeting Fall 2014: Change in Visible Impervious Surface Area in Southeastern Michigan Before and After the "Great Recession" Y1 - 2014 A1 - Wilson, Courtney A1 - Brown, Daniel G. AB - NCRN Meeting Fall 2014: Change in Visible Impervious Surface Area in Southeastern Michigan Before and After the "Great Recession" Wilson, Courtney; Brown, Daniel G. 
Presentation at Fall 2014 NCRN meeting PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/37446 ER - TY - RPRT T1 - NCRN Meeting Fall 2014: Constrained Smoothed Bayesian Estimation Y1 - 2014 A1 - Steorts, Rebecca A1 - Shalizi, Cosma AB - NCRN Meeting Fall 2014: Constrained Smoothed Bayesian Estimation Steorts, Rebecca; Shalizi, Cosma Presentation from NCRN Fall 2014 meeting PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/37748 ER - TY - RPRT T1 - NCRN Meeting Fall 2014: Decomposing Medical-Care Expenditure Growth Y1 - 2014 A1 - Dunn, Abe A1 - Liebman, Eli A1 - Shapiro, Adam AB - NCRN Meeting Fall 2014: Decomposing Medical-Care Expenditure Growth Dunn, Abe; Liebman, Eli; Shapiro, Adam PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/37411 ER - TY - RPRT T1 - NCRN Meeting Fall 2014: Designer Census Geographies Y1 - 2014 A1 - Spielman, Seth AB - NCRN Meeting Fall 2014: Designer Census Geographies Spielman, Seth Presentation from NCRN Fall 2014 meeting PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/37747 ER - TY - RPRT T1 - NCRN Meeting Fall 2014: Geographic linkages between National Center for Health Statistics’ population health surveys and air quality measures Y1 - 2014 A1 - Parker, Jennifer AB - NCRN Meeting Fall 2014: Geographic linkages between National Center for Health Statistics’ population health surveys and air quality measures Parker, Jennifer PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/37412 ER - TY - RPRT T1 - NCRN Meeting Fall 2014: Mixed Effects Modeling for Multivariate-Spatio-Temporal Areal Data Y1 - 2014 A1 - Bradley, Jonathan A1 - Holan, Scott A1 - Wikle, Christopher AB - NCRN Meeting Fall 2014: Mixed Effects Modeling for Multivariate-Spatio-Temporal Areal Data Bradley, Jonathan; Holan, Scott; Wikle, Christopher Presentation from NCRN Fall 2014 meeting PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/37749 ER - TY - RPRT T1 - NCRN Meeting Fall 2014: Respondent-Driven 
Sampling Estimation and the National HIV Behavioral Surveillance System Y1 - 2014 A1 - Spiller, Michael (Trey) AB - NCRN Meeting Fall 2014: Respondent-Driven Sampling Estimation and the National HIV Behavioral Surveillance System Spiller, Michael (Trey) PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/37414 ER - TY - RPRT T1 - NCRN Meeting Spring 2014 Y1 - 2014 A1 - Vilhuber, Lars AB - NCRN Meeting Spring 2014 Vilhuber, Lars Held at the Census Headquarters, Washington, DC. PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/45869 ER - TY - RPRT T1 - NCRN Meeting Spring 2014: Adaptive Protocols and the DDI 4 Process Model Y1 - 2014 A1 - Greenfield, Jay A1 - Kuan, Sophia AB - NCRN Meeting Spring 2014: Adaptive Protocols and the DDI 4 Process Model Greenfield, Jay; Kuan, Sophia Presentation from NCRN Spring 2014 meeting PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/36393 ER - TY - RPRT T1 - NCRN Meeting Spring 2014: Aiming at a More Cost-Effective Census Via Online Data Collection: Privacy Trade-Offs of Geo-Location Y1 - 2014 A1 - Brandimarte, Laura A1 - Acquisti, Alessandro AB - NCRN Meeting Spring 2014: Aiming at a More Cost-Effective Census Via Online Data Collection: Privacy Trade-Offs of Geo-Location Brandimarte, Laura; Acquisti, Alessandro Presentation at NCRN Spring 2014 meeting PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/36397 ER - TY - RPRT T1 - NCRN Meeting Spring 2014: Imputation of multivariate continuous data with non-ignorable missingness Y1 - 2014 A1 - Paiva, Thais A1 - Reiter, Jerry AB - NCRN Meeting Spring 2014: Imputation of multivariate continuous data with non-ignorable missingness Paiva, Thais; Reiter, Jerry Presentation at Spring 2014 NCRN meeting PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/36399 ER - TY - RPRT T1 - NCRN Meeting Spring 2014: Integrating PROV with DDI: Mechanisms of Data Discovery within the U.S. 
Census Bureau Y1 - 2014 A1 - Block, William A1 - Brown, Warren A1 - Williams, Jeremy A1 - Vilhuber, Lars A1 - Lagoze, Carl AB - NCRN Meeting Spring 2014: Integrating PROV with DDI: Mechanisms of Data Discovery within the U.S. Census Bureau Block, William; Brown, Warren; Williams, Jeremy; Vilhuber, Lars; Lagoze, Carl Presentation at NCRN Spring 2014 meeting PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/36392 ER - TY - RPRT T1 - NCRN Meeting Spring 2014: Introduction Y1 - 2014 A1 - Thompson, John AB - NCRN Meeting Spring 2014: Introduction Thompson, John NCRN Spring 2014 Meeting PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/36395 ER - TY - RPRT T1 - NCRN Meeting Spring 2014: Metadata Standards & Technology Development for the NSF Survey of Earned Doctorates Y1 - 2014 A1 - Noonan, Kimberly A1 - Heus, Pascal A1 - Mulcahy, Tim AB - NCRN Meeting Spring 2014: Metadata Standards & Technology Development for the NSF Survey of Earned Doctorates Noonan, Kimberly; Heus, Pascal; Mulcahy, Tim Presentation from NCRN Spring 2014 meeting PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/36394 ER - TY - RPRT T1 - NCRN Meeting Spring 2014: Research Program and Enterprise Architecture for Adaptive Survey Design at Census Y1 - 2014 A1 - Miller, Peter A1 - Mathur, Anup A1 - Thieme, Michael AB - NCRN Meeting Spring 2014: Research Program and Enterprise Architecture for Adaptive Survey Design at Census Miller, Peter; Mathur, Anup; Thieme, Michael PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/36400 ER - TY - RPRT T1 - NCRN Meeting Spring 2014: Summer Working Group for Employer List Linking (SWELL) Y1 - 2014 A1 - Gathright, Graton A1 - Kutzbach, Mark A1 - McCue, Kristin A1 - McEntarfer, Erika A1 - Monti, Holly A1 - Trageser, Kelly A1 - Vilhuber, Lars A1 - Wasi, Nada A1 - Wignall, Christopher AB - NCRN Meeting Spring 2014: Summer Working Group for Employer List Linking (SWELL) Gathright, Graton; Kutzbach, Mark; McCue, Kristin; 
McEntarfer, Erika; Monti, Holly; Trageser, Kelly; Vilhuber, Lars; Wasi, Nada; Wignall, Christopher Presentation for NCRN Spring 2014 meeting PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/36396 ER - TY - RPRT T1 - NCRN Meeting Spring 2014: Web Surveys, Online Panels, and Paradata: Automating Adaptive Design Y1 - 2014 A1 - McCutcheon, Allan AB - NCRN Meeting Spring 2014: Web Surveys, Online Panels, and Paradata: Automating Adaptive Design McCutcheon, Allan Presentation at NCRN Spring 2014 meeting PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/36398 ER - TY - RPRT T1 - NCRN Newsletter: Volume 1 - Issue 2 Y1 - 2014 A1 - Vilhuber, Lars A1 - Karr, Alan A1 - Reiter, Jerome A1 - Abowd, John A1 - Nunnelly, Jamie AB - NCRN Newsletter: Volume 1 - Issue 2 Vilhuber, Lars; Karr, Alan; Reiter, Jerome; Abowd, John; Nunnelly, Jamie Overview of activities at NSF-Census Research Network nodes from November 2013 to March 2014. NCRN Newsletter Vol. 1, Issue 2: March 20, 2014 PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/40233 ER - TY - RPRT T1 - NCRN Newsletter: Volume 1 - Issue 3 Y1 - 2014 A1 - Vilhuber, Lars A1 - Karr, Alan A1 - Reiter, Jerome A1 - Abowd, John A1 - Nunnelly, Jamie AB - NCRN Newsletter: Volume 1 - Issue 3 Vilhuber, Lars; Karr, Alan; Reiter, Jerome; Abowd, John; Nunnelly, Jamie Overview of activities at NSF-Census Research Network nodes from March 2014 to July 2014. NCRN Newsletter Vol. 1, Issue 3: July 23, 2014 PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/40234 ER - TY - RPRT T1 - NCRN Newsletter: Volume 1 - Issue 4 Y1 - 2014 A1 - Vilhuber, Lars A1 - Karr, Alan A1 - Reiter, Jerome A1 - Abowd, John A1 - Nunnelly, Jamie AB - NCRN Newsletter: Volume 1 - Issue 4 Vilhuber, Lars; Karr, Alan; Reiter, Jerome; Abowd, John; Nunnelly, Jamie Overview of activities at NSF-Census Research Network nodes from July 2014 to October 2014. NCRN Newsletter Vol. 
1, Issue 4: October 15, 2014 PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/40192 ER - TY - RPRT T1 - A New Method for Protecting Interrelated Time Series with Bayesian Prior Distributions and Synthetic Data Y1 - 2014 A1 - Schneider, Matthew J. A1 - Abowd, John M. AB - A New Method for Protecting Interrelated Time Series with Bayesian Prior Distributions and Synthetic Data Schneider, Matthew J.; Abowd, John M. Organizations disseminate statistical summaries of administrative data via the Web for unrestricted public use. They balance the trade-off between confidentiality protection and inference quality. Recent developments in disclosure avoidance techniques include the incorporation of synthetic data, which capture the essential features of underlying data by releasing altered data generated from a posterior predictive distribution. The United States Census Bureau collects millions of interrelated time series micro-data that are hierarchical and contain many zeros and suppressions. Rule-based disclosure avoidance techniques often require the suppression of count data for small magnitudes and the modification of data based on a small number of entities. Motivated by this problem, we use zero-inflated extensions of Bayesian Generalized Linear Mixed Models (BGLMM) with privacy-preserving prior distributions to develop methods for protecting and releasing synthetic data from time series about thousands of small groups of entities without suppression based on the magnitudes or number of entities. We find that as the prior distributions of the variance components in the BGLMM become more precise toward zero, confidentiality protection increases and inference quality deteriorates. We evaluate our methodology using a strict privacy measure, empirical differential privacy, and a newly defined risk measure, Probability of Range Identification (PoRI), which directly measures attribute disclosure risk. We illustrate our results with the U.S. 
Census Bureau’s Quarterly Workforce Indicators. PB - Cornell University UR - http://hdl.handle.net/1813/40828 ER - TY - GEN T1 - NewsViews: An Automated Pipeline for Creating Custom Geovisualizations for News Y1 - 2014 A1 - Gao, T. A1 - Hullman, J. A1 - Adar, E. A1 - Hecht, B. A1 - Diakopoulos, N. AB - Interactive visualizations add rich, data-based context to online news articles. Geographic maps are currently the most prevalent form of these visualizations. Unfortunately, designers capable of producing high-quality, customized geovisualizations are scarce. We present NewsViews, a novel automated news visualization system that generates interactive, annotated maps without requiring professional designers. NewsViews’ maps support trend identification and data comparisons relevant to a given news article. The NewsViews system leverages text mining to identify key concepts and locations discussed in articles (as well as potential annotations), an extensive repository of “found” databases, and techniques adapted from cartography to identify and create visually “interesting” thematic maps. In this work, we develop and evaluate key criteria in automatic, annotated, map generation and experimentally validate the key features for successful representations (e.g., relevance to context, variable selection, “interestingness” of representation and annotation quality). UR - http://cond.org/newsviews.html ER - TY - JOUR T1 - The Past, Present, and Future of Geodemographic Research in the United States and United Kingdom JF - The Professional Geographer Y1 - 2014 A1 - Singleton, A. A1 - Spielman, S. E. VL - 4 ER - TY - CONF T1 - The Poisson Change of Support Problem with Applications to the American Community Survey T2 - Joint Statistical Meetings 2014 Y1 - 2014 A1 - Bradley, J.R. 
JF - Joint Statistical Meetings 2014 ER - TY - CONF T1 - Predicting Survey Breakoff in Online Survey Panels T2 - American Association for Public Opinion Research 2014 Annual Conference Y1 - 2014 A1 - McCutcheon, A.L. JF - American Association for Public Opinion Research 2014 Annual Conference CY - Anaheim, CA UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - RPRT T1 - Reducing Uncertainty in the American Community Survey through Data-Driven Regionalization Y1 - 2014 A1 - Spielman, Seth A1 - Folch, David AB - Reducing Uncertainty in the American Community Survey through Data-Driven Regionalization Spielman, Seth; Folch, David The American Community Survey (ACS) is the largest US survey of households and is the principal source for neighborhood-scale information about the US population and economy. The ACS is used to allocate billions in federal spending and is a critical input to social scientific research in the US. However, estimates from the ACS can be highly unreliable. For example, in over 72% of census tracts the estimated number of children under 5 in poverty has a margin of error greater than the estimate. Uncertainty of this magnitude complicates the use of social data in policy making, research, and governance. This article develops a spatial optimization algorithm that is capable of reducing the margins of error in survey data via the creation of new composite geographies, a process called regionalization. Regionalization is a complex combinatorial problem. Here, rather than focusing on the technical aspects of regionalization, we demonstrate how to use a purpose-built, open-source regionalization algorithm to post-process survey data in order to reduce the margins of error to some user-specified threshold. 
PB - University of Colorado at Boulder / University of Tennessee UR - http://hdl.handle.net/1813/38121 ER - TY - CONF T1 - Remembering where: A look at the American Time Use Survey T2 - Paper presented at the annual conference of the Midwest Association for Public Opinion Research Y1 - 2014 A1 - Deal, C. A1 - Cordova-Cazar, A.L. A1 - Countryman, A. A1 - Kirchner, A. A1 - Belli, R.F. JF - Paper presented at the annual conference of the Midwest Association for Public Opinion Research CY - Chicago, IL UR - http://www.mapor.org/conferences.html ER - TY - JOUR T1 - Reputation as a Sufficient Condition for Data Quality on Amazon Mechanical Turk JF - Behavior Research Methods Y1 - 2014 A1 - Peer, E. A1 - Vosgerau, J. A1 - Acquisti, A. VL - 46 ER - TY - CHAP T1 - The Rise of Incarceration Among the Poor with Mental Illnesses: How Neoliberal Policies Contribute T2 - The Routledge Handbook of Poverty in the United States Y1 - 2014 A1 - Camp, J. A1 - Haymes, S. A1 - Haymes, M. V. d. A1 - Miller, R.J. JF - The Routledge Handbook of Poverty in the United States PB - Routledge ER - TY - CONF T1 - The Role of Device Type in Internet Panel Survey Breakoff T2 - Midwest Association for Public Opinion Research Annual Conference Y1 - 2014 A1 - McCutcheon, A.L. JF - Midwest Association for Public Opinion Research Annual Conference CY - Chicago, IL UR - http://www.mapor.org/conferences.html ER - TY - JOUR T1 - Savings from ages 16 to 35: A test to inform Child Development Account policy JF - Poverty & Public Policy Y1 - 2014 A1 - Friedline, T. A1 - Nam, I. VL - 6 UR - http://onlinelibrary.wiley.com/store/10.1002/pop4.59/asset/pop459.pdf IS - 1 ER - TY - JOUR T1 - Seeing the Non-Stars: (Some) Sources of Bias in Past Disambiguation Approaches and a New Public Tool Leveraging Labeled Records JF - Research Policy Y1 - 2014 A1 - Ventura, S. A1 - Nugent, R. A1 - Fuchs, E. 
N1 - Selected for Special Issue on Big Data ER - TY - ABST T1 - SIPP: From Conventional Questionnaire to Event History Calendar Interviewing Y1 - 2014 A1 - Belli, R.F. N1 - Workshop on “Conducting Research using the Survey of Income and Program Participation (SIPP).” Presented at Duke University, Social Science Research Institute, Durham, NC ER - TY - CONF T1 - SMERED: A Bayesian Approach to Graphical Record Linkage and De-duplication T2 - AISTATS 2014 Proceedings, JMLR Y1 - 2014 A1 - Steorts, R. A1 - Hall, R. A1 - Fienberg, S. E. JF - AISTATS 2014 Proceedings, JMLR PB - W&CP VL - 33 ER - TY - RPRT T1 - Sorting Between and Within Industries: A Testable Model of Assortative Matching Y1 - 2014 A1 - Abowd, John M. A1 - Kramarz, Francis A1 - Perez-Duarte, Sebastien A1 - Schmutte, Ian M. AB - Sorting Between and Within Industries: A Testable Model of Assortative Matching Abowd, John M.; Kramarz, Francis; Perez-Duarte, Sebastien; Schmutte, Ian M. We test Shimer's (2005) theory of the sorting of workers between and within industrial sectors based on directed search with coordination frictions, deliberately maintaining its static general equilibrium framework. We fit the model to sector-specific wage, vacancy and output data, including publicly-available statistics that characterize the distribution of worker and employer wage heterogeneity across sectors. Our empirical method is general and can be applied to a broad class of assignment models. The results indicate that industries are the loci of sorting: more productive workers are employed in more productive industries. The evidence confirms that strong assortative matching can be present even when worker and employer components of wage heterogeneity are weakly correlated. PB - Cornell University UR - http://hdl.handle.net/1813/52607 ER - TY - JOUR T1 - Spatial Collective Intelligence? Accuracy, Credibility in Crowdsourced Data JF - Cartography and Geographic Information Science Y1 - 2014 A1 - Spielman, S. E. 
VL - 41 UR - http://go.galegroup.com/ps/i.do?action=interpret&id=GALE|A361943563&v=2.1&u=nysl_sc_cornl&it=r&p=AONE&sw=w&authCount=1 IS - 2 ER - TY - ABST T1 - Spatial Fay-Herriot Models for Small Area Estimation With Functional Covariates Y1 - 2014 A1 - Holan, S.H. ER - TY - JOUR T1 - Spatial Fay-Herriot Models for Small Area Estimation with Functional Covariates JF - Spatial Statistics Y1 - 2014 A1 - Porter, A. T. A1 - Holan, S.H. A1 - Wikle, C.K. A1 - Cressie, N. VL - 10 UR - http://arxiv.org/pdf/1303.6668v3.pdf ER - TY - CONF T1 - Spiny CACTOS: OSN Users Attitudes and Perceptions Towards Cryptographic Access Control Tools T2 - Proceedings of the Workshop on Usable Security (USEC) Y1 - 2014 A1 - Balsa, E. A1 - Brandimarte, L. A1 - Acquisti, A. A1 - Diaz, C. A1 - Gürses, S. JF - Proceedings of the Workshop on Usable Security (USEC) UR - https://www.internetsociety.org/doc/spiny-cactos-osn-users-attitudes-and-perceptions-towards-cryptographic-access-control-tools ER - TY - CONF T1 - Supporting Planners' Work with Uncertain Demographic Data T2 - GIScience Workshop on Uncertainty Visualization Y1 - 2014 A1 - Griffin, A. L. A1 - Spielman, S. E. A1 - Jurjevich, J. A1 - Merrick, M. A1 - Nagle, N. N. A1 - Folch, D. C. JF - GIScience Workshop on Uncertainty Visualization VL - 23 UR - http://cognitivegiscience.psu.edu/uncertainty2014/papers/griffin_demographic.pdf ER - TY - CONF T1 - Supporting Planners' Work with Uncertain Demographic Data T2 - Proceedings of IEEE VIS 2014 Y1 - 2014 A1 - Griffin, A. L. A1 - Spielman, S. E. A1 - Nagle, N. N. A1 - Jurjevich, J. A1 - Merrick, M. A1 - Folch, D. C. JF - Proceedings of IEEE VIS 2014 PB - Proceedings of IEEE VIS 2014 UR - http://cognitivegiscience.psu.edu/uncertainty2014/papers/griffin_demographic.pdf ER - TY - CONF T1 - Survey Fusion for Data that Exhibit Multivariate, Spatio-Temporal Dependencies T2 - Joint Statistical Meetings 2014 Y1 - 2014 A1 - Bradley, J.R. 
JF - Joint Statistical Meetings 2014 ER - TY - CONF T1 - Survey Informatics: Ideas, Opportunities, and Discussions T2 - UNL/SRAM/Gallup Symposium Y1 - 2014 A1 - Eck, A. A1 - Soh, L-K JF - UNL/SRAM/Gallup Symposium CY - Omaha, NE UR - http://grc.unl.edu/unlsramgallup-symposium ER - TY - ABST T1 - A Survey of Contemporary Spatial Models for Small Area Estimation Y1 - 2014 A1 - Porter, A.T. ER - TY - JOUR T1 - SynLBD 2.0: Improving the Synthetic Longitudinal Business Database JF - Statistical Journal of the International Association for Official Statistics Y1 - 2014 A1 - S. K. Kinney A1 - J. P. Reiter A1 - J. Miranda VL - 30 ER - TY - JOUR T1 - Top-Coding and Public Use Microdata Samples from the U.S. Census Bureau JF - Journal of Privacy and Confidentiality Y1 - 2014 A1 - Crimi, N. A1 - Eddy, W. C. VL - 6 UR - http://repository.cmu.edu/jpc/vol6/iss2/2/ ER - TY - JOUR T1 - Toward healthy balance sheets: Savings accounts as a gateway for young adults’ asset diversification and accumulation JF - The St. Louis Federal Reserve Bulletin Y1 - 2014 A1 - Friedline, T. A1 - Johnson, P. A1 - Hughes, R. UR - http://research.stlouisfed.org/publications/review/2014/q4/friedline.pdf ER - TY - THES T1 - Towards an Understanding of Dynamics Between Race, Population Movement, and the Built Environment of American Cities (undergraduate honors thesis) Y1 - 2014 A1 - Bellman, B. PB - University of Colorado at Boulder ER - TY - RPRT T1 - Twitter, Big Data, and Jobs Numbers Y1 - 2014 A1 - Hudomiet, Peter JF - LSA Today UR - http://www.lsa.umich.edu/lsa/ci.twitterbigdataandjobsnumbers_ci.detail ER - TY - RPRT T1 - Uncertain Uncertainty: Spatial Variation in the Quality of American Community Survey Estimates Y1 - 2014 A1 - Folch, David C. A1 - Arribas-Bel, Daniel A1 - Koschinsky, Julia A1 - Spielman, Seth E. AB - Uncertain Uncertainty: Spatial Variation in the Quality of American Community Survey Estimates Folch, David C.; Arribas-Bel, Daniel; Koschinsky, Julia; Spielman, Seth E. The U.S. 
Census Bureau's American Community Survey (ACS) is the foundation of social science research, much federal resource allocation, and the development of public policy and private sector decisions. However, the high uncertainty associated with some of the ACS's most frequently used estimates can jeopardize the accuracy of inferences based on these data. While there is a high-level understanding in the research community that problems exist in the data, the sources and implications of these problems have been largely overlooked. Using 2006-2010 ACS median household income at the census tract scale as the test case (where a third of small-area estimates have higher-than-recommended errors), we explore the patterns in the uncertainty of ACS data. We consider various potential sources of uncertainty in the data, ranging from response level to geographic location to characteristics of the place. We find that there exist systematic patterns in the uncertainty in both the spatial and attribute dimensions. Using a regression framework, we identify the factors that are most frequently correlated with the error at national, regional and metropolitan area scales, and find these correlates are not consistent across the various locations tested. The implication is that data quality varies in different places, making cross-sectional analysis both within and across regions less reliable. We also present general advice for data users and potential solutions to the challenges identified. PB - University of Colorado at Boulder / University of Tennessee UR - http://hdl.handle.net/1813/38122 ER - TY - CHAP T1 - The Untold Story of Multi-Mode (Online and Mail) Consumer Panels: From Optimal Recruitment to Retention and Attrition T2 - Online Panel Surveys: An Interdisciplinary Approach Y1 - 2014 A1 - McCutcheon, Allan L. A1 - Rao, K. A1 - Kaminska, O. ED - Callegaro, M. ED - Baker, R. ED - Bethlehem, J. ED - Göritz, A. ED - Krosnick, J. ED - Lavrakas, P.
JF - Online Panel Surveys: An Interdisciplinary Approach PB - Wiley ER - TY - JOUR T1 - An updated method for calculating income and payroll taxes from PSID data using the NBER’s TAXSIM, for PSID survey years 1999 through 2011 JF - Unpublished manuscript, University of Michigan. Accessed May Y1 - 2014 A1 - Kimberlin, Sara A1 - Kim, Jiyoon A1 - Shaefer, Luke AB - This paper describes a method to calculate income and payroll taxes from Panel Study of Income Dynamics data using the NBER’s Internet TAXSIM version 9 (http://users.nber.org/~taxsim/taxsim9/), for PSID survey years 1999, 2001, 2003, 2005, 2007, 2009, and 2011 (tax years n-1). These methods are implemented in two Stata programs, designed to be used with the PSID public-use zipped Main Interview data files: PSID_TAXSIM_1of2.do and PSID_TAXSIM_2of2.do. The main program (2of2) was written by Sara Kimberlin (skimberlin@berkeley.edu) and generates all TAXSIM input variables, runs TAXSIM, adjusts tax estimates using additional information available in PSID data, and calculates total PSID family unit taxes. A separate program (1of2) was written by Jiyoon (June) Kim (junekim@umich.edu) in collaboration with Luke Shaefer (lshaefer@umich.edu) to calculate mortgage interest for itemized deductions; this program needs to be run first, before the main program. Jonathan Latner contributed code to use the programs with the PSID zipped data. The overall methods build on the strategy for using TAXSIM with PSID data outlined by Butrica & Burkhauser (1997), with some expansions and modifications. Note that the methods described below are designed to prioritize accuracy of income taxes calculated for low-income households, particularly refundable tax credits such as the Earned Income Tax Credit (EITC) and the Additional Child Tax Credit. Income tax liability is generally low for low-income households, and the amount of refundable tax credits is often substantially larger than tax liabilities for this population.
Payroll tax can also be substantial for low-income households. Thus the methods below focus on maximizing accuracy of income tax and payroll tax calculations for low-income families, with less attention to tax items that largely impact higher-income households (e.g. the treatment of capital gains). VL - 6 ER - TY - CONF T1 - The use of paradata (in time use surveys) to better evaluate data quality T2 - American Association for Public Opinion Research 2014 Annual Conference Y1 - 2014 A1 - Cordova-Cazar, A.L. A1 - Belli, R.F. JF - American Association for Public Opinion Research 2014 Annual Conference CY - Anaheim, CA UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - RPRT T1 - Using partially synthetic data to replace suppression in the Business Dynamics Statistics: early results Y1 - 2014 A1 - Miranda, Javier A1 - Vilhuber, Lars AB - The Business Dynamics Statistics is a product of the U.S. Census Bureau that provides measures of business openings and closings, and job creation and destruction, by a variety of cross-classifications (firm and establishment age and size, industrial sector, and geography). Sensitive data are currently protected through suppression. However, as additional tabulations are being developed, at ever more detailed geographic levels, the number of suppressions increases dramatically. This paper explores the option of providing public-use data that are analytically valid and without suppressions, by leveraging synthetic data to replace observations in sensitive cells. PB - Cornell University UR - http://hdl.handle.net/1813/40852 ER - TY - JOUR T1 - Using Partially Synthetic Data to Replace Suppression in the Business Dynamics Statistics: Early Results JF - Privacy in Statistical Databases Y1 - 2014 A1 - J. Miranda A1 - L.
Vilhuber AB - The Business Dynamics Statistics is a product of the U.S. Census Bureau that provides measures of business openings and closings, and job creation and destruction, by a variety of cross-classifications (firm and establishment age and size, industrial sector, and geography). Sensitive data are currently protected through suppression. However, as additional tabulations are being developed, at ever more detailed geographic levels, the number of suppressions increases dramatically. This paper explores the option of providing public-use data that are analytically valid and without suppressions, by leveraging synthetic data to replace observations in sensitive cells. SN - 978-3-319-11256-5 UR - http://dx.doi.org/10.1007/978-3-319-11257-2_18 ER - TY - RPRT T1 - Using Social Media to Measure Labor Market Flows Y1 - 2014 A1 - Antenucci, Dolan A1 - Cafarella, Michael J A1 - Levenstein, Margaret C. A1 - Ré, Christopher A1 - Shapiro, Matthew UR - http://www-personal.umich.edu/~shapiro/papers/LaborFlowsSocialMedia.pdf ER - TY - CONF T1 - Web Surveys, Online Panels, and Paradata: Automating Adaptive Design T2 - NSF-Census Research Network (NCRN) Spring Meeting Y1 - 2014 A1 - McCutcheon, A.L. JF - NSF-Census Research Network (NCRN) Spring Meeting CY - Washington, DC UR - http://www.ncrn.info/event/ncrn-meeting-spring-2014 N1 - Conference on Methodological Innovations in the Study of Elections in Europe and Beyond. Presented at Texas A&M University ER - TY - JOUR T1 - What are You Doing Now? Activity Level Responses and Errors in the American Time Use Survey JF - Journal of Survey Statistics and Methodology Y1 - 2014 A1 - T. Al Baghal A1 - Belli, R.F. A1 - Phillips, A.L. A1 - Ruther, N. VL - 2 IS - 4 ER - TY - JOUR T1 - Why data availability is such a hard problem JF - Statistical Journal of the International Association for Official Statistics Y1 - 2014 A1 - A. F. 
Karr KW - Data Archive KW - Data availability KW - public good KW - replicability KW - reproducibility AB - If data availability were a simple problem, it would already have been resolved. In this paper, I argue that by viewing data availability as a public good, it is possible to both understand the complexities with which it is fraught and identify a path to a solution. VL - 30 IS - 2 ER - TY - CONF T1 - Would a Privacy Fundamentalist Sell their DNA for $1000... if Nothing Bad Happened Thereafter? A Study of the Westin Categories, Behavioral Intentions, and Consequences T2 - Proceedings of the Tenth Symposium on Usable Privacy and Security (SOUPS) Y1 - 2014 A1 - Woodruff, A. A1 - Pihur, V. A1 - Acquisti, A. A1 - Consolvo, S. A1 - Schmidt, L. A1 - Brandimarte, L. JF - Proceedings of the Tenth Symposium on Usable Privacy and Security (SOUPS) PB - ACM CY - New York, NY UR - https://www.usenix.org/conference/soups2014/proceedings/presentation/woodruff N1 - IAPP SOUPS Privacy Award Winner ER - TY - JOUR T1 - Are independent parameter draws necessary for multiple imputation? JF - The American Statistician Y1 - 2013 A1 - Hu, J. A1 - Mitra, R. A1 - Reiter, J.P. VL - 67 UR - http://www.tandfonline.com/doi/full/10.1080/00031305.2013.821953 ER - TY - ABST T1 - A Bayesian Approach to Estimating Agricultural Yield Based on Multiple Repeated Surveys, Institute of Public Policy and the Truman School of Public Affairs Y1 - 2013 A1 - Holan, S.H. ER - TY - RPRT T1 - A Bayesian Approach to Graphical Record Linkage and De-duplication Y1 - 2013 A1 - Steorts, Rebecca C. A1 - Hall, Rob A1 - Fienberg, Stephen E. AB - We propose an unsupervised approach for linking records across arbitrarily many files, while simultaneously detecting duplicate records within files. Our key innovation involves the representation of the pattern of links between records as a bipartite graph, in which records are directly linked to latent true individuals, and only indirectly linked to other records.
This flexible representation of the linkage structure naturally allows us to estimate the attributes of the unique observable people in the population, calculate transitive linkage probabilities across records (and represent this visually), and propagate the uncertainty of record linkage into later analyses. Our method makes it particularly easy to integrate record linkage with post-processing procedures such as logistic regression, capture–recapture, etc. Our linkage structure lends itself to an efficient, linear-time, hybrid Markov chain Monte Carlo algorithm, which overcomes many obstacles encountered by previous record linkage approaches, despite the high-dimensional parameter space. We illustrate our method using longitudinal data from the National Long Term Care Survey and with data from the Italian Survey on Household Income and Wealth, where we assess the accuracy of our method and show it to be better in terms of error rates and empirical scalability than other approaches in the literature. Supplementary materials for this article are available online. JF - arXiv UR - https://arxiv.org/abs/1312.4645 ER - TY - ABST T1 - Bayesian inference for the Spatial Random Effects Model Y1 - 2013 A1 - Cressie, N. JF - Department of Statistics, Macquarie University PB - Macquarie University ER - TY - CONF T1 - Bayesian learning of joint distributions of objects T2 - Proceedings of the 16th International Conference on Artificial Intelligence and Statistics (AISTATS) 2013 Y1 - 2013 A1 - Banerjee, A. A1 - Murray, J. A1 - Dunson, D. B. AB -

There is increasing interest in broad application areas in defining flexible joint models for data having a variety of measurement scales, while also allowing data of complex types, such as functions, images and documents. We consider a general framework for nonparametric Bayes joint modeling through mixture models that incorporate dependence across data types through a joint mixing measure. The mixing measure is assigned a novel infinite tensor factorization (ITF) prior that allows flexible dependence in cluster allocation across data types. The ITF prior is formulated as a tensor product of stick-breaking processes. Focusing on a convenient special case corresponding to a Parafac factorization, we provide basic theory justifying the flexibility of the proposed prior and resulting asymptotic properties. Focusing on ITF mixtures of product kernels, we develop a new Gibbs sampling algorithm for routine implementation relying on slice sampling. The methods are compared with alternative joint mixture models based on Dirichlet processes and related approaches through simulations and real data applications.

N1 - NCRN ER - TY - CONF T1 - Ecological Prediction with Nonlinear Multivariate Time-Frequency Functional Data Models T2 - Joint Statistical Meetings 2013 Y1 - 2013 A1 - Wikle, C.K. JF - Joint Statistical Meetings 2013 CY - Montreal, Canada ER - TY - JOUR T1 - Ecological Prediction With Nonlinear Multivariate Time-Frequency Functional Data Models JF - Journal of Agricultural, Biological, and Environmental Statistics Y1 - 2013 A1 - Yang, W.H. A1 - Wikle, C.K. A1 - Holan, S.H. A1 - Wildhaber, M.L. VL - 18 UR - http://link.springer.com/article/10.1007/s13253-013-0142-1 ER - TY - JOUR T1 - Empirical Analysis of Data Breach Litigation JF - Journal of Empirical Legal Studies Y1 - 2013 A1 - Romanosky, A. A1 - Hoffman, D. A1 - Acquisti, A. VL - 11 ER - TY - CONF T1 - Encoding Provenance Metadata for Social Science Datasets T2 - Metadata and Semantics Research Y1 - 2013 A1 - Lagoze, Carl A1 - Williams, Jeremy A1 - Vilhuber, Lars ED - Garoufallou, Emmanouel ED - Greenberg, Jane KW - DDI KW - eSocial Science KW - Metadata KW - Provenance JF - Metadata and Semantics Research T3 - Communications in Computer and Information Science PB - Springer International Publishing VL - 390 SN - 978-3-319-03436-2 UR - http://dx.doi.org/10.1007/978-3-319-03437-9_13 ER - TY - RPRT T1 - Encoding Provenance of Social Science Data: Integrating PROV with DDI Y1 - 2013 A1 - Lagoze, Carl A1 - Block, William C A1 - Williams, Jeremy A1 - Abowd, John A1 - Vilhuber, Lars AB - Provenance is a key component of evaluating the integrity and reusability of data for scholarship. While recording and providing access to provenance has always been important, it is even more critical in the web environment in which data from distributed sources and of varying integrity can be combined and derived.
The PROV model, developed under the auspices of the W3C, is a foundation for semantically rich, interoperable, and web-compatible provenance metadata. We report on the results of our experimentation with integrating the PROV model into the DDI metadata for a complex, but characteristic, example of social science data. We also present some preliminary thinking on how to visualize those graphs in the user interface. Submitted to EDDI13 5th Annual European DDI User Conference December 2013, Paris, France PB - Cornell University UR - http://hdl.handle.net/1813/34443 ER - TY - CONF T1 - Encoding Provenance of Social Science Data: Integrating PROV with DDI T2 - 5th Annual European DDI User Conference Y1 - 2013 A1 - Carl Lagoze A1 - William C. Block A1 - Jeremy Williams A1 - Lars Vilhuber KW - DDI KW - eSocial Science KW - Metadata KW - Provenance AB - Provenance is a key component of evaluating the integrity and reusability of data for scholarship. While recording and providing access to provenance has always been important, it is even more critical in the web environment in which data from distributed sources and of varying integrity can be combined and derived. The PROV model, developed under the auspices of the W3C, is a foundation for semantically rich, interoperable, and web-compatible provenance metadata. We report on the results of our experimentation with integrating the PROV model into the DDI metadata for a complex, but characteristic, example of social science data. We also present some preliminary thinking on how to visualize those graphs in the user interface. JF - 5th Annual European DDI User Conference ER - TY - JOUR T1 - On estimation of mean squared errors of benchmarked and empirical Bayes estimators JF - Statistica Sinica Y1 - 2013 A1 - Rebecca C.
Steorts A1 - Malay Ghosh VL - 23 ER - TY - CONF T1 - Exact Sparse Recovery with L0 Projections T2 - 19th ACM SIGKDD Conference on Knowledge Discovery and Data Mining Y1 - 2013 A1 - Ping Li A1 - Cun-Hui Zhang JF - 19th ACM SIGKDD Conference on Knowledge Discovery and Data Mining ER - TY - CONF T1 - Examining item nonresponse through paradata and respondent characteristics: A multilevel approach T2 - American Association for Public Opinion Research 2013 Annual Conference Y1 - 2013 A1 - Cordova-Cazar, A.L. JF - American Association for Public Opinion Research 2013 Annual Conference CY - Boston, MA UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - CONF T1 - Examining response time outliers through paradata in Online Panel Surveys T2 - American Association for Public Opinion Research 2013 Annual Conference Y1 - 2013 A1 - Lee, J. A1 - T. Al Baghal JF - American Association for Public Opinion Research 2013 Annual Conference CY - Boston, MA UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - CONF T1 - Examining the relationship between error and behavior in the American Time Use Survey using audit trail paradata T2 - American Association for Public Opinion Research 2013 Annual Conference Y1 - 2013 A1 - Ruther, N. A1 - T. Al Baghal A1 - A. Eck A1 - L. Stuart A1 - L. Phillips A1 - R. Belli A1 - Soh, L-K JF - American Association for Public Opinion Research 2013 Annual Conference CY - Boston, MA UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - RPRT T1 - Fast Near Neighbor Search in High-Dimensional Binary Data Y1 - 2013 A1 - Shrivastava, Anshumali A1 - Li, Ping AB - Numerous applications in search, databases, machine learning, and computer vision can benefit from efficient algorithms for near neighbor search.
This paper proposes a simple framework for fast near neighbor search in high-dimensional binary data, which are common in practice (e.g., text). We develop a very simple and effective strategy for sub-linear time near neighbor search, by creating hash tables directly using the bits generated by b-bit minwise hashing. The advantages of our method are demonstrated through thorough comparisons with two strong baselines: spectral hashing and sign (1-bit) random projections. PB - Cornell University UR - http://hdl.handle.net/1813/37987 ER - TY - CONF T1 - Flexible Semiparametric Hierarchical Spatial Models T2 - Joint Statistical Meetings 2013 Y1 - 2013 A1 - Porter, A.T. JF - Joint Statistical Meetings 2013 CY - Montreal, Canada ER - TY - JOUR T1 - From Facebook Regrets to Facebook Privacy Nudges JF - Ohio State Law Journal Y1 - 2013 A1 - Wang, Y. A1 - Leon, P. G. A1 - Chen, X. A1 - Komanduri, S. A1 - Norcie, G. A1 - Scott, K. A1 - Acquisti, A. A1 - Cranor, L. F. A1 - Sadeh, N. N1 - Invited paper ER - TY - JOUR T1 - A Generalized Fellegi-Sunter Framework for Multiple Record Linkage with Application to Homicide Record Systems JF - Journal of the American Statistical Association Y1 - 2013 A1 - Sadinle, M. A1 - Fienberg, S. E. VL - 108 UR - http://dx.doi.org/10.1080/01621459.2012.757231 ER - TY - JOUR T1 - Gone in 15 Seconds: The Limits of Privacy Transparency and Control JF - IEEE Security & Privacy Y1 - 2013 A1 - Acquisti, A. A1 - Adjerid, I. A1 - Brandimarte, L. VL - 11 ER - TY - JOUR T1 - Handling Attrition in Longitudinal Studies: The Case for Refreshment Samples JF - Statistical Science Y1 - 2013 A1 - Deng, Yiting A1 - Hillygus, D. Sunshine A1 - Reiter, Jerome P. A1 - Si, Yajuan A1 - Zheng, Siyu AB - Panel studies typically suffer from attrition, which reduces sample size and can result in biased inferences. It is impossible to know whether or not the attrition causes bias from the observed panel data alone.
Refreshment samples—new, randomly sampled respondents given the questionnaire at the same time as a subsequent wave of the panel—offer information that can be used to diagnose and adjust for bias due to attrition. We review and bolster the case for the use of refreshment samples in panel studies. We include examples of both a fully Bayesian approach for analyzing the concatenated panel and refreshment data, and a multiple imputation approach for analyzing only the original panel. For the latter, we document a positive bias in the usual multiple imputation variance estimator. We present models appropriate for three waves and two refreshment samples, including nonterminal attrition. We illustrate the three-wave analysis using the 2007–2008 Associated Press–Yahoo! News Election Poll. VL - 28 UR - http://dx.doi.org/10.1214/13-STS414 ER - TY - JOUR T1 - Hierarchical Bayesian Spatio-Temporal Conway-Maxwell Poisson Models with Dynamic Dispersion JF - Journal of Agricultural, Biological, and Environmental Statistics Y1 - 2013 A1 - Wu, G. A1 - Holan, S.H. A1 - Wikle, C.K. CY - Anchorage, Alaska VL - 18 UR - http://link.springer.com/article/10.1007/s13253-013-0141-2 ER - TY - JOUR T1 - Hierarchical Spatio-Temporal Models and Survey Research JF - Statistics Views Y1 - 2013 A1 - Wikle, C. A1 - Holan, S. A1 - Cressie, N. UR - http://www.statisticsviews.com/details/feature/4730991/Hierarchical-Spatio-Temporal-Models-and-Survey-Research.html ER - TY - JOUR T1 - Hierarchical Statistical Modeling of Big Spatial Datasets Using the Exponential Family of Distributions JF - Spatial Statistics Y1 - 2013 A1 - Sengupta, A. A1 - Cressie, N. KW - EM algorithm KW - Empirical Bayes KW - Geostatistical process KW - Maximum likelihood estimation KW - MCMC KW - SRE model VL - 4 UR - http://www.sciencedirect.com/science/article/pii/S2211675313000055 ER - TY - ABST T1 - How can survey estimates of small areas be improved by leveraging social-media data? Y1 - 2013 A1 - Cressie, N. A1 - Holan, S. 
A1 - Wikle, C. JF - The Survey Statistician UR - http://isi.cbs.nl/iass/N68.pdf ER - TY - JOUR T1 - Identifying Neighborhoods Using High Resolution Population Data JF - Annals of the Association of American Geographers Y1 - 2013 A1 - S.E. Spielman A1 - J. Logan VL - 103 ER - TY - RPRT T1 - Improving User Access to Metadata for Public and Restricted Use US Federal Statistical Files Y1 - 2013 A1 - Block, William C. A1 - Williams, Jeremy A1 - Vilhuber, Lars A1 - Lagoze, Carl A1 - Brown, Warren A1 - Abowd, John M. AB - Presentation at NADDI 2013. This record has also been archived at http://kuscholarworks.ku.edu/dspace/handle/1808/11093. PB - Cornell University UR - http://hdl.handle.net/1813/33362 ER - TY - CONF T1 - Is it the Typeset or the Type of Statistics? Disfluent Font and Self-Disclosure T2 - Proceedings of Learning from Authoritative Security Experiment Results (LASER) Y1 - 2013 A1 - Balebako, R. A1 - Pe'er, E. A1 - Brandimarte, L. A1 - Cranor, L. F. A1 - Acquisti, A. JF - Proceedings of Learning from Authoritative Security Experiment Results (LASER) PB - USENIX Association CY - New York, NY UR - https://www.usenix.org/laser2013/program/balebako ER - TY - RPRT T1 - Managing Confidentiality and Provenance across Mixed Private and Publicly-Accessed Data and Metadata Y1 - 2013 A1 - Vilhuber, Lars A1 - Abowd, John A1 - Block, William A1 - Lagoze, Carl A1 - Williams, Jeremy AB - Social science researchers are increasingly interested in making use of confidential micro-data that contains linkages to the identities of people, corporations, etc.
The value of this linking lies in the potential to join these identifiable entities with external data such as genome data, geospatial information, and the like. Leveraging these linkages is an essential aspect of “big data” scholarship. However, the utility of these confidential data for scholarship is compromised by the complex nature of their management and curation. This makes it difficult to fulfill US federal data management mandates and interferes with basic scholarly practices such as validation and reuse of existing results. We describe in this paper our work on the CED2AR prototype, a first step in providing researchers with a tool that spans the confidential/publicly-accessible divide, making it possible for researchers to identify, search, access, and cite those data. The particular points of interest in our work are the cloaking of metadata fields and the expression of provenance chains. For the former, we make use of existing fields in the DDI (Data Documentation Initiative) specification and suggest some minor changes to the specification. For the latter problem, we investigate the integration of DDI with recent work by the W3C PROV working group that has developed a generalizable and extensible model for expressing data provenance. PB - Cornell University UR - http://hdl.handle.net/1813/34534 ER - TY - JOUR T1 - Memory, communication, and data quality in calendar interviews JF - Public Opinion Quarterly Y1 - 2013 A1 - Belli, R. F. A1 - Bilgen, I. A1 - T. Al Baghal VL - 77 ER - TY - THES T1 - Mental Disorders and Inequality in the United States: Intersection of race, gender, and disability on employment and income T2 - Social Work Y1 - 2013 A1 - Camp, J. JF - Social Work PB - Wayne State University VL - Ph.D.
ER - TY - JOUR T1 - Misplaced confidences: Privacy and the control paradox JF - Social Psychological and Personality Science Y1 - 2013 A1 - Laura Brandimarte A1 - Alessandro Acquisti A1 - George Loewenstein VL - 4 ER - TY - RPRT T1 - NCRN Meeting Spring 2013 Y1 - 2013 A1 - Vilhuber, Lars AB - Held at the NISS Headquarters, Research Triangle Park, NC. PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/45870 ER - TY - RPRT T1 - NCRN Newsletter: Volume 1 - Issue 1 Y1 - 2013 A1 - Vilhuber, Lars A1 - Karr, Alan A1 - Reiter, Jerome A1 - Abowd, John A1 - Nunnelly, Jamie AB - Overview of activities at NSF-Census Research Network nodes from July 2013 to November 2013. NCRN Newsletter Vol. 1, Issue 1: November 17, 2013 PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/40232 ER - TY - JOUR T1 - Neighborhood contexts, health, and behavior: understanding the role of scale and residential sorting JF - Environment and Planning B Y1 - 2013 A1 - Spielman, S. E. A1 - Linkletter, C. A1 - Yoo, E.-H. VL - 3 ER - TY - CONF T1 - Nonlinear Dynamic Spatio-Temporal Statistical Models T2 - Southern Regional Council on Statistics Summer Research Conference Y1 - 2013 A1 - Wikle, C.K. JF - Southern Regional Council on Statistics Summer Research Conference ER - TY - JOUR T1 - Nonparametric Bayesian multiple imputation for incomplete categorical variables in large-scale assessment surveys JF - Journal of Educational and Behavioral Statistics Y1 - 2013 A1 - Si, Y. A1 - Reiter, J.P. VL - 38 UR - http://www.stat.duke.edu/~jerry/Papers/StatinMed14.pdf ER - TY - CONF T1 - Paradata for Measurement Error Evaluation T2 - American Association for Public Opinion Research 2013 Annual Conference Y1 - 2013 A1 - Olson, K.
JF - American Association for Public Opinion Research 2013 Annual Conference CY - Boston, MA UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - CONF T1 - Predicting survey breakoff in Internet survey panels T2 - American Association for Public Opinion Research 2013 Annual Conference Y1 - 2013 A1 - McCutcheon, A.L. A1 - T. Al Baghal JF - American Association for Public Opinion Research 2013 Annual Conference CY - Boston, MA UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - CONF T1 - Predicting the occurrence of respondent retrieval strategies in calendar interviewing: The quality of autobiographical recall in surveys T2 - Biennial conference of the Society for Applied Research in Memory and Cognition Y1 - 2013 A1 - Belli, R.F. A1 - Miller, L.D. A1 - Soh, L-K A1 - T. Al Baghal JF - Biennial conference of the Society for Applied Research in Memory and Cognition CY - Rotterdam, Netherlands UR - http://static1.squarespace.com/static/504170d6e4b0b97fe5a59760/t/52457a8be4b0012b7a5f462a/1380285067247/SARMAC_X_PaperJune27.pdf ER - TY - CONF T1 - Predicting the occurrence of respondent retrieval strategies in calendar interviewing: The quality of retrospective reports T2 - American Association for Public Opinion Research 2013 Annual Conference Y1 - 2013 A1 - Belli, R.F. A1 - Miller, L.D. A1 - Soh, L-K A1 - T. 
Al Baghal JF - American Association for Public Opinion Research 2013 Annual Conference CY - Boston, MA UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - RPRT T1 - Presentation: Predicting Multiple Responses with Boosting and Trees Y1 - 2013 A1 - Li, Ping A1 - Abowd, John AB - Presentation by Ping Li and John Abowd at FCSM on November 4, 2013. PB - Cornell University UR - http://hdl.handle.net/1813/40255 ER - TY - CONF T1 - The process of turning audit trails from a CATI survey into useful data: Interviewer behavior paradata in the American Time Use Survey T2 - American Association for Public Opinion Research 2013 Annual Conference Y1 - 2013 A1 - Ruther, N. A1 - Phipps, P. A1 - Belli, R.F. JF - American Association for Public Opinion Research 2013 Annual Conference CY - Boston, MA UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - ABST T1 - Recent Advances in Spatial Methods for Federal Surveys Y1 - 2013 A1 - Holan, S.H. ER - TY - RPRT T1 - Reconsidering the Consequences of Worker Displacements: Survey versus Administrative Measurements Y1 - 2013 A1 - Flaaen, Aaron A1 - Shapiro, Matthew A1 - Sorkin, Isaac AB - Displaced workers suffer persistent earnings losses. This stark finding has been established by following workers in administrative data after mass layoffs under the presumption that these are involuntary job losses owing to economic distress. Using linked survey and administrative data, this paper examines this presumption by matching worker-supplied reasons for separations with what is happening at the firm. The paper documents substantially different earnings dynamics in mass layoffs depending on the reason the worker gives for the separation.
Using a new methodology for accounting for the increase in the probability of separation among all types of survey response during a mass layoff, the paper finds earnings loss estimates that are surprisingly close to those using only administrative data. Finally, the survey-administrative link allows the decomposition of earnings losses due to subsequent nonemployment into non-participation and unemployment. Including the zero earnings of those identified as being unemployed substantially increases the estimate of earnings losses. PB - University of Michigan UR - http://www-personal.umich.edu/~shapiro/papers/ReconsideringDisplacements.pdf ER - TY - ABST T1 - A Reduced Rank Model for Analyzing Multivariate Spatial Datasets Y1 - 2013 A1 - Bradley, J.R. JF - University of Missouri-Kansas City PB - University of Missouri-Kansas City ER - TY - JOUR T1 - Ringtail: a generalized nowcasting system JF - WebDB Y1 - 2013 A1 - Antenucci, Dolan A1 - Li, Erdong A1 - Liu, Shaobo A1 - Zhang, Bochun A1 - Cafarella, Michael J A1 - Ré, Christopher AB - Social media nowcasting—using online user activity to describe real-world phenomena—is an active area of research to supplement more traditional and costly data collection methods such as phone surveys. Given the potential impact of such research, we would expect general-purpose nowcasting systems to quickly become a standard tool among non-computer scientists, yet it has largely remained a research topic. We believe a major obstacle to widespread adoption is the nowcasting feature selection problem. Typical nowcasting systems require the user to choose a handful of social media objects from a pool of billions of potential candidates, which can be a time-consuming and error-prone process. We have built Ringtail, a nowcasting system that helps the user by automatically suggesting high-quality signals. We demonstrate that Ringtail can make nowcasting easier by suggesting relevant features for a range of topics.
The user provides just a short topic query (e.g., unemployment) and a small conventional dataset in order for Ringtail to quickly return a usable predictive nowcasting model. VL - 6 UR - http://cs.stanford.edu/people/chrismre/papers/Ringtail-VLDB-demo.pdf ER - TY - JOUR T1 - Ringtail: Feature Selection for Easier Nowcasting. JF - WebDB Y1 - 2013 A1 - Antenucci, Dolan A1 - Cafarella, Michael J A1 - Levenstein, Margaret C. A1 - Ré, Christopher A1 - Shapiro, Matthew AB - In recent years, social media “nowcasting”—the use of on- line user activity to predict various ongoing real-world social phenomena—has become a popular research topic; yet, this popularity has not led to widespread actual practice. We be- lieve a major obstacle to widespread adoption is the feature selection problem. Typical nowcasting systems require the user to choose a set of relevant social media objects, which is difficult, time-consuming, and can imply a statistical back- ground that users may not have. We propose Ringtail, which helps the user choose rele- vant social media signals. It takes a single user input string (e.g., unemployment) and yields a number of relevant signals the user can use to build a nowcasting model. We evaluate Ringtail on six different topics using a corpus of almost 6 billion tweets, showing that features chosen by Ringtail in a wholly-automated way are better or as good as those from a human and substantially better if Ringtail receives some human assistance. In all cases, Ringtail reduces the burden on the user. UR - http://www.cs.stanford.edu/people/chrismre/papers/webdb_ringtail.pdf ER - TY - JOUR T1 - Rising extreme poverty in the United States and the response of means-tested transfers. JF - Social Service Review Y1 - 2013 A1 - H. Luke Shaefer A1 - Edin, K. AB - This study documents an increase in the prevalence of extreme poverty among US households with children between 1996 and 2011 and assesses the response of major federal means-tested transfer programs. 
Extreme poverty is defined using a World Bank metric of global poverty: \$2 or less, per person, per day. Using the 1996–2008 panels of the Survey of Income and Program Participation (SIPP), we estimate that in mid-2011, 1.65 million households with 3.55 million children were living in extreme poverty in a given month, based on cash income, constituting 4.3 percent of all nonelderly households with children. The prevalence of extreme poverty has risen sharply since 1996, particularly among those most affected by the 1996 welfare reform. Adding SNAP benefits to household income reduces the number of extremely poor households with children by 48.0 percent in mid-2011. Adding SNAP, refundable tax credits, and housing subsidies reduces it by 62.8 percent. VL - 87 UR - http://www.jstor.org/stable/10.1086/671012 IS - 2 ER - TY - CONF T1 - Sleights of Privacy: Framing, Disclosures, and the Limits of Transparency T2 - Proceedings of the Ninth Symposium on Usable Privacy and Security (SOUPS) Y1 - 2013 A1 - Adjerid, I. A1 - Acquisti, A. A1 - Loewenstein, G. JF - Proceedings of the Ninth Symposium on Usable Privacy and Security (SOUPS) PB - ACM CY - New York, NY ER - TY - ABST T1 - Some Historical Remarks on Spatial Statistics, Spatio-Temporal Statistics Y1 - 2013 A1 - Cressie, N. JF - Reading Group, University of Missouri ER - TY - THES T1 - Some Recent Advances in Non- and Semiparametric Bayesian Modeling with Copulas, Mixtures, and Latent Variables (Ph.D. Thesis) T2 - Department of Statistical Science Y1 - 2013 A1 - Jared S. Murray AB - This thesis develops flexible non- and semiparametric Bayesian models for mixed continuous, ordered and unordered categorical data. These methods have a range of possible applications; the applications considered in this thesis are drawn primarily from the social sciences, where multivariate, heterogeneous datasets with complex dependence and missing observations are the norm. 
The first contribution is an extension of the Gaussian factor model to Gaussian copula factor models, which accommodate continuous and ordinal data with unspecified marginal distributions. I describe how this model is the most natural extension of the Gaussian factor model, preserving its essential dependence structure and the interpretability of factor loadings and the latent variables. I adopt an approximate likelihood for posterior inference and prove that, if the Gaussian copula model is true, the approximate posterior distribution of the copula correlation matrix asymptotically converges to the correct parameter under nearly any marginal distributions. I demonstrate with simulations that this method is both robust and efficient, and illustrate its use in an application from political science. The second contribution is a novel nonparametric hierarchical mixture model for continuous, ordered and unordered categorical data. The model includes a hierarchical prior used to couple component indices of two separate models, which are also linked by local multivariate regressions. This structure effectively overcomes the limitations of existing mixture models for mixed data, namely the overly strong local independence assumptions. In the proposed model local independence is replaced by local conditional independence, so that the induced model is able to more readily adapt to structure in the data. I demonstrate the utility of this model as a default engine for multiple imputation of mixed data in a large repeated-sampling study using data from the Survey of Income and Participation. I show that it improves substantially on its most popular competitor, multiple imputation by chained equations (MICE), while enjoying certain theoretical properties that MICE lacks. The third contribution is a latent variable model for density regression. 
Most existing density regression models are quite flexible but somewhat cumbersome to specify and fit, particularly when the regressors are a combination of continuous and categorical variables. The majority of these methods rely on extensions of infinite discrete mixture models to incorporate covariate dependence in mixture weights, atoms or both. I take a fundamentally different approach, introducing a continuous latent variable which depends on covariates through a parametric regression. In turn, the observed response depends on the latent variable through an unknown function. I demonstrate that a spline prior for the unknown function is quite effective relative to Dirichlet Process mixture models in density estimation settings (i.e., without covariates) even though these Dirichlet process mixtures have better theoretical properties asymptotically. The spline formulation enjoys a number of computational advantages over more flexible priors on functions. Finally, I demonstrate the utility of this model in regression applications using a dataset on U.S. wages from the Census Bureau, where I estimate the return to schooling as a smooth function of the quantile index. JF - Department of Statistical Science PB - Duke University UR - http://dukespace.lib.duke.edu/dspace/handle/10161/8253 ER - TY - ABST T1 - Spatial Fay-Herriot Models for Small Area Estimation with Functional Covariates Y1 - 2013 A1 - Porter, A.T. ER - TY - CHAP T1 - Spatio-temporal Design: Advances in Efficient Data Acquisition T2 - Spatio-temporal Design: Advances in Efficient Data Acquisition Y1 - 2013 A1 - Holan, S. A1 - Wikle, C. ED - Jorge Mateu ED - Werner Muller KW - semiparametric dynamic design for non-Gaussian spatio-temporal data JF - Spatio-temporal Design: Advances in Efficient Data Acquisition PB - Wiley SN - 9780470974292 ER - TY - ABST T1 - Statistics and the Environment: Overview and Challenges Y1 - 2013 A1 - Wikle, C.K. 
N1 - Invited Introductory Overview Lecture ER - TY - ABST T1 - Statistics for Spatio-Temporal Data Y1 - 2013 A1 - Cressie, N. JF - Invited One-Day Short Course at the U.S. Census Bureau ER - TY - CONF T1 - Troubles with time-use: Examining potential indicators of error in the American Time Use Survey T2 - American Association for Public Opinion Research 2013 Annual Conference Y1 - 2013 A1 - Phillips, A.L. A1 - T. Al Baghal A1 - Belli, R.F. JF - American Association for Public Opinion Research 2013 Annual Conference CY - Boston, MA UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - JOUR T1 - Two-stage Bayesian benchmarking as applied to small area estimation JF - TEST Y1 - 2013 A1 - Rebecca C. Steorts A1 - Malay Ghosh KW - small area estimation VL - 22 IS - 4 ER - TY - THES T1 - User Modeling via Machine Learning and Rule-based Reasoning to Understand and Predict Errors in Survey Systems Y1 - 2013 A1 - Stuart, Leonard Cleve PB - University of Nebraska-Lincoln UR - http://digitalcommons.unl.edu/computerscidiss/70/ ER - TY - JOUR T1 - Using High Resolution Population Data to Identify Neighborhoods and Determine their Boundaries JF - Annals of the Association of American Geographers Y1 - 2013 A1 - Spielman, S. E. A1 - Logan, J. VL - 103 UR - http://www.tandfonline.com/doi/abs/10.1080/00045608.2012.685049 ER - TY - THES T1 - Using Satellite Imagery to Evaluate and Analyze Socioeconomic Changes Observed with Census Data Y1 - 2013 A1 - Wilson, C. R. N1 - NCRN ER - TY - CONF T1 - What are you doing now?: Audit trails, Activity level responses and error in the American Time Use Survey T2 - American Association for Public Opinion Research Y1 - 2013 A1 - T. Al Baghal A1 - Phillips, A.L. A1 - Ruther, N. A1 - Belli, R.F. A1 - Stuart, L. A1 - Eck, A. 
A1 - Soh, L-K JF - American Association for Public Opinion Research CY - Boston, MA UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - JOUR T1 - What is Privacy Worth? JF - Journal of Legal Studies Y1 - 2013 A1 - Acquisti, A. A1 - John, L. A1 - Loewenstein, G. VL - 42 N1 - Leading paper, 2010 Future of Privacy Forum's "Best Privacy Papers for Policy Makers" Competition ER - TY - JOUR T1 - Achieving both valid and secure logistic regression analysis on aggregated data from different private sources JF - Journal of Privacy and Confidentiality Y1 - 2012 A1 - Yuval Nardi A1 - Robert Hall A1 - Stephen E. Fienberg VL - 4 ER - TY - JOUR T1 - An Approach for Identifying and Predicting Economic Recessions in Real-Time Using Time-Frequency Functional Models JF - Applied Stochastic Models in Business and Industry Y1 - 2012 A1 - Holan, S. A1 - Yang, W. A1 - Matteson, D. A1 - Wikle, C.K. KW - Bayesian model averaging KW - business cycles KW - empirical orthogonal functions KW - functional data KW - MIDAS KW - spectrogram KW - stochastic search variable selection VL - 28 UR - http://onlinelibrary.wiley.com/doi/10.1002/asmb.1954/full N1 - DOI: 10.1002/asmb.1954 ER - TY - ABST T1 - Asymptotic Theory of Cepstral Random Fields Y1 - 2012 A1 - McElroy, T. A1 - Holan, S. PB - University of Missouri N1 - arXiv preprint arXiv:1112.1977 ER - TY - RPRT T1 - Asymptotic Theory of Cepstral Random Fields Y1 - 2012 A1 - McElroy, T.S. A1 - Holan, S.H. AB - Random fields play a central role in the analysis of spatially correlated data and, as a result, have a significant impact on a broad array of scientific applications. Given the importance of this topic, there has been a substantial amount of research devoted to this area. However, the cepstral random field model remains largely underdeveloped outside the engineering literature.
We provide a comprehensive treatment of the asymptotic theory for two-dimensional random field models. In particular, we provide recursive formulas that connect the spatial cepstral coefficients to an equivalent moving-average random field, which facilitates easy computation of the necessary autocovariance matrix. Additionally, we establish asymptotic consistency results for Bayesian, maximum likelihood, and quasi-maximum likelihood estimation of random field parameters and regression parameters. Further, in both the maximum and quasi-maximum likelihood frameworks, we derive the asymptotic distribution of our estimator. The theoretical results are presented generally and are of independent interest, pertaining to a wide class of random field models. The results for the cepstral model facilitate model-building: because the cepstral coefficients are unconstrained in practice, numerical optimization is greatly simplified, and we are always guaranteed a positive definite covariance matrix. We show that inference for individual coefficients is possible, and one can refine models in a disciplined manner. Finally, our results are illustrated through simulation and the analysis of straw yield data in an agricultural field experiment. http://arxiv.org/pdf/1112.1977.pdf PB - University of Missouri UR - http://hdl.handle.net/1813/34461 ER - TY - JOUR T1 - Bayesian Multi-Regime Smooth Transition Regression with Ordered Categorical Variables JF - Computational Statistics and Data Analysis Y1 - 2012 A1 - Wang, J. A1 - Holan, S. VL - 56 UR - http://dx.doi.org/10.1016/j.csda.2012.04.018 ER - TY - ABST T1 - Bayesian Multiscale Multiple Imputation With Implications to Data Confidentiality Y1 - 2012 A1 - Holan, S.H.
N1 - Texas A&M University, January 2012; Duke University (Hosted by Duke Node), February 2012; Rice University, March 2012; Clemson University, April 2012 ER - TY - CONF T1 - Bayesian Parametric and Nonparametric Inference for Multiple Record Linkage T2 - Modern Nonparametric Methods in Machine Learning Workshop Y1 - 2012 A1 - Hall, R. A1 - Steorts, R. A1 - Fienberg, S. E. JF - Modern Nonparametric Methods in Machine Learning Workshop PB - NIPS UR - http://www.stat.cmu.edu/NCRN/PUBLIC/files/beka_nips_finalsub4.pdf ER - TY - CONF T1 - Calendar interviewing in life course research: Associations between verbal behaviors and data quality T2 - Eighth International Conference on Social Science Methodology Y1 - 2012 A1 - Belli, R.F. A1 - Bilgen, I. A1 - T. Al Baghal JF - Eighth International Conference on Social Science Methodology CY - Sydney, Australia UR - https://conference.acspri.org.au/index.php/rc33/2012/paper/view/366 ER - TY - CONF T1 - Change of Support in Spatio-Temporal Dynamical Models T2 - Joint Statistical Meetings Y1 - 2012 A1 - Wikle, C.K. JF - Joint Statistical Meetings CY - Montreal, Canada ER - TY - ABST T1 - Confidentiality and Privacy Protection in a Non-US Census Context Y1 - 2012 A1 - Anne-Sophie Charest PB - Carnegie Mellon University ER - TY - CONF T1 - Counting the people T2 - Nathan and Beatrice Keyfitz Lecture in Mathematics and the Social Sciences Y1 - 2012 A1 - Stephen E. Fienberg JF - Nathan and Beatrice Keyfitz Lecture in Mathematics and the Social Sciences PB - Fields Institute CY - Toronto, Canada ER - TY - THES T1 - Creation and Analysis of Differentially-Private Synthetic Datasets Y1 - 2012 A1 - Anne-Sophie Charest PB - Carnegie Mellon University N1 - PhD Thesis, Department of Statistics ER - TY - RPRT T1 - Data Management of Confidential Data Y1 - 2012 A1 - Lagoze, Carl A1 - Block, William C. A1 - Williams, Jeremy A1 - Abowd, John M.
A1 - Vilhuber, Lars AB - Social science researchers increasingly make use of data that is confidential because it contains linkages to the identities of people, corporations, etc. The value of this data lies in the ability to join the identifiable entities with external data such as genome data, geospatial information, and the like. However, the confidentiality of this data is a barrier to its utility and curation, making it difficult to fulfill US federal data management mandates and interfering with basic scholarly practices such as validation and reuse of existing results. We describe the complexity of the relationships among data that span a public and private divide. We then describe our work on the CED2AR prototype, a first step in providing researchers with a tool that spans this divide and makes it possible for them to search, access, and cite that data. PB - Cornell University UR - http://hdl.handle.net/1813/30924 ER - TY - JOUR T1 - Differential Privacy for Protecting Multi-dimensional Contingency Table Data: Extensions and Applications JF - Journal of Privacy and Confidentiality Y1 - 2012 A1 - Xiaolin Yang A1 - Stephen E. Fienberg A1 - Alessandro Rinaldo VL - 4 ER - TY - CONF T1 - Differential Privacy for Synthetic Datasets T2 - Proceedings of the Survey Research Section of the SSC Y1 - 2012 A1 - Anne-Sophie Charest JF - Proceedings of the Survey Research Section of the SSC CY - Guelph, Ontario N1 - Invited session on Confidentiality of the Annual Meeting of the Statistical Society of Canada ER - TY - CONF T1 - Disambiguating USPTO Inventors with Classification Models Trained on Comparisons of Labeled Inventor Records T2 - Conference Presentation Classification Society Annual Meeting, Carnegie Mellon University Y1 - 2012 A1 - Samuel Ventura A1 - Rebecca Nugent A1 - Erich R.H.
Fuchs JF - Conference Presentation Classification Society Annual Meeting, Carnegie Mellon University ER - TY - RPRT T1 - An Early Prototype of the Comprehensive Extensible Data Documentation and Access Repository (CED2AR) Y1 - 2012 A1 - Block, William C. A1 - Williams, Jeremy A1 - Abowd, John M. A1 - Vilhuber, Lars A1 - Lagoze, Carl AB - This presentation will demonstrate the latest DDI-related technological developments of Cornell University’s $3 million NSF-Census Research Network (NCRN) award, dedicated to improving the documentation, discoverability, and accessibility of public and restricted data from the federal statistical system in the United States. The current internal name for our DDI-based system is the Comprehensive Extensible Data Documentation and Access Repository (CED²AR). CED²AR ingests metadata from heterogeneous sources and supports filtered synchronization between restricted and public metadata holdings. Currently supported CED²AR “connector workflows” include mechanisms to ingest IPUMS, zero-observation files from the American Community Survey (DDI 2.1), and SIPP Synthetic Beta (DDI 1.2). These disparate metadata sources are all transformed into a DDI 2.5 compliant form and stored in a single repository. In addition, we will demonstrate an extension to DDI 2.5 that allows for the labeling of elements within the schema to indicate confidentiality. This metadata can then be filtered, allowing the creation of derived public use metadata from an original confidential source. This repository is currently searchable online through a prototype application demonstrating the ability to search across previously heterogeneous metadata sources.
Presentation at the 4th Annual European DDI User Conference (EDDI12), Norwegian Social Science Data Services, Bergen, Norway, 3 December 2012 PB - Cornell University UR - http://hdl.handle.net/1813/30922 ER - TY - CONF T1 - The Economics of Privacy T2 - The Oxford Handbook of the Digital Economy Y1 - 2012 A1 - Laura Brandimarte A1 - Alessandro Acquisti ED - Martin Peitz ED - Joel Waldfogel JF - The Oxford Handbook of the Digital Economy PB - Oxford University Press SN - 9780195397840 ER - TY - ABST T1 - Efficient Time-Frequency Representations in High-Dimensional Spatial and Spatio-Temporal Models Y1 - 2012 A1 - Wikle, C.K. ER - TY - CONF T1 - Empirical Evaluation of Statistical Inference from Differentially-Private Contingency Tables T2 - Privacy in Statistical Databases Y1 - 2012 A1 - Anne-Sophie Charest ED - Josep Domingo-Ferrer ED - Ilenia Tinnirello JF - Privacy in Statistical Databases PB - Springer VL - 7556 SN - 978-3-642-33627-0 N1 - Print ISBN is 978-3-642-33626-3 ER - TY - RPRT T1 - Encoding Provenance Metadata for Social Science Datasets Y1 - 2012 A1 - Lagoze, Carl A1 - Williams, Jeremy A1 - Vilhuber, Lars AB - Recording provenance is a key requirement for data-centric scholarship, allowing researchers to evaluate the integrity of source data sets and reproduce, and thereby validate, results. Provenance has become even more critical in the web environment in which data from distributed sources and of varying integrity can be combined and derived. Recent work by the W3C on the PROV model provides the foundation for semantically-rich, interoperable, and web-compatible provenance metadata.
We apply that model to complex, but characteristic, provenance examples of social science data, describe scenarios that make scholarly use of those provenance descriptions, and propose a manner for encoding this provenance metadata within the widely-used DDI metadata standard. Submitted to Metadata and Semantics Research (MTSR 2013) conference. PB - Cornell University UR - http://hdl.handle.net/1813/55327 ER - TY - CHAP T1 - Entropy Estimations Using Correlated Symmetric Stable Random Projections T2 - Advances in Neural Information Processing Systems 25 Y1 - 2012 A1 - Ping Li A1 - Cun-Hui Zhang ED - P. Bartlett ED - F.C.N. Pereira ED - C.J.C. Burges ED - L. Bottou ED - K.Q. Weinberger JF - Advances in Neural Information Processing Systems 25 UR - http://books.nips.cc/papers/files/nips25/NIPS2012_1456.pdf ER - TY - JOUR T1 - Estimating identification disclosure risk using mixed membership models JF - Journal of the American Statistical Association Y1 - 2012 A1 - Manrique-Vallier, D. A1 - Reiter, J.P. VL - 107 ER - TY - CONF T1 - On Estimation of Mean Squared Errors of Benchmarked and Empirical Bayes Estimators T2 - 2012 Joint Statistical Meetings Y1 - 2012 A1 - Rebecca C. Steorts A1 - Malay Ghosh JF - 2012 Joint Statistical Meetings CY - San Diego, CA ER - TY - CONF T1 - Exploring interviewer and respondent interactions: An innovative behavior coding approach T2 - Midwest Association for Public Opinion Research 2012 Annual Conference Y1 - 2012 A1 - Walton, L. A1 - Stange, M. A1 - Powell, R. A1 - Belli, R.F. JF - Midwest Association for Public Opinion Research 2012 Annual Conference CY - Chicago, IL UR - http://www.mapor.org/conferences.html ER - TY - ABST T1 - Extreme Poverty in the United States, 1996 to 2011 Y1 - 2012 A1 - Shaefer, H. 
Luke A1 - Edin, Kathryn PB - University of Michigan UR - http://www.npc.umich.edu/publications/policy_briefs/brief28/policybrief28.pdf N1 - NCRN ER - TY - CONF T1 - Fast Multi-task Learning for Query Spelling Correction T2 - The 21st ACM International Conference on Information and Knowledge Management (CIKM 2012) Y1 - 2012 A1 - Xu Sun A1 - Anshumali Shrivastava A1 - Ping Li JF - The 21st ACM International Conference on Information and Knowledge Management (CIKM 2012) UR - http://dx.doi.org/10.1145/2396761.2396800 ER - TY - CONF T1 - Fast Near Neighbor Search in High-Dimensional Binary Data T2 - The European Conference on Machine Learning (ECML 2012) Y1 - 2012 A1 - Anshumali Shrivastava A1 - Ping Li JF - The European Conference on Machine Learning (ECML 2012) ER - TY - CONF T1 - Flexible Spectral Models for Multivariate Time Series T2 - Joint Statistical Meetings 2012 Y1 - 2012 A1 - Holan, S.H. JF - Joint Statistical Meetings 2012 ER - TY - RPRT T1 - A Generalized Fellegi-Sunter Framework for Multiple Record Linkage with Application to Homicide Records Systems Y1 - 2012 A1 - Mauricio Sadinle A1 - Stephen E. Fienberg JF - arXiv UR - https://arxiv.org/abs/1205.3217 ER - TY - CONF T1 - GPU-based minwise hashing T2 - Proceedings of the 21st World Wide Web Conference (WWW 2012) (Companion Volume) Y1 - 2012 A1 - Ping Li A1 - Anshumali Shrivastava A1 - Arnd Christian König JF - Proceedings of the 21st World Wide Web Conference (WWW 2012) (Companion Volume) UR - http://doi.acm.org/10.1145/2187980.2188129 ER - TY - CONF T1 - Hierarchical General Quadratic Nonlinear Models for Spatio-Temporal Dynamics T2 - Red Raider Conference Y1 - 2012 A1 - Wikle, C.K. JF - Red Raider Conference PB - Texas Tech University CY - Lubbock, TX ER - TY - ABST T1 - Hierarchical Statistical Modeling of Big Spatial Datasets Using the Exponential Family of Distributions Y1 - 2012 A1 - Sengupta, A. A1 - Cressie, N.
PB - The Ohio State University ER - TY - ABST T1 - Inference for Count Data using the Spatial Random Effects Model Y1 - 2012 A1 - Cressie, N. ER - TY - JOUR T1 - Inferentially valid partially synthetic data: Generating from posterior predictive distributions not necessary JF - Journal of Official Statistics Y1 - 2012 A1 - Reiter, J.P. A1 - Kinney, S.K. VL - 28 ER - TY - CONF T1 - Interviewer variance of interviewer and respondent behaviors: A new frontier in analyzing the interviewer-respondent interaction T2 - Midwest Association for Public Opinion Research 2012 Annual Conference Y1 - 2012 A1 - Charoenruk, N. A1 - Parkhurst, B. A1 - Ay, M. A1 - Belli, R. F. JF - Midwest Association for Public Opinion Research 2012 Annual Conference CY - Chicago, IL UR - http://www.mapor.org/conferences.html N1 - Annual conference of the Midwest Association for Public Opinion Research, Chicago, Illinois. ER - TY - CONF T1 - Logit-Based Confidence Intervals for Single Capture-Recapture Estimation T2 - American Statistical Association Pittsburgh Chapter Banquet Y1 - 2012 A1 - Mauricio Sadinle JF - American Statistical Association Pittsburgh Chapter Banquet CY - Pittsburgh, PA N1 - April 9, 2012 ER - TY - CONF T1 - Maintaining Quality in the Face of Rapid Program Expansion T2 - 2012 Joint Statistical Meetings Y1 - 2012 A1 - Cosma Shalizi A1 - Rebecca Nugent JF - 2012 Joint Statistical Meetings CY - San Diego, CA ER - TY - CONF T1 - Methods Matter: Revamping Inventor Disambiguation Algorithms with Classification Models and Labeled Inventor Records T2 - Conference Presentation Academy of Management Annual Meeting Y1 - 2012 A1 - Samuel Ventura A1 - Rebecca Nugent A1 - Erich R.H. 
Fuchs JF - Conference Presentation Academy of Management Annual Meeting CY - Boston, MA ER - TY - CONF T1 - MulFiles Record Linkage Using a Generalized Fellegi-Sunter Framework T2 - Conference Presentation Classification Society Annual Meeting, Carnegie Mellon University Y1 - 2012 A1 - Mauricio Sadinle JF - Conference Presentation Classification Society Annual Meeting, Carnegie Mellon University ER - TY - RPRT T1 - NCRN Meeting Fall 2012 Y1 - 2012 A1 - Vilhuber, Lars AB - Held at the Census Bureau Headquarters, Suitland, MD. PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/45884 ER - TY - RPRT T1 - The NSF-Census Research Network: Cornell Node Y1 - 2012 A1 - Block, William C. A1 - Lagoze, Carl A1 - Vilhuber, Lars A1 - Brown, Warren A. A1 - Williams, Jeremy A1 - Arguillas, Florio AB - Cornell University has received a $3M NSF-Census Research Network (NCRN) award to improve the documentation and discoverability of both public and restricted data from the federal statistical system. The current internal name for this project is the Comprehensive Extensible Data Documentation and Access Repository (CED²AR). The diagram to the right provides a high-level architectural overview of the system to be implemented. The CED²AR will be based upon leading metadata standards such as the Data Documentation Initiative (DDI) and Statistical Data and Metadata eXchange (SDMX) and be flexibly designed to ingest documentation from a variety of source files. It will permit synchronization between the public and confidential instances of the repository. The scholarly community will be able to use the CED²AR as it would a conventional metadata repository, deprived only of the values of certain confidential information, but not their metadata.
The authorized user, working on the secure Census Bureau network, could use the CED²AR with full information in authorized domains. PB - Cornell University UR - http://hdl.handle.net/1813/30925 ER - TY - CHAP T1 - One Permutation Hashing T2 - Advances in Neural Information Processing Systems 25 Y1 - 2012 A1 - Ping Li A1 - Art Owen A1 - Cun-Hui Zhang ED - P. Bartlett ED - F.C.N. Pereira ED - C.J.C. Burges ED - L. Bottou ED - K.Q. Weinberger JF - Advances in Neural Information Processing Systems 25 UR - http://books.nips.cc/papers/files/nips25/NIPS2012_1436.pdf ER - TY - RPRT T1 - Presentation: Revisiting the Economics of Privacy: Population Statistics and Privacy as Public Goods Y1 - 2012 A1 - Abowd, John AB - Anonymization and data quality are intimately linked. Although this link has been properly acknowledged in the Computer Science and Statistical Disclosure Limitation literatures, economics offers a framework for formalizing the linkage and analyzing optimal decisions and equilibrium outcomes. The opinions expressed in this presentation are those of the author and not of the National Science Foundation or the Census Bureau. PB - Cornell University UR - http://hdl.handle.net/1813/30937 ER - TY - JOUR T1 - Privacy in a world of electronic data: Whom should you trust? JF - Notices of the AMS Y1 - 2012 A1 - Stephen E. Fienberg VL - 59 ER - TY - JOUR T1 - Privacy-preserving data sharing in high dimensional regression and classification settings JF - Journal of Privacy and Confidentiality Y1 - 2012 A1 - Stephen E. Fienberg A1 - Jiashun Jin VL - 4 ER - TY - CHAP T1 - A Proposed Solution to the Archiving and Curation of Confidential Scientific Inputs T2 - Privacy in Statistical Databases Y1 - 2012 A1 - Abowd, John M.
A1 - Vilhuber, Lars A1 - Block, William ED - Domingo-Ferrer, Josep ED - Tinnirello, Ilenia KW - Data Archive KW - Data Curation KW - Privacy-preserving Datamining KW - Statistical Disclosure Limitation JF - Privacy in Statistical Databases T3 - Lecture Notes in Computer Science PB - Springer Berlin Heidelberg VL - 7556 SN - 978-3-642-33626-3 UR - http://dx.doi.org/10.1007/978-3-642-33627-0_17 ER - TY - CONF T1 - Query spelling correction using multi-task learning T2 - Proceedings of the 21st World Wide Web Conference (WWW 2012)(Companion Volume) Y1 - 2012 A1 - Xu Sun A1 - Anshumali Shrivastava A1 - Ping Li JF - Proceedings of the 21st World Wide Web Conference (WWW 2012)(Companion Volume) UR - http://doi.acm.org/10.1145/2187980.2188153 ER - TY - JOUR T1 - Rejoinder: An approach for identifying and predicting economic recessions in real time using time frequency functional models JF - Applied Stochastic Models in Business and Industry Y1 - 2012 A1 - Holan, S. A1 - Yang, W. A1 - Matteson, D. A1 - Wikle, C. VL - 28 UR - http://onlinelibrary.wiley.com/doi/10.1002/asmb.1955/full ER - TY - CHAP T1 - Semiparametric Dynamic Design of Monitoring Networks for Non-Gaussian Spatio-Temporal Data T2 - Spatio-temporal Design: Advances in Efficient Data Acquisition Y1 - 2012 A1 - Holan, S. A1 - Wikle, C.K. 
ED - Jorge Mateu ED - Werner Muller JF - Spatio-temporal Design: Advances in Efficient Data Acquisition PB - Wiley CY - Chichester, UK UR - http://onlinelibrary.wiley.com/doi/10.1002/9781118441862.ch12/summary ER - TY - CONF T1 - Sleight of Privacy T2 - Conference on Web Privacy Measurement Y1 - 2012 A1 - Idris Adjerid A1 - Alessandro Acquisti A1 - Laura Brandimarte JF - Conference on Web Privacy Measurement ER - TY - THES T1 - Smooth Post-Stratification in Multiple Capture Recapture Y1 - 2012 A1 - Zachary Kurtz PB - Carnegie Mellon University N1 - Department of Statistics ER - TY - ABST T1 - Spatio-Temporal Statistics at Mizzou, Truman School of Public Affairs Y1 - 2012 A1 - Wikle, C.K. ER - TY - CONF T1 - Statistics in Service to the Nation T2 - Presentation Samuel S. Wilks Lecture Y1 - 2012 A1 - Stephen E. Fienberg JF - Presentation Samuel S. Wilks Lecture CY - Princeton, NJ N1 - April 23, 2012 ER - TY - CONF T1 - Teaching about Big Data: Curricular Issues T2 - 2012 Joint Statistical Meetings Y1 - 2012 A1 - Stephen E. 
Fienberg JF - 2012 Joint Statistical Meetings CY - San Diego, CA ER - TY - JOUR T1 - Testing for Membership to the IFRA and the NBU Classes of Distributions JF - Journal of Machine Learning Research - Proceedings Track for the Fifteenth International Conference on Artificial Intelligence and Statistics (AISTATS 2012) Y1 - 2012 A1 - Radhendushka Srivastava A1 - Ping Li A1 - Debasis Sengupta VL - 22 UR - http://jmlr.csail.mit.edu/proceedings/papers/v22/srivastava12.html ER - TY - CONF T1 - Thinking inside the box: Mapping the microstructure of urban environment (and why it matters) T2 - AutoCarto 2012 Y1 - 2012 A1 - Seth Spielman A1 - David Folch A1 - John Logan A1 - Nicholas Nagle KW - cartography JF - AutoCarto 2012 CY - Columbus, Ohio UR - http://www.cartogis.org/docs/proceedings/2012/Spielman_etal_AutoCarto2012.pdf ER - TY - CONF T1 - Troubles with time-use: Examining potential indicators of error in the ATUS T2 - Midwest Association for Public Opinion Research 2012 Annual Conference Y1 - 2012 A1 - Phillips, A. L. A1 - T. Al Baghal A1 - Belli, R. F. JF - Midwest Association for Public Opinion Research 2012 Annual Conference CY - Chicago, IL UR - http://www.mapor.org/conferences.html N1 - Presented at the annual conference of the Midwest Association for Public Opinion Research, Chicago, Illinois ER - TY - CONF T1 - Valid Statistical Inference on Automatically Matched Files T2 - Privacy in Statistical Databases Y1 - 2012 A1 - Robert Hall A1 - Stephen E. Fienberg ED - Josep Domingo-Ferrer ED - Ilenia Tinnirello JF - Privacy in Statistical Databases PB - Springer ER - TY - JOUR T1 - The welfare reforms of the 1990s and the stratification of material well-being among low-income households with children JF - Children and Youth Services Review Y1 - 2012 A1 - Shaefer, H. Luke A1 - Ybarra, Marci AB -