TY - JOUR T1 - Imputation in U.S. Manufacturing Data and Its Implications for Productivity Dispersion JF - Review of Economics and Statistics Y1 - Submitted A1 - T. Kirk White A1 - Jerome P. Reiter A1 - Amil Petrin AB - In the U.S. Census Bureau's 2002 and 2007 Censuses of Manufactures 79% and 73% of observations respectively have imputed data for at least one variable used to compute total factor productivity. The Bureau primarily imputes for missing values using mean-imputation methods which can reduce the true underlying variance of the imputed variables. For every variable entering TFP in 2002 and 2007 we show the dispersion is significantly smaller in the Census mean-imputed versus the Census non-imputed data. As an alternative to mean imputation we show how to use classification and regression trees (CART) to allow for a distribution of multiple possible impute values based on other plants that are CART-algorithmically determined to be similar based on other observed variables. For 90% of the 473 industries in 2002 and 84% of the 471 industries in 2007 we find that TFP dispersion increases as we move from Census mean-imputed data to Census non-imputed data to the CART-imputed data. UR - http://www.mitpressjournals.org/doi/abs/10.1162/REST_a_00678 ER - TY - JOUR T1 - The Earned Income Tax Credit and Food Insecurity: Who Benefits? Y1 - forthcoming A1 - Shaefer, H.L. A1 - Wilson, R. ER - TY - JOUR T1 - Adaptively-Tuned Particle Swarm Optimization with Application to Spatial Design JF - Stat Y1 - 2017 A1 - Simpson, M. A1 - Wikle, C.K. A1 - Holan, S.H. AB - Particle swarm optimization (PSO) algorithms are a class of heuristic optimization algorithms that are attractive for complex optimization problems. We propose using PSO to solve spatial design problems, e.g. choosing new locations to add to an existing monitoring network. 
Additionally, we introduce two new classes of PSO algorithms that perform well in a wide variety of circumstances, called adaptively tuned PSO and adaptively tuned bare bones PSO. To illustrate these algorithms, we apply them to a common spatial design problem: choosing new locations to add to an existing monitoring network. Specifically, we consider a network in the Houston, TX, area for monitoring ambient ozone levels, which have been linked to out-of-hospital cardiac arrest rates. Published 2017. This article has been contributed to by US Government employees and their work is in the public domain in the USA VL - 6 UR - http://onlinelibrary.wiley.com/doi/10.1002/sta4.142/abstract IS - 1 ER - TY - JOUR T1 - Bayesian Hierarchical Multi-Population Multistate Jolly-Seber Models with Covariates: Application to the Pallid Sturgeon Population Assessment Program JF - Journal of the American Statistical Association Y1 - 2017 A1 - Wu, G. A1 - Holan, S.H. AB - Estimating abundance for multiple populations is of fundamental importance to many ecological monitoring programs. Equally important is quantifying the spatial distribution and characterizing the migratory behavior of target populations within the study domain. To achieve these goals, we propose a Bayesian hierarchical multi-population multistate Jolly–Seber model that incorporates covariates. The model is proposed using a state-space framework and has several distinct advantages. First, multiple populations within the same study area can be modeled simultaneously. As a consequence, it is possible to achieve improved parameter estimation by “borrowing strength” across different populations. In many cases, such as our motivating example involving endangered species, this borrowing of strength is crucial, as there is relatively less information for one of the populations under consideration. 
Second, in addition to accommodating covariate information, we develop a computationally efficient Markov chain Monte Carlo algorithm that requires no tuning. Importantly, the model we propose allows us to draw inference on each population as well as on multiple populations simultaneously. Finally, we demonstrate the effectiveness of our method through a motivating example of estimating the spatial distribution and migration of hatchery and wild populations of the endangered pallid sturgeon (Scaphirhynchus albus), using data from the Pallid Sturgeon Population Assessment Program on the Lower Missouri River. Supplementary materials for this article are available online. VL - 112 UR - http://www.tandfonline.com/doi/abs/10.1080/01621459.2016.1211531 IS - 518 ER - TY - JOUR T1 - The Cepstral Model for Multivariate Time Series: The Vector Exponential Model JF - Statistica Sinica Y1 - 2017 A1 - Holan, S.H. A1 - McElroy, T.S. A1 - Wu, G. KW - Autocovariance matrix KW - Bayesian estimation KW - Cepstral KW - Coherence KW - Spectral density matrix KW - stochastic search variable selection KW - Wold coefficients. AB - Vector autoregressive (VAR) models have become a staple in the analysis of multivariate time series and are formulated in the time domain as difference equations, with an implied covariance structure. In many contexts, it is desirable to work with a stable, or at least stationary, representation. To fit such models, one must impose restrictions on the coefficient matrices to ensure that certain determinants are nonzero; which, except in special cases, may prove burdensome. To circumvent these difficulties, we propose a flexible frequency domain model expressed in terms of the spectral density matrix. Specifically, this paper treats the modeling of covariance stationary vector-valued (i.e., multivariate) time series via an extension of the exponential model for the spectrum of a scalar time series. 
We discuss the modeling advantages of the vector exponential model and its computational facets, such as how to obtain Wold coefficients from given cepstral coefficients. Finally, we demonstrate the utility of our approach through simulation as well as two illustrative data examples focusing on multi-step ahead forecasting and estimation of squared coherence. VL - 27 UR - http://www3.stat.sinica.edu.tw/statistica/J27N1/J27N12/J27N12.html ER - TY - RPRT T1 - Computationally Efficient Multivariate Spatio-Temporal Models for High-Dimensional Count-Valued Data. (With Discussion). Y1 - 2017 A1 - Bradley, J.R. A1 - Holan, S.H. A1 - Wikle, C.K. KW - Aggregation KW - American Community Survey KW - Bayesian hierarchical model KW - Big Data KW - Longitudinal Employer-Household Dynamics (LEHD) program KW - Markov chain Monte Carlo KW - Non-Gaussian. KW - Quarterly Workforce Indicators AB - We introduce a Bayesian approach for multivariate spatio-temporal prediction for high-dimensional count-valued data. Our primary interest is when there are possibly millions of data points referenced over different variables, geographic regions, and times. This problem requires extensive methodological advancements, as jointly modeling correlated data of this size leads to the so-called "big n problem." The computational complexity of prediction in this setting is further exacerbated by acknowledging that count-valued data are naturally non-Gaussian. Thus, we develop a new computationally efficient distribution theory for this setting. In particular, we introduce a multivariate log-gamma distribution and provide substantial theoretical development including: results regarding conditional distributions, marginal distributions, an asymptotic relationship with the multivariate normal distribution, and full-conditional distributions for a Gibbs sampler. To incorporate dependence between variables, regions, and time points, a multivariate spatio-temporal mixed effects model (MSTM) is used. 
The results in this manuscript are extremely general, and can be used for data that exhibit fewer sources of dependency than what we consider (e.g., multivariate, spatial-only, or spatio-temporal-only data). Hence, the implications of our modeling framework may have a large impact on the general problem of jointly modeling correlated count-valued data. We show the effectiveness of our approach through a simulation study. Additionally, we demonstrate our proposed methodology with an important application analyzing data obtained from the Longitudinal Employer-Household Dynamics (LEHD) program, which is administered by the U.S. Census Bureau. JF - arXiv UR - https://arxiv.org/abs/1512.07273 ER - TY - JOUR T1 - Dirichlet Process Mixture Models for Modeling and Generating Synthetic Versions of Nested Categorical Data JF - Bayesian Analysis Y1 - 2017 A1 - Hu, Jingchen A1 - Reiter, Jerome P A1 - Wang, Quanli AB - We present a Bayesian model for estimating the joint distribution of multivariate categorical data when units are nested within groups. Such data arise frequently in social science settings, for example, people living in households. The model assumes that (i) each group is a member of a group-level latent class, and (ii) each unit is a member of a unit-level latent class nested within its group-level latent class. This structure allows the model to capture dependence among units in the same group. It also facilitates simultaneous modeling of variables at both group and unit levels. We develop a version of the model that assigns zero probability to groups and units with physically impossible combinations of variables. We apply the model to estimate multivariate relationships in a subset of the American Community Survey. Using the estimated model, we generate synthetic household data that could be disseminated as redacted public use files. Supplementary materials (Hu et al., 2017) for this article are available online. 
UR - http://projecteuclid.org/euclid.ba/1485227030 ER - TY - RPRT T1 - Effects of a Government-Academic Partnership: Has the NSF-Census Bureau Research Network Helped Secure the Future of the Federal Statistical System? Y1 - 2017 A1 - Weinberg, Daniel A1 - Abowd, John M. A1 - Belli, Robert F. A1 - Cressie, Noel A1 - Folch, David C. A1 - Holan, Scott H. A1 - Levenstein, Margaret C. A1 - Olson, Kristen M. A1 - Reiter, Jerome P. A1 - Shapiro, Matthew D. A1 - Smyth, Jolene A1 - Soh, Leen-Kiat A1 - Spencer, Bruce A1 - Spielman, Seth E. A1 - Vilhuber, Lars A1 - Wikle, Christopher AB -

The National Science Foundation-Census Bureau Research Network (NCRN) was established in 2011 to create interdisciplinary research nodes on methodological questions of interest and significance to the broader research community and to the Federal Statistical System (FSS), particularly the Census Bureau. The activities to date have covered both fundamental and applied statistical research and have focused at least in part on the training of current and future generations of researchers in skills of relevance to surveys and alternative measurement of economic units, households, and persons. This paper discusses some of the key research findings of the eight nodes, organized into six topics: (1) Improving census and survey data collection methods; (2) Using alternative sources of data; (3) Protecting privacy and confidentiality by improving disclosure avoidance; (4) Using spatial and spatio-temporal statistical modeling to improve estimates; (5) Assessing data cost and quality tradeoffs; and (6) Combining information from multiple sources. It also reports on collaborations across nodes and with federal agencies, new software developed, and educational activities and outcomes. The paper concludes with an evaluation of the ability of the FSS to apply the NCRN’s research outcomes and suggests some next steps, as well as the implications of this research-network model for future federal government renewal initiatives. 
This paper began as a May 8, 2015 presentation to the National Academies of Science’s Committee on National Statistics by two of the principal investigators of the National Science Foundation-Census Bureau Research Network (NCRN) – John Abowd and the late Steve Fienberg (Carnegie Mellon University). The authors acknowledge the contributions of the other principal investigators of the NCRN who are not co-authors of the paper (William Block, William Eddy, Alan Karr, Charles Manski, Nicholas Nagle, and Rebecca Nugent), the co-principal investigators, and the comments of Patrick Cantwell, Constance Citro, Adam Eck, Brian Harris-Kojetin, and Eloise Parker. We note with sorrow the deaths of Stephen Fienberg and Allan McCutcheon, two of the original NCRN principal investigators. The principal investigators also wish to acknowledge Cheryl Eavey’s sterling grant administration on behalf of the NSF. The conclusions reached in this paper are not the responsibility of the National Science Foundation (NSF), the Census Bureau, or any of the institutions to which the authors belong.

PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/52650 ER - TY - RPRT T1 - Formal Privacy Models and Title 13 Y1 - 2017 A1 - Nissim, Kobbi A1 - Gasser, Urs A1 - Smith, Adam A1 - Vadhan, Salil A1 - O'Brien, David A1 - Wood, Alexandra AB - A new collaboration between academia and the Census Bureau to further the Bureau’s use of formal privacy models. PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/52164 ER - TY - RPRT T1 - NCRN Meeting Spring 2017: Formal Privacy Models and Title 13 Y1 - 2017 A1 - Nissim, Kobbi A1 - Gasser, Urs A1 - Smith, Adam A1 - Vadhan, Salil A1 - O'Brien, David A1 - Wood, Alexandra AB - A new collaboration between academia and the Census Bureau to further the Bureau’s use of formal privacy models. PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/52164 ER - TY - JOUR T1 - Regionalization of Multiscale Spatial Processes using a Criterion for Spatial Aggregation Error JF - Journal of the Royal Statistical Society -- Series B. Y1 - 2017 A1 - Bradley, J.R. A1 - Wikle, C.K. A1 - Holan, S.H. KW - American Community Survey KW - empirical orthogonal functions KW - MAUP KW - Reduced rank KW - Spatial basis functions KW - Survey data AB - The modifiable areal unit problem and the ecological fallacy are known problems that occur when modeling multiscale spatial processes. We investigate how these forms of spatial aggregation error can guide a regionalization over a spatial domain of interest. By "regionalization" we mean a specification of geographies that define the spatial support for areal data. This topic has been studied vigorously by geographers, but has been given less attention by spatial statisticians. 
Thus, we propose a criterion for spatial aggregation error (CAGE), which we minimize to obtain an optimal regionalization. To define CAGE we draw a connection between spatial aggregation error and a new multiscale representation of the Karhunen-Loeve (K-L) expansion. This relationship between CAGE and the multiscale K-L expansion leads to illuminating theoretical developments including: connections between spatial aggregation error, squared prediction error, spatial variance, and a novel extension of Obled-Creutin eigenfunctions. The effectiveness of our approach is demonstrated through an analysis of two datasets, one using the American Community Survey and one related to environmental ocean winds. UR - https://arxiv.org/abs/1502.01974 ER - TY - JOUR T1 - Visualizing uncertainty in areal data estimates with bivariate choropleth maps, map pixelation, and glyph rotation JF - Stat Y1 - 2017 A1 - Lucchesi, L.R. A1 - Wikle, C.K. AB - In statistics, we quantify uncertainty to help determine the accuracy of estimates, yet this crucial piece of information is rarely included on maps visualizing areal data estimates. We develop and present three approaches to include uncertainty on maps: (1) the bivariate choropleth map repurposed to visualize uncertainty; (2) the pixelation of counties to include values within an estimate's margin of error; and (3) the rotation of a glyph, located at a county's centroid, to represent an estimate's uncertainty. The second method is presented as both a static map and visuanimation. We use American Community Survey estimates and their corresponding margins of error to demonstrate the methods and highlight the importance of visualizing uncertainty in areal data. An extensive online supplement provides the R code necessary to produce the maps presented in this article as well as alternative versions of them. 
VL - 6 UR - http://onlinelibrary.wiley.com/doi/10.1002/sta4.150/abstract IS - 1 ER - TY - JOUR T1 - Bayesian Hierarchical Models with Conjugate Full-Conditional Distributions for Dependent Data from the Natural Exponential Family JF - Journal of the American Statistical Association - T&M. Y1 - 2016 A1 - Bradley, J.R. A1 - Holan, S.H. A1 - Wikle, C.K. AB - We introduce a Bayesian approach for analyzing (possibly) high-dimensional dependent data that are distributed according to a member from the natural exponential family of distributions. This problem requires extensive methodological advancements, as jointly modeling high-dimensional dependent data leads to the so-called "big n problem." The computational complexity of the "big n problem" is further exacerbated when allowing for non-Gaussian data models, as is the case here. Thus, we develop new computationally efficient distribution theory for this setting. In particular, we introduce something we call the "conjugate multivariate distribution," which is motivated by the univariate distribution introduced in Diaconis and Ylvisaker (1979). Furthermore, we provide substantial theoretical and methodological development including: results regarding conditional distributions, an asymptotic relationship with the multivariate normal distribution, conjugate prior distributions, and full-conditional distributions for a Gibbs sampler. The results in this manuscript are extremely general, and can be adapted to many different settings. We demonstrate the proposed methodology through simulated examples and analyses based on estimates obtained from the US Census Bureau's American Community Survey (ACS). UR - https://arxiv.org/abs/1701.07506 ER - TY - JOUR T1 - Bayesian Lattice Filters for Time-Varying Autoregression and Time-Frequency Analysis JF - Bayesian Analysis Y1 - 2016 A1 - Yang, W.H. A1 - Holan, S.H. A1 - Wikle, C.K. 
AB - Modeling nonstationary processes is of paramount importance to many scientific disciplines including environmental science, ecology, and finance, among others. Consequently, flexible methodology that provides accurate estimation across a wide range of processes is a subject of ongoing interest. We propose a novel approach to model-based time-frequency estimation using time-varying autoregressive models. In this context, we take a fully Bayesian approach and allow both the autoregressive coefficients and innovation variance to vary over time. Importantly, our estimation method uses the lattice filter and is cast within the partial autocorrelation domain. The marginal posterior distributions are of standard form and, as a convenient by-product of our estimation method, our approach avoids undesirable matrix inversions. As such, estimation is extremely computationally efficient and stable. To illustrate the effectiveness of our approach, we conduct a comprehensive simulation study that compares our method with other competing methods and find that, in most cases, our approach performs superior in terms of average squared error between the estimated and true time-varying spectral density. Lastly, we demonstrate our methodology through three modeling applications; namely, insect communication signals, environmental data (wind components), and macroeconomic data (US gross domestic product (GDP) and consumption). UR - https://arxiv.org/abs/1408.2757 ER - TY - JOUR T1 - Bayesian Spatial Change of Support for Count-Valued Survey Data with Application to the American Community Survey JF - Journal of the American Statistical Association Y1 - 2016 A1 - Bradley, J.R. A1 - Wikle, C.K. A1 - Holan, S.H. AB - We introduce Bayesian spatial change of support methodology for count-valued survey data with known survey variances. Our proposed methodology is motivated by the American Community Survey (ACS), an ongoing survey administered by the U.S. 
Census Bureau that provides timely information on several key demographic variables. Specifically, the ACS produces 1-year, 3-year, and 5-year "period-estimates," and corresponding margins of errors, for published demographic and socio-economic variables recorded over predefined geographies within the United States. Despite the availability of these predefined geographies it is often of interest to data users to specify customized user-defined spatial supports. In particular, it is useful to estimate demographic variables defined on "new" spatial supports in "real-time." This problem is known as spatial change of support (COS), which is typically performed under the assumption that the data follows a Gaussian distribution. However, count-valued survey data is naturally non-Gaussian and, hence, we consider modeling these data using a Poisson distribution. Additionally, survey-data are often accompanied by estimates of error, which we incorporate into our analysis. We interpret Poisson count-valued data in small areas as an aggregation of events from a spatial point process. This approach provides us with the flexibility necessary to allow ACS users to consider a variety of spatial supports in "real-time." We demonstrate the effectiveness of our approach through a simulated example as well as through an analysis using public-use ACS data. UR - https://arxiv.org/abs/1405.7227 ER - TY - JOUR T1 - Generating Partially Synthetic Geocoded Public Use Data with Decreased Disclosure Risk Using Differential Smoothing JF - Journal of the Royal Statistical Society - Series A Y1 - 2016 A1 - Quick, H. A1 - Holan, S.H. A1 - Wikle, C.K. AB - When collecting geocoded confidential data with the intent to disseminate, agencies often resort to altering the geographies prior to making data publicly available due to data privacy obligations. 
An alternative to releasing aggregated and/or perturbed data is to release multiply-imputed synthetic data, where sensitive values are replaced with draws from statistical models designed to capture important distributional features in the collected data. One issue that has received relatively little attention, however, is how to handle spatially outlying observations in the collected data, as common spatial models often have a tendency to overfit these observations. The goal of this work is to bring this issue to the forefront and propose a solution, which we refer to as "differential smoothing." After implementing our method on simulated data, highlighting the effectiveness of our approach under various scenarios, we illustrate the framework using data consisting of sale prices of homes in San Francisco. UR - https://arxiv.org/abs/1507.05529 ER - TY - JOUR T1 - Multivariate Spatio-Temporal Survey Fusion with Application to the American Community Survey and Local Area Unemployment Statistics JF - Stat Y1 - 2016 A1 - Bradley, J.R. A1 - Holan, S.H. A1 - Wikle, C.K. AB - There are often multiple surveys available that estimate and report related demographic variables of interest that are referenced over space and/or time. Not all surveys produce the same information, and thus, combining these surveys typically leads to higher quality estimates. That is, not every survey has the same level of precision nor do they always provide estimates of the same variables. In addition, various surveys often produce estimates with incomplete spatio-temporal coverage. By combining surveys using a Bayesian approach, we can account for different margins of error and leverage dependencies to produce estimates of every variable considered at every spatial location and every time point. Specifically, our strategy is to use a hierarchical modelling approach, where the first stage of the model incorporates the margin of error associated with each survey. 
Then, in a lower stage of the hierarchical model, the multivariate spatio-temporal mixed effects model is used to incorporate multivariate spatio-temporal dependencies of the processes of interest. We adopt a fully Bayesian approach for combining surveys; that is, given all of the available surveys, the conditional distributions of the latent processes of interest are used for statistical inference. To demonstrate our proposed methodology, we jointly analyze period estimates from the US Census Bureau's American Community Survey, and estimates obtained from the Bureau of Labor Statistics Local Area Unemployment Statistics program. Copyright © 2016 John Wiley & Sons, Ltd. UR - http://onlinelibrary.wiley.com/doi/10.1002/sta4.120/full ER - TY - RPRT T1 - NCRN Meeting Fall 2016: Scanner Data and Economic Statistics: A Unified Approach Y1 - 2016 A1 - Redding, Stephen J. A1 - Weinstein, David E. PB - University of Michigan UR - http://hdl.handle.net/1813/45821 ER - TY - RPRT T1 - NCRN Meeting Spring 2016: Developing job linkages for the Health and Retirement Study Y1 - 2016 A1 - McCue, Kristin A1 - Abowd, John A1 - Levenstein, Margaret A1 - Patki, Dhiren A1 - Rodgers, Ann A1 - Shapiro, Matthew A1 - Wasi, Nada AB - This paper documents work using probabilistic record linkage to create a crosswalk between jobs reported in the Health and Retirement Study (HRS) and the list of workplaces on Census Bureau’s Business Register. Matching job records provides an opportunity to join variables that occur uniquely in separate datasets, to validate responses, and to develop missing data imputation models. 
Identifying the respondent’s workplace (“establishment”) is valuable for HRS because it allows researchers to incorporate the effects of particular social, economic, and geospatial work environments in studies of respondent health and retirement behavior. The linkage makes use of name and address standardizing techniques tailored to business data that were recently developed in a collaboration between researchers at Census, Cornell, and the University of Michigan. The matching protocol makes no use of the identity of the HRS respondent and strictly protects the confidentiality of information about the respondent’s employer. The paper first describes the clerical review process used to create a set of human-reviewed candidate pairs, and use of that set to train matching models. It then describes and compares several linking strategies that make use of employer name, address, and phone number. Finally it discusses alternative ways of incorporating information on match uncertainty into estimates based on the linked data, and illustrates their use with a preliminary sample of matched HRS jobs. Presented at the NCRN Meeting Spring 2016 in Washington DC on May 9-10, 2016; see http://www.ncrn.info/event/ncrn-spring-2016-meeting PB - University of Michigan UR - http://hdl.handle.net/1813/43895 ER - TY - JOUR T1 - Releasing synthetic magnitude micro data constrained to fixed marginal totals JF - Statistical Journal of the International Association for Official Statistics Y1 - 2016 A1 - Wei, Lan A1 - Reiter, Jerome P. KW - Confidential KW - Disclosure KW - establishment KW - mixture KW - poisson KW - risk AB - We present approaches to generating synthetic microdata for multivariate data that take on non-negative integer values, such as magnitude data in economic surveys. The basic idea is to estimate a mixture of Poisson distributions to describe the multivariate distribution, and release draws from the posterior predictive distribution of the model. 
We develop approaches that guarantee the synthetic data sum to marginal totals computed from the original data, as well as approaches that do not enforce this equality. For both cases, we present methods for assessing disclosure risks inherent in releasing synthetic magnitude microdata. We illustrate the methodology using economic data from a survey of manufacturing establishments. VL - 32 UR - http://content.iospress.com/download/statistical-journal-of-the-iaos/sji959 IS - 1 ER - TY - JOUR T1 - Bayesian Analysis of Spatially-Dependent Functional Responses with Spatially-Dependent Multi-Dimensional Functional Predictors JF - Statistica Sinica Y1 - 2015 A1 - Yang, W. H. A1 - Wikle, C.K. A1 - Holan, S.H. A1 - Sudduth, K. A1 - Meyers, D.B. VL - 25 UR - http://www3.stat.sinica.edu.tw/preprint/SS-13-245w_Preprint.pdf ER - TY - JOUR T1 - Bayesian Binomial Mixture Models for Estimating Abundance in Ecological Monitoring Studies JF - Annals of Applied Statistics Y1 - 2015 A1 - Wu, G. A1 - Holan, S.H. A1 - Nilon, C.H. A1 - Wikle, C.K. VL - 9 UR - http://projecteuclid.org/euclid.aoas/1430226082 ER - TY - JOUR T1 - Bayesian Lattice Filters for Time-Varying Autoregression and Time-Frequency Analysis JF - ArXiv Y1 - 2015 A1 - Yang, W. H. A1 - Holan, S. H. A1 - Wikle, C.K. AB - Modeling nonstationary processes is of paramount importance to many scientific disciplines including environmental science, ecology, and finance, among others. Consequently, flexible methodology that provides accurate estimation across a wide range of processes is a subject of ongoing interest. We propose a novel approach to model-based time-frequency estimation using time-varying autoregressive models. In this context, we take a fully Bayesian approach and allow both the autoregressive coefficients and innovation variance to vary over time. Importantly, our estimation method uses the lattice filter and is cast within the partial autocorrelation domain. 
The marginal posterior distributions are of standard form and, as a convenient by-product of our estimation method, our approach avoids undesirable matrix inversions. As such, estimation is extremely computationally efficient and stable. To illustrate the effectiveness of our approach, we conduct a comprehensive simulation study that compares our method with other competing methods and find that, in most cases, our approach performs superior in terms of average squared error between the estimated and true time-varying spectral density. Lastly, we demonstrate our methodology through three modeling applications; namely, insect communication signals, environmental data (wind components), and macroeconomic data (US gross domestic product (GDP) and consumption). UR - http://arxiv.org/abs/1408.2757 IS - 1408.2757 ER - TY - JOUR T1 - Bayesian Lattice Filters for Time-Varying Autoregression and Time–Frequency Analysis JF - Project Euclid Y1 - 2015 A1 - Yang, W. H. A1 - Holan, Scott H. A1 - Wikle, Christopher K. KW - locally stationary KW - model selection KW - nonstationary partial autocorrelation KW - piecewise stationary KW - sequential estimation KW - time-varying spectral density AB - Modeling nonstationary processes is of paramount importance to many scientific disciplines including environmental science, ecology, and finance, among others. Consequently, flexible methodology that provides accurate estimation across a wide range of processes is a subject of ongoing interest. We propose a novel approach to model-based time–frequency estimation using time-varying autoregressive models. In this context, we take a fully Bayesian approach and allow both the autoregressive coefficients and innovation variance to vary over time. Importantly, our estimation method uses the lattice filter and is cast within the partial autocorrelation domain. 
The marginal posterior distributions are of standard form and, as a convenient by-product of our estimation method, our approach avoids undesirable matrix inversions. As such, estimation is extremely computationally efficient and stable. To illustrate the effectiveness of our approach, we conduct a comprehensive simulation study that compares our method with other competing methods and find that, in most cases, our approach performs superior in terms of average squared error between the estimated and true time-varying spectral density. Lastly, we demonstrate our methodology through three modeling applications; namely, insect communication signals, environmental data (wind components), and macroeconomic data (US gross domestic product (GDP) and consumption). UR - http://projecteuclid.org/euclid.ba/1445263834 ER - TY - JOUR T1 - Bayesian Marked Point Process Modeling for Generating Fully Synthetic Public Use Data with Point-Referenced Geography JF - Spatial Statistics Y1 - 2015 A1 - Quick, Harrison A1 - Holan, Scott H. A1 - Wikle, Christopher K. A1 - Reiter, Jerome P. VL - 14 UR - http://www.sciencedirect.com/science/article/pii/S2211675315000718 ER - TY - JOUR T1 - Bayesian Marked Point Process Modeling for Generating Fully Synthetic Public Use Data with Point-Referenced Geography JF - ArXiv Y1 - 2015 A1 - Quick, H. A1 - Holan, S. H. A1 - Wikle, C. K. A1 - Reiter, J. P. AB - Many data stewards collect confidential data that include fine geography. When sharing these data with others, data stewards strive to disseminate data that are informative for a wide range of spatial and non-spatial analyses while simultaneously protecting the confidentiality of data subjects' identities and attributes. Typically, data stewards meet this challenge by coarsening the resolution of the released geography and, as needed, perturbing the confidential attributes. When done with high intensity, these redaction strategies can result in released data with poor analytic quality. 
We propose an alternative dissemination approach based on fully synthetic data. We generate data using marked point process models that can maintain both the statistical properties and the spatial dependence structure of the confidential data. We illustrate the approach using data consisting of mortality records from Durham, North Carolina. UR - http://arxiv.org/abs/1407.7795 IS - 1407.7795 ER - TY - JOUR T1 - Bayesian Semiparametric Hierarchical Empirical Likelihood Spatial Models JF - Journal of Statistical Planning and Inference Y1 - 2015 A1 - Porter, A.T. A1 - Holan, S.H. A1 - Wikle, C.K. VL - 165 ER - TY - JOUR T1 - Bayesian Spatial Change of Support for Count-Valued Survey Data with Application to the American Community Survey JF - Journal of the American Statistical Association Y1 - 2015 A1 - Bradley, Jonathan A1 - Wikle, C.K. A1 - Holan, S. H. AB - We introduce Bayesian spatial change of support methodology for count-valued survey data with known survey variances. Our proposed methodology is motivated by the American Community Survey (ACS), an ongoing survey administered by the U.S. Census Bureau that provides timely information on several key demographic variables. Specifically, the ACS produces 1-year, 3-year, and 5-year “period-estimates,” and corresponding margins of errors, for published demographic and socio-economic variables recorded over predefined geographies within the United States. Despite the availability of these predefined geographies it is often of interest to data-users to specify customized user-defined spatial supports. In particular, it is useful to estimate demographic variables defined on “new” spatial supports in “real-time.” This problem is known as spatial change of support (COS), which is typically performed under the assumption that the data follows a Gaussian distribution. However, count-valued survey data is naturally non-Gaussian and, hence, we consider modeling these data using a Poisson distribution. 
Additionally, survey-data are often accompanied by estimates of error, which we incorporate into our analysis. We interpret Poisson count-valued data in small areas as an aggregation of events from a spatial point process. This approach provides us with the flexibility necessary to allow ACS users to consider a variety of spatial supports in “real-time.” We show the effectiveness of our approach through a simulated example as well as through an analysis using public-use ACS data. UR - http://www.tandfonline.com/doi/abs/10.1080/01621459.2015.1117471 ER - TY - JOUR T1 - Bayesian Spatial Change of Support for Count-Valued Survey Data with Application to the American Community Survey JF - Journal of the American Statistical Association Y1 - 2015 A1 - Bradley, Jonathan R. A1 - Wikle, Christopher K. A1 - Holan, Scott H. KW - Aggregation KW - American Community Survey KW - Bayesian hierarchical model KW - Givens angle prior KW - Markov chain Monte Carlo KW - Multiscale model KW - Non-Gaussian. AB - We introduce Bayesian spatial change of support methodology for count-valued survey data with known survey variances. Our proposed methodology is motivated by the American Community Survey (ACS), an ongoing survey administered by the U.S. Census Bureau that provides timely information on several key demographic variables. Specifically, the ACS produces 1-year, 3-year, and 5-year “period-estimates,” and corresponding margins of errors, for published demographic and socio-economic variables recorded over predefined geographies within the United States. Despite the availability of these predefined geographies it is often of interest to data-users to specify customized user-defined spatial supports. In particular, it is useful to estimate demographic variables defined on “new” spatial supports in “real-time.” This problem is known as spatial change of support (COS), which is typically performed under the assumption that the data follows a Gaussian distribution. 
However, count-valued survey data is naturally non-Gaussian and, hence, we consider modeling these data using a Poisson distribution. Additionally, survey-data are often accompanied by estimates of error, which we incorporate into our analysis. We interpret Poisson count-valued data in small areas as an aggregation of events from a spatial point process. This approach provides us with the flexibility necessary to allow ACS users to consider a variety of spatial supports in “real-time.” We show the effectiveness of our approach through a simulated example as well as through an analysis using public-use ACS data. UR - http://www.tandfonline.com/doi/abs/10.1080/01621459.2015.1117471 ER - TY - JOUR T1 - Bayesian Spatial Change of Support for Count–Valued Survey Data JF - ArXiv Y1 - 2015 A1 - Bradley, J. R. A1 - Wikle, C.K. A1 - Holan, S. H. AB - We introduce Bayesian spatial change of support methodology for count-valued survey data with known survey variances. Our proposed methodology is motivated by the American Community Survey (ACS), an ongoing survey administered by the U.S. Census Bureau that provides timely information on several key demographic variables. Specifically, the ACS produces 1-year, 3-year, and 5-year "period-estimates," and corresponding margins of errors, for published demographic and socio-economic variables recorded over predefined geographies within the United States. Despite the availability of these predefined geographies it is often of interest to data users to specify customized user-defined spatial supports. In particular, it is useful to estimate demographic variables defined on "new" spatial supports in "real-time." This problem is known as spatial change of support (COS), which is typically performed under the assumption that the data follows a Gaussian distribution. However, count-valued survey data is naturally non-Gaussian and, hence, we consider modeling these data using a Poisson distribution. 
Additionally, survey-data are often accompanied by estimates of error, which we incorporate into our analysis. We interpret Poisson count-valued data in small areas as an aggregation of events from a spatial point process. This approach provides us with the flexibility necessary to allow ACS users to consider a variety of spatial supports in "real-time." We demonstrate the effectiveness of our approach through a simulated example as well as through an analysis using public-use ACS data. UR - http://arxiv.org/abs/1405.7227 IS - 1405.7227 ER - TY - JOUR T1 - Change in Visible Impervious Surface Area in Southeastern Michigan Before and After the “Great Recession:” Spatial Differentiation in Remotely Sensed Land-Cover Dynamics JF - Population and Environment Y1 - 2015 A1 - Wilson, C. R. A1 - Brown, D. G. VL - 36 UR - http://link.springer.com/article/10.1007%2Fs11111-014-0219-y IS - 3 ER - TY - JOUR T1 - Comment on "Semiparametric Bayesian Density Estimation with Disparate Data Sources: A Meta-Analysis of Global Childhood Undernutrition" by Finucane, M. M., Paciorek, C. J., Stevens, G. A., and Ezzati, M. JF - Journal of the American Statistical Association Y1 - 2015 A1 - Wikle, C.K. A1 - Holan, S.H. ER - TY - CONF T1 - Determining Potential for Breakoff in Time Diary Survey Using Paradata T2 - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) Y1 - 2015 A1 - Wettlaufer, D. A1 - Arunachalam, H. A1 - Atkin, G. A1 - Eck, A. A1 - Soh, L.-K. A1 - Belli, R.F. JF - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) CY - Hollywood, Florida UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - JOUR T1 - Dirichlet Process Mixture Models for Nested Categorical Data JF - ArXiv Y1 - 2015 A1 - Hu, J. A1 - Reiter, J.P. A1 - Wang, Q. AB - We present a Bayesian model for estimating the joint distribution of multivariate categorical data when units are nested within groups.
Such data arise frequently in social science settings, for example, people living in households. The model assumes that (i) each group is a member of a group-level latent class, and (ii) each unit is a member of a unit-level latent class nested within its group-level latent class. This structure allows the model to capture dependence among units in the same group. It also facilitates simultaneous modeling of variables at both group and unit levels. We develop a version of the model that assigns zero probability to groups and units with physically impossible combinations of variables. We apply the model to estimate multivariate relationships in a subset of the American Community Survey. Using the estimated model, we generate synthetic household data that could be disseminated as redacted public use files with high analytic validity and low disclosure risks. Supplementary materials for this article are available online. UR - http://arxiv.org/pdf/1412.2282v3.pdf IS - 1412.2282 ER - TY - JOUR T1 - Expanding the Discourse on Antipoverty Policy: Reconsidering a Negative Income Tax JF - Journal of Poverty Y1 - 2015 A1 - Jessica Wiederspan A1 - Elizabeth Rhodes A1 - H. Luke Shaefer KW - economic well-being KW - poverty alleviation KW - public policy KW - social welfare policy AB - This article proposes that advocates for the poor consider the replacement of the current means-tested safety net in the United States with a Negative Income Tax (NIT), a guaranteed income program that lifts families’ incomes above a minimum threshold. The article highlights gaps in service provision that leave millions in poverty, explains how a NIT could help fill those gaps, and compares current expenditures on major means-tested programs to estimated expenditures necessary for a NIT. Finally, it addresses the financial and political concerns that are likely to arise in the event that a NIT proposal gains traction among policy makers. 
VL - 19 UR - http://dx.doi.org/10.1080/10875549.2014.991889 ER - TY - CONF T1 - Grids and Online Panels: A Comparison of Device Type from a Survey Quality Perspective T2 - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) Y1 - 2015 A1 - Wang, Mengyang A1 - McCutcheon, Allan L. A1 - Allen, Laura JF - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) CY - Hollywood, Florida UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - CHAP T1 - Hierarchical models for uncertainty quantification: An overview T2 - Handbook of Uncertainty Quantification Y1 - 2015 A1 - Wikle, C.K. ED - Ghanem, R. ED - Higdon, D. ED - Owhadi, H. JF - Handbook of Uncertainty Quantification PB - Springer ER - TY - CHAP T1 - Hierarchical Agent-Based Spatio-Temporal Dynamic Models for Discrete Valued Data T2 - Handbook of Discrete-Valued Time Series Y1 - 2015 A1 - Wikle, C.K. A1 - Hooten, M.B. ED - Davis, R. ED - Holan, S. ED - Lund, R. ED - Ravishanker, N. JF - Handbook of Discrete-Valued Time Series PB - Chapman and Hall/CRC Press CY - Boca Raton, FL UR - http://www.crcpress.com/product/isbn/9781466577732 ER - TY - CHAP T1 - Hierarchical Dynamic Generalized Linear Mixed Models for Discrete-Valued Spatio-Temporal Data T2 - Handbook of Discrete-Valued Time Series Y1 - 2015 A1 - Holan, S.H. A1 - Wikle, C.K. ED - Davis, R. ED - Holan, S. ED - Lund, R. ED - Ravishanker, N. JF - Handbook of Discrete-Valued Time Series PB - Chapman and Hall/CRC Press CY - Boca Raton, FL SN - 9781466577732 UR - http://www.crcpress.com/product/isbn/9781466577732 N1 - to appear in "Handbook of Discrete-Valued Time Series" ER - TY - CHAP T1 - Hierarchical Dynamic Generalized Linear Mixed Models for Discrete--Valued Spatio-Temporal Data T2 - Handbook of Discrete--Valued Time Series Y1 - 2015 A1 - Holan, S.H. A1 - Wikle, C.K.
JF - Handbook of Discrete--Valued Time Series ER - TY - CHAP T1 - Hierarchical Spatial Models T2 - Encyclopedia of Geographical Information Science Y1 - 2015 A1 - Arab, A. A1 - Hooten, M.B. A1 - Wikle, C.K. JF - Encyclopedia of Geographical Information Science PB - Springer ER - TY - JOUR T1 - Hierarchical, stochastic modeling across spatiotemporal scales of large river ecosystems and somatic growth in fish populations under various climate models: Missouri River sturgeon example JF - Geological Society Y1 - 2015 A1 - Wildhaber, M.L. A1 - Wikle, C.K. A1 - Moran, E.H. A1 - Anderson, C.J. A1 - Franz, K.J. A1 - Dey, R. ER - TY - CONF T1 - I Know What You Did Next: Predicting Respondent’s Next Activity Using Machine Learning T2 - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) Y1 - 2015 A1 - Arunachalam, H. A1 - Atkin, G. A1 - Eck, A. A1 - Wettlaufer, D. A1 - Soh, L.-K. A1 - Belli, R.F. JF - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) CY - Hollywood, Florida UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - JOUR T1 - Modern Perspectives on Statistics for Spatio-Temporal Data JF - WIRES Computational Statistics Y1 - 2015 A1 - Wikle, C.K. VL - 7 UR - http://dx.doi.org/10.1002/wics.1341 IS - 1 ER - TY - ICOMM T1 - Multiscale Analysis of Survey Data: Recent Developments and Exciting Prospects Y1 - 2015 A1 - Bradley, J.R. A1 - Wikle, C.K. A1 - Holan, S.H. JF - Statistics Views ER - TY - JOUR T1 - Multivariate Spatial Hierarchical Bayesian Empirical Likelihood Methods for Small Area Estimation JF - STAT Y1 - 2015 A1 - Porter, A.T. A1 - Holan, S.H. A1 - Wikle, C.K. VL - 4 UR - http://dx.doi.org/10.1002/sta4.81 IS - 1 ER - TY - JOUR T1 - Multivariate Spatio-Temporal Models for High-Dimensional Areal Data with Application to Longitudinal Employer-Household Dynamics JF - ArXiv Y1 - 2015 A1 - Bradley, J. R. A1 - Holan, S. H. A1 - Wikle, C.K. 
AB - Many data sources report related variables of interest that are also referenced over geographic regions and time; however, there are relatively few general statistical methods that one can readily use that incorporate these multivariate spatio-temporal dependencies. Additionally, many multivariate spatio-temporal areal datasets are extremely high-dimensional, which leads to practical issues when formulating statistical models. For example, we analyze Quarterly Workforce Indicators (QWI) published by the US Census Bureau's Longitudinal Employer-Household Dynamics (LEHD) program. QWIs are available by different variables, regions, and time points, resulting in millions of tabulations. Despite their already expansive coverage, by adopting a fully Bayesian framework, the scope of the QWIs can be extended to provide estimates of missing values along with associated measures of uncertainty. Motivated by the LEHD, and other applications in federal statistics, we introduce the multivariate spatio-temporal mixed effects model (MSTM), which can be used to efficiently model high-dimensional multivariate spatio-temporal areal datasets. The proposed MSTM extends the notion of Moran's I basis functions to the multivariate spatio-temporal setting. This extension leads to several methodological contributions including extremely effective dimension reduction, a dynamic linear model for multivariate spatio-temporal areal processes, and the reduction of a high-dimensional parameter space using a novel parameter model. UR - http://arxiv.org/abs/1503.00982 IS - 1503.00982 ER - TY - JOUR T1 - Multivariate Spatio-Temporal Models for High-Dimensional Areal Data with Application to Longitudinal Employer-Household Dynamics JF - Annals of Applied Statistics Y1 - 2015 A1 - Bradley, J.R. A1 - Holan, S.H. A1 - Wikle, C.K.
AB - Many data sources report related variables of interest that are also referenced over geographic regions and time; however, there are relatively few general statistical methods that one can readily use that incorporate these multivariate spatio-temporal dependencies. Additionally, many multivariate spatio-temporal areal datasets are extremely high-dimensional, which leads to practical issues when formulating statistical models. For example, we analyze Quarterly Workforce Indicators (QWI) published by the US Census Bureau’s Longitudinal Employer-Household Dynamics (LEHD) program. QWIs are available by different variables, regions, and time points, resulting in millions of tabulations. Despite their already expansive coverage, by adopting a fully Bayesian framework, the scope of the QWIs can be extended to provide estimates of missing values along with associated measures of uncertainty. Motivated by the LEHD, and other applications in federal statistics, we introduce the multivariate spatio-temporal mixed effects model (MSTM), which can be used to efficiently model high-dimensional multivariate spatio-temporal areal datasets. The proposed MSTM extends the notion of Moran’s I basis functions to the multivariate spatio-temporal setting. This extension leads to several methodological contributions including extremely effective dimension reduction, a dynamic linear model for multivariate spatio-temporal areal processes, and the reduction of a high-dimensional parameter space using a novel parameter model. VL - 9 IS - 4 ER - TY - RPRT T1 - NCRN Meeting Spring 2015: Models for Multiscale Spatially-Referenced Count Data Y1 - 2015 A1 - Holan, Scott A1 - Bradley, Jonathan R. A1 - Wikle, Christopher K. AB - NCRN Meeting Spring 2015: Models for Multiscale Spatially-Referenced Count Data Holan, Scott; Bradley, Jonathan R.; Wikle, Christopher K. 
Presentation at the NCRN Meeting Spring 2015 PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/40176 ER - TY - RPRT T1 - NCRN Meeting Spring 2015: Regionalization of Multiscale Spatial Processes Using a Criterion for Spatial Aggregation Error Y1 - 2015 A1 - Wikle, Christopher K. A1 - Bradley, Jonathan A1 - Holan, Scott AB - NCRN Meeting Spring 2015: Regionalization of Multiscale Spatial Processes Using a Criterion for Spatial Aggregation Error Wikle, Christopher K.; Bradley, Jonathan; Holan, Scott Develop and implement a statistical criterion to diagnose spatial aggregation error that can facilitate the choice of regionalizations of spatial data. Presentation at NCRN Meeting Spring 2015 PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/40177 ER - TY - RPRT T1 - NCRN Meeting Spring 2015: Training Undergraduates, Graduate Students, Postdocs, and Federal Agencies: Methodology, Data, and Science for Federal Statistics Y1 - 2015 A1 - Cressie, Noel A1 - Holan, Scott H. A1 - Wikle, Christopher K. AB - NCRN Meeting Spring 2015: Training Undergraduates, Graduate Students, Postdocs, and Federal Agencies: Methodology, Data, and Science for Federal Statistics Cressie, Noel; Holan, Scott H.; Wikle, Christopher K. Presentation at the NCRN Spring 2015 Meeting PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/40179 ER - TY - JOUR T1 - Record Linkage using STATA: Pre-processing, Linking and Reviewing Utilities JF - The Stata Journal Y1 - 2015 A1 - Wasi, Nada A1 - Flaaen, Aaron AB - In this article, we describe Stata utilities that facilitate probabilistic record linkage—the technique typically used for merging two datasets with no common record identifier. While the preprocessing tools are developed specifically for linking two company databases, the other tools can be used for many different types of linkage. 
Specifically, the stnd_compname and stnd_address commands parse and standardize company names and addresses to improve the match quality when linking. The reclink2 command is a generalized version of Blasnik's reclink (2010, Statistical Software Components S456876, Department of Economics, Boston College) that allows for many-to-one matching. Finally, clrevmatch is an interactive tool that allows the user to review matched results in an efficient and seamless manner. Rather than exporting results to another file format (for example, Excel), inputting clerical reviews, and importing back into Stata, one can use the clrevmatch tool to conduct all of these steps within Stata. This helps improve the speed and flexibility of matching, which often involves multiple runs. VL - 15 UR - http://www.stata-journal.com/article.html?article=dm0082 IS - 3 ER - TY - JOUR T1 - Regionalization of Multiscale Spatial Processes using a Criterion for Spatial Aggregation Error JF - ArXiv Y1 - 2015 A1 - Bradley, J. R. A1 - Wikle, C.K. A1 - Holan, S. H. AB - The modifiable areal unit problem and the ecological fallacy are known problems that occur when modeling multiscale spatial processes. We investigate how these forms of spatial aggregation error can guide a regionalization over a spatial domain of interest. By "regionalization" we mean a specification of geographies that define the spatial support for areal data. This topic has been studied vigorously by geographers, but has been given less attention by spatial statisticians. Thus, we propose a criterion for spatial aggregation error (CAGE), which we minimize to obtain an optimal regionalization. To define CAGE we draw a connection between spatial aggregation error and a new multiscale representation of the Karhunen-Loeve (K-L) expansion. 
This relationship between CAGE and the multiscale K-L expansion leads to illuminating theoretical developments including: connections between spatial aggregation error, squared prediction error, spatial variance, and a novel extension of Obled-Creutin eigenfunctions. The effectiveness of our approach is demonstrated through an analysis of two datasets, one using the American Community Survey and one related to environmental ocean winds. UR - http://arxiv.org/abs/1502.01974 IS - 1502.01974 ER - TY - JOUR T1 - Simultaneous Edit-Imputation for Continuous Microdata JF - Journal of the American Statistical Association Y1 - 2015 A1 - Kim, H. J. A1 - Cox, L. H. A1 - Karr, A. F. A1 - Reiter, J. P. A1 - Wang, Q. VL - 110 UR - http://www.tandfonline.com/doi/abs/10.1080/01621459.2015.1040881 ER - TY - JOUR T1 - Small Area Estimation via Multivariate Fay-Herriot Models With Latent Spatial Dependence JF - Australian & New Zealand Journal of Statistics Y1 - 2015 A1 - Porter, A.T. A1 - Wikle, C.K. A1 - Holan, S.H. VL - 57 UR - http://arxiv.org/abs/1310.7211 ER - TY - JOUR T1 - Spatio-temporal change of support with application to American Community Survey multi-year period estimates JF - Stat Y1 - 2015 A1 - Bradley, Jonathan R. A1 - Wikle, Christopher K. A1 - Holan, Scott H. KW - Bayesian KW - change-of-support KW - dynamical KW - hierarchical models KW - mixed-effects model KW - Moran's I KW - multi-year period estimate AB - We present hierarchical Bayesian methodology to perform spatio-temporal change of support (COS) for survey data with Gaussian sampling errors. This methodology is motivated by the American Community Survey (ACS), which is an ongoing survey administered by the US Census Bureau that provides timely information on several key demographic variables. The ACS has published 1-year, 3-year, and 5-year period estimates, and margins of errors, for demographic and socio-economic variables recorded over predefined geographies. 
The spatio-temporal COS methodology considered here provides data users with a way to estimate ACS variables on customized geographies and time periods while accounting for sampling errors. Additionally, 3-year ACS period estimates are to be discontinued, and this methodology can provide predictions of ACS variables for 3-year periods given the available period estimates. The methodology is based on a spatio-temporal mixed-effects model with a low-dimensional spatio-temporal basis function representation, which provides multi-resolution estimates through basis function aggregation in space and time. This methodology includes a novel parameterization that uses a target dynamical process and recently proposed parsimonious Moran's I propagator structures. Our approach is demonstrated through two applications using public-use ACS estimates and is shown to produce good predictions on a hold-out set of 3-year period estimates. Copyright © 2015 John Wiley & Sons, Ltd. VL - 4 UR - http://dx.doi.org/10.1002/sta4.94 ER - TY - JOUR T1 - A stochastic bioenergetics model based approach to translating large river flow and temperature in to fish population responses: the pallid sturgeon example JF - Geological Society Y1 - 2015 A1 - Wildhaber, M.L. A1 - Dey, R. A1 - Wikle, C.K. A1 - Anderson, C.J. A1 - Moran, E.H. A1 - Franz, K.J. VL - 408 ER - TY - CONF T1 - Using Machine Learning Techniques to Predict Respondent Type from A Priori Demographic Information T2 - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) Y1 - 2015 A1 - Atkin, G. A1 - Arunachalam, H. A1 - Eck, A. A1 - Wettlaufer, D. A1 - Soh, L.-K. A1 - Belli, R.F. JF - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR) CY - Hollywood, Florida UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - JOUR T1 - Agent Based Models: Statistical Challenges and Opportunities JF - Statistics Views Y1 - 2014 A1 - Wikle, C.K. 
PB - Wiley UR - http://www.statisticsviews.com/details/feature/6354691/Agent-Based-Models-Statistical-Challenges-and-Opportunities.html ER - TY - JOUR T1 - Bayesian estimation of disclosure risks for multiply imputed, synthetic data JF - Journal of Privacy and Confidentiality Y1 - 2014 A1 - Reiter, J. P. A1 - Wang, Q. A1 - Zhang, B. AB -

Agencies seeking to disseminate public use microdata, i.e., data on individual records, can replace confidential values with multiple draws from statistical models estimated with the collected data. We present a framework for evaluating disclosure risks inherent in releasing multiply-imputed, synthetic data. The basic idea is to mimic an intruder who computes posterior distributions of confidential values given the released synthetic data and prior knowledge. We illustrate the methodology with artificial fully synthetic data and with partial synthesis of the Survey of Youth in Custody.

VL - 6 UR - http://repository.cmu.edu/jpc/vol6/iss1/2 IS - 1 ER - TY - RPRT T1 - CED 2 AR: The Comprehensive Extensible Data Documentation and Access Repository Y1 - 2014 A1 - Lagoze, Carl A1 - Vilhuber, Lars A1 - Williams, Jeremy A1 - Perry, Benjamin A1 - Block, William C. AB - CED 2 AR: The Comprehensive Extensible Data Documentation and Access Repository Lagoze, Carl; Vilhuber, Lars; Williams, Jeremy; Perry, Benjamin; Block, William C. We describe the design, implementation, and deployment of the Comprehensive Extensible Data Documentation and Access Repository (CED 2 AR). This is a metadata repository system that allows researchers to search, browse, access, and cite confidential data and metadata through either a web-based user interface or programmatically through a search API, all the while reusing and linking to existing archive and provider generated metadata. CED 2 AR is distinguished from other metadata repository-based applications due to requirements that derive from its social science context. These include the need to cloak confidential data and metadata and manage complex provenance chains. Presented at 2014 IEEE/ACM Joint Conference on Digital Libraries (JCDL), Sept 8-12, 2014 PB - Cornell University UR - http://hdl.handle.net/1813/44702 ER - TY - RPRT T1 - The Cepstral Model for Multivariate Time Series: The Vector Exponential Model. Y1 - 2014 A1 - Holan, S.H. A1 - McElroy, T.S. A1 - Wu, G. AB -

Vector autoregressive (VAR) models have become a staple in the analysis of multivariate time series and are formulated in the time domain as difference equations, with an implied covariance structure. In many contexts, it is desirable to work with a stable, or at least stationary, representation. To fit such models, one must impose restrictions on the coefficient matrices to ensure that certain determinants are nonzero; which, except in special cases, may prove burdensome. To circumvent these difficulties, we propose a flexible frequency domain model expressed in terms of the spectral density matrix. Specifically, this paper treats the modeling of covariance stationary vector-valued (i.e., multivariate) time series via an extension of the exponential model for the spectrum of a scalar time series. We discuss the modeling advantages of the vector exponential model and its computational facets, such as how to obtain Wold coefficients from given cepstral coefficients. Finally, we demonstrate the utility of our approach through simulation as well as two illustrative data examples focusing on multi-step ahead forecasting and estimation of squared coherence.

PB - arXiv UR - http://arxiv.org/abs/1406.0801 ER - TY - CONF T1 - Data Quality among Devices to Complete Surveys: Comparing Personal Computers, Smartphones and Tablets T2 - Midwest Association for Public Opinion Research Annual Meeting Y1 - 2014 A1 - Wang, Mengyang A1 - McCutcheon, Allan L. JF - Midwest Association for Public Opinion Research Annual Meeting CY - Chicago, IL UR - http://www.mapor.org/conferences.html ER - TY - CHAP T1 - Disclosure risk evaluation for fully synthetic data T2 - Privacy in Statistical Databases Y1 - 2014 A1 - J. Hu A1 - J.P. Reiter A1 - Q. Wang JF - Privacy in Statistical Databases PB - Springer CY - Heidelberg VL - 8744 ER - TY - JOUR T1 - Multiple imputation of missing or faulty values under linear constraints JF - Journal of Business and Economic Statistics Y1 - 2014 A1 - Kim, H. J. A1 - Reiter, J. P. A1 - Wang, Q. A1 - Cox, L. H. A1 - Karr, A. F. AB -

Many statistical agencies, survey organizations, and research centers collect data that suffer from item nonresponse and erroneous or inconsistent values. These data may be required to satisfy linear constraints, for example, bounds on individual variables and inequalities for ratios or sums of variables. Often these constraints are designed to identify faulty values, which then are blanked and imputed. The data also may exhibit complex distributional features, including nonlinear relationships and highly nonnormal distributions. We present a fully Bayesian, joint model for modeling or imputing data with missing/blanked values under linear constraints that (i) automatically incorporates the constraints in inferences and imputations, and (ii) uses a flexible Dirichlet process mixture of multivariate normal distributions to reflect complex distributional features. Our strategy for estimation is to augment the observed data with draws from a hypothetical population in which the constraints are not present, thereby taking advantage of computationally expedient methods for fitting mixture models. Missing/blanked items are sampled from their posterior distribution using the Hit-and-Run sampler, which guarantees that all imputations satisfy the constraints. We illustrate the approach using manufacturing data from Colombia, examining the potential to preserve joint distributions and a regression from the plant productivity literature. Supplementary materials for this article are available online.

VL - 32 ER - TY - RPRT T1 - NCRN Meeting Fall 2014: Bayesian Marked Point Process Modeling for Generating Fully Synthetic Public Use Data with Point-Referenced Geography Y1 - 2014 A1 - Quick, Harrison A1 - Holan, Scott A1 - Wikle, Christopher A1 - Reiter, Jerry AB - NCRN Meeting Fall 2014: Bayesian Marked Point Process Modeling for Generating Fully Synthetic Public Use Data with Point-Referenced Geography Quick, Harrison; Holan, Scott; Wikle, Christopher; Reiter, Jerry Presentation from NCRN Fall 2014 meeting PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/37750 ER - TY - RPRT T1 - NCRN Meeting Fall 2014: Change in Visible Impervious Surface Area in Southeastern Michigan Before and After the "Great Recession" Y1 - 2014 A1 - Wilson, Courtney A1 - Brown, Daniel G. AB - NCRN Meeting Fall 2014: Change in Visible Impervious Surface Area in Southeastern Michigan Before and After the "Great Recession" Wilson, Courtney; Brown, Daniel G. Presentation at Fall 2014 NCRN meeting PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/37446 ER - TY - RPRT T1 - NCRN Meeting Fall 2014: Mixed Effects Modeling for Multivariate-Spatio-Temporal Areal Data Y1 - 2014 A1 - Bradley, Jonathan A1 - Holan, Scott A1 - Wikle, Christopher AB - NCRN Meeting Fall 2014: Mixed Effects Modeling for Multivariate-Spatio-Temporal Areal Data Bradley, Jonathan; Holan, Scott; Wikle, Christopher Presentation from NCRN Fall 2014 meeting PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/37749 ER - TY - RPRT T1 - NCRN Meeting Spring 2014: Integrating PROV with DDI: Mechanisms of Data Discovery within the U.S. Census Bureau Y1 - 2014 A1 - Block, William A1 - Brown, Warren A1 - Williams, Jeremy A1 - Vilhuber, Lars A1 - Lagoze, Carl AB - NCRN Meeting Spring 2014: Integrating PROV with DDI: Mechanisms of Data Discovery within the U.S. 
Census Bureau Block, William; Brown, Warren; Williams, Jeremy; Vilhuber, Lars; Lagoze, Carl presentation at NCRN Spring 2014 meeting PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/36392 ER - TY - RPRT T1 - NCRN Meeting Spring 2014: Summer Working Group for Employer List Linking (SWELL) Y1 - 2014 A1 - Gathright, Graton A1 - Kutzbach, Mark A1 - Mccue, Kristin A1 - McEntarfer, Erika A1 - Monti, Holly A1 - Trageser, Kelly A1 - Vilhuber, Lars A1 - Wasi, Nada A1 - Wignall, Christopher AB - NCRN Meeting Spring 2014: Summer Working Group for Employer List Linking (SWELL) Gathright, Graton; Kutzbach, Mark; Mccue, Kristin; McEntarfer, Erika; Monti, Holly; Trageser, Kelly; Vilhuber, Lars; Wasi, Nada; Wignall, Christopher Presentation for NCRN Spring 2014 meeting PB - NCRN Coordinating Office UR - http://hdl.handle.net/1813/36396 ER - TY - JOUR T1 - Spatial Fay-Herriot Models for Small Area Estimation with Functional Covariates JF - Spatial Statistics Y1 - 2014 A1 - Porter, A. T. A1 - Holan, S.H. A1 - Wikle, C.K. A1 - Cressie, N. VL - 10 UR - http://arxiv.org/pdf/1303.6668v3.pdf ER - TY - CONF T1 - Would a Privacy Fundamentalist Sell their DNA for $1000... if Nothing Bad Happened Thereafter? A Study of the Westin Categories, Behavior Intentions, and Consequences T2 - Proceedings of the Tenth Symposium on Usable Privacy and Security (SOUPS) Y1 - 2014 A1 - Woodruff, A. A1 - Pihur, V. A1 - Acquisti, A. A1 - Consolvo, S. A1 - Schmidt, L. A1 - Brandimarte, L. JF - Proceedings of the Tenth Symposium on Usable Privacy and Security (SOUPS) PB - ACM CY - New York, NY UR - https://www.usenix.org/conference/soups2014/proceedings/presentation/woodruff N1 - IAPP SOUPS Privacy Award Winner ER - TY - CONF T1 - Bayesian Modeling in the Era of Big Data: the Role of High-Throughput and High-Performance Computing T2 - The Extreme Science and Engineering Discovery Environment Conference Y1 - 2013 A1 - Wu, G.
JF - The Extreme Science and Engineering Discovery Environment Conference CY - San Diego, CA ER - TY - CONF T1 - Binomial Mixture Models for Urban Ecological Monitoring Studies Using American Community Survey Demographic Covariates T2 - Joint Statistical Meetings 2013 Y1 - 2013 A1 - Wu, G. JF - Joint Statistical Meetings 2013 CY - Montreal, Canada ER - TY - JOUR T1 - Data Management of Confidential Data JF - International Journal of Digital Curation Y1 - 2013 A1 - Carl Lagoze A1 - William C. Block A1 - Jeremy Williams A1 - John M. Abowd A1 - Lars Vilhuber AB - Social science researchers increasingly make use of data that is confidential because it contains linkages to the identities of people, corporations, etc. The value of this data lies in the ability to join the identifiable entities with external data such as genome data, geospatial information, and the like. However, the confidentiality of this data is a barrier to its utility and curation, making it difficult to fulfill US federal data management mandates and interfering with basic scholarly practices such as validation and reuse of existing results. We describe the complexity of the relationships among data that span a public and private divide. We then describe our work on the CED2AR prototype, a first step in providing researchers with a tool that spans this divide and makes it possible for them to search, access, and cite that data. VL - 8 N1 - Presented at 8th International Digital Curation Conference 2013, Amsterdam. See also http://hdl.handle.net/1813/30924 ER - TY - CONF T1 - Do ‘Don’t Know’ Responses = Survey Satisficing? Evidence from the Gallup Panel Paradata T2 - American Association for Public Opinion Research 2013 Annual Conference Y1 - 2013 A1 - Wang, Mengyang A1 - Ruppanner, Leah A1 - McCutcheon, Allan L. 
JF - American Association for Public Opinion Research 2013 Annual Conference CY - Boston, MA UR - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx ER - TY - CONF T1 - Ecological Prediction with Nonlinear Multivariate Time-Frequency Functional Data Models T2 - Joint Statistical Meetings 2013 Y1 - 2013 A1 - Wikle, C.K. JF - Joint Statistical Meetings 2013 CY - Montreal, Canada ER - TY - JOUR T1 - Ecological Prediction With Nonlinear Multivariate Time-Frequency Functional Data Models JF - Journal of Agricultural, Biological, and Environmental Statistics Y1 - 2013 A1 - Yang, W.H. A1 - Wikle, C.K. A1 - Holan, S.H. A1 - Wildhaber, M.L. VL - 18 UR - http://link.springer.com/article/10.1007/s13253-013-0142-1 ER - TY - CONF T1 - Encoding Provenance Metadata for Social Science Datasets T2 - Metadata and Semantics Research Y1 - 2013 A1 - Lagoze, Carl A1 - Williams, Jeremy A1 - Vilhuber, Lars ED - Garoufallou, Emmanouel ED - Greenberg, Jane KW - DDI KW - eSocial Science KW - Metadata KW - Provenance JF - Metadata and Semantics Research T3 - Communications in Computer and Information Science PB - Springer International Publishing VL - 390 SN - 978-3-319-03436-2 UR - http://dx.doi.org/10.1007/978-3-319-03437-9_13 ER - TY - RPRT T1 - Encoding Provenance of Social Science Data: Integrating PROV with DDI Y1 - 2013 A1 - Lagoze, Carl A1 - Block, William C A1 - Williams, Jeremy A1 - Abowd, John A1 - Vilhuber, Lars AB - Encoding Provenance of Social Science Data: Integrating PROV with DDI Lagoze, Carl; Block, William C; Williams, Jeremy; Abowd, John; Vilhuber, Lars Provenance is a key component of evaluating the integrity and reusability of data for scholarship. While recording and providing access provenance has always been important, it is even more critical in the web environment in which data from distributed sources and of varying integrity can be combined and derived.
The PROV model, developed under the auspices of the W3C, is a foundation for semantically-rich, interoperable, and web-compatible provenance metadata. We report on the results of our experimentation with integrating the PROV model into the DDI metadata for a complex, but characteristic, example social science data. We also present some preliminary thinking on how to visualize those graphs in the user interface. Submitted to EDDI13 5th Annual European DDI User Conference December 2013, Paris, France PB - Cornell University UR - http://hdl.handle.net/1813/34443 ER - TY - CONF T1 - Encoding Provenance of Social Science Data: Integrating PROV with DDI T2 - 5th Annual European DDI User Conference Y1 - 2013 A1 - Carl Lagoze A1 - William C. Block A1 - Jeremy Williams A1 - Lars Vilhuber KW - DDI KW - eSocial Science KW - Metadata KW - Provenance AB - Provenance is a key component of evaluating the integrity and reusability of data for scholarship. While recording and providing access provenance has always been important, it is even more critical in the web environment in which data from distributed sources and of varying integrity can be combined and derived. The PROV model, developed under the auspices of the W3C, is a foundation for semantically-rich, interoperable, and web-compatible provenance metadata. We report on the results of our experimentation with integrating the PROV model into the DDI metadata for a complex, but characteristic, example social science data. We also present some preliminary thinking on how to visualize those graphs in the user interface. JF - 5th Annual European DDI User Conference ER - TY - JOUR T1 - From Facebook Regrets to Facebook Privacy Nudges JF - Ohio State Law Journal Y1 - 2013 A1 - Wang, Y. A1 - Leon, P. G. A1 - Chen, X. A1 - Komanduri, S. A1 - Norcie, G. A1 - Scott, K. A1 - Acquisti, A. A1 - Cranor, L. F. A1 - Sadeh, N. 
N1 - Invited paper ER - TY - JOUR T1 - Hierarchical Bayesian Spatio-Temporal Conway-Maxwell Poisson Models with Dynamic Dispersion JF - Journal of Agricultural, Biological, and Environmental Statistics Y1 - 2013 A1 - Wu, G. A1 - Holan, S.H. A1 - Wikle, C.K. CY - Anchorage, Alaska VL - 18 UR - http://link.springer.com/article/10.1007/s13253-013-0141-2 ER - TY - JOUR T1 - Hierarchical Spatio-Temporal Models and Survey Research JF - Statistics Views Y1 - 2013 A1 - Wikle, C. A1 - Holan, S. A1 - Cressie, N. UR - http://www.statisticsviews.com/details/feature/4730991/Hierarchical-Spatio-Temporal-Models-and-Survey-Research.html ER - TY - ABST T1 - How can survey estimates of small areas be improved by leveraging social-media data? Y1 - 2013 A1 - Cressie, N. A1 - Holan, S. A1 - Wikle, C. JF - The Survey Statistician UR - http://isi.cbs.nl/iass/N68.pdf ER - TY - RPRT T1 - Improving User Access to Metadata for Public and Restricted Use US Federal Statistical Files Y1 - 2013 A1 - Block, William C. A1 - Williams, Jeremy A1 - Vilhuber, Lars A1 - Lagoze, Carl A1 - Brown, Warren A1 - Abowd, John M. AB - Improving User Access to Metadata for Public and Restricted Use US Federal Statistical Files Block, William C.; Williams, Jeremy; Vilhuber, Lars; Lagoze, Carl; Brown, Warren; Abowd, John M. Presentation at NADDI 2013 This record has also been archived at http://kuscholarworks.ku.edu/dspace/handle/1808/11093 . 
PB - Cornell University UR - http://hdl.handle.net/1813/33362 ER - TY - RPRT T1 - Managing Confidentiality and Provenance across Mixed Private and Publicly-Accessed Data and Metadata Y1 - 2013 A1 - Vilhuber, Lars A1 - Abowd, John A1 - Block, William A1 - Lagoze, Carl A1 - Williams, Jeremy AB - Managing Confidentiality and Provenance across Mixed Private and Publicly-Accessed Data and Metadata Vilhuber, Lars; Abowd, John; Block, William; Lagoze, Carl; Williams, Jeremy Social science researchers are increasingly interested in making use of confidential micro-data that contains linkages to the identities of people, corporations, etc. The value of this linking lies in the potential to join these identifiable entities with external data such as genome data, geospatial information, and the like. Leveraging these linkages is an essential aspect of “big data” scholarship. However, the utility of these confidential data for scholarship is compromised by the complex nature of their management and curation. This makes it difficult to fulfill US federal data management mandates and interferes with basic scholarly practices such as validation and reuse of existing results. We describe in this paper our work on the CED2AR prototype, a first step in providing researchers with a tool that spans the confidential/publicly-accessible divide, making it possible for researchers to identify, search, access, and cite those data. The particular points of interest in our work are the cloaking of metadata fields and the expression of provenance chains. For the former, we make use of existing fields in the DDI (Data Documentation Initiative) specification and suggest some minor changes to the specification. For the latter problem, we investigate the integration of DDI with recent work by the W3C PROV working group that has developed a generalizable and extensible model for expressing data provenance.
PB - Cornell University UR - http://hdl.handle.net/1813/34534 ER - TY - CONF T1 - Nonlinear Dynamic Spatio-Temporal Statistical Models T2 - Southern Regional Council on Statistics Summer Research Conference Y1 - 2013 A1 - Wikle, C.K. JF - Southern Regional Council on Statistics Summer Research Conference ER - TY - CHAP T1 - Spatio-temporal Design: Advances in Efficient Data Acquisition T2 - Spatio-temporal Design: Advances in Efficient Data Acquisition Y1 - 2013 A1 - Holan, S. A1 - Wikle, C. ED - Jorge Mateu ED - Werner Muller KW - semiparametric dynamic design for non-Gaussian spatio-temporal data JF - Spatio-temporal Design: Advances in Efficient Data Acquisition PB - Wiley SN - 9780470974292 ER - TY - ABST T1 - Statistics and the Environment: Overview and Challenges Y1 - 2013 A1 - Wikle, C.K. N1 - Invited Introductory Overview Lecture ER - TY - THES T1 - Using Satellite Imagery to Evaluate and Analyze Socioeconomic Changes Observed with Census Data Y1 - 2013 A1 - Wilson, C. R. N1 - NCRN ER - TY - JOUR T1 - An Approach for Identifying and Predicting Economic Recessions in Real-Time Using Time-Frequency Functional Models JF - Applied Stochastic Models in Business and Industry Y1 - 2012 A1 - Holan, S. A1 - Yang, W. A1 - Matteson, D. A1 - Wikle, C.K. KW - Bayesian model averaging KW - business cycles KW - empirical orthogonal functions KW - functional data KW - MIDAS KW - spectrogram KW - stochastic search variable selection VL - 28 UR - http://onlinelibrary.wiley.com/doi/10.1002/asmb.1954/full N1 - DOI: 10.1002/asmb.1954 ER - TY - JOUR T1 - Bayesian Multi-Regime Smooth Transition Regression with Ordered Categorical Variables JF - Computational Statistics and Data Analysis Y1 - 2012 A1 - Wang, J. A1 - Holan, S. VL - 56 UR - http://dx.doi.org/10.1016/j.csda.2012.04.018 N1 - http://dx.doi.org/10.1016/j.csda.2012.04.018 ER - TY - CONF T1 - Change of Support in Spatio-Temporal Dynamical Models T2 - Joint Statistical Meetings Y1 - 2012 A1 - Wikle, C.K. 
JF - Joint Statistical Meetings CY - Montreal, Canada ER - TY - RPRT T1 - Data Management of Confidential Data Y1 - 2012 A1 - Lagoze, Carl A1 - Block, William C. A1 - Williams, Jeremy A1 - Abowd, John M. A1 - Vilhuber, Lars AB - Data Management of Confidential Data Lagoze, Carl; Block, William C.; Williams, Jeremy; Abowd, John M.; Vilhuber, Lars Social science researchers increasingly make use of data that is confidential because it contains linkages to the identities of people, corporations, etc. The value of this data lies in the ability to join the identifiable entities with external data such as genome data, geospatial information, and the like. However, the confidentiality of this data is a barrier to its utility and curation, making it difficult to fulfill US federal data management mandates and interfering with basic scholarly practices such as validation and reuse of existing results. We describe the complexity of the relationships among data that span a public and private divide. We then describe our work on the CED2AR prototype, a first step in providing researchers with a tool that spans this divide and makes it possible for them to search, access, and cite that data. PB - Cornell University UR - http://hdl.handle.net/1813/30924 ER - TY - RPRT T1 - An Early Prototype of the Comprehensive Extensible Data Documentation and Access Repository (CED2AR) Y1 - 2012 A1 - Block, William C. A1 - Williams, Jeremy A1 - Abowd, John M. 
A1 - Vilhuber, Lars A1 - Lagoze, Carl AB - An Early Prototype of the Comprehensive Extensible Data Documentation and Access Repository (CED2AR) Block, William C.; Williams, Jeremy; Abowd, John M.; Vilhuber, Lars; Lagoze, Carl This presentation will demonstrate the latest DDI-related technological developments of Cornell University’s $3 million NSF-Census Research Network (NCRN) award, dedicated to improving the documentation, discoverability, and accessibility of public and restricted data from the federal statistical system in the United States. The current internal name for our DDI-based system is the Comprehensive Extensible Data Documentation and Access Repository (CED²AR). CED²AR ingests metadata from heterogeneous sources and supports filtered synchronization between restricted and public metadata holdings. Currently-supported CED²AR “connector workflows” include mechanisms to ingest IPUMS, zero-observation files from the American Community Survey (DDI 2.1), and SIPP Synthetic Beta (DDI 1.2). These disparate metadata sources are all transformed into a DDI 2.5 compliant form and stored in a single repository. In addition, we will demonstrate an extension to DDI 2.5 that allows for the labeling of elements within the schema to indicate confidentiality. This metadata can then be filtered, allowing the creation of derived public use metadata from an original confidential source. This repository is currently searchable online through a prototype application demonstrating the ability to search across previously heterogeneous metadata sources. 
Presentation at the 4th Annual European DDI User Conference (EDDI12), Norwegian Social Science Data Services, Bergen, Norway, 3 December, 2012 PB - Cornell University UR - http://hdl.handle.net/1813/30922 ER - TY - CONF T1 - The Economics of Privacy T2 - The Oxford Handbook of the Digital Economy Y1 - 2012 A1 - Laura Brandimarte A1 - Alessandro Acquisti ED - Martin Peitz ED - Joel Waldfogel JF - The Oxford Handbook of the Digital Economy PB - Oxford University Press SN - 9780195397840 ER - TY - ABST T1 - Efficient Time-Frequency Representations in High-Dimensional Spatial and Spatio-Temporal Models Y1 - 2012 A1 - Wikle, C.K. ER - TY - RPRT T1 - Encoding Provenance Metadata for Social Science Datasets Y1 - 2012 A1 - Lagoze, Carl A1 - Williams, Jeremy A1 - Vilhuber, Lars AB - Encoding Provenance Metadata for Social Science Datasets Lagoze, Carl; Williams, Jeremy; Vilhuber, Lars Recording provenance is a key requirement for data-centric scholarship, allowing researchers to evaluate the integrity of source data sets and reproduce, and thereby, validate results. Provenance has become even more critical in the web environment in which data from distributed sources and of varying integrity can be combined and derived. Recent work by the W3C on the PROV model provides the foundation for semantically-rich, interoperable, and web-compatible provenance metadata. We apply that model to complex, but characteristic, provenance examples of social science data, describe scenarios that make scholarly use of those provenance descriptions, and propose a manner for encoding this provenance metadata within the widely-used DDI metadata standard. Submitted to Metadata and Semantics Research (MTSR 2013) conference. PB - Cornell University UR - http://hdl.handle.net/1813/55327 ER - TY - CHAP T1 - Entropy Estimations Using Correlated Symmetric Stable Random Projections T2 - Advances in Neural Information Processing Systems 25 Y1 - 2012 A1 - Ping Li A1 - Cun-Hui Zhang ED - P.
Bartlett ED - F.C.N. Pereira ED - C.J.C. Burges ED - L. Bottou ED - K.Q. Weinberger JF - Advances in Neural Information Processing Systems 25 UR - http://books.nips.cc/papers/files/nips25/NIPS2012_1456.pdf ER - TY - CONF T1 - Exploring interviewer and respondent interactions: An innovative behavior coding approach T2 - Midwest Association for Public Opinion Research 2012 Annual Conference Y1 - 2012 A1 - Walton, L. A1 - Stange, M. A1 - Powell, R. A1 - Belli, R.F. JF - Midwest Association for Public Opinion Research 2012 Annual Conference CY - Chicago, IL UR - http://www.mapor.org/conferences.html ER - TY - CONF T1 - Hierarchical General Quadratic Nonlinear Models for Spatio-Temporal Dynamics T2 - Red Raider Conference Y1 - 2012 A1 - Wikle, C.K. JF - Red Raider Conference PB - Texas Tech University CY - Lubbock, TX ER - TY - RPRT T1 - The NSF-Census Research Network: Cornell Node Y1 - 2012 A1 - Block, William C. A1 - Lagoze, Carl A1 - Vilhuber, Lars A1 - Brown, Warren A. A1 - Williams, Jeremy A1 - Arguillas, Florio AB - The NSF-Census Research Network: Cornell Node Block, William C.; Lagoze, Carl; Vilhuber, Lars; Brown, Warren A.; Williams, Jeremy; Arguillas, Florio Cornell University has received a $3M NSF-Census Research Network (NCRN) award to improve the documentation and discoverability of both public and restricted data from the federal statistical system. The current internal name for this project is the Comprehensive Extensible Data Documentation and Access Repository (CED²AR). The diagram to the right provides a high level architectural overview of the system to be implemented. The CED²AR will be based upon leading metadata standards such as the Data Documentation Initiative (DDI) and Statistical Data and Metadata eXchange (SDMX) and be flexibly designed to ingest documentation from a variety of source files. It will permit synchronization between the public and confidential instances of the repository. 
The scholarly community will be able to use the CED²AR as it would a conventional metadata repository, deprived only of the values of certain confidential information, but not their metadata. The authorized user, working on the secure Census Bureau network, could use the CED²AR with full information in authorized domains. PB - Cornell University UR - http://hdl.handle.net/1813/30925 ER - TY - CHAP T1 - One Permutation Hashing T2 - Advances in Neural Information Processing Systems 25 Y1 - 2012 A1 - Ping Li A1 - Art Owen A1 - Cun-Hui Zhang ED - P. Bartlett ED - F.C.N. Pereira ED - C.J.C. Burges ED - L. Bottou ED - K.Q. Weinberger JF - Advances in Neural Information Processing Systems 25 UR - http://books.nips.cc/papers/files/nips25/NIPS2012_1436.pdf ER - TY - JOUR T1 - Rejoinder: An approach for identifying and predicting economic recessions in real time using time frequency functional models JF - Applied Stochastic Models in Business and Industry Y1 - 2012 A1 - Holan, S. A1 - Yang, W. A1 - Matteson, D. A1 - Wikle, C. VL - 28 UR - http://onlinelibrary.wiley.com/doi/10.1002/asmb.1955/full ER - TY - CHAP T1 - Semiparametric Dynamic Design of Monitoring Networks for Non-Gaussian Spatio-Temporal Data T2 - Spatio-temporal Design: Advances in Efficient Data Acquisition Y1 - 2012 A1 - Holan, S. A1 - Wikle, C.K. ED - Jorge Mateu ED - Werner Muller JF - Spatio-temporal Design: Advances in Efficient Data Acquisition PB - Wiley CY - Chichester, UK UR - http://onlinelibrary.wiley.com/doi/10.1002/9781118441862.ch12/summary ER - TY - ABST T1 - Spatio-Temporal Statistics at Mizzou, Truman School of Public Affairs Y1 - 2012 A1 - Wikle, C.K. ER - TY - JOUR T1 - An ensemble quadratic echo state network for nonlinear spatio-temporal forecasting JF - Stat Y1 - 0 A1 - McDermott, P.L. A1 - Wikle, C.K. AB - Spatio-temporal data and processes are prevalent across a wide variety of scientific disciplines. 
These processes are often characterized by nonlinear time dynamics that include interactions across multiple scales of spatial and temporal variability. The data sets associated with many of these processes are increasing in size due to advances in automated data measurement, management, and numerical simulator output. Nonlinear spatio-temporal models have only recently seen interest in statistics, but there are many classes of such models in the engineering and geophysical sciences. Traditionally, these models are more heuristic than those that have been presented in the statistics literature, but are often intuitive and quite efficient computationally. We show here that with fairly simple, but important, enhancements, the echo state network (ESN) machine learning approach can be used to generate long-lead forecasts of nonlinear spatio-temporal processes, with reasonable uncertainty quantification, and at only a fraction of the computational expense of traditional parametric nonlinear spatio-temporal models. UR - https://arxiv.org/abs/1708.05094 ER -