TY  - JOUR
T1  - Imputation in U.S. Manufacturing Data and Its Implications for Productivity Dispersion
JF  - Review of Economics and Statistics
Y1  - Submitted
A1  - T. Kirk White
A1  - Jerome P. Reiter
A1  - Amil Petrin
AB  - In the U.S. Census Bureau's 2002 and 2007 Censuses of Manufactures 79% and 73% of observations respectively have imputed data for at least one variable used to compute total factor productivity. The Bureau primarily imputes for missing values using mean-imputation methods which can reduce the true underlying variance of the imputed variables. For every variable entering TFP in 2002 and 2007 we show the dispersion is significantly smaller in the Census mean-imputed versus the Census non-imputed data. As an alternative to mean imputation we show how to use classification and regression trees (CART) to allow for a distribution of multiple possible impute values based on other plants that are CART-algorithmically determined to be similar based on other observed variables. For 90% of the 473 industries in 2002 and 84% of the 471 industries in 2007 we find that TFP dispersion increases as we move from Census mean-imputed data to Census non-imputed data to the CART-imputed data.
UR  - http://www.mitpressjournals.org/doi/abs/10.1162/REST_a_00678
ER  - 

TY  - JOUR
T1  - The Earned Income Tax Credit and Food Insecurity: Who Benefits?
Y1  - forthcoming
A1  - Shaefer, H.L.
A1  - Wilson, R.
ER  - 

TY  - JOUR
T1  - Adaptively-Tuned Particle Swarm Optimization with Application to Spatial Design
JF  - Stat
Y1  - 2017
A1  - Simpson, M.
A1  - Wikle, C.K.
A1  - Holan, S.H.
AB  - Particle swarm optimization (PSO) algorithms are a class of heuristic optimization algorithms that are attractive for complex optimization problems. We propose using PSO to solve spatial design problems, e.g. choosing new locations to add to an existing monitoring network. Additionally, we introduce two new classes of PSO algorithms that perform well in a wide variety of circumstances, called adaptively tuned PSO and adaptively tuned bare bones PSO. To illustrate these algorithms, we apply them to a common spatial design problem: choosing new locations to add to an existing monitoring network. Specifically, we consider a network in the Houston, TX, area for monitoring ambient ozone levels, which have been linked to out-of-hospital cardiac arrest rates.
VL  - 6
UR  - http://onlinelibrary.wiley.com/doi/10.1002/sta4.142/abstract
IS  - 1
ER  - 

TY  - JOUR
T1  - Bayesian Hierarchical Multi-Population Multistate Jolly-Seber Models with Covariates: Application to the Pallid Sturgeon Population Assessment Program
JF  - Journal of the American Statistical Association
Y1  - 2017
A1  - Wu, G.
A1  - Holan, S.H.
AB  - Estimating abundance for multiple populations is of fundamental importance to many ecological monitoring programs. Equally important is quantifying the spatial distribution and characterizing the migratory behavior of target populations within the study domain. To achieve these goals, we propose a Bayesian hierarchical multi-population multistate Jolly–Seber model that incorporates covariates. The model is proposed using a state-space framework and has several distinct advantages. First, multiple populations within the same study area can be modeled simultaneously. As a consequence, it is possible to achieve improved parameter estimation by “borrowing strength” across different populations. In many cases, such as our motivating example involving endangered species, this borrowing of strength is crucial, as there is relatively less information for one of the populations under consideration. Second, in addition to accommodating covariate information, we develop a computationally efficient Markov chain Monte Carlo algorithm that requires no tuning. Importantly, the model we propose allows us to draw inference on each population as well as on multiple populations simultaneously. Finally, we demonstrate the effectiveness of our method through a motivating example of estimating the spatial distribution and migration of hatchery and wild populations of the endangered pallid sturgeon (Scaphirhynchus albus), using data from the Pallid Sturgeon Population Assessment Program on the Lower Missouri River. Supplementary materials for this article are available online.
VL  - 112
UR  - http://www.tandfonline.com/doi/abs/10.1080/01621459.2016.1211531
IS  - 518
ER  - 

TY  - JOUR
T1  - The Cepstral Model for Multivariate Time Series: The Vector Exponential Model
JF  - Statistica Sinica
Y1  - 2017
A1  - Holan, S.H.
A1  - McElroy, T.S.
A1  - Wu, G.
KW  - Autocovariance matrix
KW  - Bayesian estimation
KW  - Cepstral
KW  - Coherence
KW  - Spectral density matrix
KW  - stochastic search variable selection
KW  - Wold coefficients
AB  - Vector autoregressive (VAR) models have become a staple in the analysis of multivariate time series and are formulated in the time domain as difference equations, with an implied covariance structure. In many contexts, it is desirable to work with a stable, or at least stationary, representation. To fit such models, one must impose restrictions on the coefficient matrices to ensure that certain determinants are nonzero; which, except in special cases, may prove burdensome. To circumvent these difficulties, we propose a flexible frequency domain model expressed in terms of the spectral density matrix. Specifically, this paper treats the modeling of covariance stationary vector-valued (i.e., multivariate) time series via an extension of the exponential model for the spectrum of a scalar time series. We discuss the modeling advantages of the vector exponential model and its computational facets, such as how to obtain Wold coefficients from given cepstral coefficients. Finally, we demonstrate the utility of our approach through simulation as well as two illustrative data examples focusing on multi-step ahead forecasting and estimation of squared coherence.
VL  - 27
UR  - http://www3.stat.sinica.edu.tw/statistica/J27N1/J27N12/J27N12.html
ER  - 

TY  - RPRT
T1  - Computationally Efficient Multivariate Spatio-Temporal Models for High-Dimensional Count-Valued Data. (With Discussion).
Y1  - 2017
A1  - Bradley, J.R.
A1  - Holan, S.H.
A1  - Wikle, C.K.
KW  - Aggregation
KW  - American Community Survey
KW  - Bayesian hierarchical model
KW  - Big Data
KW  - Longitudinal Employer-Household Dynamics (LEHD) program
KW  - Markov chain Monte Carlo
KW  - Non-Gaussian
KW  - Quarterly Workforce Indicators
AB  - We introduce a Bayesian approach for multivariate spatio-temporal prediction for high-dimensional count-valued data. Our primary interest is when there are possibly millions of data points referenced over different variables, geographic regions, and times. This problem requires extensive methodological advancements, as jointly modeling correlated data of this size leads to the so-called "big n problem." The computational complexity of prediction in this setting is further exacerbated by acknowledging that count-valued data are naturally non-Gaussian. Thus, we develop a new computationally efficient distribution theory for this setting. In particular, we introduce a multivariate log-gamma distribution and provide substantial theoretical development including: results regarding conditional distributions, marginal distributions, an asymptotic relationship with the multivariate normal distribution, and full-conditional distributions for a Gibbs sampler. To incorporate dependence between variables, regions, and time points, a multivariate spatio-temporal mixed effects model (MSTM) is used. The results in this manuscript are extremely general, and can be used for data that exhibit fewer sources of dependency than what we consider (e.g., multivariate, spatial-only, or spatio-temporal-only data). Hence, the implications of our modeling framework may have a large impact on the general problem of jointly modeling correlated count-valued data. We show the effectiveness of our approach through a simulation study. Additionally, we demonstrate our proposed methodology with an important application analyzing data obtained from the Longitudinal Employer-Household Dynamics (LEHD) program, which is administered by the U.S. Census Bureau.
JF  - arXiv
UR  - https://arxiv.org/abs/1512.07273
ER  - 

TY  - JOUR
T1  - Dirichlet Process Mixture Models for Modeling and Generating Synthetic Versions of Nested Categorical Data
JF  - Bayesian Analysis
Y1  - 2017
A1  - Hu, Jingchen
A1  - Reiter, Jerome P
A1  - Wang, Quanli
AB  - We present a Bayesian model for estimating the joint distribution of multivariate categorical data when units are nested within groups. Such data arise frequently in social science settings, for example, people living in households. The model assumes that (i) each group is a member of a group-level latent class, and (ii) each unit is a member of a unit-level latent class nested within its group-level latent class. This structure allows the model to capture dependence among units in the same group. It also facilitates simultaneous modeling of variables at both group and unit levels. We develop a version of the model that assigns zero probability to groups and units with physically impossible combinations of variables. We apply the model to estimate multivariate relationships in a subset of the American Community Survey. Using the estimated model, we generate synthetic household data that could be disseminated as redacted public use files. Supplementary materials (Hu et al., 2017) for this article are available online.
UR  - http://projecteuclid.org/euclid.ba/1485227030
ER  - 

TY  - RPRT
T1  - Effects of a Government-Academic Partnership: Has the NSF-Census Bureau Research Network Helped Secure the Future of the Federal Statistical System?
Y1  - 2017
A1  - Weinberg, Daniel
A1  - Abowd, John M.
A1  - Belli, Robert F.
A1  - Cressie, Noel
A1  - Folch, David C.
A1  - Holan, Scott H.
A1  - Levenstein, Margaret C.
A1  - Olson, Kristen M.
A1  - Reiter, Jerome P.
A1  - Shapiro, Matthew D.
A1  - Smyth, Jolene
A1  - Soh, Leen-Kiat
A1  - Spencer, Bruce
A1  - Spielman, Seth E.
A1  - Vilhuber, Lars
A1  - Wikle, Christopher
AB  - The National Science Foundation-Census Bureau Research Network (NCRN) was established in 2011 to create interdisciplinary research nodes on methodological questions of interest and significance to the broader research community and to the Federal Statistical System (FSS), particularly the Census Bureau. The activities to date have covered both fundamental and applied statistical research and have focused at least in part on the training of current and future generations of researchers in skills of relevance to surveys and alternative measurement of economic units, households, and persons. This paper discusses some of the key research findings of the eight nodes, organized into six topics: (1) Improving census and survey data collection methods; (2) Using alternative sources of data; (3) Protecting privacy and confidentiality by improving disclosure avoidance; (4) Using spatial and spatio-temporal statistical modeling to improve estimates; (5) Assessing data cost and quality tradeoffs; and (6) Combining information from multiple sources. It also reports on collaborations across nodes and with federal agencies, new software developed, and educational activities and outcomes. The paper concludes with an evaluation of the ability of the FSS to apply the NCRN’s research outcomes and suggests some next steps, as well as the implications of this research-network model for future federal government renewal initiatives.
This paper began as a May 8, 2015 presentation to the National Academies of Science’s Committee on National Statistics by two of the principal investigators of the National Science Foundation-Census Bureau Research Network (NCRN) – John Abowd and the late Steve Fienberg (Carnegie Mellon University). The authors acknowledge the contributions of the other principal investigators of the NCRN who are not co-authors of the paper (William Block, William Eddy, Alan Karr, Charles Manski, Nicholas Nagle, and Rebecca Nugent), the co-principal investigators, and the comments of Patrick Cantwell, Constance Citro, Adam Eck, Brian Harris-Kojetin, and Eloise Parker. We note with sorrow the deaths of Stephen Fienberg and Allan McCutcheon, two of the original NCRN principal investigators. The principal investigators also wish to acknowledge Cheryl Eavey’s sterling grant administration on behalf of the NSF. The conclusions reached in this paper are not the responsibility of the National Science Foundation (NSF), the Census Bureau, or any of the institutions to which the authors belong.
PB  - NCRN Coordinating Office
UR  - http://hdl.handle.net/1813/52650
ER  - 

TY  - RPRT
T1  - Formal Privacy Models and Title 13
Y1  - 2017
A1  - Nissim, Kobbi
A1  - Gasser, Urs
A1  - Smith, Adam
A1  - Vadhan, Salil
A1  - O'Brien, David
A1  - Wood, Alexandra
AB  - A new collaboration between academia and the Census Bureau to further the Bureau’s use of formal privacy models.
PB  - NCRN Coordinating Office
UR  - http://hdl.handle.net/1813/52164
ER  - 

TY  - RPRT
T1  - NCRN Meeting Spring 2017: Formal Privacy Models and Title 13
Y1  - 2017
A1  - Nissim, Kobbi
A1  - Gasser, Urs
A1  - Smith, Adam
A1  - Vadhan, Salil
A1  - O'Brien, David
A1  - Wood, Alexandra
AB  - A new collaboration between academia and the Census Bureau to further the Bureau’s use of formal privacy models.
PB  - NCRN Coordinating Office
UR  - http://hdl.handle.net/1813/52164
ER  - 

TY  - JOUR
T1  - Regionalization of Multiscale Spatial Processes using a Criterion for Spatial Aggregation Error
JF  - Journal of the Royal Statistical Society - Series B
Y1  - 2017
A1  - Bradley, J.R.
A1  - Wikle, C.K.
A1  - Holan, S.H.
KW  - American Community Survey
KW  - empirical orthogonal functions
KW  - MAUP
KW  - Reduced rank
KW  - Spatial basis functions
KW  - Survey data
AB  - The modifiable areal unit problem and the ecological fallacy are known problems that occur when modeling multiscale spatial processes. We investigate how these forms of spatial aggregation error can guide a regionalization over a spatial domain of interest. By "regionalization" we mean a specification of geographies that define the spatial support for areal data. This topic has been studied vigorously by geographers, but has been given less attention by spatial statisticians. Thus, we propose a criterion for spatial aggregation error (CAGE), which we minimize to obtain an optimal regionalization. To define CAGE we draw a connection between spatial aggregation error and a new multiscale representation of the Karhunen-Loeve (K-L) expansion. This relationship between CAGE and the multiscale K-L expansion leads to illuminating theoretical developments including: connections between spatial aggregation error, squared prediction error, spatial variance, and a novel extension of Obled-Creutin eigenfunctions. The effectiveness of our approach is demonstrated through an analysis of two datasets, one using the American Community Survey and one related to environmental ocean winds.
UR  - https://arxiv.org/abs/1502.01974
ER  - 

TY  - JOUR
T1  - Visualizing uncertainty in areal data estimates with bivariate choropleth maps, map pixelation, and glyph rotation
JF  - Stat
Y1  - 2017
A1  - Lucchesi, L.R.
A1  - Wikle, C.K.
AB  - In statistics, we quantify uncertainty to help determine the accuracy of estimates, yet this crucial piece of information is rarely included on maps visualizing areal data estimates. We develop and present three approaches to include uncertainty on maps: (1) the bivariate choropleth map repurposed to visualize uncertainty; (2) the pixelation of counties to include values within an estimate's margin of error; and (3) the rotation of a glyph, located at a county's centroid, to represent an estimate's uncertainty. The second method is presented as both a static map and visuanimation. We use American Community Survey estimates and their corresponding margins of error to demonstrate the methods and highlight the importance of visualizing uncertainty in areal data. An extensive online supplement provides the R code necessary to produce the maps presented in this article as well as alternative versions of them.
VL  - 6
UR  - http://onlinelibrary.wiley.com/doi/10.1002/sta4.150/abstract
IS  - 1
ER  - 

TY  - JOUR
T1  - Bayesian Hierarchical Models with Conjugate Full-Conditional Distributions for Dependent Data from the Natural Exponential Family
JF  - Journal of the American Statistical Association - T&M
Y1  - 2016
A1  - Bradley, J.R.
A1  - Holan, S.H.
A1  - Wikle, C.K.
AB  - We introduce a Bayesian approach for analyzing (possibly) high-dimensional dependent data that are distributed according to a member from the natural exponential family of distributions. This problem requires extensive methodological advancements, as jointly modeling high-dimensional dependent data leads to the so-called "big n problem." The computational complexity of the "big n problem" is further exacerbated when allowing for non-Gaussian data models, as is the case here. Thus, we develop new computationally efficient distribution theory for this setting. In particular, we introduce something we call the "conjugate multivariate distribution," which is motivated by the univariate distribution introduced in Diaconis and Ylvisaker (1979). Furthermore, we provide substantial theoretical and methodological development including: results regarding conditional distributions, an asymptotic relationship with the multivariate normal distribution, conjugate prior distributions, and full-conditional distributions for a Gibbs sampler. The results in this manuscript are extremely general, and can be adapted to many different settings. We demonstrate the proposed methodology through simulated examples and analyses based on estimates obtained from the US Census Bureau's American Community Survey (ACS).
UR  - https://arxiv.org/abs/1701.07506
ER  - 

TY  - JOUR
T1  - Bayesian Lattice Filters for Time-Varying Autoregression and Time-Frequency Analysis
JF  - Bayesian Analysis
Y1  - 2016
A1  - Yang, W.H.
A1  - Holan, S.H.
A1  - Wikle, C.K.
AB  - Modeling nonstationary processes is of paramount importance to many scientific disciplines including environmental science, ecology, and finance, among others. Consequently, flexible methodology that provides accurate estimation across a wide range of processes is a subject of ongoing interest. We propose a novel approach to model-based time-frequency estimation using time-varying autoregressive models. In this context, we take a fully Bayesian approach and allow both the autoregressive coefficients and innovation variance to vary over time. Importantly, our estimation method uses the lattice filter and is cast within the partial autocorrelation domain. The marginal posterior distributions are of standard form and, as a convenient by-product of our estimation method, our approach avoids undesirable matrix inversions. As such, estimation is extremely computationally efficient and stable. To illustrate the effectiveness of our approach, we conduct a comprehensive simulation study that compares our method with other competing methods and find that, in most cases, our approach performs superior in terms of average squared error between the estimated and true time-varying spectral density. Lastly, we demonstrate our methodology through three modeling applications; namely, insect communication signals, environmental data (wind components), and macroeconomic data (US gross domestic product (GDP) and consumption).
UR  - https://arxiv.org/abs/1408.2757
ER  - 

TY  - JOUR
T1  - Bayesian Spatial Change of Support for Count-Valued Survey Data with Application to the American Community Survey
JF  - Journal of the American Statistical Association
Y1  - 2016
A1  - Bradley, J.R.
A1  - Wikle, C.K.
A1  - Holan, S.H.
AB  - We introduce Bayesian spatial change of support methodology for count-valued survey data with known survey variances. Our proposed methodology is motivated by the American Community Survey (ACS), an ongoing survey administered by the U.S. Census Bureau that provides timely information on several key demographic variables. Specifically, the ACS produces 1-year, 3-year, and 5-year "period-estimates," and corresponding margins of errors, for published demographic and socio-economic variables recorded over predefined geographies within the United States. Despite the availability of these predefined geographies it is often of interest to data users to specify customized user-defined spatial supports. In particular, it is useful to estimate demographic variables defined on "new" spatial supports in "real-time." This problem is known as spatial change of support (COS), which is typically performed under the assumption that the data follows a Gaussian distribution. However, count-valued survey data is naturally non-Gaussian and, hence, we consider modeling these data using a Poisson distribution. Additionally, survey-data are often accompanied by estimates of error, which we incorporate into our analysis. We interpret Poisson count-valued data in small areas as an aggregation of events from a spatial point process. This approach provides us with the flexibility necessary to allow ACS users to consider a variety of spatial supports in "real-time." We demonstrate the effectiveness of our approach through a simulated example as well as through an analysis using public-use ACS data.
UR  - https://arxiv.org/abs/1405.7227
ER  - 

TY  - JOUR
T1  - Generating Partially Synthetic Geocoded Public Use Data with Decreased Disclosure Risk Using Differential Smoothing
JF  - Journal of the Royal Statistical Society - Series A
Y1  - 2016
A1  - Quick, H.
A1  - Holan, S.H.
A1  - Wikle, C.K.
AB  - When collecting geocoded confidential data with the intent to disseminate, agencies often resort to altering the geographies prior to making data publicly available due to data privacy obligations. An alternative to releasing aggregated and/or perturbed data is to release multiply-imputed synthetic data, where sensitive values are replaced with draws from statistical models designed to capture important distributional features in the collected data. One issue that has received relatively little attention, however, is how to handle spatially outlying observations in the collected data, as common spatial models often have a tendency to overfit these observations. The goal of this work is to bring this issue to the forefront and propose a solution, which we refer to as "differential smoothing." After implementing our method on simulated data, highlighting the effectiveness of our approach under various scenarios, we illustrate the framework using data consisting of sale prices of homes in San Francisco.
UR  - https://arxiv.org/abs/1507.05529
ER  - 

TY  - JOUR
T1  - Multivariate Spatio-Temporal Survey Fusion with Application to the American Community Survey and Local Area Unemployment Statistics
JF  - Stat
Y1  - 2016
A1  - Bradley, J.R.
A1  - Holan, S.H.
A1  - Wikle, C.K.
AB  - There are often multiple surveys available that estimate and report related demographic variables of interest that are referenced over space and/or time. Not all surveys produce the same information, and thus, combining these surveys typically leads to higher quality estimates. That is, not every survey has the same level of precision nor do they always provide estimates of the same variables. In addition, various surveys often produce estimates with incomplete spatio-temporal coverage. By combining surveys using a Bayesian approach, we can account for different margins of error and leverage dependencies to produce estimates of every variable considered at every spatial location and every time point. Specifically, our strategy is to use a hierarchical modelling approach, where the first stage of the model incorporates the margin of error associated with each survey. Then, in a lower stage of the hierarchical model, the multivariate spatio-temporal mixed effects model is used to incorporate multivariate spatio-temporal dependencies of the processes of interest. We adopt a fully Bayesian approach for combining surveys; that is, given all of the available surveys, the conditional distributions of the latent processes of interest are used for statistical inference. To demonstrate our proposed methodology, we jointly analyze period estimates from the US Census Bureau's American Community Survey, and estimates obtained from the Bureau of Labor Statistics Local Area Unemployment Statistics program.
UR  - http://onlinelibrary.wiley.com/doi/10.1002/sta4.120/full
ER  - 

TY  - RPRT
T1  - NCRN Meeting Fall 2016: Scanner Data and Economic Statistics: A Unified Approach
Y1  - 2016
A1  - Redding, Stephen J.
A1  - Weinstein, David E.
PB  - University of Michigan
UR  - http://hdl.handle.net/1813/45821
ER  - 

TY  - RPRT
T1  - NCRN Meeting Spring 2016: Developing job linkages for the Health and Retirement Study
Y1  - 2016
A1  - McCue, Kristin
A1  - Abowd, John
A1  - Levenstein, Margaret
A1  - Patki, Dhiren
A1  - Rodgers, Ann
A1  - Shapiro, Matthew
A1  - Wasi, Nada
AB  - This paper documents work using probabilistic record linkage to create a crosswalk between jobs reported in the Health and Retirement Study (HRS) and the list of workplaces on Census Bureau’s Business Register. Matching job records provides an opportunity to join variables that occur uniquely in separate datasets, to validate responses, and to develop missing data imputation models. Identifying the respondent’s workplace (“establishment”) is valuable for HRS because it allows researchers to incorporate the effects of particular social, economic, and geospatial work environments in studies of respondent health and retirement behavior. The linkage makes use of name and address standardizing techniques tailored to business data that were recently developed in a collaboration between researchers at Census, Cornell, and the University of Michigan. The matching protocol makes no use of the identity of the HRS respondent and strictly protects the confidentiality of information about the respondent’s employer. The paper first describes the clerical review process used to create a set of human-reviewed candidate pairs, and use of that set to train matching models. It then describes and compares several linking strategies that make use of employer name, address, and phone number. Finally it discusses alternative ways of incorporating information on match uncertainty into estimates based on the linked data, and illustrates their use with a preliminary sample of matched HRS jobs. Presented at the NCRN Meeting Spring 2016 in Washington DC on May 9-10, 2016; see http://www.ncrn.info/event/ncrn-spring-2016-meeting
PB  - University of Michigan
UR  - http://hdl.handle.net/1813/43895
ER  - 

TY  - JOUR
T1  - Releasing synthetic magnitude micro data constrained to fixed marginal totals
JF  - Statistical Journal of the International Association for Official Statistics
Y1  - 2016
A1  - Wei, Lan
A1  - Reiter, Jerome P.
KW  - Confidential
KW  - Disclosure
KW  - establishment
KW  - mixture
KW  - poisson
KW  - risk
AB  - We present approaches to generating synthetic microdata for multivariate data that take on non-negative integer values, such as magnitude data in economic surveys. The basic idea is to estimate a mixture of Poisson distributions to describe the multivariate distribution, and release draws from the posterior predictive distribution of the model. We develop approaches that guarantee the synthetic data sum to marginal totals computed from the original data, as well as approaches that do not enforce this equality. For both cases, we present methods for assessing disclosure risks inherent in releasing synthetic magnitude microdata. We illustrate the methodology using economic data from a survey of manufacturing establishments.
VL  - 32
UR  - http://content.iospress.com/download/statistical-journal-of-the-iaos/sji959
IS  - 1
ER  - 

TY  - JOUR
T1  - Bayesian Analysis of Spatially-Dependent Functional Responses with Spatially-Dependent Multi-Dimensional Functional Predictors
JF  - Statistica Sinica
Y1  - 2015
A1  - Yang, W. H.
A1  - Wikle, C.K.
A1  - Holan, S.H.
A1  - Sudduth, K.
A1  - Meyers, D.B.
VL  - 25
UR  - http://www3.stat.sinica.edu.tw/preprint/SS-13-245w_Preprint.pdf
ER  - 

TY  - JOUR
T1  - Bayesian Binomial Mixture Models for Estimating Abundance in Ecological Monitoring Studies
JF  - Annals of Applied Statistics
Y1  - 2015
A1  - Wu, G.
A1  - Holan, S.H.
A1  - Nilon, C.H.
A1  - Wikle, C.K.
VL  - 9
UR  - http://projecteuclid.org/euclid.aoas/1430226082
ER  - 

TY  - JOUR
T1  - Bayesian Lattice Filters for Time-Varying Autoregression and Time-Frequency Analysis
JF  - ArXiv
Y1  - 2015
A1  - Yang, W. H.
A1  - Holan, S. H.
A1  - Wikle, C.K.
AB  - Modeling nonstationary processes is of paramount importance to many scientific disciplines including environmental science, ecology, and finance, among others. Consequently, flexible methodology that provides accurate estimation across a wide range of processes is a subject of ongoing interest. We propose a novel approach to model-based time-frequency estimation using time-varying autoregressive models. In this context, we take a fully Bayesian approach and allow both the autoregressive coefficients and innovation variance to vary over time. Importantly, our estimation method uses the lattice filter and is cast within the partial autocorrelation domain. The marginal posterior distributions are of standard form and, as a convenient by-product of our estimation method, our approach avoids undesirable matrix inversions. As such, estimation is extremely computationally efficient and stable. To illustrate the effectiveness of our approach, we conduct a comprehensive simulation study that compares our method with other competing methods and find that, in most cases, our approach performs superior in terms of average squared error between the estimated and true time-varying spectral density. Lastly, we demonstrate our methodology through three modeling applications; namely, insect communication signals, environmental data (wind components), and macroeconomic data (US gross domestic product (GDP) and consumption).
UR  - http://arxiv.org/abs/1408.2757
IS  - 1408.2757
ER  - 

TY  - JOUR
T1  - Bayesian Lattice Filters for Time-Varying Autoregression and Time–Frequency Analysis
JF  - Bayesian Analysis
Y1  - 2015
A1  - Yang, W. H.
A1  - Holan, Scott H.
A1  - Wikle, Christopher K.
KW  - locally stationary
KW  - model selection
KW  - nonstationary partial autocorrelation
KW  - piecewise stationary
KW  - sequential estimation
KW  - time-varying spectral density
AB  - Modeling nonstationary processes is of paramount importance to many scientific disciplines including environmental science, ecology, and finance, among others. Consequently, flexible methodology that provides accurate estimation across a wide range of processes is a subject of ongoing interest. We propose a novel approach to model-based time–frequency estimation using time-varying autoregressive models. In this context, we take a fully Bayesian approach and allow both the autoregressive coefficients and innovation variance to vary over time. Importantly, our estimation method uses the lattice filter and is cast within the partial autocorrelation domain. The marginal posterior distributions are of standard form and, as a convenient by-product of our estimation method, our approach avoids undesirable matrix inversions. As such, estimation is extremely computationally efficient and stable. To illustrate the effectiveness of our approach, we conduct a comprehensive simulation study that compares our method with other competing methods and find that, in most cases, our approach performs superior in terms of average squared error between the estimated and true time-varying spectral density. Lastly, we demonstrate our methodology through three modeling applications; namely, insect communication signals, environmental data (wind components), and macroeconomic data (US gross domestic product (GDP) and consumption).
UR  - http://projecteuclid.org/euclid.ba/1445263834
ER  - 

TY  - JOUR
T1  - Bayesian Marked Point Process Modeling for Generating Fully Synthetic Public Use Data with Point-Referenced Geography
JF  - Spatial Statistics
Y1  - 2015
A1  - Quick, Harrison
A1  - Holan, Scott H.
A1  - Wikle, Christopher K.
A1  - Reiter, Jerome P.
VL  - 14
UR  - http://www.sciencedirect.com/science/article/pii/S2211675315000718
ER  - 

TY  - JOUR
T1  - Bayesian Marked Point Process Modeling for Generating Fully Synthetic Public Use Data with Point-Referenced Geography
JF  - ArXiv
Y1  - 2015
A1  - Quick, H.
A1  - Holan, S. H.
A1  - Wikle, C. K.
A1  - Reiter, J. P.
AB  - Many data stewards collect confidential data that include fine geography. When sharing these data with others, data stewards strive to disseminate data that are informative for a wide range of spatial and non-spatial analyses while simultaneously protecting the confidentiality of data subjects' identities and attributes. Typically, data stewards meet this challenge by coarsening the resolution of the released geography and, as needed, perturbing the confidential attributes. When done with high intensity, these redaction strategies can result in released data with poor analytic quality. We propose an alternative dissemination approach based on fully synthetic data. We generate data using marked point process models that can maintain both the statistical properties and the spatial dependence structure of the confidential data. We illustrate the approach using data consisting of mortality records from Durham, North Carolina.
UR  - http://arxiv.org/abs/1407.7795
IS  - 1407.7795
ER  - 

TY  - JOUR
T1  - Bayesian Semiparametric Hierarchical Empirical Likelihood Spatial Models
JF  - Journal of Statistical Planning and Inference
Y1  - 2015
A1  - Porter, A.T.
A1  - Holan, S.H.
A1  - Wikle, C.K.
VL  - 165
ER  - 

TY  - JOUR
T1  - Bayesian Spatial Change of Support for Count-Valued Survey Data with Application to the American Community Survey
JF  - Journal of the American Statistical Association
Y1  - 2015
A1  - Bradley, Jonathan
A1  - Wikle, C.K.
A1  - Holan, S. H.
AB  - We introduce Bayesian spatial change of support methodology for count-valued survey data with known survey variances. Our proposed methodology is motivated by the American Community Survey (ACS), an ongoing survey administered by the U.S. Census Bureau that provides timely information on several key demographic variables. Specifically, the ACS produces 1-year, 3-year, and 5-year “period-estimates,” and corresponding margins of errors, for published demographic and socio-economic variables recorded over predefined geographies within the United States. Despite the availability of these predefined geographies it is often of interest to data-users to specify customized user-defined spatial supports. In particular, it is useful to estimate demographic variables defined on “new” spatial supports in “real-time.” This problem is known as spatial change of support (COS), which is typically performed under the assumption that the data follows a Gaussian distribution. However, count-valued survey data is naturally non-Gaussian and, hence, we consider modeling these data using a Poisson distribution. Additionally, survey-data are often accompanied by estimates of error, which we incorporate into our analysis. We interpret Poisson count-valued data in small areas as an aggregation of events from a spatial point process. This approach provides us with the flexibility necessary to allow ACS users to consider a variety of spatial supports in “real-time.” We show the effectiveness of our approach through a simulated example as well as through an analysis using public-use ACS data.
UR  - http://www.tandfonline.com/doi/abs/10.1080/01621459.2015.1117471
ER  - 

TY  - JOUR
T1  - Bayesian Spatial Change of Support for Count-Valued Survey Data with Application to the American Community Survey
JF  - Journal of the American Statistical Association
Y1  - 2015
A1  - Bradley, Jonathan R.
A1  - Wikle, Christopher K.
A1  - Holan, Scott H.
KW  - Aggregation
KW  - American Community Survey
KW  - Bayesian hierarchical model
KW  - Givens angle prior
KW  - Markov chain Monte Carlo
KW  - Multiscale model
KW  - Non-Gaussian
AB  - We introduce Bayesian spatial change of support methodology for count-valued survey data with known survey variances. Our proposed methodology is motivated by the American Community Survey (ACS), an ongoing survey administered by the U.S. Census Bureau that provides timely information on several key demographic variables. Specifically, the ACS produces 1-year, 3-year, and 5-year “period-estimates,” and corresponding margins of errors, for published demographic and socio-economic variables recorded over predefined geographies within the United States. Despite the availability of these predefined geographies it is often of interest to data-users to specify customized user-defined spatial supports. In particular, it is useful to estimate demographic variables defined on “new” spatial supports in “real-time.” This problem is known as spatial change of support (COS), which is typically performed under the assumption that the data follows a Gaussian distribution. However, count-valued survey data is naturally non-Gaussian and, hence, we consider modeling these data using a Poisson distribution. Additionally, survey-data are often accompanied by estimates of error, which we incorporate into our analysis. We interpret Poisson count-valued data in small areas as an aggregation of events from a spatial point process. This approach provides us with the flexibility necessary to allow ACS users to consider a variety of spatial supports in “real-time.” We show the effectiveness of our approach through a simulated example as well as through an analysis using public-use ACS data.
UR  - http://www.tandfonline.com/doi/abs/10.1080/01621459.2015.1117471
ER  - 

TY  - JOUR
T1  - Bayesian Spatial Change of Support for Count-Valued Survey Data
JF  - ArXiv
Y1  - 2015
A1  - Bradley, J. R.
A1  - Wikle, C.K.
A1  - Holan, S. H.
AB  - We introduce Bayesian spatial change of support methodology for count-valued survey data with known survey variances. Our proposed methodology is motivated by the American Community Survey (ACS), an ongoing survey administered by the U.S. Census Bureau that provides timely information on several key demographic variables. Specifically, the ACS produces 1-year, 3-year, and 5-year "period-estimates," and corresponding margins of errors, for published demographic and socio-economic variables recorded over predefined geographies within the United States. Despite the availability of these predefined geographies it is often of interest to data users to specify customized user-defined spatial supports. In particular, it is useful to estimate demographic variables defined on "new" spatial supports in "real-time." This problem is known as spatial change of support (COS), which is typically performed under the assumption that the data follows a Gaussian distribution. However, count-valued survey data is naturally non-Gaussian and, hence, we consider modeling these data using a Poisson distribution. Additionally, survey-data are often accompanied by estimates of error, which we incorporate into our analysis. We interpret Poisson count-valued data in small areas as an aggregation of events from a spatial point process. This approach provides us with the flexibility necessary to allow ACS users to consider a variety of spatial supports in "real-time." We demonstrate the effectiveness of our approach through a simulated example as well as through an analysis using public-use ACS data.
UR  - http://arxiv.org/abs/1405.7227
IS  - 1405.7227
ER  - 

TY  - JOUR
T1  - Change in Visible Impervious Surface Area in Southeastern Michigan Before and After the “Great Recession:” Spatial Differentiation in Remotely Sensed Land-Cover Dynamics
JF  - Population and Environment
Y1  - 2015
A1  - Wilson, C. R.
A1  - Brown, D. G.
VL  - 36
UR  - http://link.springer.com/article/10.1007%2Fs11111-014-0219-y
IS  - 3
ER  - 

TY  - JOUR
T1  - Comment on "Semiparametric Bayesian Density Estimation with Disparate Data Sources: A Meta-Analysis of Global Childhood Undernutrition" by Finucane, M. M., Paciorek, C. J., Stevens, G. A., and Ezzati, M.
JF  - Journal of the American Statistical Association
Y1  - 2015
A1  - Wikle, C.K.
A1  - Holan, S.H.
ER  - 

TY  - CONF
T1  - Determining Potential for Breakoff in Time Diary Survey Using Paradata
T2  - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR)
Y1  - 2015
A1  - Wettlaufer, D.
A1  - Arunachalam, H.
A1  - Atkin, G.
A1  - Eck, A.
A1  - Soh, L.-K.
A1  - Belli, R.F.
JF  - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR)
CY  - Hollywood, Florida
UR  - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx
ER  - 

TY  - JOUR
T1  - Dirichlet Process Mixture Models for Nested Categorical Data
JF  - ArXiv
Y1  - 2015
A1  - Hu, J.
A1  - Reiter, J.P.
A1  - Wang, Q.
AB  - We present a Bayesian model for estimating the joint distribution of multivariate categorical data when units are nested within groups. Such data arise frequently in social science settings, for example, people living in households. The model assumes that (i) each group is a member of a group-level latent class, and (ii) each unit is a member of a unit-level latent class nested within its group-level latent class. This structure allows the model to capture dependence among units in the same group. It also facilitates simultaneous modeling of variables at both group and unit levels. We develop a version of the model that assigns zero probability to groups and units with physically impossible combinations of variables. We apply the model to estimate multivariate relationships in a subset of the American Community Survey. Using the estimated model, we generate synthetic household data that could be disseminated as redacted public use files with high analytic validity and low disclosure risks. Supplementary materials for this article are available online.
UR  - http://arxiv.org/pdf/1412.2282v3.pdf
IS  - 1412.2282
ER  - 

TY  - JOUR
T1  - Expanding the Discourse on Antipoverty Policy: Reconsidering a Negative Income Tax
JF  - Journal of Poverty
Y1  - 2015
A1  - Jessica Wiederspan
A1  - Elizabeth Rhodes
A1  - H. Luke Shaefer
KW  - economic well-being
KW  - poverty alleviation
KW  - public policy
KW  - social welfare policy
AB  - This article proposes that advocates for the poor consider the replacement of the current means-tested safety net in the United States with a Negative Income Tax (NIT), a guaranteed income program that lifts families’ incomes above a minimum threshold. The article highlights gaps in service provision that leave millions in poverty, explains how a NIT could help fill those gaps, and compares current expenditures on major means-tested programs to estimated expenditures necessary for a NIT. Finally, it addresses the financial and political concerns that are likely to arise in the event that a NIT proposal gains traction among policy makers.
VL  - 19
UR  - http://dx.doi.org/10.1080/10875549.2014.991889
ER  - 

TY  - CONF
T1  - Grids and Online Panels: A Comparison of Device Type from a Survey Quality Perspective
T2  - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR)
Y1  - 2015
A1  - Wang, Mengyang
A1  - McCutcheon, Allan L.
A1  - Allen, Laura
JF  - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR)
CY  - Hollywood, Florida
UR  - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx
ER  - 

TY  - CHAP
T1  - Hierarchical models for uncertainty quantification: An overview
T2  - Handbook of Uncertainty Quantification
Y1  - 2015
A1  - Wikle, C.K.
ED  - Ghanem, R.
ED  - Higdon, D.
ED  - Owhadi, H.
JF  - Handbook of Uncertainty Quantification
PB  - Springer
ER  - 

TY  - CHAP
T1  - Hierarchical Agent-Based Spatio-Temporal Dynamic Models for Discrete Valued Data
T2  - Handbook of Discrete-Valued Time Series
Y1  - 2015
A1  - Wikle, C.K.
A1  - Hooten, M.B.
ED  - Davis, R.
ED  - Holan, S.
ED  - Lund, R.
ED  - Ravishanker, N.
JF  - Handbook of Discrete-Valued Time Series
PB  - Chapman and Hall/CRC Press
CY  - Boca Raton, FL
UR  - http://www.crcpress.com/product/isbn/9781466577732
ER  - 

TY  - CHAP
T1  - Hierarchical Dynamic Generalized Linear Mixed Models for Discrete-Valued Spatio-Temporal Data
T2  - Handbook of Discrete-Valued Time Series
Y1  - 2015
A1  - Holan, S.H.
A1  - Wikle, C.K.
ED  - Davis, R.
ED  - Holan, S.
ED  - Lund, R.
ED  - Ravishanker, N.
JF  - Handbook of Discrete-Valued Time Series
PB  - Chapman and Hall/CRC Press
CY  - Boca Raton, FL
SN  - 9781466577732
UR  - http://www.crcpress.com/product/isbn/9781466577732
N1  - To appear in "Handbook of Discrete-Valued Time Series"
ER  - 

TY  - CHAP
T1  - Hierarchical Dynamic Generalized Linear Mixed Models for Discrete-Valued Spatio-Temporal Data
T2  - Handbook of Discrete-Valued Time Series
Y1  - 2015
A1  - Holan, S.H.
A1  - Wikle, C.K.
JF  - Handbook of Discrete-Valued Time Series
ER  - 

TY  - CHAP
T1  - Hierarchical Spatial Models
T2  - Encyclopedia of Geographical Information Science
Y1  - 2015
A1  - Arab, A.
A1  - Hooten, M.B.
A1  - Wikle, C.K.
JF  - Encyclopedia of Geographical Information Science
PB  - Springer
ER  - 

TY  - JOUR
T1  - Hierarchical, stochastic modeling across spatiotemporal scales of large river ecosystems and somatic growth in fish populations under various climate models: Missouri River sturgeon example
JF  - Geological Society
Y1  - 2015
A1  - Wildhaber, M.L.
A1  - Wikle, C.K.
A1  - Moran, E.H.
A1  - Anderson, C.J.
A1  - Franz, K.J.
A1  - Dey, R.
ER  - 

TY  - CONF
T1  - I Know What You Did Next: Predicting Respondent’s Next Activity Using Machine Learning
T2  - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR)
Y1  - 2015
A1  - Arunachalam, H.
A1  - Atkin, G.
A1  - Eck, A.
A1  - Wettlaufer, D.
A1  - Soh, L.-K.
A1  - Belli, R.F.
JF  - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR)
CY  - Hollywood, Florida
UR  - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx
ER  - 

TY  - JOUR
T1  - Modern Perspectives on Statistics for Spatio-Temporal Data
JF  - WIREs Computational Statistics
Y1  - 2015
A1  - Wikle, C.K.
VL  - 7
UR  - http://dx.doi.org/10.1002/wics.1341
IS  - 1
ER  - 

TY  - ICOMM
T1  - Multiscale Analysis of Survey Data: Recent Developments and Exciting Prospects
Y1  - 2015
A1  - Bradley, J.R.
A1  - Wikle, C.K.
A1  - Holan, S.H.
JF  - Statistics Views
ER  - 

TY  - JOUR
T1  - Multivariate Spatial Hierarchical Bayesian Empirical Likelihood Methods for Small Area Estimation
JF  - Stat
Y1  - 2015
A1  - Porter, A.T.
A1  - Holan, S.H.
A1  - Wikle, C.K.
VL  - 4
UR  - http://dx.doi.org/10.1002/sta4.81
IS  - 1
ER  - 

TY  - JOUR
T1  - Multivariate Spatio-Temporal Models for High-Dimensional Areal Data with Application to Longitudinal Employer-Household Dynamics
JF  - ArXiv
Y1  - 2015
A1  - Bradley, J. R.
A1  - Holan, S. H.
A1  - Wikle, C.K.
AB  - Many data sources report related variables of interest that are also referenced over geographic regions and time; however, there are relatively few general statistical methods that one can readily use that incorporate these multivariate spatio-temporal dependencies. Additionally, many multivariate spatio-temporal areal datasets are extremely high-dimensional, which leads to practical issues when formulating statistical models. For example, we analyze Quarterly Workforce Indicators (QWI) published by the US Census Bureau's Longitudinal Employer-Household Dynamics (LEHD) program. QWIs are available by different variables, regions, and time points, resulting in millions of tabulations. Despite their already expansive coverage, by adopting a fully Bayesian framework, the scope of the QWIs can be extended to provide estimates of missing values along with associated measures of uncertainty. Motivated by the LEHD, and other applications in federal statistics, we introduce the multivariate spatio-temporal mixed effects model (MSTM), which can be used to efficiently model high-dimensional multivariate spatio-temporal areal datasets. The proposed MSTM extends the notion of Moran's I basis functions to the multivariate spatio-temporal setting. This extension leads to several methodological contributions including extremely effective dimension reduction, a dynamic linear model for multivariate spatio-temporal areal processes, and the reduction of a high-dimensional parameter space using a novel parameter model.
UR  - http://arxiv.org/abs/1503.00982
IS  - 1503.00982
ER  - 

TY  - JOUR
T1  - Multivariate Spatio-Temporal Models for High-Dimensional Areal Data with Application to Longitudinal Employer-Household Dynamics
JF  - Annals of Applied Statistics
Y1  - 2015
A1  - Bradley, J.R.
A1  - Holan, S.H.
A1  - Wikle, C.K.
AB  - Many data sources report related variables of interest that are also referenced over geographic regions and time; however, there are relatively few general statistical methods that one can readily use that incorporate these multivariate spatio-temporal dependencies. Additionally, many multivariate spatio-temporal areal datasets are extremely high-dimensional, which leads to practical issues when formulating statistical models. For example, we analyze Quarterly Workforce Indicators (QWI) published by the US Census Bureau’s Longitudinal Employer-Household Dynamics (LEHD) program. QWIs are available by different variables, regions, and time points, resulting in millions of tabulations. Despite their already expansive coverage, by adopting a fully Bayesian framework, the scope of the QWIs can be extended to provide estimates of missing values along with associated measures of uncertainty. Motivated by the LEHD, and other applications in federal statistics, we introduce the multivariate spatio-temporal mixed effects model (MSTM), which can be used to efficiently model high-dimensional multivariate spatio-temporal areal datasets. The proposed MSTM extends the notion of Moran’s I basis functions to the multivariate spatio-temporal setting. This extension leads to several methodological contributions including extremely effective dimension reduction, a dynamic linear model for multivariate spatio-temporal areal processes, and the reduction of a high-dimensional parameter space using a novel parameter model.
VL  - 9
IS  - 4
ER  - 

TY  - RPRT
T1  - NCRN Meeting Spring 2015: Models for Multiscale Spatially-Referenced Count Data
Y1  - 2015
A1  - Holan, Scott
A1  - Bradley, Jonathan R.
A1  - Wikle, Christopher K.
AB  - Presentation at the NCRN Meeting Spring 2015.
PB  - NCRN Coordinating Office
UR  - http://hdl.handle.net/1813/40176
ER  - 

TY  - RPRT
T1  - NCRN Meeting Spring 2015: Regionalization of Multiscale Spatial Processes Using a Criterion for Spatial Aggregation Error
Y1  - 2015
A1  - Wikle, Christopher K.
A1  - Bradley, Jonathan
A1  - Holan, Scott
AB  - Develop and implement a statistical criterion to diagnose spatial aggregation error that can facilitate the choice of regionalizations of spatial data. Presentation at NCRN Meeting Spring 2015.
PB  - NCRN Coordinating Office
UR  - http://hdl.handle.net/1813/40177
ER  - 

TY  - RPRT
T1  - NCRN Meeting Spring 2015: Training Undergraduates, Graduate Students, Postdocs, and Federal Agencies: Methodology, Data, and Science for Federal Statistics
Y1  - 2015
A1  - Cressie, Noel
A1  - Holan, Scott H.
A1  - Wikle, Christopher K.
AB  - Presentation at the NCRN Spring 2015 Meeting.
PB  - NCRN Coordinating Office
UR  - http://hdl.handle.net/1813/40179
ER  - 

TY  - JOUR
T1  - Record Linkage using STATA: Pre-processing, Linking and Reviewing Utilities
JF  - The Stata Journal
Y1  - 2015
A1  - Wasi, Nada
A1  - Flaaen, Aaron
AB  - In this article, we describe Stata utilities that facilitate probabilistic record linkage—the technique typically used for merging two datasets with no common record identifier. While the preprocessing tools are developed specifically for linking two company databases, the other tools can be used for many different types of linkage. Specifically, the stnd_compname and stnd_address commands parse and standardize company names and addresses to improve the match quality when linking. The reclink2 command is a generalized version of Blasnik's reclink (2010, Statistical Software Components S456876, Department of Economics, Boston College) that allows for many-to-one matching. Finally, clrevmatch is an interactive tool that allows the user to review matched results in an efficient and seamless manner. Rather than exporting results to another file format (for example, Excel), inputting clerical reviews, and importing back into Stata, one can use the clrevmatch tool to conduct all of these steps within Stata. This helps improve the speed and flexibility of matching, which often involves multiple runs.
VL  - 15
UR  - http://www.stata-journal.com/article.html?article=dm0082
IS  - 3
ER  - 

TY  - JOUR
T1  - Regionalization of Multiscale Spatial Processes using a Criterion for Spatial Aggregation Error
JF  - ArXiv
Y1  - 2015
A1  - Bradley, J. R.
A1  - Wikle, C.K.
A1  - Holan, S. H.
AB  - The modifiable areal unit problem and the ecological fallacy are known problems that occur when modeling multiscale spatial processes. We investigate how these forms of spatial aggregation error can guide a regionalization over a spatial domain of interest. By "regionalization" we mean a specification of geographies that define the spatial support for areal data. This topic has been studied vigorously by geographers, but has been given less attention by spatial statisticians. Thus, we propose a criterion for spatial aggregation error (CAGE), which we minimize to obtain an optimal regionalization. To define CAGE we draw a connection between spatial aggregation error and a new multiscale representation of the Karhunen-Loeve (K-L) expansion. This relationship between CAGE and the multiscale K-L expansion leads to illuminating theoretical developments including: connections between spatial aggregation error, squared prediction error, spatial variance, and a novel extension of Obled-Creutin eigenfunctions. The effectiveness of our approach is demonstrated through an analysis of two datasets, one using the American Community Survey and one related to environmental ocean winds.
UR  - http://arxiv.org/abs/1502.01974
IS  - 1502.01974
ER  - 

TY  - JOUR
T1  - Simultaneous Edit-Imputation for Continuous Microdata
JF  - Journal of the American Statistical Association
Y1  - 2015
A1  - Kim, H. J.
A1  - Cox, L. H.
A1  - Karr, A. F.
A1  - Reiter, J. P.
A1  - Wang, Q.
VL  - 110
UR  - http://www.tandfonline.com/doi/abs/10.1080/01621459.2015.1040881
ER  - 

TY  - JOUR
T1  - Small Area Estimation via Multivariate Fay-Herriot Models With Latent Spatial Dependence
JF  - Australian & New Zealand Journal of Statistics
Y1  - 2015
A1  - Porter, A.T.
A1  - Wikle, C.K.
A1  - Holan, S.H.
VL  - 57
UR  - http://arxiv.org/abs/1310.7211
ER  - 

TY  - JOUR
T1  - Spatio-temporal change of support with application to American Community Survey multi-year period estimates
JF  - Stat
Y1  - 2015
A1  - Bradley, Jonathan R.
A1  - Wikle, Christopher K.
A1  - Holan, Scott H.
KW  - Bayesian
KW  - change-of-support
KW  - dynamical
KW  - hierarchical models
KW  - mixed-effects model
KW  - Moran's I
KW  - multi-year period estimate
AB  - We present hierarchical Bayesian methodology to perform spatio-temporal change of support (COS) for survey data with Gaussian sampling errors. This methodology is motivated by the American Community Survey (ACS), which is an ongoing survey administered by the US Census Bureau that provides timely information on several key demographic variables. The ACS has published 1-year, 3-year, and 5-year period estimates, and margins of errors, for demographic and socio-economic variables recorded over predefined geographies. The spatio-temporal COS methodology considered here provides data users with a way to estimate ACS variables on customized geographies and time periods while accounting for sampling errors. Additionally, 3-year ACS period estimates are to be discontinued, and this methodology can provide predictions of ACS variables for 3-year periods given the available period estimates. The methodology is based on a spatio-temporal mixed-effects model with a low-dimensional spatio-temporal basis function representation, which provides multi-resolution estimates through basis function aggregation in space and time. This methodology includes a novel parameterization that uses a target dynamical process and recently proposed parsimonious Moran's I propagator structures. Our approach is demonstrated through two applications using public-use ACS estimates and is shown to produce good predictions on a hold-out set of 3-year period estimates. Copyright © 2015 John Wiley & Sons, Ltd.
VL  - 4
UR  - http://dx.doi.org/10.1002/sta4.94
ER  - 

TY  - JOUR
T1  - A stochastic bioenergetics model-based approach to translating large river flow and temperature into fish population responses: the pallid sturgeon example
JF  - Geological Society
Y1  - 2015
A1  - Wildhaber, M.L.
A1  - Dey, R.
A1  - Wikle, C.K.
A1  - Anderson, C.J.
A1  - Moran, E.H.
A1  - Franz, K.J.
VL  - 408
ER  - 

TY  - CONF
T1  - Using Machine Learning Techniques to Predict Respondent Type from A Priori Demographic Information
T2  - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR)
Y1  - 2015
A1  - Atkin, G.
A1  - Arunachalam, H.
A1  - Eck, A.
A1  - Wettlaufer, D.
A1  - Soh, L.-K.
A1  - Belli, R.F.
JF  - 70th Annual Conference of the American Association for Public Opinion Research (AAPOR)
CY  - Hollywood, Florida
UR  - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx
ER  - 

TY  - JOUR
T1  - Agent Based Models: Statistical Challenges and Opportunities
JF  - Statistics Views
Y1  - 2014
A1  - Wikle, C.K.
PB  - Wiley
UR  - http://www.statisticsviews.com/details/feature/6354691/Agent-Based-Models-Statistical-Challenges-and-Opportunities.html
ER  - 

TY  - JOUR
T1  - Bayesian estimation of disclosure risks for multiply imputed, synthetic data
JF  - Journal of Privacy and Confidentiality
Y1  - 2014
A1  - Reiter, J. P.
A1  - Wang, Q.
A1  - Zhang, B.
AB  - Agencies seeking to disseminate public use microdata, i.e., data on individual records, can replace confidential values with multiple draws from statistical models estimated with the collected data. We present a framework for evaluating disclosure risks inherent in releasing multiply-imputed, synthetic data. The basic idea is to mimic an intruder who computes posterior distributions of confidential values given the released synthetic data and prior knowledge. We illustrate the methodology with artificial fully synthetic data and with partial synthesis of the Survey of Youth in Custody.
VL  - 6
UR  - http://repository.cmu.edu/jpc/vol6/iss1/2
IS  - 1
ER  - 

TY  - RPRT
T1  - CED 2 AR: The Comprehensive Extensible Data Documentation and Access Repository
Y1  - 2014
A1  - Lagoze, Carl
A1  - Vilhuber, Lars
A1  - Williams, Jeremy
A1  - Perry, Benjamin
A1  - Block, William C.
AB  - We describe the design, implementation, and deployment of the Comprehensive Extensible Data Documentation and Access Repository (CED 2 AR). This is a metadata repository system that allows researchers to search, browse, access, and cite confidential data and metadata through either a web-based user interface or programmatically through a search API, all the while reusing and linking to existing archive and provider generated metadata. CED 2 AR is distinguished from other metadata repository-based applications due to requirements that derive from its social science context. These include the need to cloak confidential data and metadata and manage complex provenance chains. Presented at 2014 IEEE/ACM Joint Conference on Digital Libraries (JCDL), Sept 8-12, 2014.
PB  - Cornell University
UR  - http://hdl.handle.net/1813/44702
ER  - 

TY  - RPRT
T1  - The Cepstral Model for Multivariate Time Series: The Vector Exponential Model
Y1  - 2014
A1  - Holan, S.H.
A1  - McElroy, T.S.
A1  - Wu, G.
AB  - Vector autoregressive (VAR) models have become a staple in the analysis of multivariate time series and are formulated in the time domain as difference equations, with an implied covariance structure. In many contexts, it is desirable to work with a stable, or at least stationary, representation. To fit such models, one must impose restrictions on the coefficient matrices to ensure that certain determinants are nonzero; which, except in special cases, may prove burdensome. To circumvent these difficulties, we propose a flexible frequency domain model expressed in terms of the spectral density matrix. Specifically, this paper treats the modeling of covariance stationary vector-valued (i.e., multivariate) time series via an extension of the exponential model for the spectrum of a scalar time series. We discuss the modeling advantages of the vector exponential model and its computational facets, such as how to obtain Wold coefficients from given cepstral coefficients. Finally, we demonstrate the utility of our approach through simulation as well as two illustrative data examples focusing on multi-step ahead forecasting and estimation of squared coherence.
PB  - arXiv
UR  - http://arxiv.org/abs/1406.0801
ER  - 

TY  - CONF
T1  - Data Quality among Devices to Complete Surveys: Comparing Personal Computers, Smartphones and Tablets
T2  - Midwest Association for Public Opinion Research Annual Meeting
Y1  - 2014
A1  - Wang, Mengyang
A1  - McCutcheon, Allan L.
JF  - Midwest Association for Public Opinion Research Annual Meeting
CY  - Chicago, IL
UR  - http://www.mapor.org/conferences.html
ER  - 

TY  - CHAP
T1  - Disclosure risk evaluation for fully synthetic data
T2  - Privacy in Statistical Databases
Y1  - 2014
A1  - J. Hu
A1  - J.P. Reiter
A1  - Q. Wang
JF  - Privacy in Statistical Databases
PB  - Springer
CY  - Heidelberg
VL  - 8744
ER  - 

TY  - JOUR
T1  - Multiple imputation of missing or faulty values under linear constraints
JF  - Journal of Business and Economic Statistics
Y1  - 2014
A1  - Kim, H. J.
A1  - Reiter, J. P.
A1  - Wang, Q.
A1  - Cox, L. H.
A1  - Karr, A. F.
AB  - Many statistical agencies, survey organizations, and research centers collect data that suffer from item nonresponse and erroneous or inconsistent values. These data may be required to satisfy linear constraints, for example, bounds on individual variables and inequalities for ratios or sums of variables. Often these constraints are designed to identify faulty values, which then are blanked and imputed. The data also may exhibit complex distributional features, including nonlinear relationships and highly nonnormal distributions. We present a fully Bayesian, joint model for modeling or imputing data with missing/blanked values under linear constraints that (i) automatically incorporates the constraints in inferences and imputations, and (ii) uses a flexible Dirichlet process mixture of multivariate normal distributions to reflect complex distributional features. Our strategy for estimation is to augment the observed data with draws from a hypothetical population in which the constraints are not present, thereby taking advantage of computationally expedient methods for fitting mixture models. Missing/blanked items are sampled from their posterior distribution using the Hit-and-Run sampler, which guarantees that all imputations satisfy the constraints. We illustrate the approach using manufacturing data from Colombia, examining the potential to preserve joint distributions and a regression from the plant productivity literature. Supplementary materials for this article are available online.
VL  - 32
ER  - 

TY  - RPRT
T1  - NCRN Meeting Fall 2014: Bayesian Marked Point Process Modeling for Generating Fully Synthetic Public Use Data with Point-Referenced Geography
Y1  - 2014
A1  - Quick, Harrison
A1  - Holan, Scott
A1  - Wikle, Christopher
A1  - Reiter, Jerry
AB  - Presentation from NCRN Fall 2014 meeting
PB  - NCRN Coordinating Office
UR  - http://hdl.handle.net/1813/37750
ER  - 

TY  - RPRT
T1  - NCRN Meeting Fall 2014: Change in Visible Impervious Surface Area in Southeastern Michigan Before and After the "Great Recession"
Y1  - 2014
A1  - Wilson, Courtney
A1  - Brown, Daniel G.
AB  - Presentation at Fall 2014 NCRN meeting
PB  - NCRN Coordinating Office
UR  - http://hdl.handle.net/1813/37446
ER  - 

TY  - RPRT
T1  - NCRN Meeting Fall 2014: Mixed Effects Modeling for Multivariate-Spatio-Temporal Areal Data
Y1  - 2014
A1  - Bradley, Jonathan
A1  - Holan, Scott
A1  - Wikle, Christopher
AB  - Presentation from NCRN Fall 2014 meeting
PB  - NCRN Coordinating Office
UR  - http://hdl.handle.net/1813/37749
ER  - 

TY  - RPRT
T1  - NCRN Meeting Spring 2014: Integrating PROV with DDI: Mechanisms of Data Discovery within the U.S. Census Bureau
Y1  - 2014
A1  - Block, William
A1  - Brown, Warren
A1  - Williams, Jeremy
A1  - Vilhuber, Lars
A1  - Lagoze, Carl
AB  - Presentation at NCRN Spring 2014 meeting
PB  - NCRN Coordinating Office
UR  - http://hdl.handle.net/1813/36392
ER  - 

TY  - RPRT
T1  - NCRN Meeting Spring 2014: Summer Working Group for Employer List Linking (SWELL)
Y1  - 2014
A1  - Gathright, Graton
A1  - Kutzbach, Mark
A1  - Mccue, Kristin
A1  - McEntarfer, Erika
A1  - Monti, Holly
A1  - Trageser, Kelly
A1  - Vilhuber, Lars
A1  - Wasi, Nada
A1  - Wignall, Christopher
AB  - Presentation for NCRN Spring 2014 meeting
PB  - NCRN Coordinating Office
UR  - http://hdl.handle.net/1813/36396
ER  - 

TY  - JOUR
T1  - Spatial Fay-Herriot Models for Small Area Estimation with Functional Covariates
JF  - Spatial Statistics
Y1  - 2014
A1  - Porter, A. T.
A1  - Holan, S.H.
A1  - Wikle, C.K.
A1  - Cressie, N.
VL  - 10
UR  - http://arxiv.org/pdf/1303.6668v3.pdf
ER  - 

TY  - CONF
T1  - Would a Privacy Fundamentalist Sell their DNA for $1000... if Nothing Bad Happened Thereafter? A Study of the Westin Categories, Behavior Intentions, and Consequences
T2  - Proceedings of the Tenth Symposium on Usable Privacy and Security (SOUPS)
Y1  - 2014
A1  - Woodruff, A.
A1  - Pihur, V.
A1  - Acquisti, A.
A1  - Consolvo, S.
A1  - Schmidt, L.
A1  - Brandimarte, L.
JF  - Proceedings of the Tenth Symposium on Usable Privacy and Security (SOUPS)
PB  - ACM
CY  - New York, NY
UR  - https://www.usenix.org/conference/soups2014/proceedings/presentation/woodruff
N1  - IAPP SOUPS Privacy Award Winner
ER  - 

TY  - CONF
T1  - Bayesian Modeling in the Era of Big Data: the Role of High-Throughput and High-Performance Computing
T2  - The Extreme Science and Engineering Discovery Environment Conference
Y1  - 2013
A1  - Wu, G.
JF  - The Extreme Science and Engineering Discovery Environment Conference
CY  - San Diego, CA
ER  - 

TY  - CONF
T1  - Binomial Mixture Models for Urban Ecological Monitoring Studies Using American Community Survey Demographic Covariates
T2  - Joint Statistical Meetings 2013
Y1  - 2013
A1  - Wu, G.
JF  - Joint Statistical Meetings 2013
CY  - Montreal, Canada
ER  - 

TY  - JOUR
T1  - Data Management of Confidential Data
JF  - International Journal of Digital Curation
Y1  - 2013
A1  - Carl Lagoze
A1  - William C. Block
A1  - Jeremy Williams
A1  - John M. Abowd
A1  - Lars Vilhuber
AB  - Social science researchers increasingly make use of data that is confidential because it contains linkages to the identities of people, corporations, etc. The value of this data lies in the ability to join the identifiable entities with external data such as genome data, geospatial information, and the like. However, the confidentiality of this data is a barrier to its utility and curation, making it difficult to fulfill US federal data management mandates and interfering with basic scholarly practices such as validation and reuse of existing results. We describe the complexity of the relationships among data that span a public and private divide. We then describe our work on the CED2AR prototype, a first step in providing researchers with a tool that spans this divide and makes it possible for them to search, access, and cite that data.
VL  - 8
N1  - Presented at 8th International Digital Curation Conference 2013, Amsterdam. See also http://hdl.handle.net/1813/30924
ER  - 

TY  - CONF
T1  - Do ‘Don’t Know’ Responses = Survey Satisficing? Evidence from the Gallup Panel Paradata
T2  - American Association for Public Opinion Research 2013 Annual Conference
Y1  - 2013
A1  - Wang, Mengyang
A1  - Ruppanner, Leah
A1  - McCutcheon, Allan L.
JF  - American Association for Public Opinion Research 2013 Annual Conference
CY  - Boston, MA
UR  - http://www.aapor.org/AAPORKentico/Conference/Recent-Conferences.aspx
ER  - 

TY  - CONF
T1  - Ecological Prediction with Nonlinear Multivariate Time-Frequency Functional Data Models
T2  - Joint Statistical Meetings 2013
Y1  - 2013
A1  - Wikle, C.K.
JF  - Joint Statistical Meetings 2013
CY  - Montreal, Canada
ER  - 

TY  - JOUR
T1  - Ecological Prediction With Nonlinear Multivariate Time-Frequency Functional Data Models
JF  - Journal of Agricultural, Biological, and Environmental Statistics
Y1  - 2013
A1  - Yang, W.H.
A1  - Wikle, C.K.
A1  - Holan, S.H.
A1  - Wildhaber, M.L.
VL  - 18
UR  - http://link.springer.com/article/10.1007/s13253-013-0142-1
ER  - 

TY  - CONF
T1  - Encoding Provenance Metadata for Social Science Datasets
T2  - Metadata and Semantics Research
Y1  - 2013
A1  - Lagoze, Carl
A1  - Williams, Jeremy
A1  - Vilhuber, Lars
ED  - Garoufallou, Emmanouel
ED  - Greenberg, Jane
KW  - DDI
KW  - eSocial Science
KW  - Metadata
KW  - Provenance
JF  - Metadata and Semantics Research
T3  - Communications in Computer and Information Science
PB  - Springer International Publishing
VL  - 390
SN  - 978-3-319-03436-2
UR  - http://dx.doi.org/10.1007/978-3-319-03437-9_13
ER  - 

TY  - RPRT
T1  - Encoding Provenance of Social Science Data: Integrating PROV with DDI
Y1  - 2013
A1  - Lagoze, Carl
A1  - Block, William C
A1  - Williams, Jeremy
A1  - Abowd, John
A1  - Vilhuber, Lars
AB  - Provenance is a key component of evaluating the integrity and reusability of data for scholarship. While recording and providing access to provenance has always been important, it is even more critical in the web environment in which data from distributed sources and of varying integrity can be combined and derived. The PROV model, developed under the auspices of the W3C, is a foundation for semantically-rich, interoperable, and web-compatible provenance metadata. We report on the results of our experimentation with integrating the PROV model into the DDI metadata for a complex, but characteristic, example social science data. We also present some preliminary thinking on how to visualize those graphs in the user interface. Submitted to EDDI13 5th Annual European DDI User Conference December 2013, Paris, France
PB  - Cornell University
UR  - http://hdl.handle.net/1813/34443
ER  - 

TY  - CONF
T1  - Encoding Provenance of Social Science Data: Integrating PROV with DDI
T2  - 5th Annual European DDI User Conference
Y1  - 2013
A1  - Carl Lagoze
A1  - William C. Block
A1  - Jeremy Williams
A1  - Lars Vilhuber
KW  - DDI
KW  - eSocial Science
KW  - Metadata
KW  - Provenance
AB  - Provenance is a key component of evaluating the integrity and reusability of data for scholarship. While recording and providing access to provenance has always been important, it is even more critical in the web environment in which data from distributed sources and of varying integrity can be combined and derived. The PROV model, developed under the auspices of the W3C, is a foundation for semantically-rich, interoperable, and web-compatible provenance metadata. We report on the results of our experimentation with integrating the PROV model into the DDI metadata for a complex, but characteristic, example social science data. We also present some preliminary thinking on how to visualize those graphs in the user interface.
JF  - 5th Annual European DDI User Conference
ER  - 

TY  - JOUR
T1  - From Facebook Regrets to Facebook Privacy Nudges
JF  - Ohio State Law Journal
Y1  - 2013
A1  - Wang, Y.
A1  - Leon, P. G.
A1  - Chen, X.
A1  - Komanduri, S.
A1  - Norcie, G.
A1  - Scott, K.
A1  - Acquisti, A.
A1  - Cranor, L. F.
A1  - Sadeh, N.
N1  - Invited paper
ER  - 

TY  - JOUR
T1  - Hierarchical Bayesian Spatio-Temporal Conway-Maxwell Poisson Models with Dynamic Dispersion
JF  - Journal of Agricultural, Biological, and Environmental Statistics
Y1  - 2013
A1  - Wu, G.
A1  - Holan, S.H.
A1  - Wikle, C.K.
CY  - Anchorage, Alaska
VL  - 18
UR  - http://link.springer.com/article/10.1007/s13253-013-0141-2
ER  - 

TY  - JOUR
T1  - Hierarchical Spatio-Temporal Models and Survey Research
JF  - Statistics Views
Y1  - 2013
A1  - Wikle, C.
A1  - Holan, S.
A1  - Cressie, N.
UR  - http://www.statisticsviews.com/details/feature/4730991/Hierarchical-Spatio-Temporal-Models-and-Survey-Research.html
ER  - 

TY  - ABST
T1  - How can survey estimates of small areas be improved by leveraging social-media data?
Y1  - 2013
A1  - Cressie, N.
A1  - Holan, S.
A1  - Wikle, C.
JF  - The Survey Statistician
UR  - http://isi.cbs.nl/iass/N68.pdf
ER  - 

TY  - RPRT
T1  - Improving User Access to Metadata for Public and Restricted Use US Federal Statistical Files
Y1  - 2013
A1  - Block, William C.
A1  - Williams, Jeremy
A1  - Vilhuber, Lars
A1  - Lagoze, Carl
A1  - Brown, Warren
A1  - Abowd, John M.
AB  - Presentation at NADDI 2013. This record has also been archived at http://kuscholarworks.ku.edu/dspace/handle/1808/11093.
PB  - Cornell University
UR  - http://hdl.handle.net/1813/33362
ER  - 

TY  - RPRT
T1  - Managing Confidentiality and Provenance across Mixed Private and Publicly-Accessed Data and Metadata
Y1  - 2013
A1  - Vilhuber, Lars
A1  - Abowd, John
A1  - Block, William
A1  - Lagoze, Carl
A1  - Williams, Jeremy
AB  - Social science researchers are increasingly interested in making use of confidential micro-data that contains linkages to the identities of people, corporations, etc. The value of this linking lies in the potential to join these identifiable entities with external data such as genome data, geospatial information, and the like. Leveraging these linkages is an essential aspect of “big data” scholarship. However, the utility of these confidential data for scholarship is compromised by the complex nature of their management and curation. This makes it difficult to fulfill US federal data management mandates and interferes with basic scholarly practices such as validation and reuse of existing results. We describe in this paper our work on the CED2AR prototype, a first step in providing researchers with a tool that spans the confidential/publicly-accessible divide, making it possible for researchers to identify, search, access, and cite those data. The particular points of interest in our work are the cloaking of metadata fields and the expression of provenance chains. For the former, we make use of existing fields in the DDI (Data Documentation Initiative) specification and suggest some minor changes to the specification. For the latter problem, we investigate the integration of DDI with recent work by the W3C PROV working group that has developed a generalizable and extensible model for expressing data provenance.
PB  - Cornell University
UR  - http://hdl.handle.net/1813/34534
ER  - 

TY  - CONF
T1  - Nonlinear Dynamic Spatio-Temporal Statistical Models
T2  - Southern Regional Council on Statistics Summer Research Conference
Y1  - 2013
A1  - Wikle, C.K.
JF  - Southern Regional Council on Statistics Summer Research Conference
ER  - 

TY  - CHAP
T1  - Spatio-temporal Design: Advances in Efficient Data Acquisition
T2  - Spatio-temporal Design: Advances in Efficient Data Acquisition
Y1  - 2013
A1  - Holan, S.
A1  - Wikle, C.
ED  - Jorge Mateu
ED  - Werner Muller
KW  - semiparametric dynamic design for non-Gaussian spatio-temporal data
JF  - Spatio-temporal Design: Advances in Efficient Data Acquisition
PB  - Wiley
SN  - 9780470974292
ER  - 

TY  - ABST
T1  - Statistics and the Environment: Overview and Challenges
Y1  - 2013
A1  - Wikle, C.K.
N1  - Invited Introductory Overview Lecture
ER  - 

TY  - THES
T1  - Using Satellite Imagery to Evaluate and Analyze Socioeconomic Changes Observed with Census Data
Y1  - 2013
A1  - Wilson, C. R.
N1  - NCRN
ER  - 

TY  - JOUR
T1  - An Approach for Identifying and Predicting Economic Recessions in Real-Time Using Time-Frequency Functional Models
JF  - Applied Stochastic Models in Business and Industry
Y1  - 2012
A1  - Holan, S.
A1  - Yang, W.
A1  - Matteson, D.
A1  - Wikle, C.K.
KW  - Bayesian model averaging
KW  - business cycles
KW  - empirical orthogonal functions
KW  - functional data
KW  - MIDAS
KW  - spectrogram
KW  - stochastic search variable selection
VL  - 28
UR  - http://onlinelibrary.wiley.com/doi/10.1002/asmb.1954/full
N1  - DOI: 10.1002/asmb.1954
ER  - 

TY  - JOUR
T1  - Bayesian Multi-Regime Smooth Transition Regression with Ordered Categorical Variables
JF  - Computational Statistics and Data Analysis
Y1  - 2012
A1  - Wang, J.
A1  - Holan, S.
VL  - 56
UR  - http://dx.doi.org/10.1016/j.csda.2012.04.018
N1  - http://dx.doi.org/10.1016/j.csda.2012.04.018
ER  - 

TY  - CONF
T1  - Change of Support in Spatio-Temporal Dynamical Models
T2  - Joint Statistical Meetings
Y1  - 2012
A1  - Wikle, C.K.
JF  - Joint Statistical Meetings
CY  - Montreal, Canada
ER  - 

TY  - RPRT
T1  - Data Management of Confidential Data
Y1  - 2012
A1  - Lagoze, Carl
A1  - Block, William C.
A1  - Williams, Jeremy
A1  - Abowd, John M.
A1  - Vilhuber, Lars
AB  - Social science researchers increasingly make use of data that is confidential because it contains linkages to the identities of people, corporations, etc. The value of this data lies in the ability to join the identifiable entities with external data such as genome data, geospatial information, and the like. However, the confidentiality of this data is a barrier to its utility and curation, making it difficult to fulfill US federal data management mandates and interfering with basic scholarly practices such as validation and reuse of existing results. We describe the complexity of the relationships among data that span a public and private divide. We then describe our work on the CED2AR prototype, a first step in providing researchers with a tool that spans this divide and makes it possible for them to search, access, and cite that data.
PB  - Cornell University
UR  - http://hdl.handle.net/1813/30924
ER  - 

TY  - RPRT
T1  - An Early Prototype of the Comprehensive Extensible Data Documentation and Access Repository (CED2AR)
Y1  - 2012
A1  - Block, William C.
A1  - Williams, Jeremy
A1  - Abowd, John M.
A1  - Vilhuber, Lars
A1  - Lagoze, Carl
AB  - This presentation will demonstrate the latest DDI-related technological developments of Cornell University’s $3 million NSF-Census Research Network (NCRN) award, dedicated to improving the documentation, discoverability, and accessibility of public and restricted data from the federal statistical system in the United States. The current internal name for our DDI-based system is the Comprehensive Extensible Data Documentation and Access Repository (CED²AR). CED²AR ingests metadata from heterogeneous sources and supports filtered synchronization between restricted and public metadata holdings. Currently-supported CED²AR “connector workflows” include mechanisms to ingest IPUMS, zero-observation files from the American Community Survey (DDI 2.1), and SIPP Synthetic Beta (DDI 1.2). These disparate metadata sources are all transformed into a DDI 2.5 compliant form and stored in a single repository. In addition, we will demonstrate an extension to DDI 2.5 that allows for the labeling of elements within the schema to indicate confidentiality. This metadata can then be filtered, allowing the creation of derived public use metadata from an original confidential source. This repository is currently searchable online through a prototype application demonstrating the ability to search across previously heterogeneous metadata sources. Presentation at the 4th Annual European DDI User Conference (EDDI12), Norwegian Social Science Data Services, Bergen, Norway, 3 December, 2012
PB  - Cornell University
UR  - http://hdl.handle.net/1813/30922
ER  - 

TY  - CHAP
T1  - The Economics of Privacy
T2  - The Oxford Handbook of the Digital Economy
Y1  - 2012
A1  - Laura Brandimarte
A1  - Alessandro Acquisti
ED  - Martin Peitz
ED  - Joel Waldfogel
JF  - The Oxford Handbook of the Digital Economy
PB  - Oxford University Press
SN  - 9780195397840
ER  - 

TY  - ABST
T1  - Efficient Time-Frequency Representations in High-Dimensional Spatial and Spatio-Temporal Models
Y1  - 2012
A1  - Wikle, C.K.
ER  - 

TY  - RPRT
T1  - Encoding Provenance Metadata for Social Science Datasets
Y1  - 2012
A1  - Lagoze, Carl
A1  - Williams, Jeremy
A1  - Vilhuber, Lars
AB  - Recording provenance is a key requirement for data-centric scholarship, allowing researchers to evaluate the integrity of source data sets and reproduce, and thereby validate, results. Provenance has become even more critical in the web environment in which data from distributed sources and of varying integrity can be combined and derived. Recent work by the W3C on the PROV model provides the foundation for semantically-rich, interoperable, and web-compatible provenance metadata. We apply that model to complex, but characteristic, provenance examples of social science data, describe scenarios that make scholarly use of those provenance descriptions, and propose a manner for encoding this provenance metadata within the widely-used DDI metadata standard. Submitted to Metadata and Semantics Research (MTSR 2013) conference.
PB  - Cornell University
UR  - http://hdl.handle.net/1813/55327
ER  - 

TY  - CHAP
T1  - Entropy Estimations Using Correlated Symmetric Stable Random Projections
T2  - Advances in Neural Information Processing Systems 25
Y1  - 2012
A1  - Ping Li
A1  - Cun-Hui Zhang
ED  - P. Bartlett
ED  - F.C.N. Pereira
ED  - C.J.C. Burges
ED  - L. Bottou
ED  - K.Q. Weinberger
JF  - Advances in Neural Information Processing Systems 25
UR  - http://books.nips.cc/papers/files/nips25/NIPS2012_1456.pdf
ER  - 

TY  - CONF
T1  - Exploring interviewer and respondent interactions: An innovative behavior coding approach
T2  - Midwest Association for Public Opinion Research 2012 Annual Conference
Y1  - 2012
A1  - Walton, L.
A1  - Stange, M.
A1  - Powell, R.
A1  - Belli, R.F.
JF  - Midwest Association for Public Opinion Research 2012 Annual Conference
CY  - Chicago, IL
UR  - http://www.mapor.org/conferences.html
ER  - 

TY  - CONF
T1  - Hierarchical General Quadratic Nonlinear Models for Spatio-Temporal Dynamics
T2  - Red Raider Conference
Y1  - 2012
A1  - Wikle, C.K.
JF  - Red Raider Conference
PB  - Texas Tech University
CY  - Lubbock, TX
ER  - 

TY  - RPRT
T1  - The NSF-Census Research Network: Cornell Node
Y1  - 2012
A1  - Block, William C.
A1  - Lagoze, Carl
A1  - Vilhuber, Lars
A1  - Brown, Warren A.
A1  - Williams, Jeremy
A1  - Arguillas, Florio
AB  - Cornell University has received a $3M NSF-Census Research Network (NCRN) award to improve the documentation and discoverability of both public and restricted data from the federal statistical system. The current internal name for this project is the Comprehensive Extensible Data Documentation and Access Repository (CED²AR). The diagram to the right provides a high level architectural overview of the system to be implemented. The CED²AR will be based upon leading metadata standards such as the Data Documentation Initiative (DDI) and Statistical Data and Metadata eXchange (SDMX) and be flexibly designed to ingest documentation from a variety of source files. It will permit synchronization between the public and confidential instances of the repository. The scholarly community will be able to use the CED²AR as it would a conventional metadata repository, deprived only of the values of certain confidential information, but not their metadata. The authorized user, working on the secure Census Bureau network, could use the CED²AR with full information in authorized domains.
PB  - Cornell University
UR  - http://hdl.handle.net/1813/30925
ER  - 

TY  - CHAP
T1  - One Permutation Hashing
T2  - Advances in Neural Information Processing Systems 25
Y1  - 2012
A1  - Ping Li
A1  - Art Owen
A1  - Cun-Hui Zhang
ED  - P. Bartlett
ED  - F.C.N. Pereira
ED  - C.J.C. Burges
ED  - L. Bottou
ED  - K.Q. Weinberger
JF  - Advances in Neural Information Processing Systems 25
UR  - http://books.nips.cc/papers/files/nips25/NIPS2012_1436.pdf
ER  - 

TY  - JOUR
T1  - Rejoinder: An approach for identifying and predicting economic recessions in real time using time frequency functional models
JF  - Applied Stochastic Models in Business and Industry
Y1  - 2012
A1  - Holan, S.
A1  - Yang, W.
A1  - Matteson, D.
A1  - Wikle, C.
VL  - 28
UR  - http://onlinelibrary.wiley.com/doi/10.1002/asmb.1955/full
ER  - 

TY  - CHAP
T1  - Semiparametric Dynamic Design of Monitoring Networks for Non-Gaussian Spatio-Temporal Data
T2  - Spatio-temporal Design: Advances in Efficient Data Acquisition
Y1  - 2012
A1  - Holan, S.
A1  - Wikle, C.K.
ED  - Jorge Mateu
ED  - Werner Muller
JF  - Spatio-temporal Design: Advances in Efficient Data Acquisition
PB  - Wiley
CY  - Chichester, UK
UR  - http://onlinelibrary.wiley.com/doi/10.1002/9781118441862.ch12/summary
ER  - 

TY  - ABST
T1  - Spatio-Temporal Statistics at Mizzou, Truman School of Public Affairs
Y1  - 2012
A1  - Wikle, C.K.
ER  - 

TY  - JOUR
T1  - An ensemble quadratic echo state network for nonlinear spatio-temporal forecasting
JF  - Stat
Y1  - 0
A1  - McDermott, P.L.
A1  - Wikle, C.K.
AB  - Spatio-temporal data and processes are prevalent across a wide variety of scientific disciplines. These processes are often characterized by nonlinear time dynamics that include interactions across multiple scales of spatial and temporal variability. The data sets associated with many of these processes are increasing in size due to advances in automated data measurement, management, and numerical simulator output. Nonlinear spatio-temporal models have only recently seen interest in statistics, but there are many classes of such models in the engineering and geophysical sciences. Traditionally, these models are more heuristic than those that have been presented in the statistics literature, but are often intuitive and quite efficient computationally. We show here that with fairly simple, but important, enhancements, the echo state network (ESN) machine learning approach can be used to generate long-lead forecasts of nonlinear spatio-temporal processes, with reasonable uncertainty quantification, and at only a fraction of the computational expense of traditional parametric nonlinear spatio-temporal models.
UR  - https://arxiv.org/abs/1708.05094
ER  -