Imputation in U.S. manufacturing data and its implications for productivity dispersion

Kirk K. White, Jerome P. Reiter, Amil Petrin

Research output: Contribution to journal, Article, peer-review

21 Scopus citations

Abstract

In the U.S. Census Bureau's 2002 and 2007 Censuses of Manufactures, 79% and 73% of observations, respectively, have imputed data for at least one variable used to compute total factor productivity (TFP). The Bureau primarily imputes missing values using mean-imputation methods, which can reduce the variance of the imputed variables. For five variables entering TFP, we show that dispersion is significantly smaller in the Census mean-imputed data than in the nonimputed data. We use classification and regression trees (CART) to produce multiple imputations with observed data for similar plants. For 90% of the 473 industries in 2002 and 84% of the 471 industries in 2007, we find that TFP dispersion increases as we move from Census mean-imputed data to nonimputed data to the CART-imputed data.
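The variance-reduction claim in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure and uses no Census data: the values are synthetic draws, the 40% missingness rate is an arbitrary assumption, and `mean_impute` is a hypothetical helper that fills every gap with the single observed mean, which is simpler than the Bureau's actual imputation methods.

```python
import random
import statistics


def mean_impute(values):
    """Replace each missing entry (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


rng = random.Random(0)

# Synthetic stand-in for one variable entering TFP (assumption, not Census data).
true_values = [rng.gauss(0.0, 1.0) for _ in range(1000)]

# Mark roughly 40% of entries as missing completely at random (assumed rate).
with_missing = [None if rng.random() < 0.4 else v for v in true_values]

observed_only = [v for v in with_missing if v is not None]
imputed = mean_impute(with_missing)

# Population standard deviation as a simple dispersion measure.
sd_observed = statistics.pstdev(observed_only)
sd_imputed = statistics.pstdev(imputed)

print(f"SD of observed (nonimputed) values: {sd_observed:.3f}")
print(f"SD after mean imputation:           {sd_imputed:.3f}")
```

Because every filled-in value sits exactly at the observed mean, the imputed entries add observations without adding any squared deviation, so the dispersion of the completed data falls below that of the observed values alone. An approach that fills gaps with observed values from similar units, as the abstract describes for CART, does not place the imputed entries at a single point.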

Original language: English (US)
Pages (from-to): 502-509
Number of pages: 8
Journal: Review of Economics and Statistics
Volume: 100
Issue number: 3
State: Published - Jul 1 2018

Bibliographical note

Funding Information:
National Science Foundation grant NSF SES 1131897.
