Drawing Statistical Inferences from Historical Census Data

J Michael Oakes; Steven J Ruggles; Michael E Davern; Tami C Swenson

Research output: Working paper

Abstract

Virtually all quantitative microdata used by social scientists derive from samples that incorporate clustering, stratification, and weighting adjustments (Kish 1992, 1965). Such data can yield standard error estimates that differ dramatically from those of a simple random sample of the same size. Researchers using historical U.S. census microdata, however, usually apply methods designed for simple random samples. The resulting p-values and confidence intervals could be inaccurate and could lead to erroneous research conclusions. Because U.S. census microdata samples are among the most widely used sources for social science and policy research, the need for reliable standard error estimation is critical. We evaluate the historical microdata samples of the IPUMS project from 1850-1930 to determine (1) the impact of sample design on standard error estimates and (2) how to apply modern standard error estimation software to historical census samples. We exploit a unique new data source from the 1880 census to validate our methods for standard error estimation, and we then apply this approach to the 1850-1870 and 1900-1930 decennial censuses. We conclude that Taylor series estimation can be used effectively with the historical decennial census microdata samples and should be applied in research analyses that have the potential for substantial clustering effects.
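To illustrate why clustering matters, the sketch below contrasts a naive simple-random-sample standard error with a Taylor series (linearization) standard error for a weighted mean. It is not the authors' specification, which the abstract does not give. It assumes a single stratum, clusters treated as primary sampling units drawn with replacement, and no finite population correction. The function names (`linearized_se_mean`, `srs_se_mean`) and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def weighted_mean(y, w):
    """Weighted mean: sum(w*y) / sum(w)."""
    return sum(wi * yi for yi, wi in zip(y, w)) / sum(w)


def linearized_se_mean(y, w, cluster):
    """Taylor series (linearization) SE of a weighted mean.

    Assumptions (not taken from the paper): one stratum, clusters are
    PSUs sampled with replacement, no finite population correction.
    """
    total_w = sum(w)
    ybar = weighted_mean(y, w)
    # Sum the linearized values w_i * (y_i - ybar) / sum(w) within each cluster.
    totals = defaultdict(float)
    for yi, wi, ci in zip(y, w, cluster):
        totals[ci] += wi * (yi - ybar) / total_w
    n_c = len(totals)
    if n_c < 2:
        raise ValueError("need at least two clusters to estimate variance")
    zbar = sum(totals.values()) / n_c
    var = n_c / (n_c - 1) * sum((z - zbar) ** 2 for z in totals.values())
    return math.sqrt(var)


def srs_se_mean(y):
    """Naive SE of an unweighted mean, treating the data as a simple random sample."""
    n = len(y)
    ybar = sum(y) / n
    s2 = sum((yi - ybar) ** 2 for yi in y) / (n - 1)
    return math.sqrt(s2 / n)


# Toy data: four hypothetical households of two people each, with the
# outcome identical within each household (perfect within-cluster correlation).
y = [0, 0, 1, 1, 0, 0, 1, 1]
w = [1.0] * len(y)
household = [1, 1, 2, 2, 3, 3, 4, 4]

se_naive = srs_se_mean(y)
se_clustered = linearized_se_mean(y, w, household)
design_effect = (se_clustered / se_naive) ** 2  # ratio of variances
print(se_naive, se_clustered, design_effect)
```

In this toy case the clustered standard error exceeds the naive one because the eight observations carry only four households' worth of independent information. When every observation is its own cluster and the weights are equal, the two estimators coincide.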

Original language	English (US)
State	Published - 2007

Publication series

Name	Minnesota Population Center Working Paper Series

Cite this

@techreport{d25cb54b4e3f46c294fc203aa59c2a0e,

title = "Drawing Statistical Inferences from Historical Census Data",

author = "Oakes, {J Michael} and Ruggles, {Steven J} and Davern, {Michael E} and Swenson, {Tami C}",

year = "2007",

language = "English (US)",

series = "Minnesota Population Center Working Paper Series",

type = "WorkingPaper",

}
