Controlled shuffling, statistical confidentiality and microdata utility: A successful experiment with a 10% household sample of the 2011 population census of Ireland for the IPUMS-international database

Robert McCaa; Krishnamurty Muralidhar; Rathindra Sarathy; Michael Comerford; Albert Esteve-Palos

doi:10.1007/978-3-319-11257-2_25

Institute for Social Research & Data Innovation

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

IPUMS-International disseminates more than two hundred fifty integrated, confidentialized census microdata samples to thousands of researchers worldwide at no cost. The number of samples is increasing at a rate of several dozen per year, as quickly as the task of integrating metadata and microdata is completed. Protecting the statistical confidentiality and privacy of individuals represented in the microdata is a sine qua non of the IPUMS project. For the 2010 round of censuses, even greater protections are required, while researchers are demanding ever higher precision and utility. This paper describes a tripartite collaborative experiment using a ten percent household sample of the 2011 census of Ireland to estimate risk, mask the microdata using controlled shuffling, and assess analytical utility by comparing the masked data against the unprotected source microdata. Controlled shuffling exploits hierarchically ordered coding schemes to protect privacy and enhance utility. With controlled shuffling, the lesson seems to be that more detail means less risk and greater utility. Overall, despite substantial perturbation of the masked dataset (30% of adults on one or more characteristics), we find that data utility is very high and information loss is slight, even for fairly complex analytical problems.
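
As a rough illustration of the mask-then-assess workflow the abstract describes, the following Python sketch permutes an ordinal variable among records that are close in rank and then reports what share of records changed. It is not the controlled shuffling procedure used in the paper; the function names, the `window` parameter, and the sample ages are all hypothetical, chosen only to show how a shuffle can perturb many records while leaving the marginal distribution intact.

```python
import random
from collections import Counter


def shuffle_within_windows(values, window=4, seed=0):
    """Toy rank-window shuffle (an assumption, not the paper's method).

    Records are ordered by value, cut into consecutive blocks of `window`
    records, and the values inside each block are randomly permuted.
    """
    rng = random.Random(seed)
    # Indices of the records, ordered by the value they hold.
    order = sorted(range(len(values)), key=lambda i: values[i])
    masked = list(values)
    for start in range(0, len(order), window):
        block = order[start:start + window]
        block_values = [values[i] for i in block]
        rng.shuffle(block_values)
        # Hand the permuted values back to the records in this block.
        for i, v in zip(block, block_values):
            masked[i] = v
    return masked


def share_perturbed(original, masked):
    """Fraction of records whose value differs after masking."""
    return sum(o != m for o, m in zip(original, masked)) / len(original)


# Hypothetical ages, invented for illustration only.
ages = [23, 45, 31, 67, 52, 38, 29, 74, 41, 56, 33, 48]
masked_ages = shuffle_within_windows(ages, window=4, seed=1)

print("share perturbed:", share_perturbed(ages, masked_ages))
# A permutation cannot change the frequency table of the variable.
print("marginals preserved:", Counter(ages) == Counter(masked_ages))
```

Because values are only exchanged between records, the frequency table of the masked variable matches the original exactly. Any loss of utility would show up in relationships with other variables, which this toy example does not examine.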

Original language	English (US)
Title of host publication	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Editors	Josep Domingo-Ferrer
Publisher	Springer Verlag
Pages	326-337
Number of pages	12
ISBN (Electronic)	9783319112565
DOIs	https://doi.org/10.1007/978-3-319-11257-2_25
State	Published - 2014
Event	International Conference on Privacy in Statistical Databases, PSD 2014 - Ibiza, Spain
Duration	Sep 17 2014 → Sep 19 2014

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	8744
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Bibliographical note

Publisher Copyright:
© IFIP International Federation for Information Processing 2011.

Keywords

Controlled shuffling
Data privacy
Data utility
IPUMS-International
Ireland
Microdata sample
Population census
Statistical disclosure controls

Cite this

McCaa, R., Muralidhar, K., Sarathy, R., Comerford, M., & Esteve-Palos, A. (2014). Controlled shuffling, statistical confidentiality and microdata utility: A successful experiment with a 10% household sample of the 2011 population census of Ireland for the IPUMS-international database. In J. Domingo-Ferrer (Ed.), Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (pp. 326-337). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 8744). Springer Verlag. https://doi.org/10.1007/978-3-319-11257-2_25
