TY - GEN
T1 - Controlled shuffling, statistical confidentiality and microdata utility
T2 - International Conference on Privacy in Statistical Databases, PSD 2014
AU - McCaa, Robert
AU - Muralidhar, Krishnamurty
AU - Sarathy, Rathindra
AU - Comerford, Michael
AU - Esteve-Palos, Albert
PY - 2014/1/1
Y1 - 2014/1/1
AB - IPUMS-International disseminates more than two hundred fifty integrated, confidentialized census microdata samples to thousands of researchers worldwide at no cost. The number of samples is increasing at the rate of several dozen per year, as quickly as the task of integrating metadata and microdata is completed. Protecting the statistical confidentiality and privacy of individuals represented in the microdata is a sine qua non of the IPUMS project. For the 2010 round of censuses, even greater protections are required, while researchers are demanding ever higher precision and utility. This paper describes a tripartite collaborative experiment using a ten percent household sample of the 2011 census of Ireland to estimate risk, mask the microdata using controlled shuffling, and assess analytical utility by comparing the masked data against the unprotected source microdata. Controlled shuffling exploits hierarchically ordered coding schemes to protect privacy and enhance utility. With controlled shuffling, the lesson seems to be that more detail means less risk and greater utility. Overall, despite substantial perturbation of the masked dataset (30% of adults on one or more characteristics), we find that data utility is very high and information loss is slight, even for fairly complex analytical problems.
KW - Controlled shuffling
KW - Data privacy
KW - Data utility
KW - IPUMS-International
KW - Ireland
KW - Microdata sample
KW - Population census
KW - Statistical disclosure controls
UR - http://www.scopus.com/inward/record.url?scp=84949128481&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84949128481&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-11257-2_25
DO - 10.1007/978-3-319-11257-2_25
M3 - Conference contribution
AN - SCOPUS:84949128481
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 326
EP - 337
BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
A2 - Domingo-Ferrer, Josep
PB - Springer Verlag
Y2 - 17 September 2014 through 19 September 2014
ER -