TY - GEN
T1 - You are what you say
T2 - 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
AU - Frankowski, Dan
AU - Cosley, Dan
AU - Sen, Shilad
AU - Terveen, Loren
AU - Riedl, John
PY - 2006
Y1 - 2006
N2 - In today's data-rich networked world, people express many aspects of their lives online. It is common to segregate different aspects in different places: you might write opinionated rants about movies in your blog under a pseudonym while participating in a forum or web site for scholarly discussion of medical ethics under your real name. However, it may be possible to link these separate identities, because the movies, journal articles, or authors you mention are from a sparse relation space whose properties (e.g., many items related to by only a few users) allow re-identification. This re-identification violates people's intentions to separate aspects of their life and can have negative consequences; it also may allow other privacy violations, such as obtaining a stronger identifier like name and address. This paper examines this general problem in a specific setting: re-identification of users from a public web movie forum in a private movie ratings dataset. We present three major results. First, we develop algorithms that can re-identify a large proportion of public users in a sparse relation space. Second, we evaluate whether private dataset owners can protect user privacy by hiding data; we show that this requires extensive and undesirable changes to the dataset, making it impractical. Third, we evaluate two methods for users in a public forum to protect their own privacy, suppression and misdirection. Suppression doesn't work here either. However, we show that a simple misdirection strategy works well: mention a few popular items that you haven't rated.
AB - In today's data-rich networked world, people express many aspects of their lives online. It is common to segregate different aspects in different places: you might write opinionated rants about movies in your blog under a pseudonym while participating in a forum or web site for scholarly discussion of medical ethics under your real name. However, it may be possible to link these separate identities, because the movies, journal articles, or authors you mention are from a sparse relation space whose properties (e.g., many items related to by only a few users) allow re-identification. This re-identification violates people's intentions to separate aspects of their life and can have negative consequences; it also may allow other privacy violations, such as obtaining a stronger identifier like name and address. This paper examines this general problem in a specific setting: re-identification of users from a public web movie forum in a private movie ratings dataset. We present three major results. First, we develop algorithms that can re-identify a large proportion of public users in a sparse relation space. Second, we evaluate whether private dataset owners can protect user privacy by hiding data; we show that this requires extensive and undesirable changes to the dataset, making it impractical. Third, we evaluate two methods for users in a public forum to protect their own privacy, suppression and misdirection. Suppression doesn't work here either. However, we show that a simple misdirection strategy works well: mention a few popular items that you haven't rated.
KW - Datasets
KW - Mentions
KW - Privacy
KW - Re-identification
KW - Sparse relation space
KW - k-anonymity
KW - k-identification
UR - http://www.scopus.com/inward/record.url?scp=33750357681&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33750357681&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:33750357681
SN - 1595933697
SN - 9781595933690
T3 - Proceedings of the Twenty-Ninth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
SP - 565
EP - 572
BT - Proceedings of the Twenty-Ninth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Y2 - 6 August 2006 through 11 August 2006
ER -