Achieving Anonymity Via Clustering
Gagan Aggarwal¹
Google Inc.
Mountain View, CA 94043
gagan@cs.stanford.edu
Tomás Feder²
Comp. Sc. Dept.
Stanford University
Stanford, CA 94305
tomas@cs.stanford.edu
Krishnaram Kenthapadi²
Comp. Sc. Dept.
Stanford University
Stanford, CA 94305
kngk@cs.stanford.edu
Samir Khuller³
Comp. Sc. Dept.
University of Maryland
College Park, MD 20742
samir@cs.umd.edu
Rina Panigrahy²,⁴
Comp. Sc. Dept.
Stanford University
Stanford, CA 94305
rinap@cs.stanford.edu
Dilys Thomas²
Comp. Sc. Dept.
Stanford University
Stanford, CA 94305
dilys@cs.stanford.edu
An Zhu¹
Google Inc.
Mountain View, CA 94043
anzhu@cs.stanford.edu
ABSTRACT
Publishing data for analysis from a table containing personal
records, while maintaining individual privacy, is a problem
of increasing importance today. The traditional approach of
de-identifying records is to remove identifying fields such as
social security number, name, etc. However, recent research
has shown that a large fraction of the US population can be
identified using non-key attributes (called quasi-identifiers)
such as date of birth, gender, and zip code [15]. Sweeney [16]
proposed the k-anonymity model for privacy where non-key
attributes that leak information are suppressed or generalized
so that, for every record in the modified table, there are
at least k−1 other records having exactly the same values for
quasi-identifiers. We propose a new method for anonymizing
data records, where quasi-identifiers of data records are
first clustered and then cluster centers are published. To
ensure privacy of the data records, we impose the constraint
¹This work...
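The k-anonymity condition stated in the abstract (every record shares its quasi-identifier values with at least k−1 others), together with the idea of publishing cluster centers in place of raw quasi-identifiers, can be illustrated with a small Python sketch. This is not the authors' algorithm. The function names `is_k_anonymous` and `publish_cluster_centers`, the use of the coordinate-wise mean as a cluster's "center", the toy data, and the cluster labels are all hypothetical, and the constraint the abstract goes on to impose on clusters is not reproduced here.

```python
from collections import Counter
from statistics import fmean
from typing import Hashable, Sequence

Row = tuple[Hashable, ...]


def is_k_anonymous(rows: Sequence[Row], k: int) -> bool:
    """True if every quasi-identifier tuple occurs in at least k rows,
    i.e. each row has at least k-1 other rows with identical values."""
    return all(count >= k for count in Counter(rows).values())


def publish_cluster_centers(
    points: Sequence[tuple[float, ...]], labels: Sequence[int]
) -> list[tuple[float, ...]]:
    """Replace each point by the center of the cluster it belongs to.

    Assumption (illustration only): a cluster's center is the
    coordinate-wise mean of its members.
    """
    members: dict[int, list[tuple[float, ...]]] = {}
    for point, label in zip(points, labels):
        members.setdefault(label, []).append(point)
    centers = {
        label: tuple(fmean(coord) for coord in zip(*pts))
        for label, pts in members.items()
    }
    return [centers[label] for label in labels]


# Toy quasi-identifiers (age, income in $1000s); made-up data.
qi = [(25, 40), (27, 44), (26, 42), (61, 90), (63, 96), (65, 90)]
labels = [0, 0, 0, 1, 1, 1]  # hypothetical clustering into two groups

published = publish_cluster_centers(qi, labels)
print(published[0])                  # (26.0, 42.0)
print(is_k_anonymous(qi, 2))         # False: every raw row is unique
print(is_k_anonymous(published, 3))  # True: each center is shared by 3 rows
```

Because every record in a cluster is replaced by the same center, the published table groups records by cluster (or more coarsely, if two centers happen to coincide), so whether it is k-anonymous can be read directly off the cluster sizes.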