Date: Fri, 12 Jan 2007 15:21:52 -0500
Reply-To: Sigurd Hermansen <HERMANS1@WESTAT.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Sigurd Hermansen <HERMANS1@WESTAT.COM>
Subject: Re: how to replace SSNs with fake
Best to do this very carefully.... What you call a fake ID will likely
become a surrogate ID for each person's SSN. Someone needs to take
responsibility for maintaining a 'key ring' or crosswalk dataset that
has a surrogate ID and its corresponding SSN in each row.
I'd construct the key ring in two steps. First, create a column of
distinct instances of SSN. In SAS SQL,
create table key as select distinct SSN from <dataset>;
Second, create a non-informative surrogate ID for each distinct SSN:
create table keyRing as
  select put(ranuni(1773)*100000000,z9.) as ID, SSN
  from key;
One can then join the key ring to a dataset on SSN and substitute the ID
for the SSN in a new dataset. Reversing the process restores the SSN
when required for identification of subjects. For the ID I have used a
purely random number, which can produce duplicates when applied to a
large number of SSNs. (Check for duplicated ID's.) In the event of
duplicates, it will take a somewhat more complicated process to
guarantee distinct ID's.
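As a rough sketch of the steps above, the following builds the key ring
for a small made-up dataset, checks for duplicated ID's, substitutes the
ID for the SSN, and then reverses the substitution. The dataset and
variable names (visits, visit, visits_deid, visits_reid, shuffled,
keyRing2) are hypothetical, and the final shuffle-and-number step is only
one possible way to guarantee distinct ID's, not necessarily the "more
complicated process" meant above.

```sas
/* Hypothetical input: several rows per person, identified by SSN. */
data visits;
  input SSN :$9. visit;
  datalines;
111223333 1
111223333 2
444556666 1
777889999 1
777889999 2
;
run;

proc sql;
  /* Key ring as described in the post (seed 1773 is from the post). */
  create table key as
    select distinct SSN from visits;
  create table keyRing as
    select put(ranuni(1773)*100000000, z9.) as ID, SSN
    from key;

  /* Duplicate check: any row printed here is an ID shared by 2+ SSNs. */
  select ID, count(*) as n
    from keyRing
    group by ID
    having count(*) > 1;

  /* Substitute the surrogate ID for the SSN in a new dataset. */
  create table visits_deid as
    select k.ID, v.visit
    from visits as v inner join keyRing as k
      on v.SSN = k.SSN;

  /* Reverse the process: restore the SSN from the key ring. */
  create table visits_reid as
    select k.SSN, d.visit
    from visits_deid as d inner join keyRing as k
      on d.ID = k.ID;
quit;

/* Assumed alternative that cannot produce duplicates: put the   */
/* distinct SSNs in random order, then number them sequentially. */
data shuffled;
  set key;
  r = ranuni(1773);
run;

proc sort data=shuffled;
  by r;
run;

data keyRing2 (keep=ID SSN);
  set shuffled;
  ID = put(_n_, z9.);  /* row number in shuffled order, zero-padded */
run;
```

In the alternative, the ID is just a position in a randomly ordered
list, so it stays non-informative as long as the key ring itself is kept
private.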
I've mentioned a Data Privacy By Design paper that a colleague of mine
and I wrote some time back for a CDC conference. It illustrates some of
the uses of surrogate key ID's.
From: firstname.lastname@example.org [mailto:email@example.com]
On Behalf Of Jen
Sent: Friday, January 12, 2007 12:39 PM
Cc: Jennifer Sabatier
Subject: how to replace SSNs with fake
I have a file of information about people and I want to create an id
variable to replace SSN. In this file people have multiple rows, i.e.,
some SSNs have multiple rows, others don't.
I know this is probably a simple request but I couldn't find something
similar in a search.