LISTSERV at the University of Georgia
Date:   Mon, 21 Oct 2002 10:44:03 +1000
Reply-To:   Peter Baade <Peter_Baade@HEALTH.QLD.GOV.AU>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Peter Baade <Peter_Baade@HEALTH.QLD.GOV.AU>
Subject:   Re: Record linkage
Content-Type:   text/plain; charset="us-ascii"

This posting is mainly in an attempt to clarify some confusion on my part, so my apologies if I am stating the exceedingly obvious (or my ignorance).

If you have one table containing all the identifying variables (Table_ID) and another table containing all the non-identifying variables (Table_data), then don't you just need a single "random" variable that links Table_ID to Table_data?

Unless you have access to Table_ID, isn't knowing how the link variable in Table_data was randomised next to useless?

I guess it depends if you are keeping the ID variables in Table_data and just randomising their values, or splitting the data into two tables as described above.
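To make the two-table layout concrete, here is a rough sketch in Python rather than SAS, purely for illustration. The `split_records` helper, the `link` column name and the example records are all made up for this sketch and are not from anyone's actual data or code.

```python
import secrets


def split_records(records, id_fields):
    """Split records into an identifying table and a data table.

    The two tables share only a randomly generated link key.
    """
    table_id, table_data = [], []
    used_keys = set()
    for record in records:
        # Draw a fresh random key; retry on the (very unlikely) collision.
        key = secrets.token_hex(16)
        while key in used_keys:
            key = secrets.token_hex(16)
        used_keys.add(key)

        id_row = {"link": key}
        data_row = {"link": key}
        for field, value in record.items():
            if field in id_fields:
                id_row[field] = value
            else:
                data_row[field] = value
        table_id.append(id_row)
        table_data.append(data_row)
    return table_id, table_data


# Made-up example records (hypothetical field names).
records = [
    {"name": "Person A", "dob": "1970-01-01", "diagnosis": "asthma"},
    {"name": "Person B", "dob": "1980-02-02", "diagnosis": "diabetes"},
]
table_id, table_data = split_records(records, id_fields={"name", "dob"})
```

With this layout, Table_data on its own carries only the random key, so re-identification requires access to Table_ID.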

Peter.

>>> Tim Churches <tchur@OPTUSHOME.COM.AU> 21/10/02 8:38:28 >>>
John Whittington wrote:
>
> At 05:30 20/10/02 +1000, Tim Churches wrote (in part):
>
> >Tim B wrote:
> > > Franck, The simplest way is to use an actual code: for each
> > > identifier in your data, assign a meaningless value and keep
> > > a list. Just do not lose the list.
> >
> >Yes, but those values need to be **really** meaningless i.e. as
> >random as possible. Such a list of random numbers, used only once to
> >encrypt another list, is called a one-time pad. Don't use any
> >software-based (pseudo-)random number generator to generate your one-time pad.
>
> Tim, whilst that's obviously very correct advice in relation to
> cryptography, isn't it really rather 'over the top' in the present
> context. If, as I understand it, the intent is simply to 'anonymise' the
> true identity of data (SAS observations), then even the 'list of true IDs'
> is presumably not going to be known to third parties - so even sequential
> numbering (with a securely stored 'look up list') would probably suffice,
> and any form of 'erratic' (even if not truely 'random') numbers would be
> more than good enough. ... or am I (as so often!) missing something?

The effort which should be put into the protection of privacy and confidentiality really depends on the hazard (i.e. consequences) associated with loss of that protection. It is dangerous to make judgements about the correct degree of protection for other people's personal information, at least without asking them first (and you can guess what the answer will be), so the best policy is to employ the best protection which is feasible. That does not necessarily mean huge expense or complexity.

My point was that the protection offered by XORing with a one-time pad depends entirely on the quality (randomness) of that one-time pad. If the one-time pad is predictable, then the XOR encryption can be broken. How easily depends on how predictable the one-time pad is.
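As a minimal illustration of XORing with a one-time pad (in Python, not SAS), the sketch below encrypts a made-up identifier and then shows what leaks if the same pad is used twice. The `xor_bytes` helper and the identifier values are hypothetical.

```python
import secrets


def xor_bytes(data: bytes, pad: bytes) -> bytes:
    """XOR data with a pad of exactly the same length."""
    if len(data) != len(pad):
        raise ValueError("pad must be exactly as long as the data")
    return bytes(d ^ p for d, p in zip(data, pad))


# Two made-up identifiers of equal length.
id_a = b"ID-0001234"
id_b = b"ID-0009876"

# Pad drawn from the operating system's random source.
pad = secrets.token_bytes(len(id_a))

cipher_a = xor_bytes(id_a, pad)
recovered_a = xor_bytes(cipher_a, pad)  # XORing again with the pad decrypts

# Reusing the same pad for a second value defeats the "one-time" property:
# the pad cancels out, leaving the XOR of the two plaintexts.
cipher_b = xor_bytes(id_b, pad)
leak = xor_bytes(cipher_a, cipher_b)
```

Here `leak` equals `xor_bytes(id_a, id_b)` whatever the pad was, so its zero bytes reveal where the two identifiers agree.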

The use of a high-quality software-based pseudo-random number generator, such as a Mersenne Twister or similar, may well be good enough. I don't think the algorithms used by the random number generators in SAS are documented, are they? If they are not, you should not assume they are of cryptographic quality.
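To show what "predictable" means for a software generator, here is a small sketch using Python's `random` module, which is documented as a Mersenne Twister and as unsuitable for security purposes. The `pad_from_seed` helper and the seed value are made up for illustration, and this says nothing about how the SAS generators behave.

```python
import random


def pad_from_seed(seed: int, n_bytes: int) -> bytes:
    """Build a pad from a seeded Mersenne Twister (illustration only)."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n_bytes))


# Anyone who knows or guesses the seed can regenerate the whole pad.
pad_original = pad_from_seed(20021021, 16)
pad_rebuilt = pad_from_seed(20021021, 16)
```

The two pads are identical, so the secrecy of such a pad rests entirely on the secrecy of the seed.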

But high quality sources of random numbers, like those provided by /dev/random in Linux, are readily available, and thus you need to be able to justify any decision not to use them when protecting other people's privacy and confidentiality.
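For what it's worth, drawing on the operating system's random source takes only a few lines. The sketch below uses Python's `os.urandom`, which on Linux reads from the kernel's random pool (a close relative of /dev/random rather than that device file itself). The helper names `random_pad` and `random_link_id` are hypothetical.

```python
import os


def random_pad(n_bytes: int) -> bytes:
    """Return n_bytes of random data from the operating system."""
    return os.urandom(n_bytes)


def random_link_id(n_bytes: int = 8) -> int:
    """Return a random non-negative integer built from n_bytes OS bytes."""
    return int.from_bytes(os.urandom(n_bytes), "big")


pad = random_pad(32)
link_id = random_link_id()
```

A value like `link_id` could serve as the meaningless link variable, with the mapping back to the true identifiers stored separately and securely.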

Tim C

>
> Kind Regards
>
> John
>
> ----------------------------------------------------------------
> Dr John Whittington,       Voice:    +44 (0) 1296 730225
> Mediscience Services       Fax:      +44 (0) 1296 738893
> Twyford Manor, Twyford,    E-mail:   John.W@mediscience.co.uk
> Buckingham  MK18 4EL, UK             mediscience@compuserve.com
> ----------------------------------------------------------------


