| Date: | Mon, 21 Oct 2002 10:44:03 +1000 |
| Reply-To: | Peter Baade <Peter_Baade@HEALTH.QLD.GOV.AU> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | Peter Baade <Peter_Baade@HEALTH.QLD.GOV.AU> |
| Subject: | Re: Record linkage |
| Content-Type: | text/plain; charset="us-ascii" |
This posting is mainly in an attempt to clarify some confusion on my part, so my apologies if I am stating the exceedingly obvious (or my ignorance).
If you have one table containing all the identifying variables (Table_ID), and then another table containing all the non-identifying variables (Table_data), then don't you just need a single "random" variable that contains the link between Table_ID and Table_data?
Unless you have access to Table_ID, then isn't knowing how the link variable in Table_data has been randomised next to useless?
I guess it depends if you are keeping the ID variables in Table_data and just randomising their values, or splitting the data into two tables as described above.
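The two-table arrangement described above can be sketched roughly as follows. This is only an illustration in Python, not SAS: the function name `split_records`, the field names and the sample records are all made up for the example, and the link key is drawn from the operating system's random source via the standard `secrets` module.

```python
import secrets

def split_records(records, id_fields):
    """Split records into an identifying table and a de-identified table.

    The two tables share nothing except a randomly drawn link key.
    (Hypothetical helper, for illustration only.)
    """
    table_id, table_data, used = [], [], set()
    for rec in records:
        # Draw a link key from the OS random source; retry on a collision.
        key = secrets.token_hex(8)
        while key in used:
            key = secrets.token_hex(8)
        used.add(key)
        table_id.append({"link": key, **{f: rec[f] for f in id_fields}})
        table_data.append(
            {"link": key, **{f: v for f, v in rec.items() if f not in id_fields}}
        )
    # Shuffle the data table so that row order does not reveal the pairing.
    secrets.SystemRandom().shuffle(table_data)
    return table_id, table_data

# Made-up sample records.
records = [
    {"name": "A. Smith", "dob": "1950-01-02", "diagnosis": "J45"},
    {"name": "B. Jones", "dob": "1962-07-30", "diagnosis": "E11"},
]
table_id, table_data = split_records(records, id_fields=("name", "dob"))
```

Someone holding only `table_data` sees a link value that carries no information about the person; re-identification needs `table_id`, which is stored separately.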
Peter.
>>> Tim Churches <tchur@OPTUSHOME.COM.AU> 21/10/02 8:38:28 >>>
John Whittington wrote:
>
> At 05:30 20/10/02 +1000, Tim Churches wrote (in part):
>
> >Tim B wrote:
> > > Franck, The simplest way is to use an actual code: for each
> > > identifier in your data, assign a meaningless value and keep
> > > a list. Just do not lose the list.
> >
> >Yes, but those values need to be **really** meaningless i.e. as
> >random as possible. Such a list of random numbers, used only once to
> >encrypt another list, is called a one-time pad. Don't use any
> >software-based (pseudo-)random number generator to generate your one-time pad.
>
> Tim, whilst that's obviously very correct advice in relation to
> cryptography, isn't it really rather 'over the top' in the present
> context? If, as I understand it, the intent is simply to 'anonymise' the
> true identity of data (SAS observations), then even the 'list of true IDs'
> is presumably not going to be known to third parties - so even sequential
> numbering (with a securely stored 'look up list') would probably suffice,
> and any form of 'erratic' (even if not truly 'random') numbers would be
> more than good enough. ... or am I (as so often!) missing something?
The effort which should be put into the protection of privacy and confidentiality really depends on the hazard (i.e. consequences) associated with loss of that protection. It is dangerous to make judgements on the correct degree of protection to be afforded to other people's personal information (at least not without asking them first, and you can guess what the answer will be), so the best policy is to employ the best protection which is feasible. That does not necessarily mean huge expense or complexity.
My point was that the protection offered by XORing with a one-time pad depends entirely on the quality (randomness) of that one-time pad. If the one-time pad is predictable, then the XOR encryption can be broken. How easily depends on how predictable the one-time pad is.
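To make the point about predictable pads concrete, here is a rough Python sketch. Everything in it is an assumption made for illustration: the identifier format (`MRN-` plus digits), the helper names `xor_bytes` and `weak_pad`, and the scenario in which the pad comes from a seeded software generator whose seed lies in a small range the attacker can search. Python's `random` module is a Mersenne Twister, and its documentation warns against using it for security purposes.

```python
import random

def xor_bytes(data: bytes, pad: bytes) -> bytes:
    """XOR each data byte with the corresponding pad byte."""
    if len(pad) < len(data):
        raise ValueError("pad must be at least as long as the data")
    return bytes(d ^ p for d, p in zip(data, pad))

def weak_pad(seed: int, length: int) -> bytes:
    """A predictable 'pad': the same seed always yields the same bytes."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(length))

# Hypothetical identifier, masked with a pad generated from a small seed.
identifier = b"MRN-0012345"
ciphertext = xor_bytes(identifier, weak_pad(42, len(identifier)))

# The attacker tries every seed in the small range and keeps any result
# that matches the known identifier format.
recovered = []
for seed in range(1000):
    guess = xor_bytes(ciphertext, weak_pad(seed, len(ciphertext)))
    if guess.startswith(b"MRN-"):
        recovered.append(guess)

print(recovered)
```

The search succeeds because the pad is fully determined by the seed, so guessing the seed is as good as holding the pad.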
The use of a high quality software-based pseudo-random number generator, such as a Mersenne Twister or similar, may well be good enough. I don't think the algorithms used by the random number generators in SAS are documented, are they? If they are not, you should not assume they are of cryptographic quality.
But high quality sources of random numbers, like those provided by /dev/random in Linux, are readily available, and thus you need to be able to justify any decision not to use them when protecting other people's privacy and confidentiality.
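As a rough illustration of how little effort this takes, the Python sketch below draws a pad from the operating system's random source with `os.urandom` and uses it to mask and unmask an identifier. The identifier value and the helper names `make_pad` and `xor_bytes` are made up for the example, and `os.urandom` is used here as a stand-in for reading /dev/random directly (on Linux it draws from the kernel's random number generator).

```python
import os

def make_pad(length: int) -> bytes:
    """Draw `length` bytes from the operating system's random source."""
    return os.urandom(length)

def xor_bytes(data: bytes, pad: bytes) -> bytes:
    """XOR each data byte with the corresponding pad byte."""
    if len(pad) < len(data):
        raise ValueError("pad must be at least as long as the data")
    return bytes(d ^ p for d, p in zip(data, pad))

# Hypothetical identifier; the pad is used once and stored securely elsewhere.
identifier = b"MRN-0012345"
pad = make_pad(len(identifier))
masked = xor_bytes(identifier, pad)

# XORing again with the same pad restores the original value.
restored = xor_bytes(masked, pad)
print(restored == identifier)
```

The pad must be kept as carefully as the list of true IDs would be, and never reused for a second list.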
Tim C
>
> Kind Regards
>
> John
>
> ----------------------------------------------------------------
> Dr John Whittington, Voice: +44 (0) 1296 730225
> Mediscience Services Fax: +44 (0) 1296 738893
> Twyford Manor, Twyford, E-mail: John.W@mediscience.co.uk
> Buckingham MK18 4EL, UK mediscience@compuserve.com
> ----------------------------------------------------------------
|