Date: Mon, 21 Oct 2002 08:38:28 +1000
Reply-To: Tim Churches <tchur@OPTUSHOME.COM.AU>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Tim Churches <tchur@OPTUSHOME.COM.AU>
Subject: Re: Record linkage
Content-Type: text/plain; charset=us-ascii
John Whittington wrote:
>
> At 05:30 20/10/02 +1000, Tim Churches wrote (in part):
>
> >Tim B wrote:
> > > Franck, The simplest way is to use an actual code: for each
> > > identifier in your data, assign a meaningless value and keep
> > > a list. Just do not lose the list.
> >
> >Yes, but those values need to be **really** meaningless i.e. as
> >random as possible. Such a list of random numbers, used only once to
> >encrypt another list, is called a one-time pad. Don't use any
> >software-based (pseudo-)random number generator to generate your one-time pad.
>
> Tim, whilst that's obviously very correct advice in relation to
> cryptography, isn't it really rather 'over the top' in the present
> context. If, as I understand it, the intent is simply to 'anonymise' the
> true identity of data (SAS observations), then even the 'list of true IDs'
> is presumably not going to be known to third parties - so even sequential
> numbering (with a securely stored 'look up list') would probably suffice,
> and any form of 'erratic' (even if not truly 'random') numbers would be
> more than good enough. ... or am I (as so often!) missing something?

The effort which should be put into the protection of privacy and
confidentiality really depends on the hazard (i.e. the consequences)
associated with loss of that protection. It is dangerous to make
judgements about the correct degree of protection to be afforded to
other people's personal information (at least without asking them
first, and you can guess what the answer will be), so the best policy
is to employ the best protection which is feasible. That does not
necessarily mean huge expense or complexity.

My point was that the protection offered by XORing with a one-time pad
depends entirely on the quality (randomness) of that one-time pad. If
the one-time pad is predictable, then the XOR encryption can be broken.
How easily depends on how predictable the one-time pad is.
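
To make the point about predictable pads concrete, here is a minimal
sketch in Python (not SAS). The identifier, the seed value and the
helper names xor_bytes, weak_pad and brute_force are all made up for
illustration. It assumes the attacker knows the identifiers start with
"ID-000" and that the weak pad came from a seeded generator with a
small seed space.

```python
import random
import secrets


def xor_bytes(data: bytes, pad: bytes) -> bytes:
    """XOR each data byte with the matching pad byte."""
    if len(pad) < len(data):
        raise ValueError("one-time pad must be at least as long as the data")
    return bytes(d ^ p for d, p in zip(data, pad))


identifier = b"ID-0001234"  # hypothetical identifier

# Pad drawn from the operating system's random source.
good_pad = secrets.token_bytes(len(identifier))
ciphertext = xor_bytes(identifier, good_pad)
recovered = xor_bytes(ciphertext, good_pad)  # XOR with the same pad undoes it


def weak_pad(seed: int, n: int) -> bytes:
    """Predictable pad: fully determined by a small integer seed."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(n))


weak_ciphertext = xor_bytes(identifier, weak_pad(42, len(identifier)))


def brute_force(ct: bytes, known_prefix: bytes, max_seed: int):
    """Try every seed until the decrypted text starts with the known prefix."""
    for seed in range(max_seed):
        guess = xor_bytes(ct, weak_pad(seed, len(ct)))
        if guess.startswith(known_prefix):
            return seed, guess
    return None


cracked = brute_force(weak_ciphertext, b"ID-000", 1000)
```

The pad from the operating system gives the attacker nothing to search
over, whereas the seeded pad can be regenerated by anyone who tries
enough seeds.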

The use of a high-quality software-based pseudo-random number
generator, such as a Mersenne Twister or similar, may well be good
enough, although bear in mind that the Mersenne Twister was not
designed as a cryptographic generator. I don't think the algorithms
used by the random number generators in SAS are documented, are they?
If they are not, you should not assume they are of cryptographic
quality.

But high-quality sources of random numbers, like those provided by
/dev/random in Linux, are readily available, and thus you need to be
able to justify any decision not to use them when protecting other
people's privacy and confidentiality.
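
As a rough illustration of how little effort this takes, the sketch
below (Python again, not SAS) builds the kind of look-up list discussed
earlier in the thread, drawing each code from the operating system's
random source via the standard secrets module. The function name
assign_codes, the sample identifiers and the 8-byte code length are
assumptions chosen for the example.

```python
import secrets


def assign_codes(identifiers, nbytes=8):
    """Map each distinct identifier to a unique random hex code."""
    codes = {}
    used = set()
    for ident in identifiers:
        if ident in codes:
            continue  # same person, same code
        code = secrets.token_hex(nbytes)
        while code in used:  # collisions are very unlikely, but check anyway
            code = secrets.token_hex(nbytes)
        used.add(code)
        codes[ident] = code
    return codes


# Hypothetical identifiers; "A123" appears twice and gets a single code.
lookup = assign_codes(["A123", "B456", "A123", "C789"])
```

The resulting dictionary is the list which must be stored securely and
not lost; the codes themselves carry no information about the original
identifiers.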
Tim C
>
> Kind Regards
>
> John
>
> ----------------------------------------------------------------
> Dr John Whittington, Voice: +44 (0) 1296 730225
> Mediscience Services Fax: +44 (0) 1296 738893
> Twyford Manor, Twyford, E-mail: John.W@mediscience.co.uk
> Buckingham MK18 4EL, UK mediscience@compuserve.com
> ----------------------------------------------------------------