At 21:54 30/03/99 EST, KarlGerber@aol.com wrote:
>You're right John, the distribution of random variable is irrelevant provided
>the size of file to be sorted does not exceed the number of unique values in
>the distribution. But lets consider an extreme situation of using dichotomous
>variable as a random number generator:
> ....
>The probability of selection as "first.firm" depends here on the original
>order of data, so selection is no longer random.
Karl - Well, yes, I had 'taken it for granted' that we were talking about
continuous distributions! As you say, for the selection to be truly random
(unrelated to the original order of the data), every observation has to be
allocated a unique random value. In the real world, with machine precision
being what it is, unless one is dealing with an extremely large dataset (in
which case this method for obtaining a random sample is probably very
unwise, anyway), the chances of 'ties' using any computer-derived continuous
random function are pretty small. However, if that is a concern, the risk
of any ties occurring is clearly at its lowest with a uniform distribution
(which is what virtually all of us would use for this purpose), since the
values of the random variate are then 'maximally spread out'.
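
To make the point about ties concrete, here is a minimal sketch in Python
(not the software under discussion; the names random_order and count_ties,
the seed, and the 1000-record 'file' are all just assumptions for
illustration). It sorts the same records on a dichotomous 0/1 key and on a
continuous uniform key, using a stable sort so that tied records keep their
original order:

```python
import random

def random_order(records, draw):
    """Give each record a random key and return the records sorted by key.

    sorted() is stable, so records with tied keys keep their original
    relative order, which is exactly the problem with a dichotomous key.
    """
    keys = [draw() for _ in records]
    order = sorted(range(len(records)), key=lambda i: keys[i])
    return [records[i] for i in order], keys

def count_ties(keys):
    """Number of keys that duplicate an earlier key."""
    return len(keys) - len(set(keys))

rng = random.Random(1999)      # arbitrary seed, for repeatability only
records = list(range(1000))    # record i sits at position i in the original file

# Dichotomous 0/1 key: nearly every key is tied with many others.
ordered_bin, keys_bin = random_order(records, lambda: rng.randint(0, 1))

# Continuous uniform key on [0, 1): ties are extremely unlikely.
ordered_uni, keys_uni = random_order(records, rng.random)

print("ties with 0/1 key:    ", count_ties(keys_bin))
print("ties with uniform key:", count_ties(keys_uni))
# With the 0/1 key, the record sorted to the top is simply the earliest
# record in the original file that happened to draw a 0.
print("first record, 0/1 key:", ordered_bin[0],
      "| earliest 0 in original order:", keys_bin.index(0))
```

With the 0/1 key, the record that comes out on top is determined by the
original file order among those that drew a 0, which is the non-random
selection Karl describes. With the uniform key, every record should receive
its own distinct value.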
My real problem with what you originally wrote was your implication that the
distribution chosen for the random 'sort' variable should in some way be
related to the nature of the data. If you recall, you wrote:
>If your data has other than normal distribution
>select any of a dozen random number functions
>that matches your distribution
Whilst, as above, there are some extreme cases (enormous data sets) in which
(because of the finite precision of a PRNG) the choice of distribution for
the random variable could matter, the best choice is always going to be
'uniform', regardless of the distribution of the data.
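
For a rough feel for how large a data set has to be before ties become a
practical worry, the usual 'birthday problem' approximation can be used. The
following is only a sketch, again in Python: it assumes the generator
returns m equally likely distinct values, and the two values of m shown
(2**31 - 1 and 2**53) are illustrative assumptions, not properties of any
particular package's generator. The function name tie_probability is
likewise hypothetical.

```python
import math

def tie_probability(n, m):
    """Approximate chance of at least one tie among n independent draws,
    each uniform over m equally likely distinct values.

    Uses the birthday approximation 1 - exp(-n(n-1) / (2m)).
    """
    return -math.expm1(-n * (n - 1) / (2.0 * m))

# Assumed numbers of distinct values a generator might return.
for m in (2**31 - 1, 2**53):
    for n in (1_000, 100_000, 10_000_000):
        print(f"m={m:>16}  n={n:>10}  P(tie) ~ {tie_probability(n, m):.3g}")
```

For a fixed number of possible values, a non-uniform distribution over them
can only make ties more likely, which is the sense in which the uniform
variate is 'maximally spread out'.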
Kind Regards,
John
----------------------------------------------------------------
Dr John Whittington, Voice: +44 (0) 1296 730225
Mediscience Services Fax: +44 (0) 1296 738893
Twyford Manor, Twyford, E-mail: medisci@powernet.com
Buckingham MK18 4EL, UK mediscience@compuserve.com
----------------------------------------------------------------