Date:         Fri, 22 Nov 2002 01:12:03 +0000
Reply-To:     John Whittington <John.W@MEDISCIENCE.CO.UK>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         John Whittington <John.W@MEDISCIENCE.CO.UK>
Subject:      Re: "Toy" SAS Problem simulates Deal of 52 Playing Cards (install
Content-Type: text/plain; charset="us-ascii"; format=flowed
At 15:44 21/11/02 -0500, Ian Whitlock wrote:
>In response to Mark's toy problem, Richard gave a simple randomization of
>the order of an array DECK.
> > *shuffle;
> > do i = 0 to 1e5;
> >    index1 = 1 + 52 * ranuni (0);
> >    index2 = 1 + 52 * ranuni (0);
> >    work = deck[index1];
> >    deck[index1] = deck[index2];
> >    deck[index2] = work;
> > end;
>I am bothered by the thought of doing this "shuffle" say ten-thousand
>times and want to replace it with
> do index1 = 1 to 52;
>    index2 = 1 + 52 * ranuni (0);
>    work = deck[index1];
>    deck[index1] = deck[index2];
>    deck[index2] = work;
> end;
>My question is statistical. Is there any sense in which Richard's shuffle
>is more (or less) random than mine?
I think the first thing to be considered is what one is trying to achieve
here. If the intent is to simulate real-world card shuffling, then clearly
both of those approaches represent a poor model - nor is the question
'which is more random?' the same as 'which model is closest to real-world
shuffling?'.
However, let's forget that for a moment and assume that all you are seeking
is to achieve (or approximate to) a random ordering of the cards. As I see
it, Richard has taken a 'brute force' approach and emulated a crude sort of
shuffling, in which random pairs of cards are repeatedly swapped (including
the possibility of 'swapping with itself') - but has seemingly gone way
beyond the real-world situation by doing this 1e5 times.
You, on the other hand, have just gone through the 52 'positions' in the
deck once, swapping the card in each position with another chosen at random
(which could be itself). I may be wrong, but it looks to me as if your
approach results (assuming the PRNG to be random) in an equal probability
that any card will end up in any position in the deck, regardless of what
has happened to any other cards. If that is the case, then your 'process'
will have created a random ordering.
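That claim is easy to probe empirically, even if not to prove. Here is a crude sketch of such a check - in Python rather than a DATA step, purely for compactness, and with all names my own invention: it runs the one-pass swap scheme many times and tallies where one particular card ends up.

```python
import random

random.seed(0)  # for reproducibility

def single_pass_shuffle(deck):
    # The one-pass scheme (name is mine): walk the positions once and
    # swap each with a position chosen uniformly at random from the
    # whole deck - possibly itself - as in the DATA-step loop above.
    n = len(deck)
    for i in range(n):
        j = random.randrange(n)  # 0..n-1, standing in for 1 + 52*ranuni(0)
        deck[i], deck[j] = deck[j], deck[i]
    return deck

# Tally where card number 0 lands over many shuffles; if the process
# is (close to) random, each of the 52 positions should receive
# roughly trials/52 hits.
trials = 20000
counts = [0] * 52
for _ in range(trials):
    deck = single_pass_shuffle(list(range(52)))
    counts[deck.index(0)] += 1
print(min(counts), max(counts), trials / 52)
```

This only looks at the marginal distribution for one card, of course, so it is a necessary rather than sufficient check of randomness.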
By the same token, I think that Richard's method would also achieve the
same (albeit at much greater processing cost) - and probably would do so
with far fewer iterations than 1e5 - although it's a somewhat more
complicated process to 'get one's head around'. I therefore don't think
that either of you wins in terms of 'randomness' - although you obviously
win in terms of efficiency.
>If it makes sense to ask such a question, then what are the statistical
>tools to quantify this difference in randomness?
>Please keep it simple, you're talking to a statistically crippled
As I've said, I don't think there would be a difference. However, the
crudest test of the first requisite of randomness (equal probability of any
card being in any position in the deck) would be to run the process (yours
or Richard's) a large number of times (1e4, 1e5, 1e6 or whatever your
machine could do in a sensible time), accumulating the sum of 'card
numbers' for each of the 'positions' in the pack. Given N repetitions of
the process, the 'expected' total for each position would be 25.5*N if your
cards were numbered from 0 to 51, or 26.5*N if they were numbered from 1 to
52. If you then took, for each position in the deck, the absolute value of
the difference between the observed total and this expected one, and then
summed those absolute differences across all 52 positions in the deck,
you'd get a figure representing the deviation from the expected uniform
distribution of totals, a figure which would increase if your 'shuffling'
process deviated from randomness. Do the same for the other method
(Richard's or yours) and you could see which had the lower number. As I've
said, I think they would be similar.
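For concreteness, that tally might be sketched as follows - again in Python rather than SAS purely for compactness, with the function names and the scaled-down repetition counts being my own choices rather than anything from the original code:

```python
import random

random.seed(1)  # for reproducibility

def deviation_statistic(shuffle, n_reps=2000, n_cards=52):
    # Accumulate, for each deck position, the sum of the card numbers
    # (0..51) landing there over n_reps shuffles, then total the
    # absolute differences from the expected per-position sum.
    totals = [0] * n_cards
    for _ in range(n_reps):
        for pos, card in enumerate(shuffle(list(range(n_cards)))):
            totals[pos] += card
    expected = 25.5 * n_reps  # mean of card numbers 0..51 is 25.5
    return sum(abs(t - expected) for t in totals)

def single_pass(deck):
    # The one-pass scheme: each position swapped with a random one.
    n = len(deck)
    for i in range(n):
        j = random.randrange(n)
        deck[i], deck[j] = deck[j], deck[i]
    return deck

def repeated_swaps(deck, n_swaps=1000):
    # Richard's scheme, scaled down from 1e5 to 1e3 swaps for speed.
    n = len(deck)
    for _ in range(n_swaps):
        i, j = random.randrange(n), random.randrange(n)
        deck[i], deck[j] = deck[j], deck[i]
    return deck

stat_single = deviation_statistic(single_pass)
stat_swaps = deviation_statistic(repeated_swaps)
print(stat_single, stat_swaps)
```

If both processes are equally close to random, the two statistics should be of similar magnitude, differing only by sampling noise.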
Testing the other main requirement of randomness - that the probability of
one card being in any position is independent of where any other cards
are - is a lot more tedious!
However, as above, and subject to the agreement of the cleverer
statistical minds around here, I don't think that such testing is
necessary - since I think that one can deduce from the PROCESS of
determining the ordering (i.e. the algorithm), particularly for your
approach, that, subject only to the PRNG being random, you would inevitably
be achieving a 'random process'.
Dr John Whittington, Voice: +44 (0) 1296 730225
Mediscience Services Fax: +44 (0) 1296 738893
Twyford Manor, Twyford, E-mail: John.W@mediscience.co.uk
Buckingham MK18 4EL, UK