Date: Fri, 22 Nov 2002 01:12:03 +0000
From: John Whittington
Newsgroup: SAS(r) Discussion
Subject: Re: "Toy" SAS Problem simulates Deal of 52 Playing Cards (installment 1)
To: Ian Whitlock

At 15:44 21/11/02 -0500, Ian Whitlock wrote:

>In response to Mark's toy problem, Richard gave a simple randomization of
>the order of an array DECK.
>
>   *shuffle;
>   do i = 0 to 1e5;
>      index1 = 1 + 52 * ranuni (0);
>      index2 = 1 + 52 * ranuni (0);
>      work = deck[index1];
>      deck[index1] = deck[index2];
>      deck[index2] = work;
>   end;
>
>I am bothered by the thought of doing this "shuffle" say ten-thousand
>times and want to replace it with
>
>   *shuffle;
>   do index1 = 1 to 52;
>      index2 = 1 + 52 * ranuni (0);
>      work = deck[index1];
>      deck[index1] = deck[index2];
>      deck[index2] = work;
>   end;
>
>My question is statistical. Is there any sense in which Richard's shuffle
>is more (or less) random than mine?

I think the first thing to consider is what one is trying to achieve here. If the intent is to simulate real-world card shuffling, then both of those approaches are clearly poor models - and nor is the question 'which is more random?' the same as 'which model is closest to real-world shuffling?'.

However, let's forget that for a moment and assume that all you are seeking is to achieve (or approximate) a random ordering of the cards. As I see it, Richard has taken a 'brute force' approach and emulated a crude sort of shuffling, in which random pairs of cards are repeatedly swapped (including the possibility of a card being 'swapped with itself') - but he has seemingly gone way beyond the real-world situation by doing this 1e5 times.

You, on the other hand, have just gone through the 52 'positions' in the deck once, swapping the card in each position with another chosen at random (which could be itself). I may be wrong, but it looks to me as if your approach results (assuming the PRNG to be random) in a roughly equal probability that any card will end up in any position in the deck, regardless of what has happened to any other cards. If that is the case, then your 'process' will have created an essentially random ordering.

By the same token, I think that Richard's method would achieve much the same (albeit at much greater processing cost) - and would probably do so with far fewer iterations than 1e5 - although it is a somewhat more complicated process to 'get one's head around'. I therefore don't think that either of you wins in terms of 'randomness' - although you obviously win in terms of 'efficiency'.
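For concreteness, here is a sketch of the two swap schemes (in Python rather than SAS, purely for illustration - the function names are mine). As an aside, the well-known Fisher-Yates variant, which swaps position i with a position chosen at random from i onwards rather than from the whole deck, is the standard way to guarantee that every ordering is exactly equally likely; I include it for comparison.

```python
import random

def shuffle_repeated_swaps(deck, iterations=100_000):
    """Richard's scheme: repeatedly swap two positions chosen at random."""
    deck = deck[:]
    n = len(deck)
    for _ in range(iterations):
        i = random.randrange(n)
        j = random.randrange(n)
        deck[i], deck[j] = deck[j], deck[i]
    return deck

def shuffle_single_pass(deck):
    """Ian's scheme: one pass, swapping each position with a random one
    (which may be itself - the 'swap with itself' case)."""
    deck = deck[:]
    n = len(deck)
    for i in range(n):
        j = random.randrange(n)
        deck[i], deck[j] = deck[j], deck[i]
    return deck

def shuffle_fisher_yates(deck):
    """Fisher-Yates: swap position i with a random position in i..n-1;
    this variant gives every ordering exactly equal probability."""
    deck = deck[:]
    n = len(deck)
    for i in range(n):
        j = random.randrange(i, n)
        deck[i], deck[j] = deck[j], deck[i]
    return deck
```

All three are permutations of the input, so a quick sanity check is that the shuffled deck always contains exactly the original 52 cards.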

>If it makes sense to ask such a question, then what are the statistical
>tools to quantify this difference in randomness?
>
>Please keep it simple, you're talking to a statistically crippled
>non-believing programmer.

As I've said, I don't think there would be a difference. However, the crudest test of the first requisite of randomness (equal probability of any card being in any position in the deck) would be to run the process (yours or Richard's) a large number of times (1e4, 1e5, 1e6 or whatever your machine could do in a sensible time), accumulating the sum of 'card numbers' for each of the 'positions' in the pack. Given N repetitions of the process, the 'expected' total for each position would be 25.5*N if your cards were numbered from 0 to 51, or 26.5*N if they were numbered from 1 to 52. If you then took, for each position in the deck, the absolute value of the difference between the observed total and this expected one, and summed those absolute differences across all 52 positions, you would get a figure representing the deviation from the expected uniform distribution of totals - a figure which would increase if your 'shuffling' process deviated from randomness. Do the same for the other method (Richard's or yours) and you could see which had the lower number. As I've said, I think they would be similar.
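The totals test just described might be sketched as follows (again in Python rather than SAS, and the helper name is mine); with cards numbered 0 to 51 the expected per-position total is 25.5*N:

```python
import random

def total_abs_deviation(shuffle_fn, n_reps=10_000, n_cards=52):
    """Run shuffle_fn n_reps times, accumulating the sum of card numbers
    landing in each position; return the summed absolute deviation of
    those per-position totals from their expected value."""
    totals = [0] * n_cards
    for _ in range(n_reps):
        deck = shuffle_fn(list(range(n_cards)))
        for pos, card in enumerate(deck):
            totals[pos] += card
    expected = (n_cards - 1) / 2 * n_reps   # 25.5 * N for cards 0..51
    return sum(abs(t - expected) for t in totals)
```

To compare the two methods, one would call this once with each shuffle routine and see which returns the smaller figure; for any reasonable shuffle the deviation should be small relative to the totals themselves.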

Testing the other main requirement of randomness - that the probability of one card being in any given position is independent of where the other cards are - is a lot more tedious!
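One crude way to probe that independence (a sketch only, in Python for illustration; a thorough test would use something like a chi-square statistic over all card/position pairs) is to estimate, by simulation, the joint frequency of two specific events and compare it with the product of their marginal frequencies:

```python
import random

def independence_check(shuffle_fn, n_reps=20_000, n_cards=52):
    """Estimate P(card 0 lands in position 0 AND card 1 lands in
    position 1) alongside P(first event) * P(second event); for an
    independent-enough shuffle the two figures should be close."""
    a = b = both = 0
    for _ in range(n_reps):
        deck = shuffle_fn(list(range(n_cards)))
        in_a = deck[0] == 0
        in_b = deck[1] == 1
        a += in_a
        b += in_b
        both += in_a and in_b
    p_a, p_b, p_both = a / n_reps, b / n_reps, both / n_reps
    return p_both, p_a * p_b
```

Note that for a truly uniform shuffle the joint probability is 1/(52*51), slightly above the product of the marginals (1/52 * 1/52), because fixing one card removes a position from the other's choices - so even a perfect shuffle shows a small, predictable gap here.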

However, as above, and subject to the agreement of the cleverer statistical minds than mine around here, I don't think that such testing is necessary - since I think one can deduce from the PROCESS of determining the ordering (i.e. the algorithm), particularly for your approach, that, subject only to the PRNG being random, you would inevitably be achieving a 'random process'.

Kind Regards

John

----------------------------------------------------------------
Dr John Whittington,           Voice:  +44 (0) 1296 730225
Mediscience Services           Fax:    +44 (0) 1296 738893
Twyford Manor, Twyford,        E-mail: John.W@mediscience.co.uk
Buckingham MK18 4EL, UK                mediscience@compuserve.com
----------------------------------------------------------------
