Date:    Thu, 17 May 2001 19:58:40 +0100
From:    John Whittington
To:      "SAS(r) Discussion"
Subject: Re: Randomly pick
Content-Type: text/plain; charset="us-ascii"

At 12:49 17/05/01 -0500, Paul Thompson wrote:

JW| > The algorithm is 'perfect', not just good.

PT| > Disagree. If by "perfect" you mean that you will get 10 if
PT| > you ask for 10, it is perfect. If you mean that the algorithm
PT| > implements a simple random sampling scheme for a dataset, this
PT| > is patently incorrect.

I'm afraid I cannot agree. As far as I can see, the algorithm itself is a 'perfect' implementation of simple random sampling - and the only deviations from perfection will be a consequence of any imperfections in the PRNG used.
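For readers joining the thread, the one-pass "pick k of n" scheme under discussion can be sketched roughly as follows. This is my own Python rendering, not the SAS original (which would presumably draw its uniform variates from RANUNI); the function name and details are illustrative only:

```python
import random

def sequential_sample(n, k, rng=random.random):
    """Select k of the observations 0..n-1 in a single pass.

    At each observation the conditional selection probability is
    (still needed) / (still remaining); the inductive argument in the
    thread says the overall probability for every observation is k/n.
    """
    chosen = []
    for i in range(n):
        needed = k - len(chosen)    # how many we still have to select
        remaining = n - i           # how many are left, including obs i
        if rng() < needed / remaining:
            chosen.append(i)
    return chosen
```

Note that when `needed == remaining` the conditional probability is 1, so the scheme is guaranteed to return exactly k observations, which is the sense in which everyone agrees it is "perfect".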

>Au contraire, the deviations from simple random selection are
>intrinsic, fundamental to the algorithm, and quite important.
>PRACTICALLY, they are probably not important.

Again, I disagree. I can see no intrinsic deviations from SRS.

>In looking at your proof, I have no objections to it. It is, of
>course, an asymptotic proof. When working with a given, small sample,
>conditional probabilities will vary ENORMOUSLY from the marginal
>probabilities. This is, of course, called the binomial theorem, but
>it is a very complex one, due to the varying sample sizes. Thus, the
>proof holds for the LONG RUN (what doesn't in statistics). For a given
>sample, it is not applicable.
>
>It is NOT a simple random sampling scheme, if we define such a beast
>as "one in which every observation has an equal chance of selection of
>the stated likelihood k/n".

I'm afraid that (maybe through my ignorance) I don't understand your argument. As far as I am concerned, the algorithm DOES give an equal probability (equal to k/n) of any observation being selected - as per the 'proof by induction' which I provided.

If what you are saying is correct, it ought to be possible for you to give me an example of a simple (i.e. small!) 'extreme' case in which one or more of the observations had a different probability of being selected than the others. Can you do that?
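In fact, for small n one need not simulate at all: the marginal selection probabilities can be computed exactly. The following is my own sketch (exact rational arithmetic, not from the thread) which tracks the distribution of the "number already chosen" through the pass and accumulates each observation's marginal probability of selection:

```python
from fractions import Fraction

def selection_probabilities(n, k):
    """Exact marginal selection probability of each observation under
    the sequential (still needed)/(still remaining) scheme, by dynamic
    programming over the number already chosen -- no simulation."""
    state = {0: Fraction(1)}        # P(c chosen so far) before step i
    marginals = []
    for i in range(n):
        p_select = Fraction(0)
        new_state = {}
        for c, p in state.items():
            take = Fraction(k - c, n - i)   # conditional selection prob.
            p_select += p * take
            if take:                         # branch: obs i selected
                new_state[c + 1] = new_state.get(c + 1, Fraction(0)) + p * take
            if take < 1:                     # branch: obs i skipped
                new_state[c] = new_state.get(c, Fraction(0)) + p * (1 - take)
        marginals.append(p_select)
        state = new_state
    return marginals
```

For every (n, k) I have tried, each marginal comes out to exactly k/n, which is what the inductive argument predicts; anyone claiming a small counterexample could check it directly with this kind of calculation.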

'Probability of selection' is, of course, a concept which only has real meaning 'in the long run' - since it means that if one ran the algorithm a very large number of times, each observation would be selected an approximately equal number of times (exactly equal for an infinite number of iterations!).
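That 'long run' interpretation is easy to check empirically. A rough Monte Carlo sketch (my own Python, with random.random standing in for whatever PRNG the SAS code uses): run the selection many times and count how often each observation is picked; under SRS each count should hover around trials * k/n.

```python
import random
from collections import Counter

# Repeat the sequential k-of-n selection many times and tally how
# often each observation is chosen.  Under simple random sampling,
# every observation should be picked in about k/n of the runs.
n, k, trials = 10, 3, 20_000
counts = Counter()
rng = random.Random(2001)            # fixed seed for reproducibility

for _ in range(trials):
    chosen = 0
    for i in range(n):
        if rng.random() < (k - chosen) / (n - i):
            counts[i] += 1
            chosen += 1

expected = trials * k / n            # 6000 here
for i in range(n):
    print(i, counts[i], round(counts[i] / trials, 3))
```

Each observation's empirical selection rate comes out close to 0.3 (= k/n), with only the sampling noise one would expect from 20,000 runs.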

>I have been enjoying reading the discussions of this algorithm.
>At this point, I would characterize it as a "random technique."
>On the margin, the conditional likelihood will be the stated one,
>but EACH INDIVIDUAL OBSERVATION HAS A DIFFERENT PROBABILITY OF
>SELECTION. Almost NO OBSERVATIONS will be selected with a likelihood
>of the marginal one. It is very important to realize that the likelihood
>of selection is NOT the stated marginal likelihood. Rather, the likelihood
>ranges up and down, depending on the local results.

As above, I still disagree - and the 'proof' of your view which you go on to provide indicates to me that you have a totally different concept of the meaning of a 'simple random sample' than I do! The actual probability of any particular observation being selected during any particular run of the algorithm is essentially a meaningless concept - after the fact, that probability is either 0 or 1. The probability of a particular observation being selected is only meaningful in the context of a large number of re-runs of the algorithm.

That's how I see it, anyway - we need a referee!

Kind Regards,

John

----------------------------------------------------------------
Dr John Whittington,       Voice:  +44 (0) 1296 730225
Mediscience Services       Fax:    +44 (0) 1296 738893
Twyford Manor, Twyford,    E-mail: John.W@mediscience.co.uk
Buckingham MK18 4EL, UK            mediscience@compuserve.com
----------------------------------------------------------------
