Date: Tue, 20 Dec 2005 22:15:57 -0800
Reply-To: David L Cassell <davidlcassell@MSN.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: David L Cassell <davidlcassell@MSN.COM>
Subject: Re: PPS sampling
mehdi_soleymani@SOFTHOME.NET wrote:
>I am going to sample from a population with a positive variable as the
>sampling weight i.e, I want to sample by PPS method. but as you know the
>algorithms in sas for pps are restricted ,that is, the observation weight
>should not be greater than 1/sampsize. But I want to sample from the
>population and for some of stratum the sample weight may exceeds the
>1/sampsize. how can I do this. I don't want to use method such as pps_seq
>or... which are drawing with replacement or minimum replacement.
>it is a very usual case in application!!!
>the speed is not in consideration.
Okay, I'm concerned. I cannot tell whether the problem is one of difficulty in
expressing yourself, or one of trouble with the sampling concepts. For lack of
any idea which one is right, I'm going to assume that you have expressed your
meaning exactly. So I'll speak to the problems with what you have written,
even if these are not what you meant.
[1] Sampling with strata is not the same as sampling with no strata. You have
to re-structure your frame of reference accordingly. If you want to sample PPS
within each stratum, then you have to treat each stratum separately when you
think about features like this. So you only need to think about the
sub-population size, the sample size, and the relative weights within each
stratum. Separately.
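To make point [1] concrete, here is a small sketch in Python (my choice of language; the (unit_id, stratum, size) frame layout and the function name are hypothetical). It computes each unit's size relative to its own stratum total rather than to the whole population:

```python
from collections import defaultdict


def relative_sizes_by_stratum(frame):
    """Return {stratum: {unit_id: size / stratum_total}}.

    `frame` is a hypothetical list of (unit_id, stratum, size) tuples
    with positive sizes.
    """
    totals = defaultdict(float)
    for _, stratum, size in frame:
        if size <= 0:
            raise ValueError("size measures must be positive")
        totals[stratum] += size

    # Each unit is compared only with the other units in its own stratum.
    rel = defaultdict(dict)
    for unit, stratum, size in frame:
        rel[stratum][unit] = size / totals[stratum]
    return dict(rel)


frame = [("a", 1, 50.0), ("b", 1, 30.0), ("c", 1, 20.0),
         ("d", 2, 5.0), ("e", 2, 5.0)]
rel = relative_sizes_by_stratum(frame)
print(rel)
```

In that toy frame, unit "d" is under 5% of the total size of the whole frame (5/110), but it is half of stratum 2, and stratum 2 is where it actually gets sampled.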
[2] When you do your PPS sampling, you use a SIZE variable. This is not a
weight, or a relative weight. It is a multiplier. In fact, it turns out that
your SIZE variable will be a constant times your inclusion probability, and
will be a constant times one over the sampling weight. So your multiplier is
very different from your sampling weight, and as your multiplier gets larger,
your sampling weight goes down.
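Written out for a single stratum h, the relationships in [2] look like this. The symbols are my own notation: s_i is unit i's SIZE value, n_h the stratum sample size, S_h the stratum's total size, \pi_i the inclusion probability, and w_i the sampling weight.

```latex
S_h = \sum_{j \in h} s_j, \qquad
\pi_i = \frac{n_h\, s_i}{S_h}, \qquad
w_i = \frac{1}{\pi_i} = \frac{S_h}{n_h\, s_i},
\qquad\text{so}\qquad
s_i = \frac{S_h}{n_h}\,\pi_i = \frac{S_h}{n_h}\cdot\frac{1}{w_i}.
```

The constant is S_h/n_h, which is why a larger SIZE value means a smaller sampling weight. These formulas only make sense when every \pi_i is at most 1.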
[3] Your statement "for some of stratum the sample weight may exceeds the
1/sampsize" seems to indicate a mistake. It's not the sample weight that
matters here. It is the *relative* weight. That's your "weight" (the positive
variable you are feeding in as the SIZE measure) divided by the sum of those
values in your stratum of interest. If your weight is greater than the sum of
all the weights in the stratum divided by the sample size for the stratum,
AND you want to sample without replacement, then you have a problem. Is this
going to be a problem once you split this out by strata and re-consider
things?
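A quick way to see whether a stratum has this problem is to flag every unit with n_h * s_i > S_h, where S_h is the stratum's total size. Here is a sketch in Python; the frame layout and the function name are hypothetical:

```python
from collections import defaultdict


def oversize_units(frame, n_by_stratum):
    """List (stratum, unit) pairs whose PPS inclusion probability
    n_h * s_i / S_h would exceed 1.

    `frame` is a hypothetical list of (unit_id, stratum, size) tuples and
    `n_by_stratum` maps each stratum to its sample size n_h.
    """
    totals = defaultdict(float)
    for _, stratum, size in frame:
        totals[stratum] += size

    flagged = []
    for unit, stratum, size in frame:
        # Same as size > S_h / n_h, i.e. relative size > 1 / n_h.
        if n_by_stratum[stratum] * size > totals[stratum]:
            flagged.append((stratum, unit))
    return flagged


frame = [("a", 1, 50.0), ("b", 1, 30.0), ("c", 1, 20.0),
         ("d", 2, 5.0), ("e", 2, 5.0)]
print(oversize_units(frame, {1: 3, 2: 1}))
```

A unit with n_h * s_i exactly equal to S_h has inclusion probability 1. It is a certainty unit, but it does not by itself break sampling without replacement, so the check uses a strict inequality.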
[4] If you still have the above problem, then think in terms of the task. Do
you want to pick all such records with 'large' relative weights with 100%
certainty? Or do you want to pick them with just a high degree of
probability? In the first case, you have what we call 'certainty sampling'.
Either way, you need to look at the CERTSIZE and MAXSIZE (and maybe even
MINSIZE) options in PROC SURVEYSELECT.
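For the certainty-sampling case, one common approach is to pull out the units whose inclusion probability would reach 1, reduce the sample size accordingly, recompute, and repeat until nothing else crosses the line. The remaining draws can then use any without-replacement PPS method; the Python sketch below uses systematic PPS for that last step. The function name and inputs are hypothetical, and this illustrates the idea only. It is not a description of how PROC SURVEYSELECT implements CERTSIZE.

```python
import itertools
import random


def certainty_then_systematic_pps(units, n, rng=None):
    """Sketch: take certainty units first, then systematic PPS on the rest.

    `units` is a hypothetical {unit_id: positive size} dict for ONE
    stratum, and `n` is that stratum's sample size.
    """
    if not 0 < n <= len(units):
        raise ValueError("need 0 < n <= number of units")
    if any(s <= 0 for s in units.values()):
        raise ValueError("sizes must be positive")
    rng = rng or random.Random()
    certain, rest = [], dict(units)

    # Repeatedly move units with n_left * s_i >= S_rest into the certainty
    # set; removing one large unit can push another over the threshold.
    while True:
        n_left = n - len(certain)
        if n_left == 0:
            break
        total = sum(rest.values())
        big = [u for u, s in rest.items() if n_left * s >= total]
        if not big:
            break
        for u in big:
            certain.append(u)
            del rest[u]

    # Systematic PPS for the remaining draws. Every remaining unit now has
    # size < step, so no unit can be hit twice (no replacement).
    chosen = []
    n_left = n - len(certain)
    if n_left > 0:
        items = list(rest.items())
        cum = list(itertools.accumulate(s for _, s in items))
        step = cum[-1] / n_left
        start = rng.random() * step
        i = 0
        for k in range(n_left):
            point = start + k * step
            # Advance to the unit whose cumulative-size interval holds point.
            while i < len(cum) - 1 and cum[i] <= point:
                i += 1
            chosen.append(items[i][0])
    return certain + chosen


units = {"a": 50.0, "b": 30.0, "c": 10.0, "d": 5.0, "e": 5.0}
# "a" and "b" come first as certainty units, then one of c, d, e.
print(certainty_then_systematic_pps(units, 3, random.Random(2005)))
```

With n = 3 in the example, "a" is a certainty unit at once (3 * 50 >= 100), and "b" becomes one only after "a" is removed (2 * 30 >= 50), which is why the check has to be repeated. Systematic PPS is sensitive to the order of the list, so shuffle or sort the frame deliberately. If you would rather give the large units a high but not certain chance, capping their sizes before sampling is the other route. As I understand it, that is what MAXSIZE= does, but check the SURVEYSELECT documentation for the details.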
HTH,
David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330