Date: Wed, 21 Aug 2002 14:27:37 -0700
Reply-To: Cassell.David@EPAMAIL.EPA.GOV
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "David L. Cassell" <Cassell.David@EPAMAIL.EPA.GOV>
Subject: Re: surveyselect
Martine Ferguson <ferguson_m@BLS.GOV> wrote:
> I wish to use PROC SURVEYSELECT to program a PPS cluster sample
> without replacement (i.e., each cluster is selected without replacement,
> with probability proportional to the size of that cluster).
> The problem I have is that I can only select up to a certain number of
> clusters, and when I have reached my target number of units, I wish to
> stop the procedure. For example, suppose people are split up into
> counties (what I will call clusters) and I wish to sample a total of 100
> people. I want to keep selecting counties PPS without replacement until I
> have reached my target sample size of 100. Now, I do not know ahead of
> time how many counties to sample, because that depends on which county is
> selected first and on the randomness of sampling, and it will vary with
> each sample I select (in other words, I cannot use the SAMPSIZE= option
> because I do not know in advance what that number will be). Therefore, I
> cannot say, for example, select 5 counties, because there may not be 100
> people in those 5 counties, or there may be over 100. I do not know how to
> do this using SURVEYSELECT. Does anybody know of any way by which I can
> use PROC SURVEYSELECT and tell the procedure to stop running once I have
> sampled 100 people?
Oversample, preserving the order in which the clusters appear. Have PROC
SURVEYSELECT pull out enough clusters that you can be sure you will get
enough people. Then go through the selected clusters in order, adding up
people until you have reached your limit (100 people, in your example).
Stop there. You now have your desired cluster sample.
You have selected K clusters, K unknown in advance, such that you have
sampled PPS without replacement (WOR).
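The two steps above (draw an ordered oversample, then walk it until the target is reached) can be sketched as follows. This is a language-neutral illustration in Python, not SAS, and not a description of what SURVEYSELECT does internally. It assumes a simple draw-by-draw PPS scheme: on each draw, a cluster is chosen with probability proportional to its size among the clusters not yet drawn. Under that scheme the overall inclusion probabilities are not exactly proportional to size. The function names `ordered_pps_wor` and `take_until`, and the county sizes, are hypothetical. Every cluster size is assumed to be positive.

```python
import random
from typing import Dict, List


def ordered_pps_wor(sizes: Dict[str, int], m: int,
                    rng: random.Random) -> List[str]:
    """Draw up to m clusters one at a time. On each draw a cluster is chosen
    with probability proportional to its size among the clusters not yet
    drawn (PPS without replacement). The result keeps the draw order."""
    remaining = dict(sizes)            # clusters still available to draw
    order: List[str] = []
    while remaining and len(order) < m:
        names = list(remaining)
        weights = [remaining[n] for n in names]   # assumes all sizes > 0
        pick = rng.choices(names, weights=weights, k=1)[0]
        del remaining[pick]            # without replacement
        order.append(pick)
    return order


def take_until(order: List[str], sizes: Dict[str, int],
               target: int) -> List[str]:
    """Walk the ordered oversample, keeping clusters until the cumulative
    number of people first reaches `target`. If the oversample runs out
    first, every drawn cluster is returned and the total falls short."""
    kept: List[str] = []
    total = 0
    for name in order:
        if total >= target:
            break
        kept.append(name)
        total += sizes[name]
    return kept


if __name__ == "__main__":
    # Hypothetical county sizes (number of people per county).
    sizes = {"A": 40, "B": 25, "C": 60, "D": 10, "E": 35}
    rng = random.Random(2002)
    order = ordered_pps_wor(sizes, m=4, rng=rng)   # oversample 4 counties
    sample = take_until(order, sizes, target=100)
    print(order, sample, sum(sizes[c] for c in sample))
```

One conservative way to choose the oversample size m is to take the smallest m for which the m smallest clusters together hold at least the target number of people. Then any m drawn clusters are guaranteed to be enough. With the sizes above, the three smallest counties hold 70 people and the four smallest hold 110, so m=4.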
> In addition, I want to run this procedure for 1000 iterations using a
> macro (i.e., collect 1000 different samples), and PROC SURVEYSELECT takes
> days to run on my computer. Do you know of a way to shorten the
> processing time?
> I appreciate any insight anyone may have on this, and I thank you in
> advance for any help you can provide.
Don't do it with a macro. SURVEYSELECT has to start up and then re-read the
whole data set in each iteration. Instead, try using the REPS= option in the
PROC SURVEYSELECT statement to get your 1000 replicates in a single run.
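The gain from replicates comes from reading the data once and producing every sample in one pass, rather than restarting for each iteration. Here is a minimal Python sketch of that pattern. It is not SAS and does not describe SURVEYSELECT internals. It assumes the same hypothetical draw-by-draw PPS-without-replacement rule with a stop at the target, and it returns stacked (replicate, cluster) rows. `replicate_samples` and the county sizes are hypothetical, and all sizes are assumed positive.

```python
import random
from typing import Dict, List, Tuple


def replicate_samples(sizes: Dict[str, int], target: int, reps: int,
                      seed: int) -> List[Tuple[int, str]]:
    """Return stacked (replicate, cluster) rows for `reps` independent
    samples, all produced in one call from cluster sizes read once."""
    rng = random.Random(seed)          # one seeded stream for all replicates
    rows: List[Tuple[int, str]] = []
    for rep in range(1, reps + 1):
        remaining = dict(sizes)        # each replicate starts from the full frame
        total = 0
        # Draw clusters PPS without replacement until `target` people are reached.
        while remaining and total < target:
            names = list(remaining)
            weights = [remaining[n] for n in names]   # assumes all sizes > 0
            pick = rng.choices(names, weights=weights, k=1)[0]
            total += remaining.pop(pick)
            rows.append((rep, pick))
    return rows


if __name__ == "__main__":
    sizes = {"A": 40, "B": 25, "C": 60, "D": 10, "E": 35}  # hypothetical
    rows = replicate_samples(sizes, target=100, reps=1000, seed=2002)
    print(len(rows), rows[:5])
```

Because the replicate number is carried on every row, later summaries can be computed per replicate with a single grouped pass instead of 1000 separate runs.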
HTH,
David
--
David Cassell, CSC
Cassell.David@epa.gov
Senior computing specialist
mathematical statistician