Date: Fri, 16 Mar 2001 23:21:40 GMT
Reply-To: Tzachi Zach <zach@SSB.ROCHESTER.EDU>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Tzachi Zach <zach@SSB.ROCHESTER.EDU>
Organization: Time Warner Road Runner - Rochester NY
Subject: Re: How to pick random subset of data multiple times for simulation
I would generate a random number from, say, a uniform distribution
(xx=ranuni(0)) and then subset the data based on the values of the random
variable. If you want about 1% of the original data set, then do the
following:
data aa;
set a;
xx=ranuni(0);
/* subsetting IF, not WHERE: WHERE only sees variables already in a */
if xx ge 0.99;
run;
proc univariate etc.... to perform analysis on aa.
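If you want exactly 28 observations, then you will need to make some
modifications. Changing the cutoff (for example, xx le 28/20000 in place of
xx ge 0.99) gives 28 observations only on average; to get exactly 28 every
time you have to change the data step itself, for example by sorting on xx
and keeping the first 28 observations.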
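One way to get exactly 28 observations is to sort on the random number and
keep the first 28. The following is only a minimal sketch of that idea; the
data set name keyed is made up for illustration, and a is the original data
set as above.

```sas
/* Attach a random sort key to every observation of a. */
data keyed;
  set a;
  xx = ranuni(0);
run;

/* Sorting on the key puts the observations in random order. */
proc sort data=keyed;
  by xx;
run;

/* The first 28 observations are a simple random sample of size 28. */
data aa;
  set keyed(obs=28);
  drop xx;
run;
```

This sorts all 20,000 observations for every sample, so it is simple rather
than fast.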
Next, you will need to include this code inside a loop, with a macro, and
repeat the procedure however many times you'd like.
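A rough sketch of such a macro loop follows. The macro name simulate, the
analysis variable y, the data sets stat and results, and the choice of the
mean as the statistic are all assumptions for illustration; substitute your
own statistic and names.

```sas
%macro simulate(reps=100);
  %local i;
  %do i = 1 %to &reps;

    /* Draw one random subset of roughly 1% of a. */
    data aa;
      set a;
      if ranuni(0) ge 0.99;
    run;

    /* Compute the statistic for this subset (the mean of y is a placeholder). */
    proc means data=aa noprint;
      var y;
      output out=stat mean=ybar;
    run;

    /* Tag the result with the iteration number. */
    data stat;
      set stat;
      rep = &i;
    run;

    /* Accumulate one row per iteration in results. */
    proc append base=results data=stat;
    run;

  %end;
%mend simulate;

%simulate(reps=100)
```

Because proc append keeps adding to results, delete that data set before
rerunning the macro. With ranuni(0) the seed comes from the clock, so the
runs are not reproducible; use a fixed positive seed if you need them to be.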
<sander.post@STATCAN.CA> wrote in message
news:3A66CAF3B5D3D4119AFD00508BC286ADC7B204@msxa4.statcan.ca...
> -----Original Message-----
> From: Laurence P Adair [mailto:ladair@SWRI.JACADS.COM]
> Sent: March 16,2001 3:38 PM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: How to pick random subset of data multiple times for simulation
>
>
> Hi,
>
> I would like to run simulations on the performance of a statistic using my
> data as the population. This means I need to pick a random subset of the
> data, calculate the statistic for the subset, and re-iterate a large number
> of times. The data population size is about 20,000 and a typical subset
> size will be 28. I am running SAS 6.12 under OS2, but also have SAS 8.0
> under OS2 available. My question is, what is an efficient way of doing
> this?
>
> I'm sure this is something that's been answered before, but I'm not having
> any luck running a search on the SAS-L listserve site.
>
> Thank you,
> Laurence Adair
> ladair@swri.jacads.com
>
> -----
>
> This is why they created proc surveyselect in version 8.
>
> Just an example:
>
>
> data x;
> do x=1 to 100;
> output;
> end;
> run;
>
> proc surveyselect data=x out=y rep=50 sampsize=10 method=srs;
> run;
>
>
> This will generate 50 samples of size 10 from the data set x and write them
> to the data set y. An additional variable called "replicate" is added to
> each record to indicate what sample it is from, which you can use as a by
> variable in proc summary to figure out means. It's part of SAS/STAT, so
> you'll probably want to read some more documentation.
>
> Hope this helps,
>
> Sander.
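
Applied to the original question, the proc surveyselect approach quoted
above might look like the sketch below. It assumes version 8 with SAS/STAT;
the input data set a, the analysis variable y, the output names samples and
stats, and the figure of 1000 replicates are placeholders, not values from
the thread.

```sas
/* Draw 1000 simple random samples of size 28 from a in one pass. */
proc surveyselect data=a out=samples reps=1000 sampsize=28 method=srs;
run;

/* One statistic per sample: replicate identifies the sample. */
proc means data=samples noprint;
  by replicate;
  var y;
  output out=stats mean=ybar;
run;
```

The data set stats then holds one row per replicate, which is the simulated
distribution of the statistic.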