| Date: | Fri, 2 Jan 2004 20:09:55 GMT |
| Reply-To: | Arthur Tabachneck <art297@NETSCAPE.NET> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | Arthur Tabachneck <art297@NETSCAPE.NET> |
| Subject: | Re: Automation |
|---|
Ougya,
From your example, I'm not certain I understand exactly what you are trying
to accomplish. If it is simply to obtain a stratified sample, why not just
use proc surveyselect's strata statement?
If the "perc" variable does have to be considered, you could still use
strata, but with a control data set specified in the procedure's sampsize=
option. For example:
data bb;
input category $ counts perc;
perc=perc*10; /* turn each proportion into a per-category sample size (10 in total) */
cards;
a 1 0.1
b 4 0.4
c 2 0.2
d 3 0.3
a 2 0.1
b 5 0.4
c 3 0.2
d 2 0.3
a 4 0.1
b 4 0.4
c 3 0.2
d 2 0.3
a 4 0.1
b 4 0.4
c 2 0.2
d 3 0.3
a 1 0.1
b 4 0.4
c 2 0.2
d 3 0.3
a 1 0.1
b 4 0.4
c 2 0.2
d 3 0.3
a 1 0.1
b 4 0.4
c 2 0.2
d 3 0.3
a 1 0.1
b 4 0.4
c 2 0.2
d 3 0.3
a 1 0.1
b 4 0.4
c 2 0.2
d 3 0.3
a 1 0.1
b 4 0.4
c 2 0.2
d 3 0.3
a 1 0.1
b 4 0.4
c 2 0.2
d 3 0.3
;
run;
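/* strata processing expects the input grouped by category */
proc sort data=bb;
  by category;
run;
/* one row per category; _nsize_ is the variable name sampsize= looks for */
proc summary data=bb;
  var perc;
  by category;
  output out=bb2 mean(perc)=_nsize_ ;
run;
/* draw _nsize_ observations from each category */
proc surveyselect data=bb method=srs sampsize=bb2 out=bb1;
  strata category notsorted;
run;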
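If the sizes should instead come straight from your aa data set and be applied to the big data set, something along these lines might work. This is only an untested sketch: "big" is a placeholder name for your large data set, "sizes" and "big_sorted" are made-up work data sets, 100 is the target size you mentioned, and the seed value is arbitrary.

```sas
/* control data set: one row per category, _nsize_ = perc * target size */
data sizes;
  set aa (keep=category perc);
  _nsize_ = round(perc * 100);   /* 100 = desired number of rows in bb */
  keep category _nsize_;
run;

/* both data sets need the categories in the same order */
proc sort data=sizes;
  by category;
run;
proc sort data=big out=big_sorted;   /* "big" is a placeholder name */
  by category;
run;

proc surveyselect data=big_sorted method=srs sampsize=sizes
                  out=bb seed=12345;  /* seed is arbitrary, for repeatability */
  strata category;
run;
```

This assumes every category in the big data set also appears in aa, and that each category has at least _nsize_ observations to draw from. Because of the rounding, the total may not come to exactly 100 when the percentages do not divide evenly.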
Art
----------
"ougya" <jieguo01@yahoo.com> wrote in message
news:fa55834f.0401021030.ef87ceb@posting.google.com...
> Hi, everyone,
>
> I have a question here and appreciate your help in advance.
>
> Question/purpose:
> I have a dataset aa, which has 3 variables and the structure looks
> like
>
> category counts perc
> a 1 0.1
> b 4 0.4
> c 2 0.2
> d 3 0.3
>
> Now, I need to build a dataset bb (which has 100 observations)
> retrieved from a big dataset (which has a variable 'category'). The
> requirement is that the selected categories in bb must have the same
> rate as 'perc' in aa.
>
> That is, if I use
> proc freq data=bb;
> tables category;
>
> it would give me
> category percent
> a 0.1
> b 0.4
> c 0.2
> d 0.3
>
> Solution
>
> The silly step that I can have is
>
> proc surveyselect data=aa method=srs sampsize=10 out=bb1;
> where category='a';
> proc surveyselect data=aa method=srs sampsize=40 out=bb2;
> where category='b';
> proc surveyselect data=aa method=srs sampsize=20 out=bb3;
> where category='c';
> proc surveyselect data=aa method=srs sampsize=30 out=bb4;
> where category='d';
>
> data bb;
> set bb1 bb2 bb3 bb4;
> run;
>
>
> I am not happy with it because if 'category' has 100 values, I
> would have to repeat surveyselect 100 times.
> I wonder whether some experts have a nice & concise way to
> automatically retrieve the 'perc' information for each 'category' from
> data aa and use it to retrieve observations from the big dataset and
> finally build dataset bb.
>
> Thanks very much and happy new year!
>
> Jay