| Date: | Fri, 21 Feb 1997 09:24:16 GMT |
| Reply-To: | schick@hrz.uni-marburg.de |
| Sender: | "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU> |
| From: | Arnold Schick <schick@HRZ.UNI-MARBURG.DE> |
| Organization: | HRZ Uni Marburg |
| Subject: | Re: random selecting obs. from dataset |
| Content-Type: | text/plain; charset=us-ascii |
Hello,
Kunling Lu asked:
>Does anybody have a macro that selects random number of observations from a
>SAS dataset? I need to pick up 50 obs. from a SAS dataset of about 4,000. What
>random generating function do I go? Thank you for help.
A lot of SAS code for randomly choosing observations from a
SAS data set has already been developed and posted to SAS-L.
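For the original question (exactly 50 of about 4,000 observations), one widely used approach is to attach a uniform random key from RANUNI, sort on that key, and keep the first 50 rows. This is not the macro appended below. The following is a minimal sketch that assumes an input data set ONE, built here as in the example at the end of this message. The names SHUFFLED, RANDKEY and SAMPLE50 and the seed 12345 are illustrative only.

```sas
/* Illustrative input: 4000 observations, as in the example further below */
data one;
  do h = 1 to 4000;
    p = h + h;
    output;
  end;
run;

/* Attach a uniform(0,1) random key; a fixed seed makes the draw repeatable */
data shuffled;
  set one;
  randkey = ranuni(12345);
run;

/* Put the observations into random order */
proc sort data=shuffled;
  by randkey;
run;

/* Keep the first 50: a simple random sample without replacement */
data sample50 (drop=randkey);
  set shuffled (obs=50);
run;
```

The cost is one full sort of the data, which is negligible for 4,000 observations.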
The SAS code that I have written is available on the Web at:
http://staff-www.uni-marburg.de/~schick/sasmacros/
The SAS macro appended below can also be found on that site.
It returns a >random< selection, which means that the exact number of
observations selected is a matter of chance. Often the selection matches
the requested PCT parameter exactly. Because it may not, more than one
run may be needed to obtain the desired number of observations.
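If an exact count is required in a single pass and without sorting, selection sampling (Knuth's Algorithm S) is an alternative. It keeps each observation with probability (observations still needed) / (observations not yet examined), which always yields exactly the requested number. This technique differs from the macro appended below. The sketch assumes an input data set ONE with at least 50 observations. The variable K, the output name SAMPLE and the seed are hypothetical choices.

```sas
/* Illustrative input: 4000 observations, as in the example further below */
data one;
  do h = 1 to 4000;
    p = h + h;
    output;
  end;
run;

/* Selection sampling: exactly 50 observations in one pass over ONE */
data sample (drop=k);
  retain k 50;                 /* observations still to be selected      */
  set one nobs=n;              /* N is filled in at compile time         */
  /* N - _N_ + 1 = observations not yet examined, this one included     */
  if ranuni(12345) < k / (n - _n_ + 1) then do;
    output;
    k = k - 1;
  end;
  if k = 0 then stop;          /* sample complete: skip the remaining rows */
run;
```

Every observation has the same chance, 50/4000, of being included, and no observation can be selected twice.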
An example call follows at the end of the macro.
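The first DATA step of the macro appended below shows why the count varies. This explanation is a reading of the code, not a statement from the original post. Let U = RANUNI(0) be uniform on (0,1). The IF condition is then true exactly when ROUND(L*U,1) is nonzero, that is, when L*U >= 0.5:

$$
L = \ln\!\left(0.66 - \frac{1}{0.01\,\mathrm{PCT} - 1}\right), \qquad
\Pr(\text{kept in first pass}) = \Pr\!\left(U \ge \frac{0.5}{L}\right) = 1 - \frac{0.5}{L}.
$$

For 0 < PCT < 100, L > ln 1.66, which is about 0.507, so the formula always applies. For the example call with PCT = 1.25, L = ln(0.66 + 1/0.9875), about 0.514. The first pass therefore keeps about 2.8% of the observations, roughly 110 of 4,000. The follow-up step deletes the surplus over round(0.0125*4000) = 50, again through random decisions capped at that surplus, so the final count is close to 50 but not guaranteed to equal it. Even for a very small PCT, the first pass keeps at least about 1.3% of the observations. This is in line with the note in the macro header that small selections are less exact.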
Regards,
Arnold Schick
-------------------------------------------------------please-cut-here---
/* This macro selects PCT percent of the observations in data set IN
   and stores the chosen observations in data set OUT.
   No observation is selected more than once.
   IN, OUT   macro parameters: names of the input and output data sets
   PCT       macro parameter: numeric percentage of observations to select
   _RESERVE  is an internal (temporary) data set created by this macro.
   Note: the macro parameters have no default values.
   Selecting a small number of observations gives a less exact count.
   Written: January 16, 1996
   Author: Arnold Schick, University of Marburg/Germany
*/
options nosource;
%macro choice(in,out,pct);
options nonotes nomprint nosymbolgen nostimer nosource;
data &out (drop=stored any_more res_fact)
_reserve (drop=stored any_more res_fact);
set &in nobs=N end=last;
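which = _N_;   /* original observation number, used later as BY key */
/* first pass: keep the observation with a probability derived from PCT */
if round(log(0.66-1/(0.01*&pct-1))*ranuni(0),1)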
then do;
stored+1;
output &out;
end;
else do;
/* not selected: keep some observations in _RESERVE for a later top-up */
if &pct > 90 then output _reserve;
else if &pct < 84 then res_fact = 0.0065;
else res_fact = 0.0020;
if round(log(0.66-1/(res_fact*&pct-1))*ranuni(0),1)
then output _reserve;
end;
/* surplus (> 0) or shortfall (< 0) against the target count */
any_more = stored - round(&pct/100*N,1);
if last then call symput('diff',any_more);
run;
%if &diff > 0 %then %do; /* too many selected: delete some at random */
data &out (drop=i);
set &out nobs=N;
if i < &diff
then if round(log(0.66-1/(&diff/N-1))
*ranuni(0),1) then do; i+1; delete;
end;
run;
%end;
%else %if &diff ^= 0 %then %do; /* too few selected: add some from _RESERVE */
data _reserve (drop=i);
set _reserve nobs=N;
if i < N+&diff
then if round(log(0.66-1/((N+&diff)
/N-1))*ranuni(0),1) then do; i+1;
delete; end;
run;
data &out;
update &out _reserve;
by which;
run;
%end;
/* delete the temporary data set */
proc datasets nolist;
delete _reserve;
quit;
options notes;
/* rewrite OUT with notes on, so the log reports its observation count */
data &out;
set &out;
run;
options stimer source;
%mend choice; options source;
*Example;
data one;
do h=1 to 4000;
p=h+h;
output;
end;
run;
%choice(one,two,1.25); *selects ~50 (1.25%) OBS from the data;
proc print data=two; run; *prints dataset TWO;
*each further call draws a new selection, possibly of a different size;
%choice(one,two,1.25);
%choice(one,two,1.25);
%choice(one,two,1.25);