|
I got two offline complaints that my method might not be random for data
with cyclical patterns (correct!), so how about this...for every group of
four records this pulls out one at random...
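data big small;
set sashelp.class;
retain oneof4; drop oneof4;
* at the start of each group of four, draw a random value from 0 to 3;
if mod(_n_,4)=1 then oneof4=int(ranuni(13579)*4);
* debugging aid, writes every record to the log;
put _all_;
* the one record in the group whose remainder matches goes to small;
if mod(_n_,4)=oneof4 then output small;
else output big;
run;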
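This gets an even 25/75 random split in one pass (exact when the record
count is a multiple of four; otherwise the last, partial group may send
no record to small).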
(David, you're absolutely right that procs nearly invariably beat data step
code in CPU efficiency, freedom from logic errors, ease of coding,
flexibility, and power, hands down - they just can't match the fun of doing
it yourself! %)
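For comparison, the proc route could look something like the sketch below. It assumes SAS/STAT is licensed (PROC SURVEYSELECT is part of it) and borrows the input name total from Roger's post; flagged is a made-up intermediate dataset name. OUTALL keeps every record and adds a 0/1 variable named Selected, which the data step then uses to do the split.

```sas
* sketch only: total and flagged are assumed dataset names;
proc surveyselect data=total out=flagged
     method=srs samprate=0.25 seed=13579 outall noprint;
run;
data big small;
set flagged;
* Selected=1 marks the records drawn into the 25 percent sample;
if selected=1 then output small;
else output big;
drop selected;
run;
```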
Paul Choate
DDS Data Extraction
(916) 654-2160
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Choate,
Paul@DDS
Sent: Wednesday, June 02, 2004 2:14 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: SPLIT DATASET RANDOMLY
Joe -
This puts every fourth observation into one dataset and the other three in
another...like dealing from a deck of cards...
data big small;
set data;
if mod(_n_,4)=0 then output small;
else output big;
run;
Paul Choate
DDS Data Extraction
(916) 654-2160
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Lustig,
Roger
Sent: Wednesday, June 02, 2004 1:53 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: SPLIT DATASET RANDOMLY
Do you need exactly 75/25? Or will an approximation do?
If the latter, then:
data big small;
set total;
if ranuni(987) < .75 then output big;
else output small;
run;
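To get a feel for how approximate this is: the size of small follows a binomial distribution, so with about 6,000 records (the figure from the original question) the expected count is 1,500 with a standard deviation of roughly 33.5. A quick sketch that writes both numbers to the log; n and p are illustrative values, not anything read from the data:

```sas
data _null_;
* approximate record count taken from the question;
n=6000;
* probability that a record lands in small;
p=.25;
expected=n*p;
sd=sqrt(n*p*(1-p));
put expected= sd=;
run;
```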
If you need exact numbers (rounded to the nearest integer), try:
data total2;
set total;
key=ranuni(987);
run;
proc sort data=total2 out=total3;
by key;
run;
data big small;
set total3 nobs=n_obs;
retain cutoff;
if _N_=1 then cutoff=round(.75*n_obs);
* use <= so that big receives exactly cutoff records;
if _N_ <= cutoff then output big;
else output small;
run;
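If the sort is a concern, an exact split should also be possible in a single pass with sequential selection: each record goes to small with probability (records still needed) / (records not yet read). This is a sketch of that different technique, not the sort-based method above; need and remaining are illustrative variable names, and it assumes total is a dataset for which NOBS= is available (for example, not a view).

```sas
data big small;
set total nobs=n_obs;
retain need remaining;
drop need remaining;
if _n_=1 then do;
* number of records that small must receive;
need=n_obs-round(.75*n_obs);
* number of records not yet read;
remaining=n_obs;
end;
* select with probability need/remaining, then update the counters;
if ranuni(987) < need/remaining then do;
output small;
need=need-1;
end;
else output big;
remaining=remaining-1;
run;
```

Once need reaches zero no further record can be selected, and once need equals remaining every record left is selected, so small ends up with exactly the intended count.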
OK?
Roger
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
Ludwig
Sent: Wednesday, June 02, 2004 4:37 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: SPLIT DATASET RANDOMLY
Hi SAS experts,
I have a SAS dataset with approx. 6,000 records. I need to split this
dataset randomly into two datasets (one with 75% of the data and the
other one with the other 25%). How can I do this randomly in SAS?
Thanks,
Joe
|