LISTSERV at the University of Georgia
Date:   Wed, 2 Jun 2004 15:34:18 -0700
Reply-To:   "Choate, Paul@DDS" <pchoate@DDS.CA.GOV>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   "Choate, Paul@DDS" <pchoate@DDS.CA.GOV>
Subject:   Re: SPLIT DATASET RANDOMLY

I got two offline complaints that my method might not be random for data with cyclical patterns (correct!), so how about this...for every group of four records this pulls out one at random...

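data big small;
  set sashelp.class;
  retain oneof4;
  drop oneof4;
  /* at the start of each block of four, pick which position (0-3) goes to SMALL */
  if mod(_n_,4)=1 then oneof4=int(ranuni(13579)*4);
  put _all_;  /* writes every observation to the log for checking */
  if mod(_n_,4)=oneof4 then output small;
  else output big;
run;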

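This gets an even 25/75 random split in one pass. The split is exact when the record count is a multiple of four; otherwise the final partial group may or may not contribute a record to SMALL.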

(David, you're absolutely right that procs nearly always beat data step code hands down in CPU efficiency, freedom from logic errors, ease of coding, flexibility, and power; they just can't match the fun of doing it yourself! %)
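For comparison, here is a minimal sketch of one proc-based route, PROC SURVEYSELECT. It is not necessarily what David had in mind. It assumes SAS/STAT is licensed and that the input dataset is named TOTAL, as in Roger's example; FLAGGED is just an illustrative name for the intermediate dataset.

```sas
/* Sketch only: assumes SAS/STAT is available and the input is WORK.TOTAL */
proc surveyselect data=total out=flagged
     method=srs      /* simple random sampling without replacement */
     samprate=0.25   /* select 25% of the observations */
     seed=13579      /* fixed seed so the split is reproducible */
     outall;         /* keep every observation; adds 0/1 variable Selected */
run;

data big small;
  set flagged;
  if selected then output small;   /* sampled 25% */
  else output big;                 /* remaining 75% */
  drop selected;
run;
```

Unlike the block-of-four approach, this draws the 25% from the whole dataset at once, so the selection does not depend on record order.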

Paul Choate DDS Data Extraction (916) 654-2160

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Choate, Paul@DDS
Sent: Wednesday, June 02, 2004 2:14 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: SPLIT DATASET RANDOMLY

Joe -

This puts every fourth observation into one dataset and the other three in another...like dealing from a deck of cards...

data big small;
  set data;
  if mod(_n_,4)=0 then output small;
  else output big;
run;

Paul Choate DDS Data Extraction (916) 654-2160

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Lustig, Roger
Sent: Wednesday, June 02, 2004 1:53 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: SPLIT DATASET RANDOMLY

Do you need exactly 75/25? Or will an approximation do?

If the latter, then:

data big small;
  set total;
  if ranuni(987) < .75 then output big;
  else output small;
run;

If you need exact numbers (rounded to the nearest integer), try:

data total2;
  set total;
  key=ranuni(987);  /* random sort key */
run;

proc sort data=total2 out=total3;
  by key;
run;

data big small;
  set total3 nobs=n_obs;
  retain cutoff;
  if _N_=1 then cutoff=round(.75*n_obs);
  /* <= so that BIG receives exactly CUTOFF observations */
  if _N_ <= cutoff then output big;
  else output small;
run;
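To confirm that the split came out as intended, a quick count of both output datasets can help. This is only a sketch; it assumes BIG and SMALL are in the WORK library as created above.

```sas
/* Sketch: report the number of observations in each output dataset */
proc sql;
  select 'big  ' as dataset, count(*) as n_obs from big
  union all
  select 'small' as dataset, count(*) as n_obs from small;
quit;
```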

OK?

Roger

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Ludwig
Sent: Wednesday, June 02, 2004 4:37 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: SPLIT DATASET RANDOMLY

Hi SAS expert,

I have a SAS dataset with approximately 6,000 records. I need to split this dataset randomly into two datasets (one with 75% of the data and the other with the remaining 25%). How can I do this in SAS?

Thanks,

Joe

