LISTSERV at the University of Georgia
Date:         Thu, 25 Oct 2007 09:40:20 -0700
Reply-To:     "Nordlund, Dan (DSHS/RDA)" <NordlDJ@DSHS.WA.GOV>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Nordlund, Dan (DSHS/RDA)" <NordlDJ@DSHS.WA.GOV>
Subject:      Re: Selecting a Random Sample
In-Reply-To:  <1193327691.207572.41890@22g2000hsm.googlegroups.com>
Content-Type: text/plain; charset=iso-8859-1

I missed the original post. I don't know if the original poster wants to get a 20% sample of just the IDs or wants all the records for the 20% sample of IDs. Here is one way of getting either.

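data sample;
  *one uniform draw per ID: the ID is selected with probability .2;
  in_sample=uniform(32751) LT .2;
  *A must be sorted (or grouped) by ID for BY processing;
  do until(last.ID);
    set a;
    by ID;
    **if you want all records of your 20% sample, output here;
    if in_sample then output;
  end;
  **if you want only the IDs, then output here;
  **if in_sample then output;
run;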
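A self-contained way to try the "only the IDs" variant on the data from the quoted question might look like the sketch below. Data set A is rebuilt from the poster's listing (reading Type as a character variable is an assumption). The data set name SAMPLE_IDS and the KEEP statement are illustrative additions, so that only ID is carried along rather than the values from each ID's last record. Because every ID gets its own independent draw, the number of IDs selected is random: about 20% on average, not exactly 20%. With only five IDs it can easily be zero.

```sas
/* Example data from the original question (Type read as character). */
/* The listing is already sorted by ID, as BY processing requires.   */
data a;
  input ID Year Type $;
  datalines;
1 1999 A
1 2000 A
1 2001 B
1 2001 C
2 1988 H
3 1989 C
4 2001 G
4 1998 Y
5 2001 B
;
run;

/* One row per sampled ID. The DO UNTIL loop reads every record of */
/* the current ID; the single OUTPUT after the loop writes one row. */
data sample_ids;
  in_sample = uniform(32751) lt .2;   /* one draw per ID */
  do until (last.ID);
    set a;
    by ID;
  end;
  if in_sample then output;
  keep ID;                            /* hypothetical addition: ID only */
run;
```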

Hope this is helpful,

Dan

Daniel J. Nordlund
Research and Data Analysis
Washington State Department of Social and Health Services
Olympia, WA 98504-5204

> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On
> Behalf Of Shiling Zhang
> Sent: Thursday, October 25, 2007 8:55 AM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: Re: Selecting a Random Sample
>
> On Oct 24, 3:12 am, a...@hotmail.com wrote:
> > Hello,
> >
> > My data has the following format:
> > Data A;
> > ID Year Type
> > 1 1999 A
> > 1 2000 A
> > 1 2001 B
> > 1 2001 C
> > 2 1988 H
> > 3 1989 C
> > 4 2001 G
> > 4 1998 Y
> > 5 2001 B
> >
> > I want to select a random 20% sample of the IDs.
> >
> > So for example,
> >
> > The output could be:
> >
> > 4 2001 G
> > 4 1998 Y
> >
> > or the output could be:
> >
> > 5 2001 B
> >
> > The way I approach it is:
> > Data B;
> >   set A;
> >   by ID;
> >   retain X;
> >   if first.ID then X = ranuni(4544);
> > run;
> >
> > Data C;
> >   set B;
> >   if X < 0.20 then output;
> > run;
> >
> > This way I would extract 20% of the IDs. My question is: is there a
> > better/more efficient way to do this?
> >
> > Thanks.
>
> Here is a one-pass data step. I hope someone can come up with
> "proc surveyselect".
>
> data t1;
>   do i = 1 to 10;
>     do j=1 to mod(i,3)+1;
>       output;
>     end;
>   end;
> run;
>
> proc print data=t1; run;
>
> proc sql noprint;
>   select count (distinct i) into: tot_i
>   from t1;
> quit;
>
> %put >>>&tot_i<<<;
> **sample percent of by variable;
> %let p=0.4;
>
> data sample;
>   retain p0 p &p tot_i &tot_i;
>   seed=90876;
>   n0=p0*&tot_i;
>   rate=(ranuni( seed )<p);
>   s+rate;
>
>   do until( last.i);
>     set t1 nobs=n;
>     by i;
>     if rate then output;
>   end;
>
>   *stop rule;
>   if s>=p0*&tot_i then stop;
>   *update p based on whether the current one is selected or not;
>   if rate then p=(p*tot_i-1)/(tot_i + (-1));
>   else p=(p* tot_i)/(tot_i + (-1)) ;
>   tot_i + (-1);
>   keep i j p0 seed;
> run;
>
> proc print data=sample; run;
>
> HTH.
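The quoted message asks for a PROC SURVEYSELECT version. One possible sketch, assuming SAS/STAT is licensed, follows these steps: reduce A to one row per ID, draw a simple random sample of those IDs, then merge back to recover all of their records. Unlike a separate uniform draw per ID, METHOD=SRS with SAMPRATE= selects a fixed number of IDs derived from the rate (the SAMPRATE= documentation gives the exact rounding rule). The data set names IDS, SAMPLED_IDS and SAMPLE, and the seed value, are arbitrary choices for illustration.

```sas
/* Example data from the original question (Type read as character). */
data a;
  input ID Year Type $;
  datalines;
1 1999 A
1 2000 A
1 2001 B
1 2001 C
2 1988 H
3 1989 C
4 2001 G
4 1998 Y
5 2001 B
;
run;

/* One row per ID: the IDs are the sampling units. */
proc sort data=a out=ids(keep=ID) nodupkey;
  by ID;
run;

/* Simple random sample without replacement of 20% of the IDs.  */
/* The seed is arbitrary; NOPRINT suppresses the printed output. */
proc surveyselect data=ids out=sampled_ids
                  method=srs samprate=0.2 seed=4544 noprint;
run;

/* Do not rely on the output order; sort before the MERGE. */
proc sort data=sampled_ids;
  by ID;
run;

/* Keep every record of A whose ID was selected. */
data sample;
  merge a(in=in_a) sampled_ids(in=in_s keep=ID);
  by ID;
  if in_a and in_s;
run;
```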

