Date: Mon, 13 Jun 2005 09:05:33 -0700
Reply-To: SAS L <sasluser@YAHOO.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: SAS L <sasluser@YAHOO.COM>
Subject: Re: How to select Simple Random Sample?
In-Reply-To: <20050613154415.68594.qmail@web54607.mail.yahoo.com>
Content-Type: text/plain; charset=iso-8859-1
FYI...
Sometimes you may be analyzing a very large data file and want to work with just a simple random sample of the data file. Other times you may want to draw a simple random sample with replacement from a small data file. Either way, SAS proc surveyselect is one way to do it and it is fairly straightforward. Let's use the following data set for the purpose of demonstration.
DATA hsb25;
  INPUT id gender $ race ses schtype $ prog read write math science socst;
DATALINES;
147 f 1 3 pub 1 47 62 53 53 61
108 m 1 2 pub 2 34 33 41 36 36
18 m 3 2 pub 3 50 33 49 44 36
153 m 1 2 pub 3 39 31 40 39 51
50 m 2 2 pub 2 50 59 42 53 61
51 f 2 1 pub 2 42 36 42 31 39
102 m 1 1 pub 1 52 41 51 53 56
57 f 1 2 pub 1 71 65 72 66 56
160 f 1 2 pub 1 55 65 55 50 61
136 m 1 2 pub 1 65 59 70 63 51
88 f 1 1 pub 1 68 60 64 69 66
177 m 1 2 pri 1 55 59 62 58 51
95 m 1 1 pub 1 73 60 71 61 71
144 m 1 1 pub 2 60 65 58 61 66
139 f 1 2 pub 1 68 59 61 55 71
135 f 1 3 pub 1 63 60 65 54 66
191 f 1 1 pri 1 47 52 43 48 61
171 m 1 2 pub 1 60 54 60 55 66
22 m 3 2 pub 3 42 39 39 56 46
47 f 2 3 pub 1 47 46 49 33 41
56 m 1 2 pub 3 55 45 46 58 51
128 m 1 1 pub 1 39 33 38 47 41
36 f 2 3 pub 2 44 49 44 35 51
53 m 2 2 pub 3 34 37 46 39 31
26 f 4 1 pub 1 60 59 62 61 51
;
RUN;
Random sampling without replacement
In a simple random sample without replacement, each observation in the data set has an equal chance of being selected, and once selected it cannot be chosen again. The following code creates a simple random sample of size 10 from data set hsb25. The method option in the proc surveyselect statement sets the method to SRS (simple random sampling). The rep (= replicate) option specifies the number of simple random samples you want to create. The sampsize option, required here, specifies the size of the random sample. This number cannot be larger than the size of the original data set, since the sampling is done without replacement. You can also specify the seed so that exactly the same sample can be reproduced later using the same seed. The id statement specifies the variables to be included in the sample. Here we use _all_ to include all the variables.
proc surveyselect data = hsb25 method = SRS rep = 1
                  sampsize = 10 seed = 12345 out = hsbs1;
  id _all_;
run;

proc print data = hsbs1 noobs;
run;

 id  gender  race  ses  schtype  prog  read  write  math  science  socst
108    m       1    2     pub      2    34     33    41      36      36
153    m       1    2     pub      3    39     31    40      39      51
 51    f       2    1     pub      2    42     36    42      31      39
 95    m       1    1     pub      1    73     60    71      61      71
139    f       1    2     pub      1    68     59    61      55      71
135    f       1    3     pub      1    63     60    65      54      66
191    f       1    1     pri      1    47     52    43      48      61
 22    m       3    2     pub      3    42     39    39      56      46
 47    f       2    3     pub      1    47     46    49      33      41
 53    m       2    2     pub      3    34     37    46      39      31
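For the question quoted at the end of this message (a fixed-size simple random sample of HRNs within each LOCATION), a STRATA statement is one way to extend the call above. This is only a sketch: the data set names hrns and hrn_sample and the sample size of 2 per location are assumptions for illustration, and the rows are just the ones visible in the quoted question.

```sas
/* Hypothetical data set built from the rows shown in the quoted question */
data hrns;
  input location $ hrn;
  datalines;
001 234
001 123
001 345
002 123
002 134
002 145
002 146
003 124
003 156
003 156
;
run;

/* The STRATA statement expects the input sorted by the strata variable */
proc sort data = hrns;
  by location;
run;

/* Draw 2 observations (assumed size) without replacement from each location */
proc surveyselect data = hrns method = srs sampsize = 2
                  seed = 12345 out = hrn_sample;
  strata location;
run;

proc print data = hrn_sample noobs;
run;
```

If the locations need different sample sizes, sampsize also accepts a list in stratum order, for example sampsize = (2 3 2). Note that HRN 156 appears twice under location 003 in the quoted rows; surveyselect samples observations, so duplicates would have to be removed first (for example with proc sort nodupkey) if each HRN should be eligible only once.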
Random sampling with replacement
In a random sample with replacement, each observation in the data set has an equal chance of being selected and can be selected over and over again. The following code creates a random sample with replacement of size 10. We can see from the output that the observation with id = 144 has been selected three times, because we now allow replacement in the sampling. The method = urs (unrestricted random sampling) option is used here to allow the replacement. We will only include the variables id, read, write, math, science and socst in the sample data set.
proc surveyselect data = hsb25 method = urs sampsize = 10
                  rep = 1 seed = 12345 out = hsbs2;
  id id read write math science socst;
run;

proc print data = hsbs2 noobs;
run;

                                              Number
 id  read  write  math  science  socst         Hits
 22   42     39    39      56      46            1
 47   47     46    49      33      41            1
 51   42     36    42      31      39            1
 57   71     65    72      66      56            1
139   68     59    61      55      71            1
144   60     65    58      61      66            3
147   47     62    53      53      61            1
153   39     31    40      39      51            1
The data set hsbs2 has only 8 observations, because the observation with id = 144 is stored once with NumberHits = 3 instead of appearing three times. Here is sample code to create a data set with 10 observations based on hsbs2.
data hsbs2f;
  set hsbs2;
  do i = 1 to numberhits;
    output;
  end;
  drop i;
run;

proc print data = hsbs2f noobs;
run;

                                              Number
 id  read  write  math  science  socst         Hits
 22   42     39    39      56      46            1
 47   47     46    49      33      41            1
 51   42     36    42      31      39            1
 57   71     65    72      66      56            1
139   68     59    61      55      71            1
144   60     65    58      61      66            3
144   60     65    58      61      66            3
144   60     65    58      61      66            3
147   47     62    53      53      61            1
153   39     31    40      39      51            1
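As an alternative to expanding the hits in a data step, proc surveyselect has an outhits option that writes one observation per selection. A minimal sketch, assuming the same hsb25 data and seed as above (the output data set name hsbs2h is made up for this example):

```sas
/* outhits: one output row per hit instead of one row per selected unit */
proc surveyselect data = hsb25 method = urs sampsize = 10
                  rep = 1 seed = 12345 outhits out = hsbs2h;
  id id read write math science socst;
run;

proc print data = hsbs2h noobs;
run;
```

With this option the output data set should already contain 10 observations, so the extra data step is not needed.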
Jeff Morison <jmt_mtf@YAHOO.COM> wrote:
Hi:
I have a dataset containing the following fields and
data, I need to select a pre-specified size of simple
random sample of HRNs from each LOCATION.
Can PROC SURVEYSELECT do this? Any help with the
code would be appreciated.
TIA,
Jeff
LOCATION HRN
001 234
001 123
001 345
... ...
002 123
002 134
002 145
002 146
... ...
003 124
003 156
003 156
... ...