Date: Mon, 20 Oct 2003 11:12:51 -0700
Reply-To: Will Dwinnell <predictr@BELLATLANTIC.NET>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Will Dwinnell <predictr@BELLATLANTIC.NET>
Organization: http://groups.google.com
Subject: Stratified sampling with 'proc surveyselect'
Problem: Select a sample from a data set so that the distributions of
particular variables are "similar" to their distributions in the
original data.
Solution (?): Use 'proc surveyselect' to generate a new data set,
thus:
data LMN;
input State $ Sex $ X Y @@;
datalines;
PA M 2 7
PA M 7 4
PA M 1 7
PA M 5 10
some number of data lines here...
NY F 9 1
NY F 6 1
NY F 1 3
NY F 1 2
NY F 7 6
;
run;
proc sort data=LMN;
    by State Sex;
run;
proc surveyselect data=LMN method=sys seed=8810 rate=0.1 out=XYZ;
    strata State Sex;
run;
This appears to work, but with a rate of 0.1 and 100 observations in
the original data set (LMN), surveyselect occasionally delivers 9 or
11 observations rather than 10. Is this expected behavior? Am I doing
this correctly?
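One way to see where the 9 or 11 comes from is to compare the stratum
sizes before and after sampling. The sketch below assumes the LMN and
XYZ data sets from the steps above and only tabulates them; it does not
change the selection. Because the strata statement applies the rate
within each State/Sex group separately, the per-stratum counts need not
add up to exactly 10% of the whole data set.

```sas
/* Observations per State/Sex stratum in the original data */
proc freq data=LMN;
    tables State*Sex / list;
run;

/* Observations per State/Sex stratum in the selected sample (XYZ) */
proc freq data=XYZ;
    tables State*Sex / list;
run;
```

Comparing the two listings shows which strata contribute the extra or
missing observation for a given seed.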
Many thanks in anticipation...