| Date: | Fri, 16 Nov 2001 23:23:57 +0000 |
| Reply-To: | Michael Friendly <friendly@HOTSPUR.PSYCH.YORKU.CA> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | Michael Friendly <friendly@HOTSPUR.PSYCH.YORKU.CA> |
| Organization: | York University |
| Subject: | Re: shorthand in sas reg model? |
|---|
In article <OFBB9DEA8A.B145D434-ON88256B06.00711ACE@rtp.epa.gov>
Cassell.David@EPAMAIL.EPA.GOV (David L. Cassell) writes:
|kataliu <richardliu@NORTHWESTERN.EDU> wrote:
|> I am new to sas.
|
|I wonder if you are also new to inferential statistics, based on:
|
|> When I use the regression model in sas, I find that
|> I have lots of independent variables. Therefore, I
|> have to type each one in sas codes.
|>
|> For example,
|>
|> PROC REG DATA=...;
|> MODEL TARGET = ACOL1 MTGAT DKL ...../SELECTION = STEPWISE;
|> ^^^^^^^^^^^^^^^^^^^^^
|> over 200 variables with different name!!
|> RUN;
|
|This is a problem waiting to happen. That many variables and a
|stepwise selection procedure will help you to fit.. well.. most likely
|a lot of garbage. Measurement error, collinearity, non-interpretability
|of 'relative importance', and a host of other issues will plague you.
|With 200 variables, you could have 200 sources of random noise and
|you'll probably get this to fit a swell model for you [where 'swell'
|is unspecified]. Such a thing has happened - and been printed for all
|to see in the literature - more times than you want to know.
Below is a simple, but effective teaching example I have used for a
while to demonstrate the perils of blind stepwise selection ---
generate 100 N(0,1) predictors, and an independent N(0,1) y.
Toss them into stepwise selection, and -- hey-- you can get an
R^2 of .25 or maybe greater. But, generate two similar samples,
and use the model selected by each to cross-validate the other--
whoa-- the R^2 drops to non-significance.
The results of the code below depend on how the seed for the normal()
function is used on your machine. I ran the first reg step once, then
used the variables selected in stepwise for the last step, so substitute
the variables your own run selects in M1 and M2. Another
useful variation is to add 100 random N(0,1) X1-X100 predictors to a
real model. Students are amazed at how often the X variables turn
up among the ``real predictors''.
----- stepsim.sas ----
title 'Stepwise simulation example - NO real predictors';
* Generate two sets of data: 100 random predictors, 200 observations;
data sim;
array x{100} x1-x100;
do testset= 1 to 2;
do n=1 to 200;
*-- generate the predictors-- all independent, just noise;
do i=1 to 100;
x(i) = normal(6752343);
end;
*-- generate the criterion-- no relation to any of the Xs;
y = normal(7654321);
output;
end;
end;
proc reg;
by testset;
model y = x1-x100 / selection=forward slentry=.05;
run;
/* Now see how well each prediction equation does in the other
data set.
- Each model should do well on the data set for which it was
selected, but poorly on the other set of data
*/
title2 'Testing cross-validation';
proc reg data=sim;
by testset;
M1: model y = x13 x75 x5 x25 x82 x10 x38 x87 x94 x93 x29 x97;
M2: model y = x78 x14 x30 x25 x9 x4;
run;
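
The variation mentioned above (noise predictors added to a real model)
can be sketched along the same lines. This is only an illustration, not
the version used in class: the two "real" predictors z1 and z2, their
weights of 0.5 and 0.3, the data set name simreal and the seed are all
assumptions made up for the example, and the real part is simulated
here rather than taken from actual data.

```sas
title 'Stepwise simulation - two real predictors plus 100 noise predictors';
data simreal;
   array x{100} x1-x100;
   do n = 1 to 200;
      *-- hypothetical real predictors (names and weights are assumptions);
      z1 = normal(2468013);
      z2 = normal(2468013);
      *-- 100 pure-noise predictors, unrelated to y;
      do i = 1 to 100;
         x{i} = normal(2468013);
      end;
      *-- y depends only on z1 and z2, plus N(0,1) error;
      y = 0.5*z1 + 0.3*z2 + normal(2468013);
      output;
   end;
   drop i n;
run;

proc reg data=simreal;
   model y = z1 z2 x1-x100 / selection=stepwise slentry=.05 slstay=.05;
run;
quit;
```

In the selection summary, count how many of x1-x100 are entered
alongside z1 and z2. As with the listing above, which ones turn up will
depend on the seed and on your machine.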
--
Michael Friendly Email: friendly@yorku.ca (NeXTmail OK)
Psychology Dept
York University Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street http://www.math.yorku.ca/SCS/friendly.html
Toronto, ONT M3J 1P3 CANADA