Date: Fri, 4 Jun 2004 15:29:04 -0700
Reply-To: cassell.david@EPAMAIL.EPA.GOV
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "David L. Cassell" <cassell.david@EPAMAIL.EPA.GOV>
Subject: Re: stepwise linear regression modeling
Content-type: text/plain; charset=US-ASCII
Paul Thompson <paul@WUBIOS.WUSTL.EDU> sagely replied to my post:
> [me]> Your best option is: do not do stepwise regression.
> 100% agree
See? I told you he was sagacious. :-)
> [me]> Seriously. I have written pages on this issue in SAS-L before.
> [me]> (You can bore yourself to tears by looking up my rants in the
> [me]> SAS-L archives at
> [me]> http://www.listserv.uga.edu/archives/sas-l.html
> [me]> if you want to. Just search for the keyword 'stepwise'.)
> [me]> Particularly when you are working with interaction terms, which
> [me]> by definition will be highly correlated with other variables
> [me]> in your model, stepwise regression can do bad things. You have
> [me]> no guarantee that you will get the right term, instead of some
> [me]> higher-order term which happens to be correlated.
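
The claim quoted above, that an interaction term is correlated with the variables it is built from, can be checked numerically. What follows is a minimal sketch in Python (not SAS, purely to keep it self-contained), using hypothetical simulated data: two independent predictors drawn uniformly from 1 to 2, and their product. The variable names, seed, and range are assumptions made for illustration. The effect shown depends on the predictors not being centered at zero.

```python
import random

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(2004)  # arbitrary seed, only for repeatability
n = 1000

# Hypothetical data: two independent, uncentered predictors on [1, 2].
x1 = [rng.uniform(1.0, 2.0) for _ in range(n)]
x2 = [rng.uniform(1.0, 2.0) for _ in range(n)]

# The interaction term is simply the product of the two predictors.
x1x2 = [a * b for a, b in zip(x1, x2)]

r_main = pearson(x1, x2)      # independent draws, so close to zero
r_inter = pearson(x1, x1x2)   # substantial by construction

print(f"corr(x1, x2)    = {r_main:.2f}")
print(f"corr(x1, x1*x2) = {r_inter:.2f}")
```

Even though x1 and x2 are unrelated, x1 and x1*x2 are strongly correlated, so an automated selection procedure has little basis for telling the main effect from the interaction.
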
>
> There is nothing wrong with selecting terms for a final model. You
> should do it yourself, however. Fit a model. Decide which terms should
> stay in. Sometimes, I retain non-sig terms - which should remain in for
> SUBSTANTIVE reasons (maybe age-adjustment is sensible).
Agreed. Absolutely. Scientific systemata are more important than
arbitrary statistical cutoffs. Build a model based on sound scientific
knowledge and hypothesis, then *test* that model. Don't throw every known
variable (and all their interactions) into a big hopper labeled "DANGER:
STEPWISE" and assume that the results are 'right'.
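
To see why the output of the hopper should not be trusted, here is a rough Python sketch (again, not SAS, and not the algorithm any SAS procedure uses). It mimics only the first step of a forward selection: screen many candidate predictors against the response and keep the best-looking one. All of the data are hypothetical pure noise, so nothing is truly related to anything. The sample size, number of candidates, seed, and the approximate cutoff of 0.361 (the two-sided 5% critical value of |r| for 30 observations) are assumptions chosen for illustration.

```python
import random

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(42)  # arbitrary seed, only for repeatability
n_obs, n_vars = 30, 200  # few observations, many candidate predictors

# Hypothetical data: the response and every candidate are independent noise.
y = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
candidates = [[rng.gauss(0.0, 1.0) for _ in range(n_obs)]
              for _ in range(n_vars)]

# Screen every candidate against y, as a first selection step would.
abs_r = [abs(pearson(x, y)) for x in candidates]
best = max(abs_r)

# Approximate two-sided 5% critical value of |r| when n = 30.
CUTOFF = 0.361
n_flagged = sum(r > CUTOFF for r in abs_r)

print(f"largest |r| among {n_vars} noise predictors: {best:.2f}")
print(f"predictors passing the nominal 5% cutoff: {n_flagged}")
```

Some noise variables clear the nominal cutoff by chance alone, and the selected "winner" looks far better than it has any right to. The p-values reported after selection do not account for the search that produced them.
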
If you can't avoid working with vast numbers of inter-correlated variables,
and you only want results like some sort of working predictive formula,
then stepwise regression is totally wrong for you. Look at PROC PLS and
similar methodologies instead.
Hey, I agree with Paul. Does that make me sagacious too?
David, the Circular Reasoner
--
David Cassell, CSC
Cassell.David@epa.gov
Senior computing specialist
mathematical statistician