LISTSERV at the University of Georgia
Date:         Fri, 4 Jun 2004 15:29:04 -0700
Reply-To:     cassell.david@EPAMAIL.EPA.GOV
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "David L. Cassell" <cassell.david@EPAMAIL.EPA.GOV>
Subject:      Re: stepwise linear regression modeling
Content-type: text/plain; charset=US-ASCII

Paul Thompson <paul@WUBIOS.WUSTL.EDU> sagely replied to my post:
> [me]> Your best option is: do not do stepwise regression.
>
> 100 % agree

See? I told you he was sagacious. :-)

> [me]> Seriously. I have written pages on this issue in SAS-L before.
> [me]> (You can bore yourself to tears by looking up my rants in the
> [me]> SAS-L archives at http://www.listserv.uga.edu/archives/sas-l.html
> [me]> if you want to. Just search for the keyword 'stepwise'.)
> [me]> Particularly when you are working with interaction terms, which
> [me]> by definition will be highly correlated with other variables
> [me]> in your model, stepwise regression can do bad things. You have
> [me]> no guarantee that you will get the right term, instead of some
> [me]> higher-order term which happens to be correlated.
>
> There is nothing wrong with selecting terms for a final model. You
> should do it yourself, however. Fit a model. Decide which terms should
> stay in. Sometimes, I retain non-sig terms - which should remain in for
> SUBSTANTIVE reasons (maybe age-adjustment is sensible).

Agreed. Absolutely. Scientific systemata are more important than arbitrary statistical cutoffs. Build a model based on sound scientific knowledge and hypothesis, then *test* that model. Don't throw every known variable (and all their interactions) into a big hopper labeled "DANGER: STEPWISE" and assume that the results are 'right'.
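
To make the quoted point about interaction terms concrete, here is a minimal sketch in Python (standard library only) on simulated data. Everything in it is an assumption for illustration and none of it comes from the original thread: the two predictors are independent normals with mean 10 and standard deviation 1, and the names `pearson` and `simulate` and the seed are made up. It shows only that an uncentered product term x1*x2 can be strongly correlated with x1 and x2 even when x1 and x2 are unrelated to each other, which is the situation in which a selection procedure can pick up the product instead of the "right" term.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(n=5000, mean=10.0, sd=1.0, seed=2004):
    """Draw two independent predictors and form their product term.

    The mean, sd, and seed are illustrative assumptions only.
    """
    rng = random.Random(seed)
    x1 = [rng.gauss(mean, sd) for _ in range(n)]
    x2 = [rng.gauss(mean, sd) for _ in range(n)]
    x1x2 = [a * b for a, b in zip(x1, x2)]  # uncentered interaction term
    return x1, x2, x1x2


if __name__ == "__main__":
    x1, x2, x1x2 = simulate()
    print("corr(x1, x2)    =", round(pearson(x1, x2), 3))
    print("corr(x1, x1*x2) =", round(pearson(x1, x1x2), 3))
    print("corr(x2, x1*x2) =", round(pearson(x2, x1x2), 3))
```

Under these assumed settings the population correlation between x1 and x1*x2 is 10/sqrt(201), roughly 0.7, while x1 and x2 themselves are uncorrelated. Real data will behave differently depending on the scale and location of the variables.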

If you can't avoid working with vast numbers of inter-correlated variables, and you only want results like some sort of working predictive formula, then stepwise regression is totally wrong for you. Look at PROC PLS and similar methodologies instead.

Hey, I agree with Paul. Does that make me sagacious too?

David, the Circular Reasoner
--
David Cassell, CSC
Cassell.David@epa.gov
Senior computing specialist
mathematical statistician

