Date: Thu, 15 Jul 2010 07:13:42 -0400
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Peter Flom <peterflomconsulting@MINDSPRING.COM>
Subject: Re: Step-Wise Methods re-evaluated
David J Moriarty wrote:
My opinion is that of a biologist, not a statistician, but I think
there is a role for step-wise methods. If all you are interested in is
the best possible prediction, then step-wise doesn't seem
advantageous. Use all available predictors; generally, the more
predictors the better.
But if you have a large number of potential predictors, and you're
trying to identify a relevant subset of those predictors, then I
think step-wise methods can be part of the effort. We need to realize
the substantial problems with the methods, as pointed out by our
statistical colleagues, and judge the results accordingly. But
step-wise methods may illuminate patterns in a large data set that
might be important. Why were certain predictors included? Why were
others excluded? Is it something relevant, or just a fault in the
method? Step-wise might get us to ask some questions that could be
important.
I recommend step-wise methods only in a heuristic sense: they help
elucidate patterns and ask questions. But they should only be used as a
small part of a large, comprehensive analysis of the data. If a
student brings me a thesis where all that's been done with the data
is some step-wise method, and all biological conclusions come from
that method - well, that's unacceptable. I would tell them they have
just barely got started in terms of understanding their data.
This makes some sense. Anything that gets you to "ask questions that could
be important" is a good thing.
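One cheap way to make step-wise selection ask those questions is to rerun it on bootstrap resamples and count how often each predictor enters: a variable picked in nearly every resample is a different kind of finding from one picked a third of the time. The sketch below is in Python with NumPy, not SAS, and is not what PROC GLMSELECT does internally. It assumes a hand-rolled forward selection (the simplest step-wise variant, with no removal step) that stops when BIC, which SAS reports as SBC, stops improving; the helper names `forward_bic` and `selection_frequency` and the simulated data are hypothetical.

```python
import numpy as np


def rss(X, y, cols):
    """Residual sum of squares of an OLS fit of y on an intercept plus X[:, cols]."""
    design = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)


def forward_bic(X, y):
    """Hypothetical forward selection: at each step add the predictor that
    lowers BIC the most; stop when no remaining candidate lowers it."""
    n, p = X.shape
    chosen = []
    best = n * np.log(rss(X, y, chosen) / n) + np.log(n) * 1  # intercept-only model
    while len(chosen) < p:
        scores = {}
        for c in range(p):
            if c not in chosen:
                k = len(chosen) + 2  # intercept + current predictors + candidate
                scores[c] = n * np.log(rss(X, y, chosen + [c]) / n) + np.log(n) * k
        c_best = min(scores, key=scores.get)
        if scores[c_best] >= best:
            break
        chosen.append(c_best)
        best = scores[c_best]
    return chosen


def selection_frequency(X, y, n_boot=200, seed=0):
    """Hypothetical helper: share of bootstrap resamples (rows drawn with
    replacement) in which forward_bic selects each predictor."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        for c in forward_bic(X[idx], y[idx]):
            counts[c] += 1
    return counts / n_boot


# Simulated data: two strong predictors, one weak one, five pure noise.
rng = np.random.default_rng(42)
n, p = 100, 8
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] + 1.0 * X[:, 1] + 0.3 * X[:, 2] + rng.normal(size=n)

freq = selection_frequency(X, y, n_boot=100)
for j, f in enumerate(freq):
    print(f"x{j + 1}: selected in {f:.0%} of resamples")
```

The strong predictors should show up almost every time; the interesting output is the weak and noise variables whose inclusion comes and goes, which is exactly the "relevant, or just a fault in the method?" question.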
I'd suggest, however, that there are now better ways to get at these
questions. One is to use PROC GLMSELECT and vary the parameters, then see
what happens. Another (although, I believe, it isn't in SAS/STAT, only in
some add-on) is to use [...] as exploratory tools. These can really open up
the field of questions and illuminate patterns that are difficult or
impossible to find with traditional regression, regardless of variable
selection method.
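The "vary the parameters" idea can also be sketched outside SAS. In GLMSELECT the SELECT= and CHOOSE= suboptions of SELECTION= decide which criterion drives the search; the minimal Python sketch below imitates only that one knob. It assumes an exhaustive all-subsets search, which GLMSELECT does not perform and is used here only because it keeps the comparison clean, scored by n*log(RSS/n) plus a penalty per parameter: a penalty of 2 gives AIC and log(n) gives BIC/SBC, up to additive constants. The function `best_subset` and the simulated data are hypothetical.

```python
import itertools

import numpy as np


def rss(X, y, cols):
    """Residual sum of squares of an OLS fit of y on an intercept plus X[:, cols]."""
    design = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)


def best_subset(X, y, penalty):
    """Hypothetical helper: exhaustive search for the subset minimising
    n*log(RSS/n) + penalty * (number of parameters, intercept included)."""
    n, p = X.shape
    best_cols, best_score = (), np.inf
    for size in range(p + 1):
        for cols in itertools.combinations(range(p), size):
            score = n * np.log(rss(X, y, list(cols)) / n) + penalty * (size + 1)
            if score < best_score:
                best_cols, best_score = cols, score
    return best_cols


# Simulated data: one strong, two weak and three null predictors.
rng = np.random.default_rng(7)
n, p = 60, 6
X = rng.normal(size=(n, p))
y = 1.5 * X[:, 0] + 0.4 * X[:, 1] + 0.2 * X[:, 2] + rng.normal(size=n)

# Vary the "parameter" (penalty per term) and see what happens.
for label, penalty in [("AIC", 2.0), ("BIC/SBC", np.log(n)), ("stricter", 2 * np.log(n))]:
    print(label, [f"x{c + 1}" for c in best_subset(X, y, penalty)])
```

A larger penalty per parameter can only shrink the chosen subset or leave it unchanged, so predictors that survive every setting deserve a closer look, while the ones that drop in and out are the questions worth asking.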