Date: Thu, 15 Jul 2010 11:44:13 +0000
Reply-To: goladin@gmail.com
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "goladin@gmail.com" <goladin@GMAIL.COM>
Subject: Re: Step-Wise Methods re-evaluated
Content-Type: text/plain; charset="utf-8"
Hi,
Another alternative is a different variable selection approach, such as a genetic algorithm.
Regards,
Murphy Choy
Sent from my Nokia phone
-----Original Message-----
From: Peter Flom
Sent: 15/07/2010 7:13:42 PM
Subject: Re: Step-Wise Methods re-evaluated
David J Moriarty wrote
<<<
My opinion is that of a biologist, not a statistician, but I think
there is a role for stepwise methods. If all you are interested in is
the best possible prediction, then step-wise doesn't seem
advantageous. Use all available predictors; generally, the more
predictors the better.
But if you have a large number of potential predictors, and you're
trying to identify a relevant subset of those predictors, then I
think step-wise methods can be part of the effort. We need to realize
the substantial problems with the methods, as pointed out by our
statistical colleagues, and judge the results accordingly. But
step-wise methods may illuminate patterns in a large data set that
might be important. Why were certain predictors included? Why were
others excluded? Is it something relevant, or just a fault in the
method? Step-wise might get us to ask some questions that could be
important.
I recommend step-wise methods only in a heuristic sense - they help
elucidate patterns and ask questions. But they should only be used as a
small part of a large, comprehensive analysis of the data. If a
student brings me a thesis where all that's been done with the data
is some step-wise method, and all biological conclusions come from
that method - well, that's unacceptable. I would tell them they have
just barely got started in terms of understanding their data.
>>>
This makes some sense. Anything that gets you to "ask questions that could
be important" is a good thing.
I'd suggest, however, that there are now better ways to get these questions
raised: use PROC GLMSELECT and vary the parameters, then see what happens.
Another alternative (although, I believe, it isn't in SAS/STAT, only in some
add-on) is to use tree methods as exploratory tools. These can really open
up the field of questions and illuminate patterns that are difficult or
impossible to find with traditional regression, regardless of variable
selection method.
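As a rough sketch of what "vary the parameters" might look like, the runs below fit the same candidate model under three different selection settings. The data set name work.mydata, the response y, the predictors x1-x20, the 0.15 entry/stay levels, and the seed are all hypothetical placeholders chosen for illustration, not anything from the thread.

```sas
/* Hypothetical data set and variable names: work.mydata, y, x1-x20 */

/* Run 1: classical stepwise, using significance levels to enter and stay */
proc glmselect data=work.mydata;
   model y = x1-x20 / selection=stepwise(select=sl sle=0.15 sls=0.15);
run;

/* Run 2: forward selection, adding effects by SBC and stopping
   when SBC no longer improves */
proc glmselect data=work.mydata;
   model y = x1-x20 / selection=forward(select=sbc stop=sbc);
run;

/* Run 3: LASSO, with the final step chosen by 5-fold cross validation */
proc glmselect data=work.mydata seed=12345;
   model y = x1-x20 / selection=lasso(choose=cv stop=none) cvmethod=random(5);
run;
```

Comparing which predictors survive in all three runs, and which come and go, is one way to raise the kind of questions discussed above.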
Peter