| Date: | Wed, 8 Oct 2008 07:56:39 -0400 |
| Reply-To: | Nathaniel.Wooding@DOM.COM |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | Nat Wooding <Nathaniel.Wooding@DOM.COM> |
| Subject: | Re: stepwise |
| In-Reply-To: | <14217379.1223413462667.JavaMail.root@mswamui-thinleaf.atl.sa.earthlink.net> |
| Content-Type: | text/plain; charset="US-ASCII" |
|---|
In addition to the source that Peter suggested, let me also suggest the
paper that he and David Cassell wrote last year. It has appeared in several
places, but you can find it at
http://www.nesug.org/Proceedings/nesug07/sa/sa07.pdf
In the paper, they demonstrate that stepwise selection will find apparent
solutions even in random collections of numbers.
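The random-numbers effect is easy to reproduce. Below is a minimal Python sketch (not the paper's code, and not SAS). The function name forward_select_noise, the sample sizes, and the entry rule |t| > 2 (roughly p < 0.05) are illustrative assumptions. It implements only the forward-entry half of stepwise selection, on an ordinary least-squares model with a continuous outcome rather than a logistic model for sick vs. healthy, but the selection problem it shows carries over: the outcome and all 50 candidate predictors are independent random noise, yet predictors still enter the model with "significant" t statistics.

```python
import math
import random


def forward_select_noise(n_obs=100, n_predictors=50, t_enter=2.0, seed=None):
    """Run forward selection on an outcome and predictors that are all pure noise.

    Returns a list of (predictor_index, t_statistic) pairs in order of entry.
    """
    rng = random.Random(seed)
    # Outcome and candidate predictors: independent N(0, 1) draws, no real signal.
    y = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
    xs = [[rng.gauss(0.0, 1.0) for _ in range(n_obs)] for _ in range(n_predictors)]

    def center(v):
        m = sum(v) / len(v)
        return [a - m for a in v]

    def dot(a, b):
        return sum(p * q for p, q in zip(a, b))

    # Centering accounts for the intercept; resid is the current OLS residual of y.
    resid = center(y)
    # z[j] is candidate j with the intercept and every entered predictor
    # projected out (Gram-Schmidt), which is all that is needed to score it.
    z = {j: center(x) for j, x in enumerate(xs)}
    selected = []

    while z:
        rss = dot(resid, resid)
        # Residual df after adding one more predictor (intercept + entered + 1).
        df = n_obs - (len(selected) + 2)
        if df <= 0:
            break
        best_j, best_t = None, 0.0
        for j, zj in z.items():
            zz = dot(zj, zj)
            if zz < 1e-12:
                continue  # numerically collinear with the current model
            rz = dot(resid, zj)
            new_rss = rss - rz * rz / zz
            if new_rss <= 0.0:
                continue
            # By Frisch-Waugh-Lovell, j's coefficient in the augmented fit is
            # rz / zz with standard error sqrt(s2 / zz); t is their ratio.
            s2 = new_rss / df
            t = rz / math.sqrt(zz * s2)
            if abs(t) > abs(best_t):
                best_j, best_t = j, t
        # Stop when even the best candidate fails the entry criterion.
        if best_j is None or abs(best_t) <= t_enter:
            break
        # Enter the winner; project it out of the residual and remaining candidates.
        q = z.pop(best_j)
        qq = dot(q, q)
        c = dot(resid, q) / qq
        resid = [r - c * a for r, a in zip(resid, q)]
        for j in z:
            cj = dot(z[j], q) / qq
            z[j] = [a - cj * b for a, b in zip(z[j], q)]
        selected.append((best_j, best_t))
    return selected


if __name__ == "__main__":
    picks = forward_select_noise(seed=2008)
    print(f"{len(picks)} pure-noise predictors entered the model:")
    for j, t in picks:
        print(f"  x{j}: t = {t:.2f}")
    runs = 100
    hits = sum(1 for s in range(runs) if forward_select_noise(seed=s))
    print(f"{hits} of {runs} noise-only runs selected at least one predictor")
```

With 50 noise predictors and a cutoff near the 5% level, the chance that at least one clears it on the first step alone is roughly 1 - 0.95^50, a bit over 90% if the 50 tests are treated as independent, so "significant" predictors are the norm rather than the exception.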
Nat Wooding
Environmental Specialist III
Dominion, Environmental Biology
4111 Castlewood Rd
Richmond, VA 23234
Phone: 804-271-5313, Fax: 804-271-2977
Peter Flom <peterflomconsulting@MINDSPRING.COM>
Sent by: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
10/07/2008 05:04 PM
Please respond to Peter Flom <peterflomconsulting@mindspring.com>
To: SAS-L@LISTSERV.UGA.EDU
cc:
Subject: Re: stepwise
nchapinal@YAHOO.COM wrote:
>
>I am using proc stepwise to know which are the best predictors to
>distinguish between healthy and sick people.
>I know more or less how to do it. However, someone told me you can add
>an option that tells you how accurate each particular predictor
>that you keep in the model is at classifying people into the right
>category.
>
>Any help is welcome!
I see that others have already responded, advising against this.
They are right. I don't recommend it either.
The best source on why this is a bad idea is Frank Harrell's book,
Regression Modeling Strategies.
First, AFAIK there is no such procedure as PROC STEPWISE; it is not in the
SAS help, so it is hard to know exactly what you are doing.
Stepwise selection does NOT tell you which predictors are best.
I can probably recommend something better, but first let me ask some questions:
1) What is your sample size?
2) Where did the sample come from? (a survey? an experiment? or what?)
3) How many independent variables (IVs) have you got?
4) What is your dependent variable? Is it dichotomous (sick vs. not)? Or a
time to event (how long before you got sick)? Or something else?
5) Is the purpose of your study explanation, or prediction, or both?
6) Why did you choose the IVs you chose?
7) What does the literature say about these?
etc.
HTH
Peter
Peter L. Flom, PhD
Statistical Consultant
www DOT peterflom DOT com