Date: Fri, 28 Oct 2005 14:10:45 -0700
Reply-To: David L Cassell <davidlcassell@MSN.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: David L Cassell <davidlcassell@MSN.COM>
Subject: Re: Running Proc Reg using BEST option
In-Reply-To: <200510281310.j9SAk0XQ002617@malibu.cc.uga.edu>
Content-Type: text/plain; format=flowed
excel_hari@YAHOO.COM wrote:
>I used the data set
>"http://www.ats.ucla.edu/stat/sas/webbooks/reg/hsb2.sas7bdat"
>
>I copied following code from UCLA SAS site.
>
>data selection;
> set hsb2;
> math2 = math*math;
> mathf = math*female;
> mathsch = math*schtyp;
> mathsci = math*science;
> sciencef = science*female;
> progsch = prog*schtyp;
>run;
>
>proc reg data=selection;
> model write = math socst female schtyp prog science math2 mathf
>       mathsch mathsci sciencef progsch
>       / selection=rsquare cp best=6 start=2 stop=12;
>run;
>
>By running the above I get output of 61 models for number of dependent
>variables varying from 2 to 6.
>
>But If I modify the above model to below listed one then:-
>
>proc reg data=selection;
> model write = math socst female schtyp prog science math2 mathf
>       mathsch mathsci sciencef progsch
>       / selection=cp rsquare best=6 start=2 stop=12;
>run;
>
>The total number of models I get is ONLY 6 (against 61 in previous
>model) in increasing order of Cp values. Since I havent changed the
>"best" value, why should the total number of models change.
>
>Also, in case of 2nd model the models were listed in increasing order
>of Cp while in 1st decreasing order of Rsquare?
>
>I also tried the following:-
>
>proc reg data=selection;
> model write = math socst female schtyp prog science math2 mathf
>       mathsch mathsci sciencef progsch
>       / selection=ADJRSQ cp best=6 start=2 stop=12;
>run;
>
>but even in this case am getting only 6 Models arranged in decreasing
>order of Adjusted Rsquare?
>
>Please guide me.

Okay, my first guide tip is:

STOP DOING SELECTION METHODS IN PROC REG!!!!

Really. This is a bad thing. The fact that you can do it doesn't make it
right. You can physically stick your tongue into a wall socket, but you
would NEVER do that, right? Well, not everything that is available in SAS
is a good thing.

Stepwise selection and all of its kin (SELECTION=RSQUARE, etc.) are
problems, and you shouldn't use them unless you understand what goes
wrong with them, and all the things you have to do to deal with those
problems. Just look at the list of things I wrote in my previous post to
you on this topic.

Next, you have to understand what the selection methods are actually doing.
Without that understanding, you are just dumping bat's wing and eye of newt
into a cauldron and stirring until something happens. You cannot treat
these as magical cauldrons which eventually produce a perfect potion.
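
With SELECTION=CP and SELECTION=ADJRSQ, BEST=6 caps the total number of
models reported. Each method ranks the candidate models of every size in
your range of START=2 to STOP=12 on its one criterion and reports only the
6 best overall, in increasing order of Cp or decreasing order of adjusted
R-squared. That is why you get 6 models instead of 61. Being the 6 best on
one criterion out of all the 4096-12-1 = 4083 feasible models does NOT make
them good models.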

The SELECTION=RSQUARE process does something different. For each
number p of regressors (not counting the intercept), it computes the
R-squared for all C(12,p) = 12!/[p!(12-p)!] possible combinations, and
coughs up the 6 best for each level of p. So you get 6 models for p=2,
6 models for p=3, and so on. BUT for p=12, there is only one model to
check, since this uses all 12 of your regressors. So you have 6 models
for each of p=2 to 11, and 1 model for p=12. That's 10*6 + 1 = 61 models.
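
The two counts in this thread (4083 feasible models, 61 reported by
SELECTION=RSQUARE) can be checked with a short DATA step. This is only a
sketch: the data set name model_counts and the variable names are made up
for illustration. The only inputs are the 12 regressors, BEST=6, and the
START=2 to STOP=12 range from the code quoted above.

```sas
/* Sketch: count the candidate subsets and the models that
   SELECTION=RSQUARE reports. All names here are hypothetical. */
data model_counts;
  k    = 12;   /* regressors in the MODEL statement */
  best = 6;    /* BEST=6 */
  do p = 2 to 12;                        /* START=2 to STOP=12 */
    n_subsets  = comb(k, p);             /* 12!/[p!(12-p)!] */
    n_reported = min(best, n_subsets);   /* at most BEST= models per size */
    total_subsets  + n_subsets;          /* sum statement: running total */
    total_reported + n_reported;
    output;
  end;
run;

proc print data=model_counts noobs;
run;
```

The last row (p=12) should show total_subsets = 4083 and
total_reported = 61.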

Now here's the fun part. NONE of these models may be any good. There
is no guarantee that any of these models is 'right' in any sense, or even is
halfway decent. Start checking the model assumptions for every single one
of the models evaluated in every step of every selection process. Without
verifying that the models are doing the right thing, there's no way of
telling how misled you really are. Oh, and you have 4083 different models
to check for regression diagnostics and residual plots and such. At a
minimum, you have to check every one of the models that the processes
coughed up and reported to you.
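
As a rough illustration of what checking just one reported model involves,
the sketch below refits a single candidate and saves its residuals. The
choice of math, socst and female is arbitrary and is not a model the
selection output endorsed. The names diag, pred, resid, rstud and cooksd
are made up here, and PROC SGPLOT assumes SAS 9.2 or later.

```sas
/* Refit ONE candidate model (arbitrary choice, for illustration only) */
proc reg data=selection;
  model write = math socst female / vif;   /* VIF flags collinearity */
  output out=diag p=pred r=resid student=rstud cookd=cooksd;
run;
quit;

/* Residuals against predicted values: look for curvature or fanning */
proc sgplot data=diag;
  scatter x=pred y=resid;
  refline 0 / axis=y;
run;

/* Normality checks on the residuals */
proc univariate data=diag normal;
  var resid;
  qqplot resid / normal(mu=est sigma=est);
run;
```

Repeat that for every model in the selection output and the size of the
job becomes clear.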

Just be forewarned. To continue the 'cauldron' metaphor:
Rather than getting a golden potion of 'felix felicis', you may have ended
up with a nasty wad of tarry goo. :-)

HTH,
David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330