LISTSERV at the University of Georgia
Date:         Fri, 28 Oct 2005 14:10:45 -0700
Reply-To:     David L Cassell <davidlcassell@MSN.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         David L Cassell <davidlcassell@MSN.COM>
Subject:      Re: Running Proc Reg using BEST option
In-Reply-To:  <200510281310.j9SAk0XQ002617@malibu.cc.uga.edu>
Content-Type: text/plain; format=flowed

excel_hari@YAHOO.COM wrote:
>I used the data set
>"http://www.ats.ucla.edu/stat/sas/webbooks/reg/hsb2.sas7bdat"
>
>I copied following code from UCLA SAS site.
>
>data selection;
> set hsb2;
> math2 = math*math;
> mathf = math*female;
> mathsch = math*schtyp;
> mathsci = math*science;
> sciencef = science*female;
> progsch = prog*schtyp;
>run;
>
>proc reg data=selection;
> model write = math socst female schtyp prog science math2 mathf
>mathsch mathsci
> sciencef progsch / selection=rsquare cp best=6 start=2
>stop=12;
>run;
>
>By running the above I get output of 61 models for number of dependent
>variables varying from 2 to 6.
>
>But If I modify the above model to below listed one then:-
>
>proc reg data=selection;
> model write = math socst female schtyp prog science math2 mathf
>mathsch mathsci
> sciencef progsch / selection=cp rsquare best=6 start=2
>stop=12;
>run;
>
>The total number of models I get is ONLY 6 (against 61 in previous
>model) in increasing order of Cp values. Since I havent changed the
>"best" value, why should the total number of models change.
>
>Also, in case of 2nd model the models were listed in increasing order
>of Cp while in 1st decreasing order of Rsquare?
>
>I also tried the following:-
>
>proc reg data=selection;
> model write = math socst female schtyp prog science math2 mathf
>mathsch mathsci
> sciencef progsch / selection=ADJRSQ cp best=6 start=2
>stop=12;
>run;
>
>but even in this case am getting only 6 Models arranged in decreasing
>order of Adjusted Rsquare?
>
>Please guide me.

Okay, my first guide tip is:

STOP DOING SELECTION METHODS IN PROC REG!!!!

Really. This is a bad thing. The fact that you can do it doesn't make it right. You can physically stick your tongue into a wall socket, but you would NEVER do that, right? Well, not everything that is available in SAS is a good thing.

Stepwise selection and all of its kin (SELECTION=RSQUARE, etc.) are problems, and you shouldn't use them unless you understand what goes wrong with them, and all the things you have to do to deal with those problems. Just look at the list of things I wrote in my previous post to you on this topic.

Next, you have to understand what the selection methods are actually doing. Without that understanding, you are just dumping bat's wing and eye of newt into a cauldron and stirring until something happens. You cannot treat these as magical cauldrons which eventually produce a perfect potion.

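The selection methods SELECTION=CP and SELECTION=ADJRSQ rank models of every size in your START=2 to STOP=12 range by a single criterion (Mallows' Cp or adjusted R-squared). With these two methods, BEST=6 caps the total number of models reported, not the number per size, so you get 6 models in all, sorted by that criterion. That does NOT make them 6 good models. They are just the 6 that score best on one statistic out of the 4096-12-1 = 4083 feasible models in your range of START=2 to STOP=12.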

The SELECTION=RSQUARE process does something different. For each number p of regressors (not counting the intercept), it computes the R-squared for all C(12,p) = 12!/[p!(12-p)!] possible combinations, and coughs up the 6 best for each level of p. So you get 6 models for p=2, 6 models for p=3, and so on. BUT for p=12, there is only one model to check, since this uses all 12 of your regressors. So you have 6 models for each p from 2 to 11, and 1 model for p=12. That's 10*6 + 1 = 61 models.
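
The two counts above can be checked with a few lines of DATA step arithmetic. This is only a sketch: it assumes the 12 candidate regressors and the BEST=6, START=2, STOP=12 settings from the quoted code, and the variable names (feasible, reported, n_p) are made up for illustration.

```sas
/* Sketch: reproduce the model counts quoted in the text.        */
/* Assumes 12 candidate regressors, BEST=6, START=2, STOP=12.    */
data _null_;
  feasible = 0;   /* all subsets containing 2 to 12 regressors   */
  reported = 0;   /* models listed by SELECTION=RSQUARE BEST=6   */
  do p = 2 to 12;
    n_p = comb(12, p);                  /* subsets of size p     */
    feasible = feasible + n_p;
    reported = reported + min(6, n_p);  /* at most 6 per size    */
  end;
  put feasible= reported=;              /* written to the SAS log */
run;
```

The log should show feasible=4083 and reported=61. The only size where fewer than 6 subsets exist is p=12, which is why the total is 61 rather than 66.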

Now here's the fun part. NONE of these models may be any good. There is no guarantee that any of these models is 'right' in any sense, or even is halfway decent. Start checking the model assumptions for every single one of the models evaluated in every step of every selection process. Without verifying that the models are doing the right thing, there's no way of telling how misled you really are. Oh, and you have 4083 different models to check for regression diagnostics and residual plots and such. At a minimum, you have to check every one of the models that the processes coughed up and reported to you.
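
As a rough sketch of what checking ONE reported model might involve: fit it explicitly, save the residuals, and look at them. The regressor subset below (math, socst, female, science) is picked purely for illustration and is not a recommendation. The output data set name (diag) and its variable names are hypothetical, and PROC SGPLOT assumes SAS 9.2 or later.

```sas
/* Fit one candidate model explicitly; no SELECTION= option.     */
/* The regressor subset here is arbitrary, for illustration only. */
proc reg data=selection;
  model write = math socst female science / vif influence;
  /* Save fitted values, residuals, and influence measures */
  output out=diag p=pred r=resid rstudent=rstud cookd=cooksd;
run;
quit;

/* Residuals vs. fitted values: look for curvature or fanning */
proc sgplot data=diag;
  scatter x=pred y=resid;
  refline 0 / axis=y;
run;

/* Rough normality check on the residuals */
proc univariate data=diag normal;
  var resid;
  qqplot resid / normal(mu=est sigma=est);
run;
```

This covers only a few of the usual checks (collinearity via VIF, influence statistics, a residual plot, a normality check), and it would need to be repeated for each model you are seriously considering.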

Just be forewarned. To continue the 'cauldron' metaphor: Rather than getting a golden potion of 'felix felicis', you may have ended up with a nasty wad of tarry goo. :-)

HTH,
David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330
