LISTSERV at the University of Georgia
Date:   Thu, 26 Oct 2006 11:33:01 -0700
Reply-To:   Vadim Pliner <Vadim.Pliner@VERIZONWIRELESS.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Vadim Pliner <Vadim.Pliner@VERIZONWIRELESS.COM>
Organization:   http://groups.google.com
Subject:   Re: Stepwise Regression
Comments:   To: sas-l@uga.edu
In-Reply-To:   <uHS%g.9638$484.8560@twister.nyroc.rr.com>
Content-Type:   text/plain; charset="iso-8859-1"

Paige Miller wrote:
> On 10/23/2006 6:00 PM, Vadim Pliner wrote:
> > Michael,
> >
> > at the risk of being slaughtered on SAS-L, let me give you a scenario
> > where I think stepwise regression could be used.
> > 1. You are trying to predict something.
> > 2. You have a lot of independent variables and the selection of
> > variables presents a problem for you.

> Partial Least Squares is a better solution

a. What do you mean by "better"? Here is my definition of "better" in this context: if my objective is purely prediction, and method X gives me a closer fit to the actual values on validation data than method Y, then method X is better for me.

b. I was not talking specifically about linear stepwise regression. AFAIK, Partial Least Squares is not applicable when the dependent variable is binary. I know there are alternatives to stepwise logistic regression for selecting variables that you might consider "better" as well, but see a. above.
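For the binary case, here is a minimal SAS sketch of the kind of stepwise logistic selection I have in mind. The data set TRAIN, the response RESP, the predictors X1-X20, and the entry/stay levels are all invented for illustration:

```sas
/* Illustrative sketch only -- TRAIN, RESP, and X1-X20 are made-up names */
proc logistic data=train descending;
   model resp = x1-x20
         / selection=stepwise slentry=0.05 slstay=0.05;
run;
```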

> > 3. You have enough data points to split your data into a large enough
> > training data set (where you build the model) and a large enough test
> > or validation data set where you can select the best model.

> Partial Least Squares still is a better solution

See a. above again.

> > 4. You build a number of competing models, one of which is created with
> > stepwise regression.
>
> This does nothing to eliminate the drawbacks of stepwise. Lots of data
> does not eliminate the drawbacks of stepwise. Having a large test data
> set does not eliminate the drawbacks of stepwise. Creating additional
> models does not eliminate the drawbacks of stepwise.

I agree, but what lots of data gives you is the opportunity to test which of the competing methods predicts best in practice, on your specific data, rather than in theory.
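To make the split-and-compare idea concrete, here is a rough SAS sketch. The data set ALL, the response Y, and the predictors X1-X20 are hypothetical names, and the 70/30 split is arbitrary:

```sas
/* Hypothetical names throughout (ALL, Y, X1-X20); 70/30 split is arbitrary */
data train valid;
   set all;
   if ranuni(1234) < 0.7 then output train;   /* ~70% for model building */
   else output valid;                         /* ~30% held out           */
run;

/* One competitor: stepwise OLS fit on TRAIN, coefficients saved to SWEST */
proc reg data=train outest=swest;
   model y = x1-x20 / selection=stepwise;
run;
quit;

/* Score the held-out data; compare its prediction error with the other
   candidate models' and keep whichever predicts best on VALID */
proc score data=valid score=swest out=swpred type=parms;
   var x1-x20;
run;
```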

> > 5. If on the set-aside test data set stepwise regression gives you the
> > best predictions, select this model.

> So, you are saying that if there are cases where, simply by random
> chance, stepwise gives you better predictions, then this is a reason
> to continue to use stepwise.

This is not exactly what I was saying. Yes, a couple of times in my experience stepwise logistic regression outperformed competing methods (3 or 4 neural networks and a decision tree). I doubt it was "by random chance," because the sample sizes were too big to believe in chance. I didn't say this was a reason to continue to use stepwise; I just gave a scenario where you could justify the use of stepwise regression, and that was, as far as I remember, the OP's question.

> > Do I think it's realistic to expect stepwise regression can produce the
> > best model? Yes, it can. Would you prefer a model that is theoretically
> > sound or the one that gives you better predictions? I'd prefer the
> > latter if prediction were my sole goal.

> But you haven't shown that stepwise is a theoretically good way to get
> better predictions (it is not), or that it is even a method that will
> give you better predictions in a reasonable percentage of the cases.
>
> Frank and Friedman (Technometrics, 1992 I think) showed that in the
> situations they studied OLS-based methods (including stepwise) are the
> worst thing to use when you have many variables -- worst meaning that
> the MSE of the predictions, and the MSE of the coefficients, are very
> large compared to the much smaller MSEs associated with Principal
> Components Regression, Ridge Regression, and oh yes Partial Least
> Squares Regression.

I was NOT saying that "stepwise is a theoretically good way to get better predictions." On the contrary, I said that if you have two methods, say X and Y, where Y is theoretically better (not stepwise, I admit) but X gives better predictions on validation data, I would select method X.
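For completeness, fitting the PLS alternative Paige advocates takes only a few lines in SAS, so it is easy to include among the competitors. Again the names are invented; CV=ONE requests leave-one-out cross-validation to choose the number of factors:

```sas
/* Invented names again; CV=ONE = leave-one-out cross-validation */
proc pls data=train cv=one;
   model y = x1-x20;
run;
```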

Vadim Pliner

