Paige Miller wrote:
> On 10/23/2006 6:00 PM, Vadim Pliner wrote:
> > Michael,
> > at the risk of being slaughtered on SAS-L, let me give you a scenario where I
> > think stepwise regression could be used.
> > 1. You are trying to predict something.
> > 2. You have a lot of independent variables and the selection of
> > variables presents a problem for you.
> Partial Least Squares is a better solution
a. What do you mean by "better"? Here is my definition of "better" in
this context: if my objective is purely prediction and method X gives
me a closer fit to the actual values on validation data than method Y,
then method X is better for me.
b. I was not talking specifically about linear stepwise regression.
AFAIK, Partial Least Squares is not applicable to the case when the
dependent variable is binary. I know there are alternatives to stepwise
logistic regression for selecting variables that you might consider
"better" as well, but see a. above.
> > 3. You have enough data points to split your data into a large enough
> > training data set (where you build the model) and a large enough test
> > or validation data set where you can select the best model.
> Partial Least Squares still is a better solution
See a. above again.
> > 4. You build a number of competing models, one of which is created with
> > stepwise regression.
> This does nothing to eliminate the drawbacks of stepwise. Lots of data
> does not eliminate the drawbacks of stepwise. Having a large test data
> set does not eliminate the drawbacks of stepwise. Creating additional
> models does not eliminate the drawbacks of stepwise.
I agree, but what lots of data gives you is an opportunity to test
which of the competing methods predicts best in practice, on your
specific data, rather than in theory.
> > 5. If on the set-aside test data set stepwise regression gives you the
> > best predictions, select this model.
> So, you are saying that if there are cases where, simply by random
> chance, stepwise gives you better predictions, then this is a reason
> to continue to use stepwise.
This is not exactly what I was saying. Yes, a couple of times in my
experience stepwise logistic regression outperformed competing methods
(3 or 4 neural networks and a decision tree). I doubt it was "by random
chance", because the sample sizes were too big to believe in chance. I
didn't say this was a reason to continue to use stepwise; I just gave a
scenario where you could justify the use of stepwise regression, which
was, as far as I remember, the OP's question.
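
For anyone who would rather check the "random chance" question than
take sample size on faith, here is a minimal sketch of one possible
check: a paired bootstrap on the per-case validation losses of two
models. The function name, the 95% percentile interval and the toy
loss values are assumptions made for illustration; none of them come
from the models mentioned above.

```python
import random
from typing import Sequence, Tuple


def bootstrap_loss_difference(
    loss_a: Sequence[float],
    loss_b: Sequence[float],
    n_resamples: int = 2000,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Paired bootstrap for mean(loss_a - loss_b) on the same validation cases.

    Returns (observed mean difference, lower bound, upper bound) where the
    bounds form an approximate 95% percentile interval.
    """
    if len(loss_a) != len(loss_b) or len(loss_a) == 0:
        raise ValueError("need two non-empty loss vectors of equal length")

    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = [a - b for a, b in zip(loss_a, loss_b)]
    n = len(diffs)

    # Resample validation cases with replacement and record each mean difference.
    means = []
    for _ in range(n_resamples):
        sample = rng.choices(diffs, k=n)
        means.append(sum(sample) / n)
    means.sort()

    lower = means[int(0.025 * n_resamples)]
    upper = means[int(0.975 * n_resamples) - 1]
    return sum(diffs) / n, lower, upper


# Toy per-case losses (hypothetical numbers): model A vs. model B.
loss_a = [0.10, 0.12, 0.08, 0.11, 0.09] * 40
loss_b = [0.30, 0.25, 0.35, 0.28, 0.32] * 40

mean_diff, lower, upper = bootstrap_loss_difference(loss_a, loss_b)
print(mean_diff, lower, upper)
```

If the whole interval lies below zero, model A's advantage on the
validation set is unlikely to be resampling noise alone. It says
nothing about how the model was selected, only about this comparison.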
> > Do I think it's realistic to expect stepwise regression can produce the
> > best model? Yes, it can. Would you prefer a model that is theoretically
> > sound or the one that gives you better predictions? I'd prefer the
> > latter if prediction were my sole goal.
> But you haven't shown that stepwise is a theoretically good way to get
> better predictions (it is not), or that it is even a method that will
> give you better predictions in a reasonable percentage of the cases.
> Frank and Friedman (Technometrics, 1992 I think) showed that in the
> situations they studied OLS based methods (including stepwise) are the
> worst thing to use when you have many variables -- worst meaning that
> the MSE of the predictions, and the MSE of the coefficients are very
> very large compared to the much smaller MSEs associated with Principal
> Components Regression, Ridge Regression and oh yes Partial Least
> Squares Regression.
I was NOT saying that "stepwise is a theoretically good way to get
better predictions." On the contrary, I said that if you had two
methods, say, X and Y, and Y is theoretically better (this is not
stepwise, I admit) but X gives better predictions on validation data, I
would select method X.
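
To make the selection rule concrete, here is a minimal sketch of
"pick whichever method fits the validation data best". It assumes each
fitted model is available as a plain prediction function and that mean
squared error is the loss of interest; the names `method_X` and
`method_Y`, the toy data and the choice of loss are all illustrative
assumptions, not anything from the models discussed in this thread.

```python
from typing import Callable, Dict, Sequence, Tuple

Predictor = Callable[[Sequence[float]], float]


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average squared difference between actual and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("need two non-empty vectors of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def select_on_validation(
    models: Dict[str, Predictor],
    X_val: Sequence[Sequence[float]],
    y_val: Sequence[float],
) -> Tuple[str, Dict[str, float]]:
    """Score every candidate on held-out data and return the best name plus all scores."""
    scores = {
        name: mean_squared_error(y_val, [predict(x) for x in X_val])
        for name, predict in models.items()
    }
    best = min(scores, key=scores.get)  # lowest validation error wins
    return best, scores


# Hypothetical validation set that was NOT used to fit either model.
X_val = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0]]
y_val = [2.1, 3.9, 6.2, 7.8]

# Stand-ins for already-fitted models, e.g. one from stepwise, one from another method.
models = {
    "method_X": lambda x: 2.0 * x[0],
    "method_Y": lambda x: 1.5 * x[0] + 1.0 * x[1],
}

best, scores = select_on_validation(models, X_val, y_val)
print(best, scores)
```

For a binary dependent variable the same structure applies with a
different loss (for example, squared error on predicted probabilities
or misclassification rate). If many candidates are compared this way,
a third, untouched data set is needed to get an honest error estimate
for the winner.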