Date: Wed, 3 Jul 2002 12:55:24 -0400
Reply-To: mark.k.moran@CENSUS.GOV
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Mark Moran <mark.k.moran@CENSUS.GOV>
Subject: Re: Proc to Cross-Validate Proc Logistic (Proc or macro)
Content-type: text/plain; charset=us-ascii
Obviously I am relatively untutored on crossvalidation. My colleague has a
PhD and his approach is to divide the data into 10 subsamples, fit a model
using, I suppose, 9 of the 10 subsamples to predict the 10th, then change to
a different 9 subsamples to predict the one left out, etc. This would require
only 10 models. If there are 500,000 records, then how long will it take to
create 500,000 models in PROC LOGISTIC by the leave-one-out method of
crossvalidation? Wouldn't this quickly become cumbersome with so
many observations? I think he is predicting from 5 or 6 predictors.
Mark Moran
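For anyone following along, the 10-subsample scheme described above can be
sketched in a few lines of Python. This is only an illustration of how the
records might be assigned to subsamples and how many model fits each scheme
needs; the function names (assign_folds, cv_splits) and the record count of
1,000 are made up for the example, and no model is actually fitted.

```python
import random

def assign_folds(n_records, k=10, seed=0):
    """Give each record a fold label 0..k-1, balanced and randomly shuffled."""
    labels = [i % k for i in range(n_records)]
    random.Random(seed).shuffle(labels)
    return labels

def cv_splits(labels, k):
    """Yield (train_indices, test_indices), holding out one fold at a time."""
    for fold in range(k):
        test = [i for i, f in enumerate(labels) if f == fold]
        train = [i for i, f in enumerate(labels) if f != fold]
        yield train, test

# Hypothetical small data set; the post talks about 500,000 records.
n_records = 1000
k = 10
labels = assign_folds(n_records, k)

for train, test in cv_splits(labels, k):
    # A real run would fit the model on `train` and score the `test` records.
    pass

print("model fits needed, 10 subsamples:", k)
print("model fits needed, leave-one-out:", n_records)
```

The number of fits stays at 10 however many records there are, while
leave-one-out needs one fit per record.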
-----Original Message-----
> Mark,
> The leave-out-one approach to crossvalidation is a classic
crossvalidation strategy dating from the 1960s. Mosteller and
Wallace (1963, JASA 58, 275-309) first suggested the leave-out-one
approach. Mosteller and Tukey (1968, Handbook of Social Psychology,
G. Lindzey and E. Aronson, eds., Addison-Wesley) and Lachenbruch and
Mickey (1968, Technometrics, 10, 1-11) were the first real
applications of the leave-out-one approach. In the leave-out-one
approach, each observation from 1 to N is dropped in turn from the
model fit and is used as the validation sample. So you have N models
and corresponding validation samples. In linear models, the
regression coefficients can be quickly updated when you employ a
leave-out-one strategy. For nonlinear models such as logistic
regression, a fast algorithm can only approximate the parameter
estimates. One would really need to iterate on the original
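The remark above about linear models can be illustrated with a small Python
sketch. It assumes ordinary least squares with a single predictor and uses
the standard hat-value identity, under which the leave-out-one prediction
error for observation i equals e_i / (1 - h_ii), where e_i is the residual
from the full fit and h_ii is the leverage. This is one common form of the
shortcut, not necessarily the update the poster had in mind, and the data
and function names are invented for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    return ybar - b * xbar, b

def loo_errors_shortcut(xs, ys):
    """Leave-out-one prediction errors from a single fit via e_i / (1 - h_ii)."""
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    a, b = fit_line(xs, ys)
    errors = []
    for x, y in zip(xs, ys):
        leverage = 1.0 / n + (x - xbar) ** 2 / sxx  # h_ii for simple regression
        errors.append((y - (a + b * x)) / (1.0 - leverage))
    return errors

def loo_errors_refit(xs, ys):
    """The same errors the slow way: refit N times, dropping one point each time."""
    errors = []
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        errors.append(ys[i] - (a + b * xs[i]))
    return errors

# Made-up illustrative data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 5.9]

fast = loo_errors_shortcut(xs, ys)
slow = loo_errors_refit(xs, ys)
print(max(abs(f - s) for f, s in zip(fast, slow)))  # agreement up to rounding
```

For a linear fit the two routes agree up to floating-point rounding, which is
why no refitting is needed there. No comparably exact identity is available
for logistic regression, which is the contrast the post draws.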