Date: Wed, 3 Jul 2002 09:24:12 -0700
Reply-To: Dale McLerran <stringplayer_2@YAHOO.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Dale McLerran <stringplayer_2@YAHOO.COM>
Subject: Re: Proc to Cross-Validate Proc Logistic (Proc or macro)
In-Reply-To: <OF629DA2F6.7263CABEON85256BEB.004C67B6@tco.census.gov>
Content-Type: text/plain; charset=us-ascii

Mark,

The leave-out-one approach to cross-validation is a classic
cross-validation strategy dating from the 1960s. Mosteller and
Wallace (1960, JASA 58, 275-309) first suggested the leave-out-one
approach. Mosteller and Tukey (1968, Handbook of Social Psychology,
G. Lindzey and E. Aronson, eds., Addison-Wesley) and Lachenbruch and
Mickey (1968, Technometrics, 10, 1-11) were the first real
applications of the leave-out-one approach. In the leave-out-one
approach, each observation from 1 to N is dropped in turn from the
data used to fit the model and is used as the validation sample. So
you have N models and N corresponding validation samples. In linear
models, the regression coefficients can be quickly updated when you
employ a leave-out-one strategy. For nonlinear models such as
logistic regression, a fast algorithm can only approximate the
parameter estimates. One would really need to iterate on the original
data rather than employing the "hat" matrix if you wished to obtain
the maximum likelihood estimates of the parameters when the i-th
observation is dropped. But iterating on the original data would be
extremely time consuming, without much benefit in terms of precision
of the parameter estimates in most cases. Therefore, the standard
implementation of cross-validation is now the leave-out-one approach
with approximation of the parameter estimates. Any other
cross-validation approach would need explicit operationalization.
This as much as anything is why the leave-out-one approach is a
standard implementation. The leave-out-one approach is already
operationally well defined, regardless of the particular sample
that you are working with.
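
To make the "quick update" for linear models concrete, here is a
minimal sketch in Python (not SAS, and not what any SAS procedure
runs internally). It assumes a simple one-predictor least-squares
fit, and the data and function names are made up for illustration.
The left-out prediction error for observation i can be obtained from
the single full-data fit as e_i / (1 - h_ii), where h_ii is the i-th
diagonal of the "hat" matrix. The sketch checks that shortcut against
refitting N times. For logistic regression the same kind of shortcut
gives only an approximation, as described above.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return ybar - b * xbar, b


def loo_errors_shortcut(xs, ys):
    """Leave-out-one prediction errors from ONE fit: e_i / (1 - h_ii)."""
    n = len(xs)
    a, b = fit_line(xs, ys)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    errors = []
    for x, y in zip(xs, ys):
        # Leverage (hat-matrix diagonal) for simple linear regression.
        h = 1.0 / n + (x - xbar) ** 2 / sxx
        errors.append((y - (a + b * x)) / (1.0 - h))
    return errors


def loo_errors_refit(xs, ys):
    """Leave-out-one prediction errors by brute force: N separate fits."""
    errors = []
    for i in range(len(xs)):
        # Drop observation i, refit, then predict the dropped point.
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        errors.append(ys[i] - (a + b * xs[i]))
    return errors


# Made-up illustrative data (not from this thread).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 1.9, 3.2, 3.8, 5.3, 5.9]

fast = loo_errors_shortcut(xs, ys)
slow = loo_errors_refit(xs, ys)
press = sum(e * e for e in fast)  # sum of squared left-out errors (PRESS)
```

In the linear case the two lists agree up to floating-point rounding,
so the N refits are unnecessary. With hundreds of thousands of
records, that is the difference between one fit and N fits.
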
Dale
--- Mark Moran <mark.k.moran@CENSUS.GOV> wrote:
> Dale, my friend's dataset contains hundreds of thousands of records.
> Are you saying that from all these records this will drop "one"
> observation from it and fit the regression? Isn't there a much more
> general way to create a cross-validation for any regression routine,
> macros already written for such a purpose?
>
> Mark
>
>

>
> Mark,
>
> The classification table generated with the CTABLE option to the
> MODEL statement of PROC LOGISTIC is generated through a leave-out-one
> approximate cross-validation. I say approximate cross-validation
> since the parameter estimates when the i-th observation is dropped
> are not the full maximum likelihood estimates, but a one-step
> approximation to the parameter estimates. I am sure that a macro
> could be easily written to handle the cross-validation when more
> than one observation is dropped at any one time.
>
> Dale
>
>
> --- Mark Moran <mark.k.moran@CENSUS.GOV> wrote:
> > My colleague is running a PROC LOGISTIC in SAS 8.2 with all
> > categorical variables (dummy variables and interactions, he says).
> > He wants to be able to cross-validate his model, splitting the data
> > into 10 pieces, reserving 1, predicting, moving on to another split,
> > etc. Is there an existing way to accomplish this in SAS (new SAS 8
> > Proc would be the first choice, or second a macro)?
> >
> > Mark Moran
=====

Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: dmclerra@fhcrc.org
Ph: (206) 667-2926
Fax: (206) 667-5977

