LISTSERV at the University of Georgia
Date:         Wed, 3 Jul 2002 12:55:24 -0400
Reply-To:     mark.k.moran@CENSUS.GOV
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Mark Moran <mark.k.moran@CENSUS.GOV>
Subject:      Re: Proc to Cross-Validate Proc Logistic (Proc or macro)
Comments: To: Dale McLerran <stringplayer_2@YAHOO.COM>
Content-type: text/plain; charset=us-ascii

Obviously I am relatively untutored in crossvalidation. My colleague, who has a PhD, divides the data into 10 subsamples, fits a model on (I suppose) 9 of the 10 subsamples to predict the 10th, then switches to a different 9 subsamples to predict the one left out, and so on. This would require only 10 models. If there are 500,000 records, the leave-one-out method of crossvalidation would require 500,000 models; how long would it take to fit that many in PROC LOGISTIC? Wouldn't this quickly become cumbersome with so many observations? I think he is predicting from 5 or 6 predictors.

Mark Moran
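
For readers who want to see the 10-subsample scheme described above spelled out, here is a minimal sketch in Python with NumPy rather than SAS. It is not the colleague's actual procedure: the simulated data, the function names `fit_logistic` and `kfold_predictions`, and the plain Newton-Raphson fit are all assumptions made for illustration. The point is only that k folds mean k model fits, however many records there are.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Fit a logistic regression by Newton-Raphson.
    X must already contain an intercept column (hypothetical helper)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # current fitted probabilities
        w = p * (1.0 - p)                     # Newton weights
        grad = X.T @ (y - p)                  # score vector
        hess = X.T @ (X * w[:, None])         # information matrix
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def kfold_predictions(X, y, k=10, seed=0):
    """Return out-of-fold predicted probabilities and the number of models fitted."""
    n = len(y)
    rng = np.random.default_rng(seed)
    fold = rng.permutation(n) % k             # random folds of near-equal size
    preds = np.empty(n)
    n_models = 0
    for j in range(k):
        test = fold == j
        beta = fit_logistic(X[~test], y[~test])            # fit on the other k-1 folds
        preds[test] = 1.0 / (1.0 + np.exp(-X[test] @ beta))  # predict the held-out fold
        n_models += 1
    return preds, n_models

# Simulated data (an assumption): 2,000 records, 5 predictors plus an intercept.
rng = np.random.default_rng(1)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 5))])
true_beta = np.array([-0.5, 1.0, -1.0, 0.5, 0.0, 0.25])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

preds, n_models = kfold_predictions(X, y, k=10)
accuracy = np.mean((preds > 0.5) == (y == 1.0))
print("models fitted:", n_models)
print("out-of-fold accuracy:", round(float(accuracy), 3))
```

With `k=10` the loop fits 10 models whether `n` is 2,000 or 500,000; leave-one-out is the special case `k = n`, which is where the 500,000 fits come from.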

-----Original Message-----

> Mark,

> The leave-out-one approach to crossvalidation is a classic crossvalidation strategy dating from the 1960s. Mosteller and Wallace (1963, JASA 58, 275-309) first suggested the leave-out-one approach. Mosteller and Tukey (1968, Handbook of Social Psychology, G. Lindzey and E. Aronson, eds., Addison-Wesley) and Lachenbruch and Mickey (1968, Technometrics, 10, 1-11) provided the first real applications of the leave-out-one approach. In the leave-out-one approach, each observation from 1 to N is in turn dropped from the data used to fit the model and serves as the validation sample. So you have N models and N corresponding validation samples. In linear models, the regression coefficients can be quickly updated when you employ a leave-out-one strategy. For nonlinear models such as logistic regression, a fast algorithm can only approximate the leave-out-one parameter estimates. One would really need to iterate on the original
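
The quoted remark that linear-model coefficients can be quickly updated under leave-out-one can be illustrated with a small sketch. The quoted message does not say which update it has in mind; the sketch below assumes the standard leverage identity for ordinary least squares, in which the leave-one-out prediction error for record i equals e_i / (1 - h_ii), where e_i is the ordinary residual and h_ii is the i-th diagonal element of the hat matrix. The data are simulated and all variable names are illustrative.

```python
import numpy as np

# Simulated linear-regression data (an assumption made for illustration).
rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(size=n)

# Fast route: one fit on all N records, then adjust each residual by its leverage.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
XtX_inv = np.linalg.inv(X.T @ X)
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)   # h_ii = x_i' (X'X)^-1 x_i
loo_fast = resid / (1.0 - h)

# Slow route: refit N times, each time dropping one record and predicting it.
loo_slow = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    b_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    loo_slow[i] = y[i] - X[i] @ b_i

print("max difference:", float(np.max(np.abs(loo_fast - loo_slow))))
```

For least squares the two routes agree up to rounding error, so N refits are never needed. Logistic regression has no such exact identity, which is why the quoted message speaks of an approximation there.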

