LISTSERV at the University of Georgia
Date:   Tue, 22 Sep 2009 07:16:07 -0700
Reply-To:   Daniel <daniel.biostatistics@GMAIL.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Daniel <daniel.biostatistics@GMAIL.COM>
Organization:   http://groups.google.com
Subject:   Bootstrap for shrinkage and optimism
Comments:   To: sas-l@uga.edu
Content-Type:   text/plain; charset=ISO-8859-1

Good morning All,

I am developing a predictive model for a binary outcome, following the methodology outlined in "Clinical Prediction Models" by Steyerberg and in Statistics in Medicine vol. 15, pp. 361-387 ("Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors"). I am using the bootstrap to obtain measures of shrinkage and optimism, which I will use to correct my regression coefficients and my goodness-of-fit (GOF) measures, respectively, for overfitting. The steps include:

1. Obtain X bootstrap samples with replacement, of the same size as the original data.
2. Use each sample to model the outcome using, in our case, a fixed set of covariates. Get GOF measures of interest.
3. Score the original data with the model obtained in step 2. Obtain GOF measures of interest on the model applied to the original data.
... (some additional steps irrelevant to my question)

I've used David Cassell's advice to program steps 1 and 2 in very few lines: I build a single dataset containing all X bootstrap samples (drawn with replacement) and then run PROC LOGISTIC with a "BY REPLICATE" statement.
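A minimal sketch of what steps 1 and 2 might look like is given below. It assumes PROC SURVEYSELECT is used for the resampling, which the post does not state. The dataset names (original, boot, est), the variable names (y, x1-x3), the seed, and the choice of 200 replicates are all hypothetical placeholders.

```sas
/* Step 1: draw 200 bootstrap samples, each the size of the original data. */
/* METHOD=URS samples with replacement; OUTHITS writes one row per draw;   */
/* the output dataset carries a Replicate variable identifying each sample. */
proc surveyselect data=original out=boot seed=20090922
                  method=urs samprate=1 outhits reps=200;
run;

/* Step 2: fit the same fixed set of covariates within each bootstrap      */
/* sample. OUTEST= keeps one row of coefficients per Replicate.            */
/* (y, x1, x2, x3 are placeholder names, not from the original post.)      */
proc logistic data=boot outest=est noprint;
  by Replicate;
  model y(event='1') = x1 x2 x3;
run;
```

NOPRINT suppresses the displayed output, so any GOF tables needed from this step would have to be captured some other way, for example by dropping NOPRINT and routing the relevant tables through ODS OUTPUT.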

To score the original data with each of my X models, I used the OUTEST= option in the PROC LOGISTIC run of step 2 and then ran a second PROC LOGISTIC, this time with the INEST= option. For this to work the way I want, I need a "BY REPLICATE" statement, which means I have to create a dataset with my original data repeated X times, each copy with a new value of REPLICATE. This allows me to avoid the do loop. The drawback (though it may be mitigated by the efficiency of the BY statement) is that this dataset can get quite large, depending on the value of X. Can you think of other ways this could be done as efficiently as steps 1 and 2, perhaps from your own experience?
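The scoring step described above might be sketched as follows. This is an illustration under assumptions, not the poster's actual code: it assumes MAXITER=0 is used to hold the coefficients at their INEST= values, and it builds the stacked copy of the original data as a DATA step view, so that the repeated rows are generated on the fly rather than stored. The names original, est, stacked, scored, phat, y, x1-x3, and the 200 replicates are hypothetical.

```sas
/* Stack the original data 200 times as a view, so no large dataset is     */
/* written to disk. The outer loop is over Replicate, so rows come out     */
/* already grouped in BY order. POINT= and NOBS= variables are temporary   */
/* and are not written to the output.                                      */
data stacked / view=stacked;
  do Replicate = 1 to 200;
    do i = 1 to n;
      set original point=i nobs=n;
      output;
    end;
  end;
  stop;  /* required with POINT=, otherwise the step never terminates */
run;

/* Score each copy with the matching bootstrap coefficients. INEST=        */
/* supplies starting values per Replicate, and MAXITER=0 (an assumption    */
/* here) keeps the fit from iterating away from them.                      */
proc logistic data=stacked inest=est noprint;
  by Replicate;
  model y(event='1') = x1 x2 x3 / maxiter=0;
  output out=scored p=phat;
run;
```

The view trades disk space for repeated reads of the original data, so whether it is faster in practice will depend on the size of the data and on X. PROC LOGISTIC also has a SCORE statement; whether it can score a single, unstacked copy of the original data under BY processing is worth checking in the documentation for your release.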

Thank you.

Daniel

