Good morning All,
I am developing a predictive model for a binary outcome, following the
methodology outlined in "Clinical prediction models" by Steyerberg, or
that in StatMed vol. 15 pp. 361-387 (Multivariable prognostic models:
Issues in developing models, evaluating assumptions and adequacy, and
measuring and reducing errors). I am using the bootstrap to estimate
shrinkage and optimism, which I use to correct my regression
coefficients and goodness-of-fit (GOF) measures, respectively, for
overfitting. The steps include:
1. Obtain X bootstrap samples with replacement, of the same size as
the original data
2. Use each sample to model the outcome using, in our case, a fixed
set of covariates. Get GOF measures of interest
3. Score the original data with the model obtained in 2. Obtain GOF
measures of interest on the model applied to the original data
... some additional steps irrelevant to my question
I've used David Cassell's advice to program, in very few lines, steps
1 and 2, by building a dataset with my X bootstrap samples with
replacement, and then running PROC LOGISTIC with the "BY REPLICATE"
statement.
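
For concreteness, steps 1 and 2 might look roughly like the sketch
below. The dataset name ORIG, the outcome Y, the covariates X1-X3, the
seed, and the value of NBOOT are all hypothetical placeholders, and
the Association table is just one example of a GOF measure; this is an
illustration of the approach, not my exact code.

```sas
/* Hypothetical names: ORIG = original data, Y = binary outcome,
   X1-X3 = fixed covariates, NBOOT = X (number of bootstrap samples) */
%let nboot = 200;

/* Step 1: NBOOT samples with replacement, each the size of ORIG.
   OUTHITS writes one row per selection; REPS= adds the REPLICATE id. */
proc surveyselect data=orig out=boot seed=12345
                  method=urs samprate=1 outhits reps=&nboot;
run;

/* Step 2: one model per bootstrap sample.
   OUTEST= keeps one row of coefficients per REPLICATE. */
proc logistic data=boot outest=est;
   by replicate;
   model y(event='1') = x1 x2 x3;
   ods output Association=gof_boot;  /* c statistic etc., per replicate */
run;
```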
To score the original data with each of my X models, I used the
OUTEST= option in the PROC LOGISTIC run of step 2, and I then ran a
second PROC LOGISTIC, this time with the INEST= option. For this to
work the way I want, I need a "BY REPLICATE" statement, which means I
have to create a dataset with my original data repeated X times, each
copy with a new value of REPLICATE. This lets me avoid a do loop. The
drawback (though it may be offset by the efficiency of the BY
statement) is that I need to create this dataset, and depending on the
value of X it can get quite large. Can you think of other ways to do
step 3 as efficiently as steps 1 and 2 (perhaps from your own
experience)?
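
To make the step 3 setup concrete, here is a rough sketch of the
stacked-data approach described above. The names ORIG, EST, Y, X1-X3
and NBOOT are hypothetical placeholders, and MAXITER=0 is my
assumption about how to stop PROC LOGISTIC from re-estimating the
INEST= coefficients; it also assumes every replicate in EST has a
matching BY group in the stacked data.

```sas
/* Hypothetical names: ORIG = original data, EST = OUTEST= dataset
   from step 2, NBOOT = X (number of bootstrap samples) */
%let nboot = 200;

/* Repeat ORIG once per replicate so BY REPLICATE lines up with EST */
data orig_rep;
   do replicate = 1 to &nboot;
      do i = 1 to n;
         set orig point=i nobs=n;  /* i and n are not written out */
         output;
      end;
   end;
   stop;                           /* required when using POINT= */
run;

/* Step 3: evaluate each bootstrap model on the original data.
   MAXITER=0 leaves the starting values from INEST= unchanged. */
proc logistic data=orig_rep inest=est;
   by replicate;
   model y(event='1') = x1 x2 x3 / maxiter=0;
   ods output Association=gof_orig;  /* GOF on original data */
run;
```

The ORIG_REP dataset is the part that grows with X: it holds X times
the number of rows in the original data.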
Thank you.
Daniel