If it is only for the scoring step, the OP does not need to replicate the
original data X times. Use PROC SCORE instead: the REPLICATE variable in the
SCORE= data set, together with a BY statement, will do this job. For example:
data Remission;
input remiss cell smear infil li blast temp;
label remiss='Complete Remission';
datalines;
1 .8 .83 .66 1.9 1.1 .996
1 .9 .36 .32 1.4 .74 .992
0 .8 .88 .7 .8 .176 .982
0 1 .87 .87 .7 1.053 .986
1 .9 .75 .68 1.3 .519 .98
0 1 .65 .65 .6 .519 .982
1 .95 .97 .92 1 1.23 .992
0 .95 .87 .83 1.9 1.354 1.02
0 1 .45 .45 .8 .322 .999
0 .95 .36 .34 .5 0 1.038
0 .85 .39 .33 .7 .279 .988
0 .7 .76 .53 1.2 .146 .982
0 .8 .46 .37 .4 .38 1.006
0 .2 .39 .08 .8 .114 .99
0 1 .9 .9 1.1 1.037 .99
1 1 .84 .84 1.9 2.064 1.02
0 .65 .42 .27 .5 .114 1.014
0 1 .75 .75 1 1.322 1.004
0 .5 .44 .22 .6 .114 .99
1 1 .63 .63 1.1 1.072 .986
0 1 .33 .33 .4 .176 1.01
0 .9 .93 .84 .6 1.591 1.02
1 1 .58 .58 1 .531 1.002
0 .95 .32 .3 1.6 .886 .988
1 1 .6 .6 1.7 .964 .99
1 1 .69 .69 .9 .398 .986
0 1 .73 .73 .7 .398 .986
;
run;
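/* Sort by the stratum variable, as required by the STRATA statement */
proc sort data=Remission; by remiss; run;

/* Draw 10 stratified bootstrap samples. METHOD=URS samples with        */
/* replacement; OUTHITS writes one output row per hit.                  */
ods select none;
proc surveyselect data=Remission out=samp rate=1 method=urs outhits rep=10;
   strata remiss;
run;
ods select all;

/* Fit one model per bootstrap sample; OUTEST= keeps the coefficients */
proc sort data=samp; by Replicate; run;
proc logistic data=samp outest=est noprint;
   by Replicate;
   model remiss(event='1')=cell smear infil li blast temp;
run;

/* Score the original data once per Replicate in EST. Remission has no  */
/* Replicate variable, so all of it is scored by each BY group.         */
proc score data=Remission(rename=(remiss=remiss0)) out=out score=est
           type=parms;
   by Replicate;
   var cell smear infil li blast temp;
run;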
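
One caveat, offered as a sketch rather than tested code: PROC SCORE with
TYPE=PARMS returns the linear predictor (the logit), not a probability. I am
assuming here that the score variable in OUT is named remiss, taken from
_NAME_ in the OUTEST= data set (check with PROC CONTENTS), and that the
intercept has been included in the score. The names pred and phat are made
up for illustration.

```sas
/* Assumption: OUT holds the linear predictor in a variable named remiss. */
/* Verify the actual variable name with PROC CONTENTS before running.     */
data pred;
   set out;
   /* Inverse logit: predicted probability of remiss=1 under the */
   /* model fitted to this Replicate                              */
   phat = 1 / (1 + exp(-remiss));
run;
```

PRED still carries Replicate and the observed outcome remiss0, so GOF
measures on the original data can be computed BY Replicate from there.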
On Tue, 22 Sep 2009 10:43:29 -0500, Data _null_; <iebupdte@GMAIL.COM> wrote:
>No, read it again:
>
>> this means that I need to create a dataset with my original
>> data repeated X times, each time with a new value of REPLICATE
>
>
>On 9/22/09, oloolo <dynamicpanel@yahoo.com> wrote:
>> Add one more option: OUTHITS.
>> Otherwise, a record that is selected multiple times is collapsed into
>> one row. Besides, for bootstrap analysis, the OP needs to sample WITH
>> REPLACEMENT, not WITHOUT REPLACEMENT.
>>
>> **********************;
>> ods select none;
>> proc surveyselect data=sashelp.class out=class100
>> rate=1 method=urs rep=100 outhits;
>> run;
>> ods select all;
>> **********************;
>>
>>
>> On Tue, 22 Sep 2009 10:23:39 -0500, Data _null_; <iebupdte@GMAIL.COM> wrote:
>>
>> >On 9/22/09, Daniel <daniel.biostatistics@gmail.com> wrote:
>> >> this means that I need to create a dataset with my original
>> >> data repeated X times, each time with a new value of REPLICATE
>> >
>> >METHOD=URS does NOT produce the data that I think the OP is
>> >requesting. If I understand correctly, he wants to replicate the
>> >original data set REP=n times.
>> >
>> >Similar to this but with less work.
>> >
>> >data class10;
>> > set
>> > sashelp.class(in=in1 )
>> > sashelp.class(in=in2 )
>> > sashelp.class(in=in3 )
>> > sashelp.class(in=in4 )
>> > sashelp.class(in=in5 )
>> > sashelp.class(in=in6 )
>> > sashelp.class(in=in7 )
>> > sashelp.class(in=in8 )
>> > sashelp.class(in=in9 )
>> > sashelp.class(in=in10) open=defer;
>> > replicate = index(cats(of in:),'1');
>> > run;
>> >
>> >
>> >Using URS does not produce that same result.
>> >
>> >2048 proc surveyselect method=urs rate=1 rep=10 data=sashelp.class
>> >out=class10;
>> >2049 run;
>> >
>> >NOTE: The data set WORK.CLASS10 has 124 observations and 7 variables.
>> >
>> >
>> >On 9/22/09, oloolo <dynamicpanel@yahoo.com> wrote:
>> >> in addition to what DATA _NULL_ said, be sure to use:
>> >> method=urs
>> >> to get a random sample WITH REPLACEMENT
>> >> you can set other values for "rate=", say rate=0.7
>> >>
>> >> proc surveyselect data=yourdata out=sample
>> >> rate=1 method=urs rep=100;
>> >> run;
>> >>
>> >> On Tue, 22 Sep 2009 10:01:24 -0500, Data _null_; <iebupdte@GMAIL.COM> wrote:
>> >>
>> >> >Consider a SURVEYSELECT with RATE=1. This is in one of Cassell's
>> >> >papers, but you may have missed it.
>> >> >
>> >> >2042 proc surveyselect rate=1 rep=10 data=sashelp.class out=class10;
>> >> >2043 run;
>> >> >
>> >> >NOTE: Under the specified sampling rate, all units will be included in
>> >> >the sample.
>> >> >NOTE: The data set WORK.CLASS10 has 190 observations and 6 variables.
>> >> >
>> >> >
>> >> >
>> >> >On 9/22/09, Daniel <daniel.biostatistics@gmail.com> wrote:
>> >> >> Good morning All,
>> >> >>
>> >> >> I am developing a predictive model (binary outcome) following the
>> >> >> methodology outlined in "Clinical prediction models" by Steyerberg,
>> >> >> or that in StatMed vol. 15 pp. 361-387 (Multivariable prognostic
>> >> >> models: Issues in developing models, evaluating assumptions and
>> >> >> adequacy, and measuring and reducing errors). I am using bootstrap
>> >> >> to obtain measures of shrinkage and optimism to correct my
>> >> >> regression coefficients and goodness of fit (GOF) measures
>> >> >> (respectively) for overfitting. The steps include:
>> >> >>
>> >> >> 1. Obtain X bootstrap samples with replacement, of the same size as
>> >> >> the original data
>> >> >> 2. Use each sample to model the outcome using, in our case, a fixed
>> >> >> set of covariates. Get GOF measures of interest
>> >> >> 3. Score the original data with the model obtained in 2. Obtain GOF
>> >> >> measures of interest on the model applied to the original data
>> >> >> ... some additional steps irrelevant to my question
>> >> >>
>> >> >> I've used David Cassell's advice to program, in very few lines,
>> >> >> steps 1 and 2, by building a dataset with my X bootstrap samples
>> >> >> with replacement, and then running PROC LOGISTIC with the
>> >> >> "BY REPLICATE" statement.
>> >> >>
>> >> >> To score the original data using each of my X models, I used the
>> >> >> OUTEST= option in my PROC LOGISTIC run of step 2, and I then run a
>> >> >> second PROC LOGISTIC, this time with the INEST= option. But for
>> >> >> this to work the way I want, I need to use a "BY REPLICATE"
>> >> >> statement, and this means that I need to create a dataset with my
>> >> >> original data repeated X times, each time with a new value of
>> >> >> REPLICATE. This allows me to avoid the do loop. The negative aspect
>> >> >> (though it might be mitigated by the efficiency of using the BY
>> >> >> statement) is that I need to create this dataset and, depending on
>> >> >> the value of X, it can get quite large. Can you think of other ways
>> >> >> this could be done as efficiently as steps 1 and 2 (perhaps from
>> >> >> your own experiences)?
>> >> >>
>> >> >> Thank you.
>> >> >>
>> >> >> Daniel
>> >> >>
>> >>
>>