David,
To be honest, what I am really after is to compute predicted values of y
after dropping the specific observation for which I want to make a
prediction.
I used the code you originally sent:
data outb;
   do replicate = 1 to num;
      do rec = 1 to num;
         set aaa nobs=num point=rec;
         if replicate ^= rec then output;
      end;
   end;
   stop;
run;
and then tried the following:
/* REPLICATE must be the first sort key, or BY REPLICATE in PROC REG fails */
proc sort data=outb;
   by replicate gvkey;
run;
proc reg data=outb;
   model price_adjusted = data18PPCTOTALatadjustedshares
                          BV_ADJ_adjustedshares;
   by replicate;
   output out=b
          p=yhat;
run;
quit;
to get the out-of-sample predicted values I am interested in. I am not sure,
though, whether the specific observation is dropped when computing the
out-of-sample predicted value.
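A sketch bearing on that question, under stated assumptions: in OUTB the record with rec = replicate is never output, so it is absent from that replicate's BY group and OUTPUT ... P= writes no predicted value for it; every YHAT in B is an in-sample fit. One common SAS workaround (an assumption here, not something from the thread) keeps the held-out record in each replicate with its response set to missing, because PROC REG leaves observations with a missing dependent variable out of the fit but still computes a predicted value for them. The names OUTB2, Y_FIT, HELDOUT, B2 and LOO_PRED are hypothetical.

```sas
/* Hypothetical names: OUTB2, Y_FIT, HELDOUT, B2, LOO_PRED */
data outb2;
   do replicate = 1 to num;
      do rec = 1 to num;
         set aaa nobs=num point=rec;
         heldout = (replicate = rec);   /* 1 for the record left out of this replicate */
         y_fit = price_adjusted;        /* response actually used in the fit */
         if heldout then y_fit = .;     /* missing response: excluded from the fit */
         output;                        /* every record is kept, including the held-out one */
      end;
   end;
   stop;
run;

proc reg data=outb2 noprint;
   by replicate;
   model y_fit = data18PPCTOTALatadjustedshares BV_ADJ_adjustedshares;
   /* predicted values are written even for rows whose Y_FIT is missing */
   output out=b2 p=yhat;
run;
quit;

/* keep one out-of-sample prediction per original record */
data loo_pred;
   set b2;
   where heldout = 1;
run;
```

LOO_PRED then has one row per record of AAA, and each YHAT comes from a fit that never saw that record.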
Thanks for the additional points you made. I have already trimmed the
outliers. What exactly do you mean by a validation dataset?
Thanks,
Marina
On Fri, 15 Dec 2006 15:41:00 -0800, David L Cassell <davidlcassell@MSN.COM> wrote:
>mkekrou@YAHOO.CO.UK wrote:
>>On Thu, 14 Dec 2006 14:43:43 -0800, David L Cassell <davidlcassell@MSN.COM>
>>wrote:
>>
>> >mkekrou@YAHOO.CO.UK wrote:
>> >>
>> >>Hello everybody,
>> >>
>> >>I want to estimate out-of-sample predicted values using jackknife
>> >>regressions. I am using the following code, which I would ideally like
>> >>to adapt to estimate predicted values in a similar vein to PROC REG
>> >>(proc reg data=test;
>> >>model y=x;
>> >>output p=pred;
>> >>run;
>> >>quit;)
>> >>
>> >>Any ideas about how to estimate predicted values will be much appreciated.
>> >>
>> >>DATA one;
>> >>   SET a END=lastcase;
>> >>   /* SYMPUTX trims the leading blanks SYMPUT adds to the number */
>> >>   IF lastcase THEN CALL SYMPUTX('ncases', _N_);
>> >>RUN;
>> >>
>> >>*Macro portion of program begins here;
>> >>%MACRO JackReg;
>> >>   %DO I = 1 %TO &ncases;
>> >>      DATA temp&I;
>> >>         SET one;
>> >>         IF _N_ NE &I;   /* drop record &I */
>> >>      RUN;
>> >>
>> >>      PROC REG DATA=temp&I OUTEST=loopIest ADJRSQ;
>> >>         Omits&I: MODEL y = x w;   *Specify your model in this line;
>> >>      RUN;
>> >>      QUIT;
>> >>
>> >>      PROC APPEND BASE=RegEsts NEW=loopIest;
>> >>      RUN;
>> >>   %END;
>> >>%MEND JackReg;
>> >>*Macro portion of program ends here;
>> >>
>> >>*This statement actually runs the macro JackReg;
>> >>%JackReg;
>> >>
>> >>* End of jackknife regression program;
>> >>
>> >>*Calculate mean coefficient and R-square estimates;
>> >>proc means data=Regests;
>> >>run;
>> >>
>> >>
>> >>Thanks
>> >
>> >Let me suggest that you scrap the macro approach *entirely*.
>> >Things will be simpler - and faster - if you write this as:
>> >
>> >[1] a process to build all your jackknife data sets in one long data
>> > set;
>> >[2] a single PROC REG with a BY statement; and then
>> >[3] your PROC MEANS at the end.
>> >
>> >Although I might use a PROC UNIVARIATE at the end, so I
>> >could get nonparametric confidence intervals.
>> >
>> >In addition, let me suggest that jackknifing and bootstrapping
>> >will not solve all your problems here anyway. It depends on what
>> >you are trying to do (the big picture, not "I want a jackknife")
>> >and what your data are like. In fact, you might have better
>> >success running the data through PROC ROBUSTREG instead of
>> >doing all this. It depends on the data, the data sources, the
>> >data features, the study purpose, the . . .
>> >
>> >
>> >So let me show you how I would do the jackknife here,
>> >and you can decide where to go after that.
>> >
>> >
>> > /* build all N jackknife data sets in OUTB */
>> >data outb;
>> > do replicate = 1 to num;
>> > do rec = 1 to num;
>> > set YourData nobs=num point=rec;
>> > if replicate ^= rec then output;
>> > end;
>> > end;
>> > stop;
>> > run;
>> >
>> > /* use by-processing: one fit per replicate */
>> >proc reg data=outb outest=Regests adjrsq;
>> >   by replicate;
>> >   model y = x w;
>> > run;
>> >
>> > /* then aggregate however you want */
>> >proc univariate data=Regests . . . .
>> > . . . . .
>> >
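If jackknife coefficients are the goal, the aggregation step usually wants a standard error and not only the mean that PROC MEANS reports. A sketch, assuming REGESTS comes from the BY REPLICATE fit of MODEL y = x w, so that its coefficient columns are named X and W; JK_SUMMARY and JK_SE are hypothetical names. It uses the usual jackknife variance, (n-1)/n times the sum of squared deviations of the replicate estimates from their mean, which equals ((n-1)**2/n)*s**2 with s the ordinary standard deviation.

```sas
/* Hypothetical names: JK_SUMMARY, JK_SE */
proc means data=Regests noprint;
   var x w;
   output out=jk_summary n=n_x n_w mean=mean_x mean_w std=sd_x sd_w;
run;

data jk_se;
   set jk_summary;
   /* Var_jack = (n-1)/n * sum((theta_i - theta_bar)**2) = ((n-1)**2/n) * s**2 */
   se_x = (n_x - 1) / sqrt(n_x) * sd_x;
   se_w = (n_w - 1) / sqrt(n_w) * sd_w;
run;

proc print data=jk_se;
   var mean_x se_x mean_w se_w;
run;
```

A percentile-style summary from PROC UNIVARIATE is an alternative; the formula above is the textbook jackknife variance, not a nonparametric interval.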
>> >
>> >Now whether using a jackknife will achieve your goals
>> >is another matter entirely.
>> >
>> >HTH,
>> >David
>> >--
>> >David L. Cassell
>> >mathematical statistician
>> >Design Pathways
>> >3115 NW Norwood Pl.
>> >Corvallis OR 97330
>> >
>>
>>Many thanks for your reply.
>>
>>Let me explain what I want to do: I basically want to run a horse race
>>between two models, and therefore I want to calculate out-of-sample
>>predicted values. I use the jackknife because I want to get predictions
>>for each firm without using that firm's data to get its predicted value.
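For a two-model horse race, one common single-number summary is each model's PRESS statistic, the sum of squared leave-one-out residuals, which needs no resampling at all. A sketch, assuming both models share the response PRICE_ADJUSTED and are fit to the same records of AAA; the split of regressors between ModelA and ModelB is purely hypothetical, since the thread never says what the two models are. The PRESS option on the PROC REG statement writes the statistic to the OUTEST= data set as _PRESS_.

```sas
/* HORSE_RACE, ModelA and ModelB are hypothetical names/specifications */
proc reg data=aaa outest=horse_race press noprint;
   ModelA: model price_adjusted = data18PPCTOTALatadjustedshares;
   ModelB: model price_adjusted = BV_ADJ_adjustedshares;
run;
quit;

/* smaller _PRESS_ means smaller leave-one-out prediction error */
proc print data=horse_race;
   var _MODEL_ _PRESS_ _RMSE_;
run;
```

Both MODEL statements sit in one PROC REG step so that they are fit to the same set of complete records, which keeps the comparison like-for-like.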
>>
>>Now the code you wrote works OK. When you recommend using BY processing, I
>>guess you mean running the regression by replicate. BUT the issue is that I
>>don't get the predicted values I am looking for. Is there any option I can
>>use to generate the predicted value of the dependent variable y? I am
>>looking for an option similar to the one in PROC REG that gives you
>>predicted values:
>>
>>  proc reg data=a;
>>     model y = x1 x2;
>>     output out=b p=yhat;
>>  run;
>>
>>Thanks
>
>Oh, I see that you are getting the right results with my code.
>
>Okay, first off, what you are doing is not really jackknife resampling.
>Technically, if you want to be all jargon-y, this is more of a
>cross-validation effort. You don't really want to compute N statistics,
>each based on N-1 records. You want to get a single point estimate
>for each record, and that point estimate can be computed *without*
>doing a jackknife, because it only requires tweaking computations
>based on the original X'X matrix.
>
>As Ian has sagely pointed out, you do not have to do any resampling
>to get what you are after, since the PRESS= output keyword gives you
>the ith residual divided by (1-h), where h is the leverage, and that
>equals the residual for the ith observation when the model has been
>refit without it. So I think that Ian's code needs a *tiny* tweak.
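The identity behind the PRESS shortcut, written out in standard OLS notation (X is the design matrix, x_i its ith row, e_i the ordinary residual, and the subscript (i) marks a fit without observation i):

```latex
\begin{aligned}
h_{ii} &= \mathbf{x}_i^{\top}(X^{\top}X)^{-1}\mathbf{x}_i, \qquad e_i = y_i - \hat{y}_i,\\
e_{(i)} &= y_i - \hat{y}_{(i)} = \frac{e_i}{1 - h_{ii}},\\
\hat{y}_{(i)} &= y_i - e_{(i)} = \hat{y}_i - \frac{h_{ii}}{1 - h_{ii}}\, e_i .
\end{aligned}
```

So the leave-one-out prediction is y minus the PRESS residual; multiplying the PRESS residual by (1-h) only gives back the ordinary residual e_i.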
>
>Change his OUTPUT statement to:
>
> output out=regests press=y_hp h=lever;
>
>and change the transformation in the following data step to:
>
>  y_pred = y - y_hp ;   /* PRESS residual = y minus the leave-one-out prediction */
>
>
>Then you don't need any looping process or by-processing at all.
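Put together for the model in this thread, the PRESS route might look like the sketch below. It assumes the data set AAA and the variable names from the earlier code; REGESTS, YHAT, Y_HP, LEVER, Y_LOO and Y_LOO_CHECK are illustrative names. The second assignment recomputes the same leave-one-out prediction from the leverage form as a consistency check.

```sas
/* Illustrative names: REGESTS, YHAT, Y_HP, LEVER, Y_LOO, Y_LOO_CHECK */
proc reg data=aaa noprint;
   model price_adjusted = data18PPCTOTALatadjustedshares BV_ADJ_adjustedshares;
   /* P= in-sample fit, PRESS= leave-one-out residual, H= leverage */
   output out=regests p=yhat press=y_hp h=lever;
run;
quit;

data regests;
   set regests;
   /* prediction from the model refit without this record */
   y_loo = price_adjusted - y_hp;
   /* same value via yhat - h/(1-h) * (ordinary residual) */
   y_loo_check = yhat - (lever / (1 - lever)) * (price_adjusted - yhat);
run;
```

Comparing Y_LOO with predictions from an explicit by-replicate refit should show agreement up to rounding, which is a cheap way to confirm the resampling code is doing what it is meant to.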
>
>But I'm not sure that you should be trying this. Now that I see
>what you are really after, I think I should point out that you need
>to clean up your data before trying any sort of 'horse race'. A
>single leverage point or outlier, or any other sort of deviation from
>the Ordinary Least Squares regression assumptions can mess this
>up. Furthermore, a horse race may end up telling you more about
>how well one of your models fits extraneous random noise than
>how the model really fits your data. Or how well the model will
>work on new data. I would recommend using some 'validation'
>data sets for this sort of examination instead.
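One reading of the 'validation data' suggestion (an assumption about what is meant, since it is not spelled out here) is a holdout split: fit each model on one part of the data and measure prediction error on the part it never saw. A sketch using AAA and the variables from the thread; the roughly 30% holdout share, the seed 20061215, and the names BOTH, IN_VALID, Y_TRAIN, SCORED and VALID_ERR are hypothetical. It relies on PROC REG excluding rows with a missing response from the fit while still predicting them.

```sas
/* Hypothetical split share, seed and names */
data both;
   set aaa;
   in_valid = (ranuni(20061215) < 0.3);   /* roughly 30% held out for validation */
   y_train = price_adjusted;
   if in_valid then y_train = .;          /* hide validation responses from the fit */
run;

proc reg data=both noprint;
   model y_train = data18PPCTOTALatadjustedshares BV_ADJ_adjustedshares;
   output out=scored p=yhat;              /* predictions for all rows, fit on training rows */
run;
quit;

data valid_err;
   set scored;
   where in_valid = 1;
   sq_err = (price_adjusted - yhat)**2;
run;

/* mean squared prediction error on the validation records */
proc means data=valid_err mean;
   var sq_err;
run;
```

Repeating these steps for each competing model with the same seed keeps the split identical, so the validation errors are directly comparable.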
>
>HTH,
>David