LISTSERV at the University of Georgia
Date:   Sat, 16 Dec 2006 04:32:18 -0500
Reply-To:   Marina Kekrou <mkekrou@YAHOO.CO.UK>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Marina Kekrou <mkekrou@YAHOO.CO.UK>
Subject:   Re: Jacknife regressions

David,

To be honest, what I am really after is to compute predicted values of y after dropping the specific observation for which I want to make a prediction.

I used the code you originally sent:

   data outb;
      do replicate = 1 to num;
         do rec = 1 to num;
            set aaa nobs=num point=rec;
            if replicate ^= rec then output;
         end;
      end;
      stop;
   run;

and then tried the following

proc sort data=outb; by gvkey; run;

   proc reg data=outb;
      model price_adjusted = data18PPCTOTALatadjustedshares BV_ADJ_adjustedshares;
      by replicate;
      output out=b p=yhat;
   run;
   quit;

to get the out-of-sample predicted values I am interested in. I am not sure, though, whether the specific observation is dropped when computing the out-of-sample predicted value.
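One way to look at this, offered as a sketch rather than a tested program: in the OUTB data set built above, the record for which rec equals replicate is never written, so that BY group holds no row for PROC REG to score, and the OUTPUT data set contains only in-sample fitted values. A common workaround is to keep the held-out record but set its response to missing. PROC REG leaves observations with a missing dependent variable out of the fit, yet still computes a predicted value for them when the regressors are present. The names heldout, y_fit, and loo below are hypothetical. The sort by gvkey is left out because a BY replicate step expects the data in replicate order.

```sas
/* Build all N replicates, keeping every record in each one. */
data outb;
   do replicate = 1 to num;
      do rec = 1 to num;
         set aaa nobs=num point=rec;
         heldout = (replicate = rec);   /* 1 for the record to predict */
         y_fit = price_adjusted;
         if heldout then y_fit = .;     /* missing response: excluded from the fit */
         output;
      end;
   end;
   stop;
run;

/* Fit once per replicate; rows with missing y_fit are still scored. */
proc reg data=outb noprint;
   by replicate;
   model y_fit = data18PPCTOTALatadjustedshares BV_ADJ_adjustedshares;
   output out=b p=yhat;
run;
quit;

/* Keep one row per replicate: the out-of-sample prediction. */
data loo;
   set b;
   where heldout = 1;
run;
```

Under these assumptions LOO has one row per original observation, and yhat in that row comes from a fit that did not use the row's own price_adjusted value.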

Thanks for the additional points you made. I have already trimmed the outliers. What exactly do you mean by validation dataset?

Thanks,

Marina

On Fri, 15 Dec 2006 15:41:00 -0800, David L Cassell <davidlcassell@MSN.COM> wrote:

>mkekrou@YAHOO.CO.UK wrote:
>>On Thu, 14 Dec 2006 14:43:43 -0800, David L Cassell <davidlcassell@MSN.COM> wrote:
>>
>> >mkekrou@YAHOO.CO.UK wrote:
>> >>
>> >>Hello everybody,
>> >>
>> >>I want to estimate out-of-sample predicted values using jacknife
>> >>regressions. I am using the following code which ideally want to adapt to
>> >>estimate predicted values in a similar vein to proc reg
>> >>(proc reg data=test;
>> >>model y=x;
>> >>output p=pred;
>> >>run;
>> >>quit;)
>> >>
>> >>Any ideas about how to estimate predicted values will be much appreciated.
>> >>
>> >>DATA one;
>> >>set a END=lastcase;
>> >>IF lastcase THEN CALL SYMPUT ('ncases', _N_) ;
>> >>RUN;
>> >>
>> >>*Macro portion of program begins here;
>> >>%MACRO JackReg ;
>> >>%DO I = 1 %TO &ncases ;
>> >>DATA temp&I ;
>> >>SET one ;
>> >>IF _N_ NE &I ;
>> >>RUN;
>> >>PROC REG OUTEST = loopIest adjrsq;
>> >>Omits&I: MODEL y = x w
>> >>*Specify your model in the line above;
>> >>RUN ;
>> >>PROC APPEND BASE = RegEsts NEW = loopIest ;
>> >>RUN ;
>> >>%END ;
>> >>%MEND JackReg;
>> >>*Macro portion of program ends here;
>> >>
>> >>%JackReg;
>> >>*this statement actually runs the macro JackReg;
>> >>* End of jackknife regression program ;
>> >>
>> >>*Calculate mean coefficient and R-square estimates;
>> >>proc means data=Regests;
>> >>run;
>> >>
>> >>Thanks
>> >
>> >Let me suggest that you scrap the macro approach *entirely*.
>> >Things will be simpler - and faster - if you write this as:
>> >
>> >[1] a process to build all your jackknife data sets in one long data
>> >    set;
>> >[2] a single PROC REG with a BY statement; and then
>> >[3] your PROC MEANS at the end.
>> >
>> >Although I might use a PROC UNIVARIATE at the end, so I
>> >could get nonparametric confidence intervals.
>> >
>> >In addition, let me suggest that jackknifing and bootstrapping
>> >will not solve all your problems here anyway.  It depends on what
>> >you are trying to do (the big picture, not "I want a jackknife")
>> >and what your data are like.  In fact, you might have better
>> >success running the data through PROC ROBUSTREG instead of
>> >doing all this.  It depends on the data, the data sources, the
>> >data features, the study purpose, the . . .
>> >
>> >So let me show you how I would do the jackknife here,
>> >and you can decide where to go after that.
>> >
>> >   /* build all N jackknife data sets in OUTB */
>> >data outb;
>> >   do replicate = 1 to num;
>> >      do rec = 1 to num;
>> >         set YourData nobs=num point=rec;
>> >         if replicate ^= rec then output;
>> >         end;
>> >      end;
>> >   stop;
>> >   run;
>> >
>> >   /* use by-processing */
>> >proc reg data=outb outest=Regests adjrsq;
>> >   model y = x w;
>> >   run;
>> >
>> >   /* then aggregate however you want */
>> >proc univariate data=Regests . . . .
>> >   . . . . .
>> >
>> >Now whether using a jackknife will achieve your goals
>> >is another matter entirely.
>> >
>> >HTH,
>> >David
>> >--
>> >David L. Cassell
>> >mathematical statistician
>> >Design Pathways
>> >3115 NW Norwood Pl.
>> >Corvallis OR 97330
>>
>>Many thanks for your reply.
>>
>>Let me explain what I want to do: I basically want to run a horse race
>>between 2 models and therefore want to calculate out of sample predicted
>>values. I use jacknife because I want to get predictions for each firm
>>without using that firm's data to get its predicted value.
>>
>>Now the code you wrote works ok; when you recommend using by processing I
>>guess you mean to run the regression by replicate. BUT the issue is that I
>>don't get the predicted values I am looking for. Is there any option I can
>>use to generate the predicted value of the dependent variable y? I am
>>looking for an option similar to that in proc reg that you can get
>>predicted values
>>
>>proc reg data=a;
>>  model y =x1 x2;
>>  output out=b
>>  p=yhat;
>>  run;
>>
>>Thanks
>
>Oh, I see that you are getting the right results with my code.
>
>Okay, first off, what you are doing is not really a jackknife resampling.
>Technically, if you want to be all jargon-y, this is more of a
>cross-validation effort.  You don't really want to compute N statistics,
>each based on N-1 records.  You want to get a single point estimate
>on each record, and that point estimate can be computed *without*
>doing a jackknife because it only requires a tweaking of the original
>X'X matrix to do the computations.
>
>As Ian has sagely pointed out, you do not have to do any resampling
>to get what you are after, since the PRESS statistic gives you the
>ith residual divided by (1-h), where h is the leverage, and where the
>model has been refit without the ith observation.  So I think that
>Ian's code needs a *tiny* tweak.
>
>Change his OUTPUT statement to:
>
>   output out=regests press=y_hp h=lever;
>
>and change the transformation in the following data step to:
>
>   y_pred = y - y_hp*(1-lever) ;
>
>Then you don't need any looping process or by-processing at all.
>
>But I'm not sure that you should be trying this.  Now that I see
>what you are really after, I think I should point out that you need
>to clean up your data before trying any sort of 'horse race'.  A
>single leverage point or outlier, or any other sort of deviation from
>the Ordinary Least Squares regression assumptions can mess this
>up.  Furthermore, a horse race may end up telling you more about
>how well one of your models fits extraneous random noise than
>how the model really fits your data.  Or how well the model will
>work on new data.  I would recommend using some 'validation'
>data sets for this sort of examination instead.
>
>HTH,
>David
>--
>David L. Cassell
>mathematical statistician
>Design Pathways
>3115 NW Norwood Pl.
>Corvallis OR 97330
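The PRESS-based route described in the quoted reply might look like the sketch below. It uses the data set and variable names from the message above (aaa, price_adjusted, and the two regressors). The names yhat_in and loo_press are hypothetical. PRESS= returns the ordinary residual divided by (1-h), which equals the observed value minus the prediction from the fit that omits that record, so the sketch subtracts it from the observed value directly. Multiplying it by (1-lever) first would give back the ordinary residual, and hence the in-sample fitted value.

```sas
/* Single fit on the full data; no replicates or BY-processing needed. */
proc reg data=aaa noprint;
   model price_adjusted = data18PPCTOTALatadjustedshares BV_ADJ_adjustedshares;
   output out=regests p=yhat_in press=y_hp h=lever;
run;
quit;

data loo_press;
   set regests;
   /* PRESS residual = y - (prediction from the fit without this record) */
   y_pred = price_adjusted - y_hp;
run;
```

If both approaches are run, y_pred here should agree with the held-out yhat from a replicate-by-replicate fit up to rounding, which makes a convenient cross-check.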

