```
Date:         Fri, 15 Dec 2006 16:26:40 -0800
Reply-To:     David L Cassell
Sender:       "SAS(r) Discussion"
From:         David L Cassell
Subject:      Jackknifing for fun and profit!
Content-Type: text/plain; format=flowed

SAS-Lers everywhere:

As you may have noticed, the subject of producing a jackknife data set
for computing a jackknife estimate has come up.  The basic idea is that
if you have N records, you want to analyze the data N times, each time
omitting the Ith record.  Then you have a linearization of the behavior
of the statistic of interest.

This means you end up with a data set with N*(N-1) rows.  As N gets big,
this gets ridiculously unwieldy.  At N=1000, you're making a dataset
that has nearly a million rows.  So the construction of the data set
starts to matter.

Here's the code I showed Marina:

data outb;
   do replicate = 1 to num;
      do rec = 1 to num;
         set test nobs=num point=rec;
         if replicate ^= rec then output;
      end;
   end;
   stop;
run;

But there are other ways to generate the OUTB data file, given the
starting data set TEST.

Here's a PROC SURVEYSELECT method.  You knew I was going to go there
sooner or later, didn't you?

proc sql noprint;
   select count(*) into :size from test;
quit;

proc surveyselect data=test out=outb1
     method=srs samprate=1 rep=&SIZE. ;
run;

data outb / view=outb;
   set outb1;
   if replicate=mod(_n_,&SIZE.)+1 then delete;
run;

This works because the proc spots that the sample will have to pull
every record, so it just outputs all the records.  In order.  For each
replicate.

And you can do this with PROC SQL too, of course.  Here's one way.

data test2 / view=test2;
   set test;
   rec=_n_;
run;

proc sql noprint;
   create table outb as
   select a.rec as replicate, b.*
   from test2 a, test2 b
   where a.rec^=b.rec;
quit;

And you can try using the SASFILE to speed things up, although SAS
tries to buffer the input data set anyway, so there is not much
advantage for small files.

So here's the question.

Can you come up with a faster way of building the OUTB dataset so that
it comes out already sorted by the value of REPLICATE?  By the nature of
the process, it does not have to be sorted within each value of
REPLICATE, unless you just want it that way.

Feel free to make up your own TEST data set as a starting point.  This
is supposed to be a general solution, so if I offer a single TEST data
set, that could bias the results.

Just a little something since I didn't buy you a holiday gift,
David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330
```
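
The message builds OUTB but does not show how it is then analyzed. As a rough illustration only, and not part of the original post, the sketch below shows one way such a data set could be consumed. It assumes a made-up TEST data set with a single numeric variable X, rebuilds OUTB with the DATA step from the message, and takes the mean of X within each replicate as the statistic of interest. The names X, JK, THETA_I, THETA_BAR and JK_VAR are invented for the example. The last step applies the usual delete-one jackknife variance formula, (n-1)/n times the corrected sum of squares of the replicate estimates.

```sas
/* Hypothetical input: five records, one analysis variable X (assumption). */
data test;
   input x @@;
   datalines;
2 4 6 8 10
;
run;

/* Leave-one-out data set, built with the DATA step from the post.        */
/* REC (the POINT= variable) and NUM (the NOBS= variable) are not written */
/* to OUTB, so OUTB holds only REPLICATE and X.                           */
data outb;
   do replicate = 1 to num;
      do rec = 1 to num;
         set test nobs=num point=rec;
         if replicate ^= rec then output;
      end;
   end;
   stop;
run;

/* One estimate per replicate. BY-group processing relies on OUTB */
/* already being in REPLICATE order, which is the point of the post. */
proc means data=outb noprint;
   by replicate;
   var x;
   output out=jk mean=theta_i;
run;

/* Delete-one jackknife variance: (n-1)/n * sum((theta_i - theta_bar)**2). */
/* CSS() returns the corrected sum of squares of the replicate estimates.  */
proc sql;
   select count(*)                                as n,
          mean(theta_i)                           as theta_bar,
          (count(*) - 1) / count(*) * css(theta_i) as jk_var
   from jk;
quit;
```

Because PROC MEANS uses a BY statement, it expects its input grouped by REPLICATE. That is why a construction that delivers OUTB already in REPLICATE order avoids a separate PROC SORT pass over roughly N*(N-1) rows.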
