LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (April 2004, week 2)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Fri, 9 Apr 2004 08:00:46 -0400
Reply-To:     "Richard A. DeVenezia" <radevenz@IX.NETCOM.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Richard A. DeVenezia" <radevenz@IX.NETCOM.COM>
Subject:      Re: records between dates

Robert wrote: > > For each date, I need to count the number events within the next 14 > day window. Such that the resulting dataset looks as follows: > > Obs id date events date+14 number of events > from date to date+14 > > 1 1 11/12/01 1 11/26/01 8 > 2 1 11/13/01 2 11/27/01 7

You indicate a repeated id, so presume you need to do this within a by group.

Here are two SQL ways, the second being faster. The SQL ways do cartesian joins, which do not scale well. I wouldn't use it for data with more than 10K rows.

*------------------------; proc sql _method; create table counts as select outer.* , outer.date+14 as endate format = mmddyy10. , ( select sum(inner.events) from foo as inner where inner.id=outer.id and inner.date between outer.date and outer.date+14 ) as nevents from foo as outer ; quit;

proc sql _method; create table counts2 as select distinct A.*, A.date+14 as endate format=mmddyy10., sum(B.events) as nevents from foo as A, foo as B where A.id = B.id and B.date between A.date and A.date+14 group by A.id, A.date ; quit; *------------------------;

Consider now a larger problem, 1M rows of data broken into 4,000 groups of 250 observations each. The SQL outer join will run and run and run.

*------------------------; data foo (index=(focus=(id date))); do id = 1 to 4000; do date = today()-249 to today(); * do date = today()-32 to today(); events = int (10*ranuni(0)); * events = 1; output; end; end; format date mmddyy10.; run;

*------------------------;

You can speed things up considerably by precomputing the 1M eventSums and then using SQL to merge the eventSums with the original data.

Q: How do you precompute a value from rolling ranges ? A: Ring arrays.

*------------------------; * day to day+14 contains 15 days; * only works if at most one day per obs, can handle sequences of rows where event counts of some days are not present;

* ix tracks the current slot for new data; * ix0 tracks the current slot that is start of ring; * when size of ring is large enough ix can never 'overrun' ix0 and correct results ensue. * when size of ring is too small, incorrect results will occurr and infinite loops may.;

data endsum (keep=id date0 sum14days);

* not _temporary_, each array element implicitly set to missing at top of data step; array d_ring [16] ; * ring dimension should be at least one larger than maximum number of obs in days range; array e_ring [16] ;

ix0 = 1; do ix = 1 by 1 until (last.id); set foo; by id;

date0 = d_ring[ix0]; if . < date0 and date - date0 > 14 then do;

* if current date was end of date range, it would be * larger than range of interest, thus data of date range of * interest (starting at d_ring[ix0]) is contained in the ring;

sum14days = 0;

* compute value from data in ring array;

do i = ix0 by 1; if i = ix then leave; if d_ring[i] = . then leave; if d_ring[i]-date0 > 14 then leave;

sum14days + e_ring[i];

* roll pointer, set to 0 because by 1 loop control will increment before executing statements in loop; if i >= dim(d_ring) then i = 0; end;

output;

* update start pointer;

ix0 + 1; if ix0 > dim(d_ring) then ix0 = 1; end;

* insert data;

d_ring [ix] = date; e_ring [ix] = events;

* update current pointer, set to 0 because 1 by 1 loop control will increment before until () test;

if ix >= dim(d_ring) then ix = 0; end;

* roll off anything left in the ring in reverse so as to not have to perform extra summing loops; * this means output will *NOT* be sorted in id,date order;

sum14days = 0; do i = ix by -1 until (i eq ix0); date0 = d_ring[i]; sum14days + e_ring[i]; output; if i = 1 then i = dim(d_ring)+1 ; end;

format date0 mmddyy10.; run;

proc sql; create table counts3 as select foo.* , date+14 as endate format=mmddyy10. , sum14days as eventsSumOverDateRange from foo, endsum where foo.id = endsum.id and foo.date = endsum.date0 ; quit;

*------------------------;

The ring array computation should perform roughly 14 * 1M inner loop iterations while computing the rolling sums. If the date range was larger, or the number of rows where larger, the computation could be sped up some more by maintaining a 'state' sum to which things are added to and subtracted from (reduces number of innermost looping)

You might also be able to get rolling sums from SAS/QC Proc MACONTROL or other SAS/STAT Procs.

-- Richard A. DeVenezia http://www.devenezia.com/downloads/sas/samples/


Back to: Top of message | Previous page | Main SAS-L page