LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (December 2003, week 3)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:   Thu, 18 Dec 2003 17:05:35 -0500
Reply-To:   Howard Schreier <Howard_Schreier@ITA.DOC.GOV>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Howard Schreier <Howard_Schreier@ITA.DOC.GOV>
Subject:   Re: Time Interval Aggregation to an Event

I'm not getting the performance improvements I was hoping to see. Ordinarily I would put the problem aside and take another look tomorrow, but I'll be away for a few days so I'll post it now and hope that one of the sqlheads can tune it up.

First, I created macrovariables for the critical dimensions:

%let maxtime=20000; %let lookback=180;

Then I scaled up the test data by looping:

data one; input time vol type $; do time = time to &MAXTIME by 16; output; end; cards; 1 9 b 2 2 b 2 3 s 2 4 s 5 10 b 6 12 s 7 13 s 7 22 b 9 21 b 12 9 s 14 5 b 15 2 b 16 1 b ;

proc sort data=one; by time; run;

As before, I added sequence numbers:

data two; set one; obsnum ++ 1; run;

Here's the tweaked SQL code:

proc sql; create view cross as select 0 as offset, * from two union select 1 as offset, * from two ; create table three(drop=obsnum) as select distinct two.*, coalesce(sum( cross.vol*(cross.type='b') ),0) as bvol, coalesce(sum(-cross.vol*(cross.type='s') ),0) as svol, calculated bvol + calculated svol as net_vol from two left join cross on int(two.time/&LOOKBACK) - offset = int(cross.time/&LOOKBACK) and two.time-cross.time > 0 and two.time-cross.time <= &LOOKBACK group by two.obsnum order by two.obsnum ; quit;

The idea is to divide time into arbitrary segments of the specified length (&LOOKBACK). They are arbitrary in that it does not really matter where the boundaries fall. The subexpressions referencing the INT function do that. Then for any given observation, the past observations to be rolled into the BVOL and SVOL calculations either fall in the same segment or the immediately preceding one. That's what OFFSET is intended to reference.

So the added condition in the ON clause is actually redundant in that it is less restrictive than the inequality conditions. However, it is an equality condition, which the optimizer can exploit. Whether there is a practical benefit depends on the data, but since 180 is a small fraction of 20000, I expected to see significant faster execution.

For reference, here is the naive code, rewritten slightly to make use of the macrovariables:

proc sql; create table slow(drop=obsnum) as select distinct two.*, coalesce(sum( cross.vol*(cross.type='b') ),0) as bvol, coalesce(sum(-cross.vol*(cross.type='s') ),0) as svol, calculated bvol + calculated svol as net_vol from two left join two as cross on two.time-cross.time > 0 and two.time-cross.time <=&LOOKBACK group by two.obsnum order by two.obsnum ; quit;

I ran PROC COMPARE and confirmed that the two solutions yield identical results.

However, the time saving is only about 50 percent, less than I expected.

On Thu, 18 Dec 2003 12:34:46 -0500, Howard Schreier <Howard_Schreier@ITA.DOC.GOV> wrote:

>Here's my naive solution. > >As a preliminary, add sequence numbers to provide a unique key and to >assure preservation of the original order. > > data two; > set one; > obsnum ++ 1; > run; > >Now use a reflexive join to derive the new columns: > > proc sql; > create table three(drop=obsnum) as > select distinct > two.*, > coalesce(sum( cross.vol*(cross.type='b') ),0) as bvol, > coalesce(sum(-cross.vol*(cross.type='s') ),0) as svol, > calculated bvol + calculated svol as net_vol > from two left join two as cross > on two.time-cross.time > 0 and > two.time-cross.time < 4 > group by two.obsnum > order by two.obsnum > ; > >This join cannot be optimized, so performance may be a problem. But as I >indicated in an earlier post, there are tweaks which can remedy that. > >On Wed, 17 Dec 2003 18:24:31 -0500, Chang Y. Chung ><chang_y_chung@HOTMAIL.COM> wrote: > >>Hi, Dorian, >> >>Here is one solution -- which is kind of ugly, but works. >> >>Cheers, >>Chang >> >><sasl:code> >>data one; >> input time vol type $; >>cards; >>1 9 b >>2 2 b >>2 3 s >>2 4 s >>5 10 b >>6 12 s >>7 13 s >>7 22 b >>9 21 b >>12 9 s >>14 5 b >>15 2 b >>16 1 b >>run; >> >> /* a solution */ >> >> options mprint; >> >> /* step 1: create obs number -- for restoring the original order */ >> %macro addObs; >> create table addObs as >> select monotonic() as obs, * >> from one >> ; >> %mend; >> >> /* step 2: summary dataset for each type and time lag */ >> %macro summ(type=, lag=); >> create table &type.&lag. as >> select time+&lag. as time, sum(vol) as &type.&lag. >> from addObs >> where type = "&type." >> group by time >> ; >> %mend; >> >> /* short utilities */ >> %macro t2i(type) ; %*;%index (bs,&type. ) %mend; >> %macro i2t(index); %*;%substr(bs,&index.,1) %mend; >> >> /* step 3: put all of them together */ >> %macro allSumm(maxLag=); >> %local i t l tables vars; >> >> %*-- create summary datasets and list of vars and tables --*; >> %let tables = addObs; >> %let vars = addObs.time, addObs.vol; >> %do i = 1 %to 2; >> %let t = %i2t(&i.); >> %put ***t=&t.***; >> %do l = 1 %to &maxlag.; >> %summ(type=&t., lag=&l.); >> %let tables = &tables. left join &t.&l. on addObs.time = >>&t.&l..time; >> %let vars = &vars., &t.&l.; >> %end; >> %end; >> >> %*-- put the datasets together --*; >> create table allSumm as >> select &vars. >> from &tables. >> ; >> %mend; >> >> /* step 4: actual work */ >> proc sql; >> >> %addObs >> >> %allSumm(maxlag=3) >> >> select time, vol >> , coalesce(sum(b1,b2,b3),0) as bvol >> , coalesce(-1*sum(s1,s2,s3),0) as svol >> , coalesce(sum(calculated bvol, calculated svol),0) as net_vol >> from allSumm >> ; >> quit; >> >> /* output >> time vol bvol svol net_vol >> ------------------------------------------------ >> 1 9 0 0 0 >> 2 2 9 0 9 >> 2 4 9 0 9 >> 2 3 9 0 9 >> 5 10 2 -7 -5 >> 6 12 10 0 10 >> 7 13 10 -12 -2 >> 7 22 10 -12 -2 >> 9 21 22 -25 -3 >> 12 9 21 0 21 >> 14 5 0 -9 -9 >> 15 2 5 -9 -4 >> 16 1 7 0 7 >> */ >></sasl:code> >> >>On Wed, 17 Dec 2003 13:23:36 -0800, Dorian Noel >><d.noel@ISMACENTRE.RDG.AC.UK> wrote: >> >>>Dear SAS users, >>> >>>Season's Greetings. I am attempting to generate three new variables >>>conditioned on the occurrence of an event. For every event that >>>occurs, I am agregating the number of observations in a 3-second >>>window prior to the event (see sample data below). Three new >>>variables will be created in the process bvol (sum of Bs), svol (sum >>>of Ss) and net_vol (sum of bvol and svol). >>> >>>Time vol type bvol svol net_vol >>>1 9 b 0 0 0 note that + Bs and - Ss >>>2 2 b 9 0 9 >>>2 3 s 9 0 9 >>>2 4 s 9 0 9 >>>5 10 b 2 -7 -5 >>>6 12 s 10 0 10 >>>7 13 s 10 -12 -2 >>>7 22 b 10 -12 -2 >>>9 21 b 22 -25 -3 >>>12 9 s 21 0 21 >>>14 5 b 0 -9 -9 >>>15 2 b 5 -9 -4 >>>16 1 b 7 0 7 >>> >>>Thanks in advance for your assistance. >>> >>>Take care and enjoy the upcoming holidays. >>> >>>Dorian


Back to: Top of message | Previous page | Main SAS-L page