LISTSERV at the University of Georgia
Date:         Thu, 4 Nov 2004 08:36:15 -0600
Reply-To:     "Dunn, Toby" <Toby.Dunn@TEA.STATE.TX.US>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Dunn, Toby" <Toby.Dunn@TEA.STATE.TX.US>
Subject:      Re: Output first record of consecutive record streak(s)
              with count of each streak
Comments: To: Jack Hamilton <JackHamilton@FIRSTHEALTH.COM>
Content-Type: text/plain; charset="us-ascii"

Jack and Michael,

I am referring to Paul's paper "The Magnificent DO". Part of my confusion perhaps lies in the following, from the section "DOW-Loop: The Scheme". Here Paul's DoW construction does show the use of until:

Data ... ;
   <Stuff done before break-event> ;
   Do <Index Specs> Until ( Break-Event ) ;
      Set A ;
      <Stuff done for each record> ;
   End ;
   <Stuff done after break-event ...> ;
Run ;

Which would imply that one would need an until but not necessarily a by statement.

But in the following paragraph he states:

"In most (although not all) situations where the DOW-loop is applicable, the input data set is grouped and/or sorted, and the break event occurs when the last observation in the by-group has been processed. In such a case, the DOW-loop logically separates actions that are performed."

Which led me to think that one needed the "until (break-event)" but not the by statement.

Then, while researching posts by Paul, specifically on the DoW, I ran across what he called a double DoW. Example:

data two ;
   do count = 1 by 1 until ( last.inst ) ;
      set ttotal ;
      by inst ;
      totfund = sum (totfund, fund, 0) ;
   end ;
   do _n_ = 1 to count ;
      set ttotal ;
      output ;
   end ;
run ;

From post: http://www.listserv.uga.edu/cgi-bin/wa?A2=ind0406D&L=sas-l&P=R24734

Which, as you can see, has no until (break-event) in the second do-loop, but uses a counter from the first.

Which is exactly what Howard and I did in the code we posted. But even Paul (the mightiest proponent of the DoW) considered the second do-loop in the above example not just an explicit do-loop but rather a second DoW.

So I guess the question is what constitutes a DoW.

Does it have to have an until (break-event)?

If so, then when doing more than one DoW, do the other do-loops need to have an until (break-event) to also be considered a DoW?

Or do the secondary and tertiary DoWs merely need to be linked to the first, which has an until (break-event), by some means that relates to the break-event?
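For comparison, here is a minimal sketch of the two ways the second loop of a double DoW is commonly written. The data set ttotal and the variables inst and fund follow Paul's example quoted above; the three input rows and the output data set names are invented for illustration. Under those assumptions, both steps should write the same three observations, each carrying its group total in totfund (300 for both A rows, 50 for the B row).

```sas
/* Hypothetical input, already grouped by inst */
data ttotal ;
   input inst $ fund ;
   datalines ;
A 100
A 200
B 50
;
run ;

/* Variant 1: second loop driven by the counter from the first loop */
data two_counter (drop = count) ;
   do count = 1 by 1 until ( last.inst ) ;
      set ttotal ;
      by inst ;
      totfund = sum (totfund, fund, 0) ;
   end ;
   do _n_ = 1 to count ;
      set ttotal ;
      output ;
   end ;
run ;

/* Variant 2: second loop driven by its own BY statement and UNTIL */
data two_until ;
   do until ( last.inst ) ;
      set ttotal ;
      by inst ;
      totfund = sum (totfund, fund, 0) ;
   end ;
   do until ( last.inst ) ;
      set ttotal ;
      by inst ;
      output ;
   end ;
run ;
```

In both variants the second SET reads the same by-group again from its own input stream. The only difference is whether the loop stops on a count or on last.inst.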

Toby Dunn
Thinking I have as many questions as when I started this shindig.

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Jack Hamilton
Sent: Wednesday, November 03, 2004 6:39 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Output first record of consecutive record streak(s) with count of each streak

I would say that that's also just a loop, not a DoW loop.

An important characteristic of the DoW loop, in my understanding of it, is that it explicitly handles a group of data, not just one observation, in each iteration of the data step. A second characteristic is that it uses default data step behavior to handle resetting group summary variables to missing.
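Both characteristics can be seen in a minimal sketch. The data set have, its variables id and x, and the five input rows are assumptions made up for illustration; they do not come from the thread.

```sas
/* Hypothetical input: two BY groups, already sorted by id */
data have ;
   input id x ;
   datalines ;
1 10
1 20
2 5
2 .
2 7
;
run ;

data totals (keep = id n total) ;
   /* One iteration of the data step reads one whole BY group */
   do n = 1 by 1 until ( last.id ) ;
      set have ;
      by id ;
      /* total is not retained, so it is missing at the start of each group */
      total = sum (total, x) ;
   end ;
   /* Statements here run once per group, after its last record; */
   /* the implicit OUTPUT at the bottom writes one row per group */
run ;
```

With this input, totals should contain two observations: id=1 with n=2 and total=30, and id=2 with n=3 and total=12. No RETAIN statement or explicit reset of total is needed, because the data step sets it to missing at the top of each iteration.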

The sample below doesn't have either of those characteristics. It's just a loop.

Toby's code is likewise just three loops. It uses another technique championed by Paul Dorfman, which is explicit iteration through a data set, but in my opinion that doesn't mean it has DoW loops. It doesn't take advantage of

I'm not sure which of Paul's papers Toby is referring to.

--
JackHamilton@FirstHealth.com
Manager, Technical Development
Metrics Department, First Health
West Sacramento, California USA

>>> "Michael Murff" <mjm33@MSM1.BYU.EDU> 11/03/2004 4:05 PM >>>
Just curious, while this thread is still alive. I am (re)reading Rick Aster's "Professional SAS Programming Secrets" (1e, pub. 1991). In the section under linear search (p. 416), it reads:

<quoted from the book>
LOOKUP = .;
DO POINT = 1 TO NOBS UNTIL(LOOKUP NE .);
   SET TABLE (KEEP = KEY LOOKUP RENAME=(KEY=LKEY LOOKUP=LLOOKUP))
       POINT=POINT NOBS=NOBS;
   IF KEY = LKEY THEN LOOKUP = LLOOKUP;
END;
<end of quote>

If the only criterion for a DoW is to put the set statement within an explicit DO loop, then wouldn't this be considered an (early) use of the said programming technique? And wouldn't this imply an appellation such as, ahem, DoA, with all due respect to Master Ian? Admittedly, this acronym is in use (think film noir).
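To see the quoted fragment in context, here is a minimal runnable sketch. It assumes the fragment sits inside a data step that reads search keys from a second data set; the data set names keys and found and all the data rows are invented for illustration and are not from the book.

```sas
/* Hypothetical lookup table */
data table ;
   input key lookup ;
   datalines ;
1 100
2 200
3 300
;
run ;

/* Hypothetical search keys; key 4 has no match in the table */
data keys ;
   input key ;
   datalines ;
2
4
;
run ;

data found (drop = lkey llookup) ;
   set keys ;
   lookup = . ;
   /* Linear search: read table rows directly by observation number */
   /* and stop as soon as a match has filled in lookup              */
   do point = 1 to nobs until ( lookup ne . ) ;
      set table (keep = key lookup rename = (key = lkey lookup = llookup))
          point = point nobs = nobs ;
      if key = lkey then lookup = llookup ;
   end ;
run ;
```

With this input, found should hold key=2 with lookup=200 and key=4 with lookup missing. Here the loop searches the whole table once per key and has no by-group break event, which is the distinction the question turns on.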

M.M.
Provo, UT

>>> "Dunn, Toby" <Toby.Dunn@TEA.STATE.TX.US> 11/3/2004 2:31:33 PM >>>
Jack,

I thought, and I hope I read Paul Dorfman's paper correctly, that a DoW (Do-loop of Whitlock, for those who do not know) has less to do with the construction of the do-loop and everything to do with the fact that the set statement is inside a do-loop. That takes us from an internal looping process that makes fuzzy the before, during, and after of the internal loop to a distinct before, during, and after of the internal looping process of a data step.

The by part of the loop is usually unnecessary (I believe even Howard stated that it could be taken off in the example that he gave), and the until part of the loop is not always necessary.

But if I interpreted Paul's paper wrong then I stand corrected.

Paul, did I get it correct?

Toby Dunn

-----Original Message-----
From: Jack Hamilton [mailto:JackHamilton@firsthealth.com]
Sent: Wednesday, November 03, 2004 3:21 PM
To: SAS-L@LISTSERV.UGA.EDU; Dunn, Toby
Subject: Re: [SAS-L] Output first record of consecutive record streak(s) with count of each streak

That looks like three loops, but not three DoW loops, which typically contain UNTIL last. in the DO statement and a BY statement for each SET. See Howard's example (which uses a counter instead of UNTIL in the second DoW).

--
JackHamilton@FirstHealth.com
Manager, Technical Development
Metrics Department, First Health
West Sacramento, California USA

>>> "Dunn, Toby" <Toby.Dunn@TEA.STATE.TX.US> 11/03/2004 11:29 AM >>>
Michael,

Here is a post that I sent a while back that uses a triple DoW.

And just because I am really bored at work today:

A triple DoW Solution:

Data one (drop = i j k);

  do i = 1 by 1 until (eof);
    set old end = eof;
  end;

  do k = 1 to i by 1;
    set old;
    if group ne k then do;
      group = k;
      name = "new";
      pd = 0;
      util = 0;
      output;
    end;
  end;

  do j = 1 to i;
    set old;
    output;
  end;

Run;

Toby Dunn

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Michael Murff
Sent: Wednesday, November 03, 2004 1:04 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Output first record of consecutive record streak(s) with count of each streak

Hi SAS-L,

I would like to warmly thank (in no particular order) Kevin, Howard, Puddin, Nat, and Richard for their expert assistance on this problem. For summary purposes I include three working approaches below. The goal was to count up the number of streaks or runs within a by group. I have tested these programs for their CPU efficiency on a PC, but did not bother to do so on our Linux server.

I ran two tests, one with i=100 and one with i=100K. Note that N is roughly equal to i*50 (as an upper limit). The performance was as follows:

(rd) - (small n: ~.18 sec; large n: ~20 sec)
(hs) - (small n: ~.07 sec; large n: ~5 sec)
(kv) - (small n: ~.18 sec; large n: ~21 sec)

These numbers are the sum of each step, where more than one is used. I have to say that the Double DoW is pretty darn impressive both in speed and syntactical brevity. Can anyone think of a case where a triple DoW or some higher order DoW might be useful?

Best,

Michael Murff

**********************************************;

%let fromdate=01jan1980;
%let todate=01nov2004;

/* data simulation */
data test;
  do i=1 to 100;
    do j = 1 to int(50*ranuni(10));
      coname=compress("coname"||put(i,z3.));
      date1=ranuni(1)*("&todate"d-"&fromdate"d)+"&fromdate"d;
      date2=ranuni(2)*("&todate"d-"&fromdate"d)+"&fromdate"d;
      a = ranuni(3);
      b = ranuni(4);
      if a > b then streak = 1;
      else streak = 0;
      obs + 1;
      output;
      format date1 date2 date9.;
    end;
  end;
  drop i j;
run;

/************/
/* Richard Devenzia's Solution */
/* two-step approach */
/************/
data rdtemp;
  length streakid markcount 8;
  set test;
  by coname;
  if first.coname then markcount = 0;
  if a>b then markcount + 1;
  else markcount = 0;
  if markcount = 1 then streakid + 1;
run;

proc sql;
  create table rdevenzia as
  select coname, date1, date2, a, b, obs,
         max(markcount) as streaklength
  from rdtemp
  group by coname, streakid
  having markcount=1 and max(markcount) > 3
  ;
quit;

/************/
/* Howard Schreier's Solution */
/* Double DoW */
/************/
data hschreier(drop=n);
  do cnt = 1 by 1 until(last.streak);
    set hstemp;
    by coname streak notsorted;
  end;
  do n = 1 to cnt;
    set hstemp;
    by coname streak notsorted;
    if n=1 and streak and cnt>3 then output;
  end;
run;

/************/
/* Kevin Viel's Solution */
/* SQL / DoW with Array processing */
/************/

proc sql noprint ;
  select put( max( count ) , 8. )
    into : nobs
    from ( select count( * ) as count
             from test
             group by coname ) ;

  select put( count( distinct coname ) , 8. )
    into : n_coname
    from test ;
quit;

%let streak = 4 ;
data kviel ( drop = _m_ ) ;

  array O_ ( &n_coname. , &nobs. ) _temporary_ ;
  array C_ ( &n_coname. , &nobs. ) _temporary_ ;

  do _m_ = 1 by 1 until ( end ) ;
    count = 0 ;
    do _n_ = 1 by 1 until ( last.coname ) ;

      set test end = end ;
      by coname ;

      if last.coname = 0 then do ;
        if a <= b and count => &streak. then do ;
          O_( _m_ , _n_ - count ) = 1 ;
          C_( _m_ , _n_ - count ) = count ;
          count = 0 ;
        end ;
        else if a > b then count + 1 ;
        else if a <= b then count = 0 ;
      end ;
      else do ;
        if a > b and count => %eval( &streak. - 1 ) then do ;
          O_( _m_ , _n_ - count ) = 1 ;
          C_( _m_ , _n_ - count ) = count ;
          count = 0 ;
        end ;
        else if a <= b and count => &streak. then do ;
          O_( _m_ , _n_ - count ) = 1 ;
          C_( _m_ , _n_ - count ) = count ;
        end ;
      end ;
    end ;
  end ;

  do _m_ = 1 by 1 until ( end1 ) ;
    do _n_ = 1 by 1 until ( last.coname ) ;
      set test end = end1 ;
      by coname ;
      *obs + 1 ;
      if O_( _m_ , _n_ ) = 1 then do ;
        count = C_( _m_ , _n_ ) ;
        output ;
      end ;
    end ;
  end ;
run ;

proc compare base=hschreier compare=kviel;
  var obs date1 date2;
run;

proc compare base=hschreier compare=rdevenzia;
  var obs date1 date2;
run;

