Date: Thu, 4 Nov 2004 08:36:15 -0600
Reply-To: "Dunn, Toby" <Toby.Dunn@TEA.STATE.TX.US>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Dunn, Toby" <Toby.Dunn@TEA.STATE.TX.US>
Subject: Re: Output first record of consecutive record streak(s) with count of each streak
Jack and Michael,
I am referring to Paul's paper "The Magnificent DO". Part of my
confusion perhaps lies in the following passage from the section "DOW-Loop: The
Scheme", where Paul's DoW construction does show the use of until:
Data...;
  <Stuff done before Break-event>;
  Do <index Specs> Until (break-Event);
    set A;
    <stuff done for each Record>;
  end;
  <stuff done after break-event...>;
Run;
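A concrete instance of that scheme, as a sketch only: SALES, STORE, AMOUNT and STORE_SUMMARY are made-up names and data, not taken from Paul's paper. The break event is the last record of a STORE group. N, TOTAL and MEAN are not read from SALES and not retained, so they start out missing on every pass of the implicit loop and nothing has to be done before the break event.

```sas
/* Hypothetical input: one record per sale, already sorted by STORE */
data sales;
  input store $ amount;
  datalines;
A 10
A 20
B 5
B 15
B 40
;
run;

data store_summary (keep = store n total mean);
  /* before the break event: nothing needed, the variables below start missing */
  do n = 1 by 1 until (last.store);   /* break event: last record of the group */
    set sales;
    by store;
    total = sum(total, amount);       /* stuff done for each record */
  end;
  /* after the break event: N holds the number of records in the group */
  mean = total / n;
  output;
run;
```

With this data the step writes one record per store: A with N = 2, TOTAL = 30, MEAN = 15 and B with N = 3, TOTAL = 60, MEAN = 20.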
This would imply that one needs an until but not necessarily a by
statement.
But in the following paragraph he states:
"In most (although not all) situations where the DOW-loop is applicable,
the input data set is grouped and/or sorted, and the break event occurs
when the last observation in the by-group has been processed. In such a
case, the DOW-loop logically separates actions that are performed."
That led me to think that one needed the "until (break-event)" but not
the by statement.
Then, while researching posts by Paul, specifically about the DoW, I ran
across what he called a double DoW, for example:
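data two ;
  /* pass 1: count the records in one INST group and total FUND */
  do count = 1 by 1 until ( last.inst ) ;
    set ttotal ;
    by inst ;
    totfund = sum (totfund, fund, 0) ;
  end ;
  /* pass 2: re-read the same COUNT records and output each with TOTFUND */
  do _n_ = 1 to count ;
    set ttotal ;
    output ;
  end ;
run ;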
From post:
http://www.listserv.uga.edu/cgi-bin/wa?A2=ind0406D&L=sas-l&P=R24734
As you can see, the second do-loop has no until (break-event); it uses
a counter from the first instead.
That is exactly what Howard and I did in the code we posted. Yet even
Paul (the mightiest proponent of the DoW) considered the second do-loop in
the above example not just an explicit do-loop but a second DoW.
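For comparison, a sketch of the same double DoW with the counter replaced by until (last.inst) and a BY statement in the second loop as well. TTOTAL's contents are invented for illustration and TWO_UNTIL is a hypothetical output name. LAST.INST reflects whichever SET/BY pair executed most recently, so the second loop stops at the same break event as the first.

```sas
/* Hypothetical TTOTAL: INST groups with a numeric FUND, already sorted by INST */
data ttotal;
  input inst $ fund;
  datalines;
X 100
X 50
Y 30
Y 20
Y 10
;
run;

data two_until;
  /* first DoW: read one INST group and accumulate its total */
  do until (last.inst);
    set ttotal;
    by inst;
    totfund = sum(totfund, fund);
  end;
  /* second DoW: a second SET re-reads the same group from its own position;
     LAST.INST now tracks this SET/BY pair, so the loop stops at the same break */
  do until (last.inst);
    set ttotal;
    by inst;
    output;   /* each record goes out with its group's TOTFUND */
  end;
run;
```

Both forms give the same result here; the counter version simply does not need the second BY statement because it already knows how many records to re-read.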
So I guess the question is: what constitutes a DoW?
Does it have to have an until (break-event)?
If so, when doing more than one DoW, do the other do-loops also need
an until (break-event) to be considered DoWs?
Or do the secondary and tertiary DoWs merely need to be linked to the
first one (the one with the until (break-event)) by some means that
relates to the break-event?
Toby Dunn
Thinking I have as many questions as when I started this shindig.
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
Jack Hamilton
Sent: Wednesday, November 03, 2004 6:39 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Output first record of consecutive record streak(s) with count of each streak
I would say that that's also just a loop, not a DoW loop.
An important characteristic of the DoW loop, in my understanding of it,
is that it explicitly handles a group of data, not just one observation,
in each iteration of the data step. A second characteristic is that it
uses default data step behavior to handle resetting group summary
variables to missing.
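A side-by-side sketch of that second characteristic, using a made-up SCORES data set (ID, SCORE, TOTALS_RETAIN and TOTALS_DOW are hypothetical names). The conventional step has to retain TOTAL and reset it at FIRST.ID. In the DoW version TOTAL is an ordinary non-retained variable, so the data step sets it to missing at the top of each implicit iteration, which is exactly once per ID group.

```sas
/* Hypothetical input, sorted by ID */
data scores;
  input id $ score;
  datalines;
p1 3
p1 4
p2 8
;
run;

/* Conventional approach: TOTAL must be retained and explicitly reset */
data totals_retain (keep = id total);
  set scores;
  by id;
  retain total;                 /* redundant with the sum statement, shown for clarity */
  if first.id then total = 0;
  total + score;
  if last.id then output;
run;

/* DoW approach: no RETAIN and no reset; one implicit iteration per ID group */
data totals_dow (keep = id total);
  do until (last.id);
    set scores;
    by id;
    total = sum(total, score);
  end;
run;
```

Both steps should produce P1 = 7 and P2 = 8.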
The sample below doesn't have either of those characteristics. It's
just a loop.
Toby's code is likewise just three loops. It uses another technique
championed by Paul Dorfman, which is explicit iteration through a data
set, but in my opinion that doesn't mean it has DoW loops. It doesn't
take advantage of
I'm not sure which of Paul's papers Toby is referring to.
--
JackHamilton@FirstHealth.com
Manager, Technical Development
Metrics Department, First Health
West Sacramento, California USA
>>> "Michael Murff" <mjm33@MSM1.BYU.EDU> 11/03/2004 4:05 PM >>>
Just curious, while this thread is still alive. I am (re)reading Rick
Aster's "Professional SAS Programming Secrets" (1e, pub. 1991). In the
section on linear search (p. 416), it reads:
<quoted from the book>
LOOKUP=.;
DO POINT = 1 TO NOBS UNTIL(LOOKUP NE .);
   SET TABLE (KEEP = KEY LOOKUP RENAME=(KEY=LKEY LOOKUP=LLOOKUP))
       POINT=POINT NOBS=NOBS;
   IF KEY = LKEY THEN LOOKUP = LLOOKUP;
END;
<end of quote>
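A self-contained sketch of that linear search, assuming a small made-up TABLE and a hypothetical driver data set MASTER (the quoted fragment does not show where KEY comes from). The outer SET reads one MASTER record per data step iteration and the explicit loop only scans TABLE by record number, so each iteration still handles a single observation rather than a group. No STOP is needed because the step ends when SET MASTER runs out of records.

```sas
/* Hypothetical lookup table */
data table;
  input key $ lookup;
  datalines;
a 1
b 2
c 3
;
run;

/* Hypothetical driver data set: keys to look up */
data master;
  input key $;
  datalines;
b
z
a
;
run;

data matched (drop = lkey llookup);
  set master;                      /* one MASTER record per implicit iteration */
  lookup = .;
  /* scan TABLE by record number until a match is found or TABLE is exhausted */
  do point = 1 to nobs until (lookup ne .);
    set table (keep = key lookup rename = (key = lkey lookup = llookup))
        point = point nobs = nobs;
    if key = lkey then lookup = llookup;
  end;
run;
```

With this data, B gets LOOKUP = 2, A gets LOOKUP = 1, and Z, which has no match, stays missing after the loop has read all of TABLE.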
If the only criterion for a DoW is to put the SET statement within an
explicit DO loop, wouldn't this be considered an (early) use of that
programming technique? And wouldn't this imply an appellation such as,
ahem, DoA, with all due respect to Master Ian? Admittedly, this acronym
is already in use (think film noir).
M.M.
Provo, UT
>>> "Dunn, Toby" <Toby.Dunn@TEA.STATE.TX.US> 11/3/2004 2:31:33 PM >>>
Jack,
I thought, and I hope I read Paul Dorfman's paper correctly, that a DoW
(Do-loop of Whitlock, for those who do not know) has less to do with how
the do-loop is constructed and everything to do with the fact that the
SET statement is inside a do-loop. This takes us from an internal
looping process that blurs what happens before, during, and after the
internal loop to a distinct before, during, and after within the data
step's internal looping process.
The by part of the loop is usually unnecessary (I believe even Howard
stated that it could be taken off in the example he gave), and the until
part of the loop is not always necessary.
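As a sketch of a DoW with no BY statement at all: the break event can be end of file, so the whole data set is one group and the code after the loop runs exactly once. NUMS, X and the output name WHOLE are made up for illustration.

```sas
/* Hypothetical input */
data nums;
  input x;
  datalines;
4
6
11
;
run;

data whole (keep = n total mean);
  /* break event: end of file, so there is no BY statement */
  do n = 1 by 1 until (eof);
    set nums end = eof;
    total = sum(total, x);
  end;
  /* after the loop: runs once; the next implicit iteration hits EOF and stops */
  mean = total / n;
run;
```

The step should write a single record with N = 3, TOTAL = 21 and MEAN = 7.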
But if I interpreted Paul's paper wrong, then I stand corrected.
Paul, did I get it right?
Toby Dunn
-----Original Message-----
From: Jack Hamilton [mailto:JackHamilton@firsthealth.com]
Sent: Wednesday, November 03, 2004 3:21 PM
To: SAS-L@LISTSERV.UGA.EDU; Dunn, Toby
Subject: Re: [SAS-L] Output first record of consecutive record streak(s) with count of each streak
That looks like three loops, but not three DoW loops, which typically
contain UNTIL last. in the DO statement and a BY statement for each SET.
See Howard's example (which uses a counter instead of UNTIL in the
second DoW).
>>> "Dunn, Toby" <Toby.Dunn@TEA.STATE.TX.US> 11/03/2004 11:29 AM >>>
Michael,
Here is a post I sent a while back that uses a triple DoW,
posted again just because I am really bored at work today:
A triple DoW Solution:
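Data one (drop = i j k);
  /* loop 1: read all of OLD; I ends up as the number of records */
  do i = 1 by 1 until (eof);
    set old end = eof;
  end;
  /* loop 2: re-read OLD; whenever record K's GROUP is not K,
     write an extra record with GROUP = K and default values */
  do k = 1 to i by 1;
    set old;
    if group ne k then do;
      group = k;
      name = "new";
      pd = 0;
      util = 0;
      output;
    end;
  end;
  /* loop 3: re-read OLD once more and write every original record */
  do j = 1 to i;
    set old;
    output;
  end;
Run;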
Toby Dunn
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
Michael Murff
Sent: Wednesday, November 03, 2004 1:04 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Output first record of consecutive record streak(s) with count of each streak
Hi SAS-L,
I would like to warmly thank (in no particular order) Kevin, Howard,
Puddin, Nat, and Richard for their expert assistance on this problem.
For summary purposes I include three working approaches below. The goal
was to count up the number of streaks or runs within a by group. I have
tested these programs for their CPU efficiency on a PC, but did not
bother to do so on our Linux server.
I ran two tests, one with i=100 and one with i=100K. Note that N, the
number of records, is at most about i*50: each company gets
int(50*ranuni(10)) records, between 0 and 49. The performance was as
follows:
(rd, Richard DeVenezia) - (small n: ~0.18 sec; large n: ~20 sec)
(hs, Howard Schreier) - (small n: ~0.07 sec; large n: ~5 sec)
(kv, Kevin Viel) - (small n: ~0.18 sec; large n: ~21 sec)
Where a solution uses more than one step, these numbers are the total
over all of its steps. I have to say that the Double DoW is pretty darn
impressive both in speed and syntactical brevity. Can anyone think of a
case where a triple DoW or some higher-order DoW might be useful?
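One hedged example of where a third pass can help: standardizing a variable within groups needs the group mean before the sum of squared deviations can be accumulated, and both before any record can be written. MEAS, GRP, Y and ZSCORES are made-up names. The same statistics could also be computed in fewer passes from running sums of Y and Y squared, so this only illustrates the pattern, not the fastest method.

```sas
/* Hypothetical input: GRP is the group key, Y the value to standardize */
data meas;
  input grp $ y;
  datalines;
g1 1
g1 3
g2 2
g2 4
g2 6
;
run;

data zscores (drop = ss);
  /* pass 1: count the records in the group and total Y */
  do n = 1 by 1 until (last.grp);
    set meas;
    by grp;
    total = sum(total, y);
  end;
  mean = total / n;

  /* pass 2: re-read the group; needs MEAN from pass 1 */
  do until (last.grp);
    set meas;
    by grp;
    ss = sum(ss, (y - mean)**2);
  end;
  sd = sqrt(ss / (n - 1));   /* a one-record group gives a missing SD and a log note */

  /* pass 3: re-read the group again and write every record with its z-score */
  do until (last.grp);
    set meas;
    by grp;
    z = (y - mean) / sd;
    output;
  end;
run;
```

For G2 (mean 4, SD 2) the z-scores come out as -1, 0 and 1; for G1 they are about -0.707 and 0.707.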
Best,
Michael Murff
**********************************************;
%let fromdate=01jan1980;
%let todate=01nov2004;
/* data simulation */
data test;
  do i=1 to 100;
    do j = 1 to int(50*ranuni(10));
      coname=compress("coname"||put(i,z3.));
      date1=ranuni(1)*("&todate"d-"&fromdate"d)+"&fromdate"d;
      date2=ranuni(2)*("&todate"d-"&fromdate"d)+"&fromdate"d;
      a = ranuni(3);
      b = ranuni(4);
      if a > b then streak = 1;
      else streak = 0;
      obs +1;
      output;
      format date1 date2 date9.;
    end;
  end;
  drop i j;
run;
/************/
/* Richard DeVenezia's Solution */
/* two-step approach */
/************/
data rdtemp;
  length streakid markcount 8;
  set test;
  by coname;
  if first.coname then markcount = 0;
  if a>b
    then markcount + 1;
    else markcount = 0;
  if markcount = 1 then
    streakid + 1;
run;
proc sql;
  create table rdevenzia as
    select coname, date1,date2, a,b, obs, max(markcount) as streaklength
      from rdtemp
      group by coname, streakid
      having markcount=1 and max(markcount) > 3
  ;
quit;
/************/
/* Howard Schreier's Solution */
/* Double DoW */
/************/
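data hschreier(drop=n);
  /* pass 1: count the records in one run of equal STREAK values within CONAME */
  do cnt = 1 by 1 until(last.streak);
    set test;   /* TEST is already grouped by CONAME and carries STREAK */
    by coname streak notsorted;
  end;
  /* pass 2: re-read the same CNT records; keep the first record of a run of 1s longer than 3 */
  do n = 1 to cnt;
    set test;
    by coname streak notsorted;
    if n=1 and streak and cnt>3 then output;
  end;
run;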
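To see which records the double DoW above keeps, here is the same step run against a tiny hand-made data set. TINY and TINY_OUT are hypothetical names; only the CONAME/STREAK layout is taken from TEST. A run of STREAK = 1 ends at a change of STREAK or of CONAME (a change in CONAME also sets LAST.STREAK), and only the first record of a run longer than three is output, with the run length in CNT.

```sas
/* Hypothetical data: c1 has a run of four 1s, a 0, then a run of two 1s;
   c2 has a single run of five 1s */
data tiny;
  input coname $ streak;
  datalines;
c1 1
c1 1
c1 1
c1 1
c1 0
c1 1
c1 1
c2 1
c2 1
c2 1
c2 1
c2 1
;
run;

data tiny_out (drop = n);
  do cnt = 1 by 1 until (last.streak);
    set tiny;
    by coname streak notsorted;
  end;
  do n = 1 to cnt;
    set tiny;
    by coname streak notsorted;
    if n = 1 and streak and cnt > 3 then output;
  end;
run;
```

With this data the step writes two records: C1 with CNT = 4 and C2 with CNT = 5. The two-record run at the end of C1 is too short to qualify.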
/************/
/* Kevin Viel's Solution */
/* SQL / DoW with Array processing */
/************/
proc sql noprint ;
  /* largest number of records for any single CONAME */
  select put( max( count ) , 8. ) into : nobs
    from ( select count( * ) as count
             from test
             group by coname
         )
  ;
  /* number of distinct CONAME values */
  select put( count( distinct coname ) , 8. ) into : n_coname
    from test
  ;
quit;
%let streak = 4 ;
data kviel ( drop = _m_ ) ;
  /* O_ flags the record that starts a qualifying streak; C_ holds its length */
  array O_ ( &n_coname. , &nobs. ) _temporary_ ;
  array C_ ( &n_coname. , &nobs. ) _temporary_ ;
  /* pass 1: _M_ indexes the CONAME group, _N_ the record within the group */
  do _m_ = 1 by 1 until ( end ) ;
    count = 0 ;
    do _n_ = 1 by 1 until ( last.coname ) ;
      set test end = end ;
      by coname ;
      if last.coname = 0 then
        do ;
          if a <= b and count >= &streak. then
            do ;
              O_( _m_ , _n_ - count ) = 1 ;
              C_( _m_ , _n_ - count ) = count ;
              count = 0 ;
            end ;
          else if a > b then count + 1 ;
          else if a <= b then count = 0 ;
        end ;
      else
        do ;
          if a > b and count >= %eval( &streak. - 1 ) then
            do ;
              O_( _m_ , _n_ - count ) = 1 ;
              /* the run includes the current (last) record, hence COUNT + 1 */
              C_( _m_ , _n_ - count ) = count + 1 ;
              count = 0 ;
            end ;
          else if a <= b and count >= &streak. then
            do ;
              O_( _m_ , _n_ - count ) = 1 ;
              C_( _m_ , _n_ - count ) = count ;
            end ;
        end ;
    end ;
  end ;
  /* pass 2: re-read TEST and output each flagged record with its streak length */
  do _m_ = 1 by 1 until ( end1 ) ;
    do _n_ = 1 by 1 until ( last.coname ) ;
      set test end = end1 ;
      by coname ;
      *obs + 1 ;
      if O_( _m_ , _n_ ) = 1 then
        do ;
          count = C_( _m_ , _n_ ) ;
          output ;
        end ;
    end ;
  end ;
run ;
proc compare base=hschreier compare=kviel;
var obs date1 date2;
run;
proc compare base=hschreier compare=rdevenzia;
var obs date1 date2;
run;