Date: Thu, 13 Jun 2002 13:44:58 -0600
Reply-To: Jack Hamilton <JackHamilton@FIRSTHEALTH.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Jack Hamilton <JackHamilton@FIRSTHEALTH.COM>
Subject: Re: dow loop?
I think "Whitlock loop" would be a better term. "DOW loop" sounds like
it has something to do with industrial averages. And why is the "O"
capitalized?
--
JackHamilton@FirstHealth.com
Manager, Technical Development
METRICS Department, First Health
West Sacramento, California USA
>>> "David L. Cassell" <Cassell.David@EPAMAIL.EPA.GOV> 06/13/2002 11:07 AM >>>
Jay Weedon <jweedon@EARTHLINK.NET> wrote:
> Sorry to display such ignorance. Can someone point me to an
> explanation of what this term means & what it's used for?
Jay,
It's not you. It's us. The term has become nearly standard on
SAS-L for the Whitlock do-loop. [I believe Paul coined the term, but if
I'm wrong I will be promptly corrected - and if I'm right, I may still
be promptly corrected! :-] That's the situation where one puts the SET
statement inside the do-loop and uses some data set information to end
the loop (typically, last.whatever or end-of-file). Ian or Paul [or
any other list guru] may wish to refine my loose description.
I was recently confronted with the issue of documenting my use of the
DOW-loop in some production code, and I asked Paul off-list what he did
in the same situation. Here is the disclaimer he suggested:
------------------------------------------------------------------------
This program may contain one or more constructs similar to the
following:
   Data <...Data Set Names...> ;
      <...Stuff Executed Before Break-Event... > ;
      Do <...Cnt-Var = From-Var By Step-Var...> Until ( Break-Event ) ;
         Set A ;
         <...Stuff Executed For Each In-Record...> ;
      End ;
      <...Stuff Executed After Break-Event... > ;
   Run ;
<The code between angle brackets is, generally speaking, optional.> We
call the structure the DOW-loop, where W stands for Ian Whitlock.
The intent of organizing such a structure is to achieve a logical
isolation of instructions executed between two successive break-events
from actions performed before and after a break-event, in the most
programmatically natural manner. In most (but not all) situations, the
input data set is grouped and/or sorted, and the break-event occurs
when the last record in a by-group has been processed. In such a case,
the DOW-loop logically separates actions performed (1) before the first
record in a by-group is read, (2) for each record in the group, and (3)
after the last record in the group is read.
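The "but not all" case can be sketched with end-of-file as the
break-event instead of the end of a by-group. This is only an
illustration: the output data set name TOTAL and the flag variable EOF
are made up here, and A is assumed to contain a numeric variable VAR.

```sas
/* Hypothetical sketch: the break-event is end-of-file on A */
Data Total ( Keep = Count Sum ) ;
   Do Count = 1 By 1 Until ( Eof ) ;
      Set A End = Eof ;         /* EOF becomes 1 on the last record of A  */
      Sum = Sum (Sum, Var) ;    /* SUM function ignores missing VAR       */
   End ;
   /* implicit OUTPUT here writes the single summary record */
Run ;
```

After the implicit OUTPUT, control returns to the top of the step, the
next SET finds no more records, and the step ends, so TOTAL should
receive exactly one record when A is not empty.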
Example: Input file A is sorted by ID. This step multiplies and sums
all non-missing VAR values within each ID-group, counts both all
records and the non-missing records in each group, finds the group
average, and writes one record with COUNT, SUM, MEAN and PROD to file B
after each by-group:
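   Data B ( Keep = Id Prod Sum Count Mean ) ;
      Prod = 1 ;                              /* reset at the start of each by-group   */
      Do Count = 1 By 1 Until ( Last.Id ) ;   /* COUNT tallies every record read       */
         Set A ;
         By Id ;
         If Var <= .Z Then Continue ;         /* skip missing VAR values               */
         Mcount = Sum (Mcount, 1) ;           /* count of non-missing records          */
         Prod = Prod * Var ;
         Sum = Sum (Sum, Var) ;
      End ;
      Mean = Sum / Mcount ;                   /* group average of non-missing values   */
   Run ;                                      /* implicit OUTPUT: one record per group */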
How it works (1, 2, 3 denote stuff performed before, between, and after
break-event<s>): (1) PROD and COUNT are set to 1, and the non-retained
SUM, MEAN, and MCOUNT are set to missing by default (control is at the
top of the Data step). (2) DOW-loop starts to iterate, reading the next
record from A at the top of every iteration. While it iterates, control
never leaves the Do-End boundaries. If VAR is missing, CONTINUE passes
control straight to the bottom of the loop, otherwise MCOUNT, PROD and
SUM are computed. After the last record in the group is processed, the
loop stops. At this point, PROD, COUNT, SUM, and MCOUNT contain the
group-aggregate values. (3) Control is transferred to the statement
following the loop. MEAN is computed, and control is passed to the
bottom of the step, where the implicit OUTPUT writes the record to B.
Control is passed to the top of the step, the variables are
re-initialized, and the next group is processed.
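To try the example step, a small made-up input will do. The data below
is a hypothetical test case, not part of the original example; the
values noted for B were worked out by hand from the description above.

```sas
/* Hypothetical test data: A is already in ID order */
Data A ;
   Input Id Var ;
   Datalines ;
1 2
1 3
1 .
2 5
;
Run ;

/* After running the example step against this A, B should hold:
   Id=1: Count=3 Sum=5 Mean=2.5 Prod=6  (Mcount=2, missing VAR skipped)
   Id=2: Count=1 Sum=5 Mean=5   Prod=5                                  */
Proc Print Data = B ;
Run ;
```

Note that COUNT for Id=1 is 3 even though only two VAR values enter the
sum: CONTINUE skips the accumulation statements but the loop index
still counts the record.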
Note: Contrary to the common practice, the accumulation variables need
NOT be retained. Because the DOW-loop passes control to the top of the
Data step ONLY before the first record in a by-group is to be read,
this is the only point where non-retained variables are reset to
missing, and it is exactly where this action is required.
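For comparison, here is one way the same summary might be written in
the conventional style, with RETAIN, First./Last. tests, and an
explicit OUTPUT. This is only a sketch for contrast: the output name B2
is made up, and A is assumed to have ID and VAR as in the example.

```sas
/* Conventional sketch: one record read per pass through the Data step */
Data B2 ( Keep = Id Prod Sum Count Mean ) ;
   Set A ;
   By Id ;
   Retain Prod Sum Count Mcount ;    /* must survive across step iterations */
   If First.Id Then Do ;             /* manual reset at the start of group  */
      Prod   = 1 ;
      Count  = 0 ;
      Sum    = . ;
      Mcount = . ;
   End ;
   Count = Count + 1 ;               /* counts every record                 */
   If Var > .Z Then Do ;             /* non-missing VAR only                */
      Mcount = Sum (Mcount, 1) ;
      Prod   = Prod * Var ;
      Sum    = Sum (Sum, Var) ;
   End ;
   If Last.Id Then Do ;
      Mean = Sum / Mcount ;
      Output ;                       /* explicit: one record per by-group   */
   End ;
Run ;
```

The RETAIN statement, the First.Id reset block, and the Last.Id test
are all bookkeeping that the DOW-loop version gets from the ordinary
flow of control through the step.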
-------------------------------------------
-- Paul Dorfman 2001/08/11 --
I think that describes the situation better than I would have.
David
--
David Cassell, CSC
Cassell.David@epa.gov
Senior computing specialist
mathematical statistician