Date: Sun, 26 Dec 2004 23:41:39 -0500
Reply-To: Don Henderson <donaldjhenderson@HOTMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Don Henderson <donaldjhenderson@HOTMAIL.COM>
Subject: Re: UPDATE/OUTPUT Interaction
In-Reply-To: <200412261620.iBQGKcw5008962@listserv.cc.uga.edu>
Content-Type: text/plain; charset="us-ascii"
Howard,
Replying again as the first time I mistakenly sent to just Howard instead of
the list.
The default output statement processing for an UPDATE step is that the
default output only occurs when the "last." condition is true for the last
variable listed in the BY statement. So in your specific example, change:
if first.id or intrans then output up;
to
if last.id then output up;
However I must admit that I am not sure why you used a condition of
"first.id or intrans" in your logic to begin with. Regardless, you want to
use "last." because it will
- only output one obs per unique set of keys (as listed in the by statement)
- show the observations after all the updates have been applied
- handle the case where there are multiple trans observations, regardless of
whether there was an obs in the master data set for that set of keys.
HTH,
-don h
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Howard
Schreier <hs AT dc-sug DOT org>
Sent: Sunday, December 26, 2004 11:21 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: UPDATE/OUTPUT Interaction
The issue involves the result of explicit OUTPUT statements in a DATA step
driven by an UPDATE statement. Here is an example. Start with test data:
data master;
input id $;
cards;
a
a
b
c
c
;
data trans;
input id $ value;
cards;
a 11
a 12
a 13
b 20
;
Now perform UPDATE, with implicit OUTPUT:
data up;
update master trans;
by id;
n = _n_;
run;
Result:
Obs id value n
1 a 13 3
2 a . 4
3 b 20 5
4 c . 6
5 c . 7
Notice that there were 7 iterations before the DATA step terminated, but SAS
knew not to output observations during the first 2 iterations, because the
TRANS observations had not been exhausted for the first BY group. I'm
wondering where that bit of information is kept, and how it might be
inspected.
I constructed the example with multiple observations in MASTER for some BY
groups. Of course that generated warnings in the log. So I want to segregate
those observations. I change my DATA step to:
data up extra;
update master trans(in=intrans);
by id;
if first.id or intrans then output up;
else output extra;
run;
EXTRA materializes as expected:
Obs id value
1 a .
2 c .
But UP now contains observations for the intermediate iterations in the ID=a
group:
Obs id value
1 a 11
2 a 12
3 a 13
4 b 20
5 c .
Apparently the suppression of the first two observations no longer takes
place now that explicit OUTPUT statements are in use. How can the code be
modified to keep those observations from being output? I know it can be done
with pre-processing of either input dataset; I'm looking for a solution
within the single DATA step.