|
CALL MISSING can be made to work, but not as simply as I would like.
Because CALL MISSING is an executable statement, it would have to
be invoked before the MERGE statement. But at that point the SAS
compiler doesn't yet know the variables, or their attributes, that
the CALL MISSING statement refers to.
As a result, the following DATA step fails:
data c;
call missing (of _all_);
merge a b;
by code;
run;
and generates the error message
"ERROR 252-185: The MISSING subroutine call does not have enough arguments."
Yes, you could work around this by using:
data c;
if 0 then set a b;
call missing (of _all_);
merge a b;
by code;
run;
It works, but it lacks the simple directness of the RETAIN
statement. RETAIN (and my proposed NORETAIN) can be placed before
the variable's attributes are declared and still work. For instance,
the following works:
data c;
retain z; ** Z could be numeric or character **;
merge a b;
by code;
z=put(_n_,z6.); ** Now we tell SAS Z is character **;
run;
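For anyone who wants to try the workaround, here is a minimal sketch.
The sample rows are hypothetical: I have only reconstructed them from
the names mentioned in the quoted thread below (code 145, managers
max and xam, assistants jerry, tracy and wade), so treat them as an
assumption rather than the original data.

```sas
/* Hypothetical sample data, reconstructed from the quoted thread */
data a;
  input code manager $;
  datalines;
145 max
145 xam
;
run;

data b;
  input code assistant $;
  datalines;
145 jerry
145 tracy
145 wade
;
run;

data c;
  if 0 then set a b;       /* never executes; only defines the variables at compile time */
  call missing(of _all_);  /* clear every variable before each read */
  merge a b;
  by code;
run;

proc print data=c;
run;
```

If the workaround behaves as described above, the third observation
of C should show assistant wade with a blank MANAGER, instead of xam
carried forward from the previous record. Replacing OF _ALL_ with an
explicit list, e.g. call missing(manager), would presumably
approximate NORETAIN for just that variable.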
> -----Original Message-----
> From: Data _null_; [mailto:iebupdte@gmail.com]
> Sent: Wednesday, December 09, 2009 11:00 AM
> To: Keintz, H. Mark
> Cc: SAS-L@listserv.uga.edu
> Subject: Re: Do we need a NORETAIN statement? was: about full join
> problem
>
> Does CALL MISSING fill the need?
>
> On 12/9/09, Keintz, H. Mark <mkeintz@wharton.upenn.edu> wrote:
> > Craig:
> >
> > The behavior, in a many-to-many merge, in which SAS propagates the
> > last record in a BY-group from one dataset to all the extra records
> > of the same BY-group in the other dataset, has been documented by
> > SAS since day one.
> >
> > I agree that this is not necessarily intuitive, but it is a sort of
> > extension of what SAS does with a many-to-one merge. And that, in
> > turn, is a manifestation of the fundamental SAS principle of
> > automatically RETAINing all variables read via a MERGE (or SET or
> > UPDATE) statement.
> >
> > So I don't agree that the behavior is "wrong". And it does provide
> > the user a great deal of flexibility in looking at the preceding
> > record anytime before a MERGE or SET statement appears in a DATA
> > step.
> >
> > To learn whether that was an intentional design feature, or an
> > outcome of other design goals, you'd have to wait for a little
> > birdie to peep.
> >
> >
> > But this does make me think that what we really want is a NORETAIN
> > statement. Then you could do something like this:
> >
> > data c;
> > merge a b;
> > by code;
> > noretain manager assistant ;
> > ** Manager is from dataset A, assistant from B **;
> > run;
> >
> > And more generally, it would let you tell SAS to apply the NORETAIN
> > behavior to just a subset of the variables from A and/or B. You
> > could for instance NORETAIN the manager but retain the DEPARTMENT,
> > if such a variable appeared in A or B.
> >
> >
> > Regards,
> > Mark
> >
> >
> >
> > > -----Original Message-----
> > > From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
> > > Craig Johnson
> > > Sent: Tuesday, December 08, 2009 10:33 PM
> > > To: SAS-L@LISTSERV.UGA.EDU
> > > Subject: Re: about full join problem
> > >
> > > I'll heap some praise on someone if they can figure this out,
> > > because I'm convinced it's impossible. Frankly, I'm confused about
> > > how one knows if the original data step merge is correct. If you
> > > look at dataset A you have code 145 for max and xam. In dataset B
> > > you have 145 for jerry, tracy, and wade. After the merge they end
> > > up with max with jerry and xam with tracy and wade. How is max
> > > only being grouped with jerry while xam is grouped with two? I.e.,
> > > how does SAS know that xam has two and max only has one? IMO there
> > > is no way for SAS to know which manager to merge with which
> > > assistant. Here is an example: if you add another case for code
> > > 145 in dataset B (145 bob in line one of the cards), it is grouped
> > > with Max, and Jerry is now grouped with xam. To me that indicates
> > > that SAS is merging the first 145 in A with the first 145 in B and
> > > then merging the rest of the 145s in B with the second 145 in A.
> > > In other words, the example was a fluke and the merge that is
> > > wanted can't be done because there is no PK/FK relationship
> > > between manager and assistant. Instead a secondary FK (code) is
> > > being improperly used to try and do a full join on a many-to-many
> > > relationship. In which case, the Cartesian product is technically
> > > correct.
> >
|