|
Craig:
The behavior, in a many-to-many merge, in which SAS propogates the
last record in a BY-group from one dataset to all the extra records
of the same BY-group in the other dataset, has been documented by
SAS since day one.
I agree that this is not necessarily intuitive, but it is a sort of
extension of what SAS does with a many-to-one merge. And that, in
turn, is a manifestation of the fundamental SAS principle of
automatically RETAINing all variables read via a MERGE (or SET or
UPDATE) statement.
So I don't agree that the behavior is "wrong". And it does provide
the user a great deal of flexibility in looking at the preceding
record anytime before a MERGE or SET statement appears in a DATA
step.
To learn whether that was an intentional design feature, or an outcome
of other design goals, you'd have to wait for a little birdie to peep.
But this does make me think that what we really want is a NORETAIN
statement. Then you could do something like this:
data c;
merge a b;
by code;
noretain manager assistant ;
** Manager is from dataset A, assistant from B **;
run;
And more generally, it would let you tell SAS to apply the NORETAIN
behavior to just a subset of the variables from A and/or B. You
could for instance NORETAIN the manager but retain the DEPARTMENT,
is such a variable appeared in A or B.
Regards,
Mark
> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
> Craig Johnson
> Sent: Tuesday, December 08, 2009 10:33 PM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: Re: about full join problem
>
> I'll heap some praise on someone if they can figure this out because
> I'm
> convinced it's impossible. Frankly, I'm confused on how one knows if
> the
> original data step merge is correct. If you look at dataset A you have
> code
> 145 for max and xam. In dataset B you have 145 for jerry, tracy, and
> wade.
> After the merge they end up with max with jerry and xam with tracy and
> wade. How is max only being grouped with jerry while xam is grouped
> with
> two? IE how does SAS know that xam has two and max only has one? IMO
> there
> is no way for SAS how to know what to merge what manager together with
> what
> assistant. Here is an example, if you add another case in for code 145
> in
> dataset B (145 bob in line one of the cards) it is grouped with Max and
> Jerry is now grouped with xam. To me that indicates that SAS is
> merging the
> first 145 in A with the first 145 in B and then merging the rest of the
> 145s
> in B with the second 145 in A. In other words, the example was a fluke
> and
> the merge that is wanted can't be done because there is no PK/FK
> relationship between manager and assistant. Instead a secondary FK
> (code)
> is being improperly used to try and do a full join on a many-to-many
> relationship. In which case, the Cartesian product is technically
> correct.
|