LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (December 2009, week 2)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:   Wed, 9 Dec 2009 10:56:14 -0500
Reply-To:   "Keintz, H. Mark" <mkeintz@WHARTON.UPENN.EDU>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   "Keintz, H. Mark" <mkeintz@WHARTON.UPENN.EDU>
Subject:   Do we need a NORETAIN statement? was: about full join problem
Comments:   To: Craig Johnson <cjohns38@GMAIL.COM>
In-Reply-To:   <>
Content-Type:   text/plain; charset="us-ascii"


The behavior, in a many-to-many merge, in which SAS propogates the last record in a BY-group from one dataset to all the extra records of the same BY-group in the other dataset, has been documented by SAS since day one.

I agree that this is not necessarily intuitive, but it is a sort of extension of what SAS does with a many-to-one merge. And that, in turn, is a manifestation of the fundamental SAS principle of automatically RETAINing all variables read via a MERGE (or SET or UPDATE) statement.

So I don't agree that the behavior is "wrong". And it does provide the user a great deal of flexibility in looking at the preceding record anytime before a MERGE or SET statement appears in a DATA step.

To learn whether that was an intentional design feature, or an outcome of other design goals, you'd have to wait for a little birdie to peep.

But this does make me think that what we really want is a NORETAIN statement. Then you could do something like this:

data c; merge a b; by code; noretain manager assistant ; ** Manager is from dataset A, assistant from B **; run;

And more generally, it would let you tell SAS to apply the NORETAIN behavior to just a subset of the variables from A and/or B. You could for instance NORETAIN the manager but retain the DEPARTMENT, is such a variable appeared in A or B.

Regards, Mark

> -----Original Message----- > From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of > Craig Johnson > Sent: Tuesday, December 08, 2009 10:33 PM > To: SAS-L@LISTSERV.UGA.EDU > Subject: Re: about full join problem > > I'll heap some praise on someone if they can figure this out because > I'm > convinced it's impossible. Frankly, I'm confused on how one knows if > the > original data step merge is correct. If you look at dataset A you have > code > 145 for max and xam. In dataset B you have 145 for jerry, tracy, and > wade. > After the merge they end up with max with jerry and xam with tracy and > wade. How is max only being grouped with jerry while xam is grouped > with > two? IE how does SAS know that xam has two and max only has one? IMO > there > is no way for SAS how to know what to merge what manager together with > what > assistant. Here is an example, if you add another case in for code 145 > in > dataset B (145 bob in line one of the cards) it is grouped with Max and > Jerry is now grouped with xam. To me that indicates that SAS is > merging the > first 145 in A with the first 145 in B and then merging the rest of the > 145s > in B with the second 145 in A. In other words, the example was a fluke > and > the merge that is wanted can't be done because there is no PK/FK > relationship between manager and assistant. Instead a secondary FK > (code) > is being improperly used to try and do a full join on a many-to-many > relationship. In which case, the Cartesian product is technically > correct.

Back to: Top of message | Previous page | Main SAS-L page