Date: Tue, 3 Mar 2009 21:03:15 -0500
Reply-To: Ian Whitlock <iw1sas@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Ian Whitlock <iw1sas@GMAIL.COM>
Subject: Re: Many-to-Many Merge in a data step?
Content-Type: text/plain; charset=ISO-8859-1
Summary: Left "Cartesian" join in a DATA step
#iw-value=1
Bruce,
You cannot use MERGE because it is 1 to 1 within by groups.
Consequently the code gets a bit harder. I added more test cases to
check the code. Here it is.
data reference;
fname="Bruce";lname="Johnson"; zip='75235';newid=134;
output;
fname="Bruce";lname="Johnson"; zip='60056';newid=522;
output;
fname="John";lname="Johnson"; zip='60056';newid=522;
output;
run;
data patient;
fname="Allen";lname="Johnson"; zip='75390';recordid=100;
output;
fname="Bruce";lname="Johnson"; zip='75390';recordid=144;
output;
fname="Bruce";lname="Johnson"; zip='75390';recordid=145;
output;
fname="Jack";lname="Johnson"; zip='75390';recordid=200;
output;
run;
proc sql;
create table test as
select a.*, b.zip as reference_zip, b.newid
from patient a left join reference b
on a.fname=b.fname and a.lname=b.lname
;
select * from test ;
quit;
data test2 ( drop = begpt cnt _:) ;
retain begpt 1 ;
set patient ( in = left ) ;
by fname lname ;
if first.lname then cnt = 0 ;
if left ;
do ptr = begpt to nobs ;
set reference
(rename = (fname=_fname lname=_lname zip=
reference_zip))
point = ptr nobs = nobs ;
if fname=_fname and lname=_lname then
do ;
if first.lname then cnt + 1 ;
output ;
end ;
else
if cnt = 0 and (_fname > fname or (_fname = fname and
_lname > lname))then
do ;
call missing (reference_zip, newid) ;
output ;
leave ;
end ;
end ;
if last.lname then begpt + cnt ;
run ;
proc print data=test2;
run;
As you can see the two data sets have the same observations, but not
have the same order. If you want to guarantee a particular order
then you have to use ORDER BY in SQL and PROC SORT for the DATA
step. I leave it to you to add and specify enough variables to be
able to determine the order.
Ian Whitlock
================
Date: Tue, 3 Mar 2009 09:51:37 -0600
From: bruce johnson <chimanbj@GMAIL.COM>
Subject: Many-to-Many Merge in a data step?
I can do the many-to-many merge in Proc SQL, but I can't understand
why the commented code below won't produce 4 records. It, instead,
produces only 2.
data reference;
fname="Bruce";lname="Johnson"; zip='75235';newid=134;
output;
fname="Bruce";lname="Johnson"; zip='60056';newid=522;
output;
run;
data patient;
fname="Bruce";lname="Johnson"; zip='75390';recordid=144;
output;
fname="Bruce";lname="Johnson"; zip='75390';recordid=145;
output;
run;
proc sql;
create table test as
select a.*, b.zip as reference_zip, b.newid
from patient a left join reference b
on a.fname=b.fname and a.lname=b.lname
;
quit;
/*data test;
merge patient(in=a)
reference(in=b rename=(zip=reference_zip))
;
by fname lname;
if a;
run;*/
proc print data=test;
run;
The SQL output looks like this:
REFERENCE
Obs FNAME LNAME ZIP RECORDID
ZIP
NEWID
1 Bruce Johnson 75390 144
75235 134
2 Bruce Johnson 75390 145
75235 134
3 Bruce Johnson 75390 144
60056 522
4 Bruce Johnson 75390 145
60056 522
The Merge output looks like this:
REFERENCE_
Obs FNAME LNAME ZIP RECORDID
ZIP
NEWID
1 Bruce Johnson 75390 144
75235 134
2 Bruce Johnson 75390 145
60056 522
Any thoughts?