Date: Tue, 8 Nov 2005 10:14:26 -0500
Reply-To: "Michael S. Zdeb" <msz03@HEALTH.STATE.NY.US>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Michael S. Zdeb" <msz03@HEALTH.STATE.NY.US>
Subject: Re: How to make repeats in merge-by-group to work?
In-Reply-To: <1131450900.655635.242990@g43g2000cwa.googlegroups.com>
Content-type: text/plain; charset=US-ASCII
Hi...I agree with Toby Dunn...use PROC SQL. But, for those of us who used
SAS prior to V6, there was no PROC SQL and many-to-many merges still had to
be done with a data step. For those of us who remember NESUG '88 (or for
that matter, 1988), the proceedings contain a paper by Robina Thornton and
Merry Rabb on "Advanced SET and MERGE Processing" that shows how to do
many-to-many merging in a data step merge. The method does what PROC SQL
does in the background, i.e. it ensures that all observations in data sets
one and two with the same ID "see" each other at some point in the data
step processing. So, here it is (from NESUG '88) with the posted data and
it does give the desired result...
data one;
input id $ fruit $;
datalines;
a apple
a apple
b banana
c coconut
c coconut
c coconut
;
proc sort data=one;
by id;
run;
data two;
input id $ color $;
datalines;
a amber
b brown
b black
c cocoa
c cream
;
proc sort data=two;
by id;
run;
* create POINTERS for both data sets - reduces the data step merge to
one-to-one merging;
data one_point (keep=id one_start one_end);
retain one_start;
set one;
by id;
if first.id then one_start = _n_;
if last.id then do;
one_end = _n_;
output;
end;
run;
data two_point (keep=id two_start two_end);
retain two_start;
set two;
by id;
if first.id then two_start = _n_;
if last.id then do;
two_end = _n_;
output;
end;
run;
* merge the POINTER data sets;
* use the point information to cycle through observations with matching
IDs;
data one_two (keep=id fruit color);
merge one_point two_point;
by id;
do i = one_start to one_end;
set one point=i;
do j = two_start to two_end;
set two point=j;
output;
end;
end;
run;
proc print data=one_two;
run;
Gives...
Obs id fruit color
1 a apple amber
2 a apple amber
3 b banana brown
4 b banana black
5 c coconut cocoa
6 c coconut cream
7 c coconut cocoa
8 c coconut cream
9 c coconut cocoa
10 c coconut cream
PS Another posted solution...
DATA test(drop=id2 );
set one;
do i=1 to nn;
set two(rename=(id=id2)) nobs=nn point=i;
if id=id2 then output;
end;
RUN;
also works but requires a complete pass through data set two for each
observation in data set one rather than only looking within observations
with the same ID.
Mike Zdeb
U@Albany School of Public Health
1 University Drive
Rensselaer, NY 12144-3456
(P)518-402-6479
(F)630-604-1475
Anonymous user
<Number_42@CARAMA
IL.COM> To
Sent by: "SAS(r) SAS-L@LISTSERV.UGA.EDU
Discussion" cc
<SAS-L@LISTSERV.U
GA.EDU> Subject
How to make repeats in
merge-by-group to work?
11/08/2005 06:55
AM
Please respond to
Anonymous user
<Number_42@CARAMA
IL.COM>
Hello,
i have some datasets with repeats in the by group and it's not a
problem. I would like SAS to handle it properly. Let's start from what
SAS do:
DATA one;
INPUT id $ fruit $;
DATALINES;
a apple
a apple
b banana
c coconut
c coconut
c coconut
;
PROC SORT data=one;
BY id;
RUN;
DATA two;
INPUT id $ color $;
DATALINES;
a amber
b brown
b black
c cocoa
c cream
;
PROC SORT data=two;
BY id;
RUN;
DATA test;
MERGE one two;
BY id;
RUN;
PROC PRINT;
RUN;
/*
Obs id fruit color
1 a apple amber
2 a apple amber
3 b banana brown
4 b banana black
5 c coconut cocoa
6 c coconut cream
7 c coconut cream
*/
Now what i need is that:
1 a apple amber
2 a apple amber
3 b banana brown
4 b banana black
5 c coconut cocoa
6 c coconut cocoa
7 c coconut cocoa
8 c coconut cream
9 c coconut cream
10 c coconut cream
It migth sound useless here but in my real data, i have other columns
which have different values but are not keys.
I know how to do this with a proc sql, but is it possible to do it in
"pure SAS"?
TIA