Date: Tue, 19 Jul 2005 09:04:51 -0400
Reply-To: Sigurd Hermansen <HERMANS1@WESTAT.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Sigurd Hermansen <HERMANS1@WESTAT.COM>
Subject: Re: can SQL do many-to-many merge
Content-Type: text/plain; charset="us-ascii"
Kiran:
SQL and SAS Data step merges differ in a critical respect. In concept a
SQL join follows a simple "brute force" procedure to combine data from
two tables. (The SQL optimizer may choose a more efficient procedure
that yields an equivalent dataset.)
A SQL join yields the same result as
1. match 1st row of dataset1 to each row of dataset2 and select columns
into dataset3 if condition holds;
2. repeat for 2nd row of dataset1 through last row of dataset1.
The SAS Data step MERGE without a BY statement and with an IN dataset1
and IN dataset2 condition overlays dataset2 over dataset1. Master
Whitlock has explained the logic it follows, but I find it too
convoluted for practical applications. I demonstrate (below) that a
many-to-many MERGE with a BY statement, followed by a SORT nodups, and
for a specific state of data produces the same results as a SQL equijoin
with a DISTINCT qualifier:
data trades;
input date date9. timestamp price Quantity;
cards;
01JAN2004 600 50 2
01JAN2004 660 53 10
02JAN2004 760 54 5
02JAN2004 760 53 3
;
run;
data test;
input date date9. timestamp PreBestBuy PreBestSell PostBestBuy
PostBestSell;
cards;
01JAN2004 600 48 52 50 53
01JAN2004 660 49 54 53 55
01JAN2004 660 49 54 53 55
02JAN2004 760 54 55 56 57
02JAN2004 760 54 55 56 57
02JAN2004 760 54 55 56 57
02JAN2004 760 54 55 56 57
;
run;
/* Would like to get "consolidate" dataset by merging the datasets on
common variables date and timestamp. Desired output "consolidate"
should have the following records:
*/
data consolidate;
input date date9. timestamp price Quantity PreBestBuy PreBestSell
PostBestBuy PostBestSell;
cards;
01JAN2004 600 50 2 48 52 50 53
01JAN2004 660 53 10 49 54 53 55
02JAN2004 760 54 5 54 55 56 57
02JAN2004 760 53 3 54 55 56 57
;
run;
/* To get the above "consolidate" dataset, I am using the following
datastep command:
(BY group added by SWH and IF condition modified)
*/
data consolidate;
merge trades(in=a) test(in=b);
by date timestamp;
if a then output;
keep date timestamp price Quantity PreBestBuy PreBestSell
PostBestBuy postBestSell;
run;
proc sort data =consolidate nodups;
by date timestamp;
quit;
/* SQL join alternative
*/
proc sql;
create table consolidate2 as
select distinct
t1.date,t1.timestamp,t1.price,t1.quantity,t2.PreBestBuy,t2.PreBestSell,t
2.PostBestBuy,t2.postBestSell
from trades as t1 inner join test as t2
on t1.date=t2.date and t1.timestamp=t2.timestamp
;
quit;
Sig
-----Original Message-----
From: owner-sas-l@listserv.uga.edu [mailto:owner-sas-l@listserv.uga.edu]
On Behalf Of Kiran Kumar
Sent: Tuesday, July 19, 2005 4:38 AM
To: SAS-L@listserv.uga.edu
Subject: can SQL do many-to-many merge
Dear SAS users:
Still trying to find a way out for merging two datasets, both having
multiple records on common variables.
Specifically, I have two datasets, trade and test dataset.
data trades;
input date date9. timestamp price Quantity;
cards;
01JAN2004 600 50 2
01JAN2004 660 53 10
02JAN2004 760 54 5
02JAN2004 760 53 3
;
run;
data test;
input date date9. timestamp PreBestBuy PreBestSell PostBestBuy
PostBestSell; cards; 01JAN2004 600 48 52 50 53 01JAN2004 660 49 54 53 55
01JAN2004 660 49 54 53 55 02JAN2004 760 54 55 56 57 02JAN2004 760 54 55
56 57 02JAN2004 760 54 55 56 57 02JAN2004 760 54 55 56 57 ; run;
Would like to get "consolidate" dataset by merging the datasets on
common variables date and timestamp. Desired output "consolidate"
should have the following records:
data consolidate;
input date date9. timestamp price Quantity PreBestBuy PreBestSell
PostBestBuy PostBestSell; cards; 01JAN2004 600 50 2 48 52 50 53
01JAN2004 660 53 10 49 54 53 55 02JAN2004 760 54 5 54 55 56 57 02JAN2004
760 53 3 54 55 56 57 ; run;
To get the above "consolidate" dataset, I am using the following
datastep command: data consolidate; merge trades(in=a) test(in=b); if a
and b then output; keep date timestamp price Quantity PreBestBuy
PreBestSell PostBestBuy postBestSell; run; which gives the following
records. 01JAN2004 600 50 2 48 52 50 53 01JAN2004 660 53 10 49 54 53 55
01JAN2004 660 54 5 49 54 53 55 02JAN2004 760 53 3 54 55 56 57
But the output donot match the requirement..as there is mismatching of
rows...for 01Jan2004 (3rd record)
Some of my friends, suggested me to try with SQL many-to-many merge
instead of datastep.
I am sure this task can be done easily in sas, request you to spare
some time.
Thank you,
kk