LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous (more recent) messageNext (less recent) messagePrevious (more recent) in topicNext (less recent) in topicPrevious (more recent) by same authorNext (less recent) by same authorPrevious page (July 2005, week 3)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Tue, 19 Jul 2005 09:04:51 -0400
Reply-To:     Sigurd Hermansen <HERMANS1@WESTAT.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Sigurd Hermansen <HERMANS1@WESTAT.COM>
Subject:      Re: can SQL do many-to-many merge
Comments: To: Kiran Kumar <kk.chikky@gmail.com>
Content-Type: text/plain; charset="us-ascii"

Kiran: SQL and SAS Data step merges differ in a critical respect. In concept a SQL join follows a simple "brute force" procedure to combine data from two tables. (The SQL optimizer may choose a more efficient procedure that yields an equivalent dataset.)

A SQL join yields the same result as 1. match 1st row of dataset1 to each row of dataset2 and select columns into dataset3 if condition holds; 2. repeat for 2nd row of dataset1 through last row of dataset1.

The SAS Data step MERGE without a BY statement and with an IN dataset1 and IN dataset2 condition overlays dataset2 over dataset1. Master Whitlock has explained the logic it follows, but I find it too convoluted for practical applications. I demonstrate (below) that a many-to-many MERGE with a BY statement, followed by a SORT nodups, and for a specific state of data produces the same results as a SQL equijoin with a DISTINCT qualifier:

data trades; input date date9. timestamp price Quantity; cards; 01JAN2004 600 50 2 01JAN2004 660 53 10 02JAN2004 760 54 5 02JAN2004 760 53 3 ; run; data test; input date date9. timestamp PreBestBuy PreBestSell PostBestBuy PostBestSell; cards; 01JAN2004 600 48 52 50 53 01JAN2004 660 49 54 53 55 01JAN2004 660 49 54 53 55 02JAN2004 760 54 55 56 57 02JAN2004 760 54 55 56 57 02JAN2004 760 54 55 56 57 02JAN2004 760 54 55 56 57 ; run;

/* Would like to get "consolidate" dataset by merging the datasets on common variables date and timestamp. Desired output "consolidate" should have the following records: */ data consolidate; input date date9. timestamp price Quantity PreBestBuy PreBestSell PostBestBuy PostBestSell; cards; 01JAN2004 600 50 2 48 52 50 53 01JAN2004 660 53 10 49 54 53 55 02JAN2004 760 54 5 54 55 56 57 02JAN2004 760 53 3 54 55 56 57 ; run;

/* To get the above "consolidate" dataset, I am using the following datastep command: (BY group added by SWH and IF condition modified) */ data consolidate; merge trades(in=a) test(in=b); by date timestamp; if a then output; keep date timestamp price Quantity PreBestBuy PreBestSell PostBestBuy postBestSell; run; proc sort data =consolidate nodups; by date timestamp; quit; /* SQL join alternative */ proc sql; create table consolidate2 as select distinct t1.date,t1.timestamp,t1.price,t1.quantity,t2.PreBestBuy,t2.PreBestSell,t 2.PostBestBuy,t2.postBestSell from trades as t1 inner join test as t2 on t1.date=t2.date and t1.timestamp=t2.timestamp ; quit;

Sig -----Original Message----- From: owner-sas-l@listserv.uga.edu [mailto:owner-sas-l@listserv.uga.edu] On Behalf Of Kiran Kumar Sent: Tuesday, July 19, 2005 4:38 AM To: SAS-L@listserv.uga.edu Subject: can SQL do many-to-many merge

Dear SAS users:

Still trying to find a way out for merging two datasets, both having multiple records on common variables.

Specifically, I have two datasets, trade and test dataset. data trades; input date date9. timestamp price Quantity; cards; 01JAN2004 600 50 2 01JAN2004 660 53 10 02JAN2004 760 54 5 02JAN2004 760 53 3 ; run; data test; input date date9. timestamp PreBestBuy PreBestSell PostBestBuy PostBestSell; cards; 01JAN2004 600 48 52 50 53 01JAN2004 660 49 54 53 55 01JAN2004 660 49 54 53 55 02JAN2004 760 54 55 56 57 02JAN2004 760 54 55 56 57 02JAN2004 760 54 55 56 57 02JAN2004 760 54 55 56 57 ; run;

Would like to get "consolidate" dataset by merging the datasets on common variables date and timestamp. Desired output "consolidate" should have the following records: data consolidate; input date date9. timestamp price Quantity PreBestBuy PreBestSell PostBestBuy PostBestSell; cards; 01JAN2004 600 50 2 48 52 50 53 01JAN2004 660 53 10 49 54 53 55 02JAN2004 760 54 5 54 55 56 57 02JAN2004 760 53 3 54 55 56 57 ; run;

To get the above "consolidate" dataset, I am using the following datastep command: data consolidate; merge trades(in=a) test(in=b); if a and b then output; keep date timestamp price Quantity PreBestBuy PreBestSell PostBestBuy postBestSell; run; which gives the following records. 01JAN2004 600 50 2 48 52 50 53 01JAN2004 660 53 10 49 54 53 55 01JAN2004 660 54 5 49 54 53 55 02JAN2004 760 53 3 54 55 56 57

But the output donot match the requirement..as there is mismatching of rows...for 01Jan2004 (3rd record)

Some of my friends, suggested me to try with SQL many-to-many merge instead of datastep.

I am sure this task can be done easily in sas, request you to spare some time.

Thank you,

kk


Back to: Top of message | Previous page | Main SAS-L page