Date: Tue, 16 Apr 2002 17:33:45 -0400
Reply-To: "Braten, Michael (Exchange)" <mbraten@BEAR.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Braten, Michael (Exchange)" <mbraten@BEAR.COM>
Subject: Re: Proc merge data set order
Content-Type: text/plain
my 2 cents:
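The rule is that, for variables the two datasets have in common (other than
the BY variables), values from the dataset mentioned second (or right)
overwrite values from the one mentioned first (or left).
I would go with your first version, with the large dataset listed first.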
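To see the overwrite rule on a small scale, here is a rough sketch with two
made-up datasets (BIG, SMALL and the variable x are hypothetical names, not
your data), assuming one observation per ssn in each and both already in
ssn order:
DATA big;
  INPUT ssn x $;
  DATALINES;
1 L1
2 L2
;
RUN;
DATA small;
  INPUT ssn x $;
  DATALINES;
2 R2
3 R3
;
RUN;
DATA both;
  MERGE big small;  * small is listed second, so its x wins on a match ;
  BY ssn;
RUN;
For ssn=2, which is in both, x should come out as R2, the value from the
dataset listed second. ssn=1 keeps L1 and ssn=3 gets R3.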
You may gain some efficiency by using a subsetting IF statement to control
which observations the merge writes out.
Which observations do you want to end up in the newly created dataset?
If you want every observation from the 1 GB dataset to come through, add:
DATA combined;
  MERGE IN1.SUMMARY (IN=A KEEP=ssn fundgrp scrssn)  /* ssn must be kept: it is the BY variable */
        IN6.SSN     (IN=B);
  BY ssn;
  IF A;   * <=== keep only observations that come from IN1.SUMMARY ;
RUN;
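This way you get every observation from IN1.SUMMARY: the updated ones (those
with a matching ssn in IN6.SSN) and the non-updated ones.
If you want only the updated obs, use IF A AND B; instead of IF A;
Then you get only the records whose ssn appears in both datasets.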
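Spelled out in full, the second variant might look like the sketch below.
The output name UPDATED_ONLY is just a placeholder, and this assumes both
inputs are already sorted (or indexed) by ssn, which a BY merge needs:
DATA updated_only;
  MERGE IN1.SUMMARY (IN=A KEEP=ssn fundgrp scrssn)
        IN6.SSN     (IN=B);
  BY ssn;
  IF A AND B;   * keep only ssn values found in both datasets ;
RUN;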
> -----Original Message-----
> From: Wagner, Eric J. (V15) [SMTP:Eric.Wagner@MED.VA.GOV]
> Sent: Tuesday, April 16, 2002 3:54 PM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: Proc merge data set order
>
> Greetings,
>
> When performing a PROC MERGE, where one dataset is much larger than the
> other, is it faster to list the large dataset first or second? I am trying
> to minimize CPU cycles used and execution time. For example:
>
> DATA combined;
> MERGE IN1.SUMMARY (IN=A keep=fundgrp scrssn ) IN6.SSN (IN=B);
> By ssn;
> RUN;
>
> or
>
> DATA combined;
> MERGE IN6.SCRSSN (IN=B) IN1.SUMMARY (IN=A keep=fundgrp scrssn );
> By ssn;
> RUN;
>
> where IN1.SUMMARY is on the order of 1GB in size and IN6.SSN is about 200k.
>
>
>
> Thank you,
> Eric Wagner