LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (July 2000, week 1)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Wed, 5 Jul 2000 16:45:19 GMT
Reply-To:     amichiel@EARTHLING.NET
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         amichiel@EARTHLING.NET
Organization: Deja.com - Before you buy.
Subject:      Re: "PROC SQL", "PROC SORT + MERGE"

In kmself@IX.NETCOM.COM wrote: > On Tue, Jul 04, 2000 at 05:08:43PM +0800, wong wrote: > > Suppose I have 2 datasets with 1 common key, I can merge these 2 > > datasets by > > 1) PROC SQL > > or > > 2) Use PROC SORT first, then use MERGE > > If both datasets are very large, is it better to use SQL ? (The > > purpose is to use less virtual memory). > > How large is large? =20 > - How many records and rows per dataset? > - What OS are you running on? What hardware limitations (disk size, > available memory)? > - Have you tried running the job yet? > > Default action of both the SORT/MERGE and SQL methods is roughly > equivalent in terms of memory usage. SAS doesn't (usually) read an > entire dataset to memory, you shouldn't see an appreciable difference > here. More typically, clock time, cpu time, and working set space > (SAS WORK storage) are limiting factors for large merges and/or joins.

If 1 or less of the datasets has duplicates of the 'BY' (Merge) Variable the difference between using PROC SQL and PROC SORT and a datastep MERGE is quite small. If you refer to the programming effiencies book (and mine is packed away) - for large datasets (>500K obs), the PROC SQL will generally and slowly become more efficient than the SORT & MERGE method. However, I believe that external sort programs (aka SyncSort) may be used to push that out past 1M records.

Other methods, such as hashing, etc can be used, and are probably ideal for cases where you are merging only to 'add' a limited amount of data (small number of variables) to the 'master' dataset.

Generally speaking, a PROC SQL 'merge' doesn't have any secret weapon to improve the efficiency of the merge, over a SORT and MERGE method. If it were as simple as that, SAS would create a 'fix' (like perhaps a PROC MERGE) or clearly document the improved performances of using a PROC SQL for the merge instead of a Data Step merge.

Sent via Deja.com http://www.deja.com/ Before you buy.


Back to: Top of message | Previous page | Main SAS-L page