Date: Wed, 5 Jul 2000 16:45:19 GMT
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
Organization: Deja.com - Before you buy.
Subject: Re: "PROC SQL", "PROC SORT + MERGE"
In kmself@IX.NETCOM.COM wrote:
> On Tue, Jul 04, 2000 at 05:08:43PM +0800, wong wrote:
> > Suppose I have 2 datasets with 1 common key, I can merge these 2
> > datasets by
> > 1) PROC SQL
> > or
> > 2) Use PROC SORT first, then use MERGE
> > If both datasets are very large, is it better to use SQL ? (The
> > purpose is to use less virtual memory).
> How large is large? =20
> - How many records and rows per dataset?
> - What OS are you running on? What hardware limitations (disk size,
> available memory)?
> - Have you tried running the job yet?
> Default action of both the SORT/MERGE and SQL methods is roughly
> equivalent in terms of memory usage. SAS doesn't (usually) read an
> entire dataset to memory, you shouldn't see an appreciable difference
> here. More typically, clock time, cpu time, and working set space
> (SAS WORK storage) are limiting factors for large merges and/or joins.
If 1 or less of the datasets has duplicates of the 'BY' (Merge) Variable
the difference between using PROC SQL and PROC SORT and a datastep MERGE
is quite small. If you refer to the programming effiencies book (and
mine is packed away) - for large datasets (>500K obs), the PROC SQL will
generally and slowly become more efficient than the SORT & MERGE method.
However, I believe that external sort programs (aka SyncSort) may be
used to push that out past 1M records.
Other methods, such as hashing, etc can be used, and are probably ideal
for cases where you are merging only to 'add' a limited amount of data
(small number of variables) to the 'master' dataset.
Generally speaking, a PROC SQL 'merge' doesn't have any secret weapon to
improve the efficiency of the merge, over a SORT and MERGE method. If
it were as simple as that, SAS would create a 'fix' (like perhaps a
PROC MERGE) or clearly document the improved performances of using a
PROC SQL for the merge instead of a Data Step merge.
Sent via Deja.com http://www.deja.com/
Before you buy.