Date: Mon, 10 Mar 2008 00:40:46 -0400
Reply-To: Paul Dorfman <sashole@BELLSOUTH.NET>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Paul Dorfman <sashole@BELLSOUTH.NET>
Subject: Re: Advice on joining large tables wanted.
Phil,
Your best choice is to use some kind of a hash table. If P7 is a limited
range numeric integer (say, reasonably, between 1 and 50 million for RAM
footprint of 500 Mb), use a key-indexed table as described in, for example,
http://www2.sas.com/proceedings/sugi26/p008-26.pdf
You may need to do some simple additional coding (for the scheme itself is
the simplest) to account for duplicate keys in SMALL. If you have a
difficulty doing that, give us a holler. That is the fastest method, period,
and it will leave the SQL join in dust. That will also work if your key is
not numeric or integer but can be inexpensively converted to one as
described in the above link.
If P7 is a long character key (i.e. longer than 3 bytes) then use the V9
hash object. The latter accepts unique keys only (that will change in V9.2),
so extra code to handle those is needed. You can find it with all the gory
details here:
http://www2.sas.com/proceedings/sugi31/232-31.pdf
For your purposes, filter out everything having to do with POINT= options
from there, since your SMALL is indeed really small. Just consider a hash
table complete with the keys and satellite(s) and concentrate on the variant
allowing to store and harvest duplicate key records.
Kind regards
------------
Paul Dorfman
Jax, FL
------------
On Sun, 9 Mar 2008 18:37:13 -0700, Phil <phillip.anderra@GMAIL.COM> wrote:
>g'day all,
>
>I have two tables, Big & Small that I need to join on the variable P7
>and must keep all the records from Big.
>
>Big has 135m rows, 14 variables, all about $10, with the key variable
>P7 having 60k distinct, and evenly distributed values.
>Small has 170k rows and 2 variables. The key variable P7 is *almost*
>distinct.
>Left Join on P7 looked the obvious choice but on my (average) PC it
>takes approx 2.5 hours to run.
>
>I've been trawling though this ng and have found suggestions on
>improving the run time is situations like this based on a number of
>methods, including; indexing, increasing Bufsize & BufNo, using hash
>joins etc etc and while I realise that there's no 'One size fits all'
>solution would anyone care to give their opinion on a 'best option'
>here to help out a slow-moving 9.1 user?
>
>
>
>cheers
>Phil
|