LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (March 2008, week 2)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Mon, 10 Mar 2008 00:40:46 -0400
Reply-To:     Paul Dorfman <sashole@BELLSOUTH.NET>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Paul Dorfman <sashole@BELLSOUTH.NET>
Subject:      Re: Advice on joining large tables wanted.
Comments: To: phillip.anderra@GMAIL.COM

Phil,

Your best choice is to use some kind of a hash table. If P7 is a limited range numeric integer (say, reasonably, between 1 and 50 million for RAM footprint of 500 Mb), use a key-indexed table as described in, for example,

http://www2.sas.com/proceedings/sugi26/p008-26.pdf

You may need to do some simple additional coding (for the scheme itself is the simplest) to account for duplicate keys in SMALL. If you have a difficulty doing that, give us a holler. That is the fastest method, period, and it will leave the SQL join in dust. That will also work if your key is not numeric or integer but can be inexpensively converted to one as described in the above link.

If P7 is a long character key (i.e. longer than 3 bytes) then use the V9 hash object. The latter accepts unique keys only (that will change in V9.2), so extra code to handle those is needed. You can find it with all the gory details here:

http://www2.sas.com/proceedings/sugi31/232-31.pdf

For your purposes, filter out everything having to do with POINT= options from there, since your SMALL is indeed really small. Just consider a hash table complete with the keys and satellite(s) and concentrate on the variant allowing to store and harvest duplicate key records.

Kind regards ------------ Paul Dorfman Jax, FL ------------

On Sun, 9 Mar 2008 18:37:13 -0700, Phil <phillip.anderra@GMAIL.COM> wrote:

>g'day all, > >I have two tables, Big & Small that I need to join on the variable P7 >and must keep all the records from Big. > >Big has 135m rows, 14 variables, all about $10, with the key variable >P7 having 60k distinct, and evenly distributed values. >Small has 170k rows and 2 variables. The key variable P7 is *almost* >distinct. >Left Join on P7 looked the obvious choice but on my (average) PC it >takes approx 2.5 hours to run. > >I've been trawling though this ng and have found suggestions on >improving the run time is situations like this based on a number of >methods, including; indexing, increasing Bufsize & BufNo, using hash >joins etc etc and while I realise that there's no 'One size fits all' >solution would anyone care to give their opinion on a 'best option' >here to help out a slow-moving 9.1 user? > > > >cheers >Phil


Back to: Top of message | Previous page | Main SAS-L page