LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (September 2008, week 1)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Tue, 2 Sep 2008 11:09:09 -0700
Reply-To:     Jay Soule <jay.soule@GMAIL.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Jay Soule <jay.soule@GMAIL.COM>
Subject:      Re: Hash lookup not quicker than SQL inner join?
In-Reply-To:  <98634D42B37B2E4E96CF3A3BD3CD9AE60254E93F@EX1VS2.nyced.org>
Content-Type: text/plain; charset=ISO-8859-1

On Tue, Sep 2, 2008 at 9:00 AM, Bucher Scott <SBucher@schools.nyc.gov> wrote: > You could try the "PROC SQL _METHOD;" or "PROC SQL _TREE;" options to > get more details on how SQL is actually performing the joins. In some > cases, SQL actually performs a hash join. I believe there are one or two > conference papers on this topic and the aforementioned options.

Right - if the note in the SAS log after the SQL step shows that the method used was "sqlxjhsh" then SAS used a hash join.

- Jay Soule

> Sander Burggraaff > Sent: Tuesday, September 02, 2008 11:42 AM > To: SAS-L@LISTSERV.UGA.EDU > Subject: Hash lookup not quicker than SQL inner join? > > I always assumed that a hash lookup would be a quicker way to join two > tables than a SQL join. But I recently did some testing and the > results were not what I expected. > > Let's say we have two tables: Table_1 and Table_2. These two tables > have 4 variables in common: B, C, D and E. B is a character variable > with a length of 20 and the other 3 are all numeric with a length of > 8. Table_1 also has a variable called A1 and Table_2 has a variable > called A2. > > Of course the variables B, C, D and E in both tables are keys we want > to join the tables on. The result table will hold the variables A1 and > A2. > > Table_1 holds the same variables as Table_2. At first we tried > 25.000.000 observations but we ran out of memory when using the Hash > lookup. Apparently the 1.6Gb memory available was not enough to store > the hash table in. So we lowered the amount of observations to > 20.000.000. > > To our surprise the inner join performed just as quickly as the hash > lookup. I expected SAS to start sorting when using the inner join but > apparently it doesn't. During some tests it even looked like the inner > join was quicker than the hash lookup. > > Can anyone explain why the inner join is just as quick? No sorting? > Why? > > I searched this discussion group but couldn't find any postings about > it. Have older posts been deleted? >


Back to: Top of message | Previous page | Main SAS-L page