Date: Wed, 9 Mar 2011 15:03:08 -0800
Reply-To: "DUELL, BOB (ATTCINW)" <BD9439@ATT.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "DUELL, BOB (ATTCINW)" <BD9439@ATT.COM>
Subject: Re: SQL lookup - more efficient way?
In-Reply-To: <AANLkTi=dR+U-pRj5d+nsUKpvmjY=6yqpjg3sbQOJCFTr@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"
Hi Mike,
I'm adding SAS-L to the reply, which you normally would want to do. To your questions:
First, "a" and "b" are just table aliases, the only purpose of which is to save typing. In other words, the "where" clause in the sub-query could have been written without aliases as:
where main.num_x between lookup.beg_num and lookup.end_num
Next, using "select 1" is personal coding quirk. It's just a constant integer one (1). You could just as well say "select 'hello world'" or even "select *". It helps me remember that the sub-query is not really returning any rows and in some databases requires less memory.
Finally, you are correct. The technique I showed will not include values for "beg_num" or "end_num" from your "lookup" table. If those columns are required then you must JOIN the two tables exactly as you have done. Unfortunately, if you must do this in pure ANSI sql then I don't think it can be optimized. The BETWEEN part of your join condition will require a full product join (every row from "main" combined with every row in "lookup"). That can be very resource intensive.
I have a similar reporting problem (a "master" table with 100 million wireless device serial numbers and a 30,000 row "lookup" table with device characteristics by starting and ending serial numbers). My solution was to create a user-written SAS format that returns a row_id that maps back to the "lookup" table. It looks a bit like this:
proc sql;
select *
from main a
join lookup b
on put(a.num_x,lkupfmt.) = b.row_id
order by id;
quit;
Of course, this technique requires that the lookup table not have any overlapping ranges. That may not work for you, but I thought I'd mention it.
Bob
PS - For what it's worth, there SAS formats like this can be "published" to some databases with the appropriate software. The formats get turned into a function. We experimented with doing this with our Teradata system late last year and it worked great. There are a couple "problems" with this solution, but not if you have the full support of your IT organization. Ours was not excited about the additional administrative work. Too bad for us, because running SAS formats inside Teradata was really cool.
-----Original Message-----
From: Michael Murff [mailto:mysasbox@gmail.com]
Sent: Wednesday, March 09, 2011 2:17 PM
To: DUELL, BOB (ATTCINW)
Subject: Re: SQL lookup - more efficient way?
hi Bob,
This is a very interesting technique, really appreciate your instruction. I didn't realize you can invoke elements of "a" in the sub query, assumed that only elements of 'b' could be "passed up" to outer query, but apparently the communication is two-way.
Also, could you clarify I'm not familiar with "select 1" type syntax -- is the short hand for "one record" different than * ?
Finally, I tested the query and it just returns all records but not the ranges (beg_num and end_num). The purpose of the query is to also fetch the associated ranges.
Thanks so much.
Mike
On Wed, Mar 9, 2011 at 2:43 PM, DUELL, BOB (ATTCINW) <BD9439@att.com> wrote:
Try a correlated query instead of a join:
proc sql;
select *
from main a
where exists (
select 1 from lookup b
where a.num_x between b.beg_num and b.end_num
)
order by id;
quit;
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
Michael Murff
Sent: Wednesday, March 09, 2011 10:35 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: SQL lookup - more efficient way?
** Hi sas-l;
** my real tables are quite large;
** anybody know how to optimize this using ansi standard SQL syntax? ;
data main;
do id=1 to 10;
num_x=round(100*uniform(123),1); output;
end;
run;
data lookup;
beg_num=1; end_num=10; output;
beg_num=11; end_num=20; output;
beg_num=21; end_num=30; output;
run;
proc sql;
select * from main, lookup
where num_x between beg_num and end_num
order by id;
quit;
endsas;
** thank you in advance;
** Regards, Mike;