Date: Mon, 20 Apr 2009 14:58:33 -0400
Reply-To: Sigurd Hermansen <HERMANS1@WESTAT.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Sigurd Hermansen <HERMANS1@WESTAT.COM>
Subject: Re: Memory use of proc sql in inner joins
In-Reply-To: <024bcbf5-00e1-4211-b5e5-ea3b8ea13ba9@c9g2000yqm.googlegroups.com>
Content-Type: text/plain; charset="us-ascii"
I've compared the execution plans and run times of queries with and without nested queries that subset tuples prior to an inner join:
data R1;
do i=1 to 1E6;
key=round(ranuni(208189)*10000,1.);
output;
end;
run;
data R2;
do i=1 to 1E6;
key=round(ranuni(208179)*10000,1.);
output;
end;
run;
proc sql _method _tree _plan;
create table yield as
select R1.key as R1key, R2.key as R2key
from R1 inner join R2
on R1.key=R2.key
where R1.key >= 3000 and R2.key <=7000
;
quit;
proc sql _method _tree _plan;
create table yield as
select R1.key as R1key, R2.key as R2key
from (select * from R1 where R1.key >= 3000)
inner join
(select * from R2 where R2.key <= 7000)
on R1.key=R2.key
;
quit;
Under MS Windows SAS, the execution plans look identical. The SAS SQL compiler "promotes" the WHERE conditions in the first query as specified in the second query. Run times also look two similar to indicate a difference in the execution plans.
You results may vary ... When using earlier versions of Base SAS, I found advantages in optimizing inner joins. Now I don't intervene except in special situations or in a left join, which SAS still doesn't optimize properly.
S
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Dornak
Sent: Monday, April 20, 2009 11:54 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Memory use of proc sql in inner joins
Hello,
Someone told me that proc sql uses less memory for inner joins when extra where clauses which affect only one table are included as dataset options. Can anyone tell me if that's true? I was living in the vague hope that SAS optimizes the SQL code for me. I'm using SAS 8.02 in a UNIX environment.
Here comes an example:
Suppose you have two tables you want to join on their primary key nr and get only those lines which have car="VW". I would do it like this:
proc sql;
create table t as
select *
from
adress as a,
contract as v
where
a.nr= v.nr
and
v.car= 'Maybach'
;
quit;
What was said to be more efficient is:
proc sql;
create table t as
select *
from
adress as a,
contract (
where = (
car= 'VW
)
)
as v
where
a.nr= v.nr;
quit;