Date: Sat, 26 Mar 2011 18:35:15 -0400
Reply-To: bbser 2009 <bbser2009@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: bbser 2009 <bbser2009@GMAIL.COM>
Subject: Re: subquery return multi-rows, SAS PROC SQL
In-Reply-To: <AC5F423B9AB19C43BE62C86BAF8113CC016157@WABOTH9MSGUSR8C.ITServices.sbc.com>
Content-Type: text/plain; charset="us-ascii"
Bob
Thanks a lot for giving me such a detailed explanation together with the
reference.
I just gave it a brief look and bet that was what I would expect.
I will be reading it later thoroughly.
Be prepared for my second thank-you. :)
Max
-----Original Message-----
From: DUELL, BOB (ATTCINW) [mailto:bd9439@att.com]
Sent: March-26-11 5:14 PM
To: bbser 2009; SAS-L@LISTSERV.UGA.EDU
Subject: RE: subquery return multi-rows, SAS PROC SQL
Hi Max,
I'm not familiar with that book. I hope it isn't an example of a potential
test question. If I found this on a test I'd spend all my time fruitless
arguing with the exam paper (although I suppose they don't use paper any
more).
It's really an SQL question anyway, not SAS. This is an example of what is
called a "correlated query" and although it works as written I personally
wouldn't write it that way (I'd use an EXISTS clause).
Let's look at your original query:
proc sql;
select firstname, lastname
from sasuser.staffmaster
where 'NA'=
(
select jobcategory
from sasuser.supervisors
where staffmaster.empid=supervisors.empid
)
;
quit;
Although you might think that the sub-query is executed separately, it is
really run for each row in staffmaster. In other words, if staffmaster has
1,000 rows then the sub-query is "called" 1,000 times. The "outer" where
clause adds a condition to the sub-query, effectively "where jobcategory =
'NA'". See the "SAS 9.2 SQL Procedure User's Guide" for a much better
explanation. It's in the chapter named "Retrieving Data From Multiple
Tables". I'd send you a link but I've never figured out how to do so with
the new online documentation.
At any rate, I prefer using an EXISTS clause for things like this. The
effect is exactly the same but I find it easier to read. I'm not aware of
any performance differences, although they may exist:
proc sql;
select firstname, lastname
from sasuser.staffmaster
where exists
(
select 1
from sasuser.supervisors
where staffmaster.empid=supervisors.empid
and jobcategory = 'NA'
)
;
quit;
Since your question must be based on datasets supplied with your book, I
made some up to illustrate. It probably doesn't match the design of what
you are using, but I enjoyed making up the names:
data sasuser.staffmaster;
informat firstname lastname $15. empid 5.;
input firstname lastname empid;
datalines;
Mickey Mouse 1
Minnie Mouse 2
Donald Duck 3
Daisy Duck 4
Pluto Pup 5
Mary Poppins 6
Charles Xavier 7
Scott Summers 8
Bobby Drake 9
Warren Worthington 10
Henry McCoy 11
Jean Grey 12
run;
data sasuser.supervisors;
informat super_firstname super_lastname $15. jobcategory $2. empid 5.;
input super_firstname super_lastname jobcategory empid;
datalines;
Walt Disney NA 1
Walt Disney NA 2
Walt Disney NA 3
Walt Disney NA 4
Walt Disney NA 5
Walt Disney NA 6
Stan Lee XM 7
Stan Lee XM 8
Stan Lee XM 9
Stan Lee XM 10
Stan Lee XM 11
Stan Lee XM 12
run;
Anyway, this is probably more than you wanted to read. Forgive my
wordiness; I hate working on weekends.
Hope this helps,
Bob
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of bbser
2009
Sent: Saturday, March 26, 2011 12:02 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: subquery return multi-rows, SAS PROC SQL
Greetings!
The code listed below is from the advanced SAS certification book (P66).
It runs smoothly although, I think, the subquery returns multi-rows, which
contradicts to the rule.
Could you please explain this to me?
Especially, how those two WHERE clauses got executed?
Thank you.
Max
======================================
proc sql;
select firstname, lastname
from sasuser.staffmaster
where 'NA'=
(
select jobcategory
from sasuser.supervisors
where staffmaster.empid=supervisors.empid
)
;
quit;