|
Dimitri and Mark:
Use of a HAVING clause for subsetting of rows in a dataset dates back in
SAS-L at least to 10/1996. Apparently independent responses by Ian
Whitlock and Bernard Tremblay suggest that others knew how to use the
method even earlier.
http://www.listserv.uga.edu/cgi-bin/wa?A2=ind9610D&L=sas-l&P=R10401&D=1&
H=0&O=D&T=1
I recall posting solutions to several 'Can you do this in SQL?'
challenges during that era. I'd call the 'HAVING x=min(x)' a method
somewhat specific to SAS SQL. Depending on the selection of columns, it
may require remerging.
So how many different methods potentially reduce the cardinality (number
of rows) in the yield of a SAS SQL query? Why different methods?
ON x ----> required in INNER and OUTER joins
and likely related to query optimizing using indexes.
WHERE x ..... ----> older form of join operator
..... x in (.....) -----> in set constraint
exists(....) -----> correlated
subquery
HAVING <summary> <op> <value> ---> typical constraint on groups
according to summary count, min/max, mean, etc.
x=<summary>(x) ---> row constraint within group.
Although a ON/WHERE condition logically precedes a HAVING condition, the
SAS SQL compiler may delay imposition of a constraint until it binds
values to symbolic references.
Sig
-----Original Message-----
From: owner-sas-l@listserv.uga.edu [mailto:owner-sas-l@listserv.uga.edu]
On Behalf Of Terjeson, Mark
Sent: Monday, December 04, 2006 10:26 AM
To: Dimitri Shvorob; SAS-L@LISTSERV.UGA.EDU
Subject: RE: There's something about HAVING (Re: Selecting nearest value
in time)
Hi,
You've guess right. The HAVING is like the
WHERE in that it chops out rows (or keeps rows
if you prefer). i.e. the where criteria when
true keeps rows but the overall where task is
that it is subsetting your original data which
is chopping out rows.
The WHERE occurs *before* the SELECT, and the
HAVING occurs *after* the select. This is handy
when you want to subset the original data using
the WHERE and possibly filter some records after
a roll up as been performed.
The other major difference is that you can
incorporated aggregating functions into the
HAVING clause and not in the WHERE clause. This
allows to perform some aggregating and then you
can match row values to it when needed.
So your deducing is spot on.
Hope this is helpful.
Mark Terjeson
Senior Programmer Analyst, IM&R
Russell Investment Group
Russell
Global Leaders in Multi-Manager Investing
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
Dimitri Shvorob
Sent: Sunday, December 03, 2006 2:14 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: There's something about HAVING (Re: Selecting nearest value in
time)
When I came across the SQL-based solution proposed by Arthur in
http://listserv.uga.edu/cgi-bin/wa?A2=ind0611B&L=sas-l&P=R23484
I doubted that it would work. Apart from misgivings about grouping and
joining in a single query, I thought that HAVING could only be used to
select groups (here, IDs), whereas in Art's code, the clause plainly
filters individual records (identified by ID and time). (Consider the
following excerpt from p.13 of PROC SQL user's guide: 'The HAVING clause
works with the GROUP BY clause to restrict the groups in a query's
results based on a given condition. PROC SQL applies the HAVING
condition after grouping the data and applying aggregate functions'). To
my surprise, test code paralleling the suspect query (see below; I match
nearest Saturday and Sunday dates) did work! Am I to conclude that PROC
SQL's HAVING is basically a WHERE allowed to reference group aggregates?
(Group minimum, in this instance).
Thank you.
data sa su;
do id = 1 to 3;
do i = 1 to 50;
date = intnx('day','1jan2006'd,i);
if weekday(date) = 7 then output sa;
if weekday(date) = 1 then output su;
end;
end;
format date date9.;
run;
proc sql;
create table sasu as
select distinct sa.id, sa.date as sadate, su.date as sudate from sa join
su on sa.id = su.id group by sa.id having abs(sa.date - su.date) =
min(abs(sa.date - su.date)); quit;
|