Date: Sat, 10 Jul 1999 01:55:34 -0400
Reply-To: "Paul M. Dorfman" <sashole@EARTHLINK.NET>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Paul M. Dorfman" <sashole@EARTHLINK.NET>
Organization: KInPH
Subject: Re: (MVS) Re:FIRST. and LAST. in PROC SQL
Content-Type: text/plain; charset=koi8-r
Nancy and Mike,
If Ray's original question was about emulating a control break during the SQL
execution, Sig and Howard have already answered it. If it was: "How to make SAS
use FIRST. and LAST. if the file has been ordered by SQL instead of SORT?", then
I must be missing something. The ability of SAS to create the FIRST. and LAST.
automatics, as well as its ability to process BY-groups in procs does not depend
in the least on what the dataset has in its descriptor. It depends only on the
actual physical sequence of the incoming keys specified in the BY clause. SAS
validates the order of the keys in the process of its invisible control break
while the dataset is being read in, and if the order checks OK, creates the
FIRST. and LAST. for every record. This process is performed absolutely
regardless of whether SORTEDBY is sitting in the descriptor or not, because the
only way to do a control break is to compare the next key with the previous one.
Besides, when SORTEDBY is forced in the descriptor via either DATASETS or the
data set option, SAS does NOT validate the sequence of the key variables. In
fact, one could write SORTEDBY in the header of a totally disordered SAS data
file, and SAS would not care less:
67 OPTION NOCENTER;
68 DATA A (SORTEDBY=KEY);
69 DO KEY = 4,4,4,2,2,1,3,3; OUTPUT; END;
70 RUN;
NOTE: The data set WORK.A has 8 observations and 1 variables.
NOTE: The DATA statement used 0.25 seconds.
71 PROC CONTENTS; RUN;
# Variable Type Len Pos
-----------------------------------
1 KEY Num 8 0
-----Sort Information-----
Sortedby: KEY
Validated: NO
Yet try to tell SAS "process A as sorted" by using BY KEY, and it is less than
welcome:
72 DATA B; SET A; BY KEY; IF FIRST.KEY; RUN;
ERROR: BY variables are not properly sorted on data set WORK.A.
KEY=4 FIRST.KEY=0 LAST.KEY=0 _ERROR_=1 _N_=3
However, if you do have the file physically sorted, SAS would not have the
problem with the BY without any SORTEDBY:
73 DATA A;
74 DO KEY = 1,2,2,3,3,4,4,4; OUTPUT; END;
75 RUN;
NOTE: The data set WORK.A has 8 observations and 1 variables.
76 DATA B; SET A; BY KEY; IF FIRST.KEY; RUN;
NOTE: The data set WORK.B has 4 observations and 1 variables.
And then, do we have a problem using BY-processing against sorted flat files
accessed via views without SORTEDBY present? Of course not! All in all, SORTEDBY
is needed for one only purpose: To let PROC SORT know that input is already
sorted, so it should not waste resources. And PROC SORT obeys blindly, for with
SORTEDBY=, it will not sort, no matter whether the dataset is actually ordered
or not and regardless of VALIDATE=YES or NO.
Back to Ray's SQL, after it has executed with ORDER BY, the output is sorted,
and, as Howard has noted, that is all we need for BY-processing. On a different
note, SQL with ORDER BY clause DOES write the same exact information in the
header as PROC SORT does, that is, SORTEDBY=KEY and VALIDATE=YES, so even if
SORTEDBY were necessary, it would already end up in the descriptor:
77 PROC SQL NOPRINT;
78 CREATE TABLE B AS SELECT * FROM A ORDER BY KEY;
NOTE: Table WORK.B created, with 8 rows and 1 columns.
79 QUIT;
NOTE: The PROCEDURE SQL used 0.13 seconds.
80 PROC CONTENTS; RUN;
# Variable Type Len Pos
-----------------------------------
1 KEY Num 8 0
-----Sort Information-----
Sortedby: KEY
Validated: YES
So, after SQL, there is no need to 'assert that the data set _WAS_ sorted after
it came out of PROC SQL' since not only it is already asserted but also
validated to boot.
Kind regards,
==================
Paul M. Dorfman
Jacksonville, FL
==================
"Brucken, Nancy" <Nancy.Brucken@WL.COM> wrote:
Just a footnote to Michael's posting- you can skip the call to PROC DATASETS
and put the SORTEDBY option directly in the PROC SQL step. The code would
look like:
**************************************************************;
* Create the table with PROC SQL *;
**************************************************************;
proc sql;
create table testdata (SORTEDBY=D_RACE) as
select d_race, sum(c_days) as sumcdays, sum(f_days) as sumfdays
from UNIXLIB.summ1
order by d_race;
quit;
Good luck!
Nancy
Nancy Brucken
Parke-Davis, QIP
(734) 622-5767
Internet address: Nancy.Brucken@wl.com
-----Original Message-----
From: RAITHEM [mailto:RAITHEM@WESTAT.COM]
Sent: Friday, July 09, 1999 4:41 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: (MVS) Re:FIRST. and LAST. in PROC SQL
The inimitable Dr. Ray Pass posted the following:
>Well I got what I needed. A couple of respondents (John Gerlach, Erik
>Larsen) pointed out correctly and astutely that SQL really doesn't know or
>care about internal order of observations because of its truly relational
>nature. Nevertheless, one can "order" the obs for display using ORDER BY.
>
>I did get a proposed method from a few of the gods (Ian and Sig) at what my
>friend MVS Raithel calls "SAS Mecca". The suggestion was to use MIN and/or
>MAX functions in PROC SQL to pick out lowest and/or highest values. I
>think I'll do a post-SQL PROC SORT. Seems like I lose nothing with this
>method, and gain good old FIRST. and LAST. vars that I know and love
>(besides, Ian agrees!) Thanks all.
>
Ray, since you were kind enough to actually refer to me as your friend in
public, the least I can do is to give you my own solution to your PROC SQL
blues. So, hold everything; _DO_NOT_ reach for that SORT procedure just
yet!
I would use my old comrade in arms SORTEDBY to assert that the data set
_WAS_
sorted after it came out of PROC SQL. I would chamber one SORTEDBY round
into
the waiting barrel of PROC DATASETS to assert the sort in the quietest,
sneakiest, least resource intensive way. Here is how it might look:
**************************************************************;
* Create the table with PROC SQL *;
**************************************************************;
proc sql;
create table testdata as
select d_race, sum(c_days) as sumcdays, sum(f_days) as sumfdays
from UNIXLIB.summ1
order by d_race;
quit;
**************************************************************;
* Assert the sort order with PROC DATASETS *;
**************************************************************;
proc datasets library=work;
modify testdata (sortedby=d_race);
run;
**************************************************************;
* Use my friendly First. variable... *;
**************************************************************;
data mike;
set testdata;
by d_race;
if first.d_race then put 'first d_race = ' d_race;
run;
Ray, as your practiced eye has undoubtedly already deduced, the first SAS
step,
above, extracts my data and sorts it by D_RACE via the ORDER BY SAS SQL
statement. My DATASETS procedure then asserts the sort on the TESTDATA data
set
via the MODIFY statement. If you were to run a CONTENTS procedure on the
TESTDATA data set after this point, you would see that the SAS System for
Information Delivery considers that the data set _IS_ now sorted. However,
since it is an asserted sort, and not the product of a PROC SORT, the
CONTENTS
listing will read: "Validated: NO". The last SAS step, above, exploits
the
fruit of my labor and uses the FIRST. variable.
Ray; you've just gotta love the SAS System for Information Delivery! There
always seems to be a way to do... anything! Best of luck in implementing a
low-resource intensive solution; I hope that there is no sequel to your SQL
problem!
I hope that this answer proves helpful now, and in the future!
Of course, all of these opinions and insights are my own, and do not reflect
those of my organization or my associates.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Michael A. Raithel
"The man who wrote the book on performance."
E-mail: raithem@westat.com
Author: Tuning SAS Applications in the MVS Environment
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
I invent nothing. I rediscover. -- Auguste Rodin
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++