LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (July 1999, week 2)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Sat, 10 Jul 1999 01:55:34 -0400
Reply-To:     "Paul M. Dorfman" <sashole@EARTHLINK.NET>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Paul M. Dorfman" <sashole@EARTHLINK.NET>
Organization: KInPH
Subject:      Re: (MVS) Re:FIRST. and LAST. in PROC SQL
Content-Type: text/plain; charset=koi8-r

Nancy and Mike,

If Ray's original question was about emulating a control break during the SQL execution, Sig and Howard have already answered it. If it was: "How to make SAS use FIRST. and LAST. if the file has been ordered by SQL instead of SORT?", then I must be missing something. The ability of SAS to create the FIRST. and LAST. automatics, as well as its ability to process BY-groups in procs does not depend in the least on what the dataset has in its descriptor. It depends only on the actual physical sequence of the incoming keys specified in the BY clause. SAS validates the order of the keys in the process of its invisible control break while the dataset is being read in, and if the order checks OK, creates the FIRST. and LAST. for every record. This process is performed absolutely regardless of whether SORTEDBY is sitting in the descriptor or not, because the only way to do a control break is to compare the next key with the previous one. Besides, when SORTEDBY is forced in the descriptor via either DATASETS or the data set option, SAS does NOT validate the sequence of the key variables. In fact, one could write SORTEDBY in the header of a totally disordered SAS data file, and SAS would not care less:

67 OPTION NOCENTER; 68 DATA A (SORTEDBY=KEY); 69 DO KEY = 4,4,4,2,2,1,3,3; OUTPUT; END; 70 RUN; NOTE: The data set WORK.A has 8 observations and 1 variables. NOTE: The DATA statement used 0.25 seconds.

71 PROC CONTENTS; RUN; # Variable Type Len Pos ----------------------------------- 1 KEY Num 8 0 -----Sort Information----- Sortedby: KEY Validated: NO

Yet try to tell SAS "process A as sorted" by using BY KEY, and it is less than welcome:

72 DATA B; SET A; BY KEY; IF FIRST.KEY; RUN; ERROR: BY variables are not properly sorted on data set WORK.A. KEY=4 FIRST.KEY=0 LAST.KEY=0 _ERROR_=1 _N_=3

However, if you do have the file physically sorted, SAS would not have the problem with the BY without any SORTEDBY:

73 DATA A; 74 DO KEY = 1,2,2,3,3,4,4,4; OUTPUT; END; 75 RUN; NOTE: The data set WORK.A has 8 observations and 1 variables.

76 DATA B; SET A; BY KEY; IF FIRST.KEY; RUN; NOTE: The data set WORK.B has 4 observations and 1 variables.

And then, do we have a problem using BY-processing against sorted flat files accessed via views without SORTEDBY present? Of course not! All in all, SORTEDBY is needed for one only purpose: To let PROC SORT know that input is already sorted, so it should not waste resources. And PROC SORT obeys blindly, for with SORTEDBY=, it will not sort, no matter whether the dataset is actually ordered or not and regardless of VALIDATE=YES or NO.

Back to Ray's SQL, after it has executed with ORDER BY, the output is sorted, and, as Howard has noted, that is all we need for BY-processing. On a different note, SQL with ORDER BY clause DOES write the same exact information in the header as PROC SORT does, that is, SORTEDBY=KEY and VALIDATE=YES, so even if SORTEDBY were necessary, it would already end up in the descriptor:

77 PROC SQL NOPRINT; 78 CREATE TABLE B AS SELECT * FROM A ORDER BY KEY; NOTE: Table WORK.B created, with 8 rows and 1 columns. 79 QUIT; NOTE: The PROCEDURE SQL used 0.13 seconds.

80 PROC CONTENTS; RUN; # Variable Type Len Pos ----------------------------------- 1 KEY Num 8 0 -----Sort Information----- Sortedby: KEY Validated: YES

So, after SQL, there is no need to 'assert that the data set _WAS_ sorted after it came out of PROC SQL' since not only it is already asserted but also validated to boot.

Kind regards, ================== Paul M. Dorfman Jacksonville, FL ==================

"Brucken, Nancy" <Nancy.Brucken@WL.COM> wrote:

Just a footnote to Michael's posting- you can skip the call to PROC DATASETS and put the SORTEDBY option directly in the PROC SQL step. The code would look like:

**************************************************************; * Create the table with PROC SQL *; **************************************************************; proc sql; create table testdata (SORTEDBY=D_RACE) as select d_race, sum(c_days) as sumcdays, sum(f_days) as sumfdays from UNIXLIB.summ1 order by d_race; quit;

Good luck! Nancy

Nancy Brucken Parke-Davis, QIP (734) 622-5767 Internet address: Nancy.Brucken@wl.com

-----Original Message----- From: RAITHEM [mailto:RAITHEM@WESTAT.COM] Sent: Friday, July 09, 1999 4:41 PM To: SAS-L@LISTSERV.UGA.EDU Subject: (MVS) Re:FIRST. and LAST. in PROC SQL

The inimitable Dr. Ray Pass posted the following:

>Well I got what I needed. A couple of respondents (John Gerlach, Erik >Larsen) pointed out correctly and astutely that SQL really doesn't know or >care about internal order of observations because of its truly relational >nature. Nevertheless, one can "order" the obs for display using ORDER BY. > >I did get a proposed method from a few of the gods (Ian and Sig) at what my >friend MVS Raithel calls "SAS Mecca". The suggestion was to use MIN and/or >MAX functions in PROC SQL to pick out lowest and/or highest values. I >think I'll do a post-SQL PROC SORT. Seems like I lose nothing with this >method, and gain good old FIRST. and LAST. vars that I know and love >(besides, Ian agrees!) Thanks all. >

Ray, since you were kind enough to actually refer to me as your friend in public, the least I can do is to give you my own solution to your PROC SQL blues. So, hold everything; _DO_NOT_ reach for that SORT procedure just yet!

I would use my old comrade in arms SORTEDBY to assert that the data set _WAS_ sorted after it came out of PROC SQL. I would chamber one SORTEDBY round into the waiting barrel of PROC DATASETS to assert the sort in the quietest, sneakiest, least resource intensive way. Here is how it might look:

**************************************************************; * Create the table with PROC SQL *; **************************************************************; proc sql; create table testdata as select d_race, sum(c_days) as sumcdays, sum(f_days) as sumfdays from UNIXLIB.summ1 order by d_race; quit;

**************************************************************; * Assert the sort order with PROC DATASETS *; **************************************************************; proc datasets library=work; modify testdata (sortedby=d_race); run;

**************************************************************; * Use my friendly First. variable... *; **************************************************************; data mike; set testdata; by d_race;

if first.d_race then put 'first d_race = ' d_race;

run;

Ray, as your practiced eye has undoubtedly already deduced, the first SAS step, above, extracts my data and sorts it by D_RACE via the ORDER BY SAS SQL statement. My DATASETS procedure then asserts the sort on the TESTDATA data set via the MODIFY statement. If you were to run a CONTENTS procedure on the TESTDATA data set after this point, you would see that the SAS System for Information Delivery considers that the data set _IS_ now sorted. However, since it is an asserted sort, and not the product of a PROC SORT, the CONTENTS listing will read: "Validated: NO". The last SAS step, above, exploits the fruit of my labor and uses the FIRST. variable.

Ray; you've just gotta love the SAS System for Information Delivery! There always seems to be a way to do... anything! Best of luck in implementing a low-resource intensive solution; I hope that there is no sequel to your SQL problem!

I hope that this answer proves helpful now, and in the future!

Of course, all of these opinions and insights are my own, and do not reflect those of my organization or my associates.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Michael A. Raithel "The man who wrote the book on performance." E-mail: raithem@westat.com Author: Tuning SAS Applications in the MVS Environment ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ I invent nothing. I rediscover. -- Auguste Rodin ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


Back to: Top of message | Previous page | Main SAS-L page