LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (July 2004, week 2)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Thu, 8 Jul 2004 00:34:03 -0400
Reply-To:     sashole@bellsouth.net
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Paul M. Dorfman" <sashole@BELLSOUTH.NET>
Organization: Sashole of Florida
Subject:      Re: selecting smallest rec
Comments: To: Nicole Bibb <nicole.d.bibb@WELLSFARGO.COM>
In-Reply-To:  <200407080023.i680NH828791@listserv.cc.uga.edu>
Content-Type: text/plain; charset="us-ascii"

Nicole,

You have got sage advices regarding SQL, but beware that when you do this in the Kevin-corrected form,

having col1=min(col1),

SQL will get busy remerging the statistic it has computed with the original file, even though the output is to contain a single record. The SQL optimizer here is, well... not optimized enough so far. Instead of (1) merrily going record by record computing MIN, memorizing the respective satellite variables in the record, and finally outputting the corresponding values at end, SQL (2) goes through the file once, computes the statistic, then does the second pass to match the minimal value with the original file. SUMMARY will run 20 times faster, but it cannot grab the satellites right with the ID statement. Another choice, the Data step, is much faster still, as evidenced from

21 data a ; 22 do name = 1 to 1e7 ; 23 col1 = ceil (ranuni (1) * 1e10) ; 24 output ; 25 end ; 26 run ; NOTE: The data set WORK.A has 10000000 observations and 2 variables. NOTE: DATA statement used (Total process time): real time 6.98 seconds 27 28 proc sql ; 29 create table minsql as select * from a having min(col1) = col1 ; NOTE: The query requires remerging summary statistics back with the original data. NOTE: Table WORK.MINSQL created, with 1 rows and 2 columns.

30 quit ; NOTE: PROCEDURE SQL used (Total process time): real time 12.45 seconds 31 32 data min_ds ( drop = _: ) ; 33 col1 = constant ('big') ; 34 do until (eof) ; 35 set a (rename = (col1 = _col1 name = _name)) end = eof ; 36 if _col1 => col1 then continue ; 37 col1 = _col1 ; 38 name = _name ; 39 end ; 40 run ; NOTE: There were 10000000 observations read from the data set WORK.A. NOTE: The data set WORK.MIN_DS has 1 observations and 2 variables. NOTE: DATA statement used (Total process time): real time 2.93 seconds cpu time 2.92 seconds

Of course, if you have more than one satellite (NAME in this case), you will have to code more in RENAME and re-assignment statements, even though it can be easily automated. SQL saves that part, so if you do not really care about performance, SQL is the way to go.

Kind regards, ---------------- Paul M. Dorfman Jacksonville, FL ----------------

> -----Original Message----- > From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On > Behalf Of Nicole Bibb > Sent: Wednesday, July 07, 2004 8:23 PM > To: SAS-L@LISTSERV.UGA.EDU > Subject: selecting smallest rec > > I am trying to select a record based on the smallest value of > a variable. > > > proc sql errorstop; > create table goodscr as > select * > from champion2 > having min(col1); > quit; > > The data looks like > > name col1 > green 37 > blue 125 > brown 325 > > I want to keep only the record with the smallest value from col1. > > The sql is ignoring my having clause. >


Back to: Top of message | Previous page | Main SAS-L page