| Date: | Thu, 8 Jul 2004 00:34:03 -0400 |
| Reply-To: | sashole@bellsouth.net |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | "Paul M. Dorfman" <sashole@BELLSOUTH.NET> |
| Organization: | Sashole of Florida |
| Subject: | Re: selecting smallest rec |
|
| In-Reply-To: | <200407080023.i680NH828791@listserv.cc.uga.edu> |
| Content-Type: | text/plain; charset="us-ascii" |
Nicole,
You have got sage advices regarding SQL, but beware that when you do this in
the Kevin-corrected form,
having col1=min(col1),
SQL will get busy remerging the statistic it has computed with the original
file, even though the output is to contain a single record. The SQL
optimizer here is, well... not optimized enough so far. Instead of (1)
merrily going record by record computing MIN, memorizing the respective
satellite variables in the record, and finally outputting the corresponding
values at end, SQL (2) goes through the file once, computes the statistic,
then does the second pass to match the minimal value with the original file.
SUMMARY will run 20 times faster, but it cannot grab the satellites right
with the ID statement. Another choice, the Data step, is much faster still,
as evidenced from
21 data a ;
22 do name = 1 to 1e7 ;
23 col1 = ceil (ranuni (1) * 1e10) ;
24 output ;
25 end ;
26 run ;
NOTE: The data set WORK.A has 10000000 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 6.98 seconds
27
28 proc sql ;
29 create table minsql as select * from a having min(col1) = col1 ;
NOTE: The query requires remerging summary statistics back with the original
data.
NOTE: Table WORK.MINSQL created, with 1 rows and 2 columns.
30 quit ;
NOTE: PROCEDURE SQL used (Total process time):
real time 12.45 seconds
31
32 data min_ds ( drop = _: ) ;
33 col1 = constant ('big') ;
34 do until (eof) ;
35 set a (rename = (col1 = _col1 name = _name)) end = eof ;
36 if _col1 => col1 then continue ;
37 col1 = _col1 ;
38 name = _name ;
39 end ;
40 run ;
NOTE: There were 10000000 observations read from the data set WORK.A.
NOTE: The data set WORK.MIN_DS has 1 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 2.93 seconds
cpu time 2.92 seconds
Of course, if you have more than one satellite (NAME in this case), you will
have to code more in RENAME and re-assignment statements, even though it can
be easily automated. SQL saves that part, so if you do not really care about
performance, SQL is the way to go.
Kind regards,
----------------
Paul M. Dorfman
Jacksonville, FL
----------------
> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On
> Behalf Of Nicole Bibb
> Sent: Wednesday, July 07, 2004 8:23 PM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: selecting smallest rec
>
> I am trying to select a record based on the smallest value of
> a variable.
>
>
> proc sql errorstop;
> create table goodscr as
> select *
> from champion2
> having min(col1);
> quit;
>
> The data looks like
>
> name col1
> green 37
> blue 125
> brown 325
>
> I want to keep only the record with the smallest value from col1.
>
> The sql is ignoring my having clause.
>
|