Hi Robin -
Don't feel alone about SQL optimization gotchas. It can be very convenient
and compact compared to equivalent datastep and procs, but like anything
that makes so many behind-the-scenes decisions, it can easily go awry. A
very good example was a recent thread on SQL with two versions of the UPCASE
function (Upper() vs. Upcase()); the SQL was logically identical, except the
"Upper()" version got its feet stuck in the mud and SQL aborted.
From: Pardee, Roy (pardee.r@GHC.ORG)
Subject: What the heck is UPPER()?
Date: 2004-06-24 13:25:02 PST
DDS Data Extraction
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Robin
Sent: Friday, August 13, 2004 1:03 AM
Subject: Re: Looking for advise on reducing run time of a SAS application
Talk about 'deja vu' !!
Have just come from a celebration - run time reduced from 18 hours plus
(never got a complete run as we ran into over night updates) down to 8
minutes 14 seconds. And this was just one step of a reasonably large
programme. We estimated that the complete run time would have been 60
hours plus .
The culprit was an SQL join which could not be optimised by SAS and
resulted in cartesian joins. The example above was joining 9,125 to 9M
. The original where clause used Upcase, Compress and Indexw
functions. We moved the order of the where clause evaluation, removed
the Indexw and replaced with CONTAINS (which required a pos processing
step to achieve the same results), removed the upcase by moving it to
the pre-process step , and the kicker was to remove the necessity of
the Cartesian join.
Original Query was:
"))=upcase(compress(d2.sname,"'-; ")) OR
Indexw(d1.dia_fullname, trim(upcase(compress(D2.sname,"'-; "))))
and d1.daydob ne d2.daybthdt
Revised query was:
/* Do the Upcase,Compress, Trim in the prior step as we subset the data */
/* so we are matching 9125 to 6M. This is not the overhead it appears */
/* as we are passing the data in any case for other reasons. */
Where ( d1.mondob=d2.monbthdt
(d1.fullname contains d2.usurname or
d2.fullname contains d1.usurname
Then a very quick SQL step to apply the INDEXW testing (which runs in
under 1/2 a second).
Do not get me wrong - SQL has its place - but sometimes it is soooo easy
to write code that gives the correct answer but is so darn inefficient.
>SAS tends to be heavily I/O bound, so usually the easiest hit is to cut
>on the I/O - don't write unnecessary datasets, don't sort unless you must
>but sort instead of indexing where you can, find ways to combine several
>steps into one, use BY processing instead of using a macro to loop through
>dataset or set of data and proc steps multiple times, use the dictionary
>tables to access metadata instead of PROC CONTENTS.
>Sometimes, things are slowed down by the screen refresh necessitated by
>writing stuff to the LOG or output window - turn off the verbosity of
>like PROC DATASETS or the MPRINT, MLOGIC, and SYMBOLGEN options - once the
>program is debugged, you probably don't need to see all that detail anyway.
>You can also try closing the LOG window, or even just minimizing the SAS
>session, or try running the application in batch mode. Turn off the viewer
>if you're using ODS, get rid of things like PROC PRINT.
>If several parts could conceivably run concurrently, break the program up
>into a few parts and run them at the same time in batch. You can overdo
>but I've seen reductions in wall time of 20% or better.
>If the system and/or the data reside on a network, things may speed up if
>you can move everything to the local hard drive.
>I've seen running time go from around 4 hours to under 4 minutes while
>achieving the same end result against the same data while running on the
>same machine when code is rewritten with an eye on minimizing I/O.
>Without knowing your task, having a nodding acquaintance with your data,
>seeing your code, it's impossible to suggest specific improvements.
>"David Fickbohm" <DavidF@HOMEGAIN.COM> wrote in message
>>I am working in an windows XP environment in SAS 8.1.
>>I have an application that runs about an hour and thirty minutes each day.
>>This application creates many sas datasets that are not needed once they
>>have been merged with another file or have been summarized or mathematics
>>has been done, etc. I have no problems with running out of space all of
>>these files are relatively small, 10k - 15k rows is probably the largest.
>>My questions is what would you people suggest to reduce run time. This is
>>NOT a major problem. The user is happy receiving his data by 9 to 9:30
>>I ask the question for three reasons:
>>1 - I would like reduce the run time. I have a feeling if I get some
>>on efficieny I can reduce the run time signifigantly.
>>2 - I would like to learn ways to write code so it is more efficient. I
>>have learned SAS more by reading manuals, trying things. asking questions,
>>etc. than sitting in classrooms.
>>3 - If there is a class or a book someone can suggest on the subject I
>>would appreciate hearing about it.
>>Data Mining Analyst
>>2450 45th St.
>>Emeryville, CA, 94608
>>Phone 510 655 0800 ext 4151
Robin & Charmaine Templer
Wellington, New Zealand
FAX : 64-4-476-4299