Date: Wed, 21 Jul 2010 15:34:19 -0400
Reply-To: Ian Whitlock <iw1sas@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Ian Whitlock <iw1sas@GMAIL.COM>
Subject: Re: In search of a more efficient program
Andy,
You have gotten some answers to your question, but none have addressed
the problem.
Most of the time, the only significant gains in a program's efficiency
come from improving the algorithm. There is no magic, just hard work.
In SAS, passing the data is usually what causes long runs. So look for
ways to eliminate steps, perhaps by combining them. Also look for ways
to arrange the hardware for more efficient I/O.
Unfortunately, there is little real advice to give without more knowledge
of what you are doing.
I would begin with a list of steps and the time it takes each to execute.
Then I would try to summarize what each step does and see what may be
run in parallel or combined. Then rethink the solution.
A story from the early days of computing that made an impression on me
explained the difference between a coder and a computer scientist. The
problem was to sort a thousand decks of cards (on really slow computers).
The coder wrote a bubble sort to sort one deck and then applied it to
each deck. The computer scientist made 52 piles, putting each card on
its pile, and then assembled the 1000 decks from the piles.
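The two approaches in the story can be sketched roughly as follows. This is
only an illustration in Python, not anyone's original program: the card
encoding, the target order, and the function names are all made up for the
example, and it assumes every deck is a complete 52-card deck, so cards with
the same face are interchangeable between decks.

```python
import random

SUITS = "CDHS"
RANKS = "A23456789TJQK"
# Hypothetical card encoding: the 52 faces listed in the desired sorted order.
ORDER = [rank + suit for suit in SUITS for rank in RANKS]


def bubble_sort_deck(deck):
    """The coder's way: bubble sort a single deck (quadratic in deck size)."""
    position = {face: i for i, face in enumerate(ORDER)}
    cards = list(deck)
    for end in range(len(cards) - 1, 0, -1):
        for i in range(end):
            if position[cards[i]] > position[cards[i + 1]]:
                cards[i], cards[i + 1] = cards[i + 1], cards[i]
    return cards


def sort_decks_by_piles(decks):
    """The computer scientist's way: 52 piles, then rebuild the decks."""
    piles = {face: [] for face in ORDER}
    for deck in decks:
        for card in deck:
            piles[card].append(card)  # one look at each card, no comparisons
    # Take one card from each pile, in order, to assemble each sorted deck.
    return [[piles[face].pop() for face in ORDER] for _ in decks]


def shuffled_decks(n, seed=0):
    """Build n shuffled decks for trying out the two approaches."""
    rng = random.Random(seed)
    decks = []
    for _ in range(n):
        deck = list(ORDER)
        rng.shuffle(deck)
        decks.append(deck)
    return decks
```

With decks = shuffled_decks(1000), both sort_decks_by_piles(decks) and
[bubble_sort_deck(d) for d in decks] produce the same sorted decks, but the
pile version handles each card exactly once, while the bubble sort compares
cards over a thousand times per deck.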
If you are lucky you might see an easy combination of steps or be able to
drastically improve a few steps. If you are unlucky you may have to look
at the whole algorithm and try a completely different approach to
the problem. If you really are unlucky there may be little to improve.
The good news is that the speed of the hardware is rising rapidly.
If you still want help you might outline what your program does for SAS-L
and hope that someone might say, "Oh, usually it is better to do ..."
Ian Whitlock
===============
Date: Wed, 21 Jul 2010 07:05:30 -0400
From: Andy Arnold <awasas@COX.NET>
Subject: In search of a more efficient program
Background:
I've inherited a large, complex SAS program. Most files are quite small;
however, some are extremely large and have become a problem.
The files in question have 12-24 fields that are mostly numeric with a few
short (1-4 character) fields. The files that are killing me have 20M records
and one has 160M records.
The files don't use SAS compression because the records are too short to
make compression cost & time effective.
Problem 1:
What are the trade-offs if I force numeric length to 4 or 2? Does SAS
always use a NUM 8 format internally? If I force short numeric field
lengths, will SAS have to convert them up to length=8 and back down again in
order to use the data?
Problem 2:
What are the trade-offs if I use an index instead of a sort? The problem
sort takes an hour, uses a single numeric key with about 10 distinct values,
sorts 20M records that are about 100 bytes long. I've been successful with
indexing before; in that situation, I needed to sort the file in 15
different sequences.
Thanks for your input and advice.
--Andy