Date: Wed, 12 Jan 2000 12:40:28 -0500
Reply-To: Douglas Dame <dougdame@HPE.UFL.EDU>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Douglas Dame <dougdame@HPE.UFL.EDU>
Subject: Re: Anyone with a macro to split file (w a by statement)?
In-Reply-To: <s87c663a.064@ita.doc.gov>
Content-Type: text/plain; charset="us-ascii"
Howard Schreier wrote:
>
> I would try to avoid running any SAS step or
> series of steps that many times.
>
> <snip snip>
>
> You did not give a lot of particulars, and perhaps something
> rules out this approach. But it's always worth a look before
> getting into something messy.
>
In the words of (?) Oscar Wilde, I wish I had said that.
While running through a bunch of stuff repetitively in a macro-powered loop
sometimes works well, bear in mind that the CPU (and clock) time associated
with running any proc or data step has two main parts: a more-or-less linear
variable component associated with the size of the dataset you're processing,
and a more-or-less fixed up-front "overhead" cost of loading the proc into
memory so it's ready for use. (There's also some hardware-related latency
associated with the disk drive getting its head/s to the data, but I'll
ignore that; I try to live way above the hardware level.)
The cumulative time it takes to invoke proc summary, as an example, 1700+
times is not inconsequential by any means. At a guess, say the overhead cost
is 0.7 CPU seconds per invocation. The extra overhead needed to invoke that
proc alone 1700 times, instead of once on a much larger dataset, is 19.8
MINUTES, with exactly the same amount of data being processed. Throw a few
more procs or data steps of various kinds into the loop, and you're easily
looking at an hour or two of additional CPU time being chewed up. Elapsed
wall-time would depend on your computing environment: maybe 1.2 to 1.5 times
the CPU seconds on a one-person workstation, perhaps as bad as 10 times as
much for a medium/low priority job in a busy batch mainframe environment
where it gets swapped out a lot for fairly extended periods.
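
The arithmetic behind that estimate fits in a few lines. What follows is
only a back-of-envelope model, written in Python for convenience rather than
SAS. The 0.7 CPU seconds per invocation is the guess from above, the
four-steps-in-the-loop case is an assumption added for illustration, and the
function name is made up.

```python
def extra_overhead_minutes(invocations, overhead_cpu_sec):
    """Fixed start-up cost paid by running a step `invocations` times
    instead of once; the data-dependent cost is the same either way."""
    extra_runs = invocations - 1  # the single big run still pays the overhead once
    return extra_runs * overhead_cpu_sec / 60.0


# The guess from the text: 1700 invocations at about 0.7 CPU seconds each.
one_proc = extra_overhead_minutes(1700, 0.7)
print(f"one proc in the loop:   {one_proc:.1f} CPU minutes of overhead")

# Assumption: four steps in the loop, each with a similar start-up cost.
four_steps = 4 * one_proc
print(f"four steps in the loop: {four_steps / 60:.1f} CPU hours of overhead")
```

Note that the size of the dataset never enters the calculation: the penalty
scales with the number of invocations, not with the amount of data.
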
If you're forced to loop through some section of code 100's or 1000's of
times to deal individually with subsets of your data, it pays to think hard
about how much of your pre-processing can be accomplished PRIOR to
MrMacroLoop, using "by-group" processing.
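
The difference between the two styles can be sketched outside SAS as well.
The following is a Python analogy, not SAS code, and the record layout and
function names are invented for illustration. The first function sorts once
and then summarizes every group in a single pass, which is roughly what a BY
statement does for you; the second re-reads the whole dataset once per
group, the way MrMacroLoop does.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical records: (group id, amount).
records = [("B", 4.0), ("A", 1.0), ("C", 10.0), ("A", 3.0), ("B", 6.0)]


def summarize_by_group(rows):
    """Sort once, then walk the data once, producing one mean per group."""
    ordered = sorted(rows, key=itemgetter(0))  # plays the role of PROC SORT
    means = {}
    for key, members in groupby(ordered, key=itemgetter(0)):
        amounts = [amount for _, amount in members]
        means[key] = sum(amounts) / len(amounts)
    return means


def summarize_by_loop(rows):
    """The macro-loop style: one full pass over the data per group."""
    means = {}
    for key in {k for k, _ in rows}:
        amounts = [amount for k, amount in rows if k == key]
        means[key] = sum(amounts) / len(amounts)
    return means


print(summarize_by_group(records))
```

Both functions return the same answers. What differs is how many times the
data gets read and, in the SAS case, how many times a step has to be started
up.
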
As an extreme but real example, I once re-engineered the macro-loops in a
production job that was generating complaints due to a long run time. By
moving pre-processing to by-groups, and leaving only the bare minimum of
stuff in MrMacroLoop that absolutely had to be there, a 6-hour-plus
run-time was reduced to 9 minutes. (Following that, my reputation was
sterling for the rest of the day. Ah, those were the good days.)
(The floor will now entertain discussion on today's debate topic: "SAS macro
do/end loops are available to the community of SAS programmers due to a
diabolical long-standing conspiracy between Mr. Watson, Mr. Grove, Mr.
Gates, Dr. Goodnight, and the National Association of Public Utilities, with
the express secret objective of increasing sales of computer hardware and
usage of electricity; True or False?" </little_joke>)
HTH/somebody/someplace/sometime
Douglas Dame
Shands HealthCare
Gainesville FL