Date: Tue, 9 Sep 2003 08:50:42 -0400
Reply-To: Michael Raithel <RAITHEM@WESTAT.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Michael Raithel <RAITHEM@WESTAT.COM>
Subject: Re: Index performance question
Richard posted the following very interesting question:
> I've got a relatively large file (4.3 million obs - 50
> variables) with a compound index on 13 of the variables.
> I've noticed that the first time a query is made the index
> performs ok. If I make exactly the same query (or a very
> similar query) in a later data step the index performs
> "lightning fast." It's almost as if the index "remembers"
> the previous invocation and uses this information to it's advantage.
> Does this make sense?
> I'm almost at the point where I'm going to perform a query in
> a throw-a-way data _null_ step just to get better performance
> on my later invocations. My tests seem to show a significant
> overall savings in both cpu and real time if I follow this
> strategy (that is, wasting a little time on a low overhead
> _null_ step so I can realize big savings on "real" data steps).
Richard, that is _MOST_ strange behavior, indeed! I would _LOVE_ to know
what operating system you are running this SAS application on... Oh, and it
would be great to know what release of SAS you are using, too.
When SAS opens a data set for processing, it allocates buffers for the data
and index portions of that SAS data set. The number of data buffers is
governed by the current value of the BUFNO= option that is in effect for the
data set. The number of Index buffers is set equal to the number of levels
of the Index--which is, itself, a B-Tree structure. However, a maximum of
three index buffers is allocated if the data set is to be read and a maximum
of four index buffers is allocated if the data set is to be updated.
After SAS is done with a particular DATA step, the buffers for the SAS data
sets and SAS indexes used by the DATA step are relinquished. They exist in
computer memory and the "space" that they take up is needed for buffers (and
other memory constructs) for the next DATA step(s) and/or PROCs. (The
exception to this would be when you use the SASFILE statement to load a SAS
data set into memory and keep it there between DATA and PROC steps).
From what you describe, it is almost as if SAS is "reclaiming" the index
buffers from your previous DATA steps. That is _MOST_ strange behavior,
indeed! I have had a lot of experience with SAS indexes (especially on the
BIG IRON), and have never noticed this, before. Perhaps you have discovered
a new "performance loophole" that we may be able to exploit.
I would love to get an explanation of this phenomena from a "little birdie"
if at all possible.
I hope that this suggestion proves helpful now, and in the future!
Of course, all of these opinions and insights are my own, and do not reflect
those of my organization or my associates. All SAS code and/or methodologies
specified in this posting are for illustrative purposes only and no warranty
is stated or implied as to their accuracy or applicability. People deciding
to use information in this posting do so at their own risk.
Michael A. Raithel
"The man who wrote the books on performance"
Author: Tuning SAS Applications in the MVS Environment
Author: Tuning SAS Applications in the OS/390 and z/OS Environments, Second
I have come to believe that the whole world is an enigma, a
harmless enigma that is made terrible by our own mad attempt
to interpret it as though it had an underlying truth.
- Umberto Eco