LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (September 2003, week 2)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Tue, 9 Sep 2003 08:50:42 -0400
Reply-To:     Michael Raithel <RAITHEM@WESTAT.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Michael Raithel <RAITHEM@WESTAT.COM>
Subject:      Re: Index performance question
Content-Type: text/plain

Dear SAS-L-ers,

Richard posted the following very interesting question:

> I've got a relatively large file (4.3 million obs - 50 > variables) with a compound index on 13 of the variables. > I've noticed that the first time a query is made the index > performs ok. If I make exactly the same query (or a very > similar query) in a later data step the index performs > "lightning fast." It's almost as if the index "remembers" > the previous invocation and uses this information to it's advantage. > > Does this make sense? > > I'm almost at the point where I'm going to perform a query in > a throw-a-way data _null_ step just to get better performance > on my later invocations. My tests seem to show a significant > overall savings in both cpu and real time if I follow this > strategy (that is, wasting a little time on a low overhead > _null_ step so I can realize big savings on "real" data steps). >

Richard, that is _MOST_ strange behavior, indeed! I would _LOVE_ to know what operating system you are running this SAS application on... Oh, and it would be great to know what release of SAS you are using, too.

When SAS opens a data set for processing, it allocates buffers for the data and index portions of that SAS data set. The number of data buffers is governed by the current value of the BUFNO= option that is in effect for the data set. The number of Index buffers is set equal to the number of levels of the Index--which is, itself, a B-Tree structure. However, a maximum of three index buffers is allocated if the data set is to be read and a maximum of four index buffers is allocated if the data set is to be updated.

After SAS is done with a particular DATA step, the buffers for the SAS data sets and SAS indexes used by the DATA step are relinquished. They exist in computer memory and the "space" that they take up is needed for buffers (and other memory constructs) for the next DATA step(s) and/or PROCs. (The exception to this would be when you use the SASFILE statement to load a SAS data set into memory and keep it there between DATA and PROC steps).

From what you describe, it is almost as if SAS is "reclaiming" the index buffers from your previous DATA steps. That is _MOST_ strange behavior, indeed! I have had a lot of experience with SAS indexes (especially on the BIG IRON), and have never noticed this, before. Perhaps you have discovered a new "performance loophole" that we may be able to exploit.

I would love to get an explanation of this phenomena from a "little birdie" if at all possible.

I hope that this suggestion proves helpful now, and in the future!

Of course, all of these opinions and insights are my own, and do not reflect those of my organization or my associates. All SAS code and/or methodologies specified in this posting are for illustrative purposes only and no warranty is stated or implied as to their accuracy or applicability. People deciding to use information in this posting do so at their own risk.

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Michael A. Raithel "The man who wrote the books on performance" E-mail: MichaelRaithel@westat.com Author: Tuning SAS Applications in the MVS Environment Author: Tuning SAS Applications in the OS/390 and z/OS Environments, Second Edition http://www.sas.com/apps/pubscat/bookdetails.jsp?catid=1&pc=58172 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. - Umberto Eco +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


Back to: Top of message | Previous page | Main SAS-L page