LISTSERV at the University of Georgia
Date:   Mon, 28 Feb 2005 12:27:02 -0500
Reply-To:   Michael Raithel <michaelraithel@WESTAT.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Michael Raithel <michaelraithel@WESTAT.COM>
Subject:   Re: Creating dynamic Mainframe dataset using libname
Content-Type:   text/plain; charset="us-ascii"

Dear SAS-L-ers,

In an offshoot of this thread discussing optimal mainframe blocksizes, Larry Bertolini posted, in part, the following:

> But I wonder if the smaller blocksize might not be more
> efficient when performing random access to a dataset; e.g.,
> using POINT=, or KEY= on an indexed dataset. Why move around
> 28K of data per I/O, if you're only after a single, 1K
> observation, and if the odds of needing another observation
> from the same block are very low?
>
> I haven't benchmarked this scenario, but I'd expect random
> access to an indexed SAS dataset, with KEY=foo / UNIQUE, to
> behave similarly to random access to a VSAM KSDS, where a
> moderate-sized block (control interval, in VSAM-ese) of 4K is typical.

<<Larry's entire posting may be found beneath the Sig line.>>

Larry, I know what you mean. That argument has been around for quite a while and is well accepted in many circles. I have never been comfortable accepting it, though, whether I am writing a program to access VSAM files or one to access SAS data sets via an index.

My reasoning is that if:

1. you allocate a substantial number of buffers in memory,
2. you lug large data set blocks into those buffers with every random access, and
3. the DBMS refreshes its buffers with a Least Recently Used (LRU) mechanism,

... then you increase the probability that the next observation you need is already sitting in memory, so that an I/O event can be avoided.

The smaller the SAS data set and the greater the number of buffers allocated, the greater the probability should be that the SAS data set page you need is already in memory. Conversely, the larger the data set and the fewer the buffers you can allocate, the less likely it is that a needed SAS page is already in memory. However, I have to believe that the probability is still greater with bigger block sizes than with smaller ones.
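To make the buffer argument concrete, here is a toy Python simulation. It is not SAS's actual buffer manager; it assumes uniformly random keyed reads, a fixed number of page buffers evicted in least-recently-used order, and that observation n lives on page n // obs_per_page (fixed-length observations stored in order). LRUBufferPool and hit_rate are hypothetical names for illustration. The page sizes of about 4 and 28 observations stand in for roughly 4K and 28K blocks of 1K observations. Note that it measures only the fraction of reads served from memory; it does not model the larger transfer cost of each miss with bigger blocks, which is Larry's side of the trade-off.

```python
import random
from collections import OrderedDict


class LRUBufferPool:
    """Toy buffer pool that evicts the least recently used page (hypothetical model)."""

    def __init__(self, n_buffers):
        self.n_buffers = n_buffers
        self.pages = OrderedDict()  # page number -> None, ordered oldest to newest use
        self.hits = 0
        self.misses = 0

    def read_page(self, page):
        if page in self.pages:
            self.hits += 1
            self.pages.move_to_end(page)  # mark page as most recently used
        else:
            self.misses += 1  # stands in for a physical I/O
            if len(self.pages) >= self.n_buffers:
                self.pages.popitem(last=False)  # evict the least recently used page
            self.pages[page] = None


def hit_rate(keys, obs_per_page, n_buffers):
    """Fraction of keyed reads satisfied from the buffer pool without an I/O."""
    pool = LRUBufferPool(n_buffers)
    for key in keys:
        pool.read_page(key // obs_per_page)  # observation number -> page number
    return pool.hits / (pool.hits + pool.misses)


if __name__ == "__main__":
    rng = random.Random(42)
    for n_obs in (20_000, 100_000):  # a smaller and a larger data set
        keys = [rng.randrange(n_obs) for _ in range(50_000)]  # random access pattern
        for obs_per_page in (4, 28):  # roughly 4K vs 28K blocks of 1K observations
            for n_buffers in (10, 100, 1000):
                rate = hit_rate(keys, obs_per_page, n_buffers)
                print(f"obs={n_obs:6d} obs/page={obs_per_page:2d} "
                      f"buffers={n_buffers:4d} hit rate={rate:.3f}")
```

Under these assumptions, the hit rate rises with bigger pages, with more buffers, and with a smaller data set, which is the direction argued above; real results depend on the access pattern and on how SAS actually manages its buffers.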

Now, as an aside, I have always cheated and sorted my transaction data set by key variable value. This increases the chances that the next index search will turn up an observation on the same page as the last. But, hey, don't tell anybody; I don't want to be known as a cheater!
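The effect of sorting the transactions by key can be sketched the same way. This minimal Python example assumes, hypothetically, that observation n sits on page n // obs_per_page and ignores the index pages themselves; same_page_fraction is an illustrative name. It counts how often a keyed read lands on the same page as the read just before it, which is the situation described above (effectively a pool of one buffer).

```python
import random


def same_page_fraction(keys, obs_per_page):
    """Fraction of keyed reads whose page equals the page of the previous read."""
    pages = [key // obs_per_page for key in keys]  # observation number -> page number
    repeats = sum(1 for prev, cur in zip(pages, pages[1:]) if prev == cur)
    return repeats / (len(pages) - 1)


if __name__ == "__main__":
    rng = random.Random(7)
    n_obs, obs_per_page = 100_000, 28
    transactions = [rng.randrange(n_obs) for _ in range(20_000)]  # unsorted keys
    print("unsorted:", round(same_page_fraction(transactions, obs_per_page), 3))
    print("sorted:  ", round(same_page_fraction(sorted(transactions), obs_per_page), 3))
```

With random keys, consecutive reads almost never share a page; after sorting, consecutive keys that fall on the same page are read back to back, so most of those reads can be served from the page already in memory.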

Larry, thanks for the interesting discussion point!

I hope that this suggestion proves helpful now, and in the future!

Of course, all of these opinions and insights are my own, and do not reflect those of my organization or my associates. All SAS code and/or methodologies specified in this posting are for illustrative purposes only and no warranty is stated or implied as to their accuracy or applicability. People deciding to use information in this posting do so at their own risk.

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Michael A. Raithel
"The man who wrote the book on performance"
E-mail: MichaelRaithel@westat.com
Author: Tuning SAS Applications in the MVS Environment
Author: Tuning SAS Applications in the OS/390 and z/OS Environments, Second Edition
http://www.sas.com/apps/pubscat/bookdetails.jsp?catid=1&pc=58172

Currently Writing: The Complete Guide to Creating and Using SAS Indexes (due Summer 2005)

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
The cure for boredom is curiosity. There is no cure for curiosity. - Dorothy Parker
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

<<Here is Larry's original posting>>

> I agree, half-track LRECL and BLKSIZE are preferred in almost
> all situations. It is more space-efficient in all cases, and
> is certainly more CPU and I/O efficient for sequential access
> to datasets (I'd guess that somewhere between 95% and 99.5%
> of all SAS code that gets executed is basically sequential in nature).
>
> But I wonder if the smaller blocksize might not be more
> efficient when performing random access to a dataset; e.g.,
> using POINT=, or KEY= on an indexed dataset. Why move around
> 28K of data per I/O, if you're only after a single, 1K
> observation, and if the odds of needing another observation
> from the same block are very low?
>
> I haven't benchmarked this scenario, but I'd expect random
> access to an indexed SAS dataset, with KEY=foo / UNIQUE, to
> behave similarly to random access to a VSAM KSDS, where a
> moderate-sized block (control interval, in VSAM-ese) of 4K is typical.
>
> (I suspect that if you can use SASFILE to keep the entire SAS
> dataset in cache, the blocksize, as it relates to random I/O
> performance, is probably irrelevant.)

