LISTSERV at the University of Georgia
Date:         Wed, 12 Dec 2001 12:00:47 -0500
Reply-To:     "Dorfman, Paul" <Paul.Dorfman@BCBSFL.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Dorfman, Paul" <Paul.Dorfman@BCBSFL.COM>
Subject:      Re: SASFILE efficiency?
Comments: To: "Vyverman, Koen" <koen.vyverman@FID-INTL.COM>
Content-Type: text/plain; charset=iso-8859-1

Koen,

Feeling rather enthused right after the introduction of SASFILE, I did some experimenting with it. My conclusion is that the only area of performance where SASFILE really shines is direct-access reading using POINT=. You know that without SASFILE, reading a file with POINT= in observation-number sequence is one thing, while doing the same randomly or backwards is, performance-wise, a totally different story. Because the file, once loaded, needs no rebuffering, the speed of a SET POINT= read becomes independent of the read sequence.
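A minimal sketch of the kind of test described above, assuming a hypothetical data set WORK.BIG (the name, the seed, and the iteration count are illustrative, not taken from the original experiments). Running the DATA step once with and once without the surrounding SASFILE statements would let one compare random-order POINT= reads in both situations:

```sas
/* Hypothetical data set name; substitute a real one. */
sasfile work.big load;            /* read the whole file into memory buffers */

data _null_;
  /* 1,000,000 direct-access reads in random order */
  do i = 1 to 1e6;
    p = ceil(ranuni(1) * n);      /* random observation number in 1..n */
    set work.big point=p nobs=n;  /* NOBS= is set at compile time, so n is known here */
  end;
  stop;                           /* POINT= never hits end-of-file; stop explicitly */
run;

sasfile work.big close;           /* release the buffers */
```

Replacing the random pointer with `p = i` (and limiting the loop to n) gives the in-sequence baseline for comparison.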

Beyond that, I have found no performance advantages in using SASFILE. Being ignorant of the exact reason why, I can only conjecture. Let us consider reading a file sequentially. Without SASFILE, many records are loaded from disk into the buffer at once (at Rate1), and then transferred from the buffer memory to the operating memory one by one (at Rate2). The overall performance then depends on the amount of data lifted into the buffer at once (effectively, bufsize) and the balance between Rate1 and Rate2. It is commonplace to think that Rate2 is much higher than Rate1. Judging from the profound effect selective reading with the trailing @ has on performance (effectively cutting down on the Rate2 operations), I am not so sure Rate2 is *much* higher than Rate1. Besides, given a generous bufsize, the operations at Rate1 may occur so infrequently that they get kind of lost in the whole balance.
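The conjecture above can be put into a rough cost model. The symbols are assumptions introduced here for illustration only: N is the number of records, B the number of records that fit in one buffer, t_1 the time of one disk-to-buffer load (a Rate1 operation), t_2 the time to move one record from the buffer to operating memory (a Rate2 operation), and T_load the one-time cost of the SASFILE load.

```latex
\begin{align*}
T_{\text{disk}}    &\approx \left\lceil \frac{N}{B} \right\rceil t_1 + N\,t_2
                   && \text{(sequential pass without SASFILE)} \\
T_{\text{SASFILE}} &\approx N\,t_2
                   && \text{(each pass after the load)} \\
T_{\text{disk}} - T_{\text{SASFILE}} &\approx \left\lceil \frac{N}{B} \right\rceil t_1
                   && \text{(saving per pass, to be set against } T_{\text{load}}\text{)}
\end{align*}
```

Under this model the saving per pass shrinks as B grows, and it is small relative to the whole whenever N t_2 dominates, which is consistent with the suspicion that Rate2 is not much higher than Rate1.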

I doubt that SAS leaves the index on disk while lifting the data into memory. I would be rather inclined to think that, in the light of the conjecture above, using the index cuts down primarily on Rate2 operations, which are the only operations performed on the SASFILE-loaded data set, and the most frequent operations performed on a disk-resident file. That may explain why index processing does not occur noticeably faster with SASFILE than without it.

That having been said, I did observe a slight improvement from using SASFILE when I ran my tests, but my test files were not as large as yours, so the load time (requiring one full Rate1 operation) was practically negligible. And even if it had not been, I would probably have failed to notice, because I was using the SASFILE statement in the form

SASFILE stuff LOAD ;

so that the step statistics would not have shown the load time, anyway. In your case, the load time might well be significant.
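One way to make the load time visible, sketched here as an assumption rather than as the method used in the tests above: the OPEN form of SASFILE allocates the buffers but defers the actual read until a step first references the file, so the load cost shows up in that step's statistics.

```sas
options fullstimer;          /* request detailed step statistics in the log */

sasfile stuff open;          /* allocate buffers; data is not read yet */

data _null_;                 /* first reference triggers the load, so the  */
  set stuff;                 /* log for this step includes the load time   */
run;

data _null_;                 /* second pass reads from memory only; compare */
  set stuff;                 /* its log entry with the one above            */
run;

sasfile stuff close;         /* free the buffers */
```

The difference between the two step timings gives a rough estimate of what the one-time load costs for a file of a given size.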

From what I understand (or not) about SASFILE now, its main usefulness lies in the applications where an external in-memory index (hash, key-index, bitmap, format) is maintained with pointers to the observation numbers as satellite information (rather than loading all the satellites in parallel arrays). Then if the file POINTed to has been loaded by the SASFILE statement beforehand and is thus buffer-resident, one can expect really juicy performance benefits.
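A minimal, untested sketch of that pattern, using key-indexing as the in-memory index. All names are hypothetical: WORK.BIG holds an integer variable KEY with unique values in 1..100000 plus satellite variables, and WORK.TRANS holds the KEY values to look up. The index stores only observation numbers; the satellites are fetched with POINT= from the memory-resident file.

```sas
sasfile work.big load;                 /* make BIG buffer-resident first */

data matched;
  /* Key-index table: element k holds the observation number of KEY=k in BIG. */
  /* Assumes every KEY is an integer in 1..100000 (otherwise the subscript    */
  /* is out of range and the step fails).                                     */
  array rid (1:100000) _temporary_;

  /* Build the index once, reading only KEY */
  if _n_ = 1 then do p = 1 to n_big;
    set work.big (keep=key) point=p nobs=n_big;
    rid(key) = p;
  end;

  /* For each request, look up the observation number and fetch the satellites */
  set work.trans;
  p = rid(key);
  if p then do;                        /* missing p means KEY is not in BIG */
    set work.big point=p;
    output;
  end;
run;

sasfile work.big close;
```

The design choice is to trade memory for a single copy of the satellites: the array costs one number per possible key, while the satellite data lives only in the SASFILE buffers instead of in parallel arrays.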

Kind regards,
==================
Paul M. Dorfman
Jacksonville, FL
==================

> -----Original Message-----
> From: Vyverman, Koen [mailto:koen.vyverman@FID-INTL.COM]
> Sent: Wednesday, December 12, 2001 9:00 AM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: SASFILE efficiency?
>
>
> LS,
>
> I would be interested to learn whether anyone has adopted the
> SASFILE statement and noted a significant reduction in program
> execution time ...
>
> As it is, I'm having a 100MB indexed SAS data set here, and
> a reporting macro crunching its way through it by means of
> data steps with subsetting WHERE statements.
>
> Encouraged by what I read about SASFILE, I decided to try
> the following:
> sasfile dataset load;
> %report(...)
> sasfile dataset close;
>
> And see what happens: nothing much. In fact, whereas my %report
> used to take about an hour to run, with the SASFILE statements
> it takes on the average 25% _longer_!
>
> My set-up here is SAS8.2 on WinNT4.0 (SP6), ultra-wide SCSI
> hard disk with lots of space, 512MB of RAM. Using the
> performance monitor, I can see that upon loading the dataset into
> memory, the expected amount of RAM is being eaten away, so
> that part at least works as advertized.
>
> Would it be unreasonable to suspect that the SAS index file
> is actually _not_ being memorized along with the data set,
> thereby still necessitating physical disk-reads of said index,
> as opposed to the supposedly faster memory access?
>
> But even then, I fail to comprehend why the process would
> overall take longer to run, unless my box here uses some
> sort of frighteningly slow RAM ...
>
> Any input/feedback appreciated,
> Koen.
>
> ---------------------------------
> Koen Vyverman
> Database Marketing Manager
> Fidelity Investments - Luxembourg
> ---------------------------------

Blue Cross Blue Shield of Florida, Inc., and its subsidiary and affiliate companies are not responsible for errors or omissions in this e-mail message. Any personal comments made in this e-mail do not reflect the views of Blue Cross Blue Shield of Florida, Inc.

