Date: Wed, 12 Dec 2001 12:00:47 -0500
Reply-To: "Dorfman, Paul" <Paul.Dorfman@BCBSFL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Dorfman, Paul" <Paul.Dorfman@BCBSFL.COM>
Subject: Re: SASFILE efficiency?
Feeling rather enthused right after SASFILE was introduced, I did some
experimentation with it. My conclusion was that the only area of performance
where SASFILE shines is direct-access reading with POINT=. As you know,
without SASFILE, reading a file with POINT= in observation-number order is one
thing, while doing the same randomly or backwards is, performance-wise, a
totally different story. Because a loaded file needs no rebuffering, the speed
of SET POINT= reads becomes independent of the read sequence.
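A minimal sketch of that kind of test, assuming a hypothetical data set WORK.STUFF (its name and size are made up for illustration). It loads the file, reads it backwards and then in random order via POINT=, and relies on FULLSTIMER for the timings:

```sas
options fullstimer; /* detailed step statistics in the log */

/* Hypothetical test data set; the name and size are assumptions */
data work.stuff;
  do id = 1 to 100000;
    value = ranuni(1);
    output;
  end;
run;

/* Read the whole file into memory buffers once */
sasfile work.stuff load;

/* Read backwards by observation number */
data work.backwards;
  do p = n to 1 by -1;
    set work.stuff point=p nobs=n;
    output;
  end;
  stop; /* POINT= never signals end of file, so STOP is required */
run;

/* Read in random order (with replacement) */
data work.random;
  do i = 1 to n;
    p = ceil(n * ranuni(7));
    set work.stuff point=p nobs=n;
    output;
  end;
  stop;
  drop i;
run;

/* Release the buffers */
sasfile work.stuff close;
```

Running the two read steps once with and once without the SASFILE statements, and comparing the real times in the log, is one way to check this on a given machine.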
Beyond that, I have found no performance advantage in using SASFILE. I do not
know exactly why, so I can only conjecture. Let us consider reading a file
sequentially. Without SASFILE, many records are loaded from disk into the
buffer at once (at Rate1), and then transferred from the buffer memory to the
operating memory one by one (at Rate2). Overall performance then depends on
the amount of data lifted into the buffer at once (effectively, BUFSIZE) and
on the balance between Rate1 and Rate2. It is commonly assumed that Rate2 is
much higher than Rate1. Judging from the profound effect that selective
reading with the trailing @ has on performance (it effectively cuts down on
the Rate2 operations), I am not so sure Rate2 is *much* higher than Rate1.
Besides, given a generous BUFSIZE, the Rate1 operations may occur so
infrequently that they get lost in the overall balance.
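For reference, the selective reading with the trailing @ meant above looks like this. It is a small sketch on hypothetical inline records (record type in column 1, an amount after it); the data set name WORK.TYPE_A is made up. The record is held after reading only the type code, and the rest is parsed only for the records that are kept:

```sas
/* Hypothetical raw records: a record type in column 1, then an amount */
data work.type_a;
  input type $1. @;            /* read only the type code and hold the record */
  if type ne 'A' then delete;  /* discard unwanted records before parsing further */
  input amount;                /* parse the remaining field for kept records only */
  datalines;
A 100
B 200
A 300
;
run;
```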
I doubt that SAS leaves the index on disk while lifting the data into memory.
I would rather be inclined to think that, in light of the conjecture above,
using the index cuts down primarily on Rate2 operations, which are the only
operations performed on a SASFILE-loaded data set and the most frequent
operations performed on a disk-resident file. That may explain why index
processing is not noticeably faster with SASFILE than without.
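To see whether the index is used at all against a loaded file, one could set MSGLEVEL=I, which makes SAS note in the log when an index is selected for WHERE processing. A minimal sketch, assuming a hypothetical indexed data set WORK.BIG with a numeric key KEY (all names and sizes are illustrative):

```sas
options msglevel=i fullstimer; /* MSGLEVEL=I notes index usage in the log */

/* Hypothetical indexed data set; name, size, and key are assumptions */
data work.big(index=(key));
  do key = 1 to 200000;
    amount = mod(key, 97);
    output;
  end;
run;

sasfile work.big load;

/* Subsetting WHERE that SAS may satisfy via the index */
data work.subset;
  set work.big;
  where key between 1000 and 1999;
run;

sasfile work.big close;
```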
That having been said, I did observe a slight improvement from using SASFILE
when I ran my tests, but my test files were not as large as yours, so the load
time (requiring one full Rate1 operation) was practically negligible. And even
if it had not been, I would probably have failed to notice, because I was
using the SASFILE statement in the form
SASFILE stuff LOAD ;
so the step statistics would not have shown the load time anyway. In your
case, the load time might be significant.
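Since the step statistics do not cover the LOAD itself, one way to see its cost is to bracket the statement with DATETIME() readings. A minimal sketch, assuming a hypothetical data set WORK.STUFF (name and size are illustrative):

```sas
/* Hypothetical data set to load; name and size are assumptions */
data work.stuff;
  do id = 1 to 100000;
    value = ranuni(1);
    output;
  end;
run;

/* Time the LOAD, which no step's statistics will include */
%let t0 = %sysfunc(datetime(), 18.3);
sasfile work.stuff load;
%let t1 = %sysfunc(datetime(), 18.3);
%put NOTE: SASFILE LOAD of WORK.STUFF took %sysevalf(&t1 - &t0) seconds.;

/* ... steps reading WORK.STUFF would go here ... */

sasfile work.stuff close;
```

If I read the documentation correctly, SASFILE also accepts OPEN instead of LOAD, which defers reading the data until the first step that uses the file, so the load cost should then show up in that step's statistics.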
From what I understand (or not) about SASFILE now, its main usefulness lies
in applications where an external in-memory index (hash, key-index, bitmap,
format) is maintained with pointers to observation numbers as satellite
information (rather than loading all the satellites into parallel arrays).
Then, if the file being POINTed to has been loaded by the SASFILE statement
beforehand and is thus buffer-resident, one can expect really juicy
performance benefits.
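A minimal sketch of that pattern using key-indexing (one of the index types listed), assuming hypothetical data sets WORK.SAT (satellite file, integer keys 1 to 50000) and WORK.DRV (keys to look up). Only observation numbers are kept in the temporary array; the satellite records themselves are fetched with POINT= from the buffer-resident file:

```sas
/* Hypothetical satellite file: one record per KEY, keys 1 to 50000 */
data work.sat;
  do key = 1 to 50000;
    descr = put(key, z8.);
    output;
  end;
run;

/* Hypothetical driver file with keys to look up (777777 has no match) */
data work.drv;
  do key = 10, 49999, 123, 777777;
    output;
  end;
run;

/* Make the satellite file buffer-resident */
sasfile work.sat load;

data work.matched;
  /* Key-indexed table: element KEY holds the observation number of KEY */
  array ptr [50000] _temporary_;
  if _n_ = 1 then do i = 1 to nsat;
    set work.sat(keep=key) point=i nobs=nsat;
    ptr[key] = i;
  end;
  set work.drv;
  if 1 <= key <= dim(ptr) then p = ptr[key];
  else p = .;
  if p > 0 then do;
    set work.sat point=p; /* direct read from the memory-resident file */
    output;
  end;
run;

sasfile work.sat close;
```

The array holds only pointers, so its memory cost stays small no matter how wide the satellite records are; the wide records are read on demand from the loaded file.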
Paul M. Dorfman
> -----Original Message-----
> From: Vyverman, Koen [mailto:koen.vyverman@FID-INTL.COM]
> Sent: Wednesday, December 12, 2001 9:00 AM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: SASFILE efficiency?
> I would be interested to learn whether anyone has adopted the
> SASFILE statement and noted a significant reduction in program
> execution time ...
> As it is, I have a 100MB indexed SAS data set here, and
> a reporting macro crunching its way through it by means of
> data steps with subsetting WHERE statements.
> Encouraged by what I read about SASFILE, I decided to try
> the following:
> sasfile dataset load;
> sasfile dataset close;
> And see what happens: nothing much. In fact, whereas my %report
> used to take about an hour to run, with the SASFILE statements
> it takes on the average 25% _longer_!
> My setup here is SAS 8.2 on WinNT 4.0 (SP6), an ultra-wide SCSI
> hard disk with lots of space, and 512MB of RAM. Using the
> performance monitor, I can see that upon loading the dataset into
> memory, the expected amount of RAM is being eaten away, so
> that part at least works as advertised.
> Would it be unreasonable to suspect that the SAS index file
> is actually _not_ being memorized along with the data set,
> thereby still necessitating physical disk-reads of said index,
> as opposed to the supposedly faster memory access?
> But even then, I fail to comprehend why the process would
> overall take longer to run, unless my box here uses some
> sort of frighteningly slow RAM ...
> Any input/feedback appreciated,
> Koen Vyverman
> Database Marketing Manager
> Fidelity Investments - Luxembourg
Blue Cross Blue Shield of Florida, Inc., and its subsidiary and
affiliate companies are not responsible for errors or omissions in this e-mail message. Any personal comments made in this e-mail do not reflect the views of Blue Cross Blue Shield of Florida, Inc.