LISTSERV at the University of Georgia
Date:         Thu, 13 Dec 2001 12:26:54 -0000
Reply-To:     "Vyverman, Koen" <koen.vyverman@FID-INTL.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Vyverman, Koen" <koen.vyverman@FID-INTL.COM>
Subject:      Re: SASFILE efficiency?
Content-Type: text/plain; charset="iso-8859-1"


My remarks on SASFILE performance, or the lack thereof, have sparked the usual bunch of useful and interesting comments. Thanks are due to -- in no particular order -- Kevin Viel, Bill Viergever, Paul Dorfman, David Cassell, po' Puddin' Man, and Soeren Hvidkjaer for their feedback. My original message is included below the sig.

First, as a general comment, lack of RAM was never an issue. I kept a close watch on the NT Performance Monitor, and even with the full 100MB data set loaded in memory, the amount of available RAM never dipped below 250MB. The system swap file was never used. The index file is another 11MB, so that would hardly impact available memory either.

Second, my timing measurements did not include the time required to load/close the SASFILE. Even if they had, those statements execute in a matter of, say, 10 seconds, which is negligible compared to the typical 1 hour run-time of the reporting macro.

The general experience with using SASFILE seems to be that its efficiency benefits are rather restricted to a certain class of processing. Evidence given by Kevin shows that running a PROC MEANS on a sizeable data set certainly benefits from SASFILE. Paul's eloquent argumentation indicates the same for direct-access with POINT=. There may be others, but from what I've seen, subsetting with a WHERE-clause on an indexed key-variable is not one of them.

On the question of whether the index is loaded into memory along with the data set, opinions are divided. Whether it is or not may be a largely moot point, as the expected efficiency boost with WHERE processing fails to manifest itself. Given time though, I will attempt some more rigorous testing and report back.

Finally, David suggested a re-think of my report process flow, to see whether some efficiency gains might be achieved by re-arranging things. So, to satisfy curiosity on one hand and on the other perhaps solicit some useful strategies that I may have overlooked, here's an outline of what I'm doing:

The large data set, let's call it PAIRS, has three variables: TOKEN, NEXT_TOKEN, and PROBABILITY. The exercise is one of simulation, in that I wish to build strings of tokens based on the content of PAIRS. This works as follows: I pick a random TOKEN-value to initialize a string. I then subset PAIRS to this particular TOKEN-value, which gives me a small data set containing the possible NEXT_TOKEN values, and their relative probabilities. Proceeding, this small data set is fed to Dale McLerran's %RANSAMP macro, which, using the PROBABILITY variable as the statistical weight, produces a random sample of size 1. The NEXT_TOKEN becomes the next token in the output string, and the procedure repeats after replacing TOKEN by NEXT_TOKEN. This goes on and on in a macro loop, until either no matching records are found in PAIRS (i.e. the process stumbles upon a value of NEXT_TOKEN which does not appear as a TOKEN) or until a predefined maximal number of tokens has been generated in the output string.

Keeping this structure in mind, the only improvement that readily presents itself would consist of taking the actual processing that happens in %RANSAMP out of there, and including it in the data step where I subset PAIRS on the given TOKEN-value. This would eliminate the I/O associated with creating the small TOKEN / NEXT_TOKEN lookup data set, and I could pass the randomly selected new value on as a macro variable. Come to think of it, I'll just go ahead and do that :-)
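For illustration only, here is a minimal sketch of what such a combined step might look like. It is not the code from %RANSAMP: it swaps in a simple one-pass weighted selection (each matching record replaces the current pick with probability PROBABILITY / running total), which does not need the weights to sum to 1. The macro variable names TOKEN and NEXT_TOKEN, the example value, and the length of 200 are assumptions, as is the premise that TOKEN and NEXT_TOKEN are character variables free of quotes and macro triggers.

```sas
/* Hypothetical starting value; in the real loop this is the current token. */
%let token = example;
/* Reset, so an empty value afterwards means "no matching records".        */
%let next_token = ;

data _null_;
   length chosen $ 200;              /* assumed maximum token length       */
   retain chosen ' ';
   set pairs end=last;
   where token = "&token";           /* indexed subsetting, as before      */

   total + probability;              /* running sum of the weights         */

   /* Keep the current record with probability PROBABILITY / TOTAL.        */
   /* The first matching record is always kept, since TOTAL = PROBABILITY. */
   if ranuni(0) * total < probability then chosen = next_token;

   /* Hand the selected value back to the macro loop.                      */
   if last then call symput('next_token', trim(chosen));
run;

%put NOTE: next token is [&next_token];
```

If no record in PAIRS matches the WHERE clause, the step stops at the SET statement without ever reaching CALL SYMPUT, so NEXT_TOKEN stays empty and the macro loop can treat that as its stop condition.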

Thanks again for your time and thoughts, Koen.

---------------------------------
Koen Vyverman
Database Marketing Manager
Fidelity Investments - Luxembourg
---------------------------------

> -----Original Message-----
> From: Vyverman, Koen [mailto:koen.vyverman@FID-INTL.COM]
> Sent: Wednesday, December 12, 2001 15:00
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: SASFILE efficiency?
>
>
> LS,
>
> I would be interested to learn whether anyone has adopted the
> SASFILE statement and noted a significant reduction in program
> execution time ...
>
> As it is, I'm having a 100MB indexed SAS data set here, and
> a reporting macro crunching its way through it by means of
> data steps with subsetting WHERE statements.
>
> Encouraged by what I read about SASFILE, I decided to try
> the following:
>    sasfile dataset load;
>    %report(...)
>    sasfile dataset close;
>
> And see what happens: nothing much. In fact, whereas my %report
> used to take about an hour to run, with the SASFILE statements
> it takes on the average 25% _longer_!
>
> My set-up here is SAS8.2 on WinNT4.0 (SP6), ultra-wide SCSI
> hard disk with lots of space, 512MB of RAM. Using the performance
> monitor, I can see that upon loading the dataset into
> memory, the expected amount of RAM is being eaten away, so
> that part at least works as advertized.
>
> Would it be unreasonable to suspect that the SAS index file
> is actually _not_ being memorized along with the data set,
> thereby still necessitating physical disk-reads of said index,
> as opposed to the supposedly faster memory access?
>
> But even then, I fail to comprehend why the process would
> overall take longer to run, unless my box here uses some
> sort of frighteningly slow RAM ...
>
> Any input/feedback appreciated,
> Koen.
>
> ---------------------------------
> Koen Vyverman
> Database Marketing Manager
> Fidelity Investments - Luxembourg
> ---------------------------------
