| Date: | Wed, 28 Jan 2004 14:42:37 GMT |
| Reply-To: | julierog@ix.netcom.com |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | Roger Lustig <trovato@VERIZON.NET> |
| Subject: | Re: SAS Performance |
Ben:
Some things to think about:
--PROC FREQ will, if it can, build its table in memory. That table
needs at least 16 bytes/value, so with 1E7 items, you're looking at
160MB of RAM. On a 512MB machine that is perhaps more than was
available, in which case lots of swapping to/from virtual memory is
suddenly taking place.
--I don't know how PROC FREQ builds the table, but it's either doing an
insertion sort of some kind or building a separate index. The former
uses time on the order of N^2; the latter uses more RAM.
--Even V9, with SYNCSORT, can't break the sort barrier of O(n*log(n)).
--If you must do something like this, remember that:
----RAM is cheap.
----So are hard drives.
----SASFILE will load a file into memory.
----If you don't need the variable I, you can drop it. With only X left,
each observation is half as long.
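
The points above can be sketched in SAS as follows. This is only a sketch under stated assumptions: the 16 bytes/value figure is the lower bound quoted above, not a measured value, and whether SASFILE helps on a 512MB machine is uncertain, since the loaded file competes with PROC FREQ's table for the same RAM. The data set names A and B are the ones from the original post.

```sas
/* Rough lower bound on PROC FREQ's in-memory table, using the
   16 bytes/value figure quoted above (an estimate, not a measurement). */
data _null_;
  n_values        = 1e7;
  bytes_per_value = 16;
  est_mb          = n_values * bytes_per_value / 1e6;
  put "Estimated table size (MB): " est_mb;
run;

/* Same test data, written with an iterative DO loop. DROP= keeps the
   counter I out of the output, so only X is stored. */
data a (drop = i);
  do i = 1 to 10000000;
    x = ranuni(12345);
    output;
  end;
run;

/* Hold WORK.A in memory while PROC FREQ reads it, then release it. */
sasfile work.a load;

proc freq data = a noprint;
  table x / out = b;
run;

sasfile work.a close;
```

With I dropped, the data set holds one 8-byte numeric per observation, so roughly 80MB of data for 1E7 observations instead of roughly 160MB.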
Best,
Roger
PS: Are you doing this to see whether RANUNI will repeat itself? Given
the precision in a SAS numeric value, that's going to take a *lot* of
cases, unless (as someone suggested) you round.
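
A minimal sketch of the rounding idea, assuming a rounding unit of 0.001 (an arbitrary value chosen for illustration, not one anybody in the thread suggested). Rounded this way, X can take at most 1001 distinct values, so the PROC FREQ table stays tiny. Note that this counts how often values fall into the same bin, which is a different question from whether RANUNI repeats an exact value.

```sas
/* Round each random number so that the number of distinct levels
   is small. The unit 0.001 is a hypothetical choice. */
data a (drop = i);
  do i = 1 to 10000000;
    x = round(ranuni(12345), 0.001);
    output;
  end;
run;

/* The frequency table now has at most 1001 rows (0, 0.001, ..., 1). */
proc freq data = a noprint;
  table x / out = b;
run;

proc sort data = b;
  by descending count;
run;
```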
ben.powell@CLA.CO.UK wrote:
> I know SAS run time won't always increase by the same factor as the number
> of observations, but I was surprised this clunked out on me. I gave up
> waiting for the proc freq to finish when, after 1 hour 48 minutes, it had
> still only read 0.68 of the observations. P4 2.4, 512MB RAM, 40GB single HDD
> (ATA). What exactly was going on here? Reducing the obs from 1E7 to 1E6, it is
> handled in just 25 secs. Obviously the hdd was being crunched, as was ram,
> with the P4 twiddling its thumbs, but I can't understand why the volume of
> data to write was so huge (1:48:00 / 0:00:25 = about 26000%) - is this not
> unreasonable?
>
> data a;
>   i = 0;
>   do until (i = 10000000);
>     x = ranuni(12345);
>     output;
>     i = i + 1;
>   end;
> run;
>
> proc freq data = a noprint;
>   table x / out = b;
> run;
>
> proc sort data = b;
>   by descending count;
> run;