Date: Fri, 4 Oct 2002 21:46:18 GMT
Reply-To: Mauro Morandin <second_name@LIBERO.IT>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Mauro Morandin <second_name@LIBERO.IT>
Subject: Re: SAS is slow? (123 mb/sec on pc???)
Gregg P. Snell wrote:
> Have you ever benchmarked SAS on HW/SW platforms other than AIX? I've been
> using SAS for over 20 years now on OS/390, HP-UX, OpenVMS, Solaris, Mac,
> Windoze, and lately, AIX. And the OS which has given me the most fits, with
> respect to performance tuning, is AIX, primarily because AIX 4.x.x insists
> on caching any file that it reads. So if you merge data set A with B to get
> C, the silly OS will attempt to cache all three of them and will not release
> the data until the job is finished.
I would add: the silly OS will not release the data at all, unless some
application program asks for memory; only then does it release its I/O
buffers, starting from the oldest. If you have a lot of RAM and few
processes, A, B and C will still be cached tomorrow.
> Now, make datasets A and B multi-GB
> views, one to an Oracle table no less, and watch your system totally thrash
> in VM. Solaris or even Windoze, by contrast, will happily process each file
> sequentially and actually run faster than AIX, even with huge Shark
> Methinks your problem is with a transaction-optimized OS like AIX,
> rather than with SAS.
> Gregg P. Snell
> Data Savant Consulting
> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU]On Behalf Of
> Mauro Morandin
> Sent: Thursday, October 03, 2002 7:56 PM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: SAS is slow? (123 mb/sec on pc???)
> Hi there,
> thanks for the many, many ideas and thoughts.
> The topic is really interesting and far too broad to be covered
> in an email or two. I certainly feel the need to quantify how fast
> SAS is compared to other languages, but I want to do it on real
> problems. So I really don't understand how you could be so enthusiastic
> about Dorfman running a useless program that shows everyone
> that SAS can read a 100 MB file in less than a second. Everyone cheered,
> "Hurray, SAS is really fast!" ... but fast at what? Reading a file into
> its input buffer and throwing it away. So what?
> I can already hear you: "But that's what you told us to do!"
> But does it make sense just to read it? To test how fast the interpreter
> is, YOU HAVE TO USE THE INTERPRETER (... AS MUCH AS YOU CAN, WITH
> DIFFERENT STATEMENTS AND LOOPS). That makes sense to me. Then do
> the same thing in other languages. This not only makes sure that you
> USE the interpreter, possibly with many different statements, but
> also makes sure that you don't run into an I/O bottleneck, which would
> of course skew your results: you want to measure the speed of your SAS
> interpreter, not the speed of your hard disk or memory.
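The distinction can be illustrated with a small sketch in Python rather than SAS (the SAS-side code is not part of this thread; the record layout, sizes and function names below are made up for illustration). It times a read-and-discard pass against a read-and-compute pass over the same in-memory data, so disk speed drops out and only the interpreter work differs:

```python
import io
import time


def make_records(n):
    # Hypothetical test data: n comma-separated records kept in memory,
    # so neither pass below touches the disk.
    return "".join(f"{i},{i % 97},{i * 3.5}\n" for i in range(n))


def read_only(text):
    # "Read it and throw it away": visit every record, compute nothing.
    count = 0
    for _ in io.StringIO(text):
        count += 1
    return count


def read_and_compute(text):
    # Exercise the interpreter: split, convert, branch and aggregate per record.
    totals = {}
    for line in io.StringIO(text):
        _, group, value = line.rstrip("\n").split(",")
        amount = float(value)
        if int(group) % 2 == 0:
            amount *= 2
        totals[group] = totals.get(group, 0.0) + amount
    return totals


def timed(func, arg):
    # Return the function's result together with its wall-clock run time.
    start = time.perf_counter()
    result = func(arg)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    data = make_records(200_000)
    mbytes = len(data) / 1e6
    _, t_read = timed(read_only, data)
    _, t_work = timed(read_and_compute, data)
    print(f"read only:        {mbytes / t_read:8.1f} Mbyte/s")
    print(f"read and compute: {mbytes / t_work:8.1f} Mbyte/s")
```

The absolute numbers depend entirely on the machine; what matters is the gap between the two lines, which is the interpreter cost that a read-only test never sees.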
> To all the people who say: "I don't understand why someone should spend
> their time writing a program that runs a few seconds faster," I answer:
> "Because this is just a 100 Mbyte test program. Do you see what happens
> if you have a 100 Gbyte data warehouse (DW)? Those 2 seconds could become
> 24 hours. And if you're 24 hours late with your reports, they could be useless."
> That said, let me explain why I sometimes feel disappointed with the
> performance of SAS. I'm now a freelance SAS consultant; some years ago I
> was a SAS employee for several years. I don't like it when people are not
> honest about SAS, and saying that SAS is a compiled language is not
> honest, because it makes other people (mostly managers) believe that
> a SAS program is as fast as a program written in C.
> I have seen SAS "go really fast" with PROC SORT and PROC MEANS. Really
> fast, for me, means hitting the I/O bandwidth limit, which can be around
> 50-100 Mbyte/s for a PC or UNIX server with 4 disks in RAID 0. That is
> enough for a lot of application domains, so there I don't feel the need to
> look for something to speed things up. But SAS is not only PROCs.
> The problems I have to deal with are mostly DW problems, like building
> fact tables and dimensions with surrogate keys and a lot of computed
> variables. The fact tables are big beasts, and I find myself watching
> the performance monitor on AIX to see what SAS is doing. I look at the SAS
> log... hmmm... DATA step... I look at the monitor... less than
> 10 Mbyte/s... then PROC SQL... hmmm... a 6-table star schema join...
> hmmm... the monitor says... 5-8 Mbyte/s.
> My figures for SAS performance on an AIX RS/6000 S85 are:
> PROC SORT (900 Mbyte): 2:00 (2 minutes)
> DATA step with just a SET statement (900 Mbyte): 0:20 (20 seconds)
> These are good figures, but I can't build a DW with only PROC SORTs and
> SET statements.
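Expressed in the same unit as the 50-100 Mbyte/s I/O limit quoted earlier, these timings work out to 7.5 Mbyte/s for the sort (900 Mbyte in 120 s) and 45 Mbyte/s for the SET-only pass (900 Mbyte in 20 s). A trivial Python helper (the function name is only illustrative) makes the arithmetic explicit:

```python
def throughput(size_mbyte, seconds):
    # Average throughput in Mbyte/s for a job that processes size_mbyte in seconds.
    return size_mbyte / seconds


# Elapsed times quoted above for a 900 Mbyte input on AIX.
jobs = {
    "PROC SORT": (900, 2 * 60),
    "DATA step, SET only": (900, 20),
}

for name, (size_mbyte, seconds) in jobs.items():
    print(f"{name:<20} {throughput(size_mbyte, seconds):5.1f} Mbyte/s")
```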
> I can't show you the code, but you can be pretty sure that I know all
> the tricks for writing tight SAS code. Moreover, we have five SAS
> programmers on the project who review each other's code.
> I love SAS, because it makes my life much easier. It is such a powerful
> framework. But sometimes I feel the need to go faster than that, and I
> don't like hearing people say that SAS is compiled.
> The last thing I did on Wednesday was write a SAS program (a DATA
> step) to split a 660 Mbyte XML file into pieces of 100,000 records each.
> I can't show you the code, because I don't own the copyright (I'm just
> the author), but I can certainly rewrite the program in Python. That was
> the first thought I had when I saw the disappointing SAS throughput of
> 1.5-2 Mbyte/s (on AIX). Some months ago I wrote a similar program in
> Python, which did more than 5 Mbyte/s (on my laptop).
> Anyway, I will certainly send a copy of my Python program to SAS-L. By
> the way, with Python I have the option of rewriting parts of the code in
> C if I need more speed.
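Neither the SAS program nor the earlier Python program appears in this thread. Purely as an illustration of the task, here is a minimal streaming splitter built on the standard library's xml.etree.ElementTree.iterparse. It assumes each record is an element named record sitting directly under the root element, with no XML namespaces; the record tag, the <records> wrapper element and the part_0000.xml naming scheme are hypothetical choices:

```python
import xml.etree.ElementTree as ET


def write_chunk(out_prefix, index, records):
    # Write one piece: an XML declaration, a hypothetical <records> wrapper,
    # and the already-serialized record elements.
    name = f"{out_prefix}_{index:04d}.xml"
    with open(name, "w", encoding="utf-8") as out:
        out.write('<?xml version="1.0" encoding="utf-8"?>\n<records>\n')
        out.writelines(records)
        out.write("</records>\n")
    return name


def split_xml(path, out_prefix, chunk_size=100_000, record_tag="record"):
    """Stream `path` and write every `chunk_size` <record_tag> elements to a new file.

    Assumes records are direct children of the root element and that no
    namespaces are used. Returns the list of file names written.
    """
    written = []
    chunk = []
    context = ET.iterparse(path, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event != "end" or elem.tag != record_tag:
            continue
        elem.tail = None  # don't carry inter-record whitespace into the output
        chunk.append(ET.tostring(elem, encoding="unicode") + "\n")
        root.clear()  # release finished records so memory stays flat
        if len(chunk) == chunk_size:
            written.append(write_chunk(out_prefix, len(written), chunk))
            chunk = []
    if chunk:  # the last, possibly shorter, piece
        written.append(write_chunk(out_prefix, len(written), chunk))
    return written
```

For example, split_xml("big.xml", "part") would write part_0000.xml, part_0001.xml, and so on, each holding up to 100,000 records. Clearing the root after each record keeps memory use flat regardless of input size; the XML tokenizing is done underneath by the expat parser (a C library), while the per-record loop is plain Python.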
> I suppose you have also never seen a SAS project reduce its scope because
> the SAS + HARDWARE + software requirements were not chosen appropriately.
> So where are you living... in HARDWARE VALHALLA??? :-)
> mauro morandin
> SAS consultant
> red hat certified engineer
red hat certified engineer
matrix srl via postioma di salvarosa 25b tel +39-423-724620
31033 castelfranco veneto (tv) fax +39-423-770798