LISTSERV at the University of Georgia
Date:         Fri, 4 Oct 2002 18:20:24 -0400
Reply-To:     Sigurd Hermansen <HERMANS1@WESTAT.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Sigurd Hermansen <HERMANS1@WESTAT.COM>
Subject:      Re: SAS is slow? (123 mb/sec on pc???)
Comments: To: Mauro Morandin <my_family_name@LIBERO.IT>
Content-Type: text/plain; charset="iso-8859-1"

This discussion seems to divide naturally into two threads:

1) compilers vs. database programming environments: Anyone can understand that an organization would prefer to pay once to buy or build a program. Buying and maintaining a programming environment such as SAS, PLSQL/Oracle, MS Office/SQLServer, Focus, or the like takes substantial resources, and learning it takes time. Many organizations are maintaining several programming environments concurrently.

To understand why organizations buy and maintain database programming environments, one only has to consider the useful life of a single compiled application program. I can think of a few that have remained unchanged and in place for a number of years. Typically in those cases, users have to adapt to the programs. In most cases application programs undergo almost continual modification and updating as requirements, data properties, operating systems, and hardware change. Most organizations opt for programming environments that will support application development or database programming or both. In fact, it seems likely that we will see even fewer instances of classic compiled programs as server pages, Java, VB/NET, database access engines, and other late-binding methods further blur the distinction between compilers and interpreters. The trend toward running programs under database programming environments rather than as executables on operating systems, plus the fact that neither program compilation nor interpretation (except for XML parsing) takes much CPU or clock time, makes compilation vs. database programming environment a dead issue. Data access middleware and database objects have become necessary extensions of traditional operating systems.

2) relative performance of programming environments:

Which database programming systems an organization buys and maintains, as I see it, has become the central question today. For those of us who work with large and complex collections of data, the SAS programming environment offers a full procedural programming language, a complete implementation of the primary query language, SQL, and a host of hooks and handles into files, database systems, and other data sources. SAS spans all major computing platforms and operating systems, and it offers advanced statistical and mathematical procedures.

During the last 24 hours I've posted a couple of examples at different ends of the database programming spectrum. The response to 'Compare two datasets without re-sorting?' explains how the SAS SQL compiler combines dynamic indexing and scanning behind the scenes to make short work of a common task involving very large sets of data. None of the RDBMSs makes it that simple and easy. The response to 'Better way to code this???' demonstrates how to compile a format from data in an ill-structured and highly repetitive program file. SAS not only performs this task in a fraction of a second, it also reports errors in data and provides views of results. (It also shows another example of the DoW loop construct.) In the vast regions of computing space outside OLTP databases and PC office environments, SAS rules. Show me the exceptions.

To answer your last question, SAS Mecca isn't anywhere near Hardware Valhalla. I have a faster machine at home than at the office. That forces us to program smarter. Don't make us brag about it ;)

Sigurd the SQLizer

-----Original Message-----
From: Mauro Morandin [mailto:my_family_name@LIBERO.IT]
Sent: Thursday, October 03, 2002 8:56 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: SAS is slow? (123 mb/sec on pc???)

Hi there,

thanks for the many, many ideas and thoughts.

The topic is really interesting and far too broad to be covered in an email or two. I surely feel the need to quantify how fast SAS is compared to other languages, but I want to do it on real problems. So, I really don't understand how you could be so enthusiastic about Dorfman running a useless program and showing everyone that SAS can read a 100MB file in less than a second. Everyone was like "Hurray, SAS is really fast ...." ... at what ????? Reading a file into its input buffer and throwing it away. So what now ???? I already hear you: "But that's what you told us to do?"

But does it make sense just to read it? To test how fast the interpreter is YOU HAVE TO USE THE INTERPRETER ( ... AS MUCH AS YOU CAN WITH DIFFERENT INSTRUCTIONS AND LOOPS). This makes sense to me. And then do the same thing with other languages. This not only makes sure that you USE the interpreter with possibly a lot of different instructions, but also makes sure that YOU don't run into some I/O bottleneck, which would of course falsify your results, because you don't want to measure your hard disk/memory speed but the speed of your SAS interpreter.
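A minimal sketch of the kind of test described above, written in Python only because that is the language proposed later in this message. It is an illustration, not a benchmark anyone in this thread ran: the function names `cpu_bound_work` and `time_it` are hypothetical, and the mix of instructions is arbitrary. The work is done entirely in memory, so disk speed does not enter the timing.

```python
import time


def cpu_bound_work(n):
    """Exercise the interpreter with a loop, branches, integer
    arithmetic, and a little string handling. No file I/O."""
    total = 0
    for i in range(n):
        if i % 3 == 0:
            total += i * i
        elif i % 3 == 1:
            total -= i // 2
        else:
            total += len(str(i))
    return total


def time_it(func, *args, repeats=3):
    """Run func(*args) `repeats` times; return (best wall-clock
    seconds, result of the last run)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args)
        best = min(best, time.perf_counter() - start)
    return best, result


if __name__ == "__main__":
    seconds, value = time_it(cpu_bound_work, 1_000_000)
    print(f"result={value}, best of 3 runs: {seconds:.3f} s")
```

Taking the best of several runs reduces noise from whatever else the machine is doing. To compare languages, the same loop would have to be written in each of them and run on the same hardware.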

To all the people who say: "I don't understand why someone should spend their time writing a program which runs some seconds faster than that?" I answer: "Because this is just a 100 Mbyte test program. You see what happens if you have a 100 Gbyte DW? These 2 seconds could become 24 hours. And if you're 24 hours late with your reports they could be useless."

That said, let me explain why I sometimes feel disappointed with the performance of SAS. I'm now a freelance SAS consultant. I was a SAS employee some years ago, for several years. I don't like people not being honest about SAS. And saying that SAS is a compiled language is not honest, because it makes other people (mostly managers) believe that a SAS program is as fast as a program written in C. I have seen SAS "go really fast" with PROC SORT and PROC MEANS. Really fast for me means hitting the I/O bandwidth limit, which can be around 50-100 Mbyte/s for a server PC/UNIX with 4 disks in RAID0. This is enough for a lot of application domains, so I don't feel the need to look for something to speed things up a bit. But SAS is not only PROCs. The problems I have to deal with are mostly DW problems, like building fact tables and dimensions with surrogate keys and a lot of computed variables. The fact tables are big beasts and I find myself looking at the performance monitor on AIX to see what SAS does. I look at the SAS log .... hmmmm ... data step ..... I look at the monitor .... less than 10 Mbyte/s .... then ... proc sql ... hmmmm ... 6 tables star schema join .... hmmm .... monitor says .... 5-8 Mbyte/s.

My figures on SAS performance on AIX RISC6000 S85 are:

PROC SORT (900 Mbyte): 2:00 (2 minutes)
DATA STEP (just a SET statement) (900 Mbyte): 0:20 (20 seconds)
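For comparison with the Mbyte/s numbers quoted elsewhere in this message, the two timings can be converted to throughput. This is a small sketch: it assumes "0:20" means 20 seconds, and it divides the file size by the elapsed time, so it counts the input only once even though a sort also writes its output. The helper name `throughput_mb_per_s` is hypothetical.

```python
def throughput_mb_per_s(megabytes, seconds):
    """Input size divided by elapsed time, in Mbyte/s."""
    return megabytes / seconds


# Figures quoted above: 900 Mbyte in 2:00 and 900 Mbyte in 0:20.
proc_sort = throughput_mb_per_s(900, 120)
data_step = throughput_mb_per_s(900, 20)

print(f"PROC SORT: {proc_sort:.1f} Mbyte/s")  # 7.5 Mbyte/s
print(f"DATA STEP: {data_step:.1f} Mbyte/s")  # 45.0 Mbyte/s
```

On these assumptions the plain SET step runs at about 45 Mbyte/s, close to the low end of the 50-100 Mbyte/s I/O limit mentioned earlier, while the sort works out to 7.5 Mbyte/s of input.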

These are good figures, but I can't build a DW only with PROC SORTs and SET statements.

I can't show you the code, but you can be pretty sure that I know all the tricks for writing tight SAS code. Moreover, we have five SAS programmers on the project who look into each other's code.

I love SAS, because it makes my life much easier. It is such a powerful framework. But sometimes I feel the need to go faster than that, and I don't like hearing people say that SAS is compiled.

The last thing I did last Wednesday was to write a SAS program (a data step) to split a 660 Mbyte XML file into pieces of 100,000 records each. I can't show you the code, because I don't own the copyright (I'm just the author), but I can surely rewrite the program in Python. This is the first thought I had when I saw the disappointing 1.5-2 Mbyte/s of SAS throughput (on AIX). I wrote a similar program in Python some months ago, which did more than 5 Mbyte/s (on my laptop).
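A rough sketch of what such a splitter could look like in Python. This is not the program described above, which was never posted here. It rests on assumptions that are not in the message: every record ends on a line containing a closing tag (the hypothetical `</record>`), the file is UTF-8 text, and the pieces do not need to be well-formed XML documents on their own. The function name `split_records` and the piece file names are also hypothetical.

```python
import os


def split_records(src_path, out_dir, records_per_file=100_000,
                  end_tag="</record>"):
    """Stream src_path line by line and write pieces holding at most
    records_per_file records each. Returns the list of piece paths."""
    os.makedirs(out_dir, exist_ok=True)
    pieces = []
    out = None
    count = 0
    try:
        with open(src_path, "r", encoding="utf-8") as src:
            for line in src:
                if out is None:
                    # Start a new piece only when there is a line to write.
                    name = f"piece_{len(pieces) + 1:05d}.xml"
                    path = os.path.join(out_dir, name)
                    out = open(path, "w", encoding="utf-8")
                    pieces.append(path)
                out.write(line)
                if end_tag in line:
                    count += 1
                    if count == records_per_file:
                        out.close()
                        out = None
                        count = 0
    finally:
        # Close the last, possibly partial, piece.
        if out is not None:
            out.close()
    return pieces
```

Because the file is read one line at a time, memory use stays small no matter how large the input is. Lines before the first record or after the last one are simply copied into the first or last piece.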

Anyway, I will surely send a copy of my Python program to SAS-L. By the way: with Python I have the choice to rewrite part of the code in C if I need more speed.

I suppose you also never saw a SAS project reduce its scope, because SAS + HARDWARE + software requirements were not chosen appropriately. So where are you living, .... in HARDWARE VALHALLA ??? :-)

Ciao. Mauro.

-- mauro morandin SAS consultant red hat certified engineer mauro.morandin%%%%%%%%%%%%%@ieee.org

