LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (July 2007, week 2)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Thu, 12 Jul 2007 22:52:33 +1000
Reply-To:     d@dkvj.biz
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         David Johnson <d@DKVJ.BIZ>
Subject:      Re: Proc means VS Proc SQL for summarizing
In-Reply-To:  <OF5A1BF798.36FE691A-ON65257315.004B62B5-65257315.004C87D5@MCKINSEY.COM>
Content-Type: text/plain; charset="us-ascii"

I've sniped at these comparisons before because as they stand I don't think they prove anything.

Seeing Sig discount it as well gives me the courage again to suggest this is not a good comparison.

On the face of it, you ran two steps, and got different timing. Have you considered:

. Something may have been using CPU or I/O on your system during the first test, and had gone by the second? What would that do to the timing?

. The second test is using data that was read a moment ago, and is probably still largely resident in memory. What will that do to the I/O part of the step?

I see someone has also challenged your use of TYPES instead of NWay, which would have been my preference, but whether this is probative, only another test might show.

It seems aeons ago I was shown how to do these tests, and it involved a minimum of three, and perhaps more iterations of each step, one after another (not interspersed) and taking a mean value for the run and CPU times to determine expected run times. I was also taught to run some other large step between the two processes to flush the memory and remove the advantage given to the second step.

This gives the memory advantage to second and subsequent executions and equally affects each step. Based on this you now have a foundation for stating one has a benefit over the other.

It may sound quite fussy, but when trying to compare processes to unload data from a SQL server some time ago, I set up a weekend run using SQL Pass through, Library access and SQL access with various options set and changed. Running over 60 hours gave me lots of runs I trapped from a log parse and gave me a clear idea of what worked without being concerned with contention with other users, backups or network processes. The final choice still had to run in peak time, but I was quite certain that at the best of time it was the best, and at the worst time it would perform better than the alternatives. Bear in mind that queuing theory demonstrates that if you place a slow job in the path, everything else will slow down, and your slow job will crawl. Making it efficient affects and benefits everybody, most importantly YOU.

Kind regards

David

-----Original Message----- From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU]On Behalf Of Diwakar Sharma Sent: Wednesday, 11 July 2007 11:56 PM To: SAS-L@LISTSERV.UGA.EDU Subject: Proc means VS Proc SQL for summarizing

Hi SAS-L,

I used to be a firm believer that whatever Proc SQL can do, SAS (DATA & Proc) can do better. Recently I tried performing a summary using both Proc means and Proc SQL, and found Proc SQL taking less time than PROC means.

Here is the code I used

proc means data=test_data MISSING NOPRINT; class category; types category; output out=out_means sum(sales)=sum_sales ; run; proc sql _METHOD; create table out_sql as select category , sum(sales) as sum_sales from test_data group by category; quit; run;

Here is the log

242 proc means data=test_data MISSING NOPRINT; 243 class category; 244 types category; 245 output out=out_means 246 sum(sales)=sum_sales 247 ; 248 run;

NOTE: There were 407649 observations read from the data set WORK.TEST_DATA. NOTE: The data set WORK.OUT_MEANS has 29 observations and 4 variables. NOTE: PROCEDURE MEANS used (Total process time): real time 6.26 seconds cpu time 6.96 seconds

249 proc sql _METHOD; 250 create table out_sql as 251 select category , sum(sales) as sum_sales 252 from test_data 253 group by category;

NOTE: SQL execution methods chosen are:

sqxcrta sqxsumg sqxsort sqxsrc( WORK.TEST_DATA ) NOTE: Table WORK.OUT_SQL created, with 29 rows and 2 columns.

254 quit; NOTE: PROCEDURE SQL used (Total process time): real time 3.51 seconds cpu time 1.54 seconds

254! run;

Could someone please let me know the reason of this difference ??

Best regards, Diwakar Sharma +=========================================================+ This message may contain confidential and/or privileged information. If you are not the addressee or authorized to receive this for the addressee, you must not use, copy, disclose or take any action based on this message or any information herein. If you have received this message in error, please advise the sender immediately by reply e-mail and delete this message. Thank you for your cooperation. +=========================================================+


Back to: Top of message | Previous page | Main SAS-L page