Date: Thu, 12 Jul 2007 22:52:33 +1000
Reply-To: d@dkvj.biz
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: David Johnson <d@DKVJ.BIZ>
Subject: Re: Proc means VS Proc SQL for summarizing
In-Reply-To: <OF5A1BF798.36FE691A-ON65257315.004B62B5-65257315.004C87D5@MCKINSEY.COM>
Content-Type: text/plain; charset="us-ascii"
I've sniped at these comparisons before because as they stand I don't think
they prove anything.
Seeing Sig discount it as well gives me the courage again to suggest this is
not a good comparison.
On the face of it, you ran two steps, and got different timing. Have you
considered:
. Something may have been using CPU or I/O on your system during the first
test, and had gone by the second? What would that do to the timing?
. The second test is using data that was read a moment ago, and is probably
still largely resident in memory. What will that do to the I/O part of the
step?
I see someone has also challenged your use of TYPES instead of NWay, which
would have been my preference, but whether this is probative, only another
test might show.
It seems aeons ago I was shown how to do these tests, and it involved a
minimum of three, and perhaps more iterations of each step, one after
another (not interspersed) and taking a mean value for the run and CPU times
to determine expected run times. I was also taught to run some other large
step between the two processes to flush the memory and remove the advantage
given to the second step.
This gives the memory advantage to second and subsequent executions and
equally affects each step. Based on this you now have a foundation for
stating one has a benefit over the other.
It may sound quite fussy, but when trying to compare processes to unload
data from a SQL server some time ago, I set up a weekend run using SQL Pass
through, Library access and SQL access with various options set and changed.
Running over 60 hours gave me lots of runs I trapped from a log parse and
gave me a clear idea of what worked without being concerned with contention
with other users, backups or network processes. The final choice still had
to run in peak time, but I was quite certain that at the best of time it was
the best, and at the worst time it would perform better than the
alternatives. Bear in mind that queuing theory demonstrates that if you
place a slow job in the path, everything else will slow down, and your slow
job will crawl. Making it efficient affects and benefits everybody, most
importantly YOU.
Kind regards
David
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU]On Behalf Of
Diwakar Sharma
Sent: Wednesday, 11 July 2007 11:56 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Proc means VS Proc SQL for summarizing
Hi SAS-L,
I used to be a firm believer that whatever Proc SQL can do, SAS (DATA &
Proc) can do better.
Recently I tried performing a summary using both Proc means and Proc SQL,
and found Proc SQL taking less time than PROC means.
Here is the code I used
proc means data=test_data MISSING NOPRINT;
class category;
types category;
output out=out_means
sum(sales)=sum_sales
;
run;
proc sql _METHOD;
create table out_sql as
select category , sum(sales) as sum_sales
from test_data
group by category;
quit; run;
Here is the log
242 proc means data=test_data MISSING NOPRINT;
243 class category;
244 types category;
245 output out=out_means
246 sum(sales)=sum_sales
247 ;
248 run;
NOTE: There were 407649 observations read from the data set
WORK.TEST_DATA.
NOTE: The data set WORK.OUT_MEANS has 29 observations and 4 variables.
NOTE: PROCEDURE MEANS used (Total process time):
real time 6.26 seconds
cpu time 6.96 seconds
249 proc sql _METHOD;
250 create table out_sql as
251 select category , sum(sales) as sum_sales
252 from test_data
253 group by category;
NOTE: SQL execution methods chosen are:
sqxcrta
sqxsumg
sqxsort
sqxsrc( WORK.TEST_DATA )
NOTE: Table WORK.OUT_SQL created, with 29 rows and 2 columns.
254 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 3.51 seconds
cpu time 1.54 seconds
254! run;
Could someone please let me know the reason of this difference ??
Best regards,
Diwakar Sharma
+=========================================================+
This message may contain confidential and/or privileged
information. If you are not the addressee or authorized to
receive this for the addressee, you must not use, copy,
disclose or take any action based on this message or any
information herein. If you have received this message in
error, please advise the sender immediately by reply e-mail
and delete this message. Thank you for your cooperation.
+=========================================================+