Date: Wed, 11 Jul 2007 14:35:34 +0000
Reply-To: toby dunn <tobydunn@HOTMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: toby dunn <tobydunn@HOTMAIL.COM>
Subject: Re: Proc means VS Proc SQL for summarizing
In-Reply-To: <OF5A1BF798.36FE691A-ON65257315.004B62B5-65257315.004C87D5@MCKINSEY.COM>
Content-Type: text/plain; format=flowed
Diwakar ,
Stated:
"I used to be a firm believer that whatever Proc SQL can do, SAS (DATA &
Proc) can do better."
Hmm bad belief. Data Step code and Proc SQL Code dont process information
the same, comparing them properly takes someone who is well extremely good
and knowledgable in both. I know a few things that the SQL does better than
the Data Step and vis versa. What you need to focus on is when to one over
the other. Exploit the pros in each of teh two methods.
Toby Dunn
If anything simply cannot go wrong, it will anyway. Murphys Law #2.
The buddy system is essential to your survival; it gives the enemy somebody
else to shoot at.
Murphys Law #
Tell a man there are 300 billion stars in the universe and he'll believe
you. Tell him a bench has wet paint on it and he'll have to touch to be
sure. Murphys Law #9
From: Diwakar Sharma <Diwakar_Sharma@MCKINSEY.COM>
Reply-To: Diwakar_Sharma@MCKINSEY.COM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Proc means VS Proc SQL for summarizing
Date: Wed, 11 Jul 2007 19:25:55 +0530
Hi SAS-L,
I used to be a firm believer that whatever Proc SQL can do, SAS (DATA &
Proc) can do better.
Recently I tried performing a summary using both Proc means and Proc SQL,
and found Proc SQL taking less time than PROC means.
Here is the code I used
proc means data=test_data MISSING NOPRINT;
class category;
types category;
output out=out_means
sum(sales)=sum_sales
;
run;
proc sql _METHOD;
create table out_sql as
select category , sum(sales) as sum_sales
from test_data
group by category;
quit; run;
Here is the log
242 proc means data=test_data MISSING NOPRINT;
243 class category;
244 types category;
245 output out=out_means
246 sum(sales)=sum_sales
247 ;
248 run;
NOTE: There were 407649 observations read from the data set
WORK.TEST_DATA.
NOTE: The data set WORK.OUT_MEANS has 29 observations and 4 variables.
NOTE: PROCEDURE MEANS used (Total process time):
real time 6.26 seconds
cpu time 6.96 seconds
249 proc sql _METHOD;
250 create table out_sql as
251 select category , sum(sales) as sum_sales
252 from test_data
253 group by category;
NOTE: SQL execution methods chosen are:
sqxcrta
sqxsumg
sqxsort
sqxsrc( WORK.TEST_DATA )
NOTE: Table WORK.OUT_SQL created, with 29 rows and 2 columns.
254 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 3.51 seconds
cpu time 1.54 seconds
254! run;
Could someone please let me know the reason of this difference ??
Best regards,
Diwakar Sharma
+=========================================================+
This message may contain confidential and/or privileged
information. If you are not the addressee or authorized to
receive this for the addressee, you must not use, copy,
disclose or take any action based on this message or any
information herein. If you have received this message in
error, please advise the sender immediately by reply e-mail
and delete this message. Thank you for your cooperation.
+=========================================================+
_________________________________________________________________
http://newlivehotmail.com