Date: Tue, 14 Dec 2004 08:36:52 -0600
Reply-To: "Dunn, Toby" <Toby.Dunn@TEA.STATE.TX.US>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Dunn, Toby" <Toby.Dunn@TEA.STATE.TX.US>
Subject: Re: Proc Summary vs. Means run times
Content-Type: text/plain; charset="US-ASCII"
Sharma,
Neither PROC SUMMARY nor PROC MEANS requires a sort step, regardless of
whether you use a BY or a CLASS statement. You may be asking, "Don't
you at least need one when using a BY statement?" I say no:
simply use 'by <some variable(s)> notsorted' and SAS won't complain. However,
the resulting data set or listing may have unexpected results, because
NOTSORTED treats each run of consecutive observations with the same BY
value as a separate group.
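
To illustrate the kind of "unexpected results" meant here, the following is a minimal sketch on a made-up three-row data set. The names UNSORTED, GRP, X, BYOUT and CLASSOUT are hypothetical and do not come from the thread. The data are deliberately out of order, so the BY ... NOTSORTED run should report the value 'A' as two separate groups, while the CLASS run pools them.

```sas
/* Hypothetical toy data, deliberately NOT sorted by GRP */
data unsorted;
   input grp $ x;
   datalines;
A 1
B 2
A 3
;
run;

/* BY ... NOTSORTED: each run of consecutive identical GRP values   */
/* becomes its own BY group, so GRP='A' appears twice in BYOUT.     */
proc summary data=unsorted;
   by grp notsorted;
   var x;
   output out=byout mean=mean_x;
run;

/* CLASS needs no prior sort and pools all rows with GRP='A'.       */
/* NWAY drops the overall (_TYPE_=0) row, leaving one row per GRP.  */
proc summary data=unsorted nway;
   class grp;
   var x;
   output out=classout mean=mean_x;
run;
```

Comparing BYOUT with CLASSOUT (for example with PROC PRINT) shows the difference: the first has a row for each consecutive run of GRP, the second has a row for each distinct GRP value.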
Toby Dunn
"It's OK to figure out murder mysteries, but you shouldn't need to
figure out code. You should be able to read it." -Steve C McConnell
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
Sharma, Diwakar (Corporate)
Sent: Tuesday, December 14, 2004 7:26 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Proc Summary vs. Means run times
Hi,
Just one question on the issue.
Proc Summary requires a sort step preceding it, whereas Proc Means does
not (given that you are using a CLASS statement). Shouldn't we take
this into account when comparing the run times?
Regards,
Diwakar
GECIS
__________________________________
Diwakar Sharma
GECIS Analytics
2nd Floor (Bay 9), Surya Park, 99, Electronic City,
Bangalore 560 100 INDIA
Phone: +91-80-28528700 X 8706;
Mobile: +91-80-9342529595 <http://web.analytics.gecis.capital.ge.com/>
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
Mike Rhoads
Sent: Tuesday, December 14, 2004 6:45 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Proc Summary vs. Means run times
I certainly concur with David Cassell's recommendation of SAS user group
meetings at various levels as a great way to keep up with the real
"inside information" on SAS. In addition to presentations by SAS
developers, the demo areas are a great chance to meet some of these
folks and ask questions. They are very articulate and enthusiastic about
the software they have created (as well they should be!).
As far as MEANS vs. SUMMARY goes, though, you don't need to be much of
an insider. The Overview page of the PROC SUMMARY documentation contains
the following:
"The SUMMARY procedure is very similar to the MEANS procedure. Except
for the differences discussed in the following section, all the
information in The MEANS Procedure also applies to PROC SUMMARY."
The reasons for this apparent duplication were explained well by Ian in
his earlier message.
Mike Rhoads
Westat
RhoadsM1@Westat.com
-----Original Message-----
From: Michael Murff [mailto:mjm33@msm1.byu.edu]
Sent: Monday, December 13, 2004 5:59 PM
To: SAS-L@LISTSERV.UGA.EDU; Mike Rhoads
Subject: Re: Proc Summary vs. Means run times
Hi Mike,
How does one become privy to what SAS does behind the scenes? Have they
revealed some of their source code, presumably written in C? I thought
they kept such under very tight lock and key due to competitors like
SPSS and STATA. My understanding is that the procs are pre-compiled
binaries, and that datastep code is sort of translated down to C syntax.
Could you elaborate or refer me to other sources (papers) that would
have more info as to what goes on "under the hood" when a SAS proc or
DATA step code is submitted?
Thanks,
Michael Murff
PS--Perhaps I should relist this under a new topic, but I'll have to
consult the said SAS etiquette paper, to be sure about that :)
>>> Mike Rhoads <RHOADSM1@WESTAT.COM> 12/13/2004 3:46:44 PM >>>
Dave,
Welcome to the group!
Actually, PROC MEANS and PROC SUMMARY run exactly the same code behind
the scenes. There are a couple of very minor differences, mainly that
by default PROC MEANS produces printed output and PROC SUMMARY does not.
So I suspect the differences you are seeing in output format and
execution time are because you are using a BY statement in your PROC
MEANS vs. a CLASS statement in PROC SUMMARY. Try using the same
statement in both, and you should get identical output and
near-identical run times.
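
As a rough sketch of that suggestion, the two steps below use the same CLASS statement in both procedures, with the data set and variable names from Dave's original post (VISIT_SUM, MEMBER_NO, DAY_DIFF). The output names DIFFS_MEANS and DIFFS_SUMMARY are made up for illustration. NOPRINT on PROC MEANS removes the one default difference (printed output), so the two steps should do the same work.

```sas
/* PROC MEANS with NOPRINT: suppresses the default printed output */
proc means data=visit_sum missing noprint;
   class member_no;
   var day_diff;
   output out=diffs_means mean=Mean std=STDev;
run;

/* PROC SUMMARY: no printed output by default */
proc summary data=visit_sum missing;
   class member_no;
   var day_diff;
   output out=diffs_summary mean=Mean std=STDev;
run;

/* Check that the two output data sets agree */
proc compare base=diffs_means compare=diffs_summary;
run;
```

With only 48 observations, the real time in the log is dominated by overhead, so any timing comparison is more meaningful on one of the larger data sets.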
Mike Rhoads
Westat
RhoadsM1@Westat.com
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
David Meyer
Sent: Monday, December 13, 2004 5:32 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Proc Summary vs. Means run times
Hi SASLers,
As a new-ish SAS guy, I have been following the SASL discussion as much
as I can and I have been learning a lot (THANKS ALL). As I have been
improving, I have caught the "try to write tighter code" bug from some
of you and since I am working with large data sets (millions of records
each), reducing run time is a very practical obsession to have.
I have recently discovered Proc Summary and have been playing with it and
Proc Means. I think that I found Summary to be about 35 to 45% of the
run time of Proc Means (plus I like the "class variable crude" summary
data line in Proc Summary and I like the way the data is displayed in
the output window better than Means). If all I want is basic summary
stats (mean min max std), should I always be using Summary going
forward? Am I making any assumptions that I should worry about or that
are incorrect? Do any of you suggest places for me to go and read up on
these basic statistical Procs?
TIA and thanks for all of your discussion on other topics,
Dave
Below are the code and log results:
625 proc summary data=visit_sum missing print;
626 class member_no;
627 var day_diff ;
628 output out=diffs mean=Mean std=STDev ;
629 run;
NOTE: There were 48 observations read from the dataset WORK.VISIT_SUM.
NOTE: The data set WORK.DIFFS has 13 observations and 5 variables.
NOTE: PROCEDURE SUMMARY used:
real time 0.62 seconds
cpu time 0.05 seconds
630
631
632 proc means data=visit_sum missing print;
633 by member_no;
634 var day_diff ;
635 output out=diffs1 mean=Mean std=STDev ;
636 run;
NOTE: There were 48 observations read from the dataset WORK.VISIT_SUM.
NOTE: The data set WORK.DIFFS1 has 12 observations and 5 variables.
NOTE: PROCEDURE MEANS used:
real time 0.28 seconds
cpu time 0.03 seconds
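
A note on the differing row counts in the log above (13 observations for DIFFS, 12 for DIFFS1): with a CLASS statement the output data set includes one overall row (_TYPE_=0) in addition to one row per MEMBER_NO (_TYPE_=1), whereas a BY statement produces only one row per BY group. The sketch below, which assumes the same VISIT_SUM data and uses the hypothetical output name DIFFS_NWAY, shows how the NWAY option should bring the CLASS version down to one row per MEMBER_NO.

```sas
/* CLASS without NWAY: overall row (_TYPE_=0) plus one row per member_no */
proc summary data=visit_sum missing;
   class member_no;
   var day_diff;
   output out=diffs mean=Mean std=STDev;
run;

/* CLASS with NWAY: keeps only the highest _TYPE_ value,            */
/* i.e. one row per member_no and no overall row                    */
proc summary data=visit_sum missing nway;
   class member_no;
   var day_diff;
   output out=diffs_nway mean=Mean std=STDev;
run;
```

Both output data sets carry the automatic variables _TYPE_ and _FREQ_ alongside MEMBER_NO, Mean and STDev, which accounts for the 5 variables reported in the log.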