LISTSERV at the University of Georgia
Date:         Tue, 14 Dec 2004 08:36:52 -0600
Reply-To:     "Dunn, Toby" <Toby.Dunn@TEA.STATE.TX.US>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Dunn, Toby" <Toby.Dunn@TEA.STATE.TX.US>
Subject:      Re: Proc Summary vs. Means run times
Comments: To: "Sharma, Diwakar (Corporate)" <diwakar.sharma@GE.COM>
Content-Type: text/plain; charset="US-ASCII"

Sharma,

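Neither PROC SUMMARY nor PROC MEANS requires a sort step, regardless of whether you use a BY or a CLASS statement. You may be asking, "Don't you at least need one when using a BY statement?" I say no: simply write 'by <some variable(s)> notsorted;' and SAS won't complain. However, the resulting data set or listing may have unexpected results, because NOTSORTED treats each run of consecutive observations with the same BY value as a separate group, so unsorted data can produce several groups for the same value.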
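A minimal sketch of the two BY-statement routes Toby describes. The data set `visits` and its values are hypothetical, chosen so that `member_no` is deliberately unsorted. Sorting first and using a plain BY gives one output row per member; BY ... NOTSORTED on the unsorted data runs without an error but starts a new group every time the value changes, which is one form of the "unexpected results" mentioned above. Dropping NOTSORTED from the last step would make it stop with an error, since the data are not sorted by `member_no`.

```sas
/* Hypothetical, deliberately unsorted input: member_no runs 2,1,2,1 */
data visits;
  input member_no day_diff;
  datalines;
2 10
1 5
2 20
1 7
;
run;

/* Conventional route: sort first, then use a plain BY statement */
proc sort data=visits out=visits_sorted;
  by member_no;
run;

proc summary data=visits_sorted;
  by member_no;
  var day_diff;
  output out=by_sorted mean=Mean;
run;   /* 2 rows: one per member_no */

/* No sort: NOTSORTED accepts the data as-is, but each run of      */
/* consecutive identical member_no values becomes its own BY group */
proc summary data=visits;
  by member_no notsorted;
  var day_diff;
  output out=by_notsorted mean=Mean;
run;   /* 4 rows: members 2, 1, 2, 1 */
```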

Toby Dunn

"It's OK to figure out murder mysteries, but you shouldn't need to figure out code. You should be able to read it." -Steve C McConnell

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Sharma, Diwakar (Corporate)
Sent: Tuesday, December 14, 2004 7:26 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Proc Summary vs. Means run times

Hi,

Just one question on this issue. PROC SUMMARY requires a sort step preceding it, whereas PROC MEANS does not (given that you are using a CLASS statement). Shouldn't we consider this when comparing the run times?

Regards, Diwakar

Diwakar Sharma
GECIS Analytics, Bangalore, India

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Mike Rhoads
Sent: Tuesday, December 14, 2004 6:45 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Proc Summary vs. Means run times

I certainly concur with David Cassell's recommendation of SAS user group meetings at various levels as a great way to keep up with the real "inside information" on SAS. In addition to presentations by SAS developers, the demo areas are a great chance to meet some of these folks and ask questions. They are very articulate and enthusiastic about the software they have created (as well they should be!).

As far as MEANS vs. SUMMARY goes, though, you don't need to be much of an insider. The Overview page of the PROC SUMMARY documentation contains the following:

"The SUMMARY procedure is very similar to the MEANS procedure. Except for the differences discussed in the following section, all the information in The MEANS Procedure also applies to PROC SUMMARY."

The reasons for this apparent duplication were explained well by Ian in his earlier message.

Mike Rhoads Westat RhoadsM1@Westat.com

-----Original Message-----
From: Michael Murff [mailto:mjm33@msm1.byu.edu]
Sent: Monday, December 13, 2004 5:59 PM
To: SAS-L@LISTSERV.UGA.EDU; Mike Rhoads
Subject: Re: Proc Summary vs. Means run times

Hi Mike,

How does one become privy to what SAS does behind the scenes? Have they revealed some of their source code, presumably written in C? I thought they kept it under very tight lock and key because of competitors like SPSS and STATA. My understanding is that the procs are precompiled binaries, and that DATA step code is sort of translated down to C syntax. Could you elaborate, or refer me to other sources (papers) with more information on what goes on "under the hood" when a SAS proc or DATA step code is submitted?

Thanks,

Michael Murff

PS--Perhaps I should relist this under a new topic, but I'll have to consult the said SAS etiquette paper, to be sure about that :)

>>> Mike Rhoads <RHOADSM1@WESTAT.COM> 12/13/2004 3:46:44 PM >>>

Dave,

Welcome to the group!

Actually, PROC MEANS and PROC SUMMARY run exactly the same code behind the scenes. There are a couple of very minor differences, mainly that by default PROC MEANS produces printed output and PROC SUMMARY does not.
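A sketch of the default-output difference Mike describes, using a small hypothetical data set (`scores`, with made-up values). Adding NOPRINT to PROC MEANS and PRINT to PROC SUMMARY swaps their defaults; with identical CLASS, VAR, and OUTPUT statements the two output data sets should hold the same rows, which PROC COMPARE can be used to check.

```sas
/* Hypothetical input data */
data scores;
  input grp $ score;
  datalines;
a 1
a 3
b 2
b 6
;
run;

/* PROC MEANS prints by default; NOPRINT suppresses the listing */
proc means data=scores noprint;
  class grp;
  var score;
  output out=m_out mean=Mean;
run;

/* PROC SUMMARY does not print by default; PRINT turns the listing on */
proc summary data=scores print;
  class grp;
  var score;
  output out=s_out mean=Mean;
run;

/* Compare the two output data sets row by row */
proc compare base=m_out compare=s_out;
run;
```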

So I suspect the differences you are seeing in output format and execution time are because you are using a BY statement in your PROC MEANS vs. a CLASS statement in PROC SUMMARY. Try using the same statement in both, and you should get identical output and near-identical run times.
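A sketch of Mike's suggestion applied to Dave's code from the log further down, with both procedures using the same BY statement. The WORK.VISIT_SUM rows here are made up, since the real data are not in the thread. The PROC SORT is needed because of the plain BY statement, not because of either procedure, so it costs the same whichever procedure follows; that bears on Diwakar's question at the top of the thread.

```sas
/* Hypothetical stand-in for WORK.VISIT_SUM (values are made up) */
data visit_sum;
  input member_no day_diff;
  datalines;
103 2
101 3
102 4
101 9
103 8
102 6
;
run;

/* The plain BY statement needs sorted input, for either procedure */
proc sort data=visit_sum;
  by member_no;
run;

/* Same BY, VAR and OUTPUT statements in both steps */
proc summary data=visit_sum missing print;
  by member_no;
  var day_diff;
  output out=diffs mean=Mean std=STDev;
run;

proc means data=visit_sum missing;
  by member_no;
  var day_diff;
  output out=diffs1 mean=Mean std=STDev;
run;
```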

Mike Rhoads Westat RhoadsM1@Westat.com

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of David Meyer
Sent: Monday, December 13, 2004 5:32 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Proc Summary vs. Means run times

Hi SASLers,

As a new-ish SAS guy, I have been following the SAS-L discussion as much as I can, and I have been learning a lot (THANKS ALL). As I have been improving, I have caught the "try to write tighter code" bug from some of you, and since I am working with large data sets (millions of records each), reducing run time is a very practical obsession to have.

I have recently discovered PROC SUMMARY and have been playing with it and PROC MEANS. I think I found SUMMARY to take about 35 to 45% of the run time of PROC MEANS (plus I like the "class variable crude" summary data line in PROC SUMMARY, and I like the way the data is displayed in the output window better than in MEANS). If all I want is basic summary stats (mean, min, max, std), should I always be using SUMMARY going forward? Am I making any assumptions that are incorrect or that I should worry about? Can any of you suggest places for me to go and read up on these basic statistical procs?
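For the basic statistics Dave lists, a single OUTPUT statement can request all four at once. A minimal sketch with a hypothetical stand-in for WORK.VISIT_SUM (made-up values) and hypothetical output names (`day_stats`, `Min`, `Max`, `StdDev`); PROC MEANS with NOPRINT should accept the same statements.

```sas
/* Hypothetical stand-in for WORK.VISIT_SUM (values are made up) */
data visit_sum;
  input member_no day_diff;
  datalines;
101 3
101 9
102 4
102 6
;
run;

/* Mean, min, max and standard deviation of day_diff in one pass;  */
/* the output has one overall row plus one row per member_no value */
proc summary data=visit_sum;
  class member_no;
  var day_diff;
  output out=day_stats mean=Mean min=Min max=Max std=StdDev;
run;
```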

TIA and thanks for all of your discussion on other topics,

Dave

Below are the code and log results:

proc summary data=visit_sum missing print;
  class member_no;
  var day_diff;
  output out=diffs mean=Mean std=STDev;
run;

NOTE: There were 48 observations read from the dataset WORK.VISIT_SUM.
NOTE: The data set WORK.DIFFS has 13 observations and 5 variables.
NOTE: PROCEDURE SUMMARY used:
      real time 0.62 seconds
      cpu time 0.05 seconds

proc means data=visit_sum missing print;
  by member_no;
  var day_diff;
  output out=diffs1 mean=Mean std=STDev;
run;

NOTE: There were 48 observations read from the dataset WORK.VISIT_SUM.
NOTE: The data set WORK.DIFFS1 has 12 observations and 5 variables.
NOTE: PROCEDURE MEANS used:
      real time 0.28 seconds
      cpu time 0.03 seconds
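The 13-versus-12 row count in the log is most likely the overall row that a CLASS statement adds: with one class variable, the output data set has one `_TYPE_=0` row summarizing all observations plus one `_TYPE_=1` row per level, whereas BY produces only the per-group rows. That overall row may be the "crude" line Dave likes. A sketch with made-up data; the NWAY option keeps only the highest-`_TYPE_` rows, which should match a BY run. `diffs_nway` is a hypothetical name.

```sas
/* Hypothetical stand-in for WORK.VISIT_SUM: 3 members, made-up values */
data visit_sum;
  input member_no day_diff;
  datalines;
101 3
101 9
102 4
102 6
103 2
103 8
;
run;

/* CLASS output: 3 member rows (_TYPE_=1) + 1 overall row (_TYPE_=0) */
proc summary data=visit_sum missing;
  class member_no;
  var day_diff;
  output out=diffs mean=Mean std=STDev;
run;

/* NWAY drops the overall row, leaving one row per member_no */
proc summary data=visit_sum missing nway;
  class member_no;
  var day_diff;
  output out=diffs_nway mean=Mean std=STDev;
run;
```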

