LISTSERV at the University of Georgia
Date:   Mon, 23 Jun 2003 14:32:09 -0400
Reply-To:   sashole@bellsouth.net
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   "Paul M. Dorfman" <sashole@BELLSOUTH.NET>
Organization:   Sashole of Florida
Subject:   Re: Proc Means and Sorting
Comments:   To: Jack Hamilton <JackHamilton@firsthealth.com>
In-Reply-To:   <sef6e7c7.006@firsthealth.com>
Content-Type:   text/plain; charset="us-ascii"

> -----Original Message-----
> From: Jack Hamilton [mailto:JackHamilton@firsthealth.com]
>
> PROC MEANS is one of the procedures which can use
> multithreading in version 9. The thread sorting BY-group X
> might finish before the thread sorting BY-group A, so your
> paragraph 1) below doesn't apply.

Jack,

Hmmm... If BY is being used, then MEANS accepts the data either already sorted or grouped (with the NOTSORTED option), so there is no need for it to sort them. True, it can now use the THREADS option, as the SAS log confirms:

NOTE: Multiple concurrent threads will be used to summarize data.

But I would find it rather incredible if SAS failed to interleave the summarized threads back into the original order. As has already been written, there is no in-your-face evidence of this in the docs, but let us look at it this way. Suppose we have an unordered input and use the CLASS statement. In this case, MEANS actually does sort the input [implicitly, by populating its internal AVL tree(s), whence the nodes are returned in key order]. And we know it is guaranteed that in such a case, the aggregated output will be physically ordered by the CLASS variables, and of course it will happen regardless of whether THREADS or NOTHREADS is used, otherwise the procedure would deliver inconsistent results. Further, we know that BY and CLASS will produce the same key-order output if the input is sorted beforehand. I surely would expect it to be the case irrespective of whether I used THREADS to improve the performance or not!
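The argument above can be checked with a minimal sketch along the following lines. It is an illustration only, not the test actually run for this post: the data set names (TEST, TEST_SORTED, CLASS_THR, CLASS_NOTHR, BY_THR), the variables KEY and X, and the sizes are all hypothetical. Only KEY is compared, since the point is the physical order of the output rows.

```sas
/* Hypothetical unordered input: 100 key values in random order */
data test;
  do i = 1 to 100000;
    key = ceil(ranuni(1) * 100);
    x   = ranuni(1);
    output;
  end;
  drop i;
run;

/* CLASS on unordered input, with threads ... */
proc means data=test noprint nway threads;
  class key;
  var x;
  output out=class_thr (drop=_type_ _freq_) sum=;
run;

/* ... and without threads */
proc means data=test noprint nway nothreads;
  class key;
  var x;
  output out=class_nothr (drop=_type_ _freq_) sum=;
run;

/* BY needs sorted (or grouped) input */
proc sort data=test out=test_sorted;
  by key;
run;

proc means data=test_sorted noprint threads;
  by key;
  var x;
  output out=by_thr (drop=_type_ _freq_) sum=;
run;

/* Without an ID statement, PROC COMPARE matches observations by  */
/* position, so any difference in row order shows up as unequal   */
/* KEY values.                                                    */
proc compare base=class_nothr compare=class_thr;
  var key;
run;

proc compare base=class_nothr compare=by_thr;
  var key;
run;
```

If the reasoning holds, neither PROC COMPARE step should report unequal KEY values.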

On the practical side, I have just run MEANS against a sizeable test input (~10m obs divided into ~10k groups by a distinct key, to make the use of the multiple threads - in my case two - more pronounced) a number of times, testing all the cases mentioned above: with ordered, unordered, and grouped input, with CLASS and BY (including NOTSORTED), with THREADS and NOTHREADS. To spare the -L from wading through my logs, let me just distill the results into the statement that the output has always come in the expected order, that is:

1) If the input is sorted and BY is used, or CLASS is used (regardless of the input order), the aggregated output is always in the BY (CLASS) variables order.

2) If the input is grouped and BY is used with NOTSORTED, the input key order is strictly maintained.
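For the grouped case in 2), a scaled-down sketch of such a check might look as follows. The data set names GROUPED and BY_NS, the variables KEY and X, and the group order 3, 1, 2 are made up for illustration and are far smaller than the test described above.

```sas
/* Hypothetical grouped-but-unsorted input: groups arrive as 3, 1, 2 */
data grouped;
  do key = 3, 1, 2;
    do i = 1 to 1000;
      x = ranuni(1);
      output;
    end;
  end;
  drop i;
run;

/* BY with NOTSORTED accepts grouped input without a prior sort */
proc means data=grouped noprint threads;
  by key notsorted;
  var x;
  output out=by_ns sum=;
run;

/* One row per group; inspect the order of KEY in the listing */
proc print data=by_ns;
run;
```

If 2) holds, the listing should show KEY in the input group order (3, 1, 2) rather than in sorted order.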

Kind regards,
-------------------
Paul M. Dorfman
Jacksonville, FL
-------------------

> I'd also be quite surprised if PROC MEANS stops working in
> the expected manner, but I don't see a guarantee in the
> documentation that it won't.
> Maybe I'm just overlooking something obvious, as this seems
> to be one of the fundamental characteristics of SAS processing.
>
> --
> JackHamilton@FirstHealth.com
> Manager, Technical Development
> Metrics Department, First Health
> West Sacramento, California USA
>
> >>> paul_dorfman@HOTMAIL.COM 06/23/2003 10:26 AM >>>
> Matt,
>
> I do not think that what you require is stated in the
> documentation as a separate, explicit paragraph, but I also
> think that it is not necessary. The documentation, in part,
> does say that
>
> "Comparison of the BY and CLASS Statements
> Using the BY statement is similar to using the CLASS
> statement and the NWAY option in that PROC MEANS summarizes
> each BY group as an independent subset of the input data...
> However, unlike the CLASS statement, the BY statement
> requires that you previously sort BY variables."
>
> From which it follows that:
>
> 1) With BY, input is processed one BY-group at a time. I
> cannot think of any conceivable reason why any two BY-groups
> should be processed out of input order. (Somewhat
> counter-analogically to Proc SORT NOEQUALS, where not
> maintaining the relative order of the records within the same
> BY-group may be used to improve performance).
>
> 2) The doc's statement "BY statement requires that you
> previously sort BY variables" is only accurate if the
> NOTSORTED option is not used. Otherwise, the only actual
> requirement is that the BY-variables be *grouped*, in which
> case it goes without saying that the input order of the
> BY-variables will be maintained in the output.
>
> As I've never observed any deviations from this [expected]
> behavior, I would be quite surprised to see any evidence to
> the contrary.
>
> Kind regards,
> ---------------------------
> Paul M. Dorfman
> Jacksonville, FL
> ---------------------------
>
> >From: m n <iced_phoenix@YAHOO.COM>
> >Reply-To: m n <iced_phoenix@YAHOO.COM>
> >
> >Dear c.s.sas,
> >
> >Does SAS documentation (V8) make any guarantee that an output dataset
> >from proc means will maintain the same sort order as the original dataset?
> >In other words, if I give proc means a dataset sorted by x1, x2, x3 and
> >set the by group to x1, x2, x3, am I guaranteed that the output set will
> >remain sorted (though summarized)?
> >
> >Code Example:
> >
> >  proc sort data=test;
> >    by x1 x2 x3;
> >  run;
> >
> >  proc means data=test sum;
> >    by x1 x2 x3;
> >    var x4;
> >    output out=test2 sum=;
> >  run;
> >
> >  /* Must I sort test2 by x1, x2, x3 here to guarantee a sorted dataset? */
> >
> >I would greatly appreciate a quote from SAS documentation that answers this
> >question. Thank you all for your help.
> >
> >Matt

