LISTSERV at the University of Georgia
Date:   Mon, 10 Jul 2000 22:57:55 GMT
Reply-To:   sashole@mediaone.net
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Paul Dorfman <paul_dorfman@HOTMAIL.COM>
Subject:   Re: index variable by group
Comments:   To: yorgiv@MY-DEJA.COM
Content-Type:   text/plain; format=flowed

Yorgi,

One way of doing this is

   data a;
      do index = 1 by 1 until (last.b);
         set a;
         by b;
         output;
      end;
   run;

Another way would be

   data a;
      set a;
      by b;
      if not first.b then index + 1;
      else index = 1;
   run;

Yet another way could be

   data a;
      set a;
      by b;
      if first.b then index = 0;
      index + 1;
   run;

To say nothing of... well, I am sure you see by now how many ways there are to do this simple thing in SAS, but they basically boil down to the following: for the first record in a by-group, set the index variable to 1, and add 1 for each subsequent record. Or, _before_ the first record in a group, set the index to 0, and then increment it for each record in the group (including the first one).
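
As a minimal sketch, here is the "reset to 0, then increment" formulation applied to the sample data from the original question. The data set names SAMPLE and INDEXED are made up for illustration and do not appear in the original exchange.

```sas
/* Sample values copied from the original question.          */
/* SAMPLE and INDEXED are illustrative data set names.       */
data sample;
   input A B;
   datalines;
5 1
3 2
12 2
8 1
1 2
2 1
;
run;

/* BY-group processing requires the input sorted by B */
proc sort data=sample;
   by B A;
run;

data indexed;
   set sample;
   by B;
   if first.B then index = 0;   /* reset before the first record of each group */
   index + 1;                   /* sum statement: index is retained implicitly  */
run;

proc print data=indexed noobs;
   var B A index;
run;
```

With the six sample records, INDEX should run 1, 2, 3 within B=1 and again 1, 2, 3 within B=2, which is the layout the question asks for.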

On the efficiency side, the first solution will run fastest because both setting index to 1 and incrementing it are done implicitly by the iterative DO, which works a little faster than doing the same "by hand". Also, UNTIL/WHILE conditions are evaluated faster than explicit IFs. The second solution will be second fastest because, in the IF-THEN-ELSE, the most frequent condition is evaluated first (provided that a by-group contains more than one record). And the third one will be the slowest, for it executes one extra sum operation per by-group.

All that having been said, the performance differences can be noticed only with a lot of distinct B-values, and even then they are subtle. For instance, for 1,000,000 by-groups with b=1,2,3...1e6 and 10 observations in each group, my S/390 machine executes the three steps above in 16.97, 17.29, and 17.40 CPU seconds, respectively. Given a difference this negligible, the question is, who cares? The way I see it, saving a CPU second at the expense of huge intellectual resources and programming time is a waste. But since the three pieces above differ by a mere flick of thought, why not settle for the fastest one? All the more so because the knowledge gained by reflecting on the number and cost of the instructions we make the computer execute might turn out to be beneficial under more performance-sensitive circumstances, where it might really matter.
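
For anyone wishing to repeat the comparison, a test of the size described above could be set up roughly as follows. This is only a sketch under assumptions: the data set names TEST and T1 and the filler variable X are hypothetical, and the exact test program used for the timings quoted above is not shown in this message. The FULLSTIMER system option asks SAS to write detailed resource usage, including CPU time, to the log for each step.

```sas
options fullstimer;   /* write detailed CPU and memory statistics to the log */

/* 1,000,000 by-groups (b = 1 to 1e6), 10 observations in each. */
/* TEST and X are illustrative names, not from the original.    */
data test;
   do b = 1 to 1e6;
      do x = 1 to 10;
         output;
      end;
   end;
run;

/* First variant: index is set and incremented by the iterative DO */
data t1;
   do index = 1 by 1 until (last.b);
      set test;
      by b;
      output;
   end;
run;
```

The other two variants can be timed the same way by reading TEST and writing to a separate output data set, then comparing the CPU times reported in the log. Actual figures will depend on the machine and the SAS release.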

Kind regards,
====================
Paul M. Dorfman
Jacksonville, Fl
====================

>From: yorgiv@MY-DEJA.COM
>
>I would like to find a way to create an index variable in a single data
>set by a group. An example will help of course:
>
>A B
>5 1
>3 2
>12 2
>8 1
>1 2
>2 1
>
>then i do this:
>
>proc sort;
> by B A;
>run;
>Now what I want is this:
>
>B A index
>1 2 1
>1 5 2
>1 8 3
>2 1 1
>2 3 2
>2 12 3
>
>I want to do this so that I can then use the ranuni function to
>randomly select an A from each group of B. But I have trouble creating
>the index variable as above.

