Yorgi,
One way of doing this is
data a;
   do index = 1 by 1 until (last.b);
      set a;
      by b;
      output;
   end;
run;
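For readers less used to the DO UNTIL (LAST.B) idiom, here is a rough Python analog of the same logic. It is only a sketch under assumptions: records are (b, a) tuples already sorted by B (as after the PROC SORT in the quoted question), itertools.groupby stands in for BY-group processing, and the function name index_by_group_dow is hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def index_by_group_dow(rows):
    """Number records within each group of b; rows must be sorted by b."""
    out = []
    # Each groupby group plays the role of one BY group; the inner loop
    # ends at the group's last record, much like UNTIL (LAST.B).
    for b, group in groupby(rows, key=itemgetter(0)):
        # enumerate(start=1) mirrors DO INDEX = 1 BY 1.
        for index, (_, a) in enumerate(group, start=1):
            out.append((b, a, index))  # corresponds to the explicit OUTPUT
    return out


rows = [(1, 2), (1, 5), (1, 8), (2, 1), (2, 3), (2, 12)]
print(index_by_group_dow(rows))
# [(1, 2, 1), (1, 5, 2), (1, 8, 3), (2, 1, 1), (2, 3, 2), (2, 12, 3)]
```

As in the DATA step, the counter never has to be reset by hand: each new group starts a fresh loop that begins at 1.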
Another way would be
data a;
   set a;
   by b;
   if not first.b then index + 1;
   else index = 1;
run;
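The same IF-THEN-ELSE logic, sketched in Python under the assumption that the input is a list of (b, a) tuples sorted by B. A comparison with the previous record's B emulates FIRST.B, and a variable kept outside the loop stands in for the value the SAS sum statement retains across iterations. The function name is hypothetical.

```python
def index_by_group_if_else(rows):
    """Number records within each group of b; rows must be sorted by b."""
    out = []
    prev_b = object()  # sentinel: differs from any real b, so record 1 is "first"
    index = 0          # kept across iterations, like a retained sum variable
    for b, a in rows:
        first_b = b != prev_b  # emulates FIRST.B on sorted input
        if not first_b:
            index += 1  # the more frequent branch is tested first
        else:
            index = 1
        out.append((b, a, index))
        prev_b = b
    return out


print(index_by_group_if_else([(1, 2), (1, 5), (2, 1)]))
# [(1, 2, 1), (1, 5, 2), (2, 1, 1)]
```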
Yet another way could be
data a;
   set a;
   by b;
   if first.b then index = 0;
   index + 1;
run;
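A Python sketch of this third variant, with the same assumptions as before (sorted (b, a) tuples, FIRST.B emulated by comparing with the previous B, hypothetical function name). It makes the extra per-group step visible: the counter is reset to 0 and then incremented for every record, the first one included.

```python
def index_by_group_reset(rows):
    """Number records within each group of b; rows must be sorted by b."""
    out = []
    prev_b = object()  # sentinel so the very first record counts as FIRST.B
    index = 0
    for b, a in rows:
        if b != prev_b:  # emulates FIRST.B
            index = 0    # reset *before* the group's first record is counted
        index += 1       # unconditional increment, like the sum statement
        out.append((b, a, index))
        prev_b = b
    return out


print(index_by_group_reset([(1, 2), (1, 5), (2, 1)]))
# [(1, 2, 1), (1, 5, 2), (2, 1, 1)]
```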
To say nothing of... well, I am sure you see by now how many ways there are
to do this simple thing in SAS, but they basically boil down to the
following: for the first record in a BY group, set the index variable to 1,
and add 1 for each subsequent record. Or, _before_ the first record in a
group, set the index to 0, and then increment it for every record in the
group (including the first one).
On the efficiency side, the first solution will run fastest, because both
setting index to 1 and incrementing it are done implicitly by the iterative
DO, which works a little faster than doing the same "by hand". Also,
UNTIL/WHILE conditions are evaluated faster than explicit IFs. The second
solution will be the second fastest, because in the IF-THEN-ELSE the most
frequent condition is evaluated first (provided that a BY group contains
more than one record). And the third one will be the slowest, because it
performs one extra sum operation per BY group.
All that having been said, the performance differences become noticeable
only with a large number of distinct B values, and even then they are
subtle. For instance, for 1,000,000 BY groups with b=1,2,3,...,1e6 and 10
observations in each group, my S/390 machine executes the three steps above
in 16.97, 17.29, and 17.40 CPU seconds, respectively. Given a difference
this negligible, the question is: who cares? The way I see it, saving a CPU
second at the expense of huge intellectual resources and programming time is
a waste; but since the three pieces above differ by a mere flick of thought,
why not settle for the fastest one? All the more so since the knowledge
gained by reflecting on the number and cost of the instructions we make the
computer execute may turn out to be useful in more performance-sensitive
circumstances, where it really matters.
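To put the three timings quoted above in proportion, the arithmetic below expresses each one relative to the fastest. Only the figures stated in the text are used; nothing new is measured, and the labels are just shorthand for the three steps.

```python
def relative_overhead(timings):
    """Percent slowdown of each timing relative to the fastest one."""
    fastest = min(timings.values())
    return {name: (t - fastest) / fastest * 100 for name, t in timings.items()}


# CPU seconds as reported in the text (1,000,000 groups x 10 observations).
cpu_seconds = {"DO UNTIL loop": 16.97, "IF-THEN-ELSE": 17.29, "reset then sum": 17.40}

for name, pct in relative_overhead(cpu_seconds).items():
    print(f"{name}: {cpu_seconds[name]:.2f} s (+{pct:.1f}%)")
```

The slowest variant comes out about 2.5% behind the fastest; spread over the 10,000,000 observations, the 0.43-second gap is roughly 43 ns per observation.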
Kind regards,
====================
Paul M. Dorfman
Jacksonville, Fl
====================
>From: yorgiv@MY-DEJA.COM
>
>I would like to find a way to create an index variable in a single data
>set by a group. An example will help of course:
>
>A B
>5 1
>3 2
>12 2
>8 1
>1 2
>2 1
>
>then I do this:
>
>proc sort;
> by B A;
>run;
>Now what I want is this:
>
>B A index
>1 2 1
>1 5 2
>1 8 3
>2 1 1
>2 3 2
>2 12 3
>
>I want to do this so that I can then use the ranuni function to
>randomly select an A from each group of B. But I have trouble creating
>the index variable as above.
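Since the stated goal is to pick one A at random from each group of B, here is a minimal sketch of how a per-group index can be used for that: draw a uniform number u in [0, 1), scale it by the group size n to get a target position k = floor(u * n) + 1, and keep the record whose index equals k. This is written in Python rather than SAS; random.Random merely stands in for RANUNI, and the function name and seed parameter are hypothetical.

```python
import random
from itertools import groupby
from operator import itemgetter


def pick_one_per_group(rows, seed=None):
    """Pick one (b, a) at random from each group of b; rows sorted by b."""
    rng = random.Random(seed)  # stand-in for a seeded RANUNI stream
    picks = []
    for b, group in groupby(rows, key=itemgetter(0)):
        members = list(group)
        n = len(members)                   # group size
        k = int(rng.random() * n) + 1      # 1 <= k <= n, uniform over positions
        for index, (_, a) in enumerate(members, start=1):
            if index == k:                 # keep the record at position k
                picks.append((b, a))
    return picks


rows = [(1, 2), (1, 5), (1, 8), (2, 1), (2, 3), (2, 12)]
print(pick_one_per_group(rows, seed=42))
```

The group size has to be known before the draw, which is why this sketch buffers each group; in a DATA step that typically means a first pass to count records per group.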