On Wed, 24 Oct 2007 00:12:24 -0700, a320@HOTMAIL.COM wrote:
>Hello,
>
>My data has the following format:
>Data A;
>ID Year Type
>1 1999 A
>1 2000 A
>1 2001 B
>1 2001 C
>2 1988 H
>3 1989 C
>4 2001 G
>4 1998 Y
>5 2001 B
>
>
>
>I want to select a random 20% sample of the IDs.
>
>So for example,
>
>The output could be:
>
>4 2001 G
>4 1998 Y
>
>or the output could be:
>
>5 2001 B
>
>
>The way I approach it is:
>Data B;
>set A;
>by ID;
>retain X;
>if first.ID then X = ranuni(4544);
>run;
>
>Data C;
>set B;
>if X < 0.20 then output;
>run;
>
>This way I would extract 20% of the IDs. My question is: is there a
>better/more efficient way to do this?
>
>Thanks.
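Combine the two DATA steps into one: keep the RETAIN and first.ID logic,
and apply the X < 0.20 test in the same step, so the data are read only
once and the intermediate data set B is never written.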
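
A minimal sketch of the combined step, assuming A is already sorted (or at
least grouped) by ID as in the sample shown. The DROP statement is my own
addition, not something the original code did:

```sas
data C;
  set A;
  by ID;                              /* assumes A is sorted by ID */
  retain X;
  if first.ID then X = ranuni(4544);  /* one random draw per ID */
  if X < 0.20;                        /* subsetting IF: keep every row of a selected ID */
  drop X;                             /* optional: leave the helper variable out of C */
run;
```

Note that each ID is kept independently with probability 0.20, so the
fraction of IDs selected is about 20% on average, not exactly 20%. With
only five IDs, as in the sample, a given run may select none or several.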