|
On Thu, 24 Jul 2008 16:23:05 -0700, Matt <mmandel2@NYC.RR.COM> wrote:
>I have a table with many duplicate key values (se_no). When I
>encounter a duplicate, I need to determine which one to keep based on
>the values in id1 and id2. The optimal observation is one in which
>both values are 0; next best is 0 and 6; last is 6 and 0. I need a
>method for reading in each observation until I reach one that I can
>write to a new dataset. Here's a sample of the data:
>
>input se_no $10. id1 id2
>;
>datalines;
>2200156354 . .
>2200156446 0 0
>2200156446 0 6
>2200156446 0 6
>2200156446 6 6
>2200156453 . .
>2200156495 0 6
>2200156495 6 6
>2200156495 0 6
>2200156495 6 6
>2200156511 . .
>;
>
>I would greatly appreciate any suggestions from you experts out there.
>
>Thanks!
>Matt
This SQL solution seems rather natural:
proc sql;
create table test2(drop = opt) as
select distinct * ,
case cats(id1,id2) when '00' then 'A'
when '06' then 'B'
when '60' then 'C'
when '66' then 'D'
else 'Z'
end as opt
from test
group by se_no
having opt = min(opt);
quit;
|