Date: Tue, 5 Aug 2003 10:42:19 -0700
Reply-To: "Huang, Ya" <yhuang@AMYLIN.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Huang, Ya" <yhuang@AMYLIN.COM>
Subject: Re: List manipulation
Content-Type: text/plain; charset="iso-8859-1"
Robert,
Here is what I just came up:
data xx;
input Set Value $;
cards;
1 A
1 B
1 C
2 D
3 A
3 F
4 H
4 I
4 J
4 K
5 K
5 A
;
proc sql;
create table grp as
select value,set, case when grp=' ' then max(grp) else grp end as grp
from (select value, set,
case when count(*) > 1 then compress(value||put(count(*),best.)) end as grp
from xx
group by value)
group by set
order by grp,set,value
;
data setlst (keep=grp newset);
set grp;
length newset $50;
retain newset;
by grp set;
if first.grp then newset=trim(left(put(set,best.)));
else if compress(put(set,best.)) ne scan(newset,-1,'_') then
newset=compress(newset||'_'||put(set,best.));
if last.grp then output;
run;
data grp (drop=grp);
merge grp setlst;
by grp;
run;
proc print;
run;
----------------------
Obs Value Set newset
1 D 2 2
2 A 1 1_3_5
3 B 1 1_3_5
4 C 1 1_3_5
5 A 3 1_3_5
6 F 3 1_3_5
7 A 5 1_3_5
8 H 4 4_5
9 I 4 4_5
10 J 4 4_5
11 K 4 4_5
12 K 5 4_5
Note, the proc sql step already regrouped the sets,
but with different names. The next two data steps
are designed to get the set names like your sample data.
Also note that I added a new set 5 with two records,
one is linked to set 1 via value='A', another
is linked to set 4 via value='K', the final result
split these two, so that one is grouped with 1,3 as 1,3,5.
Another is grouped with 4 as 4,5.
Kind regards,
Ya Huang
-----Original Message-----
From: Robert Stratton [mailto:rstratton@PHD.CO.UK]
Sent: Tuesday, August 05, 2003 7:54 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: List manipulation
All
I have a number of subsets of values within a table, membership of a
set is indicated by a "set" variable- e.g:
Set Value
1 A
1 B
1 C
2 D
3 A
3 F
4 H
4 I
4 J
4 K
If any value in any set is the same as any value in any other set, I
need to create a new set which incorporates both sets of values-
including those that don't match: e.g:
Set Value
1_3 A
1_3 B
1_3 C
1_3 A
1_3 F
2 D
4 H
4 I
4 J
4 K
What the new set is called and whether it contains duplicates are
arbitrary.
I have the feeling this is part of some more classical problem, has
anyone done something similar? I've tried various macro driven sql
arrangements, but with no luck.
Thanks for your help.
Robert