|
Sid,
If you want to include all cities within a distance of 5000 miles from
each other, then I don't see how a cluster algorithm would help.
I left out some code because all of your input data was already sorted by
distance, and I changed city to c1 and added a distance of 0 to further
simplify the code.
The only thing I added was a proc sort to get rid of duplicate groups:
data have (drop=i);
  infile cards dlm = ',' dsd missover;
  array cities(*) $ c1-c9;
  array distance(*) d1-d9;
  input c1 $ d1 c2 $ d2 c3 $ d3 c4 $ d4 c5 $
        d5 c6 $ d6 c7 $ d7 c8 $ d8 c9 $ d9;
  /* blank out every city that is more than 5000 miles from c1 */
  do i=1 to 9;
    if distance(i) gt 5000 then do;
      call missing (cities(i));
      call missing (distance(i));
    end;
  end;
cards;
Berlin,0,Cairo,1795,Calcutta,4368,Chicago,4405,Caracas,5247,HK,5440,Cape,5981,Honolulu,7309,BA,7402
BA,0,Caracas,3168,Cape,4269,Chicago,5598,Cairo,7345,Berlin,7402,Honolulu,7561,Calcutta,10265,HK,11472
Cairo,0,Berlin,1795,Calcutta,3539,Cape,4500,HK,5061,Chicago,6129,Caracas,6338,BA,7345,Honolulu,8838
Calcutta,0,HK,1648,Cairo,3539,Berlin,4368,Cape,6024,Honolulu,7047,Chicago,7980,Caracas,9605,BA,10265
Cape,0,BA,4269,Cairo,4500,Berlin,5981,Calcutta,6024,Caracas,6365,HK,7375,Chicago,8494,Honolulu,11534
Caracas,0,Chicago,2501,BA,3168,Berlin,5247,Honolulu,6013,Cairo,6338,Cape,6365,Calcutta,9605,HK,10167
Chicago,0,Caracas,2501,Honolulu,4250,Berlin,4405,BA,5598,Cairo,6129,HK,7793,Calcutta,7980,Cape,8494
HK,0,Calcutta,1648,Cairo,5061,Berlin,5440,Honolulu,5549,Cape,7375,Chicago,7793,Caracas,10167,BA,11472
Honolulu,0,Chicago,4250,HK,5549,Caracas,6013,Calcutta,7047,Berlin,7309,BA,7561,Cairo,8838,Cape,11534
;
proc sort data=have out=want nodupkey;
by c1 c2 c3 c4 c5 c6 c7 c8 c9;
run;
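One caveat with the sort above: each row lists its cities in distance order and starts with a different c1, so two rows holding the same set of cities would not compare as equal. A possible refinement, offered only as an untested sketch, is to alphabetize the city names within each row before sorting. The dataset names groups and want_alpha are placeholders of mine, and this assumes your SAS release supports CALL SORTC. The distances are dropped because they would no longer line up with the reordered names.

```sas
/* Sketch only: put the city names in each row into alphabetical  */
/* order so that rows with the same members become identical.     */
/* "groups" and "want_alpha" are hypothetical dataset names.      */
data groups (keep=c1-c9);
  set have;
  call sortc(of c1-c9);   /* blanks (removed cities) sort first */
run;

/* Rows that now match on all nine columns are the same group. */
proc sort data=groups out=want_alpha nodupkey;
  by c1-c9;
run;
```

Whether this removes anything depends on the data; it only catches groups whose membership is exactly the same.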
Does that accomplish what you want?
Art
--------
On Sat, 20 Dec 2008 14:14:43 -0500, Sid N <nsid31@GMAIL.COM> wrote:
>Hi,
>
>I am trying to group a list of cities based on distances from each other.
>The distance matrix as provided in the below dataset shows distances from
>each city to the eight closest cities (c1-c8). The corresponding distances
>(in ascending order) to each of these cities are in columns d1-d8.
>
>data have;
>infile cards dlm = ',' dsd missover;
>input city $ c1 $ d1 c2 $ d2 c3 $ d3 c4 $ d4 c5 $ d5 c6 $ d6 c7 $ d7 c8 $ d8;
>cards;
>Berlin,Cairo,1795,Calcutta,4368,Chicago,4405,Caracas,5247,HK,5440,Cape,5981,Honolulu,7309,BA,7402
>BA,Caracas,3168,Cape,4269,Chicago,5598,Cairo,7345,Berlin,7402,Honolulu,7561,Calcutta,10265,HK,11472
>Cairo,Berlin,1795,Calcutta,3539,Cape,4500,HK,5061,Chicago,6129,Caracas,6338,BA,7345,Honolulu,8838
>Calcutta,HK,1648,Cairo,3539,Berlin,4368,Cape,6024,Honolulu,7047,Chicago,7980,Caracas,9605,BA,10265
>Cape,BA,4269,Cairo,4500,Berlin,5981,Calcutta,6024,Caracas,6365,HK,7375,Chicago,8494,Honolulu,11534
>Caracas,Chicago,2501,BA,3168,Berlin,5247,Honolulu,6013,Cairo,6338,Cape,6365,Calcutta,9605,HK,10167
>Chicago,Caracas,2501,Honolulu,4250,Berlin,4405,BA,5598,Cairo,6129,HK,7793,Calcutta,7980,Cape,8494
>HK,Calcutta,1648,Cairo,5061,Berlin,5440,Honolulu,5549,Cape,7375,Chicago,7793,Caracas,10167,BA,11472
>Honolulu,Chicago,4250,HK,5549,Caracas,6013,Calcutta,7047,Berlin,7309,BA,7561,Cairo,8838,Cape,11534
>;
>run;
>
>I would like to group these cities such that all cities in a group are
>within 5000 miles of each other. I am looking for the least possible number
>of groupings that can be attained. The output I am looking for is like below:
>
>City Group
>HK 1
>Calcutta 1
>BA 2
>... and so on.
>
>Can PROC CLUSTER be used for this purpose? Thank you in advance for any
>suggestions.
>
>Sid
|