Sid,

If you want to include all cities within a distance of 5000 miles from each other, then I don't see how a cluster algorithm would help.

I left out some code because all of your input data was already sorted by distance, and I changed city to c1 and added a distance of 0 to further simplify the code.

The only thing I added was a proc sort to get rid of duplicate groups:

data have (drop=i);
  infile cards dlm = ',' dsd missover;
  array cities(*) $ c1-c9;
  array distance(*) d1-d9;
  input c1 $ d1 c2 $ d2 c3 $ d3 c4 $ d4 c5 $ d5 c6 $ d6 c7 $ d7 c8 $ d8 c9 $ d9;
  do i=1 to 9;
    if distance(i) gt 5000 then do;
      call missing (cities(i));
      call missing (distance(i));
    end;
  end;
cards;
Berlin,0,Cairo,1795,Calcutta,4368,Chicago,4405,Caracas,5247,HK,5440,Cape,5981,Honolulu,7309,BA,7402
BA,0,Caracas,3168,Cape,4269,Chicago,5598,Cairo,7345,Berlin,7402,Honolulu,7561,Calcutta,10265,HK,11472
Cairo,0,Berlin,1795,Calcutta,3539,Cape,4500,HK,5061,Chicago,6129,Caracas,6338,BA,7345,Honolulu,8838
Calcutta,0,HK,1648,Cairo,3539,Berlin,4368,Cape,6024,Honolulu,7047,Chicago,7980,Caracas,9605,BA,10265
Cape,0,BA,4269,Cairo,4500,Berlin,5981,Calcutta,6024,Caracas,6365,HK,7375,Chicago,8494,Honolulu,11534
Caracas,0,Chicago,2501,BA,3168,Berlin,5247,Honolulu,6013,Cairo,6338,Cape,6365,Calcutta,9605,HK,10167
Chicago,0,Caracas,2501,Honolulu,4250,Berlin,4405,BA,5598,Cairo,6129,HK,7793,Calcutta,7980,Cape,8494
HK,0,Calcutta,1648,Cairo,5061,Berlin,5440,Honolulu,5549,Cape,7375,Chicago,7793,Caracas,10167,BA,11472
Honolulu,0,Chicago,4250,HK,5549,Caracas,6013,Calcutta,7047,Berlin,7309,BA,7561,Cairo,8838,Cape,11534
;

proc sort data=have out=want nodupkey;
  by c1 c2 c3 c4 c5 c6 c7 c8 c9;
run;
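
If it helps to eyeball the result, here is a rough sketch that collapses each row of want into a single comma-separated list. Each row holds the city in c1 plus whichever of the other cities are within 5000 miles of it. The dataset name groups, the variable name group, and the length of 80 are just names and values I made up for illustration; catx skips the missing values, so only the cities that survived the 5000-mile cutoff show up.

```sas
/* Sketch only: "groups" and "group" are made-up names, not part of the code above */
data groups (keep=group);
  set want;
  length group $ 80;              /* 9 names of up to 8 chars plus separators fit in 80 */
  group = catx(', ', of c1-c9);   /* catx ignores the cities set to missing */
run;

proc print data=groups noobs;
run;
```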

Does that accomplish what you want?

>Hi,
>
>I am trying to group a list of cities based on distances from each other.
>The distance matrix as provided in the below dataset shows distances from
>each city to the eight closest cities (c1-c8). The corresponding distances
>(in ascending order) to each of these cities are in columns d1-d8.
>
>data have;
>infile cards dlm = ',' dsd missover;
>input city $ c1 $ d1 c2 $ d2 c3 $ d3 c4 $ d4 c5 $ d5 c6 $ d6 c7 $ d7 c8 $ d8;
>cards;
>Berlin,Cairo,1795,Calcutta,4368,Chicago,4405,Caracas,5247,HK,5440,Cape,5981,Honolulu,7309,BA,7402
>BA,Caracas,3168,Cape,4269,Chicago,5598,Cairo,7345,Berlin,7402,Honolulu,7561,Calcutta,10265,HK,11472
>Cairo,Berlin,1795,Calcutta,3539,Cape,4500,HK,5061,Chicago,6129,Caracas,6338,BA,7345,Honolulu,8838
>Calcutta,HK,1648,Cairo,3539,Berlin,4368,Cape,6024,Honolulu,7047,Chicago,7980,Caracas,9605,BA,10265
>Cape,BA,4269,Cairo,4500,Berlin,5981,Calcutta,6024,Caracas,6365,HK,7375,Chicago,8494,Honolulu,11534
>Caracas,Chicago,2501,BA,3168,Berlin,5247,Honolulu,6013,Cairo,6338,Cape,6365,Calcutta,9605,HK,10167
>Chicago,Caracas,2501,Honolulu,4250,Berlin,4405,BA,5598,Cairo,6129,HK,7793,Calcutta,7980,Cape,8494
>HK,Calcutta,1648,Cairo,5061,Berlin,5440,Honolulu,5549,Cape,7375,Chicago,7793,Caracas,10167,BA,11472
>Honolulu,Chicago,4250,HK,5549,Caracas,6013,Calcutta,7047,Berlin,7309,BA,7561,Cairo,8838,Cape,11534
>;
>run;
>
>I would like to group these cities such that all cities in a group are
>within 5000 miles of each other. I am looking for the least possible number
>of groupings that can be attained. The output I am looking for is like below:
>
>City Group
>HK 1
>Calcutta 1
>BA 2
>... and so on.
>
>Can PROC CLUSTER be used for this purpose? Thank you in advance for any
>suggestions.
>
>Sid

