Date: Wed, 27 Jun 2007 15:31:51 -0700
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
Subject: Geo clustering
I'm trying to find geographic patterns in response rates.
The data is a large mailbase of prospects (several million
observations). All were mailed a solicitation; some responded, but most
didn't. For each prospect I have the ZIP code, so I can look up its
approximate latitude and longitude. I can also calculate the distance
in miles between any two ZIP codes.
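For reference, one common way to get miles from ZIP-level latitude and longitude is the haversine (great-circle) formula. Below is a minimal Python sketch. It assumes coordinates in decimal degrees, for example from a ZIP centroid table that is not shown here, and a mean Earth radius of 3958.8 miles. The function name `haversine_miles` is illustrative only, and the post does not say which distance formula the author actually used.

```python
import math

# Assumed mean Earth radius in miles (spherical-Earth approximation).
EARTH_RADIUS_MILES = 3958.8

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)      # difference in latitude
    dlmb = math.radians(lon2 - lon1)      # difference in longitude
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))
```

One degree of latitude comes out to roughly 69.1 miles. The formula treats the Earth as a sphere, which is a simplification; an ellipsoidal formula would be slightly more accurate.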
I'd like to identify areas with high and low response rate that are
sufficiently large and stable, for example, by grouping individual ZIP
codes into relatively few large clusters (maybe 2 to 5).
I started by grouping the data by ZIP code, calculating the response rate
for each ZIP, and then running hierarchical clustering. The results were
not very good, partly because some ZIPs had few responders
and partly because the clusters turned out too round.
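One way to see the few-responders problem before any clustering is to attach a rough noise measure to each ZIP's rate. The Python sketch below aggregates `(zip_code, responded)` records. For each ZIP it computes a z-score against the pooled response rate, using the normal approximation to the binomial. The helper name `zip_response_rates` and the ZIP codes in any example are hypothetical. This is an illustrative diagnostic, not the method described in the post.

```python
import math
from collections import defaultdict

def zip_response_rates(records):
    """Summarize (zip_code, responded) pairs per ZIP.

    Returns {zip: {"mailed", "responders", "rate", "z"}}, where z compares the
    ZIP's rate with the pooled rate of all prospects (normal approximation to
    the binomial, which is rough when a ZIP has few prospects).
    """
    mailed = defaultdict(int)
    responders = defaultdict(int)
    for zip_code, responded in records:
        mailed[zip_code] += 1
        responders[zip_code] += 1 if responded else 0

    total_mailed = sum(mailed.values())
    if total_mailed == 0:
        return {}
    pooled = sum(responders.values()) / total_mailed  # overall response rate

    summary = {}
    for zip_code, n in mailed.items():
        k = responders[zip_code]
        rate = k / n
        # Standard error of this ZIP's rate if it truly had the pooled rate.
        se = math.sqrt(pooled * (1 - pooled) / n)
        z = (rate - pooled) / se if se > 0 else 0.0
        summary[zip_code] = {"mailed": n, "responders": k, "rate": rate, "z": z}
    return summary
```

A ZIP with a small |z| (say, below 2) cannot be distinguished from the overall rate at its sample size, even if its raw rate looks high or low.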
I suspect there must be a better way, but what is it? Maybe Kohonen's