Date: Wed, 11 Oct 2006 08:22:48 -0400
Reply-To: Peter Flom <flom@NDRI.ORG>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Peter Flom <flom@NDRI.ORG>
Subject: Re: To statisticians : Cluster Analysis
Content-Type: text/plain; charset=US-ASCII
>>> Cat <job.alerte@GMAIL.COM> 10/11/06 5:07 AM >>>
I'm new to cluster analysis and I have to plan one in order to
identify 2 groups of patients who show responsiveness (or not) to a
I had a look at the SAS documentation for PROC CLUSTER (114 pages!)
but I'm rather puzzled because so many distance measures are offered.
I don't know which strategy to plan: I think the best would be to
choose the distance leading to the grouping that best fits my
responsiveness criterion. But this means that I'd have to identify the
clusters resulting from each distance and test their association with
my responsiveness criterion each time.
Do you know which distances are most commonly used?

Are you sure it is cluster analysis that you want? It seems unlikely to
me, based on what you've written.

You don't give many details, but I imagine you have a number of
patients and some variables about each one, including whether or not
they respond to treatment. Am I right so far?

If I am, I suggest that you do not want cluster analysis at all; you
probably want logistic regression.
Is 'response' to treatment a yes/no variable, an ordered scale, a
continuous variable, or what? If it is a dichotomy or an ordered scale,
then you want logistic regression (ordinal logistic for an ordered
scale). If it is a count, you want some sort of count regression (maybe
Poisson, or negative binomial, or something else). If it is continuous,
you may want OLS regression.

BUT all of the above depends on whether the observations are independent
(if not, you will want MIXED, NLMIXED, or GLIMMIX), and whether the data
are from a survey (in which case you may want one of the SURVEY PROCs).
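
To make the logistic regression suggestion concrete, here is a minimal sketch in plain Python of what such a model estimates for a yes/no response. Everything in it is an assumption for illustration: the single predictor `dose`, the made-up toy data, and the helper names `fit_logistic` and `predict`. In SAS you would normally let PROC LOGISTIC do the fitting rather than write it by hand.

```python
import math

def sigmoid(z):
    """Map a linear predictor to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, n_iter=5000):
    """Fit P(y=1 | x) = sigmoid(b0 + b1*x) by gradient ascent
    on the average log-likelihood (one predictor only)."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)  # observed minus predicted
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Made-up toy data: one hypothetical predictor and a yes/no response.
dose = [1, 2, 3, 4, 5, 6, 7, 8]
responded = [0, 0, 1, 0, 1, 1, 0, 1]

b0, b1 = fit_logistic(dose, responded)

def predict(x):
    """Estimated probability of response at predictor value x."""
    return sigmoid(b0 + b1 * x)

print("intercept:", round(b0, 3), "slope:", round(b1, 3))
print("P(response) at dose 1:", round(predict(1), 3))
print("P(response) at dose 8:", round(predict(8), 3))
```

The point of the sketch is only that the known response is the outcome being modelled, and the other variables are used to predict it. That is a different question from the one cluster analysis answers.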
Cluster analysis is for seeing which groups of subjects are 'close' to
each other in some sort of multidimensional space. Usually, it is
intended to discover groupings that may exist, but which you don't know
about. Here, you already know the group membership, so cluster is not
what you want.
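
As a rough illustration of what 'close' means here, the following plain Python sketch computes two common distances between every pair of subjects. The four patients and their two measurements are invented for the example, and the function names are my own, not anything from SAS.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def city_block(a, b):
    """Sum of absolute coordinate differences (Manhattan distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Hypothetical patients, each described by two made-up measurements.
patients = {
    "A": (1.0, 2.0),
    "B": (1.5, 1.8),
    "C": (8.0, 9.0),
    "D": (8.5, 9.5),
}

pairs = list(combinations(sorted(patients), 2))

def dist(metric, pair):
    """Distance between the two patients in a pair under a given metric."""
    return metric(patients[pair[0]], patients[pair[1]])

# Pairs ordered from closest to farthest under each metric.
by_euclidean = sorted(pairs, key=lambda p: dist(euclidean, p))
by_city_block = sorted(pairs, key=lambda p: dist(city_block, p))

closest = by_euclidean[0]
print("closest pair (Euclidean):", closest)
print("same ordering under both metrics:", by_euclidean == by_city_block)
```

A clustering procedure starts from distances like these and groups the subjects that sit near each other, without using any known group label.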
Now, if all of the above is somehow wrong, and you DO want cluster
analysis, the choice of distance will depend on the nature of your
variables and on what distance makes sense. OTOH, usually the different
distances correlate very highly, and often give similar results in Cluster