LISTSERV at the University of Georgia
Date:         Wed, 11 Oct 2006 08:22:48 -0400
Reply-To:     Peter Flom <flom@NDRI.ORG>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Peter Flom <flom@NDRI.ORG>
Subject:      Re: To statisticians : Cluster Analysis
Comments: To: job.alerte@GMAIL.COM
Content-Type: text/plain; charset=US-ASCII

>>> Cat <job.alerte@GMAIL.COM> 10/11/06 5:07 AM >>>
<<< I'm new to cluster analysis and I have to plan one in order to identify 2 groups of patients who show responsiveness (or not) to a treatment.

I had a look at the SAS documentation for Proc Cluster (114 pages!) but I'm quite bothered because many distances are proposed. I don't know which strategy I should plan: I think the best would be to choose the distance leading to the grouping that best fits my responsiveness criterion. But this means that I'd have to identify the clusters resulting from each distance and test their association with my responsiveness criterion each time.

Do you know which distances are most commonly used? >>>

Are you sure it is cluster analysis that you want? It seems unlikely to me, based on what you've written. You don't give many details, but I imagine you have a number of patients, some variables about each one, including whether or not they respond to treatment. Am I right so far?

If I am, I suggest that you do not want cluster analysis at all; you probably want logistic regression.

Is 'response' to treatment a yes/no variable, an ordered scale, a continuous variable, or what? If it is a dichotomy or an ordered scale, then you want logistic regression. If it is a count, you want some sort of count regression (maybe Poisson, or negative binomial, or who knows?). If it is continuous, you may want OLS regression.

BUT all of the above depends on whether the observations are independent (if not, you will want MIXED, NLMIXED, or GLIMMIX), and on whether the data are from a survey (in which case you may want one of the SURVEY PROCs).
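
The rules of thumb above can be written down as a small lookup. This is only a rough sketch in Python, not SAS code: the function name `suggest_analysis`, its arguments, and the category labels are hypothetical, and the order in which the survey and independence checks are applied is an arbitrary assumption, since nothing above says which takes priority.

```python
def suggest_analysis(response_type, independent=True, survey=False):
    """Return the family of analysis suggested for a given kind of response.

    All names and labels here are illustrative assumptions, not SAS syntax.
    """
    # Survey data and non-independent observations override the simple
    # choices; checking survey first is an arbitrary ordering.
    if survey:
        return "one of the SURVEY PROCs"
    if not independent:
        return "MIXED, NLMIXED, or GLIMMIX"

    suggestions = {
        "binary": "logistic regression",
        "ordinal": "logistic regression",
        "count": "count regression (e.g. Poisson or negative binomial)",
        "continuous": "OLS regression",
    }
    if response_type not in suggestions:
        raise ValueError(f"unknown response type: {response_type!r}")
    return suggestions[response_type]


if __name__ == "__main__":
    print(suggest_analysis("binary"))
    print(suggest_analysis("count"))
    print(suggest_analysis("continuous", independent=False))
```

A real choice would of course also depend on details of the design and the data that a lookup like this cannot capture.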

Cluster analysis is for seeing which groups of subjects are 'close' to each other in some sort of multidimensional space. Usually, it is intended to discover groupings that may exist, but which you don't know about. Here, you already know the group membership, so cluster is not what you want.

Now, if all of the above is somehow wrong, and you DO want cluster, the choice of distance will depend on the nature of your variables and on what distance makes sense. OTOH, the different distances usually correlate very highly, and often give similar results in cluster analysis.
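
To illustrate the point about distances correlating, here is a minimal sketch in Python (not SAS) that computes Euclidean and city-block distances between every pair of a few made-up points and then the Pearson correlation between the two sets of distances. The four points are invented purely for illustration and all function names are hypothetical; real data, especially with variables on very different scales, may behave less neatly.

```python
from itertools import combinations
from math import sqrt

# Made-up observations on two variables (illustrative only).
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (1.0, 1.0)]


def euclidean(p, q):
    """Straight-line distance between two points."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def city_block(p, q):
    """Sum of absolute coordinate differences (Manhattan distance)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# One distance per unordered pair of points, in the same order for both.
pairs = list(combinations(points, 2))
euclid = [euclidean(p, q) for p, q in pairs]
block = [city_block(p, q) for p, q in pairs]

r = pearson(euclid, block)
print(f"correlation between the two distance measures: {r:.4f}")
```

For these particular points the correlation comes out above 0.99, which is in line with the remark above, though a toy example like this proves nothing in general.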

HTH

Peter

