LISTSERV at the University of Georgia
Date:   Mon, 27 Aug 2001 10:11:13 -0700
Reply-To:   Cassell.David@EPAMAIL.EPA.GOV
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   "David L. Cassell" <Cassell.David@EPAMAIL.EPA.GOV>
Subject:   Re: PCA for Dummies
Comments:   To: Michael A O Ash <mash@ECONS.UMASS.EDU>
Content-type:   text/plain; charset=us-ascii

[personal email cc]

Michael A O Ash wrote [in part]:
> If each of the measures (machine1-machine5) were continuous, I would use
> Principal Component Analysis (proc princomp in SAS or factor in Stata) to
> compute the eigenvalues of the correlation matrix for machine1-5 and then
> could use the entries of the first eigenvector as a1 through a5 to
> generate the high-tech index, thus:
>
> hightech = a1*machine1 + a2*machine2 + a3*machine3 + a4*machine4 + a5*machine5
>
> (with the usual caveats about principal component analysis). However, I
> have been told, but don't understand why, that this is not ok for
> indicator variables.

{BTW, great Subject line for your message!}

I haven't seen anyone reply to this, so I thought I would drop you a line. The problem with using PCA [or any analogous approach] on your data is not the development of the eigenvectors, but the analysis that lurks just on the other side of the matrix manipulations. The matrix math doesn't assume any particular joint distribution of the data, so you could get that 'high-tech' index. But doing anything statistical with it, such as inference that leans on distributional assumptions, might get you into some trouble.

If you just want your index, and at some point you might use it as a variable in some other analysis, that should be okay. That weighted sum of indicators will probably end up looking normal enough to use. But if you get to that stage, you might prefer to use something like Partial Least Squares, which is available in PROC PLS. There's even a tech document on PROC PLS at the SAS website.
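To illustrate the point that the eigenvector step is just matrix arithmetic, here is a minimal sketch in plain Python (not SAS, and not what PROC PRINCOMP does internally). It assumes made-up 0/1 data for machine1-machine5, standardizes each indicator, and uses power iteration to approximate the first eigenvector of the correlation matrix. The simulated data, the function names, and the choice to score on standardized variables are all assumptions for illustration.

```python
import math
import random


def correlation_matrix(rows):
    """Return (correlation matrix, standardized rows) for a list of numeric rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    z = [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]
    corr = [[sum(zr[i] * zr[j] for zr in z) / (n - 1) for j in range(p)]
            for i in range(p)]
    return corr, z


def first_eigenvector(matrix, iterations=1000, tol=1e-12):
    """Approximate the largest eigenvalue and its unit eigenvector by power iteration."""
    p = len(matrix)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(w, v)) < tol
        v = w
        if converged:
            break
    # Rayleigh quotient v'Mv gives the eigenvalue estimate for unit-length v.
    mv = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
    eigenvalue = sum(v[i] * mv[i] for i in range(p))
    return eigenvalue, v


# Hypothetical data: 200 firms, five 0/1 indicators driven by a shared propensity.
random.seed(1)
rows = []
for _ in range(200):
    propensity = random.random()
    rows.append([1.0 if random.random() < 0.2 + 0.6 * propensity else 0.0
                 for _ in range(5)])

corr, z = correlation_matrix(rows)
eigenvalue, weights = first_eigenvector(corr)

# hightech = a1*machine1 + ... + a5*machine5, computed on the standardized indicators.
hightech = [sum(a * x for a, x in zip(weights, zr)) for zr in z]

print("first eigenvalue:", round(eigenvalue, 4))
print("weights a1..a5:", [round(a, 4) for a in weights])
```

Nothing in these steps checks or requires a distribution for the indicators, so the index always comes out. The sketch says nothing about whether tests or intervals built on that index are trustworthy, which is the part that needs care.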

David
--
David Cassell, CSC
Cassell.David@epa.gov
Senior computing specialist
mathematical statistician

