| Date: | Mon, 27 Aug 2001 10:11:13 -0700 |
| Reply-To: | Cassell.David@EPAMAIL.EPA.GOV |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | "David L. Cassell" <Cassell.David@EPAMAIL.EPA.GOV> |
| Subject: | Re: PCA for Dummies |
|
| Content-type: | text/plain; charset=us-ascii |
|---|
[personal email cc]
Michael A O Ash wrote [in part]:
> If each of the measures (machine1-machine5) were continuous, I would use
> Principal Component Analysis (proc princomp in SAS or factor in Stata) to
> compute the eigenvalues of the correlation matrix for machine1-5 and then
> could use the entries of the first eigenvector as a1 through a5 to
> generate the high-tech index, thus:
>
> hightech = a1*machine1 + a2*machine2 + a3*machine3 + a4*machine4 + a5*machine5
>
> (with the usual caveats about principal component analysis). However, I
> have been told, but don't understand why, that this is not ok for
> indicator variables.
{BTW, great Subject line for your message!}
I haven't seen anyone reply to this, so I thought I would drop you a line.
The problem with using PCA [or any analogous approach] on your data is
not the development of the eigenvectors, but the analysis that lurks just
on the other side of the matrix manipulations. The matrix math doesn't assume
any particular joint distribution of the data, so you could get that
'high-tech' index. But doing anything statistical with it might get you into
some trouble.
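To make the "matrix math" point concrete, here is a minimal sketch in plain Python (rather than PROC PRINCOMP) of building such an index from 0/1 indicators. Everything in it is an assumption for illustration: the six made-up plants, the helper names, and the use of power iteration to find the first eigenvector of the correlation matrix. It weights the standardized indicators, which is one common convention when working from the correlation matrix, not necessarily what any given package does by default.

```python
import math

# Made-up data: 6 plants (rows) x 5 indicators machine1..machine5 (columns).
plants = [
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1],
]

def standardize(rows):
    """Center each column and scale it by its sample standard deviation."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
        for j in range(p)
    ]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]

def correlation_matrix(z):
    """Correlation matrix computed from already-standardized columns."""
    n, p = len(z), len(z[0])
    return [
        [sum(r[j] * r[k] for r in z) / (n - 1) for k in range(p)]
        for j in range(p)
    ]

def first_eigenvector(matrix, iterations=500):
    """Leading eigenvector by power iteration, scaled to unit length."""
    p = len(matrix)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(matrix[j][k] * v[k] for k in range(p)) for j in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

z = standardize(plants)
weights = first_eigenvector(correlation_matrix(z))  # a1 .. a5
# hightech = a1*machine1 + ... + a5*machine5, on standardized indicators
scores = [sum(a * x for a, x in zip(weights, row)) for row in z]

print("weights:", [round(a, 3) for a in weights])
print("hightech:", [round(s, 3) for s in scores])
```

Nothing in these steps checks or uses the distribution of the indicators, which is why the index itself can always be computed. Any distributional worry only comes in later, when the index is fed into an inferential procedure.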
If you just want your index, and at some point you might use it as a
variable in some other analysis, that should be okay. That weighted sum of
indicators will probably end up looking normal enough to use. But if you get
to that stage, you might prefer to use something like Partial Least Squares,
which is available in PROC PLS. There's even a tech document on PROC PLS at
the SAS website.
David
--
David Cassell, CSC
Cassell.David@epa.gov
Senior computing specialist
mathematical statistician